ploeh blog: danish software design
https://blog.ploeh.dk
Mark Seemann, 2025-04-21T10:19:56+00:00

Porting song recommendations to Haskell
https://blog.ploeh.dk/2025/04/21/porting-song-recommendations-to-haskell
Mark Seemann, 2025-04-21T10:19:00+00:00
<div id="post">
<p>
<em>An F# code base translated to Haskell.</em>
</p>
<p>
This article is part of a <a href="/2025/04/07/alternative-ways-to-design-with-functional-programming">larger article series</a> that examines variations of how to take on a non-trivial problem using <a href="/2018/11/19/functional-architecture-a-definition">functional architecture</a>. In a <a href="/2025/04/10/characterising-song-recommendations">previous article</a> we established a baseline C# code base. Future articles are going to use that C# code base as a starting point for refactored code. On the other hand, I also want to demonstrate what such solutions may look like in languages like <a href="https://fsharp.org/">F#</a> or <a href="https://www.haskell.org/">Haskell</a>. In this article, you'll see how to port the baseline to Haskell. To be honest, I first <a href="/2025/04/14/porting-song-recommendations-to-f">ported the C# code to F#</a>, and then used the F# code as a guide to implement equivalent Haskell code.
</p>
<p>
If you're following along in the Git repositories, this is a repository separate from the .NET repositories. The code shown here is from its <em>master</em> branch.
</p>
<p>
If you don't care about Haskell, you can always go back to the table of contents in <a href="/2025/04/07/alternative-ways-to-design-with-functional-programming">the 'root' article</a> and proceed to the next topic that interests you.
</p>
<h3 id="2c7511eb050b4d399f7e7fb154c5d990">
Data structures <a href="#2c7511eb050b4d399f7e7fb154c5d990">#</a>
</h3>
<p>
When working with statically typed functional languages like Haskell, it often makes most sense to start by declaring data structures.
</p>
<p>
<pre><span style="color:blue;">data</span> User = User
  { userName :: String
  , userScrobbleCount :: Int }
  <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Show</span>, <span style="color:#2b91af;">Eq</span>)</pre>
</p>
<p>
This is much like an F# or C# record declaration, and this one echoes the corresponding types in F# and C#. The most significant difference is that here, a user's total count of scrobbles is called <code>userScrobbleCount</code> rather than <code>TotalScrobbleCount</code>. The motivation behind that variation is that Haskell data 'getters' are actually top-level functions, so it's usually a good idea to prefix them with the name of the data structure they work on. Since the data structure is called <code>User</code>, both 'getter' functions get the <code>user</code> prefix.
</p>
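<p>
To illustrate what it means that 'getters' are top-level functions, here's a minimal sketch. The <code>sampleUsers</code>, <code>sampleNames</code>, and <code>heavyListeners</code> names, as well as the data, are hypothetical and only serve as illustration; they aren't part of the code base.
</p>

```haskell
data User = User
  { userName :: String
  , userScrobbleCount :: Int }
  deriving (Show, Eq)

-- Hypothetical sample data, for illustration only.
sampleUsers :: [User]
sampleUsers = [User "ana" 100000, User "cat" 10]

-- userName :: User -> String is an ordinary function,
-- so it can be passed to map like any other function.
sampleNames :: [String]
sampleNames = map userName sampleUsers
-- ["ana","cat"]

-- Likewise, userScrobbleCount composes with other functions.
heavyListeners :: [String]
heavyListeners =
  map userName (filter ((>= 10000) . userScrobbleCount) sampleUsers)
-- ["ana"]
```

<p>
Since both functions live in the module's top-level namespace, a second record type with a field called <code>userName</code> would clash, which is why the prefix convention is useful.
</p>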
<p>
I found <code>userTotalScrobbleCount</code> a bit too verbose for my taste, so I dropped the <code>Total</code> part. Whether or not that's appropriate remains to be seen. Naming in programming is always hard, and there's a risk that you don't get it right the first time around. Unless you're publishing a reusable library, however, the option to rename it later remains.
</p>
<p>
The other two data structures are quite similar:
</p>
<p>
<pre><span style="color:blue;">data</span> Song = Song
  { songId :: Int
  , songHasVerifiedArtist :: Bool
  , songRating :: Word8 }
  <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Show</span>, <span style="color:#2b91af;">Eq</span>)

<span style="color:blue;">data</span> Scrobble = Scrobble
  { scrobbledSong :: Song
  , scrobbleCount :: Int }
  <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Show</span>, <span style="color:#2b91af;">Eq</span>)</pre>
</p>
<p>
I thought that <code>scrobbledSong</code> was more descriptive than <code>scrobbleSong</code>, so I allowed myself that little deviation from the <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> naming convention. It didn't cause any problems, but I'm still not sure if that was a good decision.
</p>
<p>
How does one translate a C# interface to Haskell? Although type classes aren't quite the same as C# or Java interfaces, this language feature is close enough that I can use it in that role. I don't consider such a type class idiomatic in Haskell, but as a counterpart to the C# interface, it works well enough.
</p>
<p>
<pre><span style="color:blue;">class</span> <span style="color:#2b91af;">SongService</span> a <span style="color:blue;">where</span>
  <span style="color:#2b91af;">getTopListeners</span> <span style="color:blue;">::</span> a <span style="color:blue;">-></span> <span style="color:#2b91af;">Int</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">IO</span> [<span style="color:blue;">User</span>]
  <span style="color:#2b91af;">getTopScrobbles</span> <span style="color:blue;">::</span> a <span style="color:blue;">-></span> <span style="color:#2b91af;">String</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">IO</span> [<span style="color:blue;">Scrobble</span>]</pre>
</p>
<p>
Any instance of the <code>SongService</code> class supports queries for top listeners of a particular song, as well as for top scrobbles for a user.
</p>
<p>
To reiterate, I don't intend to keep this type class around if I can help it, but for didactic reasons, it'll remain in some of the future refactorings, so that you can contrast and compare the Haskell code to its C# and F# peers.
</p>
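<p>
As a rough sketch of how such a type class can stand in for an interface, consider a client function that works with any instance. The <code>EmptySongService</code> type and the <code>countTopListeners</code> function are hypothetical names invented for this illustration; they don't appear in the code base.
</p>

```haskell
import Data.Word (Word8)

data User = User
  { userName :: String
  , userScrobbleCount :: Int }
  deriving (Show, Eq)

data Song = Song
  { songId :: Int
  , songHasVerifiedArtist :: Bool
  , songRating :: Word8 }
  deriving (Show, Eq)

data Scrobble = Scrobble
  { scrobbledSong :: Song
  , scrobbleCount :: Int }
  deriving (Show, Eq)

class SongService a where
  getTopListeners :: a -> Int -> IO [User]
  getTopScrobbles :: a -> String -> IO [Scrobble]

-- Hypothetical stub instance: it knows no users and no scrobbles.
data EmptySongService = EmptySongService

instance SongService EmptySongService where
  getTopListeners _ _ = return []
  getTopScrobbles _ _ = return []

-- Hypothetical client: the constraint plays the role that an
-- interface-typed constructor parameter plays in C#.
countTopListeners :: SongService a => a -> Int -> IO Int
countTopListeners srvc sid = length <$> getTopListeners srvc sid
```

<p>
Calling <code>countTopListeners EmptySongService 42</code> produces <code>0</code>, since the stub never returns any listeners.
</p>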
<h3 id="83350f6a8a484249b466fcb72978c64d">
Test Double <a href="#83350f6a8a484249b466fcb72978c64d">#</a>
</h3>
<p>
To support tests, I needed a <a href="https://martinfowler.com/bliki/TestDouble.html">Test Double</a>, so I defined the following <a href="http://xunitpatterns.com/Fake%20Object.html">Fake</a> service, which is nothing but a deterministic in-memory instance. The type itself is just a wrapper of two maps.
</p>
<p>
<pre><span style="color:blue;">data</span> FakeSongService = FakeSongService
  { fakeSongs :: Map Int Song
  , fakeUsers :: Map String (Map Int Int) }
  <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Show</span>, <span style="color:#2b91af;">Eq</span>)</pre>
</p>
<p>
As in the equivalent C# class, <code>fakeSongs</code> is a map from song ID to <code>Song</code>, while <code>fakeUsers</code> is a bit more complex. It's a map keyed on user name, but the value is another map. The keys of that inner map are song IDs, while the values are the number of times each song was scrobbled by that user.
</p>
<p>
The <code>FakeSongService</code> data structure is a <code>SongService</code> instance by explicit implementation:
</p>
<p>
<pre><span style="color:blue;">instance</span> <span style="color:blue;">SongService</span> <span style="color:blue;">FakeSongService</span> <span style="color:blue;">where</span>
  getTopListeners srvc sid = <span style="color:blue;">do</span>
    <span style="color:blue;">return</span> $
      <span style="color:blue;">uncurry</span> User <$>
        Map.toList (<span style="color:blue;">sum</span> <$> Map.<span style="color:blue;">filter</span> (Map.member sid) (fakeUsers srvc))
  getTopScrobbles srvc userName = <span style="color:blue;">do</span>
    <span style="color:blue;">return</span> $
      <span style="color:blue;">fmap</span> (\(sid, c) -> Scrobble (fakeSongs srvc ! sid) c) $
        Map.toList $
          Map.findWithDefault Map.empty userName (fakeUsers srvc)</pre>
</p>
<p>
In order to find all the top listeners of a song, the instance finds all the <code>fakeUsers</code> who have the song ID (<code>sid</code>) in their inner map, sums each of those users' scrobble counts, and creates <code>User</code> values from that data.
</p>
<p>
To find the top scrobbles of a user, the instance finds the user in the <code>fakeUsers</code> map, looks up each of that user's scrobbled songs in <code>fakeSongs</code>, and creates <code>Scrobble</code> values from that information.
</p>
<p>
Finally, test code needs a way to add data to a <code>FakeSongService</code> value, which this <a href="http://xunitpatterns.com/Test%20Utility%20Method.html">test-specific helper function</a> accomplishes:
</p>
<p>
<pre>scrobble userName s c (FakeSongService ss us) =
  <span style="color:blue;">let</span> sid = songId s
      ss' = Map.insertWith (\_ _ -> s) sid s ss
      us' = Map.insertWith (Map.unionWith <span style="color:#2b91af;">(+)</span>) userName (Map.singleton sid c) us
  <span style="color:blue;">in</span> FakeSongService ss' us'</pre>
</p>
<p>
Given a user name, a song, a scrobble count, and a <code>FakeSongService</code>, this function returns a new <code>FakeSongService</code> value with the new data added to the data already there.
</p>
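<p>
To see how repeated calls accumulate data, here's a self-contained sketch. The <code>scrobble</code> body is the one shown above, but the type signature is my inference, and the definition of <code>emptyService</code> as two empty maps is an assumption, since the test code uses that value without showing it. The <code>example</code> value is hypothetical.
</p>

```haskell
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Word (Word8)

data Song = Song
  { songId :: Int
  , songHasVerifiedArtist :: Bool
  , songRating :: Word8 }
  deriving (Show, Eq)

data FakeSongService = FakeSongService
  { fakeSongs :: Map Int Song
  , fakeUsers :: Map String (Map Int Int) }
  deriving (Show, Eq)

-- Assumption: an empty service is just two empty maps.
emptyService :: FakeSongService
emptyService = FakeSongService Map.empty Map.empty

-- Signature inferred from the function body shown in the article.
scrobble :: String -> Song -> Int -> FakeSongService -> FakeSongService
scrobble userName s c (FakeSongService ss us) =
  let sid = songId s
      ss' = Map.insertWith (\_ _ -> s) sid s ss
      us' = Map.insertWith (Map.unionWith (+)) userName (Map.singleton sid c) us
  in FakeSongService ss' us'

-- Hypothetical data: "ana" scrobbles song 1 twice, "cat" scrobbles song 2 once.
example :: FakeSongService
example =
  scrobble "ana" (Song 1 False 5) 10 $
  scrobble "ana" (Song 1 False 5) 5 $
  scrobble "cat" (Song 2 True 6) 3 emptyService
```

<p>
In <code>example</code>, the two scrobbles of song <em>1</em> by <code>"ana"</code> are added together, so her inner map associates song ID <em>1</em> with <em>15</em>, while <code>"cat"</code> has song ID <em>2</em> with a count of <em>3</em>. The <code>fakeSongs</code> map contains both songs.
</p>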
<h3 id="1e89e2737f3e4fe1849f840052f305e2">
QuickCheck Arbitraries <a href="#1e89e2737f3e4fe1849f840052f305e2">#</a>
</h3>
<p>
In the F# test code I used <a href="https://fscheck.github.io/FsCheck/">FsCheck</a> to get good coverage of the code. For Haskell, I'll use <a href="https://hackage.haskell.org/package/QuickCheck">QuickCheck</a>.
</p>
<p>
Porting the ideas from the F# tests, I define a QuickCheck generator for user names:
</p>
<p>
<pre><span style="color:#2b91af;">alphaNum</span> <span style="color:blue;">::</span> <span style="color:blue;">Gen</span> <span style="color:#2b91af;">Char</span>
alphaNum = elements ([<span style="color:#a31515;">'a'</span>..<span style="color:#a31515;">'z'</span>] ++ [<span style="color:#a31515;">'A'</span>..<span style="color:#a31515;">'Z'</span>] ++ [<span style="color:#a31515;">'0'</span>..<span style="color:#a31515;">'9'</span>])

<span style="color:#2b91af;">userName</span> <span style="color:blue;">::</span> <span style="color:blue;">Gen</span> <span style="color:#2b91af;">String</span>
userName = <span style="color:blue;">do</span>
  len <- choose (1, 19)
  first <- elements $ [<span style="color:#a31515;">'a'</span>..<span style="color:#a31515;">'z'</span>] ++ [<span style="color:#a31515;">'A'</span>..<span style="color:#a31515;">'Z'</span>]
  rest <- vectorOf len alphaNum
  <span style="color:blue;">return</span> $ first : rest</pre>
</p>
<p>
It's not that the algorithm only works if user names are alphanumeric strings that start with a letter and are no longer than twenty characters, but whenever a property is falsified, I'd rather look at a user name like <code>"Yvj0D1I"</code> or <code>"tyD9P1eOqwMMa1Q6u"</code> (which are already bad enough) than something with line breaks and unprintable characters.
</p>
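<p>
One way to convince yourself that the generator does what the text says is to write a property about it. The following is only a sketch; <code>prop_userNameShape</code> is a hypothetical name that isn't part of the test suite. Note that since <code>len</code> is at least <em>1</em>, the generated names have between two and twenty characters.
</p>

```haskell
import Data.Char (isAlpha, isAlphaNum, isAscii)
import Test.QuickCheck

alphaNum :: Gen Char
alphaNum = elements (['a'..'z'] ++ ['A'..'Z'] ++ ['0'..'9'])

userName :: Gen String
userName = do
  len <- choose (1, 19)
  first <- elements $ ['a'..'z'] ++ ['A'..'Z']
  rest <- vectorOf len alphaNum
  return $ first : rest

-- Hypothetical property: every generated name starts with an ASCII
-- letter, is ASCII alphanumeric throughout, and has 2 to 20 characters.
prop_userNameShape :: Property
prop_userNameShape = forAll userName $ \un ->
  case un of
    [] -> False
    (c : _) ->
      length un >= 2 && length un <= 20
        && isAscii c && isAlpha c
        && all (\x -> isAscii x && isAlphaNum x) un
```

<p>
You can run it from GHCi with <code>quickCheck prop_userNameShape</code>, or inspect a handful of generated values with <code>sample userName</code>.
</p>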
<p>
Working with QuickCheck, it's often <a href="/2019/09/02/naming-newtypes-for-quickcheck-arbitraries">useful to wrap types from the System Under Test in test-specific Arbitrary wrappers</a>:
</p>
<p>
<pre><span style="color:blue;">newtype</span> ValidUserName = ValidUserName { getUserName :: String } <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Show</span>, <span style="color:#2b91af;">Eq</span>)

<span style="color:blue;">instance</span> <span style="color:blue;">Arbitrary</span> <span style="color:blue;">ValidUserName</span> <span style="color:blue;">where</span>
  arbitrary = ValidUserName <$> userName</pre>
</p>
<p>
I also defined a (simpler) <code>Arbitrary</code> instance for <code>Song</code> called <code>AnySong</code>.
</p>
<h3 id="fd6ddacfd3c34430abb30288f5fb37a6">
A few properties <a href="#fd6ddacfd3c34430abb30288f5fb37a6">#</a>
</h3>
<p>
With <code>FakeSongService</code> in place, I proceeded to add the test code, starting from the top of the F# test code, and translating each as faithfully as possible. The first one is an <a href="https://agileotter.blogspot.com/2008/12/unit-test-ice-breakers.html">Ice Breaker Test</a> that only verifies that the System Under Test exists and doesn't crash when called.
</p>
<p>
<pre>testProperty <span style="color:#a31515;">"No data"</span> $ \ (ValidUserName un) -> ioProperty $ <span style="color:blue;">do</span>
  actual <- getRecommendations emptyService un
  <span style="color:blue;">return</span> $ <span style="color:blue;">null</span> actual</pre>
</p>
<p>
As I've done since at least 2019, <a href="/2019/03/11/an-example-of-state-based-testing-in-haskell">it seems</a>, I've <a href="/2018/05/07/inlined-hunit-test-lists">inlined test cases as anonymous functions</a>; this time as QuickCheck properties. This one just creates a <code>FakeSongService</code> that contains no data, and asks for recommendations. The expected result is that <code>actual</code> is empty (<code>null</code>), since there's nothing to recommend.
</p>
<p>
A slightly more involved property adds some data to the service before requesting recommendations:
</p>
<p>
<pre>testProperty <span style="color:#a31515;">"One user, some songs"</span> $ \
  (ValidUserName user)
  (<span style="color:blue;">fmap</span> getSong -> songs)
  -> monadicIO $ <span style="color:blue;">do</span>
  scrobbleCounts <- pick $ vectorOf (<span style="color:blue;">length</span> songs) $ choose (1, 100)
  <span style="color:blue;">let</span> scrobbles = <span style="color:blue;">zip</span> songs scrobbleCounts
  <span style="color:blue;">let</span> srvc = <span style="color:blue;">foldr</span> (<span style="color:blue;">uncurry</span> (scrobble user)) emptyService scrobbles
  actual <- run $ getRecommendations srvc user
  assertWith (<span style="color:blue;">null</span> actual) <span style="color:#a31515;">"Should be empty"</span></pre>
</p>
<p>
A couple of things are worthy of note. First, the property <a href="/2018/05/14/project-arbitraries-with-view-patterns">uses a view pattern to project a list of songs from a list of Arbitraries</a>, where <code>getSong</code> is the 'getter' that belongs to the <code>AnySong</code> <code>newtype</code> wrapper.
</p>
<p>
I find view patterns quite useful as a declarative way to 'promote' a single <code>Arbitrary</code> instance to a list. In a third property, I take it a step further:
</p>
<p>
<pre>(<span style="color:blue;">fmap</span> getUserName -> NonEmpty users)</pre>
</p>
<p>
This not only turns the singular <code>ValidUserName</code> wrapper into a list, but by projecting it into <code>NonEmpty</code>, the test declares that <code>users</code> is a non-empty list. QuickCheck picks all that up and generates values accordingly.
</p>
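<p>
As a stand-alone sketch of that technique, the following property takes a QuickCheck <code>NonEmptyList</code> of wrappers and unwraps it in the argument pattern. The <code>prop_usersNonEmpty</code> name is hypothetical, and the <code>Arbitrary</code> instance uses a simplified stand-in generator rather than the <code>userName</code> generator from the test code.
</p>

```haskell
{-# LANGUAGE ViewPatterns #-}

import Test.QuickCheck

newtype ValidUserName = ValidUserName { getUserName :: String }
  deriving (Show, Eq)

-- Simplified stand-in generator (an assumption for this sketch).
instance Arbitrary ValidUserName where
  arbitrary = ValidUserName <$> listOf1 (elements ['a'..'z'])

-- The argument type is NonEmptyList ValidUserName. The view pattern maps
-- getUserName over it (NonEmptyList is a Functor) and then matches the
-- NonEmpty constructor, binding users to a plain, non-empty [String].
prop_usersNonEmpty :: NonEmptyList ValidUserName -> Bool
prop_usersNonEmpty (fmap getUserName -> NonEmpty users) = not (null users)
```

<p>
In the actual test code the property is an anonymous function, so the argument type is inferred from the pattern instead of being stated in a signature.
</p>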
<p>
If you're interested in seeing this more advanced view pattern in context, you may consult the Git repository.
</p>
<p>
Secondly, the <code>"One user, some songs"</code> test runs in <code>monadicIO</code>, which I didn't know existed before I wrote these tests. Together with <code>pick</code>, <code>run</code>, and <code>assertWith</code>, <code>monadicIO</code> is defined in <a href="https://hackage.haskell.org/package/QuickCheck/docs/Test-QuickCheck-Monadic.html">Test.QuickCheck.Monadic</a>. It enables you to write properties that run in <code>IO</code>, which these properties need to do, because <code>getRecommendations</code> is <code>IO</code>-bound.
</p>
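<p>
If, like me, you haven't used <code>monadicIO</code> before, a minimal sketch may help. It doesn't involve the song recommendations code at all; <code>prop_counterCounts</code> is a hypothetical property about an <code>IORef</code> counter, chosen only because it needs <code>IO</code>. It assumes a QuickCheck version recent enough to export <code>assertWith</code>.
</p>

```haskell
import Data.IORef (modifyIORef, newIORef, readIORef)
import Test.QuickCheck
import Test.QuickCheck.Monadic (assertWith, monadicIO, pick, run)

-- Hypothetical property: incrementing a counter n times yields n.
prop_counterCounts :: Property
prop_counterCounts = monadicIO $ do
  -- pick draws a value from a generator inside the property.
  n <- pick $ choose (0, 100 :: Int)
  -- run lifts an IO action into the property.
  total <- run $ do
    ref <- newIORef (0 :: Int)
    mapM_ (\_ -> modifyIORef ref (+ 1)) [1 .. n]
    readIORef ref
  -- assertWith fails the property with the given message if the Boolean is False.
  assertWith (total == n) "Should count every increment"
```

<p>
The shape is the same as in the <code>"One user, some songs"</code> property: generate extra data with <code>pick</code>, exercise the impure action with <code>run</code>, and verify the outcome with <code>assertWith</code>.
</p>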
<p>
There's one more QuickCheck property in the code base, but it mostly repeats techniques already shown here. See the Git repository for all the details, if necessary.
</p>
<h3 id="76fa24be614e460e9af82b2652c82c07">
Examples <a href="#76fa24be614e460e9af82b2652c82c07">#</a>
</h3>
<p>
In addition to the properties, I also ported the F# examples; that is, 'normal' unit tests. Here's one of them:
</p>
<p>
<pre><span style="color:#a31515;">"One verified recommendation"</span> ~: <span style="color:blue;">do</span>
  <span style="color:blue;">let</span> srvc =
        scrobble <span style="color:#a31515;">"ana"</span> (Song 2 True 5) 9_9990 $
        scrobble <span style="color:#a31515;">"ana"</span> (Song 1 False 5) 10 $
        scrobble <span style="color:#a31515;">"cat"</span> (Song 1 False 6) 10 emptyService
  actual <- getRecommendations srvc <span style="color:#a31515;">"cat"</span>
  [Song 2 True 5] @=? actual</pre>
</p>
<p>
This one is straightforward, but as I already discussed when <a href="/2025/04/10/characterising-song-recommendations">characterizing the original code</a>, some of the examples essentially document quirks in the implementation. Here's the relevant test, translated to Haskell:
</p>
<p>
<pre><span style="color:#a31515;">"Only top-rated songs"</span> ~: <span style="color:blue;">do</span>
  <span style="color:green;">-- Scale ratings to keep them less than or equal to 10.</span>
  <span style="color:blue;">let</span> srvc =
        <span style="color:blue;">foldr</span> (\i -> scrobble <span style="color:#a31515;">"hyle"</span> (Song i True (<span style="color:blue;">toEnum</span> i `div` 2)) 500) emptyService [1..20]
  actual <- getRecommendations srvc <span style="color:#a31515;">"hyle"</span>
  assertBool <span style="color:#a31515;">"Should not be empty"</span> (<span style="color:blue;">not</span> $ <span style="color:blue;">null</span> actual)
  <span style="color:green;">-- Since there's only one user, but with 20 songs, the implementation</span>
  <span style="color:green;">-- loops over the same songs 20 times, so 400 songs in total (with</span>
  <span style="color:green;">-- duplicates). Ordering on rating, only the top-rated 200 remains, that</span>
  <span style="color:green;">-- is, those rated 5-10. Note that this is a Characterization Test, so not</span>
  <span style="color:green;">-- necessarily reflective of how a real recommendation system should work.</span>
  assertBool <span style="color:#a31515;">"Should have 5+ rating"</span> (<span style="color:blue;">all</span> ((>= 5) . songRating) actual)</pre>
</p>
<p>
This test creates twenty scrobbles for one user: One with a zero rating, two with rating <em>1</em>, two with rating <em>2</em>, and so on, up to a single song with rating <em>10</em>.
</p>
<p>
<a href="https://tyrrrz.me/blog/pure-impure-segregation-principle#interleaved-impurities">The implementation of GetRecommendationsAsync</a> uses these twenty songs to find 'other users' who have these top songs as well. In this case, there's only one user, so for each of those twenty songs, you get the same twenty songs, for a total of 400.
</p>
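<p>
The distribution of ratings in that test can be worked out with a few lines of code. This is only a sketch: it uses <code>Int</code> instead of <code>Word8</code> for simplicity, and <code>ratings</code>, <code>ratingCounts</code>, and <code>totalScrobbles</code> are hypothetical names.
</p>

```haskell
import Data.List (group, sort)

-- The test gives song i the rating i `div` 2, for i from 1 to 20.
ratings :: [Int]
ratings = [i `div` 2 | i <- [1 .. 20]]

-- How many songs share each rating, as (rating, count) pairs.
ratingCounts :: [(Int, Int)]
ratingCounts = [(r, length g) | g@(r : _) <- group (sort ratings)]
-- [(0,1),(1,2),(2,2),(3,2),(4,2),(5,2),(6,2),(7,2),(8,2),(9,2),(10,1)]

-- Each of the twenty songs is scrobbled 500 times by the single user.
totalScrobbles :: Int
totalScrobbles = 20 * 500
-- 10000
```

<p>
The total of <em>10,000</em> scrobbles is worth noticing, because it's exactly what the user needs in order to pass the <code>10_000 &lt;=</code> filter on <code>userScrobbleCount</code> in the implementation shown later in this article.
</p>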
<p>
There are more unit tests than these. You can see them in the Git repository.
</p>
<h3 id="29a2c5558184424eaaba9db49dc30368">
Implementation <a href="#29a2c5558184424eaaba9db49dc30368">#</a>
</h3>
<p>
The most direct translation of the C# and F# 'reference implementation' that I could think of was this:
</p>
<p>
<pre>getRecommendations srvc un = <span style="color:blue;">do</span>
  <span style="color:green;">-- 1. Get user's own top scrobbles</span>
  <span style="color:green;">-- 2. Get other users who listened to the same songs</span>
  <span style="color:green;">-- 3. Get top scrobbles of those users</span>
  <span style="color:green;">-- 4. Aggregate the songs into recommendations</span>

  <span style="color:green;">-- Impure</span>
  scrobbles <- getTopScrobbles srvc un
  <span style="color:green;">-- Pure</span>
  <span style="color:blue;">let</span> scrobblesSnapshot = <span style="color:blue;">take</span> 100 $ sortOn (Down . scrobbleCount) scrobbles
  recommendationCandidates <- newIORef <span style="color:blue;">[]</span>
  forM_ scrobblesSnapshot $ \scrobble -> <span style="color:blue;">do</span>
    <span style="color:green;">-- Impure</span>
    otherListeners <- getTopListeners srvc $ songId $ scrobbledSong scrobble
    <span style="color:green;">-- Pure</span>
    <span style="color:blue;">let</span> otherListenersSnapshot =
          <span style="color:blue;">take</span> 20 $
          sortOn (Down . userScrobbleCount) $
          <span style="color:blue;">filter</span> ((10_000 <=) . userScrobbleCount) otherListeners
    forM_ otherListenersSnapshot $ \otherListener -> <span style="color:blue;">do</span>
      <span style="color:green;">-- Impure</span>
      otherScrobbles <- getTopScrobbles srvc $ userName otherListener
      <span style="color:green;">-- Pure</span>
      <span style="color:blue;">let</span> otherScrobblesSnapshot =
            <span style="color:blue;">take</span> 10 $
            sortOn (Down . songRating . scrobbledSong) $
            <span style="color:blue;">filter</span> (songHasVerifiedArtist . scrobbledSong) otherScrobbles
      forM_ otherScrobblesSnapshot $ \otherScrobble -> <span style="color:blue;">do</span>
        <span style="color:blue;">let</span> song = scrobbledSong otherScrobble
        modifyIORef recommendationCandidates (song :)
  recommendations <- readIORef recommendationCandidates
  <span style="color:green;">-- Pure</span>
  <span style="color:blue;">return</span> $ <span style="color:blue;">take</span> 200 $ sortOn (Down . songRating) recommendations</pre>
</p>
<p>
In order to mirror <a href="https://tyrrrz.me/blog/pure-impure-segregation-principle#interleaved-impurities">the original implementation</a> as closely as possible, I declare <code>recommendationCandidates</code> as an <a href="https://hackage.haskell.org/package/base/docs/Data-IORef.html">IORef</a> so that I can incrementally add to it as the action goes through its nested loops. Notice the <code>modifyIORef</code> towards the end of the code listing, which adds a single song to the list.
</p>
<p>
Once all the looping is done, the action uses <code>readIORef</code> to pull the <code>recommendations</code> out of the <code>IORef</code>.
</p>
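<p>
Stripped of the domain logic, the accumulation pattern looks like the following sketch. The <code>collect</code> function is a hypothetical name for illustration; it isn't part of the code base.
</p>

```haskell
import Control.Monad (forM_)
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Hypothetical helper: visit every element of every inner list in
-- nested loops, prepending each one to a mutable list.
collect :: [[Int]] -> IO [Int]
collect xss = do
  acc <- newIORef []
  forM_ xss $ \xs ->
    forM_ xs $ \x ->
      modifyIORef acc (x :)
  readIORef acc
```

<p>
Because <code>modifyIORef</code> prepends with <code>(x :)</code>, the result comes out in reverse insertion order: <code>collect [[1, 2], [3]]</code> returns <code>[3,2,1]</code>.
</p>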
<p>
As you can see, I also ported the comments from the original C# code.
</p>
<p>
I don't consider this idiomatic Haskell code, but the goal in this article was to mirror the C# code as closely as possible. Once I start refactoring, you'll see some more idiomatic implementations.
</p>
<h3 id="8192c1e94add4e56b95dad78d4eda818">
Conclusion <a href="#8192c1e94add4e56b95dad78d4eda818">#</a>
</h3>
<p>
Together with the previous two articles in this article series, this establishes a baseline from which I can refactor the code. While we might consider the original C# code idiomatic, this port to Haskell isn't. It is, on the other hand, similar enough to both its C# and F# peers that we can compare and contrast all three.
</p>
<p>
Two design choices in particular make this Haskell implementation less than idiomatic. One is the use of <code>IORef</code> to update a list of songs. The other is using a type class to model an external dependency.
</p>
<p>
As I cover various alternative architectures in this article series, you'll see how to get rid of both.
</p>
<p>
<strong>Next:</strong> Song recommendations as an Impureim Sandwich.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.

Porting song recommendations to F#
https://blog.ploeh.dk/2025/04/14/porting-song-recommendations-to-f
Mark Seemann, 2025-04-14T08:54:00+00:00
<div id="post">
<p>
<em>A C# code base translated to F#.</em>
</p>
<p>
This article is part of a <a href="/2025/04/07/alternative-ways-to-design-with-functional-programming">larger article series</a> that examines variations of how to take on a non-trivial problem using <a href="/2018/11/19/functional-architecture-a-definition">functional architecture</a>. In the <a href="/2025/04/10/characterising-song-recommendations">previous article</a> we established a baseline C# code base. Future articles are going to use that C# code base as a starting point for refactored code. On the other hand, I also want to demonstrate what such solutions may look like in languages like <a href="https://fsharp.org/">F#</a> or <a href="https://www.haskell.org/">Haskell</a>. In this article, you'll see how to port the C# baseline to F#.
</p>
<p>
The code shown in this article is from the <em>fsharp-port</em> branch of the accompanying Git repository.
</p>
<h3 id="075d8dc2f6ad4baca4f815137193f814">
Data structures <a href="#075d8dc2f6ad4baca4f815137193f814">#</a>
</h3>
<p>
We may start by defining the required data structures. All are going to be <a href="https://learn.microsoft.com/dotnet/fsharp/language-reference/records">records</a>.
</p>
<p>
<pre><span style="color:blue;">type</span> <span style="color:#2b91af;">User</span> = { UserName : <span style="color:#2b91af;">string</span>; TotalScrobbleCount : <span style="color:#2b91af;">int</span> }</pre>
</p>
<p>
As in the equivalent C# code, a <code>User</code> is nothing more than a <code>string</code> and an <code>int</code>.
</p>
<p>
When creating new values, record syntax can sometimes be awkward, so I also define a curried function to create <code>User</code> values:
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:#74531f;">user</span> <span style="font-weight:bold;color:#1f377f;">userName</span> <span style="font-weight:bold;color:#1f377f;">totalScrobbleCount</span> =
    { UserName = <span style="font-weight:bold;color:#1f377f;">userName</span>; TotalScrobbleCount = <span style="font-weight:bold;color:#1f377f;">totalScrobbleCount</span> }</pre>
</p>
<p>
Likewise, I define <code>Song</code> and <code>Scrobble</code> in the same way:
</p>
<p>
<pre><span style="color:blue;">type</span> <span style="color:#2b91af;">Song</span> = { Id : <span style="color:#2b91af;">int</span>; IsVerifiedArtist : <span style="color:#2b91af;">bool</span>; Rating : <span style="color:#2b91af;">byte</span> }
<span style="color:blue;">let</span> <span style="color:#74531f;">song</span> <span style="font-weight:bold;color:#1f377f;">id</span> <span style="font-weight:bold;color:#1f377f;">isVerifiedArtist</span> <span style="font-weight:bold;color:#1f377f;">rating</span> =
    { Id = <span style="font-weight:bold;color:#1f377f;">id</span>; IsVerifiedArtist = <span style="font-weight:bold;color:#1f377f;">isVerifiedArtist</span>; Rating = <span style="font-weight:bold;color:#1f377f;">rating</span> }

<span style="color:blue;">type</span> <span style="color:#2b91af;">Scrobble</span> = { Song : <span style="color:#2b91af;">Song</span>; ScrobbleCount : <span style="color:#2b91af;">int</span> }
<span style="color:blue;">let</span> <span style="color:#74531f;">scrobble</span> <span style="font-weight:bold;color:#1f377f;">song</span> <span style="font-weight:bold;color:#1f377f;">scrobbleCount</span> = { Song = <span style="font-weight:bold;color:#1f377f;">song</span>; ScrobbleCount = <span style="font-weight:bold;color:#1f377f;">scrobbleCount</span> }</pre>
</p>
<p>
To be honest, I only use those curried functions sparingly, so they're somewhat redundant. Perhaps I should consider getting rid of them. For now, however, they stay.
</p>
<p>
Since I'm moving all the code to F#, I also have to translate the interface.
</p>
<p>
<pre><span style="color:blue;">type</span> <span style="color:#2b91af;">SongService</span> =
    <span style="color:blue;">abstract</span> <span style="font-weight:bold;color:#74531f;">GetTopListenersAsync</span> : songId : <span style="color:#2b91af;">int</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">User</span>>>
    <span style="color:blue;">abstract</span> <span style="font-weight:bold;color:#74531f;">GetTopScrobblesAsync</span> : userName : <span style="color:#2b91af;">string</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">Scrobble</span>>></pre>
</p>
<p>
The syntax is different from C#, but otherwise, this is the same interface.
</p>
<h3 id="4102e5ed7f394f48ac28f4b4a8af55e8">
Implementation <a href="#4102e5ed7f394f48ac28f4b4a8af55e8">#</a>
</h3>
<p>
Those are all the supporting types required to implement the <code>RecommendationsProvider</code>. This is the most direct translation of the C# code that I could think of:
</p>
<p>
<pre><span style="color:blue;">type</span> <span style="color:#2b91af;">RecommendationsProvider</span> (<span style="font-weight:bold;color:#1f377f;">songService</span> : <span style="color:#2b91af;">SongService</span>) =
    <span style="color:blue;">member</span> _.<span style="font-weight:bold;color:#74531f;">GetRecommendationsAsync</span> <span style="font-weight:bold;color:#1f377f;">userName</span> = <span style="color:blue;">task</span> {
        <span style="color:green;">// 1. Get user's own top scrobbles</span>
        <span style="color:green;">// 2. Get other users who listened to the same songs</span>
        <span style="color:green;">// 3. Get top scrobbles of those users</span>
        <span style="color:green;">// 4. Aggregate the songs into recommendations</span>
        <span style="color:green;">// Impure</span>
        <span style="color:blue;">let!</span> <span style="font-weight:bold;color:#1f377f;">scrobbles</span> = <span style="font-weight:bold;color:#1f377f;">songService</span>.<span style="font-weight:bold;color:#74531f;">GetTopScrobblesAsync</span> <span style="font-weight:bold;color:#1f377f;">userName</span>
        <span style="color:green;">// Pure</span>
        <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">scrobblesSnapshot</span> =
            <span style="font-weight:bold;color:#1f377f;">scrobbles</span>
            |> <span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">sortByDescending</span> (<span style="color:blue;">fun</span> <span style="font-weight:bold;color:#1f377f;">s</span> <span style="color:blue;">-></span> <span style="font-weight:bold;color:#1f377f;">s</span>.ScrobbleCount)
            |> <span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">truncate</span> 100
            |> <span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">toList</span>
        <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">recommendationCandidates</span> = <span style="color:#2b91af;">ResizeArray</span> ()
        <span style="color:blue;">for</span> <span style="font-weight:bold;color:#1f377f;">scrobble</span> <span style="color:blue;">in</span> <span style="font-weight:bold;color:#1f377f;">scrobblesSnapshot</span> <span style="color:blue;">do</span>
            <span style="color:green;">// Impure</span>
            <span style="color:blue;">let!</span> <span style="font-weight:bold;color:#1f377f;">otherListeners</span> =
                <span style="font-weight:bold;color:#1f377f;">songService</span>.<span style="font-weight:bold;color:#74531f;">GetTopListenersAsync</span> <span style="font-weight:bold;color:#1f377f;">scrobble</span>.Song.Id
            <span style="color:green;">// Pure</span>
            <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">otherListenersSnapshot</span> =
                <span style="font-weight:bold;color:#1f377f;">otherListeners</span>
                |> <span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">filter</span> (<span style="color:blue;">fun</span> <span style="font-weight:bold;color:#1f377f;">u</span> <span style="color:blue;">-></span> <span style="font-weight:bold;color:#1f377f;">u</span>.TotalScrobbleCount >= 10_000)
                |> <span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">sortByDescending</span> (<span style="color:blue;">fun</span> <span style="font-weight:bold;color:#1f377f;">u</span> <span style="color:blue;">-></span> <span style="font-weight:bold;color:#1f377f;">u</span>.TotalScrobbleCount)
                |> <span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">truncate</span> 20
                |> <span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">toList</span>
            <span style="color:blue;">for</span> <span style="font-weight:bold;color:#1f377f;">otherListener</span> <span style="color:blue;">in</span> <span style="font-weight:bold;color:#1f377f;">otherListenersSnapshot</span> <span style="color:blue;">do</span>
                <span style="color:green;">// Impure</span>
                <span style="color:blue;">let!</span> <span style="font-weight:bold;color:#1f377f;">otherScrobbles</span> =
                    <span style="font-weight:bold;color:#1f377f;">songService</span>.<span style="font-weight:bold;color:#74531f;">GetTopScrobblesAsync</span> <span style="font-weight:bold;color:#1f377f;">otherListener</span>.UserName
                <span style="color:green;">// Pure</span>
                <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">otherScrobblesSnapshot</span> =
                    <span style="font-weight:bold;color:#1f377f;">otherScrobbles</span>
                    |> <span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">filter</span> (<span style="color:blue;">fun</span> <span style="font-weight:bold;color:#1f377f;">s</span> <span style="color:blue;">-></span> <span style="font-weight:bold;color:#1f377f;">s</span>.Song.IsVerifiedArtist)
                    |> <span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">sortByDescending</span> (<span style="color:blue;">fun</span> <span style="font-weight:bold;color:#1f377f;">s</span> <span style="color:blue;">-></span> <span style="font-weight:bold;color:#1f377f;">s</span>.Song.Rating)
                    |> <span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">truncate</span> 10
                    |> <span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">toList</span>
                <span style="font-weight:bold;color:#1f377f;">otherScrobblesSnapshot</span>
                |> <span style="color:#2b91af;">List</span>.<span style="color:#74531f;">map</span> (<span style="color:blue;">fun</span> <span style="font-weight:bold;color:#1f377f;">s</span> <span style="color:blue;">-></span> <span style="font-weight:bold;color:#1f377f;">s</span>.Song)
                |> <span style="font-weight:bold;color:#1f377f;">recommendationCandidates</span>.<span style="font-weight:bold;color:#74531f;">AddRange</span>
        <span style="color:green;">// Pure</span>
        <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">recommendations</span> =
            <span style="font-weight:bold;color:#1f377f;">recommendationCandidates</span>
            |> <span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">sortByDescending</span> (<span style="color:blue;">fun</span> <span style="font-weight:bold;color:#1f377f;">s</span> <span style="color:blue;">-></span> <span style="font-weight:bold;color:#1f377f;">s</span>.Rating)
            |> <span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">truncate</span> 200
            |> <span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">toList</span>
            :> <span style="color:#2b91af;">IReadOnlyCollection</span><_>
        <span style="color:blue;">return</span> <span style="font-weight:bold;color:#1f377f;">recommendations</span> }</pre>
</p>
<p>
As you can tell, I've kept the comments from <a href="https://tyrrrz.me/blog/pure-impure-segregation-principle">the original</a>, too.
</p>
<h3 id="efa00ed5a67641e18b6fec772f2e6aa7">
Test Double <a href="#efa00ed5a67641e18b6fec772f2e6aa7">#</a>
</h3>
<p>
In <a href="/2025/04/10/characterising-song-recommendations">the previous article</a>, I'd written the <a href="http://xunitpatterns.com/Fake%20Object.html">Fake</a> <code>SongService</code> in C#. Since, in this article, I'm translating everything to F#, I need to translate the Fake, too.
</p>
<p>
<pre><span style="color:blue;">type</span> <span style="color:#2b91af;">FakeSongService</span> () =
<span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">songs</span> = <span style="color:#2b91af;">ConcurrentDictionary</span><<span style="color:#2b91af;">int</span>, <span style="color:#2b91af;">Song</span>> ()
<span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">users</span> = <span style="color:#2b91af;">ConcurrentDictionary</span><<span style="color:#2b91af;">string</span>, <span style="color:#2b91af;">ConcurrentDictionary</span><<span style="color:#2b91af;">int</span>, <span style="color:#2b91af;">int</span>>> ()
<span style="color:blue;">interface</span> <span style="color:#2b91af;">SongService</span> <span style="color:blue;">with</span>
<span style="color:blue;">member</span> _.<span style="font-weight:bold;color:#74531f;">GetTopListenersAsync</span> <span style="font-weight:bold;color:#1f377f;">songId</span> =
<span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">listeners</span> =
<span style="font-weight:bold;color:#1f377f;">users</span>
|> <span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">filter</span> (<span style="color:blue;">fun</span> <span style="font-weight:bold;color:#1f377f;">kvp</span> <span style="color:blue;">-></span> <span style="font-weight:bold;color:#1f377f;">kvp</span>.Value.<span style="font-weight:bold;color:#74531f;">ContainsKey</span> <span style="font-weight:bold;color:#1f377f;">songId</span>)
|> <span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">map</span> (<span style="color:blue;">fun</span> <span style="font-weight:bold;color:#1f377f;">kvp</span> <span style="color:blue;">-></span> <span style="color:#74531f;">user</span> <span style="font-weight:bold;color:#1f377f;">kvp</span>.Key (<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">sum</span> <span style="font-weight:bold;color:#1f377f;">kvp</span>.Value.Values))
|> <span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">toList</span>
<span style="color:#2b91af;">Task</span>.<span style="font-weight:bold;color:#74531f;">FromResult</span> <span style="font-weight:bold;color:#1f377f;">listeners</span>
<span style="color:blue;">member</span> _.<span style="font-weight:bold;color:#74531f;">GetTopScrobblesAsync</span> <span style="font-weight:bold;color:#1f377f;">userName</span> =
<span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">scrobbles</span> =
<span style="font-weight:bold;color:#1f377f;">users</span>.<span style="font-weight:bold;color:#74531f;">GetOrAdd</span>(<span style="font-weight:bold;color:#1f377f;">userName</span>, <span style="color:#2b91af;">ConcurrentDictionary</span><_, _> ())
|> <span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">map</span> (<span style="color:blue;">fun</span> <span style="font-weight:bold;color:#1f377f;">kvp</span> <span style="color:blue;">-></span> <span style="color:#74531f;">scrobble</span> <span style="font-weight:bold;color:#1f377f;">songs</span>[<span style="font-weight:bold;color:#1f377f;">kvp</span>.Key] <span style="font-weight:bold;color:#1f377f;">kvp</span>.Value)
|> <span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">toList</span>
<span style="color:#2b91af;">Task</span>.<span style="font-weight:bold;color:#74531f;">FromResult</span> <span style="font-weight:bold;color:#1f377f;">scrobbles</span>
<span style="color:blue;">member</span> _.<span style="font-weight:bold;color:#74531f;">Scrobble</span> (<span style="font-weight:bold;color:#1f377f;">userName</span>, <span style="font-weight:bold;color:#1f377f;">song</span> : <span style="color:#2b91af;">Song</span>, <span style="font-weight:bold;color:#1f377f;">scrobbleCount</span>) =
<span style="color:blue;">let</span> <span style="color:#74531f;">addScrobbles</span> (<span style="font-weight:bold;color:#1f377f;">scrobbles</span> : <span style="color:#2b91af;">ConcurrentDictionary</span><_, _>) =
<span style="font-weight:bold;color:#1f377f;">scrobbles</span>.<span style="font-weight:bold;color:#74531f;">AddOrUpdate</span> (
<span style="font-weight:bold;color:#1f377f;">song</span>.Id,
<span style="font-weight:bold;color:#1f377f;">scrobbleCount</span>,
<span style="color:blue;">fun</span> _ <span style="font-weight:bold;color:#1f377f;">oldCount</span> <span style="color:blue;">-></span> <span style="font-weight:bold;color:#1f377f;">oldCount</span> + <span style="font-weight:bold;color:#1f377f;">scrobbleCount</span>)
|> <span style="color:#74531f;">ignore</span>
<span style="font-weight:bold;color:#1f377f;">scrobbles</span>
<span style="font-weight:bold;color:#1f377f;">users</span>.<span style="font-weight:bold;color:#74531f;">AddOrUpdate</span> (
<span style="font-weight:bold;color:#1f377f;">userName</span>,
<span style="color:#2b91af;">ConcurrentDictionary</span><_, _>
[ <span style="color:#2b91af;">KeyValuePair</span>.<span style="font-weight:bold;color:#74531f;">Create</span> (<span style="font-weight:bold;color:#1f377f;">song</span>.Id, <span style="font-weight:bold;color:#1f377f;">scrobbleCount</span>) ],
<span style="color:blue;">fun</span> _ <span style="font-weight:bold;color:#1f377f;">scrobbles</span> <span style="color:blue;">-></span> <span style="color:#74531f;">addScrobbles</span> <span style="font-weight:bold;color:#1f377f;">scrobbles</span>)
|> <span style="color:#74531f;">ignore</span>
<span style="font-weight:bold;color:#1f377f;">songs</span>.<span style="font-weight:bold;color:#74531f;">AddOrUpdate</span> (<span style="font-weight:bold;color:#1f377f;">song</span>.Id, <span style="font-weight:bold;color:#1f377f;">song</span>, <span style="color:blue;">fun</span> _ _ <span style="color:blue;">-></span> <span style="font-weight:bold;color:#1f377f;">song</span>) |> <span style="color:#74531f;">ignore</span></pre>
</p>
<p>
Apart from the code shown here, only minor changes were required for the tests, such as using those curried creation functions instead of constructors, a cast to <code>SongService</code>, and a few other non-behavioural things like that. All tests still pass, so I consider this a faithful translation of the C# code base.
</p>
<h3 id="28c7110824e941098e791aa66e2ed5c4">
Conclusion <a href="#28c7110824e941098e791aa66e2ed5c4">#</a>
</h3>
<p>
This article does more groundwork. Since it may be illuminating to see one problem represented in more than one programming language, I present it in C#, F#, and Haskell. The next article does exactly that: it translates this F# code to Haskell. Once all three bases are established, we can start introducing solution variations.
</p>
<p>
If you don't care about the Haskell examples, you can always go back to the <a href="/2025/04/07/alternative-ways-to-design-with-functional-programming">first article in this article series</a> and use the table of contents to jump to the next C# example.
</p>
<p>
<strong>Next:</strong> <a href="/2025/04/21/porting-song-recommendations-to-haskell">Porting song recommendations to Haskell</a>.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.Characterising song recommendationshttps://blog.ploeh.dk/2025/04/10/characterising-song-recommendations2025-04-10T08:05:00+00:00Mark Seemann
<div id="post">
<p>
<em>Using characterisation tests and mutation testing to describe existing behaviour. An example.</em>
</p>
<p>
This article is part of an <a href="/2025/04/07/alternative-ways-to-design-with-functional-programming">article series that presents multiple design alternatives</a> for a given problem that I call the <em>song recommendations</em> problem. In short, the problem is to recommend songs to a user based on a vast repository of scrobbles. The problem was <a href="https://tyrrrz.me/blog/pure-impure-segregation-principle">originally proposed</a> by <a href="https://tyrrrz.me/">Oleksii Holub</a> as an example of a problem that may not be a good fit for functional programming (FP).
</p>
<p>
As I've outlined in the <a href="/2025/04/07/alternative-ways-to-design-with-functional-programming">introductory article</a>, I'd like to use the opportunity to demonstrate alternative FP designs. Before I can do that, however, I need a working example of Oleksii Holub's code example, as well as a trustworthy test suite. That's what this article is about.
</p>
<p>
The code in this article mostly comes from the <code>master</code> branch of the .NET repository that accompanies this article series, although some of the code is taken from intermediate commits on that branch.
</p>
<h3 id="28085f8e8de44964823dfc4b13edcca6">
Inferred details <a href="#28085f8e8de44964823dfc4b13edcca6">#</a>
</h3>
<p>
The <a href="https://tyrrrz.me/blog/pure-impure-segregation-principle">original article</a> only shows code, but doesn't link to an existing code base. While I suppose I could have asked Oleksii Holub if he had a copy he would share, the existence of such a code base isn't a given. In any case, inferring an entire code base from a comprehensive snippet is an interesting exercise in its own right.
</p>
<p>
The first step was to copy the example code into a code base. Initially it didn't compile because of some missing dependencies that I had to infer. It was only three <a href="https://en.wikipedia.org/wiki/Value_object">Value Objects</a> and an interface:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">record</span> <span style="color:#2b91af;">Song</span>(<span style="color:blue;">int</span> Id, <span style="color:blue;">bool</span> IsVerifiedArtist, <span style="color:blue;">byte</span> Rating);
<span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">record</span> <span style="color:#2b91af;">Scrobble</span>(Song Song, <span style="color:blue;">int</span> ScrobbleCount);
<span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">record</span> <span style="color:#2b91af;">User</span>(<span style="color:blue;">string</span> UserName, <span style="color:blue;">int</span> TotalScrobbleCount);
<span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">SongService</span>
{
Task<IReadOnlyCollection<User>> GetTopListenersAsync(<span style="color:blue;">int</span> songId);
Task<IReadOnlyCollection<Scrobble>> GetTopScrobblesAsync(<span style="color:blue;">string</span> userName);
}</pre>
</p>
<p>
These type declarations are straightforward, but still warrant a few comments. First, <code>Song</code>, <code>Scrobble</code>, and <code>User</code> are C# <a href="https://learn.microsoft.com/dotnet/csharp/language-reference/builtin-types/record">records</a>, which is a more recent addition to the language. If you're reading along here, but using another C-based language, or an older version of C#, you can implement such immutable Value Objects with normal language constructs; it just takes more code, instead of the one-liner syntax. <a href="https://fsharp.org/">F#</a> developers will, of course, be familiar with the concept of <a href="https://learn.microsoft.com/dotnet/fsharp/language-reference/records">records</a>, and <a href="https://www.haskell.org/">Haskell</a> also has them.
</p>
<p>
Another remark about the above type declarations is that while <code>SongService</code> is an interface, it has no <code>I</code> prefix. This is syntactically legal, but not <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> in C#. I've used the name from <a href="https://tyrrrz.me/blog/pure-impure-segregation-principle#interleaved-impurities">the original code sample</a> verbatim, so that's the immediate explanation. It's possible that Oleksii Holub intended the type to be a base class, but for various reasons I prefer interfaces, although in this particular example I don't think it would have made much of a difference. I'm only pointing it out because there's a risk that it might confuse some readers who are used to the C# naming convention. Java programmers, on the other hand, should feel at home.
</p>
<p>
As far as I remember, the only other change I had to make to the code in order to make it compile was to give the <code>RecommendationsProvider</code> class a constructor:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">RecommendationsProvider</span>(SongService songService)
{
_songService = songService;
}</pre>
</p>
<p>
This is just basic Constructor Injection, and while <a href="/2021/05/17/against-consistency">I find the underscore prefix redundant</a>, I've kept it in order to stay as faithful to the original example as possible.
</p>
<p>
At this point the code compiles.
</p>
<h3 id="fb05d3091d2c449ab151b60b2ff5b606">
Test Double <a href="#fb05d3091d2c449ab151b60b2ff5b606">#</a>
</h3>
<p>
The goal of this article series is to present several alternative designs that implement the same behaviour. This means that as I refactor the code, I need to know that I didn't break existing functionality.
</p>
<blockquote>
<p>
"to refactor, the essential precondition is [...] solid tests"
</p>
<footer><cite><a href="/ref/refactoring">Refactoring</a></cite>, Martin Fowler, 1999</footer>
</blockquote>
<p>
Currently, I have no tests, so I'll have to add some. Since <code>RecommendationsProvider</code> makes heavy use of its injected <code>SongService</code>, tests must supply that dependency in order to do meaningful work. Since <a href="/2022/10/17/stubs-and-mocks-break-encapsulation">Stubs and Mocks break encapsulation</a> I instead <a href="/2019/02/18/from-interaction-based-to-state-based-testing">favour state-based testing</a> with <a href="http://xunitpatterns.com/Fake%20Object.html">Fake Objects</a>.
</p>
<p>
After some experimentation, I arrived at this <code>FakeSongService</code>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">FakeSongService</span> : SongService
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> ConcurrentDictionary<<span style="color:blue;">int</span>, Song> songs;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> ConcurrentDictionary<<span style="color:blue;">string</span>, ConcurrentDictionary<<span style="color:blue;">int</span>, <span style="color:blue;">int</span>>> users;
<span style="color:blue;">public</span> <span style="color:#2b91af;">FakeSongService</span>()
{
songs = <span style="color:blue;">new</span> ConcurrentDictionary<<span style="color:blue;">int</span>, Song>();
users = <span style="color:blue;">new</span> ConcurrentDictionary<<span style="color:blue;">string</span>, ConcurrentDictionary<<span style="color:blue;">int</span>, <span style="color:blue;">int</span>>>();
}
<span style="color:blue;">public</span> Task<IReadOnlyCollection<User>> GetTopListenersAsync(<span style="color:blue;">int</span> songId)
{
<span style="color:blue;">var</span> listeners =
<span style="color:blue;">from</span> kvp <span style="color:blue;">in</span> users
<span style="color:blue;">where</span> kvp.Value.ContainsKey(songId)
<span style="color:blue;">select</span> <span style="color:blue;">new</span> User(kvp.Key, kvp.Value.Values.Sum());
<span style="color:blue;">return</span> Task.FromResult<IReadOnlyCollection<User>>(listeners.ToList());
}
<span style="color:blue;">public</span> Task<IReadOnlyCollection<Scrobble>> GetTopScrobblesAsync(
<span style="color:blue;">string</span> userName)
{
<span style="color:blue;">var</span> scrobbles = users
.GetOrAdd(userName, <span style="color:blue;">new</span> ConcurrentDictionary<<span style="color:blue;">int</span>, <span style="color:blue;">int</span>>())
.Select(kvp => <span style="color:blue;">new</span> Scrobble(songs[kvp.Key], kvp.Value));
<span style="color:blue;">return</span> Task.FromResult<IReadOnlyCollection<Scrobble>>(scrobbles.ToList());
}
<span style="color:blue;">public</span> <span style="color:blue;">void</span> Scrobble(<span style="color:blue;">string</span> userName, Song song, <span style="color:blue;">int</span> scrobbleCount)
{
users.AddOrUpdate(
userName,
<span style="color:blue;">new</span> ConcurrentDictionary<<span style="color:blue;">int</span>, <span style="color:blue;">int</span>>(
<span style="color:blue;">new</span>[] { KeyValuePair.Create(song.Id, scrobbleCount) }),
(_, scrobbles) => AddScrobbles(scrobbles, song, scrobbleCount));
songs.AddOrUpdate(song.Id, song, (_, _) => song);
}
<span style="color:blue;">private</span> <span style="color:blue;">static</span> ConcurrentDictionary<<span style="color:blue;">int</span>, <span style="color:blue;">int</span>> AddScrobbles(
ConcurrentDictionary<<span style="color:blue;">int</span>, <span style="color:blue;">int</span>> scrobbles,
Song song,
<span style="color:blue;">int</span> scrobbleCount)
{
scrobbles.AddOrUpdate(
song.Id,
scrobbleCount,
(_, oldCount) => oldCount + scrobbleCount);
<span style="color:blue;">return</span> scrobbles;
}
}</pre>
</p>
<p>
If you're wondering about the use of concurrent dictionaries, I use them because it made it easier to write the implementation, and not because I need the implementation to be thread-safe. In fact, I'm fairly sure that it's not thread-safe. That's not an issue. The tests aren't going to use shared mutable state.
</p>
<p>
The <code>GetTopListenersAsync</code> and <code>GetTopScrobblesAsync</code> methods implement the interface, and the <code>Scrobble</code> method (here, <em>scrobble</em> is a verb: <em>to scrobble</em>) is a <a href="http://xunitpatterns.com/Back%20Door%20Manipulation.html">back door</a> that enables tests to populate the <code>FakeSongService</code>.
</p>
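<p>
To see the back-door idea in isolation, here's a deliberately tiny, synchronous sketch in F#. It is not the Fake used by the tests; <code>TinyFakeSongService</code> and its tuple-based storage are hypothetical simplifications that only illustrate how a test populates state through <code>Scrobble</code> and then reads it back through a query.
</p>
<p>
<pre>// A deliberately tiny, synchronous stand-in; all names are hypothetical.
type TinyFakeSongService () =
    // Each entry is (userName, songId, scrobbleCount).
    let mutable plays : (string * int * int) list = []

    // Back door: tests call this to populate the Fake.
    member _.Scrobble (userName : string, songId : int, count : int) =
        plays <- (userName, songId, count) :: plays

    // Query side: every user who has played the song, with total plays
    // summed over all of that user's songs.
    member _.GetTopListeners (songId : int) =
        let total user =
            plays
            |> List.filter (fun (u, _, _) -> u = user)
            |> List.sumBy (fun (_, _, c) -> c)
        plays
        |> List.filter (fun (_, s, _) -> s = songId)
        |> List.map (fun (u, _, _) -> u)
        |> List.distinct
        |> List.map (fun u -> (u, total u))

let fake = TinyFakeSongService ()
fake.Scrobble ("ana", 1, 10)
fake.Scrobble ("ana", 2, 90)
fake.Scrobble ("cat", 2, 5)
let listeners = fake.GetTopListeners 1</pre>
</p>
<p>
Here, <code>listeners</code> evaluates to <code>[("ana", 100)]</code>: only <em>ana</em> has played song <code>1</code>, and her total covers both of her songs, just as the real Fake sums all of a user's scrobble counts.
</p>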
<h3 id="4055e47cf8a5409db41c95bb8099cae6">
Icebreaker Test <a href="#4055e47cf8a5409db41c95bb8099cae6">#</a>
</h3>
<p>
While the 'production code' is in C#, I decided to write the tests in F# for two reasons.
</p>
<p>
The first reason was that I wanted to be able to present the various FP designs in both C# and F#. Writing the tests in F# would make it easier to replace the C# code base with an F# alternative.
</p>
<p>
The second reason was that I wanted to leverage a property-based testing framework's ability to produce many randomly-generated test cases. I considered this important to build confidence that my tests weren't just a few specific examples that wouldn't catch errors when I made changes. Since the <code>RecommendationsProvider</code> API is asynchronous, the only .NET property-based framework I knew of that can run <code>Task</code>-valued properties is <a href="https://fscheck.github.io/FsCheck/">FsCheck</a>. While it's possible to use FsCheck from C#, the F# API is more powerful.
</p>
<p>
In order to get started, however, I first wrote an Icebreaker Test without FsCheck:
</p>
<p>
<pre>[<Fact>]
<span style="color:blue;">let</span> ``No data`` () = task {
<span style="color:blue;">let</span> srvc = FakeSongService ()
<span style="color:blue;">let</span> sut = RecommendationsProvider srvc
<span style="color:blue;">let!</span> actual = sut.GetRecommendationsAsync <span style="color:#a31515;">"foo"</span>
Assert.Empty actual }</pre>
</p>
<p>
This is both a trivial case and an edge case, but clearly, if there's no data in the <code>SongService</code>, the <code>RecommendationsProvider</code> can't recommend any songs.
</p>
<p>
As I usually do with <a href="https://en.wikipedia.org/wiki/Characterization_test">Characterisation Tests</a>, I temporarily sabotage the System Under Test so that the test fails. This is to ensure that I didn't write a <a href="/2019/10/14/tautological-assertion">tautological assertion</a>. Once I've <a href="/2019/10/21/a-red-green-refactor-checklist">seen the test fail for the appropriate reason</a>, I undo the sabotage and <a href="https://stackoverflow.blog/2022/12/19/use-git-tactically/">check in the code</a>.
</p>
<h3 id="cf9aa5f1f470468da943f1c8bbd98698">
Refactor to property <a href="#cf9aa5f1f470468da943f1c8bbd98698">#</a>
</h3>
<p>
In the above <code>No data</code> test, the specific input value <code>"foo"</code> is irrelevant. It might as well have been any other string, so why not make it a property?
</p>
<p>
In this particular case, the <code>userName</code> could be any string, but it might be appropriate to write a custom generator that produces 'realistic' user names. Just to make things simple, I'm going to assume that user names are between one and twenty characters and assembled from alphanumeric characters, and that the first character must be a letter:
</p>
<p>
<pre><span style="color:blue;">module</span> Gen =
<span style="color:blue;">let</span> alphaNumeric = Gen.elements ([<span style="color:#a31515;">'a'</span>..<span style="color:#a31515;">'z'</span>] @ [<span style="color:#a31515;">'A'</span>..<span style="color:#a31515;">'Z'</span>] @ [<span style="color:#a31515;">'0'</span>..<span style="color:#a31515;">'9'</span>])
<span style="color:blue;">let</span> userName = gen {
<span style="color:blue;">let!</span> length = Gen.choose (0, 19)
<span style="color:blue;">let!</span> firstLetter = Gen.elements <| [<span style="color:#a31515;">'a'</span>..<span style="color:#a31515;">'z'</span>] @ [<span style="color:#a31515;">'A'</span>..<span style="color:#a31515;">'Z'</span>]
<span style="color:blue;">let!</span> rest = alphaNumeric |> Gen.listOfLength length
<span style="color:blue;">return</span> firstLetter :: rest |> List.toArray |> String }</pre>
</p>
<p>
Strictly speaking, as long as user names are distinct, the code ought to work, so this generator may be more conservative than necessary. Why am I constraining the generator? For two reasons: First, when FsCheck finds a counter-example, it displays the values that caused the property to fail. A twenty-character alphanumeric string is easier to relate to than some arbitrary string with line breaks and unprintable characters. The second reason is that I'm later going to measure memory load for some of the alternatives, and I wanted data to have realistic size. If user names are chosen by humans, they're unlikely to be longer than twenty characters on average (I've decided).
</p>
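<p>
The assumption about user names can also be written as a plain predicate, which may be handy as a sanity check of generated values. This is only a sketch; <code>isRealisticUserName</code> is a hypothetical helper that isn't part of the example code base.
</p>
<p>
<pre>let letters = ['a'..'z'] @ ['A'..'Z']
let alphaNumerics = letters @ ['0'..'9']

// Hypothetical helper: true when the string has between one and twenty
// characters, starts with a letter, and is otherwise alphanumeric.
let isRealisticUserName (s : string) =
    s.Length >= 1
    && 20 >= s.Length
    && List.contains (Seq.head s) letters
    && Seq.forall (fun c -> List.contains c alphaNumerics) s

let accepted = isRealisticUserName "ana42"
let startsWithDigit = isRealisticUserName "9lives"
let tooLong = isRealisticUserName (String.replicate 21 "a")</pre>
</p>
<p>
With these inputs, <code>accepted</code> is <code>true</code>, while <code>startsWithDigit</code> and <code>tooLong</code> are both <code>false</code>.
</p>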
<p>
I can now rewrite the above <code>No data</code> test as an FsCheck property:
</p>
<p>
<pre>[<Property>]
<span style="color:blue;">let</span> ``No data`` () =
Gen.userName |> Arb.fromGen |> Prop.forAll <| <span style="color:blue;">fun</span> userName <span style="color:blue;">-></span>
task {
<span style="color:blue;">let</span> srvc = FakeSongService ()
<span style="color:blue;">let</span> sut = RecommendationsProvider srvc
<span style="color:blue;">let!</span> actual = sut.GetRecommendationsAsync userName
Assert.Empty actual } :> Task</pre>
</p>
<p>
You may think that this is overkill just to be able to supply random user names to the <code>GetRecommendationsAsync</code> method. In isolation, I'd be inclined to agree, but this edit was an occasion to get my FsCheck infrastructure in place. I can now use that to add more properties.
</p>
<h3 id="ea6e29adf0a9458e98309a0f66394b32">
Full coverage <a href="#ea6e29adf0a9458e98309a0f66394b32">#</a>
</h3>
<p>
The cyclomatic complexity of the <code>GetRecommendationsAsync</code> method is only <em>3</em>, so it doesn't require many tests to attain full code coverage. <a href="/2015/11/16/code-coverage-is-a-useless-target-measure">Not that 100% code coverage should be a goal in itself</a>, but when adding tests to an untested code base, it can be useful as an indicator of confidence. Despite its low cyclomatic complexity, the method, with all of its filtering and sorting, is actually quite involved. 100% coverage strikes me as a low bar.
</p>
<p>
The above <code>No data</code> test exercises one of the three branches. <a href="/2019/12/09/put-cyclomatic-complexity-to-good-use">At most two more tests are required</a> to attain full coverage. I'll just show the simplest of them here.
</p>
<p>
The next test case requires a new FsCheck generator, in addition to <code>Gen.userName</code> already shown.
</p>
<p>
<pre><span style="color:blue;">let</span> song = ArbMap.generate ArbMap.defaults |> Gen.map Song</pre>
</p>
<p>
As a fairly simple one-liner, this seems close to the <a href="https://wiki.haskell.org/Fairbairn_threshold">Fairbairn threshold</a>, but I think that giving this generator a name makes the test easier to read.
</p>
<p>
<pre>[<Property>]
<span style="color:blue;">let</span> ``One user, some songs`` () =
gen {
<span style="color:blue;">let!</span> user = Gen.userName
<span style="color:blue;">let!</span> songs = Gen.arrayOf Gen.song
<span style="color:blue;">let!</span> scrobbleCounts =
Gen.choose (1, 100) |> Gen.arrayOfLength songs.Length
<span style="color:blue;">return</span> (user, Array.zip songs scrobbleCounts) }
|> Arb.fromGen |> Prop.forAll <| <span style="color:blue;">fun</span> (user, scrobbles) <span style="color:blue;">-></span>
task {
<span style="color:blue;">let</span> srvc = FakeSongService ()
scrobbles |> Array.iter (<span style="color:blue;">fun</span> (s, c) <span style="color:blue;">-></span> srvc.Scrobble (user, s, c))
<span style="color:blue;">let</span> sut = RecommendationsProvider srvc
<span style="color:blue;">let!</span> actual = sut.GetRecommendationsAsync user
Assert.Empty actual } :> Task</pre>
</p>
<p>
This test creates scrobbles for a single user and adds them to the Fake data store. It uses <a href="/2016/05/17/tie-fighter-fscheck-properties">TIE-fighter syntax</a> to connect the generators to the test body.
</p>
<p>
Since all the scrobble counts are generated between <em>1</em> and <em>100</em>, none of them are greater than or equal to <em>10,000</em> and thus the test expects no recommendations.
</p>
<p>
You may think that I'm cheating - after all, why didn't I choose another range for the scrobble count? To be honest, I was still in an exploratory phase, trying to figure out how to express the tests, and as a first step, I was aiming for full code coverage. Even though this test's assertion is weak, it <em>does</em> exercise another branch of the <code>GetRecommendationsAsync</code> method.
</p>
<p>
I had to write only one more test to fully cover the System Under Test. That test is more complicated, so I'll spare you the details. If you're interested, you may consider consulting the example source code repository.
</p>
<h3 id="1fcf0b78bfa94dd8b1b3cd1810068dc8">
Mutation testing <a href="#1fcf0b78bfa94dd8b1b3cd1810068dc8">#</a>
</h3>
<p>
While I don't think that code coverage is useful as a <em>target</em> measure, it can be illuminating as a tool. In this case, knowing that I've now attained full coverage tells me that I need to resort to other techniques if I want another goal to aim for.
</p>
<p>
I chose <a href="https://en.wikipedia.org/wiki/Mutation_testing">mutation testing</a> as my new technique. The <code>GetRecommendationsAsync</code> method makes heavy use of LINQ methods such as <code>OrderByDescending</code>, <code>Take</code>, and <code>Where</code>. <a href="https://stryker-mutator.io/">Stryker</a> for .NET knows about LINQ, so among all the automated sabotage it does, it tries to see what happens if it removes e.g. <code>Where</code> or <code>Take</code>.
</p>
<p>
Although I find the Stryker jargon childish, I set myself the goal to see if I could 'kill mutants' to a degree that I'd at least get a green rating.
</p>
<p>
I found that I could edge closer to that goal by a combination of appending assertions (thus <a href="/2021/12/13/backwards-compatibility-as-a-profunctor">strengthening postconditions</a>) and adding tests. While I sometimes find it <a href="/2021/02/15/when-properties-are-easier-than-examples">easier to define properties than examples</a>, at other times, it's the other way around. In this case, I found it easier to add single examples, like this one:
</p>
<p>
<pre>[<Fact>]
<span style="color:blue;">let</span> ``One verified recommendation`` () = task {
<span style="color:blue;">let</span> srvc = FakeSongService ()
srvc.Scrobble (<span style="color:#a31515;">"cat"</span>, Song (1, <span style="color:blue;">false</span>, 6uy), 10)
srvc.Scrobble (<span style="color:#a31515;">"ana"</span>, Song (1, <span style="color:blue;">false</span>, 5uy), 10)
srvc.Scrobble (<span style="color:#a31515;">"ana"</span>, Song (2, <span style="color:blue;">true</span>, 5uy), 9_990)
<span style="color:blue;">let</span> sut = RecommendationsProvider srvc
<span style="color:blue;">let!</span> actual = sut.GetRecommendationsAsync <span style="color:#a31515;">"cat"</span>
Assert.Equal<Song> ([ Song (2, <span style="color:blue;">true</span>, 5uy) ], actual) } :> Task</pre>
</p>
<p>
It adds three scrobbles to the data store, but only one of them is verified (which is what the <code>true</code> value indicates), so this is the only recommendation the test expects to see.
</p>
<p>
Notice that although song number <code>2</code> 'only' has <em>9,990</em> plays, the user <em>ana</em> has exactly <em>10,000</em> plays in all, so barely makes the cut. By carefully adding five examples like this one, I was able to 'kill all mutants'.
</p>
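<p>
The arithmetic behind 'barely makes the cut' can be spelled out in a few lines. This sketch is independent of the example code base: the tuple list and the helper names are assumptions, and only the <code>>= 10_000</code> comparison is taken from the filter in the System Under Test.
</p>
<p>
<pre>// (userName, songId, scrobbleCount), mirroring the three Scrobble calls.
let scrobbles = [ ("cat", 1, 10); ("ana", 1, 10); ("ana", 2, 9_990) ]

// Hypothetical helper: a user's total play count across all songs.
let totalScrobbleCount user =
    scrobbles
    |> List.filter (fun (u, _, _) -> u = user)
    |> List.sumBy (fun (_, _, count) -> count)

// Same comparison as the listener filter in the System Under Test.
let makesTheCut user = totalScrobbleCount user >= 10_000

let anaTotal = totalScrobbleCount "ana"
let anaIncluded = makesTheCut "ana"
let catIncluded = makesTheCut "cat"</pre>
</p>
<p>
Here <code>anaTotal</code> is <code>10000</code>, so <code>anaIncluded</code> is <code>true</code>, whereas <code>catIncluded</code> is <code>false</code>. Had <em>ana</em> had one play fewer, her songs wouldn't have been considered at all.
</p>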
<p>
In all, I have eight tests; three FsCheck properties and five normal <a href="https://xunit.net/">xUnit.net</a> <em>facts</em>.
</p>
<p>
All tests work exclusively by supplying direct and <a href="http://xunitpatterns.com/indirect%20input.html">indirect input</a> to the System Under Test (SUT), and verify the return value of <code>GetRecommendationsAsync</code>. No <a href="http://xunitpatterns.com/Mock%20Object.html">Mocks</a> or <a href="http://xunitpatterns.com/Test%20Stub.html">Stubs</a> have opinions about how the SUT interacts with the injected <code>SongService</code>. This gives me confidence that the tests constitute a trustworthy regression test suite, and that they're still sufficiently decoupled from implementation details to enable me to completely rewrite the SUT.
</p>
<h3 id="ab6808bbac564477b6c1fd1751eafd48">
Quirks <a href="#ab6808bbac564477b6c1fd1751eafd48">#</a>
</h3>
<p>
When you add tests to an existing code base, you may discover edge cases that the original programmer overlooked. The <code>GetRecommendationsAsync</code> method is only a code example, so I don't blame Oleksii Holub for some casual coding, but it turns out that the code has some quirks.
</p>
<p>
For example, there's no deduplication, so I had to <a href="http://butunclebob.com/ArticleS.TimOttinger.ApologizeIncode">apologise in my test code</a>:
</p>
<p>
<pre>[<Fact>]
<span style="color:blue;">let</span> ``Only top-rated songs`` () = task {
<span style="color:blue;">let</span> srvc = FakeSongService ()
<span style="color:green;">// Scale ratings to keep them less than or equal to 10.</span>
[1..20] |> List.iter (<span style="color:blue;">fun</span> i <span style="color:blue;">-></span>
srvc.Scrobble (<span style="color:#a31515;">"hyle"</span>, Song (i, <span style="color:blue;">true</span>, byte i / 2uy), 500))
<span style="color:blue;">let</span> sut = RecommendationsProvider srvc
<span style="color:blue;">let!</span> actual = sut.GetRecommendationsAsync <span style="color:#a31515;">"hyle"</span>
Assert.NotEmpty actual
<span style="color:green;">// Since there's only one user, but with 20 songs, the implementation loops</span>
<span style="color:green;">// over the same songs 20 times, so 400 songs in total (with duplicates).</span>
<span style="color:green;">// Ordering on rating, only the top-rated 200 remains, that is, those rated</span>
<span style="color:green;">// 5-10. Note that this is a Characterization Test, so not necessarily</span>
<span style="color:green;">// reflective of how a real recommendation system should work.</span>
Assert.All (actual, <span style="color:blue;">fun</span> s <span style="color:blue;">-></span> Assert.True (5uy <= s.Rating)) } :> Task</pre>
</p>
<p>
This test creates twenty scrobbles for one user: One with a zero rating, two with rating <em>1</em>, two with rating <em>2</em>, and so on, up to a single song with rating <em>10</em>.
</p>
<p>
<a href="https://tyrrrz.me/blog/pure-impure-segregation-principle#interleaved-impurities">The implementation of GetRecommendationsAsync</a> uses these twenty songs to find 'other users' who have these top songs as well. In this case, there's only one user, so for each of those twenty songs, you get the same twenty songs, for a total of 400.
</p>
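<p>
The count is easy to check with a small sketch. It uses plain integers instead of bytes, and the names <code>ratings</code>, <code>candidates</code>, and <code>top</code> are hypothetical; it only mimics the arithmetic described in the test comment, not the actual implementation.
</p>
<p>
<pre><span style="color:green;">// Ratings as the test assigns them: 0; 1; 1; 2; 2; ... 9; 9; 10.</span>
<span style="color:blue;">let</span> ratings = [1..20] |> List.map (<span style="color:blue;">fun</span> i <span style="color:blue;">-></span> i / 2)
<span style="color:green;">// Each of the twenty songs leads back to the same twenty songs.</span>
<span style="color:blue;">let</span> candidates = ratings |> List.collect (<span style="color:blue;">fun</span> _ <span style="color:blue;">-></span> ratings)
<span style="color:green;">// Keep the 200 highest-rated candidates, as the test comment describes.</span>
<span style="color:blue;">let</span> top = candidates |> List.sortByDescending id |> List.truncate 200
<span style="color:blue;">let</span> lowestRating = List.min top</pre>
</p>
<p>
Here, <code>candidates</code> has 400 elements, and <code>lowestRating</code> is <em>5</em>, which agrees with the test's assertion that every returned song is rated at least <em>5</em>.
</p>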
<p>
You might protest that this is because my <code>FakeSongService</code> implementation is too unsophisticated. <em>Obviously</em>, it should not return the 'original' user's songs! Do, however, consider the implied signature of the <code>GetTopListenersAsync</code> method:
</p>
<p>
<pre>Task<IReadOnlyCollection<User>> GetTopListenersAsync(<span style="color:blue;">int</span> songId);</pre>
</p>
<p>
The method only accepts a <code>songId</code> as input, and if we assume that the service is stateless, it doesn't know who the 'original' user is.
</p>
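<p>
As a sketch of why that is, consider a hypothetical in-memory stand-in. The <code>listeners</code> map and the <code>getTopListeners</code> function are illustrative assumptions, not the <code>FakeSongService</code> from the code base, and user names stand in for <code>User</code> objects.
</p>
<p>
<pre><span style="color:green;">// Hypothetical data: song ID mapped to the users who have scrobbled it.</span>
<span style="color:blue;">let</span> listeners = Map.ofList [ (1, [ <span style="color:#a31515;">"cat"</span>; <span style="color:#a31515;">"ana"</span> ]); (2, [ <span style="color:#a31515;">"ana"</span> ]) ]
<span style="color:green;">// Only a song ID goes in, so nothing identifies the user who asked.</span>
<span style="color:blue;">let</span> getTopListeners songId = listeners |> Map.tryFind songId |> Option.defaultValue []</pre>
</p>
<p>
Calling <code>getTopListeners 1</code> returns both <em>cat</em> and <em>ana</em>, regardless of which of them the recommendations are for. A function with this shape has no input from which it could filter out the 'original' user.
</p>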
<p>
Should I fix the quirks? In a real system, it might be appropriate, but in this context I find it better to keep them. Real systems often have quirks in the shape of legacy business rules and the like, so I only find it realistic that the system may exhibit some weird behaviour. The goal of this set of articles isn't to refactor <em>this particular system</em>, but rather to showcase alternative designs for a system sufficiently complicated to warrant refactorings. Simplifying the code might defeat that purpose.
</p>
<p>
As shown, I have an automated test that requires me to keep that behaviour. I think I'm in a good position to make sweeping changes to the code.
</p>
<h3 id="c114e6c5e6cd4d4fa1c682daf8ed3169">
Conclusion <a href="#c114e6c5e6cd4d4fa1c682daf8ed3169">#</a>
</h3>
<p>
As <a href="https://martinfowler.com/">Martin Fowler</a> writes, an essential precondition for refactoring is a trustworthy test suite. On a daily basis, millions of developers prove him wrong by deploying untested changes to production. There <em>are</em> other ways to make changes, including manual testing, <a href="https://en.wikipedia.org/wiki/A/B_testing">A/B testing</a>, testing in production, etc. Some of them may even work in some contexts.
</p>
<p>
In contrast to such real-world concerns, I don't have a production system with real users. Neither do I have a product owner or a department of manual testers. The best I can do is to add enough Characterisation Tests that I feel confident that I've described <em>the behaviour</em>, rather than the implementation, in enough detail to hold it in place. A <em>Software Vise</em>, as <a href="https://michaelfeathers.silvrback.com/bio">Michael Feathers</a> calls it in <a href="/ref/wewlc">Working Effectively with Legacy Code</a>.
</p>
<p>
Most systems in 'the real world' have too few automated tests. Adding tests to a legacy code base is a difficult discipline, so I found it worthwhile to document this work before embarking on the actual design changes promised by this article series. Now that this is out of the way, we can proceed.
</p>
<p>
The next two articles do more groundwork to establish equivalent code bases in F# and Haskell. If you only care about the C# examples, you can go back to the <a href="/2025/04/07/alternative-ways-to-design-with-functional-programming">first article in this article series</a> and use the table of contents to jump to the next C# example.
</p>
<p>
<strong>Next:</strong> <a href="/2025/04/14/porting-song-recommendations-to-f">Porting song recommendations to F#</a>.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Alternative ways to design with functional programming
https://blog.ploeh.dk/2025/04/07/alternative-ways-to-design-with-functional-programming
2025-04-07T18:27:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A catalogue of FP solutions to a sample problem.</em>
</p>
<p>
If you're past <a href="https://en.wikipedia.org/wiki/Dreyfus_model_of_skill_acquisition">the novice stage</a> of learning programming, you will have noticed that there's usually more than one way to solve a particular problem. Sometimes one way is better than alternatives, but often, there's no single clearly superior option.
</p>
<p>
It's a cliche that you should use the right tool for the job. For programmers, our most important tools are the techniques, patterns, algorithms, and data structures we know, rather than the <a href="https://en.wikipedia.org/wiki/Integrated_development_environment">IDEs</a> we use. You can only choose the right tool for a job if you have more than one to choose from. Again, for programmers this implies knowing of more than one way to solve a problem. This is the reason I find <a href="/2020/01/13/on-doing-katas">doing katas</a> so valuable.
</p>
<p>
Instead of a kata, in the series that this article commences I'll take a look at an example problem and in turn propose multiple alternative solutions. The problem I'll tackle is bigger than a typical kata, but you can think of this article series as a spirit companion to <a href="https://fsharpforfunandprofit.com/posts/13-ways-of-looking-at-a-turtle/">Thirteen ways of looking at a turtle</a> by <a href="https://scottwlaschin.com/">Scott Wlaschin</a>.
</p>
<h3 id="1320481c55ee469d83bf622080b1f47d">
Recommendations <a href="#1320481c55ee469d83bf622080b1f47d">#</a>
</h3>
<p>
The problem I'll tackle was described by <a href="https://tyrrrz.me/">Oleksii Holub</a> in 2020, and I've been considering how to engage with it ever since.
</p>
<p>
Oleksii Holub presents it as <a href="https://tyrrrz.me/blog/pure-impure-segregation-principle#interleaved-impurities">the second of two examples</a> in an article named <a href="https://tyrrrz.me/blog/pure-impure-segregation-principle">Pure-Impure Segregation Principle</a>. In a nutshell, the problem is to identify song recommendations for a user, sourced from a vast repository of scrobbles.
</p>
<p>
The first code example in the article is fine as well, but it's not as rich a source of problems, so I don't plan to talk about it in this context.
</p>
<p>
Oleksii Holub's article does mention my article <a href="/2017/02/02/dependency-rejection">Dependency rejection</a> as well as the <a href="/2020/03/02/impureim-sandwich">Impureim Sandwich</a> pattern.
</p>
<p>
It's my experience that the Impureim Sandwich is surprisingly often applicable, despite its seemingly obvious limitations. More than once, people have responded that it doesn't work in general.
</p>
<p>
I've never claimed that the Impureim Sandwich is a general-purpose solution to everything, only that it surprisingly often fits, once you massage the problem a bit:
</p>
<ul>
<li><a href="/2019/12/02/refactoring-registration-flow-to-functional-architecture">Refactoring registration flow to functional architecture</a></li>
<li><a href="/2022/02/14/a-conditional-sandwich-example">A conditional sandwich example</a></li>
<li><a href="/2019/08/26/functional-file-system">Functional file system</a></li>
<li><a href="/2024/12/16/a-restaurant-sandwich">A restaurant sandwich</a></li>
<li>Song recommendations as a C# Impureim Sandwich</li>
<li>Song recommendations as an F# Impureim Sandwich</li>
<li>Song recommendations as a Haskell Impureim Sandwich</li>
</ul>
<p>
I have, however, <a href="/2017/02/02/dependency-rejection#36c724b49f614104842c47909cd9c916">solicited examples that challenge the pattern</a>, and occasionally readers supply examples, for which I'm thankful. I'm trying to get a sense for just how widely applicable the Impureim Sandwich pattern is, and finding its limitations is an important part of that work.
</p>
<p>
The song recommendations problem is the most elusive one I've seen so far, so I'm grateful to Oleksii Holub for the example.
</p>
<p>
In the articles in this series, I'll present various alternative solutions to that problem. To make things as clear as I can, I don't do this because I think that the code shown in the <a href="https://tyrrrz.me/blog/pure-impure-segregation-principle#interleaved-impurities">original article</a> is bad. Quite the contrary, I'd probably write it like that myself.
</p>
<p>
I offer the alternatives to teach. Only by knowing of more than one way of solving the problem can you pick the right tool for the job. It may turn out that the right design is the one already suggested by Oleksii Holub, but if you change circumstances, perhaps another design is better. Ultimately, I hope that the reader can extrapolate from this problem to other problems that he or she may encounter.
</p>
<p>
The way much online discourse is conducted today, I wish that I didn't have to emphasise the following: Someone may choose to read Oleksii Holub's article as a rebuttal of my ideas about <a href="/2018/11/19/functional-architecture-a-definition">functional architecture</a> and the Impureim Sandwich. I don't read it that way, but rather as honest pursuit of intellectual inquiry. I engage with it in the same spirit, grateful for the rich example that it offers.
</p>
<h3 id="c559bf1beeab46ed8f428291bccb07a2">
Code examples <a href="#c559bf1beeab46ed8f428291bccb07a2">#</a>
</h3>
<p>
I'll show code in the languages with which I'm most comfortable: C#, <a href="https://fsharp.org/">F#</a>, and <a href="https://www.haskell.org/">Haskell</a>. I'll attempt to write the C# code in such a way that programmers of Java, TypeScript, or similar languages can also read along. On the other hand, I'm not going to explain F# or Haskell, but I'll write the articles so that you can skip the F# or Haskell articles and still learn from the C# articles.
</p>
<p>
While I don't expect the majority of my readers to know Haskell, I find it an indispensable tool when evaluating <a href="/2018/11/19/functional-architecture-a-definition">whether or not a design is functional</a>. F# is a good didactic bridge between C# and Haskell.
</p>
<p>
The code is available upon request against a small <a href="/support">support donation</a> of 10 USD (or more). If you're one of my regular supporters, you have my gratitude and can get the code without further donation. Also, on his blog, <a href="https://tyrrrz.me/ukraine">Oleksii Holub asks you to support Ukraine against the aggressor</a>. If you can document that you've donated at least 10 USD to one of the charities listed there, on or after the publication of this article, I'll be happy to send you the code as well. In both cases, please <a href="/about#contact">write to me</a>.
</p>
<p>
I've used Git branches for the various alternatives. In each article, I'll do my best to remember to write which branch corresponds to the article.
</p>
<h3 id="883eb7f7dc37491d95b5246078d50bc1">
Articles <a href="#883eb7f7dc37491d95b5246078d50bc1">#</a>
</h3>
<p>
This article series will present multiple alternatives in more than one programming language. I find it most natural to group the articles according to design first, and language second.
</p>
<p>
While you can view this list as a catalogue of functional programming designs, I'm under no illusion that it's complete.
</p>
<ul>
<li><a href="/2025/04/10/characterising-song-recommendations">Characterising song recommendations</a></li>
<li><a href="/2025/04/14/porting-song-recommendations-to-f">Porting song recommendations to F#</a></li>
<li><a href="/2025/04/21/porting-song-recommendations-to-haskell">Porting song recommendations to Haskell</a></li>
<li>Song recommendations as an Impureim Sandwich
<ul>
<li>Song recommendations as a C# Impureim Sandwich
<ul>
<li>Song recommendations proof-of-concept memory measurements</li>
</ul>
</li>
<li>Song recommendations as an F# Impureim Sandwich</li>
<li>Song recommendations as a Haskell Impureim Sandwich</li>
</ul>
</li>
<li>Song recommendations from combinators
<ul>
<li>Song recommendations from C# combinators</li>
<li>Song recommendations from F# combinators</li>
<li>Song recommendations from Haskell combinators</li>
</ul>
</li>
<li>Song recommendations with pipes and filters
<ul>
<li>Song recommendations with Reactive Extensions for .NET</li>
<li>Song recommendations with F# agents</li>
</ul>
</li>
<li>Song recommendations with free monads
<ul>
<li>Song recommendations with Haskell free monads</li>
<li>Song recommendations with F# free monads</li>
<li>Song recommendations with C# free monads</li>
</ul>
</li>
</ul>
<p>
Some of the design alternatives will require detours to other interesting topics. While I'll do my best to write to enable you to skip the F# and Haskell content, few articles on this blog are self-contained. I do expect the reader to follow links when necessary, but if I've failed to explain anything to your satisfaction, please <a href="https://github.com/ploeh/ploeh.github.com#comments">leave a comment</a>.
</p>
<h3 id="5536e66300444c7da5874eaca5a6f583">
Conclusion <a href="#5536e66300444c7da5874eaca5a6f583">#</a>
</h3>
<p>
This article series examines multiple alternative designs to the song recommendations example presented by Oleksii Holub. The original example has <em>interleaved impurities</em> and is therefore not really functional, even though it looks 'functional' on the surface, due to its heavy use of filters and projections.
</p>
<p>
That example may leave some readers with the impression that there are some problems that, due to size or other 'real-world' constraints, are impossible to solve with functional programming. The present catalogue of design alternatives is my attempt to disabuse readers of that notion.
</p>
<p>
<strong>Next:</strong> <a href="/2025/04/10/characterising-song-recommendations">Characterising song recommendations</a>.
</p>
</div><hr>
Ports and fat adapters
https://blog.ploeh.dk/2025/04/01/ports-and-fat-adapters
2025-04-01T13:16:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Is it worth it having a separate use-case layer?</em>
</p>
<p>
When I occasionally post something about Ports and Adapters (also known as <a href="https://alistair.cockburn.us/hexagonal-architecture/">hexagonal architecture</a>), a few reactions seem to indicate that I'm 'doing it wrong'. I apologize for the use of <a href="https://en.wikipedia.org/wiki/Weasel_word">weasel words</a>, but I don't intend to put particular people on the spot. Everyone has been nice and polite about it, and it's possible that I've misunderstood the criticism. Even so, a few comments have left me with the impression that there's an elephant in the room that I should address.
</p>
<p>
In short, I usually don't abstract application behaviour from frameworks. I don't create 'application layers', 'use-case classes', 'mediators', or similar. This is a deliberate architecture decision.
</p>
<p>
In this article, I'll use a motivating example to describe the reasoning behind such a decision.
</p>
<h3 id="c46a4742981f40c9b241d078e862b7a9">
Humble Objects <a href="#c46a4742981f40c9b241d078e862b7a9">#</a>
</h3>
<p>
A software architect should consider how the choice of particular technologies impacts the development and future sustainability of a solution. It's often a good idea to consider whether it makes sense to decouple application code from frameworks and particular external dependencies. For example, should you hide database access behind an abstraction? Should you decouple the Domain Model from the web framework you use?
</p>
<p>
This isn't always the right decision, but in the following, I'll assume that this is the case.
</p>
<p>
When you apply the <a href="https://en.wikipedia.org/wiki/Dependency_inversion_principle">Dependency Inversion Principle</a> (DIP) you let the application code define the abstractions it needs. If it needs to persist data, it may define a Repository interface. If it needs to send notifications, it may define a 'notification gateway' abstraction. Actual code that, say, communicates with a relational database is an <a href="https://en.wikipedia.org/wiki/Adapter_pattern">Adapter</a>. It translates the application interface into database SDK code.
</p>
<p>
I've been <a href="/2024/07/29/using-ports-and-adapters-to-persist-restaurant-table-configurations">over this ground already</a>, but to take an example from the sample code that accompanies <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>, here's a single method from the <code>SqlReservationsRepository</code> Adapter:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span> <span style="font-weight:bold;color:#74531f;">Create</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">restaurantId</span>, <span style="color:#2b91af;">Reservation</span> <span style="font-weight:bold;color:#1f377f;">reservation</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">reservation</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">ArgumentNullException</span>(<span style="color:blue;">nameof</span>(<span style="font-weight:bold;color:#1f377f;">reservation</span>));
<span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">conn</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">SqlConnection</span>(ConnectionString);
<span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">cmd</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">SqlCommand</span>(createReservationSql, <span style="font-weight:bold;color:#1f377f;">conn</span>);
<span style="font-weight:bold;color:#1f377f;">cmd</span>.Parameters.<span style="font-weight:bold;color:#74531f;">AddWithValue</span>(<span style="color:#a31515;">"@Id"</span>, <span style="font-weight:bold;color:#1f377f;">reservation</span>.Id);
<span style="font-weight:bold;color:#1f377f;">cmd</span>.Parameters.<span style="font-weight:bold;color:#74531f;">AddWithValue</span>(<span style="color:#a31515;">"@RestaurantId"</span>, <span style="font-weight:bold;color:#1f377f;">restaurantId</span>);
<span style="font-weight:bold;color:#1f377f;">cmd</span>.Parameters.<span style="font-weight:bold;color:#74531f;">AddWithValue</span>(<span style="color:#a31515;">"@At"</span>, <span style="font-weight:bold;color:#1f377f;">reservation</span>.At);
<span style="font-weight:bold;color:#1f377f;">cmd</span>.Parameters.<span style="font-weight:bold;color:#74531f;">AddWithValue</span>(<span style="color:#a31515;">"@Name"</span>, <span style="font-weight:bold;color:#1f377f;">reservation</span>.Name.<span style="font-weight:bold;color:#74531f;">ToString</span>());
<span style="font-weight:bold;color:#1f377f;">cmd</span>.Parameters.<span style="font-weight:bold;color:#74531f;">AddWithValue</span>(<span style="color:#a31515;">"@Email"</span>, <span style="font-weight:bold;color:#1f377f;">reservation</span>.Email.<span style="font-weight:bold;color:#74531f;">ToString</span>());
<span style="font-weight:bold;color:#1f377f;">cmd</span>.Parameters.<span style="font-weight:bold;color:#74531f;">AddWithValue</span>(<span style="color:#a31515;">"@Quantity"</span>, <span style="font-weight:bold;color:#1f377f;">reservation</span>.Quantity);
<span style="font-weight:bold;color:#8f08c4;">await</span> <span style="font-weight:bold;color:#1f377f;">conn</span>.<span style="font-weight:bold;color:#74531f;">OpenAsync</span>().<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
<span style="font-weight:bold;color:#8f08c4;">await</span> <span style="font-weight:bold;color:#1f377f;">cmd</span>.<span style="font-weight:bold;color:#74531f;">ExecuteNonQueryAsync</span>().<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
}</pre>
</p>
<p>
This is one method of a class named <code>SqlReservationsRepository</code>, which is an Adapter that makes ADO.NET code look like the application-specific <code>IReservationsRepository</code> interface.
</p>
<p>
Such Adapters are often as 'thin' as possible. One dimension of measurement is to look at the <a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">cyclomatic complexity</a>, where the ideal is <em>1</em>, the lowest possible score. The code shown here has a complexity measure of <em>2</em> because of the null guard, which exists because of a static analysis rule.
</p>
<p>
In test parlance, we call such thin Adapters <a href="https://martinfowler.com/bliki/HumbleObject.html">Humble Objects</a>. Or, to paraphrase what <a href="http://blog.jenkster.com/">Kris Jenkins</a> said at the GOTO Copenhagen 2024 conference, separate code into parts that are
<ul>
<li>hard to test, but easy to get right</li>
<li>hard to get right, but easy to test.</li>
</ul>
</p>
<p>
You can do the same when sending email, querying a weather service, raising events on pub-sub infrastructure, getting the current date, etc. This isolates your application code from implementation details, such as particular database servers, SDKs, network protocols, and so on.
</p>
<p>
Shouldn't you be doing the same on the receiving end?
</p>
<h3 id="7a5b7c14c09248999d52f37e2fdc5608">
Fat Adapters <a href="#7a5b7c14c09248999d52f37e2fdc5608">#</a>
</h3>
<p>
In his article on <a href="https://alistair.cockburn.us/hexagonal-architecture/">Hexagonal Architecture</a> Alistair Cockburn acknowledges a certain asymmetry. Some ports are activated by the application. Those are the ones already examined. An application reads from and writes to a database. An application sends emails. An application gets the current date.
</p>
<p>
Other ports, on the other hand, <em>drive</em> the application. According to Tomas Petricek's <a href="https://tomasp.net/blog/2015/library-frameworks/">distinction between frameworks and libraries</a> (that I also use), this kind of behaviour characterizes a <em>framework</em>. Examples include web frameworks such as ASP.NET, Express.js, Django, or UI frameworks like Angular, WPF, and so on.
</p>
<p>
While I usually do <a href="/2023/09/04/decomposing-ctfiyhs-sample-code-base">shield my Domain Model from framework code</a>, I tend to write 'fat' Adapters. As far as I can tell, this is what some people have taken issue with.
</p>
<p>
Here's an example:
</p>
<p>
<pre>[<span style="color:#2b91af;">HttpPost</span>(<span style="color:#a31515;">"restaurants/{restaurantId}/reservations"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">ActionResult</span>> <span style="font-weight:bold;color:#74531f;">Post</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">restaurantId</span>, <span style="color:#2b91af;">ReservationDto</span> <span style="font-weight:bold;color:#1f377f;">dto</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">dto</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">ArgumentNullException</span>(<span style="color:blue;">nameof</span>(<span style="font-weight:bold;color:#1f377f;">dto</span>));
<span style="color:#2b91af;">Reservation</span>? <span style="font-weight:bold;color:#1f377f;">candidate1</span> = <span style="font-weight:bold;color:#1f377f;">dto</span>.<span style="font-weight:bold;color:#74531f;">TryParse</span>();
<span style="font-weight:bold;color:#1f377f;">dto</span>.Id = <span style="color:#2b91af;">Guid</span>.<span style="color:#74531f;">NewGuid</span>().<span style="font-weight:bold;color:#74531f;">ToString</span>(<span style="color:#a31515;">"N"</span>);
<span style="color:#2b91af;">Reservation</span>? <span style="font-weight:bold;color:#1f377f;">candidate2</span> = <span style="font-weight:bold;color:#1f377f;">dto</span>.<span style="font-weight:bold;color:#74531f;">TryParse</span>();
<span style="color:#2b91af;">Reservation</span>? <span style="font-weight:bold;color:#1f377f;">reservation</span> = <span style="font-weight:bold;color:#1f377f;">candidate1</span> ?? <span style="font-weight:bold;color:#1f377f;">candidate2</span>;
<span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">reservation</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">BadRequestResult</span>(<span style="color:green;">/* Describe the errors here */</span>);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">restaurant</span> = <span style="font-weight:bold;color:#8f08c4;">await</span> RestaurantDatabase.<span style="font-weight:bold;color:#74531f;">GetRestaurant</span>(<span style="font-weight:bold;color:#1f377f;">restaurantId</span>)
.<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
<span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">restaurant</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">NotFoundResult</span>();
<span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">scope</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">TransactionScope</span>(<span style="color:#2b91af;">TransactionScopeAsyncFlowOption</span>.Enabled);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">reservations</span> = <span style="font-weight:bold;color:#8f08c4;">await</span> Repository
.<span style="font-weight:bold;color:#74531f;">ReadReservations</span>(<span style="font-weight:bold;color:#1f377f;">restaurant</span>.Id, <span style="font-weight:bold;color:#1f377f;">reservation</span>.At)
.<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">now</span> = Clock.<span style="font-weight:bold;color:#74531f;">GetCurrentDateTime</span>();
<span style="font-weight:bold;color:#8f08c4;">if</span> (!<span style="font-weight:bold;color:#1f377f;">restaurant</span>.MaitreD.<span style="font-weight:bold;color:#74531f;">WillAccept</span>(<span style="font-weight:bold;color:#1f377f;">now</span>, <span style="font-weight:bold;color:#1f377f;">reservations</span>, <span style="font-weight:bold;color:#1f377f;">reservation</span>))
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#74531f;">NoTables500InternalServerError</span>();
<span style="font-weight:bold;color:#8f08c4;">await</span> Repository.<span style="font-weight:bold;color:#74531f;">Create</span>(<span style="font-weight:bold;color:#1f377f;">restaurant</span>.Id, <span style="font-weight:bold;color:#1f377f;">reservation</span>).<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
<span style="font-weight:bold;color:#1f377f;">scope</span>.<span style="font-weight:bold;color:#74531f;">Complete</span>();
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#74531f;">Reservation201Created</span>(<span style="font-weight:bold;color:#1f377f;">restaurant</span>.Id, <span style="font-weight:bold;color:#1f377f;">reservation</span>);
}</pre>
</p>
<p>
This is (still) code originating from the example code base that accompanies <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a>, although I've here used the variation from <a href="/2022/09/12/coalescing-dtos">Coalescing DTOs</a>. I've also inlined the <code>TryCreate</code> helper method, so that the entire use-case flow is visible as a single method.
</p>
<p>
In a sense, we may consider this an Adapter, too. This <code>Post</code> <em>action method</em> is part of a Controller class that handles incoming HTTP requests. It is, however, not that class that deals with the HTTP protocol. Neither does it parse the request body, or check headers, etc. The ASP.NET framework takes care of that.
</p>
<p>
By following certain naming conventions, adorning the method with an <code>[HttpPost]</code> attribute, and returning <code>ActionResult</code>, this method plays by the rules of the ASP.NET framework. Even if it doesn't implement any particular interface, or inherit from some ASP.NET base class, it clearly 'adapts' to the ASP.NET framework.
</p>
<p>
It does that by attempting to parse and validate input, look up data in data sources, and in general checking preconditions before delegating work to the Domain Model - which happens in the call to <code>MaitreD.WillAccept</code>.
</p>
<p>
This is where some people seem to get uncomfortable. If this is an Adapter, it's a 'fat' one. In this particular example, the cyclomatic complexity is <em>6</em>. Not really a Humble Object.
</p>
<p>
Shouldn't there be some kind of 'use-case' model?
</p>
<h3 id="21b1d89a43124bff9332d5d99bb32ab7">
Use-case Model API <a href="#21b1d89a43124bff9332d5d99bb32ab7">#</a>
</h3>
<p>
I deliberately avoid 'use-case' models, 'mediators', or whatever other name people tend to use. I'll try to explain why by going through the exercise of actually extracting such a model. My point, in short, is that I find it not worth the effort.
</p>
<p>
In the following, I'll call such a model a 'use-case', since this is one piece of terminology that I tend to run into. You may also call it an 'Application Model' or something else.
</p>
<p>
The 'problem' that I apparently am meant to solve is that most of the code in the above <code>Post</code> method is tightly coupled to ASP.NET. If we want to decouple this code, how do we go about it?
</p>
<p>
It's possible that my imagination fails me, but the best I can think of is some kind of 'use-case model' that models the 'make a reservation' use case. Perhaps we should name it <code>MakeReservationUseCase</code>. Should it be some kind of <a href="https://en.wikipedia.org/wiki/Command_pattern">Command</a> object? It could be, but I think that this is awkward, because it also needs to communicate with various dependencies, such as <code>RestaurantDatabase</code>, <code>Repository</code>, and <code>Clock</code>. A long-lived service object that can wrap around those dependencies seems like a better option, but then we need a method on that object.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">MakeReservationUseCase</span>
{
    <span style="color:green;">// What to call this method? What to return? I hate this already.</span>
    <span style="color:blue;">public</span> <span style="color:blue;">object</span> <span style="font-weight:bold;color:#74531f;">MakeReservation</span>(<span style="color:green;">/* What to receive here? */</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">NotImplementedException</span>();
    }
}</pre>
</p>
<p>
What do we call such a method? Had this been a true Command object, the single parameterless method would be called <code>Execute</code>, but since I'm planning to work with a stateless service, the method should take arguments. I played with options such as <code>Try</code>, <code>Do</code>, or <code>Go</code>, so that you'd have <code>MakeReservationUseCase.Try</code> and so on. Still, I thought this bordered on 'cute' or 'clever' code, and at the very least not particularly <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> C#. So I settled for <code>MakeReservation</code>, but now we have <code>MakeReservationUseCase.MakeReservation</code>, which is clearly redundant. I don't like the direction this design is going.
</p>
<p>
The next question is: What parameters should this method take?
</p>
<p>
Considering the above <code>Post</code> method, couldn't we pass the <code>dto</code> object on to the use-case model? Technically, we could, but consider this: The <code>ReservationDto</code> object's raison d'être is to support reception and transmission of JSON objects. As I've already covered in <a href="/2023/12/04/serialization-with-and-without-reflection">an earlier article series</a>, serialization formats are inextricably coupled to the boundaries of a system.
</p>
<p>
Imagine that we wanted to <a href="/2023/09/04/decomposing-ctfiyhs-sample-code-base">decompose the code base into smaller constituent projects</a>. If so, the use-case model should be independent of the ASP.NET-based code. Does it seem reasonable, then, to define the use-case API in terms of a serialization format?
</p>
<p>
I don't think that's the right design. Perhaps, instead, we should 'explode' the <a href="https://en.wikipedia.org/wiki/Data_transfer_object">Data Transfer Object</a> (DTO) into its primitive constituents?
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">object</span> <span style="font-weight:bold;color:#74531f;">MakeReservation</span>(
    <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">restaurantId</span>,
    <span style="color:blue;">string</span>? <span style="font-weight:bold;color:#1f377f;">id</span>,
    <span style="color:blue;">string</span>? <span style="font-weight:bold;color:#1f377f;">at</span>,
    <span style="color:blue;">string</span>? <span style="font-weight:bold;color:#1f377f;">email</span>,
    <span style="color:blue;">string</span>? <span style="font-weight:bold;color:#1f377f;">name</span>,
    <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">quantity</span>)</pre>
</p>
<p>
I'm not too happy about this, either. Six parameters is pushing it, and this is only an example. What if you need to pass more data than that? What if you need to pass a collection? What if each element in that collection contains another collection?
</p>
<p>
<a href="https://refactoring.com/catalog/introduceParameterObject.html">Introduce Parameter Object</a>, you say?
</p>
<p>
Given that this is the way we want to go (in this demonstration), this seems as though it's the only good option, but that means that we'd have to define <em>another</em> reservation object. Not the (JSON) DTO that arrives at the boundary. Not the <code>Reservation</code> Domain Model, because the data has yet to be validated. A <em>third</em> reservation class. I don't even know what to call such a class...
</p>
<p>
So I'll leave those six parameters as above, while pointing out that no matter what we do, there seems to be a problem.
</p>
<h3 id="cc4afd89b36a4b1f9534e995f7ebb546">
Return type woes <a href="#cc4afd89b36a4b1f9534e995f7ebb546">#</a>
</h3>
<p>
What should the <code>MakeReservation</code> method return?
</p>
<p>
The code in the above <code>Post</code> method returns various <code>ActionResult</code> objects that indicate success or various failures. This isn't an option if we want to decouple <code>MakeReservationUseCase</code> from ASP.NET. How may we instead communicate one of four results?
</p>
<p>
Many object-oriented programmers might suggest throwing custom exceptions, and that's a real option. If nothing else, it'd be idiomatic in a language like C#. This would enable us to declare the return type as <code>Reservation</code>, but we would also have to define three custom exception types.
</p>
<p>
There are some advantages to such a design, but it effectively boils down to <a href="https://wiki.c2.com/?DontUseExceptionsForFlowControl">using exceptions for flow control</a>.
</p>
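<p>
As a hedged sketch of what that option would entail, the three exception types might be declared as follows. The names are hypothetical and not part of the example code base, and the code assumes a <code>using System;</code> directive.
</p>
<p>
<pre><span style="color:green;">// Hypothetical exception types, one per failure outcome.</span>
<span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">InvalidInputException</span> : <span style="color:#2b91af;">Exception</span>
{
    <span style="color:blue;">public</span> <span style="color:#2b91af;">InvalidInputException</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">message</span>) : <span style="color:blue;">base</span>(<span style="font-weight:bold;color:#1f377f;">message</span>)
    {
    }
}

<span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">NoSuchRestaurantException</span> : <span style="color:#2b91af;">Exception</span>
{
}

<span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">NoTablesAvailableException</span> : <span style="color:#2b91af;">Exception</span>
{
}</pre>
</p>
<p>
The Controller would then need one <code>catch</code> clause per exception type in order to translate each failure to the appropriate <code>ActionResult</code>.
</p>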
<p>
Is there a way to model heterogeneous, mutually exclusive values? Another object-oriented staple is to introduce a type hierarchy. You could have four different classes that implement the same interface, or inherit from the same base class. If we go in this direction, then what <em>behaviour</em> should we define for this type? What do all four objects have in common? The only thing that they have in common is that we need to convert them to <code>ActionResult</code>.
</p>
<p>
We can't, however, have a method like <code>ToActionResult()</code> that converts the object to <code>ActionResult</code>, because that would couple the API to ASP.NET.
</p>
<p>
You could, of course, use downcasts to check the type of the return value, but if you do that, you might as well leave the method as shown above. If you plan on dynamic type checks and casts, the only base class you need is <code>object</code>.
</p>
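<p>
To illustrate why that isn't attractive, here's a minimal sketch of what the Controller code might look like with dynamic type checks. It assumes that <code>result</code> holds the <code>object</code> returned by <code>MakeReservation</code>, and that <code>InvalidInput</code> (with a <code>Message</code> property), <code>NoSuchRestaurant</code>, and <code>NoTablesAvailable</code> are hypothetical result classes that don't exist in the example code base.
</p>
<p>
<pre><span style="color:green;">// Hypothetical sketch: dispatch on the run-time type of an object.</span>
<span style="font-weight:bold;color:#8f08c4;">switch</span> (<span style="font-weight:bold;color:#1f377f;">result</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#2b91af;">Reservation</span> <span style="font-weight:bold;color:#1f377f;">reservation</span>:
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#74531f;">Reservation201Created</span>(<span style="font-weight:bold;color:#1f377f;">restaurantId</span>, <span style="font-weight:bold;color:#1f377f;">reservation</span>);
    <span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#2b91af;">InvalidInput</span> <span style="font-weight:bold;color:#1f377f;">invalid</span>:
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">BadRequestObjectResult</span>(<span style="font-weight:bold;color:#1f377f;">invalid</span>.Message);
    <span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#2b91af;">NoSuchRestaurant</span> _:
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">NotFoundResult</span>();
    <span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#2b91af;">NoTablesAvailable</span> _:
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#74531f;">NoTables500InternalServerError</span>();
    <span style="font-weight:bold;color:#8f08c4;">default</span>:
        <span style="color:green;">// Nothing but this run-time check catches an unhandled result type.</span>
        <span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">InvalidOperationException</span>(<span style="color:#a31515;">"Unexpected result type."</span>);
}</pre>
</p>
<p>
The compiler can't verify that such a <code>switch</code> statement covers all outcomes, so the <code>default</code> case is the only safeguard against a forgotten or newly added result type.
</p>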
<h3 id="a0fa4b1a896d44a79e127226c7776652">
Visitor <a href="#a0fa4b1a896d44a79e127226c7776652">#</a>
</h3>
<p>
If only there were a way to return heterogeneous, mutually exclusive data structures. If only C# had <a href="https://en.wikipedia.org/wiki/Tagged_union">sum types</a>...
</p>
<p>
Fortunately, while C# doesn't have sum types, it <em>is</em> possible to achieve the same goal. Use a <a href="/2018/06/25/visitor-as-a-sum-type">Visitor as a sum type</a>.
</p>
<p>
You could start with a type like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">MakeReservationResult</span>
{
    <span style="color:blue;">public</span> <span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#74531f;">Accept</span><<span style="color:#2b91af;">T</span>>(<span style="color:#2b91af;">IMakeReservationVisitor</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">visitor</span>)
    {
        <span style="color:green;">// Implementation to follow...</span>
    }
}</pre>
</p>
<p>
As usual with the Visitor design pattern, you'll have to inspect the Visitor interface to learn about the alternatives that it supports:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IMakeReservationVisitor</span><<span style="color:#2b91af;">T</span>>
{
    <span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#74531f;">Success</span>(<span style="color:#2b91af;">Reservation</span> <span style="font-weight:bold;color:#1f377f;">reservation</span>);
    <span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#74531f;">InvalidInput</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">message</span>);
    <span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#74531f;">NoSuchRestaurant</span>();
    <span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#74531f;">NoTablesAvailable</span>();
}</pre>
</p>
<p>
This enables us to communicate that there are exactly four possible outcomes in a way that doesn't depend on ASP.NET.
</p>
<p>
The 'only' remaining work on the <code>MakeReservationResult</code> class is to implement the <code>Accept</code> method. Are you ready? Okay, here we go:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">MakeReservationResult</span>
{
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">IMakeReservationResult</span> imp;

    <span style="color:blue;">private</span> <span style="color:#2b91af;">MakeReservationResult</span>(<span style="color:#2b91af;">IMakeReservationResult</span> <span style="font-weight:bold;color:#1f377f;">imp</span>)
    {
        <span style="color:blue;">this</span>.imp = <span style="font-weight:bold;color:#1f377f;">imp</span>;
    }

    <span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">MakeReservationResult</span> <span style="color:#74531f;">Success</span>(<span style="color:#2b91af;">Reservation</span> <span style="font-weight:bold;color:#1f377f;">reservation</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">MakeReservationResult</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">SuccessResult</span>(<span style="font-weight:bold;color:#1f377f;">reservation</span>));
    }

    <span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">MakeReservationResult</span> <span style="color:#74531f;">InvalidInput</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">message</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">MakeReservationResult</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">InvalidInputResult</span>(<span style="font-weight:bold;color:#1f377f;">message</span>));
    }

    <span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">MakeReservationResult</span> <span style="color:#74531f;">NoSuchRestaurant</span>()
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">MakeReservationResult</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">NoSuchRestaurantResult</span>());
    }

    <span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">MakeReservationResult</span> <span style="color:#74531f;">NoTablesAvailable</span>()
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">MakeReservationResult</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">NoTablesAvailableResult</span>());
    }

    <span style="color:blue;">public</span> <span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#74531f;">Accept</span><<span style="color:#2b91af;">T</span>>(<span style="color:#2b91af;">IMakeReservationVisitor</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">visitor</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">this</span>.imp.<span style="font-weight:bold;color:#74531f;">Accept</span>(<span style="font-weight:bold;color:#1f377f;">visitor</span>);
    }

    <span style="color:blue;">private</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IMakeReservationResult</span>
    {
        <span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#74531f;">Accept</span><<span style="color:#2b91af;">T</span>>(<span style="color:#2b91af;">IMakeReservationVisitor</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">visitor</span>);
    }

    <span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">SuccessResult</span> : <span style="color:#2b91af;">IMakeReservationResult</span>
    {
        <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">Reservation</span> reservation;

        <span style="color:blue;">public</span> <span style="color:#2b91af;">SuccessResult</span>(<span style="color:#2b91af;">Reservation</span> <span style="font-weight:bold;color:#1f377f;">reservation</span>)
        {
            <span style="color:blue;">this</span>.reservation = <span style="font-weight:bold;color:#1f377f;">reservation</span>;
        }

        <span style="color:blue;">public</span> <span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#74531f;">Accept</span><<span style="color:#2b91af;">T</span>>(<span style="color:#2b91af;">IMakeReservationVisitor</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">visitor</span>)
        {
            <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">visitor</span>.<span style="font-weight:bold;color:#74531f;">Success</span>(reservation);
        }
    }

    <span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">InvalidInputResult</span> : <span style="color:#2b91af;">IMakeReservationResult</span>
    {
        <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">string</span> message;

        <span style="color:blue;">public</span> <span style="color:#2b91af;">InvalidInputResult</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">message</span>)
        {
            <span style="color:blue;">this</span>.message = <span style="font-weight:bold;color:#1f377f;">message</span>;
        }

        <span style="color:blue;">public</span> <span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#74531f;">Accept</span><<span style="color:#2b91af;">T</span>>(<span style="color:#2b91af;">IMakeReservationVisitor</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">visitor</span>)
        {
            <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">visitor</span>.<span style="font-weight:bold;color:#74531f;">InvalidInput</span>(message);
        }
    }

    <span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">NoSuchRestaurantResult</span> : <span style="color:#2b91af;">IMakeReservationResult</span>
    {
        <span style="color:blue;">public</span> <span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#74531f;">Accept</span><<span style="color:#2b91af;">T</span>>(<span style="color:#2b91af;">IMakeReservationVisitor</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">visitor</span>)
        {
            <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">visitor</span>.<span style="font-weight:bold;color:#74531f;">NoSuchRestaurant</span>();
        }
    }

    <span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">NoTablesAvailableResult</span> : <span style="color:#2b91af;">IMakeReservationResult</span>
    {
        <span style="color:blue;">public</span> <span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#74531f;">Accept</span><<span style="color:#2b91af;">T</span>>(<span style="color:#2b91af;">IMakeReservationVisitor</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">visitor</span>)
        {
            <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">visitor</span>.<span style="font-weight:bold;color:#74531f;">NoTablesAvailable</span>();
        }
    }
}</pre>
</p>
<p>
That's a lot of boilerplate code, but it's so automatable that there are programming languages that can do this for you. On .NET, it's called <a href="https://fsharp.org/">F#</a>, and all of that would be a single line of code.
</p>
<h3 id="6565ec44bc704fdab90010e0fc6c9ec1">
Use Case implementation <a href="#6565ec44bc704fdab90010e0fc6c9ec1">#</a>
</h3>
<p>
Implementing <code>MakeReservation</code> is now easy, since it mostly involves moving code from the Controller to the <code>MakeReservationUseCase</code> class, and changing it so that it returns the appropriate <code>MakeReservationResult</code> objects instead of <code>ActionResult</code> objects.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">MakeReservationUseCase</span>
{
    <span style="color:blue;">public</span> <span style="color:#2b91af;">MakeReservationUseCase</span>(
        <span style="color:#2b91af;">IClock</span> <span style="font-weight:bold;color:#1f377f;">clock</span>,
        <span style="color:#2b91af;">IRestaurantDatabase</span> <span style="font-weight:bold;color:#1f377f;">restaurantDatabase</span>,
        <span style="color:#2b91af;">IReservationsRepository</span> <span style="font-weight:bold;color:#1f377f;">repository</span>)
    {
        Clock = <span style="font-weight:bold;color:#1f377f;">clock</span>;
        RestaurantDatabase = <span style="font-weight:bold;color:#1f377f;">restaurantDatabase</span>;
        Repository = <span style="font-weight:bold;color:#1f377f;">repository</span>;
    }

    <span style="color:blue;">public</span> <span style="color:#2b91af;">IClock</span> Clock { <span style="color:blue;">get</span>; }
    <span style="color:blue;">public</span> <span style="color:#2b91af;">IRestaurantDatabase</span> RestaurantDatabase { <span style="color:blue;">get</span>; }
    <span style="color:blue;">public</span> <span style="color:#2b91af;">IReservationsRepository</span> Repository { <span style="color:blue;">get</span>; }

    <span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">MakeReservationResult</span>> <span style="font-weight:bold;color:#74531f;">MakeReservation</span>(
        <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">restaurantId</span>,
        <span style="color:blue;">string</span>? <span style="font-weight:bold;color:#1f377f;">id</span>,
        <span style="color:blue;">string</span>? <span style="font-weight:bold;color:#1f377f;">at</span>,
        <span style="color:blue;">string</span>? <span style="font-weight:bold;color:#1f377f;">email</span>,
        <span style="color:blue;">string</span>? <span style="font-weight:bold;color:#1f377f;">name</span>,
        <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">quantity</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">if</span> (!<span style="color:#2b91af;">Guid</span>.<span style="color:#74531f;">TryParse</span>(<span style="font-weight:bold;color:#1f377f;">id</span>, <span style="color:blue;">out</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">rid</span>))
            <span style="font-weight:bold;color:#1f377f;">rid</span> = <span style="color:#2b91af;">Guid</span>.<span style="color:#74531f;">NewGuid</span>();
        <span style="font-weight:bold;color:#8f08c4;">if</span> (!<span style="color:#2b91af;">DateTime</span>.<span style="color:#74531f;">TryParse</span>(<span style="font-weight:bold;color:#1f377f;">at</span>, <span style="color:blue;">out</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">rat</span>))
            <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">MakeReservationResult</span>.<span style="color:#74531f;">InvalidInput</span>(<span style="color:#a31515;">"Invalid date."</span>);
        <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">email</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
            <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">MakeReservationResult</span>.<span style="color:#74531f;">InvalidInput</span>(<span style="color:#a31515;">"Invalid email."</span>);
        <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">quantity</span> < 1)
            <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">MakeReservationResult</span>.<span style="color:#74531f;">InvalidInput</span>(<span style="color:#a31515;">"Invalid quantity."</span>);

        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">reservation</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">Reservation</span>(
            <span style="font-weight:bold;color:#1f377f;">rid</span>,
            <span style="font-weight:bold;color:#1f377f;">rat</span>,
            <span style="color:blue;">new</span> <span style="color:#2b91af;">Email</span>(<span style="font-weight:bold;color:#1f377f;">email</span>),
            <span style="color:blue;">new</span> <span style="color:#2b91af;">Name</span>(<span style="font-weight:bold;color:#1f377f;">name</span> ?? <span style="color:#a31515;">""</span>),
            <span style="font-weight:bold;color:#1f377f;">quantity</span>);

        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">restaurant</span> = <span style="font-weight:bold;color:#8f08c4;">await</span> RestaurantDatabase.<span style="font-weight:bold;color:#74531f;">GetRestaurant</span>(<span style="font-weight:bold;color:#1f377f;">restaurantId</span>).<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
        <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">restaurant</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
            <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">MakeReservationResult</span>.<span style="color:#74531f;">NoSuchRestaurant</span>();

        <span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">scope</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">TransactionScope</span>(<span style="color:#2b91af;">TransactionScopeAsyncFlowOption</span>.Enabled);
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">reservations</span> = <span style="font-weight:bold;color:#8f08c4;">await</span> Repository.<span style="font-weight:bold;color:#74531f;">ReadReservations</span>(<span style="font-weight:bold;color:#1f377f;">restaurant</span>.Id, <span style="font-weight:bold;color:#1f377f;">reservation</span>.At)
            .<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">now</span> = Clock.<span style="font-weight:bold;color:#74531f;">GetCurrentDateTime</span>();
        <span style="font-weight:bold;color:#8f08c4;">if</span> (!<span style="font-weight:bold;color:#1f377f;">restaurant</span>.MaitreD.<span style="font-weight:bold;color:#74531f;">WillAccept</span>(<span style="font-weight:bold;color:#1f377f;">now</span>, <span style="font-weight:bold;color:#1f377f;">reservations</span>, <span style="font-weight:bold;color:#1f377f;">reservation</span>))
            <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">MakeReservationResult</span>.<span style="color:#74531f;">NoTablesAvailable</span>();

        <span style="font-weight:bold;color:#8f08c4;">await</span> Repository.<span style="font-weight:bold;color:#74531f;">Create</span>(<span style="font-weight:bold;color:#1f377f;">restaurant</span>.Id, <span style="font-weight:bold;color:#1f377f;">reservation</span>).<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
        <span style="font-weight:bold;color:#1f377f;">scope</span>.<span style="font-weight:bold;color:#74531f;">Complete</span>();

        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">MakeReservationResult</span>.<span style="color:#74531f;">Success</span>(<span style="font-weight:bold;color:#1f377f;">reservation</span>);
    }
}</pre>
</p>
<p>
I had to re-implement input validation, because the <code>TryParse</code> method is defined on <code>ReservationDto</code>, and the Use-case Model shouldn't be coupled to that class. Still, you could argue that if I'd immediately implemented the use-case architecture, I would never have had the parser defined on the DTO.
</p>
<h3 id="f55a7173c05d440d99375f72e70ba143">
Decoupled Controller <a href="#f55a7173c05d440d99375f72e70ba143">#</a>
</h3>
<p>
The Controller method may now delegate implementation to a <code>MakeReservationUseCase</code> object:
</p>
<p>
<pre>[<span style="color:#2b91af;">HttpPost</span>(<span style="color:#a31515;">"restaurants/{restaurantId}/reservations"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">ActionResult</span>> <span style="font-weight:bold;color:#74531f;">Post</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">restaurantId</span>, <span style="color:#2b91af;">ReservationDto</span> <span style="font-weight:bold;color:#1f377f;">dto</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">dto</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">ArgumentNullException</span>(<span style="color:blue;">nameof</span>(<span style="font-weight:bold;color:#1f377f;">dto</span>));

    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">result</span> = <span style="font-weight:bold;color:#8f08c4;">await</span> makeReservationUseCase.<span style="font-weight:bold;color:#74531f;">MakeReservation</span>(
        <span style="font-weight:bold;color:#1f377f;">restaurantId</span>,
        <span style="font-weight:bold;color:#1f377f;">dto</span>.Id,
        <span style="font-weight:bold;color:#1f377f;">dto</span>.At,
        <span style="font-weight:bold;color:#1f377f;">dto</span>.Email,
        <span style="font-weight:bold;color:#1f377f;">dto</span>.Name,
        <span style="font-weight:bold;color:#1f377f;">dto</span>.Quantity).<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);

    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">result</span>.<span style="font-weight:bold;color:#74531f;">Accept</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">PostReservationVisitor</span>(<span style="font-weight:bold;color:#1f377f;">restaurantId</span>));
}</pre>
</p>
<p>
While that looks nice and slim, it's not all, because you also need to define the <code>PostReservationVisitor</code> class:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">PostReservationVisitor</span> : <span style="color:#2b91af;">IMakeReservationVisitor</span><<span style="color:#2b91af;">ActionResult</span>>
{
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">int</span> restaurantId;

    <span style="color:blue;">public</span> <span style="color:#2b91af;">PostReservationVisitor</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">restaurantId</span>)
    {
        <span style="color:blue;">this</span>.restaurantId = <span style="font-weight:bold;color:#1f377f;">restaurantId</span>;
    }

    <span style="color:blue;">public</span> <span style="color:#2b91af;">ActionResult</span> <span style="font-weight:bold;color:#74531f;">Success</span>(<span style="color:#2b91af;">Reservation</span> <span style="font-weight:bold;color:#1f377f;">reservation</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#74531f;">Reservation201Created</span>(restaurantId, <span style="font-weight:bold;color:#1f377f;">reservation</span>);
    }

    <span style="color:blue;">public</span> <span style="color:#2b91af;">ActionResult</span> <span style="font-weight:bold;color:#74531f;">InvalidInput</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">message</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">BadRequestObjectResult</span>(<span style="font-weight:bold;color:#1f377f;">message</span>);
    }

    <span style="color:blue;">public</span> <span style="color:#2b91af;">ActionResult</span> <span style="font-weight:bold;color:#74531f;">NoSuchRestaurant</span>()
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">NotFoundResult</span>();
    }

    <span style="color:blue;">public</span> <span style="color:#2b91af;">ActionResult</span> <span style="font-weight:bold;color:#74531f;">NoTablesAvailable</span>()
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#74531f;">NoTables500InternalServerError</span>();
    }
}</pre>
</p>
<p>
Notice that this implementation has to receive the <code>restaurantId</code> value through its constructor, since this piece of data isn't part of the <code>IMakeReservationVisitor</code> API. If only we could have handled all that pattern matching with a simple closure...
</p>
<p>
Well, you can. You could have used <a href="/2018/05/22/church-encoding">Church encoding</a> instead of the Visitor pattern, but many programmers find that less idiomatic, or not sufficiently object-oriented.
</p>
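<p>
As a sketch of what that might look like, assume that <code>MakeReservationResult</code> exposes a hypothetical <code>Match</code> method that takes one function argument per outcome and returns whatever the chosen function returns. No such method exists in the code shown here. The last line of the <code>Post</code> method could then close over <code>restaurantId</code> directly:
</p>
<p>
<pre><span style="color:green;">// Hypothetical: Match is an assumed Church-encoded alternative to Accept.</span>
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">result</span>.<span style="font-weight:bold;color:#74531f;">Match</span><<span style="color:#2b91af;">ActionResult</span>>(
    <span style="font-weight:bold;color:#1f377f;">success</span>: <span style="font-weight:bold;color:#1f377f;">reservation</span> => <span style="color:#74531f;">Reservation201Created</span>(<span style="font-weight:bold;color:#1f377f;">restaurantId</span>, <span style="font-weight:bold;color:#1f377f;">reservation</span>),
    <span style="font-weight:bold;color:#1f377f;">invalidInput</span>: <span style="font-weight:bold;color:#1f377f;">message</span> => <span style="color:blue;">new</span> <span style="color:#2b91af;">BadRequestObjectResult</span>(<span style="font-weight:bold;color:#1f377f;">message</span>),
    <span style="font-weight:bold;color:#1f377f;">noSuchRestaurant</span>: () => <span style="color:blue;">new</span> <span style="color:#2b91af;">NotFoundResult</span>(),
    <span style="font-weight:bold;color:#1f377f;">noTablesAvailable</span>: () => <span style="color:#74531f;">NoTables500InternalServerError</span>());</pre>
</p>
<p>
With such a design, the <code>PostReservationVisitor</code> class would be unnecessary, and the explicit <code>ActionResult</code> type argument makes the common return type unambiguous.
</p>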
<p>
Be that as it may, the Controller now plays the role of an Adapter between the ASP.NET framework and the framework-neutral Use-case Model. Is it all worth it?
</p>
<h3 id="e91851ceab124208bfcb3d1705e44bf1">
Reflection <a href="#e91851ceab124208bfcb3d1705e44bf1">#</a>
</h3>
<p>
Where does that put us? It's certainly <em>possible</em> to decouple the Use-case Model from the specific framework, but at what cost?
</p>
<p>
In this example, I had to introduce two new classes and one interface, as well as four <code>private</code> implementation classes and a <code>private</code> interface.
</p>
<p>
And that was to support just one use case. If I want to implement a query (HTTP <code>GET</code>), I would need to go through similar motions, but with slight variations. And again for updates (HTTP <code>PUT</code>). And again for deletes. And again for the next resource, such as the restaurant calendar, daily schedule, management of tenants, and so on.
</p>
<p>
The cost seems rather substantial to me. Do the benefits outweigh it? What <em>are</em> the benefits?
</p>
<p>
Well, you now have a technology-neutral application model. You could, conceivably, tear out ASP.NET and replace it with, oh... <a href="https://servicestack.net/">ServiceStack</a>. Perhaps. Theoretically. I haven't tried.
</p>
<p>
This strikes me as an argument similar to insisting that hiding database access behind an interface enables us to replace SQL Server with a document database. That rarely happens, and is not really why we do that.
</p>
<p>
So to be fair, decoupling also protects us from changes in libraries and frameworks. It makes it easier to modify one part of the system without having to worry (too much) about other parts. It makes it easier to subject subsystems to automated testing.
</p>
<p>
Does the above refactoring improve testability? Not really. <code>MakeReservationUseCase</code> may be testable, but so was the original Controller. The entire code base for <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a> was developed with test-driven development (TDD). The Controllers are mostly covered by <a href="/2021/01/25/self-hosted-integration-tests-in-aspnet">self-hosted integration tests</a>, and bolstered with a few unit tests that directly call their <em>action methods</em>.
</p>
<p>
Another argument for a decoupled Use-case Model is that it might enable you to transplant the entire application to a new context. Since it doesn't depend on ASP.NET, you could reuse it in an Android or iPhone app. Or a batch job. Or an AI-assisted chat bot. Right? Couldn't you?
</p>
<p>
I'd be surprised if that were the case. Every application type has its own style of user interaction, and they tend to be incompatible. The user-interface flow of a web application is fundamentally different from that of a rich native app.
</p>
<p>
In short, I consider the notion of a technology-neutral Use-case Model to be a distraction. That's why I usually don't bother.
</p>
<h3 id="df0ed0cff3c84095b3993bb56f06bd08">
Conclusion <a href="#df0ed0cff3c84095b3993bb56f06bd08">#</a>
</h3>
<p>
I usually implement Controllers, message handlers, application entry points, and so on as fairly 'fat Adapters' over particular frameworks. I do that because I expect the particular application or user interaction to be intimately tied to that kind of application. This doesn't mean that I just throw everything in there.
</p>
<p>
Fat Adapters should still be covered by automated tests. They should still show appropriate decoupling. Usually, I treat each as an <a href="/2020/03/02/impureim-sandwich">Impureim Sandwich</a>. All impure actions happen in the Fat Adapter, and everything else is done by a <a href="https://en.wikipedia.org/wiki/Pure_function">pure function</a>. Granted, however, this kind of architecture comes much more naturally when you are working in a programming language that supports it.
</p>
<p>
C# doesn't really, but you can make it work. And that work, contrary to modelling use cases as classes, is, in my experience, well worth the effort.
</p>
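<p>
To give an impression of the shape, here's a minimal sketch of such a sandwich. The names <code>ReservationPolicy</code>, <code>CanAccept</code>, and <code>ReservationsAdapter</code> are invented for this illustration, the capacity rule is deliberately simplistic, and delegates stand in for real database access. It's not code from the example code base.
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Linq;

public static class ReservationPolicy
{
    // Pure: the result depends only on the arguments.
    public static bool CanAccept(
        int capacity,
        IReadOnlyCollection<int> reservedQuantities,
        int requestedQuantity) =>
        reservedQuantities.Sum() + requestedQuantity <= capacity;
}

// Hypothetical fat Adapter; strings stand in for HTTP responses.
public sealed class ReservationsAdapter
{
    private readonly int capacity;
    private readonly Func<DateTime, IReadOnlyCollection<int>> readQuantities;
    private readonly Action<DateTime, int> save;

    public ReservationsAdapter(
        int capacity,
        Func<DateTime, IReadOnlyCollection<int>> readQuantities,
        Action<DateTime, int> save)
    {
        this.capacity = capacity;
        this.readQuantities = readQuantities;
        this.save = save;
    }

    public string Post(DateTime date, int quantity)
    {
        // Impure: gather data.
        var reserved = readQuantities(date);

        // Pure: make the decision.
        var accepted = ReservationPolicy.CanAccept(capacity, reserved, quantity);

        // Impure: act on the decision.
        if (!accepted)
            return "500 Internal Server Error";
        save(date, quantity);
        return "201 Created";
    }
}</pre>
</p>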
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="8979c41ff73fb331352e90cc8dd81598">
<div class="comment-author">Thomas Skovsende <a href="#8979c41ff73fb331352e90cc8dd81598">#</a></div>
<div class="comment-content">
<p>
I do not really strongly disagree, but what would you do in the case where you had multiple entry points for your use cases? Ie. I have a http endpoint that creates a reservation, but I also need to be able to listen to a messagebus where reservations can come in.
A lot of the logic is the same in both cases.
</p>
</div>
<div class="comment-date">2025-04-09 09:17 UTC</div>
</div>
<div class="comment" id="1cee4ef1b06f4638822ea504229f4168">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#1cee4ef1b06f4638822ea504229f4168">#</a></div>
<div class="comment-content">
<p>
Thomas, thank you for writing. I'll give you two answers, because I believe that the specific answer doesn't generalize. On the other hand, the specific answer also has merit.
</p>
<p>
In the particular case, it seems as though an 'obvious' architecture would be to have the HTTP endpoint do minimal work required to parse the incoming data, and then put it on the message bus together with all the other messages. I could imagine situations where that would be appropriate, but I can also imagine some edge(?) cases where that still couldn't work.
</p>
<p>
As a general answer, however, having some common function or object that handles both cases could make sense. That's pretty much the architecture I spend this article discouraging. That said, as I tried to qualify, I <em>usually</em> don't run into situations where such an architecture is warranted. Case in point, I haven't run into a scenario like the one you describe. Other people, however, also wrote to tell me that they have two endpoints, such as both gRPC and HTTP, or both SOAP and REST, so while I, personally, haven't seen this much, it clearly does happen.
</p>
<p>
In short, I don't mind that kind of architecture when it addresses an actual problem. Often, though, that kind of requirement isn't present, and in that case, this kind of 'use-case model' architecture shouldn't be the default.
</p>
<p>
The reason I also gave you the specific answer is that I often get the impression that people seek general, universal solutions, and this could make them miss elegant shortcuts.
</p>
</div>
<div class="comment-date">2025-04-13 15:17 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Phased breaking changes
https://blog.ploeh.dk/2025/03/17/phased-breaking-changes
2025-03-17T14:02:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Giving advance warning before breaking client code.</em>
</p>
<p>
I was recently listening to <a href="https://www.dotnetrocks.com/details/1941">Jimmy Bogard on .NET Rocks! talking about 14 versions of Automapper</a>. It made me reminisce about how I dealt with versioning of <a href="https://github.com/AutoFixture/AutoFixture/">AutoFixture</a>, in the approximately ten years I helmed that project.
</p>
<p>
Jimmy has done open source longer than I have, and it sounds as though he's found a way that works for him. When I led AutoFixture, I did things a bit differently, which I'll outline in this article. In no way do I mean to imply that that way was better than Jimmy's. It may, however, strike a chord with a reader or two, so I present it in the hope that some readers may find the following ideas useful.
</p>
<h3 id="ae623718cc094595b125b019f9cecd2b">
Scope <a href="#ae623718cc094595b125b019f9cecd2b">#</a>
</h3>
<p>
This article is about versioning a code base. Typically, a code base contains 'modules' of a kind, and client code that relies on those modules. In object-oriented programming, modules are often called <em>classes</em>, but in general, what matters in this context is that some kind of API exists.
</p>
<p>
The distinction between API and client code is most clear if you're maintaining a reusable library, and you don't know the client developers, but even internal application code has APIs and client code. The following may still be relevant if you're working in a code base together with colleagues.
</p>
<p>
This article discusses code-level APIs. Examples include C# code that other .NET code can call, but may also apply to <a href="https://www.java.com">Java</a> objects callable from <a href="https://clojure.org/">Clojure</a>, <a href="https://www.haskell.org/">Haskell</a> code callable by other Haskell code, etc. It does not discuss versioning of <a href="https://en.wikipedia.org/wiki/REST">REST</a> APIs or other kinds of online services. I have, in the past, discussed versioning in such a context, and refer you, among other articles, to <a href="/2015/06/22/rest-implies-content-negotiation">REST implies Content Negotiation</a> and <a href="/2020/06/01/retiring-old-service-versions">Retiring old service versions</a>.
</p>
<p>
Additionally, some of the techniques outlined here are specific to .NET, or even C#. If, as I suspect, JavaScript or other languages don't have those features, then these techniques don't apply. They're hardly universal.
</p>
<h3 id="2bc952afa11a410d934c3fa37f9aa475">
Semantic versioning <a href="#2bc952afa11a410d934c3fa37f9aa475">#</a>
</h3>
<p>
For the first few years of AutoFixture, I didn't use a systematic versioning scheme. That changed when I encountered <a href="https://semver.org/">Semantic Versioning</a>: In 2011 I <a href="/2011/09/06/AutoFixturegoesContinuousDeliverywithSemanticVersioning">changed AutoFixture versioning to Semantic Versioning</a>. This forced me to think explicitly about breaking changes.
</p>
<p>
As an aside, in recent years I've encountered the notion that Semantic Versioning is somehow defunct. This is often based on the observation that Semantic Versioning 2.0.0 was published in 2013. Surely, if no further development has taken place, it's been abandoned by its maintainer? This may or may not be the case. Does it matter?
</p>
<p>
The original author, <a href="https://tom.preston-werner.com/">Tom Preston-Werner</a>, may have lost interest in Semantic Versioning. Or perhaps it's simply <em>done</em>. Regardless of the underlying reasons, I find Semantic Versioning useful as it is. The fact that it hasn't changed since 2013 may be an indication that it's stable. After all, it's not a piece of software. It's a specification that helps you think about versioning, and in my opinion, it does an excellent job of that.
</p>
<p>
As I already stated, once I started using Semantic Versioning I began to think explicitly about breaking changes.
</p>
<h3 id="8301d3062b2f47ae8a292396da4ade8e">
Advance warning <a href="#8301d3062b2f47ae8a292396da4ade8e">#</a>
</h3>
<p>
Chapter 10 in <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a> is about making changes to existing code bases. Unless you're working on a solo project with no other programmers, changes you make impact other people. If you can, avoid breaking other people's code. The chapter discusses some techniques for that, but also briefly covers how to introduce breaking changes. Some of that chapter is based on my experience with AutoFixture.
</p>
<p>
If your language has a way to retire an API, use it. In Java you can use the <code>@Deprecated</code> annotation, and in C# the equivalent <code>[Obsolete]</code> attribute. In C#, any client code that uses a method with the <code>[Obsolete]</code> attribute will now cause the compiler to emit a warning.
</p>
<p>
By default, this will only be a warning, and there's certainly a risk that people don't look at those. On the other hand, if you follow my advice from Code That Fits in Your Head, you should treat warnings as errors. If you do, however, those warnings emitted by <code>[Obsolete]</code> attributes will prevent your code from compiling. Or, if you're the one who just adorned a method with that attribute, you should understand that you may just have inconvenienced someone else.
</p>
<p>
Therefore, whenever you add such an attribute, do also add a message that tells client developers how to move on from the API that you've just retired. As an example, here's an (ASP.NET) method that handles <code>GET</code> requests for calendar resources:
</p>
<p>
<pre>[<span style="color:#2b91af;">Obsolete</span>(<span style="color:#a31515;">"Use Get method with restaurant ID."</span>)]
[<span style="color:#2b91af;">HttpGet</span>(<span style="color:#a31515;">"calendar/{year}/{month}"</span>)]
<span style="color:blue;">public</span> <span style="color:#2b91af;">ActionResult</span> <span style="font-weight:bold;color:#74531f;">LegacyGet</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">year</span>, <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">month</span>)</pre>
</p>
<p>
To be honest, that message may be a bit on the terse side, but the point is that there's another method on the same class that takes an additional <code>restaurantId</code>. While I'm clearly not perfect, and should have written a more detailed message, the point is that you should make it as easy as possible for client developers to deal with the problem that you've just given them. My <a href="/2014/12/23/exception-messages-are-for-programmers">rules for exception messages</a> also apply here.
</p>
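<p>
A more detailed message could look something like the following sketch. The <code>CalendarApi</code> class, the signature of the replacement <code>Get</code> method, and the 'default restaurant' behaviour are assumptions I've made for the sake of illustration. Strings stand in for ASP.NET results, so that the example is self-contained.
</p>
<p>
<pre>using System;

// Hypothetical stand-in for the Controller.
public class CalendarApi
{
    // The message says what's wrong, what to do instead, and what happens next.
    [Obsolete("LegacyGet(year, month) only serves the default restaurant. " +
        "Call Get(restaurantId, year, month) instead. " +
        "LegacyGet is scheduled to become a compiler error in the next major version.")]
    public string LegacyGet(int year, int month) => Get(1, year, month);

    // Assumed replacement that takes the additional restaurantId.
    public string Get(int restaurantId, int year, int month) =>
        $"restaurants/{restaurantId}/calendar/{year}/{month}";
}</pre>
</p>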
<p>
It's been more than a decade, but as I remember it, in the AutoFixture code base, I kept a list of APIs that I intended to deprecate at the next major revision. In other words, there were methods I considered fair use in a particular major version, but that I planned to phase out over multiple revisions. There were, however, a few methods that I immediately adorned with the <code>[Obsolete]</code> attribute, because I realized that they created problems for people.
</p>
<p>
The plan, then, was to take it up a notch when releasing a new major version. To be honest, though, I never got to execute the final steps of the plan.
</p>
<h3 id="0fa47e6c5e8c451fbd46ca64ff5a078f">
Escalation <a href="#0fa47e6c5e8c451fbd46ca64ff5a078f">#</a>
</h3>
<p>
By default, the <code>[Obsolete]</code> attribute emits a warning, but by supplying <code>true</code> as a second parameter, you can turn the warning into a compiler error.
</p>
<p>
<pre>[<span style="color:#2b91af;">Obsolete</span>(<span style="color:#a31515;">"Use Get method with restaurant ID."</span>, <span style="color:blue;">true</span>)]
[<span style="color:#2b91af;">HttpGet</span>(<span style="color:#a31515;">"calendar/{year}/{month}"</span>)]
<span style="color:blue;">public</span> <span style="color:#2b91af;">ActionResult</span> <span style="font-weight:bold;color:#74531f;">LegacyGet</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">year</span>, <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">month</span>)</pre>
</p>
<p>
You could argue that for people who treat warnings as errors, even a warning is a breaking change, but there can be no discussion that when you flip that bit, this is certainly a breaking change.
</p>
<p>
Thus, you should only escalate to this level when you publish a new major release.
</p>
<p>
Code already compiled against previous versions of your deprecated code may still work, but that's it. Client code that calls an API deprecated like that is no longer going to compile.
</p>
<p>
That's the reason it's important to give client developers ample warning.
</p>
<p>
With AutoFixture, I personally never got to that point, because I'm not sure that I arrived at this deprecation strategy until major version 3, which then had a run from early 2013 to late 2017. In other words, the library had a run of 4½ years without breaking changes. And when major version 4 rolled around, <a href="https://github.com/AutoFixture/AutoFixture/issues/703">I'd left the project</a>.
</p>
<p>
Even after setting the <code>error</code> flag to <code>true</code>, code already compiled against earlier versions may still be able to run against newer binaries. Thus, you still need to keep the deprecated API around for a while longer. Completely removing a deprecated method should only happen in yet another major version release.
</p>
<h3 id="3ad29c5ed5fe4253ac319af37ddffa0b">
Conclusion <a href="#3ad29c5ed5fe4253ac319af37ddffa0b">#</a>
</h3>
<p>
To summarize, deprecating an API could be considered a breaking change. If you take that position, imagine that your current Semantic Version is 2.44.2. Deprecating a method would then require that you release version 3.0.0.
</p>
<p>
In any case, you make some more changes to your code, reaching version 3.5.12. For various reasons, you decide to release version 4.0.0, in which you can also turn the <code>error</code> flag on. Even so, the deprecated API remains in the library.
</p>
<p>
Only in version 5.0.0 can you entirely delete it.
</p>
<p>
Depending on how often you change major versions, this whole process may take years. I find that appropriate.
</p>
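<p>
The three phases can be sketched side by side like this. The namespaces <code>V3</code>, <code>V4</code>, and <code>V5</code>, as well as the <code>Api</code> class and its methods, are hypothetical. They only exist to show, in a single file that compiles, what the same class might look like in each major version.
</p>
<p>
<pre>using System;

namespace V3
{
    public class Api
    {
        // Phase 1: callers get a compiler warning.
        [Obsolete("Use NewMethod instead.")]
        public int OldMethod() => NewMethod();

        public int NewMethod() => 42;
    }
}

namespace V4
{
    public class Api
    {
        // Phase 2: callers get a compiler error, but the method still
        // exists for code already compiled against earlier versions.
        [Obsolete("Use NewMethod instead.", true)]
        public int OldMethod() => NewMethod();

        public int NewMethod() => 42;
    }
}

namespace V5
{
    public class Api
    {
        // Phase 3: OldMethod is gone.
        public int NewMethod() => 42;
    }
}</pre>
</p>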
</div><hr>
Appeal to aithority
https://blog.ploeh.dk/2025/03/10/appeal-to-aithority
2025-03-10T14:40:00+00:00
Mark Seemann
<div id="post">
<p>
<em>No, it's not a typo.</em>
</p>
<p>
A few months ago, I was listening to a semi-serious programme from the <a href="https://en.wikipedia.org/wiki/DR_P1">Danish public service radio</a>. This is a weekly programme about language that I always listen to as a podcast. The host is the backbone of the show, but in addition to new guests each week, he's flanked by a regular expert who is highly qualified to answer questions about etymology, grammar, semantics, etc.
</p>
<p>
In the episode I'm describing, the expert got a question that a listener had previously emailed. To answer, (s)he started like this (and I'm paraphrasing): <em>I don't actually know the answer to this question, so I did what everyone does these days, when they don't know the answer: I asked ChatGPT.</em>
</p>
<p>
(S)he then proceeded to read aloud what ChatGPT had answered, and concluded with some remarks along the lines that that answer sounded quite plausible.
</p>
<p>
If I used ten to twenty hours of my time re-listening to every episode from the past few months, I could find the particular episode, link to it, transcribe the exact words, and translate them to English to the best of my abilities. I am, however, not going to do that. First, I'm not inclined to use that much time writing an essay on which I make no income. Second, my aim is not to point fingers at anyone in particular, so I'm being deliberately vague. As you may have noticed, I've even masked the person's sex. Not because I don't remember, but to avoid singling out anyone.
</p>
<p>
The expert in question is a regular of the programme, and I've heard him or her give good and knowledgeable answers to many tricky questions. As far as I could tell, this particular question was unanswerable, along the lines of <em>why is 'table' called 'table' rather than 'griungth'?</em>
</p>
<p>
The correct answer would have been <em>I don't know, and I don't think anyone else does.</em>
</p>
<p>
Being a veteran of the programme, (s)he must have realized beforehand that this wouldn't be good radio, and instead decided to keep it light-hearted.
</p>
<p>
I get that, and I wouldn't be writing about it now if it didn't look like an example of an increasing trend.
</p>
<p>
People are using large language models (LLMs) to advocate for their positions.
</p>
<h3 id="27bb7d65cc8746f09d92f650f0a612eb">
Appeal to authority <a href="#27bb7d65cc8746f09d92f650f0a612eb">#</a>
</h3>
<p>
<a href="https://en.wikipedia.org/wiki/Argument_from_authority">Appeal to authority</a> is no new technique in discourse.
</p>
<blockquote>
<p>
"You may also, should it be necessary, not only twist your authorities, but actually falsify them, or quote something which you have invented entirely yourself. As a rule, your opponent has no books at hand, and could not use them if he had."
</p>
<footer><cite><a href="https://en.wikipedia.org/wiki/The_Art_of_Being_Right">The Art of Being Right</a></cite>, <a href="https://en.wikipedia.org/wiki/Arthur_Schopenhauer">Arthur Schopenhauer</a>, 1831</footer>
</blockquote>
<p>
This seems similar to how people have begun using so-called artificial intelligence (AI) to do their arguing for them. We may, instead, call this <em>appeal to aithority</em>.
</p>
<h3 id="535b5c4b352241f9bfd089677757b18f">
Epistemological cul-de-sac <a href="#535b5c4b352241f9bfd089677757b18f">#</a>
</h3>
<p>
We've all seen plenty of examples of LLMs being wrong. I'm not going to tire you with any of those here, but I did outline <a href="/2022/12/05/github-copilot-preliminary-experience-report">my experience with GitHub Copilot in 2022</a>. While these technologies may have made some advances since then, they still make basic mistakes.
</p>
<p>
Not only that. They're also non-deterministic. Ask a system a question once, and you get one answer. Ask the same question later, and you may get a variation of the same answer, or perhaps even a contradictory answer. If someone exhibits an answer they got from an LLM as an argument in their favour, consider that they may have asked it five or six times before they received an answer they liked.
</p>
<p>
Finally, you can easily ask leading questions. Even if someone shows you a screen shot of a chat with an LLM, they may have clipped prior instructions that nudge the system towards a particular bias.
</p>
<p>
I've seen people post screen shots in which an LLM claims that <a href="https://fsharp.org/">F#</a> is a better programming language than C#. While I'm sympathetic to that claim, that's not an argument. Just like <a href="/2020/10/12/subjectivity">how you feel about something isn't an argument</a>.
</p>
<p>
This phenomenon seems to be a new trend. People use answers from LLMs as evidence that they are right. I consider this an epistemological dead end.
</p>
<h3 id="3a509e32ddc74ecb8dc0c7bf8048156a">
Real authority <a href="#3a509e32ddc74ecb8dc0c7bf8048156a">#</a>
</h3>
<p>
Regular readers of this blog may have noticed that I often go to great lengths to track down appropriate sources to cite. I do this for several reasons. One is simply out of respect for the people who figured out things before us. Another reason is to strengthen my own arguments.
</p>
<p>
It may seem that I, too, appeal to authority. Indeed, I do. When not used in the way Schopenhauer describes, citing authority is a necessary epistemological shortcut. If someone who knows much about a particular subject has reached a conclusion based on his or her work, we may (tentatively) accept the conclusion without going through all the same work. As Carl Sagan said, "If you wish to make an apple pie from scratch, you must first invent the universe." You can't do <em>all</em> basic research by yourself. At some point, you'll have to take expert assertions at face value, because you don't have the time, the education, or the money to build your own <a href="https://en.wikipedia.org/wiki/Large_Hadron_Collider">Large Hadron Collider</a>.
</p>
<p>
Don't blindly accept an argument on the only grounds that someone famous said something, but on the other hand, there's no reason to dismiss out of hand what <a href="https://en.wikipedia.org/wiki/Albert_Einstein">Albert Einstein</a> had to say about gravity, just because you've heard that you shouldn't accept an argument based on appeal to authority.
</p>
<h3 id="8dbb3f507d8d49b2aa2f322d80fb4031">
Conclusion <a href="#8dbb3f507d8d49b2aa2f322d80fb4031">#</a>
</h3>
<p>
I'm concerned that people increasingly seem to resort to LLMs to argue a case. The LLM says this, so it must be right.
</p>
<p>
Sometimes, people will follow up their arguments with <em>of course, it's just an AI, but...</em> and then proceed to unfold their preferred argument. Even if this seems as though the person is making a 'real' argument, starting from an LLM answer establishes a baseline for a discussion. It still lends an aura of truth to something that may be false.
</p>
</div><hr>
Reactive monad
https://blog.ploeh.dk/2025/03/03/reactive-monad
2025-03-03T09:30:00+00:00
Mark Seemann
<div id="post">
<p>
<em>IObservable<T> is (also) a monad.</em>
</p>
<p>
This article is an instalment in <a href="/2022/03/28/monads">an article series about monads</a>. While the previous articles showed, in great detail, how to turn various classes into monads, this article mostly serves as a place-holder. The purpose is only to point out that you don't have to create all monads yourself. Sometimes, they come as part of a reusable library.
</p>
<p>
<a href="http://reactivex.io">Rx</a> defines such libraries, and <code>IObservable<T></code> forms a monad. <a href="https://github.com/dotnet/reactive">Reactive Extensions for .NET</a> defines a <code>SelectMany</code> method for <code>IObservable<T></code>, so if <code>source</code> is an <code><span style="color:#2b91af;">IObservable</span><<span style="color:blue;">int</span>></code>, you can translate it to <code><span style="color:#2b91af;">IObservable</span><<span style="color:blue;">char</span>></code> like this:
</p>
<p>
<pre><span style="color:#2b91af;">IObservable</span><<span style="color:blue;">char</span>> <span style="font-weight:bold;color:#1f377f;">dest</span> = <span style="font-weight:bold;color:#1f377f;">source</span>.<span style="font-weight:bold;color:#74531f;">SelectMany</span>(<span style="font-weight:bold;color:#1f377f;">i</span> => <span style="color:#2b91af;">Observable</span>.<span style="color:#74531f;">Repeat</span>(<span style="color:#a31515;">'x'</span>, <span style="font-weight:bold;color:#1f377f;">i</span>));</pre>
</p>
<p>
Since the <code>SelectMany</code> method is, indeed, called <code>SelectMany</code> and has the signature
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">IObservable</span><<span style="color:#2b91af;">TResult</span>> <span style="color:#74531f;">SelectMany</span><<span style="color:#2b91af;">TSource</span>, <span style="color:#2b91af;">TResult</span>>(
<span style="color:blue;">this</span> <span style="color:#2b91af;">IObservable</span><<span style="color:#2b91af;">TSource</span>> <span style="font-weight:bold;color:#1f377f;">source</span>,
<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">TSource</span>, <span style="color:#2b91af;">IObservable</span><<span style="color:#2b91af;">TResult</span>>> <span style="font-weight:bold;color:#1f377f;">selector</span>)</pre>
</p>
<p>
you can also use C#'s query syntax:
</p>
<p>
<pre><span style="color:#2b91af;">IObservable</span><<span style="color:blue;">char</span>> <span style="font-weight:bold;color:#1f377f;">dest</span> = <span style="color:blue;">from</span> i <span style="color:blue;">in</span> <span style="font-weight:bold;color:#1f377f;">source</span>
<span style="color:blue;">from</span> x <span style="color:blue;">in</span> <span style="color:#2b91af;">Observable</span>.<span style="color:#74531f;">Repeat</span>(<span style="color:#a31515;">'x'</span>, i)
<span style="color:blue;">select</span> x;</pre>
</p>
<p>
In both of the above examples, I've explicitly declared the type of <code>dest</code> instead of using the <code>var</code> keyword. There's no practical reason to do this; I only did it to make the type clear to you.
</p>
<h3 id="2de8539dd63d4b1699e6656866e9615d">
Left identity <a href="#2de8539dd63d4b1699e6656866e9615d">#</a>
</h3>
<p>
As I've already written time and again, a few test cases don't prove that any of the <a href="/2022/04/11/monad-laws">monad laws</a> hold, but they can help illustrate what they imply. For example, here's an illustration of the left-identity law, written as a parametrized <a href="https://xunit.net/">xUnit.net</a> test:
</p>
<p>
<pre>[<span style="color:#2b91af;">Theory</span>]
[<span style="color:#2b91af;">InlineData</span>(1)]
[<span style="color:#2b91af;">InlineData</span>(2)]
[<span style="color:#2b91af;">InlineData</span>(3)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span> <span style="font-weight:bold;color:#74531f;">LeftIdentity</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">a</span>)
{
<span style="color:#2b91af;">IObservable</span><<span style="color:blue;">char</span>> <span style="font-weight:bold;color:#74531f;">h</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">i</span>) => <span style="color:#2b91af;">Observable</span>.<span style="color:#74531f;">Repeat</span>(<span style="color:#a31515;">'x'</span>, <span style="font-weight:bold;color:#1f377f;">i</span>);
<span style="color:#2b91af;">IList</span><<span style="color:blue;">char</span>> <span style="font-weight:bold;color:#1f377f;">left</span> = <span style="font-weight:bold;color:#8f08c4;">await</span> <span style="color:#2b91af;">Observable</span>.<span style="color:#74531f;">Return</span>(<span style="font-weight:bold;color:#1f377f;">a</span>).<span style="font-weight:bold;color:#74531f;">SelectMany</span>(<span style="font-weight:bold;color:#74531f;">h</span>).<span style="font-weight:bold;color:#74531f;">ToList</span>();
<span style="color:#2b91af;">IList</span><<span style="color:blue;">char</span>> <span style="font-weight:bold;color:#1f377f;">right</span> = <span style="font-weight:bold;color:#8f08c4;">await</span> <span style="font-weight:bold;color:#74531f;">h</span>(<span style="font-weight:bold;color:#1f377f;">a</span>).<span style="font-weight:bold;color:#74531f;">ToList</span>();
<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Equal</span>(<span style="font-weight:bold;color:#1f377f;">left</span>, <span style="font-weight:bold;color:#1f377f;">right</span>);
}</pre>
</p>
<p>
Not only does the <a href="https://www.nuget.org/packages/System.Reactive/">System.Reactive</a> library define <em>monadic bind</em> in the form of <code>SelectMany</code>, but also <em>return</em>, with the aptly named <code>Observable.Return</code> function. .NET APIs often forget to do so explicitly, which means that I often have to go hunting for it, or guessing what the developers may have called it. Not here; thank you, Rx team.
</p>
<h3 id="04c23a87f0534e4495ef3d644793e1aa">
Right identity <a href="#04c23a87f0534e4495ef3d644793e1aa">#</a>
</h3>
<p>
In the same spirit, we may write another test to illustrate the right-identity law:
</p>
<p>
<pre>[<span style="color:#2b91af;">Theory</span>]
[<span style="color:#2b91af;">InlineData</span>(<span style="color:#a31515;">"foo"</span>)]
[<span style="color:#2b91af;">InlineData</span>(<span style="color:#a31515;">"bar"</span>)]
[<span style="color:#2b91af;">InlineData</span>(<span style="color:#a31515;">"baz"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span> <span style="font-weight:bold;color:#74531f;">RightIdentity</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">a</span>)
{
<span style="color:#2b91af;">IObservable</span><<span style="color:blue;">char</span>> <span style="font-weight:bold;color:#74531f;">f</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">s</span>) => <span style="font-weight:bold;color:#1f377f;">s</span>.<span style="font-weight:bold;color:#74531f;">ToObservable</span>();
<span style="color:#2b91af;">IObservable</span><<span style="color:blue;">char</span>> <span style="font-weight:bold;color:#1f377f;">m</span> = <span style="font-weight:bold;color:#74531f;">f</span>(<span style="font-weight:bold;color:#1f377f;">a</span>);
<span style="color:#2b91af;">IList</span><<span style="color:blue;">char</span>> <span style="font-weight:bold;color:#1f377f;">left</span> = <span style="font-weight:bold;color:#8f08c4;">await</span> <span style="font-weight:bold;color:#1f377f;">m</span>.<span style="font-weight:bold;color:#74531f;">SelectMany</span>(<span style="color:#2b91af;">Observable</span>.<span style="color:#74531f;">Return</span>).<span style="font-weight:bold;color:#74531f;">ToList</span>();
<span style="color:#2b91af;">IList</span><<span style="color:blue;">char</span>> <span style="font-weight:bold;color:#1f377f;">right</span> = <span style="font-weight:bold;color:#8f08c4;">await</span> <span style="font-weight:bold;color:#1f377f;">m</span>.<span style="font-weight:bold;color:#74531f;">ToList</span>();
<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Equal</span>(<span style="font-weight:bold;color:#1f377f;">left</span>, <span style="font-weight:bold;color:#1f377f;">right</span>);
}</pre>
</p>
<p>
In both this and the previous test, you can see that the test has to <code>await</code> the observables in order to verify that the resulting collections are identical. Clearly, if you're instead dealing with infinite streams of data, you can't rely on such a simplifying assumption. For the general case, you must instead turn to other (proof) techniques to convince yourself that the laws hold. That's not my agenda here, so I'll skip that part.
</p>
<h3 id="8b5017e1aa3942ee8693e56eec6c060d">
Associativity <a href="#8b5017e1aa3942ee8693e56eec6c060d">#</a>
</h3>
<p>
Finally, we may illustrate the associativity law like this:
</p>
<p>
<pre>[<span style="color:#2b91af;">Theory</span>]
[<span style="color:#2b91af;">InlineData</span>(<span style="color:#a31515;">"foo"</span>)]
[<span style="color:#2b91af;">InlineData</span>(<span style="color:#a31515;">"123"</span>)]
[<span style="color:#2b91af;">InlineData</span>(<span style="color:#a31515;">"4t2"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span> <span style="font-weight:bold;color:#74531f;">Associativity</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">a</span>)
{
<span style="color:#2b91af;">IObservable</span><<span style="color:blue;">char</span>> <span style="font-weight:bold;color:#74531f;">f</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">s</span>) => <span style="font-weight:bold;color:#1f377f;">s</span>.<span style="font-weight:bold;color:#74531f;">ToObservable</span>();
<span style="color:#2b91af;">IObservable</span><<span style="color:blue;">byte</span>> <span style="font-weight:bold;color:#74531f;">g</span>(<span style="color:blue;">char</span> <span style="font-weight:bold;color:#1f377f;">c</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="color:blue;">byte</span>.<span style="color:#74531f;">TryParse</span>(<span style="font-weight:bold;color:#1f377f;">c</span>.<span style="font-weight:bold;color:#74531f;">ToString</span>(), <span style="color:blue;">out</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">b</span>))
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">Observable</span>.<span style="color:#74531f;">Return</span>(<span style="font-weight:bold;color:#1f377f;">b</span>);
<span style="font-weight:bold;color:#8f08c4;">else</span>
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">Observable</span>.<span style="color:#74531f;">Empty</span><<span style="color:blue;">byte</span>>();
}
<span style="color:#2b91af;">IObservable</span><<span style="color:blue;">bool</span>> <span style="font-weight:bold;color:#74531f;">h</span>(<span style="color:blue;">byte</span> <span style="font-weight:bold;color:#1f377f;">b</span>) => <span style="color:#2b91af;">Observable</span>.<span style="color:#74531f;">Repeat</span>(<span style="font-weight:bold;color:#1f377f;">b</span> % 2 == 0, <span style="font-weight:bold;color:#1f377f;">b</span>);
<span style="color:#2b91af;">IObservable</span><<span style="color:blue;">char</span>> <span style="font-weight:bold;color:#1f377f;">m</span> = <span style="font-weight:bold;color:#74531f;">f</span>(<span style="font-weight:bold;color:#1f377f;">a</span>);
<span style="color:#2b91af;">IList</span><<span style="color:blue;">bool</span>> <span style="font-weight:bold;color:#1f377f;">left</span> = <span style="font-weight:bold;color:#8f08c4;">await</span> <span style="font-weight:bold;color:#1f377f;">m</span>.<span style="font-weight:bold;color:#74531f;">SelectMany</span>(<span style="font-weight:bold;color:#74531f;">g</span>).<span style="font-weight:bold;color:#74531f;">SelectMany</span>(<span style="font-weight:bold;color:#74531f;">h</span>).<span style="font-weight:bold;color:#74531f;">ToList</span>();
<span style="color:#2b91af;">IList</span><<span style="color:blue;">bool</span>> <span style="font-weight:bold;color:#1f377f;">right</span> = <span style="font-weight:bold;color:#8f08c4;">await</span> <span style="font-weight:bold;color:#1f377f;">m</span>.<span style="font-weight:bold;color:#74531f;">SelectMany</span>(<span style="font-weight:bold;color:#1f377f;">x</span> => <span style="font-weight:bold;color:#74531f;">g</span>(<span style="font-weight:bold;color:#1f377f;">x</span>).<span style="font-weight:bold;color:#74531f;">SelectMany</span>(<span style="font-weight:bold;color:#74531f;">h</span>)).<span style="font-weight:bold;color:#74531f;">ToList</span>();
<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Equal</span>(<span style="font-weight:bold;color:#1f377f;">left</span>, <span style="font-weight:bold;color:#1f377f;">right</span>);
}</pre>
</p>
<p>
This test composes three observable-producing functions in two different ways, to verify that they produce the same values.
</p>
<p>
The first function, <code>f</code>, simply turns a string into an observable stream. The string <code>"foo"</code> becomes the stream of characters <code>'f'</code>, <code>'o'</code>, <code>'o'</code>, and so on.
</p>
<p>
The next function, <code>g</code>, tries to parse the incoming character as a number. I've chosen <code>byte</code> as the data type, since there's no reason to parse a value that can, at best, be one of the digits <code>0</code> to <code>9</code> into a full 32-bit integer. A <code>byte</code> is already too large. If the character can be parsed, it's published as a byte value; if not, an empty stream of data is returned. For example, the character stream <code>'f'</code>, <code>'o'</code>, <code>'o'</code> results in three empty streams, whereas the stream <code>'4'</code>, <code>'t'</code>, <code>'2'</code> produces one singleton stream containing the <code>byte</code> <code>4</code>, followed by an empty stream, followed again by a stream containing the single number <code>2</code>.
</p>
<p>
The third and final function, <code>h</code>, turns a number into a stream of Boolean values; <code>true</code> if the number is even, and <code>false</code> if it's odd. The number of values is equal to the number itself. Thus, when composed together, <code>"123"</code> becomes the stream <code>false</code>, <code>true</code>, <code>true</code>, <code>false</code>, <code>false</code>, <code>false</code>. This is true for both the <code>left</code> and the <code>right</code> list, even though they're results of two different compositions.
</p>
<h3 id="e4efefbebe564e30a27720fdd3f65f7f">
Conclusion <a href="#e4efefbebe564e30a27720fdd3f65f7f">#</a>
</h3>
<p>
The point of this article is mostly that monads are commonplace. While you may discover them in your own code, they may also come in a reusable library. If you already know C# LINQ based off <code>IEnumerable<T></code>, parts of Rx will be easy for you to learn. After all, it's the same abstraction, and <em>familiar abstractions make code readable</em>.
</p>
<p>
<strong>Next:</strong> <a href="/2023/01/09/the-io-monad">The IO monad</a>.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Easier encapsulation with static types https://blog.ploeh.dk/2025/02/24/easier-encapsulation-with-static-types 2025-02-24T14:05:00+00:00 Mark Seemann
<div id="post">
<p>
<em>A metaphor.</em>
</p>
<p>
While I'm still <a href="/2021/08/09/am-i-stuck-in-a-local-maximum">struggling</a> with the notion that <a href="/2024/12/09/implementation-and-usage-mindsets">dynamically typed languages may have compelling advantages</a>, I keep coming back to the benefits of statically typed languages. One such benefit is how it enables the communication of contracts, as I recently discussed in <a href="/2025/01/06/encapsulating-rod-cutting">Encapsulating rod-cutting</a>.
</p>
<p>
As usual, I base my treatment of <a href="/encapsulation-and-solid">encapsulation</a> on my reading of <a href="https://en.wikipedia.org/wiki/Bertrand_Meyer">Bertrand Meyer</a>'s seminal <a href="/ref/oosc">Object-Oriented Software Construction</a>. A major aspect of encapsulation is the explicit establishment of <em>contracts</em>. What is expected of client code before it can invoke an operation (preconditions)? What is guaranteed to be true after the operation completes (postconditions)? And what is always true of a particular data structure (invariants)?
</p>
<p>
Contracts constitute the practical part of encapsulation. A contract can give you a rough sense of how well-encapsulated an API is: The more statements you can utter about the contract, the better encapsulation. You may even be able to take all those assertions about the contract and implement them as property-based tests. In other words, if you can think of many properties to write as tests, the API in question probably has good encapsulation. If, on the other hand, you can't think of a single precondition, postcondition, or invariant, this may indicate that encapsulation is lacking.
</p>
<p>
The overall notion of contracts provides guidance on <em>how</em> to achieve encapsulation. Specific contracts describe <em>what</em> is possible, and <em>how</em> to successfully interact with an API. In short, contracts cover the <em>what</em> and the <em>how</em>.
</p>
<p>
They don't, however, explain <em>why</em> encapsulation is valuable.
</p>
<h3 id="1a06496713174bba99d05dad211205c2">
Why encapsulate? <a href="#1a06496713174bba99d05dad211205c2">#</a>
</h3>
<p>
Successful code bases are big. Such a code base rarely <a href="/code-that-fits-in-your-head">fits in your head</a> in its entirety. And the situation is only exacerbated by multiple programmers working concurrently on the code. Even if you knew most of the code base by heart, your team members are changing it, and you aren't going to be aware of all the modifications.
</p>
<p>
Encapsulation offers a solution to this problem. Instead of knowing every detail of the entire code base, encapsulation should enable you to interact with an API (originally, an <em>object</em>) <em>without</em> knowing all the implementation details. This is the raison d'être of contracts. Ideally, knowing the contract and the purpose of an object and its methods should be enough.
</p>
<p>
Imagine that you've designed an API with a strong contract. Is your work done? Not yet. Somehow, you'll have to communicate the contract to all present and future client developers.
</p>
<p>
How do you convey a contract to potential users? I can think of a few ways. Good names are important, but <a href="/2020/11/23/good-names-are-skin-deep">only skin-deep</a>. You can also publish documentation, or use the type system. The following metaphor explores those two alternatives.
</p>
<h3 id="5ad0b635686f436d80a753622d6e4f22">
Doing a puzzle <a href="#5ad0b635686f436d80a753622d6e4f22">#</a>
</h3>
<p>
When I was a boy, I had a puzzle called <em>Das verflixte Hunde-Spiel</em>, which roughly translates to <em>the confounded dog game</em>. I've <a href="/2024/10/03/das-verflixte-hunde-spiel">previously described the game and an algorithm for solving it</a>, but that's not my concern here. Rather, I'd like to discuss how one might abstract the information carried by each tile.
</p>
<p>
<img src="/content/binary/hunde-spiel.jpg" alt="A picture of the box of the puzzle, together with the tiles spread out in unordered fashion.">
</p>
<p>
As the picture suggests, the game consists of nine square tiles, each with two dog heads and two tails. The objective of the puzzle is to lay all nine tiles in a three-by-three grid such that all the heads fit the opposing tails. The dogs come in four different colour patterns, and each head must fit a tail of the same pattern.
</p>
<p>
It turns out that there are umpteen variations of this kind of puzzle. This one has cartoon dogs, but you can find similar games with frogs, cola bottles, <a href="https://en.wikipedia.org/wiki/Playing_card_suit">playing card suits</a>, trains, ladybirds, fast food, flowers, baseball players, owls, etc. This suggests that a <em>generalization</em> may exist. Perhaps an abstraction, even.
</p>
<blockquote>
<p>
"Abstraction is the elimination of the irrelevant and the amplification of the essential"
</p>
<footer><cite>Robert C. Martin, <a href="/ref/doocautbm">Designing Object-Oriented C++ Applications Using The Booch Method</a>, ch. 00</cite></footer>
</blockquote>
<p>
How to eliminate the irrelevant and amplify the essential of a tile?
</p>
<p>
To recapitulate, a single tile looks like this:
</p>
<p>
<img src="/content/binary/hundespiel-tile1.jpg" width="200" alt="One of the tiles of the Hunde-Spiel.">
</p>
<p>
In a sense, we may regard most of the information on such a tile as 'implementation details'. In a code metaphor, imagine looking at a tile like this as being equivalent to looking at the source code of a method or function (i.e. API). That's not the <em>essence</em> we need to correctly assemble the puzzle.
</p>
<p>
Imagine that you have to lay down the tiles according to <a href="/2024/10/03/das-verflixte-hunde-spiel">a known solution</a>. Since you already know the solution, this task only involves locating and placing each of the nine tiles. In this case, there are only nine tiles, each with four possible rotations, so if you already know what you're looking for, that is, of course, a tractable endeavour.
</p>
<p>
Now imagine that you'd like to undertake putting together the tiles <em>without</em> having to navigate by the full information content of each tile. In programming, we often need to do this. We have to identify objects that are likely to perform some subtask for us, and we have to figure out how to interact with such an object to achieve our goals. Preferably, we'd like to be able to do this <em>without</em> having to read all the source code of the candidate object. <em>Encapsulation</em> promises that this should be possible.
</p>
<h3 id="f87bc2bd17584b63808bb0581eeb1523">
The backside of the tiles <a href="#f87bc2bd17584b63808bb0581eeb1523">#</a>
</h3>
<p>
If we want to eliminate the irrelevant, we have to hide the information on each tile. As a first step, consider what happens if we flip the tiles around so that we only see their backs.
</p>
<p>
<img src="/content/binary/empty-tile-backside.png" alt="An empty square, symbolizing the backside of a tile." width="200">
</p>
<p>
Obviously, each backside is entirely devoid of information, which means that we're now flying blind. Even if we know how to solve the puzzle, our only recourse is to blindly pick and rotate each of the nine tiles. As the <a href="/2024/10/03/das-verflixte-hunde-spiel">previous article</a> calculated, when picking at random, the odds of arriving at any valid solution are 1 to 5,945,425,920. Not surprisingly, total absence of information doesn't work.
</p>
<p>
We already knew that, because, while we want to eliminate the irrelevant, we also must amplify the essential. Thus, we need to figure out what that might be.
</p>
<p>
Perhaps we could write the essential information on the back of each tile. In the metaphor, this would correspond to writing documentation for an API.
</p>
<h3 id="86c6695ffa884ec28be560309779ab98">
Documentation <a href="#86c6695ffa884ec28be560309779ab98">#</a>
</h3>
<p>
To continue the metaphor, I asked various acquaintances to each 'document' a tile. I deliberately only gave them the instruction that they should enable me to assemble the puzzle based on what was on the back of each tile. Some asked for additional directions, as to format, etc., but I refused to give any. People document code in various different ways, and I wanted to capture similar variation. Let's review some of the documentation I received.
</p>
<p>
<img src="/content/binary/tile2-doc.jpg" alt="The back of a tile, with written documentation and some arrows." width="200">
</p>
<p>
Since I asked around among acquaintances, all respondents were Danes, and some chose to write the documentation in Danish, as is the case with this one.
</p>
<p>
Unless you have an explicit, enforced policy, you might run into a similar situation in software documentation. I've seen more than one example of software documentation written in Danish, simply because the programmer who wrote it didn't consider anything other than his or her native language. I'm sure most Europeans have similar experiences.
</p>
<p>
The text on the tile says, from the top and clockwise:
</p>
<ul>
<li>light brown dog/light snout/dark ears</li>
<li>dark brown, white/snout</li>
<li>orange tail/brown spots on/back</li>
<li>orange tail/orange back</li>
</ul>
<p>
Notice the disregard for capitalization rules or punctuation, a tendency among programmers that I've commented upon in <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>.
</p>
<p>
In addition to the text, the back of the above tile also includes six arrows. Four of them ought to be self-evident, but can you figure out what the two larger arrows indicate?
</p>
<p>
It turns out that the person writing this piece of documentation eventually realized that the description should be mirrored, because it was on the backside of the tile. To be fair to that person, I'd asked everyone to write with a permanent marker or pen, so correcting a mistake involved either a 'hack' like the two arrows, or starting over from scratch.
</p>
<p>
Let's look at some more 'documentation'. Another tile looks like this:
</p>
<p>
<img src="/content/binary/tile3-doc.jpg" alt="The back of a tile, with some fairly cryptic symbols in the corners." width="200">
</p>
<p>
At first glance, I thought those symbols were Greek letters, but once you look at it, you start to realize what's going on. In the upper right corner, you see a stylized back and tail. Likewise, the lower left corner has a stylized face in the form of a smiley. The lines then indicate that the sides indicated by a corner have a head or tail.
</p>
<p>
Additionally, each side is encoded with a letter. I'll leave it as an exercise for the reader to figure out what <em>G</em> and <em>B</em> indicate, but also notice the two examples of a modified <em>R</em>. The one to the right indicates <em>red with spots</em>, and the other one uses the minus symbol to indicate <em>red without spots</em>.
</p>
<p>
On the one hand, this example does an admirable job of eliminating the irrelevant, but you may also find that it errs on the side of terseness. At the very least, it demands of the reader that he or she is already intimately familiar with the overall problem domain. You have to know the game well enough to be able to figure out that <em>R-</em> probably means <em>red without spots</em>.
</p>
<p>
Had this been software documentation, we might have been less than happy with this level of information. It may meet formal requirements, but is perhaps too idiosyncratic or esoteric.
</p>
<p>
Be that as it may, it's also possible to err on the other side.
</p>
<p>
<img src="/content/binary/Tile5-doc.jpg" alt="The back of a tile, this time with an almost one-to-one replica of the picture on the front." width="200">
</p>
<p>
In this example, the person writing the documentation essentially copied and described every detail on the front of the tile. Having no colours available, the person instead chose to describe in words the colour of each dog. Metaphorically speaking, the documentation replicates the implementation. It doesn't eliminate any irrelevant detail, and thereby it also fails to amplify the essential.
</p>
<p>
Here's another interesting example:
</p>
<p>
<img src="/content/binary/tile8-doc.jpg" alt="The back of a tile, with text along all four sides." width="200">
</p>
<p>
The text is in Danish. From the top clockwise, it says:
</p>
<ul>
<li>dark brown dog with blue collar</li>
<li>light brown dog with red collar</li>
<li>brown dog with small spots on back</li>
<li>Brown dog with big spots on back</li>
</ul>
<p>
Notice how the person writing this was aware that a tile has no natural up or down. Instead, each side is described with letters facing up when that side is up. You have to rotate the documentation in order to read all four sides. You may find that impractical, but I actually consider that to represent something essential about each tile. To me, this is positive.
</p>
<p>
Even so, an important piece of information is missing. It doesn't say which sides have heads, and which ones have tails.
</p>
<p>
Finally, here's one that, in my eyes, amplifies the essential and eliminates the irrelevant:
</p>
<p>
<img src="/content/binary/tile6-doc.jpg" alt="The back of a tile, with text along all four sides." width="200">
</p>
<p>
Like the previous example, you have to rotate the documentation in order to read all four sides, but the text is much terser. If you ask me, <em>Grey head</em>, <em>Burnt umber tail</em>, <em>Brown tail</em>, and <em>Spotted head</em> amplify the essential and eliminate everything else.
</p>
<p>
Notice, however, how inconsistent the documentation is. People chose various different ways in their attempt to communicate what they found important. Some people inadvertently left out essential information. Other people provided too much information. Some people never came through, so in a few cases, documentation was entirely absent. And finally, I've hinted at this already, most people forgot to 'mirror' the information, but a few did remember.
</p>
<p>
All of this has direct counterparts in software documentation. The level of detail you get from documentation varies greatly, and often, the information that I actually care about is absent: Can I call this method with a negative number? Does the input string need to be formatted in a particular way? Does the method ever return null? Which exceptions may it throw?
</p>
<p>
I'm not against documentation, but it has limitations. As far as I can tell, though, that's your only option if you're working in a dynamically typed language.
</p>
<h3 id="dfb325aa94c6490b8ecbb36107d8be32">
Static types with limited expression <a href="#dfb325aa94c6490b8ecbb36107d8be32">#</a>
</h3>
<p>
Can you think of a way to constrain which puzzle pieces fit together with other pieces?
</p>
<p>
That's how <a href="https://en.wikipedia.org/wiki/Jigsaw_puzzle">jigsaw puzzles</a> work. As a first attempt, we may try to cut out the pieces like this:
</p>
<p>
<img src="/content/binary/one-jigsaw-tile.png" alt="An anonymous jigsaw puzzle piece" width="226">
</p>
<p>
This does help some, because when presented with the subtask of having to locate the next piece, at least we can't rotate that piece into four different positions. Instead, we have only two options. Perhaps we'll choose to lay down the next piece like this:
</p>
<p>
<img src="/content/binary/two-jigsaw-tiles.png" alt="Two anonymous jigsaw pieces fit together" width="426">
</p>
<p>
You may also decide to rotate the right piece ninety degrees clockwise, but those are your only two rotation options.
</p>
<p>
We may decide to encode the shape of the pieces so that, say, the tabs indicate heads and the indentations indicate tails. This, at least, prevents us from joining head with head, or tail with tail.
</p>
<p>
This strikes me as an apt metaphor for <a href="https://en.wikipedia.org/wiki/C_(programming_language)">C</a>, or how many programmers use the type systems of C# or <a href="https://www.java.com/">Java</a>. It does prevent some easily preventable mistakes, but the types still don't carry enough information to enable you to identify exactly the pieces you need.
</p>
<h3 id="2196a52cce924b1ab0998f6a840a3e6c">
More expressive static types <a href="#2196a52cce924b1ab0998f6a840a3e6c">#</a>
</h3>
<p>
Static type systems come in various forms, and some are more expressive than others. To be honest, C#'s type system does come with good expressive powers, although it tends to <a href="/2019/12/16/zone-of-ceremony">require much ceremony</a>. As far as I can tell, Java's type system is on par with C#'s. Let's assume that we either take the time to jump through the hoops that make these type systems expressive, or that we're using a language with a more expressive type system.
</p>
<p>
In the puzzle metaphor, we may decide to give a tile this shape:
</p>
<p>
<img src="/content/binary/strongly-typed-tile.png" alt="A jigsaw puzzle piece with four distinct tab and indentation shapes." width="225">
</p>
<p>
Such a shape encodes all the information that we need, because each tab or indentation has a unique shape. We may not even have to remember exactly what a square indentation represents. If we're presented with the above tile and asked to lay down a compatible tile, we have to find one with a square tab.
</p>
<p>
<img src="/content/binary/strongly-typed-tile-pair.png" alt="Two jigsaw puzzle pieces with distinct tab and indentation shapes, arranged so that they fit together." width="425">
</p>
<p>
Encoding the essential information into tile <em>shapes</em> enables us to not only prevent mistakes, but identify the correct composition of all the tiles.
</p>
<p>
<img src="/content/binary/strongly-typed-completed-puzzle.png" alt="Completed puzzle with nine distinctly shaped pieces." width="625">
</p>
<p>
For years, I've thought about static types as <em>shapes</em> of objects or functions. For practical purposes, <a href="/2022/08/22/can-types-replace-validation">static types can't express everything</a> an operation may do, but I find it useful to use a good type system to my advantage.
</p>
<h3 id="5342222c3624472ab9b638aff7ecc65e">
Code examples <a href="#5342222c3624472ab9b638aff7ecc65e">#</a>
</h3>
<p>
You may find this a nice metaphor, and still fail to see how it translates to actual code. I'm not going to go into details here, but rather point to existing articles that give some introductions.
</p>
<p>
One place to start is to refactor <a href="/2015/01/19/from-primitive-obsession-to-domain-modelling">from primitive obsession to domain models</a>. Just wrapping a string or an integer in a <a href="https://www.hillelwayne.com/post/constructive/">predicative type</a> helps communicate the purpose and constraints of a data type. Consider a constructor like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">Reservation</span>(
Guid id,
DateTime at,
Email email,
Name name,
NaturalNumber quantity)</pre>
</p>
<p>
While hardly sophisticated, it already communicates much information about preconditions for creating a <code>Reservation</code> object. Some of the constituent types (<code>Guid</code> and <code>DateTime</code>) are built in, so likely well-known to any developer working on a relevant code base. If you're wondering whether you can create a reservation with a negative <code>quantity</code>, the types already answer that.
</p>
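<p>
To illustrate how a type can answer that question, here's a minimal sketch of what a wrapper like <code>NaturalNumber</code> might look like. This is not the actual implementation behind the constructor above; the class below is hypothetical, and it assumes that a natural number means an integer of 1 or greater.
</p>
<p>
<pre>using System;

// Hypothetical sketch; assumes 'natural number' means 1 or greater.
public sealed class NaturalNumber
{
    private readonly int value;

    public NaturalNumber(int candidate)
    {
        // Reject zero and negative numbers once, at construction.
        if (0 >= candidate)
            throw new ArgumentOutOfRangeException(
                nameof(candidate),
                "A natural number must be 1 or greater.");

        this.value = candidate;
    }

    // Explicit conversion back to the wrapped integer.
    public static explicit operator int(NaturalNumber number) => number.value;
}</pre>
</p>
<p>
Under these assumptions, <code>new NaturalNumber(-1)</code> throws an <code>ArgumentOutOfRangeException</code>, so any code that receives a <code>NaturalNumber</code> can rely on the precondition already having been checked.
</p>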
<p>
Languages with native support for <a href="https://en.wikipedia.org/wiki/Tagged_union">sum types</a> let you easily model mutually exclusive, heterogeneous closed type hierarchies, as shown in <a href="/2016/11/28/easy-domain-modelling-with-types">this example</a>:
</p>
<p>
<pre><span style="color:blue;">type</span> <span style="color:teal;">PaymentService</span> = { Name : <span style="color:teal;">string</span>; Action : <span style="color:teal;">string</span> }
<span style="color:blue;">type</span> <span style="color:teal;">PaymentType</span> =
| <span style="color:navy;">Individual</span> <span style="color:blue;">of</span> <span style="color:teal;">PaymentService</span>
| <span style="color:navy;">Parent</span> <span style="color:blue;">of</span> <span style="color:teal;">PaymentService</span>
| <span style="color:navy;">Child</span> <span style="color:blue;">of</span> originalTransactionKey : <span style="color:teal;">string</span> * paymentService : <span style="color:teal;">PaymentService</span></pre>
</p>
<p>
And if your language doesn't natively support sum types, you can <a href="/2018/06/25/visitor-as-a-sum-type">emulate them with the Visitor design pattern</a>.
</p>
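<p>
As a rough sketch of that idea, the F# <code>PaymentType</code> above could be emulated in C# along the following lines. All the type and member names are hypothetical, and to keep the example short the visitor is fixed to return a <code>string</code>; a fuller version would typically make the return type generic.
</p>
<p>
<pre>using System;

// Hypothetical C# counterpart of the F# PaymentService record.
public sealed class PaymentService
{
    public PaymentService(string name, string action)
    {
        Name = name;
        Action = action;
    }

    public string Name { get; }
    public string Action { get; }
}

// One method per case of the sum type.
public interface IPaymentTypeVisitor
{
    string VisitIndividual(PaymentService service);
    string VisitParent(PaymentService service);
    string VisitChild(string originalTransactionKey, PaymentService service);
}

public interface IPaymentType
{
    string Accept(IPaymentTypeVisitor visitor);
}

public sealed class Individual : IPaymentType
{
    private readonly PaymentService service;

    public Individual(PaymentService service) { this.service = service; }

    public string Accept(IPaymentTypeVisitor visitor) =>
        visitor.VisitIndividual(service);
}

public sealed class Parent : IPaymentType
{
    private readonly PaymentService service;

    public Parent(PaymentService service) { this.service = service; }

    public string Accept(IPaymentTypeVisitor visitor) =>
        visitor.VisitParent(service);
}

public sealed class Child : IPaymentType
{
    private readonly string originalTransactionKey;
    private readonly PaymentService service;

    public Child(string originalTransactionKey, PaymentService service)
    {
        this.originalTransactionKey = originalTransactionKey;
        this.service = service;
    }

    public string Accept(IPaymentTypeVisitor visitor) =>
        visitor.VisitChild(originalTransactionKey, service);
}

// Example visitor: the compiler insists that every case is handled.
public sealed class DescribingVisitor : IPaymentTypeVisitor
{
    public string VisitIndividual(PaymentService service) =>
        "Individual via " + service.Name;

    public string VisitParent(PaymentService service) =>
        "Parent via " + service.Name;

    public string VisitChild(string originalTransactionKey, PaymentService service) =>
        "Child of " + originalTransactionKey + " via " + service.Name;
}</pre>
</p>
<p>
Client code only ever sees <code>IPaymentType</code>, and adding a method to the visitor interface forces every visitor to deal with the new case. That's the same kind of guarantee that pattern matching on a sum type gives you, only with more ceremony.
</p>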
<p>
You can, in fact, do some <a href="/2025/02/03/modelling-data-relationships-with-c-types">quite sophisticated tricks even with .NET's type system</a>.
</p>
<h3 id="783cb14749b64aafac68d97a0096b349">
Conclusion <a href="#783cb14749b64aafac68d97a0096b349">#</a>
</h3>
<p>
People often argue about static types with the assumption that their main use is to prevent mistakes. They can help do that, too, but I also find static types an excellent communication medium. The benefit of using a static type system to model contracts is that, when a type system is already part of a language, it's a consistent, formalized framework for communication. Instead of inconsistent and idiosyncratic documentation, you can embed much information about a contract in the types of an API.
</p>
<p>
And indeed, not only can the types help communicate pre- and postconditions. The type checker <em>also</em> prevents errors.
</p>
<p>
A sufficiently sophisticated type system carries more information than most people realize. When I write <a href="https://www.haskell.org/">Haskell</a> code, I often need to look up a function that I need. Contrary to other languages, I don't try to search for a function by guessing what name it might have. Rather, the <a href="https://hoogle.haskell.org/">Hoogle</a> search engine enables you to search for a function <em>by type</em>.
</p>
<p>
Types are shapes, and shapes are like outlines of objects. Used well, they enable you to eliminate the irrelevant, and amplify the essential information about an API.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
In defence of multiple WiP https://blog.ploeh.dk/2025/02/17/in-defence-of-multiple-wip 2025-02-17T08:52:00+00:00 Mark Seemann
<div id="post">
<p>
<em>Programming isn't like factory work.</em>
</p>
<p>
I was recently stuck on a programming problem. Specifically, <a href="https://adventofcode.com/2024/day/19">part two of an Advent of Code puzzle</a>, if you must know. As is my routine, I went for a run, which always helps to get unstuck. During the few hours away from the keyboard, I'd had a new idea. When I returned to the computer, I had my new algorithm implemented in about an hour, and it calculated the correct result in sub-second time.
</p>
<p>
I'm not writing this to brag. On the contrary, I suck at Advent of Code (which is a major <a href="/2020/01/13/on-doing-katas">reason that I do it</a>). The point is rather that programming is fundamentally non-linear in effort. Not only are some algorithms orders of magnitude faster than other algorithms, but it's also the case that the amount of time you put into solving a problem doesn't always correlate with the outcome.
</p>
<p>
Sometimes, the most productive way to solve a problem is to let it rest and <em>go do something else</em>.
</p>
<h3 id="007252aeb07e4b0c9c514185a7f9699f">
One-piece flow <a href="#007252aeb07e4b0c9c514185a7f9699f">#</a>
</h3>
<p>
Doesn't this conflict with the ideal of one-piece flow? That is, that you should only have one piece of work in progress (WiP).
</p>
<p>
Yes, it does.
</p>
<p>
It's not that I don't understand basic queue theory, haven't read <a href="/ref/the-goal">The Goal</a>, or that I'm unaware of the <a href="https://youtu.be/Yqi9Gwt-OEA?si=9qLo77p3iJZKBwcx">compelling explanations given by, among other people, Henrik Kniberg</a>. I do, myself, <a href="/2023/01/23/agilean">lean (pun intended) towards lean software development</a>.
</p>
<p>
I only offer the following as a counterpoint to other voices. As I've described before, when I seem to disagree with the mainstream view on certain topics, the explanation may rather be that <a href="/2021/08/09/am-i-stuck-in-a-local-maximum">I'm concerned with a different problem than other people are</a>. If your problem is a dysfunctional organization where everyone has dozens of tasks in progress, where nothing ever gets done because it's considered more important to start new work items than to complete ongoing work, and where 'utilization' is at 100% because of 'efficiency', then yes, I'd also recommend limiting WiP.
</p>
<p>
The idea in one-piece flow is that you're only working on one thing at a time.
</p>
<p>
<img src="/content/binary/one-piece-flow2.png" alt="One-piece flow illustrated as a series of boxes in a row.">
</p>
<p>
Perhaps you can divide the task into subtasks, but you're only supposed to start something new when you're done with the current job. Compared to the alternative of starting a lot of concurrent tasks in order to deal with wait times in the system, I agree with the argument that this is often better. One-piece flow often prompts you to take a good, hard look at where and how delays occur in your process.
</p>
<p>
Even so, I find it ironic that most of 'the Lean squad' is so busy blaming <a href="https://en.wikipedia.org/wiki/Scientific_management">Taylorism</a> for everything that's wrong with many software development organizations, only to go advocate for another management style rooted in factory work.
</p>
<p>
Programming isn't manufacturing.
</p>
<h3 id="28c7eb05dd1342d7a3140aae5618db4f">
Urgent or important <a href="#28c7eb05dd1342d7a3140aae5618db4f">#</a>
</h3>
<p>
As <a href="https://en.wikipedia.org/wiki/Dwight_D._Eisenhower">Eisenhower</a> quoted an unnamed college president:
</p>
<blockquote>
<p>
"I have two kinds of problems, the urgent and the important. The urgent are not important, and the important are never urgent."
</p>
</blockquote>
<p>
It's hard to overstate how liberating it can be to ignore the urgent and focus on the important. Over decades, I keep returning to the realization that you often reach the best solutions to software problems by letting them stew.
</p>
<p>
I'm sure I've already told the following story elsewhere, but it bears repeating. Back in 2009 I started an open-source project called <a href="https://github.com/AutoFixture/AutoFixture">AutoFixture</a> and also managed to convince my then-employer, <a href="https://www.safewhere.com/">Safewhere</a>, to use it in our code base.
</p>
<p>
Maintaining or evolving AutoFixture wasn't my job, though. It was a work-related hobby, so nothing related to it was urgent. When in the office, I worked on Safewhere code, but biking back and forth between home and work, I thought about AutoFixture problems. Often, these problems would be informed by how we used it in Safewhere. My point is that the problems I was thinking about were real problems that I'd encountered in my day job, not just something I'd dreamt up for fun.
</p>
<p>
I was mostly thinking about API designs. Given that this was ideally a general-purpose open-source project, I didn't want to solve narrow problems with specific solutions. I wanted to find general designs that would address not only the immediate concerns, but also other problems that I had yet to discover.
</p>
<p>
Many an evening I spent trying out an idea I'd had on my bicycle. Often, it turned out that the idea wouldn't work. While that might be, in one sense, dismaying, on the other hand, it only meant that I'd <a href="https://quoteinvestigator.com/2012/07/31/edison-lot-results/">learned about yet another way that didn't work</a>.
</p>
<p>
Because there was no pressure to release a new version of AutoFixture, I could take the time to get it right. (After a fashion. You may disagree with the notion that AutoFixture is well-designed. I designed its APIs to the best of my abilities during the decade I led the project. And when I discovered property-based testing, I <a href="https://github.com/AutoFixture/AutoFixture/issues/703">passed on the reins</a>.)
</p>
<h3 id="ae07d9022e714de689fc1900d68a866d">
Get there earlier by starting later <a href="#ae07d9022e714de689fc1900d68a866d">#</a>
</h3>
<p>
There's a 1944 science fiction short story by <a href="https://en.wikipedia.org/wiki/A._E._van_Vogt">A. E. van Vogt</a> called <a href="https://en.wikipedia.org/wiki/Far_Centaurus">Far Centaurus</a> that I'm now going to spoil.
</p>
<p>
In it, four astronauts embark on a 500-year journey to <a href="https://en.wikipedia.org/wiki/Alpha_Centauri">Alpha Centauri</a>, using <a href="https://en.wikipedia.org/wiki/Suspended_animation_in_fiction">suspended animation</a>. When they arrive, they discover that the system is long settled, from Earth.
</p>
<p>
During their 500 years en route, humans invented faster space travel. Even though later generations started later, they arrived earlier. They discovered a better way to get from a to b.
</p>
<p>
Compared to one-piece flow, we may illustrate this metaphor like this:
</p>
<p>
<img src="/content/binary/one-piece-flow-vs-thinking.png" alt="A row of boxes above another row of thinner boxes that are more spread out, but indicates an earlier finish.">
</p>
<p>
When presented with a problem, we don't start working on it right away. Or, we do, but the work we do is <em>thinking</em> rather than typing. We may even do some prototyping at that stage, but if no good solution presents itself, we put away the problem for a while.
</p>
<p>
We may return to the problem from time to time, and what may happen is that we realize that there's a much better, quicker way of accomplishing the goal than we first believed (as, again, <a href="/2025/02/10/geographic-hulls">recently happened to me</a>). Once we have that realization, we may initiate the work, and it may even turn out that we're done earlier than if we'd immediately started hacking at the problem.
</p>
<p>
By starting later, we've learned more. Like much knowledge work, software development is a profoundly non-linear endeavour. You may find a new way of doing things that is orders of magnitude faster than what you originally had in mind. Not only in terms of <a href="https://en.wikipedia.org/wiki/Big_O_notation">big-O notation</a>, but also in terms of implementation effort.
</p>
<p>
When doing Advent of Code, I've repeatedly been struck by how the more efficient algorithm is often also simpler to implement.
</p>
<h3 id="95ed780f2c3b496383f1af9b68aa2b1b">
Multiple WiP <a href="#95ed780f2c3b496383f1af9b68aa2b1b">#</a>
</h3>
<p>
As the above figure suggests, you're probably not going to spend all your time thinking or doing. The figure has plenty of air in between the activities.
</p>
<p>
This may seem wasteful to efficiency nerds, but again: Knowledge work isn't factory work.
</p>
<p>
You can't think by command. If you've ever tried meditating, you'll know just how hard it is to empty your mind, or in any way control what goes on in your head. Focus on your breath. Indeed, and a few minutes later you snap out of a reverie about what to make for dinner, only to discover that you were able to focus on your breath for all of ten seconds.
</p>
<p>
As I already alluded to in the introduction, I regularly exercise during the work day. I also go grocery shopping, or do other chores. I've consistently found that I solve all hard problems when I'm <em>away</em> from the computer, not while I'm at it. I think <a href="https://en.wikipedia.org/wiki/Rich_Hickey">Rich Hickey</a> calls it hammock-driven development.
</p>
<p>
When presented with an interesting problem, I usually can't help thinking about it. What often happens, however, is that I'm mulling over multiple interesting problems during my day.
</p>
<p>
<img src="/content/binary/multiple-thinking-processes-interleaved.png" alt="Same diagram as above, but now with more boxes representing thinking activities interleaved among each other.">
</p>
<p>
You could say that I actually have multiple pieces of work in progress. Some of them lie dormant for a long time, only to occasionally pop up and be put away again. Even so, I've had problems that I'd essentially given up on, only for them to resurface later when I'd learned a sufficient amount of new things. At that time, then, I sometimes realize that what I previously thought was impossible is actually quite simple.
</p>
<p>
It's amazing what you can accomplish when you focus on the things that are important, rather than the things that are urgent.
</p>
<h3 id="216b1e2dee074e249ad8044a6ad88a97">
One size doesn't fit all <a href="#216b1e2dee074e249ad8044a6ad88a97">#</a>
</h3>
<p>
How do I know that this will always work? How can I be sure that an orders-of-magnitude insight will occur if I just wait long enough?
</p>
<p>
There are no guarantees. My point is rather that this happens with surprising regularity. To me, at least.
</p>
<p>
Your software organization may include tasks that represent truly menial work. Yet, if you have too much of that, why haven't you automated it away?
</p>
<p>
Still, I'm not going to tell anyone how to run their development team. I'm only pointing out a weakness with the common one-piece narrative: It treats work as mostly a result of effort, and as if it were somehow interchangeable with other development tasks.
</p>
<p>
Most crucially, it models the amount of time required to complete a task as being independent of time: Whether you start a job today or in a month, it'll take <em>x</em> days to complete.
</p>
<p>
What if, instead, the effort was a function of time (as well as other factors)? The later you start, the simpler the work might be.
</p>
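<p>
As a toy illustration of that idea, here's a minimal sketch in which the effort drops once an insight has had time to mature. All the numbers (an insight on day 10, 30 days of naive work versus 3 days of informed work) are made up for the sake of the example, and the function name <code>finish_day</code> is hypothetical. Nothing here is measured data.
</p>

```python
def finish_day(start_day, insight_day=10, naive_effort=30, informed_effort=3):
    """Return the day on which the work is done, given the day it starts.

    Hypothetical model: before the insight arrives, the only known
    approach takes naive_effort days; afterwards, a simpler approach
    takes informed_effort days. All defaults are invented numbers.
    """
    effort = naive_effort if start_day < insight_day else informed_effort
    return start_day + effort


print(finish_day(0))   # start right away: done on day 30
print(finish_day(10))  # start after the insight: done on day 13
```

<p>
In the one-piece narrative, <code>effort</code> would be a constant, so starting later could only ever finish later. Of course, the sketch glosses over the fact that the insight rarely arrives unless you've already engaged with the problem.
</p>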
<p>
This of course doesn't happen automatically. Even if I have all my good ideas <em>away</em> from the keyboard, I still spend quite a bit of time <em>at</em> the keyboard. You need to work enough with a problem before inspiration can strike.
</p>
<p>
I'd recommend more slack time, more walks in the park, more grocery shopping, more doing the dishes.
</p>
<h3 id="ea87f59e4b014bb4861e71d8b6264394">
Conclusion <a href="#ea87f59e4b014bb4861e71d8b6264394">#</a>
</h3>
<p>
Programming is knowledge work. We may even consider it creative work. And while you can nurture creativity, you can't force it.
</p>
<p>
I find it useful to have multiple things going on at the same time, because concurrent tasks often cross-pollinate. What I learn from engaging with one task may produce a significant insight into another, otherwise unrelated problem. The lack of urgency, the lack of deadlines, foster this approach to problem-solving.
</p>
<p>
But I'm not going to tell you how to run your software development process. If you want to treat it as an assembly line, that's your decision.
</p>
<p>
You'll probably get work done anyway. Months of work can save days of thinking.
</p>
</div>
<hr>
Geographic hulls https://blog.ploeh.dk/2025/02/10/geographic-hulls 2025-02-10T07:14:00+00:00 Mark Seemann
<div id="post">
<p>
<em>Seven lines of Python code.</em>
</p>
<p>
Can you tell what this is?
</p>
<p>
<img src="/content/binary/dk-hull.svg" alt="Convex hulls of each of the major Danish islands, as well as Jutland.">
</p>
<p>
I showed this to both my wife and my son, and they immediately recognized it for what it is. On the other hand, they're also both culturally primed for it.
</p>
<p>
After all, it's a map of <a href="https://en.wikipedia.org/wiki/Denmark">Denmark</a>, although I've transformed each of the major islands, as well as the peninsula of <a href="https://en.wikipedia.org/wiki/Jutland">Jutland</a>, to their <a href="https://en.wikipedia.org/wiki/Convex_hull">convex hulls</a>.
</p>
<p>
Here's the original map I used for the transformation:
</p>
<p>
<img src="/content/binary/dk-outline.svg" alt="Map of Denmark.">
</p>
<p>
I had a reason to do this, having to do with <a href="https://en.wikipedia.org/wiki/Coastline_paradox">the coastline paradox</a>, but my underlying motivation isn't really that important for this article, since I rather want to discuss how I did it.
</p>
<p>
The short answer is that I used <a href="https://www.python.org/">Python</a>. You have to admit that Python has a fabulous ecosystem for all kinds of data crunching, including visualizations. I'd actually geared up to implementing a <a href="https://en.wikipedia.org/wiki/Graham_scan">Graham scan</a> myself, but that turned out not to be necessary.
</p>
<h3 id="2c60300f58f645d7b45cad678b076ad4">
GeoPandas to the rescue <a href="#2c60300f58f645d7b45cad678b076ad4">#</a>
</h3>
<p>
I'm a novice Python programmer, but I've used <a href="https://matplotlib.org/">Matplotlib</a> before to visualize data, so I found it natural to start with a few web searches to figure out how to get to grips with the problem.
</p>
<p>
I quickly found <a href="https://geopandas.org/">GeoPandas</a>, which works on top of Matplotlib to render and visualize geographical data.
</p>
<p>
My next problem was to find a data set for Denmark, which <a href="https://simplemaps.com/gis/country/dk#all">I found on SimpleMaps</a>. I chose to download and work with the <a href="https://geojson.org/">GeoJSON</a> format.
</p>
<p>
Originally, I'd envisioned implementing a Graham scan myself. After all, <a href="/2015/10/19/visual-value-verification">I'd done that before in F#</a>, and it's a compelling <a href="/2020/01/13/on-doing-katas">exercise</a>. It turned out, however, that this function is already available in the GeoPandas API.
</p>
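<p>
For readers curious about what a hand-rolled hull might involve, here's a minimal sketch in pure Python. It's neither the code from the F# article nor what GeoPandas does internally. It uses Andrew's monotone chain, a close relative of the Graham scan, and the names <code>cross</code> and <code>convex_hull</code> are made up for illustration.
</p>

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b.

    Positive for a counter-clockwise turn, negative for clockwise,
    zero when the three points are collinear.
    """
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return the hull vertices in counter-clockwise order.

    Andrew's monotone chain: sort the points, then build the lower
    and upper chains, dropping every point that doesn't turn left.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# A square with one interior point; the interior point is discarded.
square = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
print(convex_hull(square))  # [(0, 0), (2, 0), (2, 2), (0, 2)]
```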
<p>
I had trouble separating the data file's multi-part geometry into multiple single geometries. This meant that when I tried to find the convex hull, I got the hull of the entire map, instead of each island individually. The solution was to use the <a href="https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.explode.html">explode</a> function.
</p>
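<p>
To give an idea of what 'exploding' a multi-part geometry means, the following sketch splits a GeoJSON <em>MultiPolygon</em> feature into one <em>Polygon</em> feature per part. It's plain Python working on dictionaries, not the GeoPandas implementation, and the function name <code>explode_feature</code> and the sample data are assumptions made for this example.
</p>

```python
def explode_feature(feature):
    """Split a GeoJSON MultiPolygon feature into single Polygon features.

    Features with any other geometry type are returned unchanged,
    wrapped in a list. Properties are copied to every part.
    """
    geometry = feature["geometry"]
    if geometry["type"] != "MultiPolygon":
        return [feature]
    return [
        {
            "type": "Feature",
            "properties": dict(feature.get("properties") or {}),
            "geometry": {"type": "Polygon", "coordinates": polygon},
        }
        for polygon in geometry["coordinates"]
    ]


# Hypothetical sample: two separate unit squares in one feature.
two_islands = {
    "type": "Feature",
    "properties": {"name": "two islands"},
    "geometry": {
        "type": "MultiPolygon",
        "coordinates": [
            [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
            [[[5, 5], [6, 5], [6, 6], [5, 6], [5, 5]]],
        ],
    },
}

parts = explode_feature(two_islands)
print(len(parts))  # 2
```

<p>
With the parts separated like this, a hull can be computed for each one, instead of a single hull that spans all of them.
</p>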
<p>
Once I figured that out, it turned out that all I needed was seven lines of Python code, including imports and a blank line:
</p>
<p>
<pre><span style="color:blue;">import</span> geopandas <span style="color:blue;">as</span> gpd
<span style="color:blue;">import</span> matplotlib.pyplot <span style="color:blue;">as</span> plt

<span style="color:blue;">map</span> = gpd.read_file(<span style="color:#a31515;">'dk.json'</span>)
<span style="color:blue;">map</span>.explode().boundary.plot(edgecolor=<span style="color:#a31515;">'green'</span>).set_axis_off()
<span style="color:blue;">map</span>.explode().convex_hull.boundary.plot().set_axis_off()
plt.show()</pre>
</p>
<p>
In this script, I display the unmodified <code>map</code> before the convex hulls. This is only an artefact of my process. As I've already admitted, this is new ground for me, and I initially wanted to verify that I could even read in and display a GeoJSON file.
</p>
<p>
For both maps I use the <code>boundary</code> property to draw only the outline of the map, rather than filled polygons.
</p>
<h3 id="2b5f206417b542f386497d43d1bde322">
Enveloping the map parts <a href="#2b5f206417b542f386497d43d1bde322">#</a>
</h3>
<p>
Mostly for fun, but also to illustrate what a convex hull is, we can layer the two visualizations in a single image. In order to do that, a few changes to the code are required.
</p>
<p>
<pre><span style="color:blue;">import</span> geopandas <span style="color:blue;">as</span> gpd
<span style="color:blue;">import</span> matplotlib.pyplot <span style="color:blue;">as</span> plt
<span style="color:blue;">map</span> = gpd.read_file(<span style="color:#a31515;">'dk.json'</span>)
_, ax = plt.subplots()
<span style="color:blue;">map</span>.explode().boundary.plot(ax=ax, edgecolor=<span style="color:#a31515;">'green'</span>).set_axis_off()
<span style="color:blue;">map</span>.explode().convex_hull.boundary.plot(ax=ax).set_axis_off()
plt.show()</pre>
</p>
<p>
This little script now produces this image:
</p>
<p>
<img src="/content/binary/dk-outlines-in-hulls.svg" alt="Map of Denmark, with each island, as well as the Jutland peninsula, enveloped in their convex hulls.">
</p>
<p>
Those readers who know Danish geography may wonder what's going on with <a href="https://en.wikipedia.org/wiki/Falster">Falster</a>. Since it's the sixth-largest island in Denmark, shouldn't it have its own convex hull? Yes, it should, yet here it's connected to <a href="https://en.wikipedia.org/wiki/Zealand">Zealand</a>. Granted, two bridges connect the two, but that's hardly sufficient to consider them one island. There are plenty of bridges in Denmark, so according to that criterion, most of Denmark is connected. In fact, on the above map, only <a href="https://en.wikipedia.org/wiki/Bornholm">Bornholm</a>, <a href="https://en.wikipedia.org/wiki/Sams%C3%B8">Samsø</a>, <a href="https://en.wikipedia.org/wiki/L%C3%A6s%C3%B8">Læsø</a>, <a href="https://en.wikipedia.org/wiki/%C3%86r%C3%B8">Ærø</a>, <a href="https://en.wikipedia.org/wiki/Fan%C3%B8">Fanø</a>, and <a href="https://en.wikipedia.org/wiki/Anholt_(Denmark)">Anholt</a> would then remain as islands.
</p>
<p>
Rather, this only highlights the quality, or lack thereof, of the data set. I don't want to complain about a free resource, and the data set has served my purposes well enough. I mostly point this out in case readers were puzzled about this. In fact, a similar case applies to <a href="https://en.wikipedia.org/wiki/North_Jutlandic_Island">Nørrejyske Ø</a>, which in the GeoJSON map is connected to Jutland at <a href="https://en.wikipedia.org/wiki/Aalborg">Aalborg</a>. Yes, there's a bridge there. No, that shouldn't qualify as a land connection.
</p>
<h3 id="92812a33519c47d3a90b20a447ddb7dc">
Other countries <a href="#92812a33519c47d3a90b20a447ddb7dc">#</a>
</h3>
<p>
As you may have noticed, apart from the hard-coded file name, nothing in the code is specific to Denmark. This means that you can play around with other countries. Here I've downloaded various GeoJSON data sets from <a href="https://geojson-maps.kyd.au/">GeoJSON Maps of the globe</a>, which seems to be using the same source data set that the Danish data set is also based on. In other words, if I download the file for Denmark from that site, it looks exactly as above.
</p>
<p>
Can you guess which country this is?
</p>
<p>
<img src="/content/binary/gr-outline.svg" alt="Convex hull of the Greek mainland, and hulls of many Greek islands." title="Convex hull of the Greek mainland, and hulls of many Greek islands.">
</p>
<p>
Or this one?
</p>
<p>
<img src="/content/binary/jp-outline.svg" alt="Convex hull of each larger island of Japan." title="Convex hull of each larger island of Japan.">
</p>
<p>
While this is all good fun, not all countries have an interesting convex hull:
</p>
<p>
<img src="/content/binary/ch-outline.svg" alt="Convex hull of Switzerland." title="Convex hull of Switzerland.">
</p>
<p>
While I'll let you have a bit of fun guessing, you can hover your cursor over each image to reveal which country it is.
</p>
<h3 id="ad89b110fca040ebb37db7788ddcf6d6">
Conclusion <a href="#ad89b110fca040ebb37db7788ddcf6d6">#</a>
</h3>
<p>
Your default position when working with Python should probably be: <em>There's already a library for that.</em>
</p>
<p>
In this article, I've described how I wanted to show Denmark, but only the convex hull of each of the larger islands, as well as the Jutland peninsula. Of course, there was already a library for that, so that I only needed to write seven lines of code to produce the figures I wanted.
</p>
<p>
Granted, it took a few hours of research to put those seven lines together, but I'm only a novice Python programmer, and I'm sure an old hand could do it much faster.
</p>
</div><hr>
Modelling data relationships with C# types https://blog.ploeh.dk/2025/02/03/modelling-data-relationships-with-c-types 2025-02-03T07:24:00+00:00 Mark Seemann
<div id="post">
<p>
<em>A C# example implementation of Ghosts of Departed Proofs.</em>
</p>
<p>
This article continues where <a href="/2025/01/20/modelling-data-relationships-with-f-types">Modelling data relationships with F# types</a> left off. It ports the <a href="https://fsharp.org/">F#</a> example code to C#. If you don't read F# source code, you may instead want to read <a href="/2024/12/23/implementing-rod-cutting">Implementing rod-cutting</a> to get a sense of the problem being addressed.
</p>
<p>
I'm going to assume that you've read enough of the previous articles to get a sense of the example, but in short, this article examines if it's possible to use the type system to model data relationships. Specifically, we have methods that operate on a collection and a number. The precondition for calling these methods is that the number is a valid (one-based) index into the collection.
</p>
<p>
While you would typically implement such a precondition with a <a href="https://en.wikipedia.org/wiki/Guard_(computer_science)">Guard Clause</a> and communicate it via documentation, you can also use the <em>Ghosts of Departed Proofs</em> technique to instead leverage the type system. Please see <a href="/2025/01/20/modelling-data-relationships-with-f-types">the previous article for an overview</a>.
</p>
<p>
That said, I'll repeat one point here: The purpose of these articles is to showcase a technique, using a simple example to make it, I hope, sufficiently clear what's going on. All this machinery is hardly warranted for an example as simple as this. All of this is a demonstration, not a recommendation.
</p>
<h3 id="24f5e14404424695aca0a2c7e049b0a5">
Size proofs <a href="#24f5e14404424695aca0a2c7e049b0a5">#</a>
</h3>
<p>
As in the previous article, we may start by defining what a 'size proof' looks like. In C#, it may <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatically</a> be a class with an <code>internal</code> constructor.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">T</span>>
{
<span style="color:blue;">public</span> <span style="color:blue;">int</span> Value { <span style="color:blue;">get</span>; }
<span style="color:blue;">internal</span> <span style="color:#2b91af;">Size</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">value</span>)
{
Value = <span style="font-weight:bold;color:#1f377f;">value</span>;
}
<span style="color:green;">// Also override ToString, Equals, and GetHashCode...</span>
}</pre>
</p>
<p>
Since the constructor is <code>internal</code> it means that client code can't create <code><span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">T</span>></code> instances, and thereby client code can't decide a concrete type for the phantom type <code>T</code>.
</p>
<h3 id="0f22adf378da484b8dc85e372f82a0d6">
Issuing size proofs <a href="#0f22adf378da484b8dc85e372f82a0d6">#</a>
</h3>
<p>
How may client code create <code><span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">T</span>></code> objects? It may ask a <code><span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">T</span>></code> object to issue a proof:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">T</span>>
{
<span style="color:blue;">public</span> <span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:blue;">int</span>> Prices { <span style="color:blue;">get</span>; }
<span style="color:blue;">internal</span> <span style="color:#2b91af;">PriceList</span>(<span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:blue;">int</span>> <span style="font-weight:bold;color:#1f377f;">prices</span>)
{
Prices = <span style="font-weight:bold;color:#1f377f;">prices</span>;
}
<span style="color:blue;">public</span> <span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">T</span>>? <span style="font-weight:bold;color:#74531f;">TryCreateSize</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">candidate</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (0 < <span style="font-weight:bold;color:#1f377f;">candidate</span> && <span style="font-weight:bold;color:#1f377f;">candidate</span> <= Prices.Count)
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">T</span>>(<span style="font-weight:bold;color:#1f377f;">candidate</span>);
<span style="font-weight:bold;color:#8f08c4;">else</span>
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
}
<span style="color:green;">// More members go here...</span></pre>
</p>
<p>
If the requested <code>candidate</code> integer represents a valid (one-indexed) position in the <code><span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">T</span>></code> object, the return value is a <code><span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">T</span>></code> object that contains the <code>candidate</code>. If, on the other hand, the <code>candidate</code> isn't in the valid range, no object is returned.
</p>
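<p>
If you want to play with the range check in isolation, here's a small sketch in Python. It only mirrors the one-based index validation. Python has no <code>internal</code> constructors, and nothing here reproduces the phantom type <code>T</code>, so the sketch offers none of the compile-time guarantees. The names <code>Size</code> and <code>try_create_size</code> are stand-ins chosen for this example.
</p>

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Size:
    """An immutable wrapper around a validated, one-based index."""
    value: int


def try_create_size(prices: Sequence[int], candidate: int) -> Optional[Size]:
    """Return a Size if candidate is a valid one-based index, else None."""
    if 0 < candidate <= len(prices):
        return Size(candidate)
    return None


prices = [1, 5, 8, 9]                # hypothetical price list
print(try_create_size(prices, 4))    # Size(value=4)
print(try_create_size(prices, 0))    # None
print(try_create_size(prices, 5))    # None
```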
<p>
Since both <code><span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">T</span>></code> and <code><span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">T</span>></code> classes are immutable, once a 'size proof' has been issued, it remains valid. As I've previously argued, <a href="/2024/06/12/simpler-encapsulation-with-immutability">immutability makes encapsulation simpler</a>.
</p>
<p>
This kind of API does, however, look like it's <a href="https://en.wikipedia.org/wiki/Turtles_all_the_way_down">turtles all the way down</a>. After all, the <code><span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">T</span>></code> constructor is also <code>internal</code>. Now the question becomes: How does client code create <code><span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">T</span>></code> objects?
</p>
<p>
The short answer is that it doesn't. Instead, it'll be given an object by the library API. You'll see how that works later, but first, let's review what such an API enables us to express.
</p>
<h3 id="079355423c6d4a7d8daaeffc08661adb">
Proof-based Cut API <a href="#079355423c6d4a7d8daaeffc08661adb">#</a>
</h3>
<p>
As described in <a href="/2025/01/06/encapsulating-rod-cutting">Encapsulating rod-cutting</a>, returning a collection of 'cut' objects better communicates postconditions than returning a tuple of two arrays, as <a href="/2024/12/23/implementing-rod-cutting">the original algorithm suggested</a>. In other words, we're going to need a type for that.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">record</span> <span style="color:#2b91af;">Cut</span><<span style="color:#2b91af;">T</span>>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">Revenue</span>, <span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">Size</span>);</pre>
</p>
<p>
In this case we can get by with a simple <a href="https://learn.microsoft.com/dotnet/csharp/language-reference/builtin-types/record">record type</a>. Since one of the properties is of the type <code><span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">T</span>></code>, client code can't create <code><span style="color:#2b91af;">Cut</span><<span style="color:#2b91af;">T</span>></code> instances, just like it can't create <code><span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">T</span>></code> or <code><span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">T</span>></code> objects. This is what we want, because a <code><span style="color:#2b91af;">Cut</span><<span style="color:#2b91af;">T</span>></code> object encapsulates a proof that it's valid, related to the original collection of prices.
</p>
<p>
We can now define the <code>Cut</code> method as an instance method on <code><span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">T</span>></code>. Notice how all the <code>T</code> type arguments line up. As input, the <code>Cut</code> method only accepts <code><span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">T</span>></code> proofs issued by a compatible price list. This is enforced at compile time, not at run time.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">Cut</span><<span style="color:#2b91af;">T</span>>> <span style="font-weight:bold;color:#74531f;">Cut</span>(<span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">n</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">p</span> = Prices.<span style="font-weight:bold;color:#74531f;">Prepend</span>(0).<span style="font-weight:bold;color:#74531f;">ToArray</span>();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">r</span> = <span style="color:blue;">new</span> <span style="color:blue;">int</span>[<span style="font-weight:bold;color:#1f377f;">n</span>.Value + 1];
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">s</span> = <span style="color:blue;">new</span> <span style="color:blue;">int</span>[<span style="font-weight:bold;color:#1f377f;">n</span>.Value + 1];
<span style="font-weight:bold;color:#1f377f;">r</span>[0] = 0;
<span style="font-weight:bold;color:#8f08c4;">for</span> (<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">j</span> = 1; <span style="font-weight:bold;color:#1f377f;">j</span> <= <span style="font-weight:bold;color:#1f377f;">n</span>.Value; <span style="font-weight:bold;color:#1f377f;">j</span>++)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">q</span> = <span style="color:blue;">int</span>.MinValue;
<span style="font-weight:bold;color:#8f08c4;">for</span> (<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">i</span> = 1; <span style="font-weight:bold;color:#1f377f;">i</span> <= <span style="font-weight:bold;color:#1f377f;">j</span>; <span style="font-weight:bold;color:#1f377f;">i</span>++)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">candidate</span> = <span style="font-weight:bold;color:#1f377f;">p</span>[<span style="font-weight:bold;color:#1f377f;">i</span>] + <span style="font-weight:bold;color:#1f377f;">r</span>[<span style="font-weight:bold;color:#1f377f;">j</span> - <span style="font-weight:bold;color:#1f377f;">i</span>];
<span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">q</span> < <span style="font-weight:bold;color:#1f377f;">candidate</span>)
{
<span style="font-weight:bold;color:#1f377f;">q</span> = <span style="font-weight:bold;color:#1f377f;">candidate</span>;
<span style="font-weight:bold;color:#1f377f;">s</span>[<span style="font-weight:bold;color:#1f377f;">j</span>] = <span style="font-weight:bold;color:#1f377f;">i</span>;
}
}
<span style="font-weight:bold;color:#1f377f;">r</span>[<span style="font-weight:bold;color:#1f377f;">j</span>] = <span style="font-weight:bold;color:#1f377f;">q</span>;
}
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">cuts</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">List</span><<span style="color:#2b91af;">Cut</span><<span style="color:#2b91af;">T</span>>>();
<span style="font-weight:bold;color:#8f08c4;">for</span> (<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">i</span> = 1; <span style="font-weight:bold;color:#1f377f;">i</span> <= <span style="font-weight:bold;color:#1f377f;">n</span>.Value; <span style="font-weight:bold;color:#1f377f;">i</span>++)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">revenue</span> = <span style="font-weight:bold;color:#1f377f;">r</span>[<span style="font-weight:bold;color:#1f377f;">i</span>];
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">size</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">T</span>>(<span style="font-weight:bold;color:#1f377f;">s</span>[<span style="font-weight:bold;color:#1f377f;">i</span>]);
<span style="font-weight:bold;color:#1f377f;">cuts</span>.<span style="font-weight:bold;color:#74531f;">Add</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">Cut</span><<span style="color:#2b91af;">T</span>>(<span style="font-weight:bold;color:#1f377f;">revenue</span>, <span style="font-weight:bold;color:#1f377f;">size</span>));
}
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">cuts</span>;
}</pre>
</p>
<p>
For good measure, I'm showing the entire implementation, but you only need to pay attention to the method signature. The point is that <code>n</code> is constrained <em>by the type system</em> to be in a valid range.
</p>
<h3 id="ce55a32904434bd99c7f42d896fa7102">
Proof-based Solve API <a href="#ce55a32904434bd99c7f42d896fa7102">#</a>
</h3>
<p>
The same technique can be applied to the <code>Solve</code> method. Just align the <code>T</code>s.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">T</span>>> <span style="font-weight:bold;color:#74531f;">Solve</span>(<span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">n</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">cuts</span> = <span style="font-weight:bold;color:#74531f;">Cut</span>(<span style="font-weight:bold;color:#1f377f;">n</span>).<span style="font-weight:bold;color:#74531f;">ToArray</span>();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sizes</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">List</span><<span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">T</span>>>();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">size</span> = <span style="font-weight:bold;color:#1f377f;">n</span>;
<span style="font-weight:bold;color:#8f08c4;">while</span> (<span style="font-weight:bold;color:#1f377f;">size</span>.Value > 0)
{
<span style="font-weight:bold;color:#1f377f;">sizes</span>.<span style="font-weight:bold;color:#74531f;">Add</span>(<span style="font-weight:bold;color:#1f377f;">cuts</span>[<span style="font-weight:bold;color:#1f377f;">size</span>.Value - 1].Size);
<span style="font-weight:bold;color:#1f377f;">size</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">T</span>>(<span style="font-weight:bold;color:#1f377f;">size</span>.Value - <span style="font-weight:bold;color:#1f377f;">cuts</span>[<span style="font-weight:bold;color:#1f377f;">size</span>.Value - 1].Size.Value);
}
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">sizes</span>;
}</pre>
</p>
<p>
This is another instance method on <code><span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">T</span>></code>, which is where <code>T</code> is defined.
</p>
<h3 id="57994590468c40fa87ba24f0d158720b">
Proof-based revenue API <a href="#57994590468c40fa87ba24f0d158720b">#</a>
</h3>
<p>
Finally, we may also implement a method to calculate the revenue from a given sequence of cuts.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">int</span> <span style="font-weight:bold;color:#74531f;">CalculateRevenue</span>(<span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">T</span>>> <span style="font-weight:bold;color:#1f377f;">cuts</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">arr</span> = Prices.<span style="font-weight:bold;color:#74531f;">ToArray</span>();
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">cuts</span>.<span style="font-weight:bold;color:#74531f;">Sum</span>(<span style="font-weight:bold;color:#1f377f;">c</span> => <span style="font-weight:bold;color:#1f377f;">arr</span>[<span style="font-weight:bold;color:#1f377f;">c</span>.Value - 1]);
}</pre>
</p>
<p>
Not surprisingly, I hope, <code>CalculateRevenue</code> is another instance method on <code><span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">T</span>></code>. The <code>cuts</code> will typically come from a call to <code>Solve</code>, but it's entirely possible for client code to create an ad-hoc collection of <code><span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">T</span>></code> objects by repeatedly calling <code>TryCreateSize</code>.
</p>
<h3 id="97eba480d25e4ed8a5f7b8b0efb3a528">
Running client code <a href="#97eba480d25e4ed8a5f7b8b0efb3a528">#</a>
</h3>
<p>
How does client code use this API? It calls an <code>Accept</code> method with an implementation of this interface:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IPriceListVisitor</span><<span style="color:#2b91af;">TResult</span>>
{
<span style="color:#2b91af;">TResult</span> <span style="font-weight:bold;color:#74531f;">Visit</span><<span style="color:#2b91af;">T</span>>(<span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">priceList</span>);
}</pre>
</p>
<p>
Why 'visitor'? This doesn't quite look like a <a href="https://en.wikipedia.org/wiki/Visitor_pattern">Visitor</a>, and yet, it still does.
</p>
<p>
Imagine, for a moment, that we could enumerate all types that <code>T</code> could inhabit.
</p>
<p>
<pre><span style="color:#2b91af;">TResult</span> <span style="font-weight:bold;color:#74531f;">Visit</span>(<span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">Type1</span>> <span style="font-weight:bold;color:#1f377f;">priceList</span>);
<span style="color:#2b91af;">TResult</span> <span style="font-weight:bold;color:#74531f;">Visit</span>(<span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">Type2</span>> <span style="font-weight:bold;color:#1f377f;">priceList</span>);
<span style="color:#2b91af;">TResult</span> <span style="font-weight:bold;color:#74531f;">Visit</span>(<span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">Type3</span>> <span style="font-weight:bold;color:#1f377f;">priceList</span>);
<span style="color:green;">// ⋮</span>
<span style="color:#2b91af;">TResult</span> <span style="font-weight:bold;color:#74531f;">Visit</span>(<span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">TypeN</span>> <span style="font-weight:bold;color:#1f377f;">priceList</span>);</pre>
</p>
<p>
Clearly we can't do that, since the set of types that <code>T</code> could be is unbounded, but <em>if</em> we could, the interface would look like a Visitor.
</p>
<p>
I find the situation sufficiently similar to name the interface with the <em>Visitor</em> suffix. Now we only need a class with an <code>Accept</code> method.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">RodCutter</span>(<span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:blue;">int</span>> <span style="font-weight:bold;color:#1f377f;">prices</span>)
{
<span style="color:blue;">public</span> <span style="color:#2b91af;">TResult</span> <span style="font-weight:bold;color:#74531f;">Accept</span><<span style="color:#2b91af;">TResult</span>>(<span style="color:#2b91af;">IPriceListVisitor</span><<span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">visitor</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">visitor</span>.<span style="font-weight:bold;color:#74531f;">Visit</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">PriceList</span><<span style="color:blue;">object</span>>(<span style="font-weight:bold;color:#1f377f;">prices</span>));
}
}</pre>
</p>
<p>
Client code may create a <code>RodCutter</code> object, as well as one or more classes that implement <code><span style="color:#2b91af;">IPriceListVisitor</span><<span style="color:#2b91af;">TResult</span>></code>, and in this way interact with the library API.
</p>
<p>
Let's see some examples. We'll start with the original <a href="/ref/clrs">CLRS</a> example, written as an <a href="https://xunit.net/">xUnit.net</a> test.
</p>
<p>
<pre>[<span style="color:#2b91af;">Fact</span>]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">ClrsExample</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">RodCutter</span>([1, 5, 8, 9, 10, 17, 17, 20, 24, 30]);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = <span style="font-weight:bold;color:#1f377f;">sut</span>.<span style="font-weight:bold;color:#74531f;">Accept</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">CutRodVisitor</span>(10));
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">expected</span> = <span style="color:blue;">new</span>[] {
( 1, 1),
( 5, 2),
( 8, 3),
(10, 2),
(13, 2),
(17, 6),
(18, 1),
(22, 2),
(25, 3),
(30, 10)
};
<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Equal</span>(<span style="font-weight:bold;color:#1f377f;">expected</span>, <span style="font-weight:bold;color:#1f377f;">actual</span>);
}</pre>
</p>
<p>
<code>CutRodVisitor</code> is a nested class that implements the <code><span style="color:#2b91af;">IPriceListVisitor</span><<span style="color:#2b91af;">TResult</span>></code> interface:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">CutRodVisitor</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">i</span>) :
<span style="color:#2b91af;">IPriceListVisitor</span><<span style="color:#2b91af;">IReadOnlyCollection</span><(<span style="color:blue;">int</span>, <span style="color:blue;">int</span>)>>
{
<span style="color:blue;">public</span> <span style="color:#2b91af;">IReadOnlyCollection</span><(<span style="color:blue;">int</span>, <span style="color:blue;">int</span>)> <span style="font-weight:bold;color:#74531f;">Visit</span><<span style="color:#2b91af;">T</span>>(<span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">priceList</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">n</span> = <span style="font-weight:bold;color:#1f377f;">priceList</span>.<span style="font-weight:bold;color:#74531f;">TryCreateSize</span>(<span style="font-weight:bold;color:#1f377f;">i</span>);
<span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">n</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="font-weight:bold;color:#8f08c4;">return</span> [];
<span style="font-weight:bold;color:#8f08c4;">else</span>
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">cuts</span> = <span style="font-weight:bold;color:#1f377f;">priceList</span>.<span style="font-weight:bold;color:#74531f;">Cut</span>(<span style="font-weight:bold;color:#1f377f;">n</span>);
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">cuts</span>.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">c</span> => (<span style="font-weight:bold;color:#1f377f;">c</span>.Revenue, <span style="font-weight:bold;color:#1f377f;">c</span>.Size.Value)).<span style="font-weight:bold;color:#74531f;">ToArray</span>();
}
}
}</pre>
</p>
<p>
The <code>CutRodVisitor</code> class returns a collection of tuples. Why doesn't it just return <code>cuts</code> directly?
</p>
<p>
It can't, because it wouldn't type-check. Think about it for a moment. When you implement the interface, you need to pick a type for <code>TResult</code>. You can't, however, declare it to implement <code><span style="color:#2b91af;">IPriceListVisitor</span><<span style="color:#2b91af;">Cut</span><<span style="color:#2b91af;">T</span>>></code> (where <code>T</code> would be the <code>T</code> from <code><span style="font-weight:bold;color:#74531f;">Visit</span><<span style="color:#2b91af;">T</span>></code>), because at the class level, you don't know what <code>T</code> is. Neither does the compiler.
</p>
<p>
Your <code><span style="font-weight:bold;color:#74531f;">Visit</span><<span style="color:#2b91af;">T</span>></code> implementation must work for <em>any</em> <code>T</code>.
</p>
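<p>
As a minimal sketch of that constraint, consider a Visitor that builds an ad-hoc collection of sizes and projects the outcome to <code>int?</code>, a result type that doesn't mention <code>T</code>. The types below are pared-down, hypothetical stand-ins for the ones shown in this article (the real <code>PriceList&lt;T&gt;</code> has more members), and <code>AdHocRevenueVisitor</code> is a name invented for the illustration. The <code>internal</code> constructors only protect the invariant if client code lives in another assembly, which this single-file sketch doesn't demonstrate.
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Linq;

// Client code. TResult is int?, which makes no reference to T.
var cutter = new RodCutter(new[] { 1, 5, 8, 9 });
Console.WriteLine(cutter.Accept(new AdHocRevenueVisitor(new[] { 2, 2 })));         // 10
Console.WriteLine(cutter.Accept(new AdHocRevenueVisitor(new[] { 2, 5 })) is null); // True

// Hypothetical stand-in: a size that only a PriceList<T> is supposed to issue.
public sealed class Size<T>
{
    internal Size(int value) { Value = value; }
    public int Value { get; }
}

// Hypothetical stand-in with just enough members for this example.
public sealed class PriceList<T>
{
    private readonly IReadOnlyList<int> prices;
    internal PriceList(IReadOnlyList<int> prices) { this.prices = prices; }

    // Returns null when the candidate is outside the range 1..prices.Count.
    public Size<T>? TryCreateSize(int candidate)
    {
        return 0 < candidate && candidate <= prices.Count
            ? new Size<T>(candidate)
            : null;
    }

    // No range check here; it relies on sizes coming from TryCreateSize.
    public int CalculateRevenue(IReadOnlyCollection<Size<T>> cuts)
    {
        return cuts.Sum(c => prices[c.Value - 1]);
    }
}

public interface IPriceListVisitor<TResult>
{
    TResult Visit<T>(PriceList<T> priceList);
}

public sealed class RodCutter
{
    private readonly IReadOnlyList<int> prices;
    public RodCutter(IReadOnlyList<int> prices) { this.prices = prices; }

    public TResult Accept<TResult>(IPriceListVisitor<TResult> visitor)
    {
        return visitor.Visit(new PriceList<object>(prices));
    }
}

// Returns the revenue of the requested cuts, or null if any length is invalid.
public sealed class AdHocRevenueVisitor : IPriceListVisitor<int?>
{
    private readonly IReadOnlyCollection<int> lengths;
    public AdHocRevenueVisitor(IReadOnlyCollection<int> lengths) { this.lengths = lengths; }

    public int? Visit<T>(PriceList<T> priceList)
    {
        // Size<T> values may be used freely inside Visit<T>...
        var cuts = new List<Size<T>>();
        foreach (var length in lengths)
        {
            var size = priceList.TryCreateSize(length);
            if (size is null)
                return null;
            cuts.Add(size);
        }
        // ...but only a T-free value can leave the method.
        return priceList.CalculateRevenue(cuts);
    }
}</pre>
</p>
<p>
Inside <code>Visit&lt;T&gt;</code> the sizes and the price list share the same <code>T</code>, so they line up. The moment the result leaves the method, it has to be expressed in a type that the class could name up front, here a nullable integer.
</p>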
<h3 id="39d9ed0b81044e5a8f7ac990cd457fad">
Preventing misalignment <a href="#39d9ed0b81044e5a8f7ac990cd457fad">#</a>
</h3>
<p>
Finally, here's a demonstration of how the phantom type prevents confusing or mixing up two (or more) different price lists. Consider this rather artificial example:
</p>
<p>
<pre>[<span style="color:#2b91af;">Fact</span>]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">NestTwoSolutions</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">RodCutter</span>([1, 2, 2]);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">inner</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">RodCutter</span>([1]);
(<span style="color:blue;">int</span>, <span style="color:blue;">int</span>)? <span style="font-weight:bold;color:#1f377f;">actual</span> = <span style="font-weight:bold;color:#1f377f;">sut</span>.<span style="font-weight:bold;color:#74531f;">Accept</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">NestedRevenueVisitor</span>(<span style="font-weight:bold;color:#1f377f;">inner</span>));
<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Equal</span>((3, 1), <span style="font-weight:bold;color:#1f377f;">actual</span>);
}</pre>
</p>
<p>
This unit test creates two price arrays and calls <code>Accept</code> on one of them (the 'outer' one, you may say), while passing the <code>inner</code> one to the Visitor, which at first glance just looks like this:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">NestedRevenueVisitor</span>(<span style="color:#2b91af;">RodCutter</span> <span style="font-weight:bold;color:#1f377f;">inner</span>) :
<span style="color:#2b91af;">IPriceListVisitor</span><(<span style="color:blue;">int</span>, <span style="color:blue;">int</span>)?>
{
<span style="color:blue;">public</span> (<span style="color:blue;">int</span>, <span style="color:blue;">int</span>)? <span style="font-weight:bold;color:#74531f;">Visit</span><<span style="color:#2b91af;">T</span>>(<span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">priceList</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">inner</span>.<span style="font-weight:bold;color:#74531f;">Accept</span>(<span style="color:blue;">new</span> InnerRevenueVisitor<<span style="color:#2b91af;">T</span>>(<span style="font-weight:bold;color:#1f377f;">priceList</span>));
}
<span style="color:green;">// Inner visitor goes here...</span>
}</pre>
</p>
<p>
Notice that it only delegates to yet another Visitor, passing the 'outer' <code>priceList</code> as a constructor parameter to the next Visitor. The purpose of this is to bring two <code><span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">T</span>></code> objects in scope at the same time. This will enable us to examine what happens if we make a programming mistake.
</p>
<p>
First, however, here's the proper, working implementation <em>without</em> mistakes:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">InnerRevenueVisitor</span><<span style="color:#2b91af;">T</span>>(<span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">priceList1</span>) : <span style="color:#2b91af;">IPriceListVisitor</span><(<span style="color:blue;">int</span>, <span style="color:blue;">int</span>)?>
{
<span style="color:blue;">public</span> (<span style="color:blue;">int</span>, <span style="color:blue;">int</span>)? <span style="font-weight:bold;color:#74531f;">Visit</span><<span style="color:#2b91af;">T1</span>>(<span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">T1</span>> <span style="font-weight:bold;color:#1f377f;">priceList2</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">n1</span> = <span style="font-weight:bold;color:#1f377f;">priceList1</span>.<span style="font-weight:bold;color:#74531f;">TryCreateSize</span>(3);
<span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">n1</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
<span style="font-weight:bold;color:#8f08c4;">else</span>
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">cuts1</span> = <span style="font-weight:bold;color:#1f377f;">priceList1</span>.<span style="font-weight:bold;color:#74531f;">Solve</span>(<span style="font-weight:bold;color:#1f377f;">n1</span>);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">revenue1</span> = <span style="font-weight:bold;color:#1f377f;">priceList1</span>.<span style="font-weight:bold;color:#74531f;">CalculateRevenue</span>(<span style="font-weight:bold;color:#1f377f;">cuts1</span>);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">n2</span> = <span style="font-weight:bold;color:#1f377f;">priceList2</span>.<span style="font-weight:bold;color:#74531f;">TryCreateSize</span>(1);
<span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">n2</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
<span style="font-weight:bold;color:#8f08c4;">else</span>
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">cuts2</span> = <span style="font-weight:bold;color:#1f377f;">priceList2</span>.<span style="font-weight:bold;color:#74531f;">Solve</span>(<span style="font-weight:bold;color:#1f377f;">n2</span>);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">revenue2</span> = <span style="font-weight:bold;color:#1f377f;">priceList2</span>.<span style="font-weight:bold;color:#74531f;">CalculateRevenue</span>(<span style="font-weight:bold;color:#1f377f;">cuts2</span>);
<span style="font-weight:bold;color:#8f08c4;">return</span> (<span style="font-weight:bold;color:#1f377f;">revenue1</span>, <span style="font-weight:bold;color:#1f377f;">revenue2</span>);
}
}
}
}</pre>
</p>
<p>
Notice how <code>priceList1</code> and <code>priceList2</code> are now both in scope. So far, they're <em>not</em> mixed up, so the <code>Visit</code> implementation queries first one and then another for the optimal revenue. If all works well (which it does), it returns a tuple with the two revenues.
</p>
<p>
What happens if I make a mistake? What if, for example, I write <code>priceList2.Solve(n1)</code>? It shouldn't be possible to use <code>n1</code>, which was issued by <code>priceList1</code>, with <code>priceList2</code>. And indeed this isn't possible. With that mistake, the code doesn't compile. The compiler error is:
</p>
<blockquote>
<p>
Argument 1: cannot convert from 'Ploeh.Samples.RodCutting.Size<T>' to 'Ploeh.Samples.RodCutting.Size<T1>'
</p>
</blockquote>
<p>
When you look at the types, that makes sense. After all, there's no guarantee that <code>T</code> is equal to <code>T1</code>.
</p>
<p>
You'll run into similar problems if you mix up the two 'contexts' in other ways. The code doesn't compile. Which is what you want.
</p>
<h3 id="1075e4f3ff6548cf8806ae9f419910b1">
Conclusion <a href="#1075e4f3ff6548cf8806ae9f419910b1">#</a>
</h3>
<p>
This article demonstrates how to use the <em>Ghosts of Departed Proofs</em> technique in C#. In some ways, I find that it comes across as more idiomatic in C# than in F#. I think this is because rank-2 polymorphism is only possible in F# when using its object-oriented features. Since F# is a functional-first programming language, it seems a little out of place there, whereas it looks more at home in C#.
</p>
<p>
Perhaps I should have designed the F# code to make use of objects to the same degree as I've done here in C#.
</p>
<p>
I think I actually like how the C# API turned out, although having to define and implement a class every time you need to supply a Visitor may feel a bit cumbersome. Even so, <a href="/2024/05/13/gratification">developer experience shouldn't be exclusively about saving a few keystrokes</a>. After all, <a href="/2018/09/17/typing-is-not-a-programming-bottleneck">typing isn't a bottleneck</a>.
</p>
</div><hr>
Dependency inversion without inversion of control https://blog.ploeh.dk/2025/01/27/dependency-inversion-without-inversion-of-control 2025-01-27T13:02:00+00:00 Mark Seemann
<div id="post">
<p>
<em>Here, have a sandwich.</em>
</p>
<p>
For years I've been thinking about the <a href="https://en.wikipedia.org/wiki/Dependency_inversion_principle">Dependency Inversion Principle</a> (DIP) and <a href="https://en.wikipedia.org/wiki/Inversion_of_control">Inversion of Control</a> (IoC) as two different things. While there's some overlap, they're not the same. To make matters more confusing, most people seem to consider IoC and Dependency Injection (DI) as interchangeable synonyms. As <a href="https://blogs.cuttingedge.it/">Steven van Deursen</a> and I explain in <a href="/dippp">DIPPP</a>, they're not the same.
</p>
<p>
I recently found myself in a discussion on Stack Overflow where I was <a href="https://stackoverflow.com/a/78796558/126014">trying to untangle that confusion</a> for a fellow Stack Overflow user. While I hadn't included a pedagogical Venn diagram, perhaps I should have.
</p>
<p>
<img src="/content/binary/dip-ioc-venn.png" alt="Venn diagram with DIP to the left and IoC to the right. The intersection is substantial, but not overwhelming.">
</p>
<p>
This figure suggests that the sets are of equal size, which doesn't have to be the case. The point, rather, is that while the intersection may be substantial, each <a href="https://en.wikipedia.org/wiki/Complement_(set_theory)">relative complement</a> is not only not empty, but richly populated.
</p>
<p>
In this article, I'm not going to spend more time on the complement IoC without DIP. Rather, I'll expand on how to apply the DIP without IoC.
</p>
<h3 id="a51dd63df2474002b81bee601c86eb8d">
Appeal to authority? <a href="#a51dd63df2474002b81bee601c86eb8d">#</a>
</h3>
<p>
While writing the Stack Overflow answer, I'd tried to keep citations to 'original sources'. Sometimes, when a problem is technically concrete, it makes sense for me to link to one of my own works, but I've found that when the discussion is more abstract, that rarely helps convince people. That's understandable. I'd also be sceptical if I were to run into some rando that immediately proceeded to argue a case by linking to his or her own blog.
</p>
<p>
This strategy, however, elicited this response:
</p>
<blockquote>
<p>
"Are you aware of any DIP-compliant example from Robert Martin that does not utilize polymorphism? The <a href="https://web.archive.org/web/20110714224327/http://www.objectmentor.com/resources/articles/dip.pdf">original paper</a> along with some of Martin's <a href="https://www.youtube.com/watch?v=1XRTvj__ZPY">lectures</a> certainly seem to imply the DIP requires polymorphism."
</p>
<footer><cite><a href="https://stackoverflow.com/questions/78796242/does-the-dependency-inversion-principle-apply-within-application-layers/78796558#comment138931008_78796558">comment</a>, jaco0646</cite></footer>
</blockquote>
<p>
That's a fair question, and once I started looking for such examples, I had to admit that I couldn't find any. Eventually, I asked <a href="https://en.wikipedia.org/wiki/Robert_C._Martin">Robert C. Martin</a> directly.
</p>
<blockquote>
<p>
"Does the DIP require polymorphism? I argue that it doesn't, but I've managed to entangle myself in a debate where original sources count. Could you help us out?"
</p>
<footer><cite><a href="https://x.com/ploeh/status/1817141831500542202">Tweet</a>, me</cite></footer>
</blockquote>
<p>
To which he answered in much detail, but of which the essential response was:
</p>
<blockquote>
<p>
"The DIP does not require polymorphism. Polymorphism is just one of several mechanisms to achieve dependency inversion."
</p>
<footer><cite><a href="https://x.com/unclebobmartin/status/1817263979774816379">Tweet</a>, Robert C. Martin</cite></footer>
</blockquote>
<p>
While this was the answer I'd hoped for, it's easy to dismiss this exchange as an <a href="https://en.wikipedia.org/wiki/Argument_from_authority">appeal to authority</a>. On the other hand, as Carl Sagan said, "If you wish to make an apple pie from scratch, you must first invent the universe," which obviously isn't practical, and so we instead <a href="https://en.wikipedia.org/wiki/Standing_on_the_shoulders_of_giants">stand on the shoulders of giants</a>.
</p>
<p>
In this context, asking Robert C. Martin was relevant because he's the original author of works that introduce the DIP. It's reasonable to assume that he has relevant insights on the topic.
</p>
<p>
It's not that I can't argue my case independently, but rather that I didn't think that the comments section of a Stack Overflow question was the right place to do that. This blog, on the other hand, is mine, I can use all the words I'd like, and I'll now proceed to do so.
</p>
<h3 id="bfe353d609ea45e48ffb0efe939f4c01">
Kernel of the idea <a href="#bfe353d609ea45e48ffb0efe939f4c01">#</a>
</h3>
<p>
All of Robert C. Martin's treatments of the DIP that I've found start with the general idea and then proceed to show examples of implementing it in code. As I've already mentioned, I haven't found a text of Martin's where the <em>example</em> doesn't utilize IoC.
</p>
<p>
The central idea, however, says nothing about IoC.
</p>
<blockquote>
<p>
"A. High-level modules should not depend on low-level modules. Both should depend on abstractions.
</p>
<p>
"B. Abstractions should not depend on details. Details should depend upon abstractions."
</p>
<footer><cite><a href="/ref/appp">APPP</a>, Robert C. Martin</cite></footer>
</blockquote>
<p>
While only Martin knows what he actually meant, I can attempt a congenial reading of the work. What is most important here, I think, is that the word <em>abstraction</em> doesn't have to denote a particular kind of language construct, such as an abstract class or interface. Rather,
</p>
<blockquote>
<p>
"Abstraction is <em>the elimination of the irrelevant and the amplification of the essential.</em>"
</p>
<footer><cite><a href="/ref/doocautbm">Designing Object-Oriented C++ Applications Using The Booch Method</a>, ch. 00, Robert C. Martin, his emphasis</cite></footer>
</blockquote>
<p>
The same connotation of <em>abstraction</em> seems to apply to the definition of the DIP. If, for example, we consider a Domain Model, the business logic, as the essence we'd like to amplify, we may rightfully consider a particular persistence mechanism a detail. Even more concretely, if you want to take restaurant reservations via a <a href="https://en.wikipedia.org/wiki/REST">REST</a> API, the <a href="/2020/01/27/the-maitre-d-kata">business rules that determine whether or not you can accept a reservation</a> shouldn't depend on a particular database technology.
</p>
<p>
While code examples are useful, there's evidently a risk that if the examples are too much alike, it may constrain readers' thinking. All Martin's examples seem to involve IoC, but for years now, I've mostly been interested in the Dependency Inversion <em>Principle</em> itself. Abstractions should not depend on details. That's the kernel of the idea.
</p>
<h3 id="cd07b6543fda4612bb0cce38c098cfbc">
IoC isn't functional <a href="#cd07b6543fda4612bb0cce38c098cfbc">#</a>
</h3>
<p>
My thinking was probably helped along by exploring functional programming (FP). A natural question arises when one embarks on learning FP: How does IoC fit with FP? The short answer, it turns out, is that <a href="/2017/01/27/from-dependency-injection-to-dependency-rejection">it doesn't</a>. DI, at least, <a href="/2017/01/30/partial-application-is-dependency-injection">makes everything impure</a>.
</p>
<p>
Does this mean, then, that FP precludes the DIP? That would be a problem, since the notion that abstractions shouldn't depend on details seems important. Doing FP shouldn't entail giving up on important architectural rules. Fortunately, that turns out not to be the case. Quite the contrary, a consistent application of <a href="/2018/11/19/functional-architecture-a-definition">functional architecture</a> seems to <a href="/2016/03/18/functional-architecture-is-ports-and-adapters">lead to Ports and Adapters</a>. It'd go against the grain of FP to have a Domain Model query a relational database. Even if abstracted away, a database exists outside the process space of an application, and is inherently impure. IoC doesn't address that concern.
</p>
<p>
In FP, there are other ways to address such problems.
</p>
<h3 id="53b158bd928e4883992c578be989baf5">
DIP sandwich <a href="#53b158bd928e4883992c578be989baf5">#</a>
</h3>
<p>
While you can always model <a href="/2017/07/10/pure-interactions">pure interactions</a> with free <a href="/2022/03/28/monads">monads</a>, it's usually not necessary. In most cases, an <a href="/2020/03/02/impureim-sandwich">Impureim Sandwich</a> suffices.
</p>
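<p>
To make the shape concrete, here's a minimal sketch of such a sandwich. It is not the code from the book: <code>Reservation</code>, <code>MaitreD</code>, the capacity rule, and the hard-coded <code>ReadReservations</code> function are all simplified assumptions made for illustration. The point is only the layering: impure code gathers data, a pure function decides, and impure code acts on the decision.
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Linq;

// Impure: gather data. A real application would query a database here.
var existing = ReadReservations();
var candidate = new Reservation(new DateTime(2025, 1, 27, 19, 0, 0), 4);

// Pure: decide. No interface, no injected dependency, no IoC.
var accepted = MaitreD.WillAccept(10, existing, candidate);

// Impure: act on the decision. A real application would save and respond.
Console.WriteLine(accepted ? "Accepted" : "Rejected"); // Accepted

// Hypothetical stand-in for a database read.
static IReadOnlyCollection<Reservation> ReadReservations()
{
    return new[]
    {
        new Reservation(new DateTime(2025, 1, 27, 18, 0, 0), 4),
        new Reservation(new DateTime(2025, 1, 27, 20, 0, 0), 2),
        new Reservation(new DateTime(2025, 1, 28, 18, 0, 0), 6)
    };
}

public sealed record Reservation(DateTime At, int Quantity);

public static class MaitreD
{
    // Simplified rule: total seats reserved on the candidate's date,
    // plus the candidate's party, must not exceed the capacity.
    public static bool WillAccept(
        int capacity,
        IReadOnlyCollection<Reservation> existing,
        Reservation candidate)
    {
        var reserved = existing
            .Where(r => r.At.Date == candidate.At.Date)
            .Sum(r => r.Quantity);
        return reserved + candidate.Quantity <= capacity;
    }
}</pre>
</p>
<p>
In this sketch, the decision logic depends only on data structures. The details, such as where the reservations come from, sit at the edges and depend on the Domain Model's types, not the other way around.
</p>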
<p>
The sample code base that accompanies <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a> takes a similar approach. While it's <a href="/2024/12/16/a-restaurant-sandwich">possible to refactor it to an explicit Impureim Sandwich</a>, the code presented in the book follows the kindred notion of <a href="https://www.destroyallsoftware.com/screencasts/catalog/functional-core-imperative-shell">Functional Core, Imperative Shell</a>.
</p>
<p>
The code base implements an online restaurant reservation system, and the Domain Model is a set of data structures and pure functions that operate on them. The central and most complex function is the <code>WillAccept</code> method <a href="/2020/11/30/name-by-role">shown here</a>. It decides whether to accept a reservation request, based on restaurant table configurations, existing reservations, business rules related to seating durations, etc. It does this without depending on details. It doesn't know about databases, the application's configuration system, or how to send emails in case it decides to accept a reservation.
</p>
<p>
All of this is handled by the application's HTTP Model, using the demarcation shown in <a href="/2023/09/04/decomposing-ctfiyhs-sample-code-base">Decomposing CTFiYH's sample code base</a>. The HTTP Model defines Controllers, <a href="https://en.wikipedia.org/wiki/Data_transfer_object">Data Transfer Objects</a> (DTOs), middleware, and other building blocks required to drive the actual REST API.
</p>
<p>
The <code>ReservationsController</code> class contains, among many other methods, this helper method that illustrates the point:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">async</span> Task<ActionResult> <span style="font-weight:bold;color:#74531f;">TryCreate</span>(Restaurant <span style="font-weight:bold;color:#1f377f;">restaurant</span>, Reservation <span style="font-weight:bold;color:#1f377f;">reservation</span>)
{
    <span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">scope</span> = <span style="color:blue;">new</span> TransactionScope(TransactionScopeAsyncFlowOption.Enabled);
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">reservations</span> = <span style="color:blue;">await</span> Repository.ReadReservations(restaurant.Id, reservation.At);
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">now</span> = Clock.GetCurrentDateTime();
    <span style="font-weight:bold;color:#8f08c4;">if</span> (!restaurant.MaitreD.WillAccept(now, reservations, reservation))
        <span style="font-weight:bold;color:#8f08c4;">return</span> NoTables500InternalServerError();
    <span style="color:blue;">await</span> Repository.Create(restaurant.Id, reservation);
    scope.Complete();
    <span style="font-weight:bold;color:#8f08c4;">return</span> Reservation201Created(restaurant.Id, reservation);
}</pre>
</p>
<p>
Notice the call to <code>restaurant.MaitreD.WillAccept</code>. The Controller gathers all data required to call the <a href="https://en.wikipedia.org/wiki/Pure_function">pure function</a> and subsequently acts on the return value. This keeps the abstraction (<code>MaitreD</code>) free of implementation details.
</p>
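<p>
The same read-decide-write shape can be sketched in F#. This is only an illustration under stated assumptions: <code>Reservation</code>, <code>willAccept</code>, and <code>tryCreate</code> are hypothetical, heavily simplified stand-ins, not the code from the book, which is written in C#. The impure reads and writes are passed in as plain functions, while the decision itself stays pure.
</p>
<p>
<pre>open System

// Hypothetical, simplified reservation type.
type Reservation = { At : DateTime; Quantity : int }

// Pure: the decision depends only on its arguments.
let willAccept capacity (existing : Reservation list) (candidate : Reservation) =
    let reserved = existing |> List.sumBy (fun r -> r.Quantity)
    reserved + candidate.Quantity <= capacity

// Impure shell: gather data, call the pure function, act on the result.
let tryCreate
    (readReservations : DateTime -> Reservation list)
    (createReservation : Reservation -> unit)
    capacity
    (candidate : Reservation) =
    let existing = readReservations candidate.At   // impure read
    if willAccept capacity existing candidate      // pure decision
    then
        createReservation candidate                // impure write
        true
    else false</pre>
</p>
<p>
Notice that <code>willAccept</code> knows nothing about where <code>existing</code> comes from, or what happens after it returns.
</p>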
<h3 id="cb60ef67d81e4754b959ff93d213ed80">
DI addressing another concern <a href="#cb60ef67d81e4754b959ff93d213ed80">#</a>
</h3>
<p>
You may be wondering what exactly <code>Repository</code> is. If you've bought the book, you also have access to the sample code base, in which case you'd be able to look it up. It turns out that it's an injected dependency. While this may seem a bit contradictory, it also gives me the opportunity to discuss that this isn't an all-or-nothing proposition.
</p>
<p>
Consider the architecture diagram from <a href="/2023/09/04/decomposing-ctfiyhs-sample-code-base">Decomposing CTFiYH's sample code base</a>, repeated here for convenience:
</p>
<p>
<img src="/content/binary/ctfiyh-decomposed-architecture.png" alt="Ports-and-adapters architecture diagram.">
</p>
<p>
In the context of this diagram, the DIP is being applied in two different ways. From the outer Web Host to the HTTP Model, the decomposed code base uses ordinary DI. From the HTTP Model to the Domain Model, there's no inversion of control, but rather the important essence of the DIP: That the Domain Model doesn't depend on any of the details that surround it. Even so, the dependencies remain inverted, as indicated by the arrows.
</p>
<p>
What little DI that's left remains to support automated testing. Injecting <code>Repository</code> and a few other <a href="https://stackoverflow.blog/2022/01/03/favor-real-dependencies-for-unit-testing/">real dependencies</a> enabled me to test-drive the externally visible behaviour of the system with <a href="/2019/02/18/from-interaction-based-to-state-based-testing">state-based</a> <a href="/2021/01/25/self-hosted-integration-tests-in-aspnet">self-hosted tests</a>.
</p>
<p>
If I hadn't cared about that, I could have hard-coded the <code>SqlReservationsRepository</code> object directly into the Controller and merged the Web Host with the HTTP Model. The Web Host is quite minimal anyway. This would, of course, have meant that the DIP no longer applied at that level, but even so, the interaction between the HTTP Model and the Domain Model would still follow the principle.
</p>
<p>
One important point about the above figure is that it's not to scale. The Web Host is in reality just six small classes, and the SQL and SMTP libraries each only contain a single class.
</p>
<h3 id="4f8fe34378a34bf4b3a6ebe9188c8d9b">
Conclusion <a href="#4f8fe34378a34bf4b3a6ebe9188c8d9b">#</a>
</h3>
<p>
Despite the similar names, the Dependency Inversion Principle isn't equivalent to Inversion of Control or Dependency Injection. There's a sizeable overlap between them, but the DIP doesn't <em>require</em> IoC.
</p>
<p>
I often use the Functional Core, Imperative Shell architecture, or the Impureim Sandwich pattern to invert the dependencies without inverting control. This keeps most of my code more functional, which also means that it <a href="/2021/07/28/referential-transparency-fits-in-your-head">fits better in my head</a> and is <a href="/2015/05/07/functional-design-is-intrinsically-testable">intrinsically testable</a>.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Modelling data relationships with F# types
https://blog.ploeh.dk/2025/01/20/modelling-data-relationships-with-f-types
2025-01-20T07:24:00+00:00
Mark Seemann
<div id="post">
<p>
<em>An F# example implementation of Ghosts of Departed Proofs.</em>
</p>
<p>
In a previous article, <a href="/2025/01/06/encapsulating-rod-cutting">Encapsulating rod-cutting</a>, I used a code example to discuss how to communicate an API's contract to client developers; that is, users of the API. In the article, I wrote
</p>
<blockquote>
<p>
"All this said, however, it's also possible that I'm missing an obvious design alternative. If you can think of a way to model this relationship in a non-<a href="https://www.hillelwayne.com/post/constructive/">predicative</a> way, please <a href="https://github.com/ploeh/ploeh.github.com?tab=readme-ov-file#comments">write a comment</a>."
</p>
</blockquote>
<p>
And indeed, a reader helpfully offered an alternative:
</p>
<blockquote>
<p>
"Regarding the relation between the array and the index, you will find the paper called "Ghosts of departed proofs" interesting. Maybe an overkill in this case, maybe not, but a very interesting and useful technique in general."
</p>
<footer><cite><a href="https://x.com/Savlambda/status/1876227452886012014">borar</a></cite></footer>
</blockquote>
<p>
I wouldn't call it 'an <em>obvious</em> design alternative', but nonetheless find it interesting. In this article, I'll pick up the code from <a href="/2025/01/06/encapsulating-rod-cutting">Encapsulating rod-cutting</a> and show how the 'Ghosts of Departed Proofs' (GDP) technique may be applied.
</p>
<h3 id="c43d4e8c8e414b46bf65817e7b00e85e">
Problem review <a href="#c43d4e8c8e414b46bf65817e7b00e85e">#</a>
</h3>
<p>
Before we start with the GDP technique, a brief review of the problem is in order. For the complete overview, you should read the <a href="/2025/01/06/encapsulating-rod-cutting">Encapsulating rod-cutting</a> article. In the present article, however, we'll focus on one particular problem related to <a href="/2022/10/24/encapsulation-in-functional-programming">encapsulation</a>:
</p>
<p>
Ideally, the <code>cut</code> function should take two input arguments. The first argument, <code>p</code>, is an array or list of prices. The second argument, <code>n</code>, is the size of a rod to cut optimally. One precondition states that <code>n</code> must be less than or equal to the length of <code>p</code>. This is because the algorithm needs to look up the price of a rod of size <code>n</code>, and it can't do that if <code>n</code> is greater than the length of <code>p</code>. The implied relationship is that <code>p</code> is indexed by rod size, so that if you want to find the price of a rod of size <code>n</code>, you look at the nth element in <code>p</code>.
</p>
<p>
How may we model such a relationship in a way that protects the precondition?
</p>
<p>
An obvious choice, particularly in object-oriented design, is to use a <a href="https://en.wikipedia.org/wiki/Guard_(computer_science)">Guard Clause</a>. In the <a href="https://fsharp.org/">F#</a> code base, it might look like this:
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:#74531f;">cut</span> (<span style="font-weight:bold;color:#1f377f;">p</span> : _ <span style="color:#2b91af;">array</span>) <span style="font-weight:bold;color:#1f377f;">n</span> =
    <span style="color:blue;">if</span> <span style="font-weight:bold;color:#1f377f;">p</span>.Length <= <span style="font-weight:bold;color:#1f377f;">n</span>
    <span style="color:blue;">then</span> <span style="color:blue;">raise</span> (<span style="color:#2b91af;">ArgumentOutOfRangeException</span> <span style="color:#a31515;">"n must be less than the length of p"</span>)
    <span style="color:green;">// The rest of the function body...</span></pre>
</p>
<p>
You might argue that in F# and other functional programming languages, throwing exceptions isn't <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a>. Instead, you ought to return <code>Result</code> or <code>Option</code> values, here the latter:
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:#74531f;">cut</span> (<span style="font-weight:bold;color:#1f377f;">p</span> : _ <span style="color:#2b91af;">array</span>) <span style="font-weight:bold;color:#1f377f;">n</span> =
    <span style="color:blue;">if</span> <span style="font-weight:bold;color:#1f377f;">p</span>.Length <= <span style="font-weight:bold;color:#1f377f;">n</span>
    <span style="color:blue;">then</span> <span style="color:#2b91af;">None</span>
    <span style="color:blue;">else</span>
        <span style="color:green;">// The rest of the function body...</span></pre>
</p>
<p>
To be clear, in most code bases, this is exactly what I would do. What follows is rather exotic, and hardly suitable for all use cases.
</p>
<h3 id="9457df27904b4b9ea65b4713dea20fbd">
Proofs as values <a href="#9457df27904b4b9ea65b4713dea20fbd">#</a>
</h3>
<p>
It's not too hard to model the lower boundary of the <code>n</code> parameter. As is often the case, it turns out that the number must be a natural number. I already covered that in the <a href="/2025/01/06/encapsulating-rod-cutting">previous article</a>. It's much harder, however, to model the upper boundary of the value, because it depends on the size of <code>p</code>.
</p>
<p>
The following is based on the paper <a href="https://dl.acm.org/doi/10.1145/3242744.3242755">Ghosts of Departed Proofs</a>, as well as <a href="https://gist.github.com/Savelenko/9f21c63fdc00b52a64739122176b7453">a helpful Gist</a> also provided by Borar. (The link to the paper is to what I believe is the 'official' page for it, and since it's part of the ACM digital library, it's behind a paywall. Even so, as is the case with most academic papers, it's easy enough to find a PDF of it somewhere else. Not that I endorse content piracy, but it's my impression that academic papers are usually disseminated by the authors themselves.)
</p>
<p>
The idea is to enable a library to issue a 'proof' about a certain condition. In the example I'm going to use here, the proof is that a certain number is in the valid range for a given list of prices.
</p>
<p>
We actually can't entirely escape the need for a run-time check, but we do gain two other benefits. The first is that we're now using the type system to communicate a relationship that otherwise would have to be described in written documentation. The second is that once the proof has been issued, there's no need to perform additional run-time checks.
</p>
<p>
This can help move an API towards a more total, as opposed to <a href="https://en.wikipedia.org/wiki/Partial_function">partial</a>, definition, which again moves towards what Michael Feathers calls <a href="https://youtu.be/AnZ0uTOerUI?si=1gJXYFoVlNTSbjEt">unconditional code</a>. This is particularly useful if the alternative is an API that 'forgets' which run-time guarantees have already been checked. The paper has some examples. I've also recently encountered similar situations when doing <a href="https://adventofcode.com/2024">Advent of Code 2024</a>. On many days, my solution involved immutable maps (like hash tables) that I'd recurse over. In many cases I'd write an algorithm where I knew with absolute certainty that a particular key was in the map (if, for example, I'd just put it there three lines earlier). In such cases, you don't want a total function that returns an option or <a href="/2022/04/25/the-maybe-monad">Maybe</a> value. You want a partial function. Or a type-level guarantee that the value is, indeed, in the map.
</p>
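<p>
To make the map scenario concrete, here's a minimal sketch using F#'s built-in <code>Map</code> module. The <code>inventory</code> value and its contents are made up for the occasion.
</p>
<p>
<pre>// A key that was 'just put there'.
let inventory = Map.empty |> Map.add "apples" 3

// Total: always returns a value, but the caller must deal with None,
// even though the key is known to be present.
let viaTryFind = Map.tryFind "apples" inventory

// Partial: returns the value directly, but throws KeyNotFoundException
// if the key turns out to be absent.
let viaFind = Map.find "apples" inventory</pre>
</p>
<p>
Neither option records, in the types, the knowledge that the key is present. That's the gap a proof value is meant to fill.
</p>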
<p>
For the example in this article, it's overkill, so you may wonder what the point is. On the other hand, a simple example makes it easier to follow what's going on. Hopefully, once you understand the technique, you can extrapolate it to situations where it might be more warranted.
</p>
<h3 id="46bc8e47a37f43b3b126946ac29239b0">
Proof contexts <a href="#46bc8e47a37f43b3b126946ac29239b0">#</a>
</h3>
<p>
The overall idea should look familiar to practitioners of statically-typed functional programming. Instead of plain functions and data structures, we introduce a special 'context' in which we have to run our computations. This is similar to how <a href="/2023/01/09/the-io-monad">the IO monad</a> works, or, in fact, most monads. You're not supposed to <a href="/2019/02/04/how-to-get-the-value-out-of-the-monad">get the value out of the monad</a>. Rather, you should inject the desired behaviour <em>into</em> the monad.
</p>
<p>
We find a similar design with existential types, or with the <a href="https://hackage.haskell.org/package/base/docs/Control-Monad-ST.html">ST monad</a>, on which the ideas in the GDP paper are acknowledged to be based. We even see a mutation-based variation in the article <a href="/2024/06/24/a-mutable-priority-collection">A mutable priority collection</a>, where we may think of the <code>Edit</code> API as a variation of the ST monad, since it allows 'localized' state mutation.
</p>
<p>
I'll attempt to illustrate it like this:
</p>
<p>
<img src="/content/binary/library-with-computation-context.png" alt="A box labelled 'library' with a 'sandbox' area inside. To its left, another box labelled 'Client code' with an arrow to the library box, as well as an arrow to a box inside the sandbox area labelled 'Client computation'.">
</p>
<p>
A library offers a set of functions and data structures for immediate use. In addition, it also provides a <a href="https://en.wikipedia.org/wiki/Higher-order_function">higher-order function</a> that enables client code to embed a computation into a special 'sandbox' area where special rules apply. The paper calls such a context a 'name', which it does because it's trying to be as general as possible. As I'm writing this, I find it easier to think of this 'sandbox' as a 'proof context'. It's a context in which proof values exist. Crucially, as we shall see, they don't exist outside of this context.
</p>
<h3 id="50d1ba9968d04529be3cbccfd6b05061">
Size proofs <a href="#50d1ba9968d04529be3cbccfd6b05061">#</a>
</h3>
<p>
In the rod-cutting example, we particularly care about proving that a given number <code>n</code> is within the size of the price list. We do this by representing the proof as a value:
</p>
<p>
<pre><span style="color:blue;">type</span> <span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">'a</span>> = <span style="color:blue;">private</span> <span style="color:#2b91af;">Size</span> <span style="color:blue;">of</span> <span style="color:#2b91af;">int</span> <span style="color:blue;">with</span>
    <span style="color:blue;">member</span> <span style="font-weight:bold;color:#1f377f;">this</span>.Value = <span style="color:blue;">let</span> (<span style="color:#2b91af;">Size</span> <span style="font-weight:bold;color:#1f377f;">i</span>) = <span style="font-weight:bold;color:#1f377f;">this</span> <span style="color:blue;">in</span> <span style="font-weight:bold;color:#1f377f;">i</span>
    <span style="color:blue;">override</span> <span style="font-weight:bold;color:#1f377f;">this</span>.<span style="font-weight:bold;color:#74531f;">ToString</span> () = <span style="color:blue;">let</span> (<span style="color:#2b91af;">Size</span> <span style="font-weight:bold;color:#1f377f;">i</span>) = <span style="font-weight:bold;color:#1f377f;">this</span> <span style="color:blue;">in</span> <span style="color:#74531f;">string</span> <span style="font-weight:bold;color:#1f377f;">i</span></pre>
</p>
<p>
Two things are special about this type definition:
</p>
<ul>
<li>The constructor is <code>private</code>.</li>
<li>It has a phantom type <code>'a</code>.</li>
</ul>
<p>
A phantom type is a generic type parameter that has no run-time value. Notice that <code><span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">'a</span>></code> contains no value of the type <code>'a</code>. The type only exists at compile-time.
</p>
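<p>
If phantom types are new to you, the following stand-alone sketch may help. It has nothing to do with rod-cutting; <code>Customer</code>, <code>Order</code>, <code>Id</code>, and <code>sameId</code> are hypothetical names chosen only for illustration.
</p>
<p>
<pre>// Marker types. They are only ever used as type arguments.
type Customer = Customer
type Order = Order

// 'entity is a phantom type: no value of that type is stored.
type Id<'entity> = Id of int

// Both arguments must carry the same phantom type.
let sameId (Id x : Id<'e>) (Id y : Id<'e>) = x = y

let customerId : Id<Customer> = Id 42
let otherCustomerId : Id<Customer> = Id 1337
let orderId : Id<Order> = Id 42

let compared = sameId customerId otherCustomerId

// The following line doesn't compile, because Id<Customer> and Id<Order>
// are different types, even though both wrap the integer 42:
// let mixed = sameId customerId orderId</pre>
</p>
<p>
At run time, all three values are just wrapped integers. The distinction only exists for the compiler.
</p>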
<p>
You can think of the type parameter as similar to a security token. The issuer of the proof associates a particular security token to vouch for its validity. Usually, when we talk about security tokens, they do have a run-time representation (typically a byte array) because we need to exchange them with other processes. This is, for example, how claims-based authentication works.
</p>
<p>
<img src="/content/binary/claim-with-certificate.png" alt="A box labelled 'claim'. The box has a ribboned seal in the lower right corner." width="200">
</p>
<p>
In this case, our concern isn't security. Rather, we wish to communicate and enforce certain relationships. Since we wish to leverage the type system, we use a type as a token.
</p>
<p>
<img src="/content/binary/size-with-phantom-type.png" alt="A box labelled 'size'. The box has another label in the lower right corner with the generic type argument 'a." width="200">
</p>
<p>
Since the <code>Size</code> constructor is <code>private</code>, the library controls how it issues proofs, a bit like a claims issuer can sign a claim with its private key.
</p>
<p>
Okay, but how are <code>Size</code> proofs issued?
</p>
<h3 id="28af8638928c4327b0c3a43d6c36711f">
Issuing size proofs <a href="#28af8638928c4327b0c3a43d6c36711f">#</a>
</h3>
<p>
As you'll see later, more than one API may issue <code>Size</code> proofs, but the most fundamental is that you can query a price list for such a proof:
</p>
<p>
<pre><span style="color:blue;">type</span> <span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">'a</span>> = <span style="color:blue;">private</span> <span style="color:#2b91af;">PriceList</span> <span style="color:blue;">of</span> <span style="color:#2b91af;">int</span> <span style="color:#2b91af;">list</span> <span style="color:blue;">with</span>
    <span style="color:blue;">member</span> <span style="font-weight:bold;color:#1f377f;">this</span>.Length = <span style="color:blue;">let</span> (<span style="color:#2b91af;">PriceList</span> <span style="font-weight:bold;color:#1f377f;">prices</span>) = <span style="font-weight:bold;color:#1f377f;">this</span> <span style="color:blue;">in</span> <span style="font-weight:bold;color:#1f377f;">prices</span>.Length
    <span style="color:blue;">member</span> <span style="font-weight:bold;color:#1f377f;">this</span>.<span style="font-weight:bold;color:#74531f;">trySize</span> <span style="font-weight:bold;color:#1f377f;">candidate</span> : <span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">'a</span>> <span style="color:#2b91af;">option</span> =
        <span style="color:blue;">if</span> 0 < <span style="font-weight:bold;color:#1f377f;">candidate</span> && <span style="font-weight:bold;color:#1f377f;">candidate</span> <= <span style="font-weight:bold;color:#1f377f;">this</span>.Length
        <span style="color:blue;">then</span> <span style="color:#2b91af;">Some</span> (<span style="color:#2b91af;">Size</span> <span style="font-weight:bold;color:#1f377f;">candidate</span>)
        <span style="color:blue;">else</span> <span style="color:#2b91af;">None</span></pre>
</p>
<p>
The <code>trySize</code> member function issues a <code><span style="color:#2b91af;">Some</span> <span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">'a</span>></code> value if the <code>candidate</code> is within the size of the price list. As discussed above, we can't completely avoid a run-time check, but now that we have the proof, we don't need to <em>repeat</em> that run-time check if we want to use a particular <code>Size</code> value with the same <code>PriceList</code>.
</p>
<p>
Notice how immutability is an essential part of this design. If, in the object-oriented manner, we allow a price list to change, we could make it shorter. This could invalidate some proof that we previously issued. Since, however, the price list is immutable, we can trust that once we've checked a size, it remains valid. You can also think of this as a sort of <a href="/encapsulation-and-solid">encapsulation</a>, in the sense that once we've assured ourselves that an object, or here rather a value, is valid, it remains valid. Indeed, <a href="/2024/06/12/simpler-encapsulation-with-immutability">encapsulation is simpler with immutability</a>.
</p>
<p>
You probably still have some questions. For instance, how do we ensure that a size proof issued by one price list can't be used against another price list? Imagine that you have two price lists. One has ten prices, the other twenty. You could have the larger one issue a proof that <em>size 17</em> is valid. What prevents you from using that proof with the smaller price list?
</p>
<p>
That's the job of that phantom type. Notice how a <code><span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">'a</span>></code> issues a <code><span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">'a</span>></code> proof. It's the same generic type argument.
</p>
<p>
Usually, I <a href="/2024/11/04/pendulum-swing-no-haskell-type-annotation-by-default">extol F#'s type inference</a>. I prefer not having to use type annotations unless I have to. When it comes to GDP, however, type annotations are necessary, because we need these phantom types to line up. Without the type annotations, they wouldn't do that.
</p>
<p>
In the above example, the smaller price list might have the type <code><span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">'a</span>></code> and the larger one the type <code><span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">'b</span>></code>. The smaller would issue proofs of the type <code><span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">'a</span>></code>, and the larger one proofs of the type <code><span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">'b</span>></code>. As you'll see, you can't use a <code><span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">'a</span>></code> where a <code><span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">'b</span>></code> is required, or vice versa.
</p>
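<p>
As a rough, self-contained sketch of that idea, consider the following simplified model. It takes liberties: the constructors are public, and the phantom types are the hypothetical marker types <code>Small</code> and <code>Large</code>, assigned by hand. The <code>priceOf</code> function is likewise made up for illustration. It only shows how the type checker keeps proofs apart, and has none of the encapsulation that the actual design depends on.
</p>
<p>
<pre>// Hypothetical marker types, assigned by hand in this sketch.
type Small = Small
type Large = Large

// Simplified stand-ins with public constructors.
type Size<'a> = Size of int
type PriceList<'a> = PriceList of int list

let trySize (PriceList prices : PriceList<'a>) candidate : Size<'a> option =
    if 0 < candidate && candidate <= List.length prices
    then Some (Size candidate)
    else None

// Looks up a price without any bounds check; it relies on the proof.
let priceOf (PriceList prices : PriceList<'a>) (Size n : Size<'a>) =
    List.item (n - 1) prices

let small : PriceList<Small> = PriceList [1; 5; 8]
let large : PriceList<Large> = PriceList [1; 5; 8; 9; 10]

let proof = trySize large 4                   // Size<Large> option
let price = proof |> Option.map (priceOf large)

// This doesn't compile, because a Size<Large> isn't a Size<Small>:
// let wrong = proof |> Option.map (priceOf small)</pre>
</p>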
<p>
You may still wonder how one then creates <code><span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">'a</span>></code> values. After all, that type also has a <code>private</code> constructor.
</p>
<p>
We'll get back to that later.
</p>
<h3 id="193350b5b4ab4327b021ac42403dab11">
Proof-based cut API <a href="#193350b5b4ab4327b021ac42403dab11">#</a>
</h3>
<p>
Before we look at how client code may consume APIs based on proofs such as <code><span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">'a</span>></code>, we should review their expressive power. What does this design enable us to say?
</p>
<p>
While the first example above, with the Guard Clause alternative, was based on the initial imperative implementation shown in the article <a href="/2024/12/23/implementing-rod-cutting">Implementing rod-cutting</a>, the rest of the present article builds on the refactored code from <a href="/2025/01/06/encapsulating-rod-cutting">Encapsulating rod-cutting</a>.
</p>
<p>
The first change I need to introduce is to the <code>Cut</code> record type:
</p>
<p>
<pre><span style="color:blue;">type</span> <span style="color:#2b91af;">Cut</span><<span style="color:#2b91af;">'a</span>> = { Revenue : <span style="color:#2b91af;">int</span>; Size : <span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">'a</span>> }</pre>
</p>
<p>
Notice that I've changed the type of the <code>Size</code> property to <code><span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">'a</span>></code>. This has the implication that <code><span style="color:#2b91af;">Cut</span><<span style="color:#2b91af;">'a</span>></code> now also has a phantom type, and since client code can't create <code><span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">'a</span>></code> values, by transitivity it means that neither can client code create <code><span style="color:#2b91af;">Cut</span><<span style="color:#2b91af;">'a</span>></code> values. These values can only be issued as proofs.
</p>
<p>
This enables us to change the type definition of the <code>cut</code> function:
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:#74531f;">cut</span> (<span style="color:#2b91af;">PriceList</span> <span style="font-weight:bold;color:#1f377f;">prices</span> : <span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">'a</span>>) (<span style="color:#2b91af;">Size</span> <span style="font-weight:bold;color:#1f377f;">n</span> : <span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">'a</span>>) : <span style="color:#2b91af;">Cut</span><<span style="color:#2b91af;">'a</span>> <span style="color:#2b91af;">list</span> =
    <span style="color:green;">// Implementation follows here...</span></pre>
</p>
<p>
Notice how all the phantom types line up. In order to call the function, client code must supply a <code><span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">'a</span>></code> value issued by a compatible <code><span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">'a</span>></code> value. Upon a valid call, the function returns a list of <code><span style="color:#2b91af;">Cut</span><<span style="color:#2b91af;">'a</span>></code> values.
</p>
<p>
Pay attention to what is being communicated. You may find this strange and impenetrable, but for a reader who understands GDP, much about the contract is communicated through the types. We can see that <code>n</code> relates to <code>prices</code>, because the 'proof token' (the generic type parameter <code>'a</code>) is the same for both arguments. A reader who understands how <code><span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">'a</span>></code> proofs are issued will now understand what the precondition is: The <code>n</code> argument must be valid according to the size of the <code>prices</code> argument.
</p>
<p>
The type of the <code>cut</code> function also communicates a postcondition: It guarantees that the <code>Size</code> value of each <code><span style="color:#2b91af;">Cut</span><<span style="color:#2b91af;">'a</span>></code> returned is valid according to the supplied <code>prices</code>. In other words, it means that no <a href="/2013/07/08/defensive-coding">defensive coding</a> is necessary. Client code doesn't have to check whether or not the price of each indicated cut can actually be found in <code>prices</code>. The types guarantee that they can.
</p>
<p>
You may consider the <code>cut</code> function a 'secondary' issuer of <code><span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">'a</span>></code> proofs, since it returns such values. If you wanted to call <code>cut</code> again with one of those values, you could.
</p>
<p>
Compared to the previous article, I don't think I changed much else in the <code>cut</code> function, besides the initial function declaration, and the last line of code, but for good measure, here's the entire function:
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:#74531f;">cut</span> (<span style="color:#2b91af;">PriceList</span> <span style="font-weight:bold;color:#1f377f;">prices</span> : <span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">'a</span>>) (<span style="color:#2b91af;">Size</span> <span style="font-weight:bold;color:#1f377f;">n</span> : <span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">'a</span>>) : <span style="color:#2b91af;">Cut</span><<span style="color:#2b91af;">'a</span>> <span style="color:#2b91af;">list</span> =
    <span style="color:green;">// Implementation follows here...</span>
    <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">p</span> = 0 <span style="color:#2b91af;">::</span> <span style="font-weight:bold;color:#1f377f;">prices</span> |> <span style="color:#2b91af;">Array</span>.<span style="color:#74531f;">ofList</span>
    <span style="color:blue;">let</span> <span style="color:#74531f;">findBestCut</span> <span style="font-weight:bold;color:#1f377f;">revenues</span> <span style="font-weight:bold;color:#1f377f;">j</span> =
        [1..<span style="font-weight:bold;color:#1f377f;">j</span>]
        |> <span style="color:#2b91af;">List</span>.<span style="color:#74531f;">map</span> (<span style="color:blue;">fun</span> <span style="font-weight:bold;color:#1f377f;">i</span> <span style="color:blue;">-></span> <span style="font-weight:bold;color:#1f377f;">p</span>[<span style="font-weight:bold;color:#1f377f;">i</span>] + <span style="color:#2b91af;">Map</span>.<span style="color:#74531f;">find</span> (<span style="font-weight:bold;color:#1f377f;">j</span> - <span style="font-weight:bold;color:#1f377f;">i</span>) <span style="font-weight:bold;color:#1f377f;">revenues</span>, <span style="font-weight:bold;color:#1f377f;">i</span>)
        |> <span style="color:#2b91af;">List</span>.<span style="color:#74531f;">maxBy</span> <span style="color:#74531f;">fst</span>
    <span style="color:blue;">let</span> <span style="color:#74531f;">aggregate</span> <span style="font-weight:bold;color:#1f377f;">acc</span> <span style="font-weight:bold;color:#1f377f;">j</span> =
        <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">revenues</span> = <span style="color:#74531f;">snd</span> <span style="font-weight:bold;color:#1f377f;">acc</span>
        <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">q</span>, <span style="font-weight:bold;color:#1f377f;">i</span> = <span style="color:#74531f;">findBestCut</span> <span style="font-weight:bold;color:#1f377f;">revenues</span> <span style="font-weight:bold;color:#1f377f;">j</span>
        <span style="color:blue;">let</span> <span style="color:#74531f;">cuts</span> = <span style="color:#74531f;">fst</span> <span style="font-weight:bold;color:#1f377f;">acc</span>
        <span style="color:#74531f;">cuts</span> << (<span style="color:#74531f;">cons</span> (<span style="font-weight:bold;color:#1f377f;">q</span>, <span style="font-weight:bold;color:#1f377f;">i</span>)), <span style="color:#2b91af;">Map</span>.<span style="color:#74531f;">add</span> <span style="font-weight:bold;color:#1f377f;">revenues</span>.Count <span style="font-weight:bold;color:#1f377f;">q</span> <span style="font-weight:bold;color:#1f377f;">revenues</span>
    [1..<span style="font-weight:bold;color:#1f377f;">n</span>]
    |> <span style="color:#2b91af;">List</span>.<span style="color:#74531f;">fold</span> <span style="color:#74531f;">aggregate</span> (<span style="color:#74531f;">id</span>, <span style="color:#2b91af;">Map</span>.<span style="color:#74531f;">add</span> 0 0 <span style="color:#2b91af;">Map</span>.empty)
    |> <span style="color:#74531f;">fst</span> <| [] <span style="color:green;">// Evaluate Hughes list</span>
    |> <span style="color:#2b91af;">List</span>.<span style="color:#74531f;">map</span> (<span style="color:blue;">fun</span> (<span style="font-weight:bold;color:#1f377f;">r</span>, <span style="font-weight:bold;color:#1f377f;">i</span>) <span style="color:blue;">-></span> { Revenue = <span style="font-weight:bold;color:#1f377f;">r</span>; Size = <span style="color:#2b91af;">Size</span> <span style="font-weight:bold;color:#1f377f;">i</span> })</pre>
</p>
<p>
The <code>cut</code> function is part of the same module as <code><span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">'a</span>></code>, so even though the constructor is <code>private</code>, the <code>cut</code> function can still use it.
</p>
<p>
Thus, the entire proof mechanism is for external use. Internally, the library code may take shortcuts, so it's up to the library author to convince him- or herself that the contract holds. In this case, I'm quite confident that the function only issues valid proofs. After all, I've lifted the algorithm from <a href="/ref/clrs">an acclaimed textbook</a>, and this particular implementation is covered by more than 10,000 test cases.
</p>
<h3 id="dcbcbee557a54a8aaa701484ad89e90f">
Proof-based solve API <a href="#dcbcbee557a54a8aaa701484ad89e90f">#</a>
</h3>
<p>
The <code>solve</code> code hasn't changed, I believe:
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:#74531f;">solve</span> <span style="font-weight:bold;color:#1f377f;">prices</span> <span style="font-weight:bold;color:#1f377f;">n</span> =
    <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">cuts</span> = <span style="color:#74531f;">cut</span> <span style="font-weight:bold;color:#1f377f;">prices</span> <span style="font-weight:bold;color:#1f377f;">n</span>
    <span style="color:blue;">let</span> <span style="color:blue;">rec</span> <span style="color:#74531f;">imp</span> <span style="font-weight:bold;color:#1f377f;">n</span> =
        <span style="color:blue;">if</span> <span style="font-weight:bold;color:#1f377f;">n</span> <= 0 <span style="color:blue;">then</span> [] <span style="color:blue;">else</span>
            <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">idx</span> = <span style="font-weight:bold;color:#1f377f;">n</span> - 1
            <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">s</span> = <span style="font-weight:bold;color:#1f377f;">cuts</span>[<span style="font-weight:bold;color:#1f377f;">idx</span>].Size
            <span style="font-weight:bold;color:#1f377f;">s</span> <span style="color:#2b91af;">::</span> <span style="color:#74531f;">imp</span> (<span style="font-weight:bold;color:#1f377f;">n</span> - <span style="font-weight:bold;color:#1f377f;">s</span>.Value)
    <span style="color:#74531f;">imp</span> <span style="font-weight:bold;color:#1f377f;">n</span>.Value</pre>
</p>
<p>
While the code hasn't changed, the type has. In this case, no explicit type annotations are necessary, because the types are already correctly inferred from the use of <code>cut</code>:
</p>
<p>
<pre>solve: prices: PriceList<'a> <span style="color:blue;">-></span> n: Size<'a> <span style="color:blue;">-></span> Size<'a> list</pre>
</p>
<p>
Again, the phantom types line up as desired.
</p>
<h3 id="f2e9d4cab42b4d74bb472c3aee2e45ef">
Proof-based revenue calculation <a href="#f2e9d4cab42b4d74bb472c3aee2e45ef">#</a>
</h3>
<p>
Although I didn't show it in the previous article, I also included a function to calculate the revenue from a list of cuts. It gets the same treatment as the other functions:
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:#74531f;">calculateRevenue</span> (<span style="color:#2b91af;">PriceList</span> <span style="font-weight:bold;color:#1f377f;">prices</span> : <span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">'a</span>>) (<span style="font-weight:bold;color:#1f377f;">cuts</span> : <span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">'a</span>> <span style="color:#2b91af;">list</span>) =
    <span style="font-weight:bold;color:#1f377f;">cuts</span> |> <span style="color:#2b91af;">List</span>.<span style="color:#74531f;">sumBy</span> (<span style="color:blue;">fun</span> <span style="font-weight:bold;color:#1f377f;">s</span> <span style="color:blue;">-></span> <span style="font-weight:bold;color:#1f377f;">prices</span>[<span style="font-weight:bold;color:#1f377f;">s</span>.Value - 1])</pre>
</p>
<p>
Again we see how the GDP-based API communicates a precondition: The <code>cuts</code> must be valid according to the <code>prices</code>; that is, each cut, indicated by its <code>Size</code> property, must be guaranteed to be within the range defined by the price list. This makes the function total; or, unconditional code, as Michael Feathers would put it. The function can't fail at run time.
</p>
<p>
(I am, once more, deliberately ignoring the entirely independent problem of potential integer overflows.)
</p>
<p>
While you could repeatedly call <code><span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">'a</span>>.<span style="font-weight:bold;color:#74531f;">trySize</span></code> to produce a list of cuts, the most natural way to produce such a list of cuts is to first call <code>cut</code>, and then pass its result to <code>calculateRevenue</code>.
</p>
<p>
The function returns <code>int</code>.
</p>
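<p>
As a minimal sketch of the arithmetic, consider a hypothetical unchecked variant that takes plain lists instead of the proof-carrying types. The name <code>calculateRevenueUnchecked</code> is an assumption made for illustration only; it isn't part of the <code>Rod</code> module:
</p>
<p>
<pre>// Hypothetical helper, for illustration only: no proofs, just plain lists.
let calculateRevenueUnchecked (prices : int list) (cuts : int list) =
    cuts |> List.sumBy (fun s -> prices[s - 1])

let p = [1; 5; 8; 9; 10; 17; 17; 20; 24; 30]
// Two pieces of size 2: p[1] + p[1] = 5 + 5
let revenue = calculateRevenueUnchecked p [2; 2]</pre>
</p>
<p>
Here, <code>revenue</code> is <code>10</code>. Without the phantom type, however, nothing prevents a caller from passing a size larger than the price list, in which case the indexing fails at run time. That's the failure mode that the GDP-based signature rules out.
</p>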
<h3 id="0a22569816ad4b179a365d2dc3ad81f4">
Proof-based printing <a href="#0a22569816ad4b179a365d2dc3ad81f4">#</a>
</h3>
<p>
Finally, here's <code>printSolution</code>:
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:#74531f;">printSolution</span> <span style="font-weight:bold;color:#1f377f;">p</span> <span style="font-weight:bold;color:#1f377f;">n</span> = <span style="color:#74531f;">solve</span> <span style="font-weight:bold;color:#1f377f;">p</span> <span style="font-weight:bold;color:#1f377f;">n</span> |> <span style="color:#2b91af;">List</span>.<span style="color:#74531f;">iter</span> (<span style="color:#74531f;">printfn</span> <span style="color:#a31515;">"</span><span style="color:#2b91af;">%O</span><span style="color:#a31515;">"</span>)</pre>
</p>
<p>
It hasn't changed much since the previous incarnation, but the type is now <code>PriceList<'a> <span style="color:blue;">-></span> Size<'a> <span style="color:blue;">-></span> unit</code>. Again, the precondition is the same as for <code>cut</code>.
</p>
<h3 id="2216067c41c44ade8855e4c6f8216d6d">
Running client code <a href="#2216067c41c44ade8855e4c6f8216d6d">#</a>
</h3>
<p>
How in the world do you write client code against this API? After all, the types all have <code>private</code> constructors, so we can't create any values.
</p>
<p>
If you trace the code dependencies, you'll notice that <code><span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">'a</span>></code> sits at the heart of the API. If you have a <code><span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">'a</span>></code>, you'd be able to produce the other values, too.
</p>
<p>
So how do you create a <code><span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">'a</span>></code> value?
</p>
<p>
You don't. You call the following <code>runPrices</code> function, and give it a <code>PriceListRunner</code> that it'll embed and run in the 'sandbox' illustrated above.
</p>
<p>
<pre><span style="color:blue;">type</span> <span style="color:#2b91af;">PriceListRunner</span><<span style="color:#2b91af;">'r</span>> =
    <span style="color:blue;">abstract</span> <span style="font-weight:bold;color:#74531f;">Run</span><<span style="color:#2b91af;">'a</span>> : <span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">'a</span>> <span style="color:blue;">-></span> <span style="color:#2b91af;">'r</span>
<span style="color:blue;">let</span> <span style="color:#74531f;">runPrices</span> <span style="font-weight:bold;color:#1f377f;">pl</span> (<span style="font-weight:bold;color:#1f377f;">ctx</span> : <span style="color:#2b91af;">PriceListRunner</span><<span style="color:#2b91af;">'r</span>>) = <span style="font-weight:bold;color:#1f377f;">ctx</span>.<span style="font-weight:bold;color:#74531f;">Run</span> (<span style="color:#2b91af;">PriceList</span> <span style="font-weight:bold;color:#1f377f;">pl</span>)</pre>
</p>
<p>
As the paper describes, the GDP trick hinges on rank-2 polymorphism, and the only way (that I know of) this is supported in F# is on methods. An object is therefore required, and we define the abstract <code><span style="color:#2b91af;">PriceListRunner</span><<span style="color:#2b91af;">'r</span>></code> class for that purpose.
</p>
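<p>
To illustrate that restriction outside of the rod-cutting domain, here's a small sketch. The names <code>ListFunction</code> and <code>applyToBoth</code> are hypothetical and only serve as an example. The point is that the function receiving the object, not the caller, chooses the type argument of the generic method, and may even choose it more than once:
</p>
<p>
<pre>// Hypothetical names, for illustration only.
type ListFunction =
    abstract Apply<'a> : 'a list -> int

// The body instantiates 'a twice: first as int, then as string.
let applyToBoth (f : ListFunction) =
    f.Apply [1; 2; 3] + f.Apply ["foo"; "bar"]

let total =
    applyToBoth { new ListFunction with
                    member __.Apply xs = List.length xs }</pre>
</p>
<p>
Here, <code>total</code> is <code>5</code>. Had <code>f</code> instead been an ordinary function argument of the type <code>'a list -> int</code>, the type variable would have been fixed to a single type for the entire body of <code>applyToBoth</code>, and the second call wouldn't type-check.
</p>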
<p>
Client code must implement the abstract class to call the <code>runPrices</code> function. Fortunately, since F# has <a href="https://learn.microsoft.com/dotnet/fsharp/language-reference/object-expressions">object expressions</a>, client code might look like this:
</p>
<p>
<pre>[<<span style="color:#2b91af;">Fact</span>>]
<span style="color:blue;">let</span> <span style="color:#74531f;">``CLRS example``</span> () =
    <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">p</span> = [1; 5; 8; 9; 10; 17; 17; 20; 24; 30]
    <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = <span style="color:#2b91af;">Rod</span>.<span style="color:#74531f;">runPrices</span> <span style="font-weight:bold;color:#1f377f;">p</span> { <span style="color:blue;">new</span> <span style="color:#2b91af;">PriceListRunner</span><_> <span style="color:blue;">with</span>
        <span style="color:blue;">member</span> __.<span style="font-weight:bold;color:#74531f;">Run</span> <span style="font-weight:bold;color:#1f377f;">pl</span> = <span style="color:blue;">option</span> {
            <span style="color:blue;">let!</span> <span style="font-weight:bold;color:#1f377f;">n</span> = <span style="font-weight:bold;color:#1f377f;">pl</span>.<span style="font-weight:bold;color:#74531f;">trySize</span> 10
            <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">cuts</span> = <span style="color:#2b91af;">Rod</span>.<span style="color:#74531f;">cut</span> <span style="font-weight:bold;color:#1f377f;">pl</span> <span style="font-weight:bold;color:#1f377f;">n</span>
            <span style="color:blue;">return</span> <span style="color:#2b91af;">List</span>.<span style="color:#74531f;">map</span> (<span style="color:blue;">fun</span> <span style="font-weight:bold;color:#1f377f;">c</span> <span style="color:blue;">-></span> (<span style="font-weight:bold;color:#1f377f;">c</span>.Revenue, <span style="font-weight:bold;color:#1f377f;">c</span>.Size.Value)) <span style="font-weight:bold;color:#1f377f;">cuts</span> } }
    [
        ( 1, 1)
        ( 5, 2)
        ( 8, 3)
        (10, 2)
        (13, 2)
        (17, 6)
        (18, 1)
        (22, 2)
        (25, 3)
        (30, 10)
    ] |> <span style="color:#2b91af;">Some</span> =! <span style="font-weight:bold;color:#1f377f;">actual</span></pre>
</p>
<p>
This is an <a href="https://xunit.net/">xUnit.net</a> test where <code>actual</code> is produced by <code>runPrices</code> and an object expression that defines the code to run in the proof context. When the <code>Run</code> method runs, it runs with a concrete type that the compiler picked for <code>'a</code>. This type is only in scope within that method, and can't escape it.
</p>
<p>
The implementing class is given a <code><span style="color:#2b91af;">PriceList</span><<span style="color:#2b91af;">'a</span>></code> as an input argument. In this example, it tries to create a size of 10, which succeeds because the price list has ten elements.
</p>
<p>
Notice that the <code>Run</code> method transforms the <code>cuts</code> to tuples. Why doesn't it return <code>cuts</code> directly?
</p>
<p>
It can't. It's part of the deal. If I change the last line of <code>Run</code> to <code>return cuts</code>, the code no longer compiles. The compiler error is:
</p>
<blockquote>
<p>
This code is not sufficiently generic. The type variable 'a could not be generalized because it would escape its scope.
</p>
</blockquote>
<p>
Remember I wrote that <code>'a</code> can't escape the scope of <code>Run</code>? This is enforced by the type system.
</p>
<h3 id="6583d683c77c4bfba53f3f2c1195603e">
Preventing misalignment <a href="#6583d683c77c4bfba53f3f2c1195603e">#</a>
</h3>
<p>
You may already consider it a benefit that this kind of API design uses the type system to communicate pre- and postconditions. Perhaps you also wonder how it prevents errors. As already discussed, if you're dealing with multiple price lists, it shouldn't be possible to use a size proof issued by one price list with another. Let's see how that might look. We'll start with a correctly coded unit test:
</p>
<p>
<pre>[<<span style="color:#2b91af;">Fact</span>>]
<span style="color:blue;">let</span> <span style="color:#74531f;">``Nest two solutions``</span> () =
    <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">p1</span> = [1; 2; 2]
    <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">p2</span> = [1]
    <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = <span style="color:#2b91af;">Rod</span>.<span style="color:#74531f;">runPrices</span> <span style="font-weight:bold;color:#1f377f;">p1</span> { <span style="color:blue;">new</span> <span style="color:#2b91af;">PriceListRunner</span><_> <span style="color:blue;">with</span>
        <span style="color:blue;">member</span> __.<span style="font-weight:bold;color:#74531f;">Run</span> <span style="font-weight:bold;color:#1f377f;">pl1</span> = <span style="color:blue;">option</span> {
            <span style="color:blue;">let!</span> <span style="font-weight:bold;color:#1f377f;">n1</span> = <span style="font-weight:bold;color:#1f377f;">pl1</span>.<span style="font-weight:bold;color:#74531f;">trySize</span> 3
            <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">cuts1</span> = <span style="color:#2b91af;">Rod</span>.<span style="color:#74531f;">solve</span> <span style="font-weight:bold;color:#1f377f;">pl1</span> <span style="font-weight:bold;color:#1f377f;">n1</span>
            <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">r</span> = <span style="color:#2b91af;">Rod</span>.<span style="color:#74531f;">calculateRevenue</span> <span style="font-weight:bold;color:#1f377f;">pl1</span> <span style="font-weight:bold;color:#1f377f;">cuts1</span>
            <span style="color:blue;">let!</span> <span style="font-weight:bold;color:#1f377f;">inner</span> = <span style="color:#2b91af;">Rod</span>.<span style="color:#74531f;">runPrices</span> <span style="font-weight:bold;color:#1f377f;">p2</span> { <span style="color:blue;">new</span> <span style="color:#2b91af;">PriceListRunner</span><_> <span style="color:blue;">with</span>
                <span style="color:blue;">member</span> __.<span style="font-weight:bold;color:#74531f;">Run</span> <span style="font-weight:bold;color:#1f377f;">pl2</span> = <span style="color:blue;">option</span> {
                    <span style="color:blue;">let!</span> <span style="font-weight:bold;color:#1f377f;">n2</span> = <span style="font-weight:bold;color:#1f377f;">pl2</span>.<span style="font-weight:bold;color:#74531f;">trySize</span> 1
                    <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">cuts2</span> = <span style="color:#2b91af;">Rod</span>.<span style="color:#74531f;">solve</span> <span style="font-weight:bold;color:#1f377f;">pl2</span> <span style="font-weight:bold;color:#1f377f;">n2</span>
                    <span style="color:blue;">return</span> <span style="color:#2b91af;">Rod</span>.<span style="color:#74531f;">calculateRevenue</span> <span style="font-weight:bold;color:#1f377f;">pl2</span> <span style="font-weight:bold;color:#1f377f;">cuts2</span> } }
            <span style="color:blue;">return</span> (<span style="font-weight:bold;color:#1f377f;">r</span>, <span style="font-weight:bold;color:#1f377f;">inner</span>) } }
    <span style="color:#2b91af;">Some</span> (3, 1) =! <span style="font-weight:bold;color:#1f377f;">actual</span></pre>
</p>
<p>
This code compiles because I haven't mixed up the <code>Size</code> or <code>Cut</code> values. What happens if I 'accidentally' change the 'inner' <code>Rod.solve</code> call to <code><span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">cuts2</span> = <span style="color:#2b91af;">Rod</span>.<span style="color:#74531f;">solve</span> <span style="font-weight:bold;color:#1f377f;">pl2</span> <span style="font-weight:bold;color:#1f377f;">n1</span></code>?
</p>
<p>
The code doesn't compile:
</p>
<blockquote>
<p>
Type mismatch. Expecting a 'Size<'a>' but given a 'Size<'b>' The type ''a' does not match the type ''b'
</p>
</blockquote>
<p>
This is fortunate, because <code>n1</code> wouldn't work with <code>pl2</code>. Consider that <code>n1</code> contains the number <code>3</code>, which is valid for the larger list <code>pl1</code>, but not the shorter list <code>pl2</code>.
</p>
<p>
Proofs are issued with a particular generic type argument - the type-level 'token', if you will. It's possible for a library API to explicitly propagate such proofs; you see a hint of that in <code>cut</code>, which not only takes as input a <code><span style="color:#2b91af;">Size</span><<span style="color:#2b91af;">'a</span>></code> value, but also issues new proofs as a result.
</p>
<p>
At the same time, this design prevents proofs from being mixed up. Each set of proofs belongs to a particular proof context.
</p>
<p>
You get the same compiler error if you accidentally mix up some of the other terms.
</p>
<h3 id="d143cd141fc143c49e14d8e600492dc0">
Conclusion <a href="#d143cd141fc143c49e14d8e600492dc0">#</a>
</h3>
<p>
One goal in the GDP paper is to introduce a type-safe API design that's also <em>ergonomic</em>. Matt Noonan, the author, defines <em>ergonomic</em> as a design where correct use of the API doesn't place an undue burden on the client developer. The paper's example language is <a href="https://www.haskell.org/">Haskell</a> where <a href="https://wiki.haskell.org/Rank-N_types">rank-2 polymorphism</a> has a low impact on the user.
</p>
<p>
F# only supports rank-2 polymorphism in method definitions, which makes consuming a GDP API more awkward than in Haskell. The need to create a new type, and the few lines of boilerplate that entails, is a drawback.
</p>
<p>
Even so, the GDP trick is a nice addition to your functional tool belt. You'll hardly need it every day, but I personally like having some specialized tools lying around together with the everyday ones.
</p>
<p>
But wait! The reason that F# supports rank-2 polymorphism through object methods is that C# has that language feature. This must mean that the GDP technique works in C# as well, doesn't it? Indeed it does.
</p>
<p>
<strong>Next:</strong> <a href="/2025/02/03/modelling-data-relationships-with-c-types">Modelling data relationships with C# types</a>.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Recawr Sandwich
https://blog.ploeh.dk/2025/01/13/recawr-sandwich
2025-01-13T15:52:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A pattern variation.</em>
</p>
<p>
After writing the articles <a href="/2024/11/18/collecting-and-handling-result-values">Collecting and handling result values</a> and <a href="/2024/12/02/short-circuiting-an-asynchronous-traversal">Short-circuiting an asynchronous traversal</a>, I realized that it might be valuable to describe a more disciplined variation of the <a href="/2020/03/02/impureim-sandwich">Impureim Sandwich</a> pattern.
</p>
<p>
The book <a href="/ref/dp">Design Patterns</a> describes each pattern over a number of sections. There's a description of the overall motivation, the structure of the pattern, UML diagrams, example code, and more. One section discusses various implementation variations. I find it worthwhile, too, to explicitly draw attention to a particular variation of the more general Impureim Sandwich pattern.
</p>
<p>
This variation imposes an additional constraint on the general pattern. While this may, at first glance, seem limiting, <a href="https://www.dotnetrocks.com/details/1542">constraints liberate</a>.
</p>
<p>
<img src="/content/binary/impureim-superset-of-recawr.png" alt="A subset labeled 'Recawr Sandwiches' contained in a superset labeled 'Impureim Sandwiches'.">
</p>
<p>
As a specialization, you may consider Recawr Sandwiches as a subset of all Impureim Sandwiches.
</p>
<h3 id="7b076cc0cc9148b9ba464bf41feb6128">
Read, calculate, write <a href="#7b076cc0cc9148b9ba464bf41feb6128">#</a>
</h3>
<p>
In short, the constraint is that the Sandwich should be organized in the following order:
</p>
<ul>
<li>Read data. This step is impure.</li>
<li>Calculate a result from the data. This step is a <a href="https://en.wikipedia.org/wiki/Pure_function">pure function</a>.</li>
<li>Write data. This step is impure.</li>
</ul>
<p>
If the sandwich has <a href="/2023/10/09/whats-a-sandwich">more than three layers</a>, this order should still be maintained. Once you start writing data to the network, to disk, to a database, or to the user interface, you shouldn't go back to reading in more data.
</p>
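<p>
As a rough sketch of that ordering, here's a tiny F# example. The function names and the file-based I/O are assumptions chosen only for illustration; they don't come from any of the code bases discussed in this article:
</p>
<p>
<pre>open System
open System.IO

// Pure: the result depends only on the input string.
let countWords (text : string) =
    text.Split([| ' '; '\n'; '\r'; '\t' |], StringSplitOptions.RemoveEmptyEntries).Length

// Hypothetical entry point: read, then calculate, then write.
let run (inputPath : string) (outputPath : string) =
    // Read (impure)
    let text = File.ReadAllText inputPath
    // Calculate (pure)
    let count = countWords text
    // Write (impure)
    File.WriteAllText (outputPath, string count)</pre>
</p>
<p>
Once <code>run</code> reaches the write step, it performs no further reads. All decisions are made by the pure <code>countWords</code> function, which can be tested without touching the file system.
</p>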
<h3 id="12089f0da99644849da33faf7dd8ffa4">
Naming <a href="#12089f0da99644849da33faf7dd8ffa4">#</a>
</h3>
<p>
The name <em>Recawr Sandwich</em> is made from the first letters of <em>REad CAlculate WRite</em>. It's pronounced <em>recover sandwich</em>.
</p>
<p>
When the idea of naming this variation originally came to me, I first thought of the name <em>read/write sandwich</em>, but then I thought that the most important ingredient, the pure function, was missing. I've considered some other variations, such as <em>read, pure, write sandwich</em> or <em>input, referential transparency, output sandwich</em>, but none of them quite gets the point across, I think, in the same way as <em>read, calculate, write</em>.
</p>
<h3 id="954558da563244edbb98a6685b3f9460">
Precipitating example <a href="#954558da563244edbb98a6685b3f9460">#</a>
</h3>
<p>
To be clear, I've been applying the Recawr Sandwich pattern for years, but it sometimes takes a counter-example before you realize that some implicit, tacit knowledge should be made explicit. This happened to me as I was discussing <a href="/2024/11/18/collecting-and-handling-result-values">this implementation</a> of Impureim Sandwich:
</p>
<p>
<pre><span style="color:green;">// Impure</span>
<span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">OneOf</span><<span style="color:#2b91af;">ShoppingListItem</span>, <span style="color:#2b91af;">NotFound</span><<span style="color:#2b91af;">ShoppingListItem</span>>, <span style="color:#2b91af;">Error</span>>> <span style="font-weight:bold;color:#1f377f;">results</span> =
    <span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">itemsToUpdate</span>.<span style="font-weight:bold;color:#74531f;">Traverse</span>(<span style="font-weight:bold;color:#1f377f;">item</span> => <span style="color:#74531f;">UpdateItem</span>(<span style="font-weight:bold;color:#1f377f;">item</span>, <span style="font-weight:bold;color:#1f377f;">dbContext</span>));
<span style="color:green;">// Pure</span>
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">result</span> = <span style="font-weight:bold;color:#1f377f;">results</span>.<span style="font-weight:bold;color:#74531f;">Aggregate</span>(
    <span style="color:blue;">new</span> <span style="color:#2b91af;">BulkUpdateResult</span>([], [], []),
    (<span style="font-weight:bold;color:#1f377f;">state</span>, <span style="font-weight:bold;color:#1f377f;">result</span>) =>
        <span style="font-weight:bold;color:#1f377f;">result</span>.<span style="font-weight:bold;color:#74531f;">Match</span>(
            <span style="font-weight:bold;color:#1f377f;">storedItem</span> => <span style="font-weight:bold;color:#1f377f;">state</span>.<span style="font-weight:bold;color:#74531f;">Store</span>(<span style="font-weight:bold;color:#1f377f;">storedItem</span>),
            <span style="font-weight:bold;color:#1f377f;">notFound</span> => <span style="font-weight:bold;color:#1f377f;">state</span>.<span style="font-weight:bold;color:#74531f;">Fail</span>(<span style="font-weight:bold;color:#1f377f;">notFound</span>.Item),
            <span style="font-weight:bold;color:#1f377f;">error</span> => <span style="font-weight:bold;color:#1f377f;">state</span>.<span style="font-weight:bold;color:#74531f;">Error</span>(<span style="font-weight:bold;color:#1f377f;">error</span>)));
<span style="color:green;">// Impure</span>
<span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">dbContext</span>.<span style="font-weight:bold;color:#74531f;">SaveChangesAsync</span>();
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">OkResult</span>(<span style="font-weight:bold;color:#1f377f;">result</span>);</pre>
</p>
<p>
Notice that the top impure step traverses a collection of items to apply each to an action called <code>UpdateItem</code>. As I discussed in the article, I don't actually know what <code>UpdateItem</code> does, but the name strongly suggests that it updates a particular database row. Even if the actual write doesn't happen until <code>SaveChangesAsync</code> is called, this still seems off.
</p>
<p>
To be honest, I didn't realize this until I started thinking about how I'd go about solving the implied problem, if I had to do it from scratch. Because I probably wouldn't do it like that at all.
</p>
<p>
It strikes me that doing the update 'too early' makes the code more complicated than it has to be.
</p>
<p>
What would a Recawr Sandwich look like?
</p>
<h3 id="e599dadd006a4d179289ba72a1978c1f">
Recawr example <a href="#e599dadd006a4d179289ba72a1978c1f">#</a>
</h3>
<p>
Perhaps one could instead start by querying the database about which items are actually in it, then prepare the result, and finally make the update.
</p>
<p>
<pre><span style="color:green;">// Read</span>
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">existing</span> = <span style="color:blue;">await</span> <span style="color:#74531f;">FilterExisting</span>(<span style="font-weight:bold;color:#1f377f;">itemsToUpdate</span>, <span style="font-weight:bold;color:#1f377f;">dbContext</span>);
<span style="color:green;">// Calculate</span>
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">result</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">BulkUpdateResult</span>([.. <span style="font-weight:bold;color:#1f377f;">existing</span>], [.. <span style="font-weight:bold;color:#1f377f;">itemsToUpdate</span>.<span style="font-weight:bold;color:#74531f;">Except</span>(<span style="font-weight:bold;color:#1f377f;">existing</span>)], []);
<span style="color:green;">// Write</span>
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">results</span> = <span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">existing</span>.<span style="font-weight:bold;color:#74531f;">Traverse</span>(<span style="font-weight:bold;color:#1f377f;">item</span> => <span style="color:#74531f;">UpdateItem</span>(<span style="font-weight:bold;color:#1f377f;">item</span>, <span style="font-weight:bold;color:#1f377f;">dbContext</span>));
<span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">dbContext</span>.<span style="font-weight:bold;color:#74531f;">SaveChangesAsync</span>();
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">OkResult</span>(<span style="font-weight:bold;color:#1f377f;">result</span>);</pre>
</p>
<p>
To be honest, this variation has different behaviour when <code>Error</code> values occur, but then again, I wasn't entirely sure what the purpose of the error value even was. If it's to <a href="/2024/01/29/error-categories-and-category-errors">model errors that client code can't recover from</a>, throw an exception instead.
</p>
<p>
In any case, the example is typical of many <a href="https://en.wikipedia.org/wiki/Input/output">I/O</a>-heavy operations, which veer dangerously close to the degenerate. There really isn't a lot of logic required, so one may reasonably ask whether the example is useful. It was, however, the example that got me thinking about giving the Recawr Sandwich an explicit name.
</p>
<h3 id="ef69b33222b44b3e889fc0c861537d48">
Other examples <a href="#ef69b33222b44b3e889fc0c861537d48">#</a>
</h3>
<p>
All the examples in the original <a href="/2020/03/02/impureim-sandwich">Impureim Sandwich</a> article are actually Recawr Sandwiches. Other articles with clear Recawr Sandwich examples are:
</p>
<ul>
<li><a href="/2019/09/09/picture-archivist-in-haskell">Picture archivist in Haskell</a></li>
<li><a href="/2019/09/16/picture-archivist-in-f">Picture archivist in F#</a></li>
<li><a href="/2021/09/06/the-command-handler-contravariant-functor">The Command Handler contravariant functor</a></li>
<li><a href="/2024/12/16/a-restaurant-sandwich">A restaurant sandwich</a></li>
</ul>
<p>
In other words, I'm just retroactively giving these examples a more specific label.
</p>
<p>
What's an example of an Impureim Sandwich which is <em>not</em> a Recawr Sandwich? Ironically, the first example in this article.
</p>
<h3 id="95dbd8e6364d429db7f040835d89e8e7">
Conclusion <a href="#95dbd8e6364d429db7f040835d89e8e7">#</a>
</h3>
<p>
A Recawr Sandwich is a specialization of the slightly more general Impureim Sandwich pattern. It specializes by assigning roles to the two impure layers of the sandwich. In the first, the code reads data. In the second impure layer, it writes data. In between, it performs referentially transparent calculations.
</p>
<p>
While more constraining, this specialization offers a good rule of thumb. Most well-designed sandwiches follow this template.
</p>
</div><hr>
Encapsulating rod-cutting
https://blog.ploeh.dk/2025/01/06/encapsulating-rod-cutting
2025-01-06T10:45:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Focusing on usage over implementation.</em>
</p>
<p>
This article is a part of a small article series about <a href="/2024/12/09/implementation-and-usage-mindsets">implementation and usage mindsets</a>. The hypothesis is that programmers who approach a problem with an implementation mindset may gravitate toward dynamically typed languages, whereas developers concerned with long-term maintenance and sustainability of a code base may be more inclined toward statically typed languages. This could be wrong, and is almost certainly too simplistic, but is still, I hope, worth thinking about. In the <a href="/2024/12/23/implementing-rod-cutting">previous article</a> you saw examples of an implementation-centric approach to problem-solving. In this article, I'll discuss what a usage-first perspective entails.
</p>
<p>
A usage perspective indicates that you're first and foremost concerned with how useful a programming interface is. It's what you do when you take advantage of test-driven development (TDD). First, you write a test, which furnishes an example of what a usage scenario looks like. Only then do you figure out how to implement the desired API.
</p>
<p>
In this article I didn't use TDD since I already had a particular implementation. Even so, while I didn't mention it in the previous article, I did add tests to verify that the code works as intended. In fact, because I wrote a few <a href="https://github.com/hedgehogqa/fsharp-hedgehog/">Hedgehog</a> properties, I have more than 10,000 test cases covering my implementation.
</p>
<p>
I bring this up because TDD is only one way to focus on sustainability and <a href="/encapsulation-and-solid">encapsulation</a>. It's the most scientific methodology that I know of, but you can employ more ad-hoc, ex-post analysis processes. I'll do that here.
</p>
<h3 id="03ca7a9c8c8146b6b7f0c1275ae9abcc">
Imperative origin <a href="#03ca7a9c8c8146b6b7f0c1275ae9abcc">#</a>
</h3>
<p>
In the <a href="/2024/12/23/implementing-rod-cutting">previous article</a> you saw how the <code>Extended-Bottom-Up-Cut-Rod</code> pseudocode was translated to this <a href="https://fsharp.org/">F#</a> function:
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:#74531f;">cut</span> (<span style="font-weight:bold;color:#1f377f;">p</span> : _ <span style="color:#2b91af;">array</span>) <span style="font-weight:bold;color:#1f377f;">n</span> =
    <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">r</span> = <span style="color:#2b91af;">Array</span>.<span style="color:#74531f;">zeroCreate</span> (<span style="font-weight:bold;color:#1f377f;">n</span> + 1)
    <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">s</span> = <span style="color:#2b91af;">Array</span>.<span style="color:#74531f;">zeroCreate</span> (<span style="font-weight:bold;color:#1f377f;">n</span> + 1)
    <span style="font-weight:bold;color:#1f377f;">r</span>[0] <span style="color:blue;"><-</span> 0
    <span style="color:blue;">for</span> <span style="font-weight:bold;color:#1f377f;">j</span> = 1 <span style="color:blue;">to</span> <span style="font-weight:bold;color:#1f377f;">n</span> <span style="color:blue;">do</span>
        <span style="color:blue;">let</span> <span style="color:blue;">mutable</span> <span style="color:#a08000;">q</span> = <span style="color:#2b91af;">Int32</span>.MinValue
        <span style="color:blue;">for</span> <span style="font-weight:bold;color:#1f377f;">i</span> = 1 <span style="color:blue;">to</span> <span style="font-weight:bold;color:#1f377f;">j</span> <span style="color:blue;">do</span>
            <span style="color:blue;">if</span> <span style="color:#a08000;">q</span> < <span style="font-weight:bold;color:#1f377f;">p</span>[<span style="font-weight:bold;color:#1f377f;">i</span>] + <span style="font-weight:bold;color:#1f377f;">r</span>[<span style="font-weight:bold;color:#1f377f;">j</span> - <span style="font-weight:bold;color:#1f377f;">i</span>] <span style="color:blue;">then</span>
                <span style="color:#a08000;">q</span> <span style="color:blue;"><-</span> <span style="font-weight:bold;color:#1f377f;">p</span>[<span style="font-weight:bold;color:#1f377f;">i</span>] + <span style="font-weight:bold;color:#1f377f;">r</span>[<span style="font-weight:bold;color:#1f377f;">j</span> - <span style="font-weight:bold;color:#1f377f;">i</span>]
                <span style="font-weight:bold;color:#1f377f;">s</span>[<span style="font-weight:bold;color:#1f377f;">j</span>] <span style="color:blue;"><-</span> <span style="font-weight:bold;color:#1f377f;">i</span>
        <span style="font-weight:bold;color:#1f377f;">r</span>[<span style="font-weight:bold;color:#1f377f;">j</span>] <span style="color:blue;"><-</span> <span style="color:#a08000;">q</span>
    <span style="font-weight:bold;color:#1f377f;">r</span>, <span style="font-weight:bold;color:#1f377f;">s</span></pre>
</p>
<p>
In case anyone is wondering: This is a bona-fide <a href="https://en.wikipedia.org/wiki/Pure_function">pure function</a>, even if the implementation is as imperative as can be. Given the same input, <code>cut</code> always returns the same output, and there are no side effects. We may wish to implement the function in a more <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> way, but that's not our first concern. <em>My</em> first concern, at least, is to make sure that preconditions, invariants, and postconditions are properly communicated.
</p>
<p>
The same goal applies to the <code>printSolution</code> action, also repeated here for your convenience.
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:#74531f;">printSolution</span> <span style="font-weight:bold;color:#1f377f;">p</span> <span style="font-weight:bold;color:#1f377f;">n</span> =
    <span style="color:blue;">let</span> _, <span style="font-weight:bold;color:#1f377f;">s</span> = <span style="color:#74531f;">cut</span> <span style="font-weight:bold;color:#1f377f;">p</span> <span style="font-weight:bold;color:#1f377f;">n</span>
    <span style="color:blue;">let</span> <span style="color:blue;">mutable</span> <span style="color:#a08000;">n</span> = <span style="font-weight:bold;color:#1f377f;">n</span>
    <span style="color:blue;">while</span> <span style="color:#a08000;">n</span> > 0 <span style="color:blue;">do</span>
        <span style="color:#74531f;">printfn</span> <span style="color:#a31515;">"</span><span style="color:#2b91af;">%i</span><span style="color:#a31515;">"</span> <span style="font-weight:bold;color:#1f377f;">s</span>[<span style="color:#a08000;">n</span>]
        <span style="color:#a08000;">n</span> <span style="color:blue;"><-</span> <span style="color:#a08000;">n</span> - <span style="font-weight:bold;color:#1f377f;">s</span>[<span style="color:#a08000;">n</span>]</pre>
</p>
<p>
I am interested in more idiomatic implementations, but they're <em>by definition</em> just implementation details, so first I'll discuss encapsulation. Or, if you will, the usage perspective.
</p>
<h3 id="bb978144e56743639e83448c9b1d4f01">
Names and types <a href="#bb978144e56743639e83448c9b1d4f01">#</a>
</h3>
<p>
Based on the above two code snippets, we're given two artefacts: <code>cut</code> and <code>printSolution</code>. Since F# is a statically typed language, each operation also has a type.
</p>
<p>
The type of <code>cut</code> is <code>int array -> int -> int array * int array</code>. If you're not super-comfortable with F# type signatures, this means that <code>cut</code> is a function that takes an integer array and an integer as inputs, and returns a tuple as output. The output tuple is a pair; that is, it contains two elements, and in this particular case, both elements have the same type: They are both integer arrays.
</p>
<p>
Likewise, the type of <code>printSolution</code> is <code>int array -> int -> unit</code>, which again indicates that inputs must be an integer array and an integer. In this case the output is <code>unit</code>, which, in a sense, corresponds to <code>void</code> in many <a href="https://en.wikipedia.org/wiki/C_(programming_language)">C</a>-based languages.
</p>
<p>
Both operations belong to a module called <code>Rod</code>, so their slightly longer, more formal names are <code>Rod.cut</code> and <code>Rod.printSolution</code>. Even so, <a href="/2020/11/23/good-names-are-skin-deep">good names are only skin-deep</a>, and I'm not even convinced that these are particularly good names. To be fair to myself, I adopted the names from the pseudocode from <a href="/ref/clrs">Introduction to Algorithms</a>. Had I been freer to name functions and design APIs, I might have chosen different names. As it is, there's currently no documentation, so the types are the only source of additional information.
</p>
<p>
Can we infer proper usage from these types? Do they sufficiently well communicate preconditions, invariants, and postconditions? In other words, do the types satisfactorily indicate the <em>contract</em> of each operation? Do the functions exhibit good <a href="/encapsulation-and-solid">encapsulation</a>?
</p>
<p>
We may start with the <code>cut</code> function. It takes as inputs an integer array and an integer. Are empty arrays allowed? Are all integers valid, or perhaps only natural numbers? What about zeroes? Are duplicates allowed? Does the array need to be sorted? Is there a relationship between the array and the integer? Can the single integer parameter be negative?
</p>
<p>
And what about the return value? Are the two integer arrays related in any way? Can one be empty, but the other large? Can they both be empty? May negative numbers or zeroes be present?
</p>
<p>
Similar questions apply to the <code>printSolution</code> action.
</p>
<p>
<a href="/2022/08/22/can-types-replace-validation">Not all such questions can be answered by types</a>, but since we already have a type system at our disposal, we might as well use it to address those questions that are easily modelled.
</p>
<h3 id="2a9f41707fb9425da8078a7181f6e7d6">
Encapsulating the relationship between price array and rod length <a href="#2a9f41707fb9425da8078a7181f6e7d6">#</a>
</h3>
<p>
The first question I decided to answer was this: <em>Is there a relationship between the array and the integer?</em>
</p>
<p>
The array, you may recall, is an array of prices. The integer is the length of the rod to cut up.
</p>
<p>
A relationship clearly exists. The length of the rod must not exceed the number of prices in the array. Since the array is prefixed with <code>0</code>, this means that the rod length must be strictly less than the array length. If it isn't, <code>cut</code> throws an <a href="https://learn.microsoft.com/dotnet/api/system.indexoutofrangeexception">IndexOutOfRangeException</a>. We can't calculate the optimal cuts if we lack price information.
</p>
<p>
Likewise, we can already infer that the length must be a non-negative number.
</p>
<p>
While we could choose to enforce this relationship with Guard Clauses, we may also consider a simpler API. Let the function infer the rod length from the array length.
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:#74531f;">cut</span> (<span style="font-weight:bold;color:#1f377f;">p</span> : _ <span style="color:#2b91af;">array</span>) =
    <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">n</span> = <span style="font-weight:bold;color:#1f377f;">p</span>.Length - 1
    <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">r</span> = <span style="color:#2b91af;">Array</span>.<span style="color:#74531f;">zeroCreate</span> (<span style="font-weight:bold;color:#1f377f;">n</span> + 1)
    <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">s</span> = <span style="color:#2b91af;">Array</span>.<span style="color:#74531f;">zeroCreate</span> (<span style="font-weight:bold;color:#1f377f;">n</span> + 1)
    <span style="font-weight:bold;color:#1f377f;">r</span>[0] <span style="color:blue;"><-</span> 0
    <span style="color:blue;">for</span> <span style="font-weight:bold;color:#1f377f;">j</span> = 1 <span style="color:blue;">to</span> <span style="font-weight:bold;color:#1f377f;">n</span> <span style="color:blue;">do</span>
        <span style="color:blue;">let</span> <span style="color:blue;">mutable</span> <span style="color:#a08000;">q</span> = <span style="color:#2b91af;">Int32</span>.MinValue
        <span style="color:blue;">for</span> <span style="font-weight:bold;color:#1f377f;">i</span> = 1 <span style="color:blue;">to</span> <span style="font-weight:bold;color:#1f377f;">j</span> <span style="color:blue;">do</span>
            <span style="color:blue;">if</span> <span style="color:#a08000;">q</span> < <span style="font-weight:bold;color:#1f377f;">p</span>[<span style="font-weight:bold;color:#1f377f;">i</span>] + <span style="font-weight:bold;color:#1f377f;">r</span>[<span style="font-weight:bold;color:#1f377f;">j</span> - <span style="font-weight:bold;color:#1f377f;">i</span>] <span style="color:blue;">then</span>
                <span style="color:#a08000;">q</span> <span style="color:blue;"><-</span> <span style="font-weight:bold;color:#1f377f;">p</span>[<span style="font-weight:bold;color:#1f377f;">i</span>] + <span style="font-weight:bold;color:#1f377f;">r</span>[<span style="font-weight:bold;color:#1f377f;">j</span> - <span style="font-weight:bold;color:#1f377f;">i</span>]
                <span style="font-weight:bold;color:#1f377f;">s</span>[<span style="font-weight:bold;color:#1f377f;">j</span>] <span style="color:blue;"><-</span> <span style="font-weight:bold;color:#1f377f;">i</span>
        <span style="font-weight:bold;color:#1f377f;">r</span>[<span style="font-weight:bold;color:#1f377f;">j</span>] <span style="color:blue;"><-</span> <span style="color:#a08000;">q</span>
    <span style="font-weight:bold;color:#1f377f;">r</span>, <span style="font-weight:bold;color:#1f377f;">s</span></pre>
</p>
<p>
You may argue that this API is more implicit, which <a href="https://peps.python.org/pep-0020/">we generally don't like</a>. The implication is that the rod length is determined by the array length. If you have a (one-indexed) price array of length <em>10</em>, then how do you calculate the optimal cuts for a rod of length <em>7</em>?
</p>
<p>
By shortening the price array:
</p>
<p>
<pre>> let p = [|0; 1; 5; 8; 9; 10; 17; 17; 20; 24; 30|];;
val p: int array = [|0; 1; 5; 8; 9; 10; 17; 17; 20; 24; 30|]
> cut (p |> Array.take (7 + 1));;
val it: int array * int array =
([|0; 1; 5; 8; 10; 13; 17; 18|], [|0; 1; 2; 3; 2; 2; 6; 1|])</pre>
</p>
<p>
This is clearly still sub-optimal. Notice, for example, how you need to add <code>1</code> to <code>7</code> in order to deal with the prefixed <code>0</code>. On the other hand, we're not done with the redesign, so it may be worth pursuing this course a little further.
</p>
<p>
(To be honest, while this is the direction I ultimately chose, I'm not blind to the disadvantages of this implicit design. It makes it less clear to a client developer how to indicate a rod length. An alternative design would keep the price array and the rod length as two separate parameters, but then introduce a Guard Clause to check that the rod length doesn't exceed the length of the price array. Outside of <a href="https://en.wikipedia.org/wiki/Dependent_type">dependent types</a> I can't think of a way to model such a relationship between two values, and I admit to having no practical experience with dependent types. All this said, however, it's also possible that I'm missing an obvious design alternative. If you can think of a way to model this relationship in a non-<a href="https://www.hillelwayne.com/post/constructive/">predicative</a> way, please <a href="https://github.com/ploeh/ploeh.github.com?tab=readme-ov-file#comments">write a comment</a>.)
</p>
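<p>
For comparison, here's a minimal sketch of what such a Guard Clause might look like. The <code>checkRodLength</code> helper is hypothetical and not part of the code base; it assumes the zero-prefixed price array used in the examples above, and a two-parameter <code>cut</code> function could call it before doing any work.
</p>
<p>
<pre>open System

// Hypothetical helper, for illustration only.
// Assumes that p is prefixed with 0, so that p[n] is the price
// of a rod of length n.
let checkRodLength (p : _ array) n =
    if n < 0 then
        raise (ArgumentOutOfRangeException (nameof n, "The rod length must not be negative."))
    if p.Length <= n then
        raise (ArgumentOutOfRangeException (nameof n, "No price is available for a rod of this length."))</pre>
</p>
<p>
This turns the implicit <code>IndexOutOfRangeException</code> into an explicit precondition check, but the relationship between the two parameters is still only visible at run time.
</p>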
<p>
I gave the <code>printSolution</code> action the same treatment, after first having extracted a <code>solve</code> function in order to <a href="/2016/09/26/decoupling-decisions-from-effects">separate decisions from effects</a>.
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:#74531f;">solve</span> <span style="font-weight:bold;color:#1f377f;">p</span> =
    <span style="color:blue;">let</span> _, <span style="font-weight:bold;color:#1f377f;">s</span> = <span style="color:#74531f;">cut</span> <span style="font-weight:bold;color:#1f377f;">p</span>
    <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">l</span> = <span style="color:#2b91af;">ResizeArray</span> ()
    <span style="color:blue;">let</span> <span style="color:blue;">mutable</span> <span style="color:#a08000;">n</span> = <span style="font-weight:bold;color:#1f377f;">p</span>.Length - 1
    <span style="color:blue;">while</span> <span style="color:#a08000;">n</span> > 0 <span style="color:blue;">do</span>
        <span style="font-weight:bold;color:#1f377f;">l</span>.<span style="font-weight:bold;color:#74531f;">Add</span> <span style="font-weight:bold;color:#1f377f;">s</span>[<span style="color:#a08000;">n</span>]
        <span style="color:#a08000;">n</span> <span style="color:blue;"><-</span> <span style="color:#a08000;">n</span> - <span style="font-weight:bold;color:#1f377f;">s</span>[<span style="color:#a08000;">n</span>]
    <span style="font-weight:bold;color:#1f377f;">l</span> |> <span style="color:#2b91af;">List</span>.<span style="color:#74531f;">ofSeq</span>

<span style="color:blue;">let</span> <span style="color:#74531f;">printSolution</span> <span style="font-weight:bold;color:#1f377f;">p</span> = <span style="color:#74531f;">solve</span> <span style="font-weight:bold;color:#1f377f;">p</span> |> <span style="color:#2b91af;">List</span>.<span style="color:#74531f;">iter</span> (<span style="color:#74531f;">printfn</span> <span style="color:#a31515;">"</span><span style="color:#2b91af;">%i</span><span style="color:#a31515;">"</span>)</pre>
</p>
<p>
The <em>implementation</em> of the <code>solve</code> function is still imperative, but if you view it as a black box, it's <a href="https://en.wikipedia.org/wiki/Referential_transparency">referentially transparent</a>. We'll get back to the implementation later.
</p>
<h3 id="01d4ef562a9d4552870ef093ae907f45">
Returning a list of cuts <a href="#01d4ef562a9d4552870ef093ae907f45">#</a>
</h3>
<p>
Let's return to all the questions I enumerated above, particularly the questions about the return value. Are the two integer arrays related?
</p>
<p>
Indeed they are! In fact, they have the same length.
</p>
<p>
As explained in the <a href="/2024/12/23/implementing-rod-cutting">previous article</a>, in the original pseudocode, the <code>r</code> array is supposed to be zero-indexed, but non-empty and containing <code>0</code> as the first element. The <code>s</code> array is supposed to be one-indexed, and be exactly one element shorter than the <code>r</code> array. In practice, in all three implementations shown in that article, I made both arrays zero-indexed, non-empty, and of the exact same length. This is also true for the F# implementation.
</p>
<p>
We can communicate this relationship much better to client developers by changing the return type of the <code>cut</code> function. Currently, the return type is <code>int array * int array</code>, indicating a pair of arrays. Instead, we can change the return type to an array of pairs, thereby indicating that the values are related two-and-two.
</p>
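<p>
As a small illustration of that intermediate step, and not the design I ended up with, the built-in <code>Array.zip</code> function turns a pair of arrays into an array of pairs. The input values are copied from the F# Interactive session shown earlier; the names <code>r</code>, <code>s</code>, and <code>pairs</code> are only for this sketch.
</p>
<p>
<pre>// Values copied from the F# Interactive output shown earlier.
let r = [|0; 1; 5; 8; 10; 13; 17; 18|]
let s = [|0; 1; 2; 3; 2; 2; 6; 1|]

// Array.zip pairs elements by index. It throws an exception if the
// two arrays have different lengths.
let pairs = Array.zip r s</pre>
</p>
<p>
The type of <code>pairs</code> is <code>(int * int) array</code>, so each revenue now travels together with its corresponding size.
</p>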
<p>
That would be a decent change, but we can further improve the API. A pair of integers is still implicit, because it isn't clear which integer represents the revenue and which one represents the size. Instead, we introduce a custom type with clear labels:
</p>
<p>
<pre><span style="color:blue;">type</span> <span style="color:#2b91af;">Cut</span> = { Revenue : <span style="color:#2b91af;">int</span>; Size : <span style="color:#2b91af;">int</span> }</pre>
</p>
<p>
Then we change the <code>cut</code> function to return a collection of <code>Cut</code> values:
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:#74531f;">cut</span> (<span style="font-weight:bold;color:#1f377f;">p</span> : _ <span style="color:#2b91af;">array</span>) =
    <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">n</span> = <span style="font-weight:bold;color:#1f377f;">p</span>.Length - 1
    <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">r</span> = <span style="color:#2b91af;">Array</span>.<span style="color:#74531f;">zeroCreate</span> (<span style="font-weight:bold;color:#1f377f;">n</span> + 1)
    <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">s</span> = <span style="color:#2b91af;">Array</span>.<span style="color:#74531f;">zeroCreate</span> (<span style="font-weight:bold;color:#1f377f;">n</span> + 1)
    <span style="font-weight:bold;color:#1f377f;">r</span>[0] <span style="color:blue;"><-</span> 0
    <span style="color:blue;">for</span> <span style="font-weight:bold;color:#1f377f;">j</span> = 1 <span style="color:blue;">to</span> <span style="font-weight:bold;color:#1f377f;">n</span> <span style="color:blue;">do</span>
        <span style="color:blue;">let</span> <span style="color:blue;">mutable</span> <span style="color:#a08000;">q</span> = <span style="color:#2b91af;">Int32</span>.MinValue
        <span style="color:blue;">for</span> <span style="font-weight:bold;color:#1f377f;">i</span> = 1 <span style="color:blue;">to</span> <span style="font-weight:bold;color:#1f377f;">j</span> <span style="color:blue;">do</span>
            <span style="color:blue;">if</span> <span style="color:#a08000;">q</span> < <span style="font-weight:bold;color:#1f377f;">p</span>[<span style="font-weight:bold;color:#1f377f;">i</span>] + <span style="font-weight:bold;color:#1f377f;">r</span>[<span style="font-weight:bold;color:#1f377f;">j</span> - <span style="font-weight:bold;color:#1f377f;">i</span>] <span style="color:blue;">then</span>
                <span style="color:#a08000;">q</span> <span style="color:blue;"><-</span> <span style="font-weight:bold;color:#1f377f;">p</span>[<span style="font-weight:bold;color:#1f377f;">i</span>] + <span style="font-weight:bold;color:#1f377f;">r</span>[<span style="font-weight:bold;color:#1f377f;">j</span> - <span style="font-weight:bold;color:#1f377f;">i</span>]
                <span style="font-weight:bold;color:#1f377f;">s</span>[<span style="font-weight:bold;color:#1f377f;">j</span>] <span style="color:blue;"><-</span> <span style="font-weight:bold;color:#1f377f;">i</span>
        <span style="font-weight:bold;color:#1f377f;">r</span>[<span style="font-weight:bold;color:#1f377f;">j</span>] <span style="color:blue;"><-</span> <span style="color:#a08000;">q</span>
    <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">result</span> = <span style="color:#2b91af;">ResizeArray</span> ()
    <span style="color:blue;">for</span> <span style="font-weight:bold;color:#1f377f;">i</span> = 0 <span style="color:blue;">to</span> <span style="font-weight:bold;color:#1f377f;">n</span> <span style="color:blue;">do</span>
        <span style="font-weight:bold;color:#1f377f;">result</span>.<span style="font-weight:bold;color:#74531f;">Add</span> { Revenue = <span style="font-weight:bold;color:#1f377f;">r</span>[<span style="font-weight:bold;color:#1f377f;">i</span>]; Size = <span style="font-weight:bold;color:#1f377f;">s</span>[<span style="font-weight:bold;color:#1f377f;">i</span>] }
    <span style="font-weight:bold;color:#1f377f;">result</span> |> <span style="color:#2b91af;">List</span>.<span style="color:#74531f;">ofSeq</span></pre>
</p>
<p>
The type of <code>cut</code> is now <code>int array -> Cut list</code>. Notice that I decided to return a linked list rather than an array. This is mostly because I consider linked lists to be more idiomatic than arrays in a context of functional programming (FP), but to be honest, I'm not sure that it makes much difference as a return value.
</p>
<p>
In any case, you'll observe that the implementation is still imperative. The main topic of this article is how to give an API good encapsulation, so I treat the actual code as an implementation detail. It's not the most important thing.
</p>
<h3 id="8dca0872ab584d0ebefc10200877adde">
Linked list input <a href="#8dca0872ab584d0ebefc10200877adde">#</a>
</h3>
<p>
Although I wrote that I'm not sure it makes much difference whether <code>cut</code> returns an array or a list, it does matter when it comes to input values. Currently, <code>cut</code> takes an <code>int array</code> as input.
</p>
<p>
As the implementation so amply demonstrates, F# arrays are mutable; you can mutate the cells of an array. A client developer may worry, then, whether <code>cut</code> modifies the input array.
</p>
<p>
From the implementation code we know that it doesn't, but encapsulation is all about sparing client developers the burden of having to read the implementation. Rather, an API should communicate its contract in as succinct a way as possible, either via documentation or the type system.
</p>
<p>
In this case, we can use the type system to communicate this postcondition. Changing the input type to a linked list effectively communicates to all users of the API that <code>cut</code> doesn't mutate the input. This is because F# linked lists are truly immutable.
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:#74531f;">cut</span> <span style="font-weight:bold;color:#1f377f;">prices</span> =
    <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">p</span> = <span style="font-weight:bold;color:#1f377f;">prices</span> |> <span style="color:#2b91af;">Array</span>.<span style="color:#74531f;">ofList</span>
    <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">n</span> = <span style="font-weight:bold;color:#1f377f;">p</span>.Length - 1
    <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">r</span> = <span style="color:#2b91af;">Array</span>.<span style="color:#74531f;">zeroCreate</span> (<span style="font-weight:bold;color:#1f377f;">n</span> + 1)
    <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">s</span> = <span style="color:#2b91af;">Array</span>.<span style="color:#74531f;">zeroCreate</span> (<span style="font-weight:bold;color:#1f377f;">n</span> + 1)
    <span style="font-weight:bold;color:#1f377f;">r</span>[0] <span style="color:blue;"><-</span> 0
    <span style="color:blue;">for</span> <span style="font-weight:bold;color:#1f377f;">j</span> = 1 <span style="color:blue;">to</span> <span style="font-weight:bold;color:#1f377f;">n</span> <span style="color:blue;">do</span>
        <span style="color:blue;">let</span> <span style="color:blue;">mutable</span> <span style="color:#a08000;">q</span> = <span style="color:#2b91af;">Int32</span>.MinValue
        <span style="color:blue;">for</span> <span style="font-weight:bold;color:#1f377f;">i</span> = 1 <span style="color:blue;">to</span> <span style="font-weight:bold;color:#1f377f;">j</span> <span style="color:blue;">do</span>
            <span style="color:blue;">if</span> <span style="color:#a08000;">q</span> < <span style="font-weight:bold;color:#1f377f;">p</span>[<span style="font-weight:bold;color:#1f377f;">i</span>] + <span style="font-weight:bold;color:#1f377f;">r</span>[<span style="font-weight:bold;color:#1f377f;">j</span> - <span style="font-weight:bold;color:#1f377f;">i</span>] <span style="color:blue;">then</span>
                <span style="color:#a08000;">q</span> <span style="color:blue;"><-</span> <span style="font-weight:bold;color:#1f377f;">p</span>[<span style="font-weight:bold;color:#1f377f;">i</span>] + <span style="font-weight:bold;color:#1f377f;">r</span>[<span style="font-weight:bold;color:#1f377f;">j</span> - <span style="font-weight:bold;color:#1f377f;">i</span>]
                <span style="font-weight:bold;color:#1f377f;">s</span>[<span style="font-weight:bold;color:#1f377f;">j</span>] <span style="color:blue;"><-</span> <span style="font-weight:bold;color:#1f377f;">i</span>
        <span style="font-weight:bold;color:#1f377f;">r</span>[<span style="font-weight:bold;color:#1f377f;">j</span>] <span style="color:blue;"><-</span> <span style="color:#a08000;">q</span>
    <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">result</span> = <span style="color:#2b91af;">ResizeArray</span> ()
    <span style="color:blue;">for</span> <span style="font-weight:bold;color:#1f377f;">i</span> = 0 <span style="color:blue;">to</span> <span style="font-weight:bold;color:#1f377f;">n</span> <span style="color:blue;">do</span>
        <span style="font-weight:bold;color:#1f377f;">result</span>.<span style="font-weight:bold;color:#74531f;">Add</span> { Revenue = <span style="font-weight:bold;color:#1f377f;">r</span>[<span style="font-weight:bold;color:#1f377f;">i</span>]; Size = <span style="font-weight:bold;color:#1f377f;">s</span>[<span style="font-weight:bold;color:#1f377f;">i</span>] }
    <span style="font-weight:bold;color:#1f377f;">result</span> |> <span style="color:#2b91af;">List</span>.<span style="color:#74531f;">ofSeq</span></pre>
</p>
<p>
The type of the <code>cut</code> function is now <code>int list -> Cut list</code>, which informs client developers of an invariant. You can trust that <code>cut</code> will not change the input arguments.
</p>
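<p>
To see why an array-typed parameter offers no such assurance, consider this contrived sketch. The <code>sneakySum</code> function is hypothetical and has nothing to do with rod-cutting; it only demonstrates that a function taking an array is free to overwrite the caller's data.
</p>
<p>
<pre>// Hypothetical function, for illustration only.
let sneakySum (xs : int array) =
    xs[0] <- 42 // Compiles: array cells are mutable.
    Array.sum xs

let prices = [|0; 1; 5|]
let total = sneakySum prices
// prices is now [|42; 1; 5|], and total is 48.

// The equivalent attempt against an int list doesn't compile,
// because F# lists offer no way to overwrite an element:
// let sneakySum' (xs : int list) = xs[0] <- 42</pre>
</p>
<p>
Nothing in the type <code>int array -> int</code> warns the caller about the mutation, whereas a function that takes an <code>int list</code> can't do this at all.
</p>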
<h3 id="fe67c3b6121e4be780bc3d7f3b166a00">
Natural numbers <a href="#fe67c3b6121e4be780bc3d7f3b166a00">#</a>
</h3>
<p>
You've probably gotten the point by now, so let's move a bit quicker. There are still issues that we'd like to document. Perhaps the worst part of the current API is that client code is required to supply a <code>prices</code> list where the first element <em>must</em> be zero. That's a very specific requirement. It's easy to forget, and if you do, the <code>cut</code> function just silently fails. It doesn't throw an exception; it just gives you a wrong answer.
</p>
<p>
We may choose to add a Guard Clause, but why are we even putting that responsibility on the client developer? Why can't the <code>cut</code> function add that prefix itself? It can, and it turns out that once you do that, and also remove the initial zero element from the output, you're now working with natural numbers.
</p>
<p>
First, add a <code>NaturalNumber</code> wrapper of integers:
</p>
<p>
<pre>type <span style="color:#2b91af;">NaturalNumber</span> = private <span style="color:#2b91af;">NaturalNumber</span> of <span style="color:#2b91af;">int</span> with
    member <span style="font-weight:bold;color:#1f377f;">this</span>.Value = let (<span style="color:#2b91af;">NaturalNumber</span> <span style="font-weight:bold;color:#1f377f;">i</span>) = <span style="font-weight:bold;color:#1f377f;">this</span> in <span style="font-weight:bold;color:#1f377f;">i</span>
    static member <span style="font-weight:bold;color:#74531f;">tryCreate</span> <span style="font-weight:bold;color:#1f377f;">candidate</span> =
        if <span style="font-weight:bold;color:#1f377f;">candidate</span> < 1 then <span style="color:#2b91af;">None</span> else <span style="color:#2b91af;">Some</span> <| <span style="color:#2b91af;">NaturalNumber</span> <span style="font-weight:bold;color:#1f377f;">candidate</span>
    override <span style="font-weight:bold;color:#1f377f;">this</span>.<span style="font-weight:bold;color:#74531f;">ToString</span> () = let (<span style="color:#2b91af;">NaturalNumber</span> <span style="font-weight:bold;color:#1f377f;">i</span>) = <span style="font-weight:bold;color:#1f377f;">this</span> in <span style="color:#74531f;">string</span> <span style="font-weight:bold;color:#1f377f;">i</span></pre>
</p>
<p>
Since the case constructor is <code>private</code>, external code can only <em>try</em> to create values. Once you have a <code>NaturalNumber</code> value, you know that it's valid, but creation requires a run-time check. In other words, this is what Hillel Wayne calls <a href="https://www.hillelwayne.com/post/constructive/">predicative data</a>.
</p>
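<p>
A short usage sketch may make the run-time check more concrete. The type definition is repeated so that the snippet stands alone; the names <code>valid</code>, <code>invalid</code>, and <code>text</code> exist only for this example.
</p>
<p>
<pre>type NaturalNumber = private NaturalNumber of int with
    member this.Value = let (NaturalNumber i) = this in i
    static member tryCreate candidate =
        if candidate < 1 then None else Some <| NaturalNumber candidate
    override this.ToString () = let (NaturalNumber i) = this in string i

// Creation may fail, so the result is an option.
let valid = NaturalNumber.tryCreate 7   // Some value wrapping 7
let invalid = NaturalNumber.tryCreate 0 // None

// Once a value exists, no further validation is needed.
let text = valid |> Option.map string   // Some "7"</pre>
</p>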
<p>
Armed with this new type, however, we can now strengthen the definition of the <code>Cut</code> record type:
</p>
<p>
<pre><span style="color:blue;">type</span> <span style="color:#2b91af;">Cut</span> = { Revenue : <span style="color:#2b91af;">int</span>; Size : <span style="color:#2b91af;">NaturalNumber</span> } <span style="color:blue;">with</span>
    <span style="color:blue;">static</span> <span style="color:blue;">member</span> <span style="font-weight:bold;color:#74531f;">tryCreate</span> <span style="font-weight:bold;color:#1f377f;">revenue</span> <span style="font-weight:bold;color:#1f377f;">size</span> =
        <span style="color:#2b91af;">NaturalNumber</span>.<span style="font-weight:bold;color:#74531f;">tryCreate</span> <span style="font-weight:bold;color:#1f377f;">size</span>
        |> <span style="color:#2b91af;">Option</span>.<span style="color:#74531f;">map</span> (<span style="color:blue;">fun</span> <span style="font-weight:bold;color:#1f377f;">size</span> <span style="color:blue;">-></span> { Revenue = <span style="font-weight:bold;color:#1f377f;">revenue</span>; Size = <span style="font-weight:bold;color:#1f377f;">size</span> })</pre>
</p>
<p>
The <code>Revenue</code> may still be any integer, because it turns out that the algorithm also works with negative prices. (For a book that's very meticulous in its analysis of algorithms, <a href="/ref/clrs">CLRS</a> is surprisingly silent on this topic. Thorough testing with <a href="https://github.com/hedgehogqa/fsharp-hedgehog">Hedgehog</a>, however, indicates that this is so.) On the other hand, the <code>Size</code> of the <code>Cut</code> must be a <code>NaturalNumber</code>. Since, again, we don't have any constructive way (outside of using <a href="https://en.wikipedia.org/wiki/Refinement_type">refinement types</a>) of modelling this requirement, we also supply a <code>tryCreate</code> function.
</p>
<p>
This enables us to define the <code>cut</code> function like this:
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:#74531f;">cut</span> <span style="font-weight:bold;color:#1f377f;">prices</span> =
<span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">p</span> = <span style="font-weight:bold;color:#1f377f;">prices</span> |> <span style="color:#2b91af;">List</span>.<span style="color:#74531f;">append</span> [0] |> <span style="color:#2b91af;">Array</span>.<span style="color:#74531f;">ofList</span>
<span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">n</span> = <span style="font-weight:bold;color:#1f377f;">p</span>.Length - 1
<span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">r</span> = <span style="color:#2b91af;">Array</span>.<span style="color:#74531f;">zeroCreate</span> (<span style="font-weight:bold;color:#1f377f;">n</span> + 1)
<span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">s</span> = <span style="color:#2b91af;">Array</span>.<span style="color:#74531f;">zeroCreate</span> (<span style="font-weight:bold;color:#1f377f;">n</span> + 1)
<span style="font-weight:bold;color:#1f377f;">r</span>[0] <span style="color:blue;"><-</span> 0
<span style="color:blue;">for</span> <span style="font-weight:bold;color:#1f377f;">j</span> = 1 <span style="color:blue;">to</span> <span style="font-weight:bold;color:#1f377f;">n</span> <span style="color:blue;">do</span>
<span style="color:blue;">let</span> <span style="color:blue;">mutable</span> <span style="color:#a08000;">q</span> = <span style="color:#2b91af;">Int32</span>.MinValue
<span style="color:blue;">for</span> <span style="font-weight:bold;color:#1f377f;">i</span> = 1 <span style="color:blue;">to</span> <span style="font-weight:bold;color:#1f377f;">j</span> <span style="color:blue;">do</span>
<span style="color:blue;">if</span> <span style="color:#a08000;">q</span> < <span style="font-weight:bold;color:#1f377f;">p</span>[<span style="font-weight:bold;color:#1f377f;">i</span>] + <span style="font-weight:bold;color:#1f377f;">r</span>[<span style="font-weight:bold;color:#1f377f;">j</span> - <span style="font-weight:bold;color:#1f377f;">i</span>] <span style="color:blue;">then</span>
<span style="color:#a08000;">q</span> <span style="color:blue;"><-</span> <span style="font-weight:bold;color:#1f377f;">p</span>[<span style="font-weight:bold;color:#1f377f;">i</span>] + <span style="font-weight:bold;color:#1f377f;">r</span>[<span style="font-weight:bold;color:#1f377f;">j</span> - <span style="font-weight:bold;color:#1f377f;">i</span>]
<span style="font-weight:bold;color:#1f377f;">s</span>[<span style="font-weight:bold;color:#1f377f;">j</span>] <span style="color:blue;"><-</span> <span style="font-weight:bold;color:#1f377f;">i</span>
<span style="font-weight:bold;color:#1f377f;">r</span>[<span style="font-weight:bold;color:#1f377f;">j</span>] <span style="color:blue;"><-</span> <span style="color:#a08000;">q</span>
<span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">result</span> = <span style="color:#2b91af;">ResizeArray</span> ()
<span style="color:blue;">for</span> <span style="font-weight:bold;color:#1f377f;">i</span> = 1 <span style="color:blue;">to</span> <span style="font-weight:bold;color:#1f377f;">n</span> <span style="color:blue;">do</span>
<span style="color:#2b91af;">Cut</span>.<span style="font-weight:bold;color:#74531f;">tryCreate</span> <span style="font-weight:bold;color:#1f377f;">r</span>[<span style="font-weight:bold;color:#1f377f;">i</span>] <span style="font-weight:bold;color:#1f377f;">s</span>[<span style="font-weight:bold;color:#1f377f;">i</span>] |> <span style="color:#2b91af;">Option</span>.<span style="color:#74531f;">iter</span> <span style="font-weight:bold;color:#1f377f;">result</span>.<span style="font-weight:bold;color:#74531f;">Add</span>
<span style="font-weight:bold;color:#1f377f;">result</span> |> <span style="color:#2b91af;">List</span>.<span style="color:#74531f;">ofSeq</span></pre>
</p>
<p>
It still has the type <code>int list -> Cut list</code>, but the <code>Cut</code> type is now more restrictively designed. In other words, we've provided a more conservative definition of what we return, in keeping with <a href="https://en.wikipedia.org/wiki/Robustness_principle">Postel's law</a>.
</p>
<p>
Furthermore, notice that the first line prepends <code>0</code> to the <code>p</code> array, so that the client developer doesn't have to do that. Likewise, when returning the result, the <code>for</code> loop goes from <code>1</code> to <code>n</code>, which means that it omits the first zero cut.
</p>
<p>
These changes ripple through and also improve encapsulation of the <code>solve</code> function:
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:#74531f;">solve</span> <span style="font-weight:bold;color:#1f377f;">prices</span> =
<span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">cuts</span> = <span style="color:#74531f;">cut</span> <span style="font-weight:bold;color:#1f377f;">prices</span>
<span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">l</span> = <span style="color:#2b91af;">ResizeArray</span> ()
<span style="color:blue;">let</span> <span style="color:blue;">mutable</span> <span style="color:#a08000;">n</span> = <span style="font-weight:bold;color:#1f377f;">prices</span>.Length
<span style="color:blue;">while</span> <span style="color:#a08000;">n</span> > 0 <span style="color:blue;">do</span>
<span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">idx</span> = <span style="color:#a08000;">n</span> - 1
<span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">s</span> = <span style="font-weight:bold;color:#1f377f;">cuts</span>.[<span style="font-weight:bold;color:#1f377f;">idx</span>].Size
<span style="font-weight:bold;color:#1f377f;">l</span>.<span style="font-weight:bold;color:#74531f;">Add</span> <span style="font-weight:bold;color:#1f377f;">s</span>
<span style="color:#a08000;">n</span> <span style="color:blue;"><-</span> <span style="color:#a08000;">n</span> - <span style="font-weight:bold;color:#1f377f;">s</span>.Value
<span style="font-weight:bold;color:#1f377f;">l</span> |> <span style="color:#2b91af;">List</span>.<span style="color:#74531f;">ofSeq</span></pre>
</p>
<p>
The type of <code>solve</code> is now <code>int list -> NaturalNumber list</code>.
</p>
<p>
This is about as strong as I can think of making the API using F#'s type system. A type like <code>int list -> NaturalNumber list</code> tells you something about what you're allowed to do, what you're expected to do, and what you can expect in return. You can provide (almost) any list of integers: positive, zero, or negative. You may also give an empty list. If we had wanted to prevent that, we could have used a <code>NonEmpty</code> list, as seen (among other places) in the article <a href="/2024/05/06/conservative-codomain-conjecture">Conservative codomain conjecture</a>.
</p>
<p>
Okay, to be perfectly honest, there's one more change that might be in order, but this is where I ran out of steam. One remaining precondition that I haven't yet discussed is that the input list must not contain 'too big' numbers. The problem is that the algorithm adds numbers together, and since 32-bit integers are bounded, you could run into overflow situations. Ask me how I know.
</p>
<p>
Changing the types to use 64-bit integers doesn't solve that problem (it only moves the boundary of where overflow happens), but consistently changing the API to work with <a href="https://learn.microsoft.com/dotnet/api/system.numerics.biginteger">BigInteger</a> values might. To be honest, I haven't tried.
</p>
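<p>
To make the overflow concern concrete, here's a minimal sketch in Python, chosen only because its built-in integers are arbitrary-precision, much like <code>BigInteger</code>. The <code>add_int32</code> helper is hypothetical and not part of the F# code base; it emulates 32-bit wrap-around with <code>ctypes.c_int32</code>, on the assumption that the F# code uses default, unchecked arithmetic.
</p>

```python
import ctypes

INT32_MAX = 2**31 - 1


def add_int32(a, b):
    # Hypothetical helper: emulate 32-bit signed addition.
    # ctypes.c_int32 performs no overflow checking, so the sum wraps around.
    return ctypes.c_int32(a + b).value


# Adding 1 to the largest 32-bit value wraps to the most negative one.
wrapped = add_int32(INT32_MAX, 1)

# Python's own int simply grows, which is the behaviour BigInteger would give.
unbounded = INT32_MAX + 1
```

<p>
With bounded integers, a sum of two large prices can silently turn negative, which would corrupt the revenue comparison. With unbounded integers, the sum just gets bigger.
</p>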
<h3 id="641bc16e730542a1a4a231886d208f24">
Functional programming <a href="#641bc16e730542a1a4a231886d208f24">#</a>
</h3>
<p>
From an encapsulation perspective, we're done now. By using the type system, we've emphasized how to <em>use</em> the API, rather than how it's implemented. Along the way, we even hid away some warts that came with the implementation. If I wanted to take this further, I would seriously consider making the <code>cut</code> function a <code>private</code> helper function, because it doesn't really return a solution. It only returns an intermediary value that makes it easier for the <code>solve</code> function to return the actual solution.
</p>
<p>
If you're even just a little bit familiar with F# or functional programming, you may have found it painful to read this far. <em>All that imperative code. My eyes! For the love of God, please rewrite the implementation with proper FP idioms and patterns.</em>
</p>
<p>
Well, the point of the whole article is that the implementation doesn't really matter. It's how client code may <em>use</em> the API that's important.
</p>
<p>
That is, of course, until you have to go and change the implementation code. In any case, as a little consolation prize for those brave FP readers who've made it all this way, here follow more functional implementations of the functions.
</p>
<p>
The <code>NaturalNumber</code> and <code>Cut</code> types haven't changed, so the first change comes with the <code>cut</code> function:
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:blue;">private</span> <span style="color:#74531f;">cons</span> <span style="font-weight:bold;color:#1f377f;">x</span> <span style="font-weight:bold;color:#1f377f;">xs</span> = <span style="font-weight:bold;color:#1f377f;">x</span> <span style="color:#2b91af;">::</span> <span style="font-weight:bold;color:#1f377f;">xs</span>
<span style="color:blue;">let</span> <span style="color:#74531f;">cut</span> <span style="font-weight:bold;color:#1f377f;">prices</span> =
<span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">p</span> = 0 <span style="color:#2b91af;">::</span> <span style="font-weight:bold;color:#1f377f;">prices</span> |> <span style="color:#2b91af;">Array</span>.<span style="color:#74531f;">ofList</span>
<span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">n</span> = <span style="font-weight:bold;color:#1f377f;">p</span>.Length - 1
<span style="color:blue;">let</span> <span style="color:#74531f;">findBestCut</span> <span style="font-weight:bold;color:#1f377f;">revenues</span> <span style="font-weight:bold;color:#1f377f;">j</span> =
[1..<span style="font-weight:bold;color:#1f377f;">j</span>]
|> <span style="color:#2b91af;">List</span>.<span style="color:#74531f;">map</span> (<span style="color:blue;">fun</span> <span style="font-weight:bold;color:#1f377f;">i</span> <span style="color:blue;">-></span> <span style="font-weight:bold;color:#1f377f;">p</span>[<span style="font-weight:bold;color:#1f377f;">i</span>] + <span style="color:#2b91af;">Map</span>.<span style="color:#74531f;">find</span> (<span style="font-weight:bold;color:#1f377f;">j</span> - <span style="font-weight:bold;color:#1f377f;">i</span>) <span style="font-weight:bold;color:#1f377f;">revenues</span>, <span style="font-weight:bold;color:#1f377f;">i</span>)
|> <span style="color:#2b91af;">List</span>.<span style="color:#74531f;">maxBy</span> <span style="color:#74531f;">fst</span>
<span style="color:blue;">let</span> <span style="color:#74531f;">aggregate</span> <span style="font-weight:bold;color:#1f377f;">acc</span> <span style="font-weight:bold;color:#1f377f;">j</span> =
<span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">revenues</span> = <span style="color:#74531f;">snd</span> <span style="font-weight:bold;color:#1f377f;">acc</span>
<span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">q</span>, <span style="font-weight:bold;color:#1f377f;">i</span> = <span style="color:#74531f;">findBestCut</span> <span style="font-weight:bold;color:#1f377f;">revenues</span> <span style="font-weight:bold;color:#1f377f;">j</span>
<span style="color:blue;">let</span> <span style="color:#74531f;">cuts</span> = <span style="color:#74531f;">fst</span> <span style="font-weight:bold;color:#1f377f;">acc</span>
<span style="color:#74531f;">cuts</span> << (<span style="color:#74531f;">cons</span> (<span style="font-weight:bold;color:#1f377f;">q</span>, <span style="font-weight:bold;color:#1f377f;">i</span>)), <span style="color:#2b91af;">Map</span>.<span style="color:#74531f;">add</span> <span style="font-weight:bold;color:#1f377f;">revenues</span>.Count <span style="font-weight:bold;color:#1f377f;">q</span> <span style="font-weight:bold;color:#1f377f;">revenues</span>
[1..<span style="font-weight:bold;color:#1f377f;">n</span>]
|> <span style="color:#2b91af;">List</span>.<span style="color:#74531f;">fold</span> <span style="color:#74531f;">aggregate</span> (<span style="color:#74531f;">id</span>, <span style="color:#2b91af;">Map</span>.<span style="color:#74531f;">add</span> 0 0 <span style="color:#2b91af;">Map</span>.empty)
|> <span style="color:#74531f;">fst</span> <| [] <span style="color:green;">// Evaluate Hughes list</span>
|> <span style="color:#2b91af;">List</span>.<span style="color:#74531f;">choose</span> (<span style="color:blue;">fun</span> (<span style="font-weight:bold;color:#1f377f;">r</span>, <span style="font-weight:bold;color:#1f377f;">i</span>) <span style="color:blue;">-></span> <span style="color:#2b91af;">Cut</span>.<span style="font-weight:bold;color:#74531f;">tryCreate</span> <span style="font-weight:bold;color:#1f377f;">r</span> <span style="font-weight:bold;color:#1f377f;">i</span>)</pre>
</p>
<p>
Even here, however, some implementation choices are dubious at best. For instance, I decided to use a Hughes list or difference list (see <a href="/2015/12/22/tail-recurse">Tail Recurse</a> for a detailed explanation of how this works in F#) without measuring whether or not it was better than just using normal <em>list consing</em> followed by <code>List.rev</code> (which is, in fact, often faster). That's one of the advantages of writing code for articles; such things don't really matter that much in that context.
</p>
<p>
Another choice that may leave you scratching your head is that I decided to model the <code>revenues</code> as a map (that is, an immutable dictionary) rather than an array. I did this because I was concerned that with the move towards immutable code, I'd have <code>n</code> reallocations of arrays. Perhaps, I thought, adding incrementally to a <code>Map</code> structure would be more efficient.
</p>
<p>
But really, all of that is just wanking, because I haven't measured.
</p>
<p>
The FP-style implementation of <code>solve</code> is, I believe, less controversial:
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:#74531f;">solve</span> <span style="font-weight:bold;color:#1f377f;">prices</span> =
<span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">cuts</span> = <span style="color:#74531f;">cut</span> <span style="font-weight:bold;color:#1f377f;">prices</span>
<span style="color:blue;">let</span> <span style="color:blue;">rec</span> <span style="color:#74531f;">imp</span> <span style="font-weight:bold;color:#1f377f;">n</span> =
<span style="color:blue;">if</span> <span style="font-weight:bold;color:#1f377f;">n</span> <= 0 <span style="color:blue;">then</span> [] <span style="color:blue;">else</span>
<span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">idx</span> = <span style="font-weight:bold;color:#1f377f;">n</span> - 1
<span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">s</span> = <span style="font-weight:bold;color:#1f377f;">cuts</span>[<span style="font-weight:bold;color:#1f377f;">idx</span>].Size
<span style="font-weight:bold;color:#1f377f;">s</span> <span style="color:#2b91af;">::</span> <span style="color:#74531f;">imp</span> (<span style="font-weight:bold;color:#1f377f;">n</span> - <span style="font-weight:bold;color:#1f377f;">s</span>.Value)
<span style="color:#74531f;">imp</span> <span style="font-weight:bold;color:#1f377f;">prices</span>.Length</pre>
</p>
<p>
This is a fairly standard implementation using a local recursive helper function.
</p>
<p>
Both <code>cut</code> and <code>solve</code> have the types previously reported. In other words, this final refactoring to functional implementations didn't change their types.
</p>
<h3 id="c009b6e42470466c9556f52a7c5af175">
Conclusion <a href="#c009b6e42470466c9556f52a7c5af175">#</a>
</h3>
<p>
This article goes through a series of code improvements to illustrate how a static type system can make it easier to use an API. Use it <em>correctly</em>, that is.
</p>
<p>
There's a common misconception about ease of use that it implies typing fewer characters, or getting instant <a href="/2024/05/13/gratification">gratification</a>. That's not my position. <a href="/2018/09/17/typing-is-not-a-programming-bottleneck">Typing is not a bottleneck</a>, and in any case, not much is gained if you make it easier for client developers to get the wrong answers from your API.
</p>
<p>
Static types give you a consistent vocabulary you can use to communicate an API's contract to client developers. What must client code do in order to make a valid method or function call? What guarantees can client code rely on? <a href="/encapsulation-and-solid">Encapsulation</a>, in other words.
</p>
<ins datetime="2025-01-20">
<p>
<strong>P.S. 2025-01-20:</strong>
</p>
<p>
For a type-level technique for modelling the relationship between rod size and price list, see <a href="/2025/01/20/modelling-data-relationships-with-f-types">Modelling data relationships with F# types</a>.
</p>
</ins>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Pytest is fast
https://blog.ploeh.dk/2024/12/30/pytest-is-fast
2024-12-30T16:01:00+00:00
Mark Seemann
<div id="post">
<p>
<em>One major attraction of Python. A recent realization.</em>
</p>
<p>
Ever since I became aware of the distinction between statically and dynamically typed languages, I've struggled to understand the attraction of dynamically typed languages. As regular readers may have noticed, this is <a href="/2021/08/09/am-i-stuck-in-a-local-maximum">a bias that doesn't sit well with me</a>. Clearly, there are advantages to dynamic languages that I fail to notice. Is it <a href="/2024/12/09/implementation-and-usage-mindsets">a question of mindset</a>? Or is it a combination of several small advantages?
</p>
<p>
In this article, I'll discuss another potential benefit of at least one dynamically typed language, <a href="https://www.python.org/">Python</a>.
</p>
<h3 id="c9d1927a5f6e4df0bdb1e71766f37d1f">
Fast feedback <a href="#c9d1927a5f6e4df0bdb1e71766f37d1f">#</a>
</h3>
<p>
Rapid feedback is a cornerstone of <a href="/ref/modern-software-engineering">modern software engineering</a>. I've always considered the <a href="/2011/04/29/Feedbackmechanismsandtradeoffs">feedback from the compiler an important mechanism</a>, but I've recently begun to realize that it comes with a price. While a good type system keeps you honest, compilation takes time, too.
</p>
<p>
Since I've been so entrenched in the camp of statically typed languages (C#, <a href="https://fsharp.org/">F#</a>, <a href="https://www.haskell.org/">Haskell</a>), I've tended to regard compilation as a mandatory step. And since the compiler needs to run anyway, you might as well take advantage of it. Use the type system to <a href="https://blog.janestreet.com/effective-ml-video/">make illegal states unrepresentable</a>, and all that.
</p>
<p>
Even so, I've noticed that compilation time isn't a fixed size. This observation surely borders on the banal, but with sufficient cognitive bias, it can, apparently, take years to come to even such a trivial conclusion. After initial years with various programming languages, my formative years as a programmer were spent with C#. As it turns out, the C# compiler is relatively fast.
</p>
<p>
This is probably a combination of factors. Since C# is one of the most popular languages, it has a big and skilled engineering team, and it's my impression that much effort goes into making it as fast and efficient as possible.
</p>
<p>
I also speculate that, since the C# type system isn't as powerful as F#'s or Haskell's, there's simply work that it can't do. When you can't express certain constraints or relationships with the type system, the compiler can't check them, either.
</p>
<p>
That said, the C# compiler seems to have become slower over the years. This could be a consequence of all the extra language features that accumulate.
</p>
<p>
The F# compiler, in comparison, has always taken longer than the C# compiler. Again, this may be due to a combination of a smaller engineering team and the fact that it actually <em>can</em> check more things at compile time, since the type system is more expressive.
</p>
<p>
This, at least, seems to fit with the observation that the Haskell compiler is even slower than the F# compiler. The language is incredibly expressive. There are a lot of constraints and relationships that you can model with the type system. Clearly, the compiler has to perform extra work to check that your types line up.
</p>
<p>
You're often left with the impression that <em>if it compiles, it works</em>. The drawback is that getting Haskell code to compile may be a non-trivial undertaking.
</p>
<p>
One thing is that you'll have to wait for the compiler. Another is that if you practice test-driven development (TDD), you'll have to compile the test code, too. Only once the tests are compiled can you run them. And <a href="/2012/05/24/TDDtestsuitesshouldrunin10secondsorless">TDD test suites should run in 10 seconds or less</a>.
</p>
<h3 id="c156a0786a0940a29d37d9982881d5d5">
Skipping compilation with pytest <a href="#c156a0786a0940a29d37d9982881d5d5">#</a>
</h3>
<p>
A few years ago I had to learn a bit of Python, so I decided to try <a href="https://adventofcode.com/2022">Advent of Code 2022</a> in Python. As the puzzles got harder, I added unit tests with <a href="https://pytest.org/">pytest</a>. When I ran them, I was taken aback at how fast they ran.
</p>
<p>
There's no compilation step, so the test suite runs immediately. Obviously, if you've made a mistake that a compiler would have caught, the test fails, but if the code makes sense to the interpreter, it just runs.
</p>
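<p>
As an illustration of that trade-off, here's a minimal sketch. The function and test names are made up for this example and don't come from my Advent of Code repository. The first test is the kind of thing pytest would discover and run at once; the second function shows a mistake that a compiler would have rejected, but which only surfaces when the offending line executes.
</p>

```python
def total_length(words):
    # Sums the lengths of the given strings.
    return sum(len(w) for w in words)


def test_total_length():
    # pytest collects functions named test_* and runs them without a build step.
    assert total_length(["rod", "cut"]) == 6


def call_with_wrong_type():
    # Passing integers is a mistake a static type checker would flag.
    # Here, nothing complains until len() is applied to an int at run time.
    try:
        total_length([1, 2])
    except TypeError as err:
        return type(err).__name__
    return "no error"
```

<p>
In other words, the feedback still arrives, but as a failing test rather than a compiler error.
</p>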
<p>
For various reasons, I ran out of steam, as one does with Advent of Code, but I managed to write a good little test suite. Until day 17, it ran in 0.15-0.20 seconds on my little laptop. To be honest, though, once I added tests for day 17, feedback time jumped to just under two seconds. This is clearly because I'd written some inefficient code for my System Under Test.
</p>
<p>
I can't really blame a test framework for being slow, when it's really my own code that slows it down.
</p>
<p>
A counter-argument is that a compiled language is much faster than an interpreted one. Thus, one might think that a faster language would counter poor implementations. Not so.
</p>
<h3 id="c97e1f3c539a4ef78d2a07f25e9c6b0c">
TDD with Haskell <a href="#c97e1f3c539a4ef78d2a07f25e9c6b0c">#</a>
</h3>
<p>
As I've already outlined, the Haskell compiler takes more time than C#, and obviously it takes more time than a language that isn't compiled at all. On the other hand, Haskell compiles to native machine code. My experience with it is that once you've compiled your program, it's <em>fast</em>.
</p>
<p>
In order to compare the two styles, I decided to record compilation and test times while doing <a href="https://adventofcode.com/">Advent of Code 2024</a> in Haskell. I set up a Haskell code base with <a href="https://haskellstack.org/">Stack</a> and <a href="https://hackage.haskell.org/package/HUnit">HUnit</a>, as <a href="/2018/05/07/inlined-hunit-test-lists">I usually do</a>. As I worked through the puzzles, I'd be adding and running tests. Every time, I recorded how long it took, using the <a href="https://en.wikipedia.org/wiki/Time_(Unix)">time</a> command to measure the duration of each <code>stack test</code> run.
</p>
<p>
I've plotted the observations in this chart:
</p>
<p>
<img src="/content/binary/haskell-compile-and-test-times.png" alt="Scatter plot of more than a thousand compile-and-test times, measured in seconds.">
</p>
<p>
The chart shows more than a thousand observations, with the first to the left, and the latest to the right. Each recorded time covers the entire span from when I started a test run until I had an answer. For this, I used the time command's <em>real</em> time measurement, rather than <em>user</em> or <em>sys</em> time. What matters is the feedback time; not the CPU time.
</p>
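<p>
The difference between the two kinds of measurement can be sketched in a few lines of Python. This isn't how I took the measurements (I used the time command); the <code>measure</code> and <code>wait_a_little</code> functions are hypothetical, and the sketch only covers the current process, whereas the time command also counts child processes. Here, <code>time.perf_counter</code> plays the role of <em>real</em> time, and <code>time.process_time</code> the role of <em>user</em> plus <em>sys</em> time.
</p>

```python
import time


def measure(action):
    """Return (wall_seconds, cpu_seconds) spent running action()."""
    wall_start = time.perf_counter()   # wall-clock time, like 'real'
    cpu_start = time.process_time()    # CPU time of this process, like 'user' + 'sys'
    action()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def wait_a_little():
    # Stands in for time spent waiting rather than computing; uses almost no CPU.
    time.sleep(0.05)


wall, cpu = measure(wait_a_little)
```

<p>
A run that mostly waits reports a wall-clock time far above its CPU time, and it's the wall-clock number that you experience as feedback time.
</p>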
<p>
Each measurement is in seconds. The dashed orange line indicates the linear trend.
</p>
<p>
It's not the first time I've written Haskell code, so I knew what to expect. While you get the occasional fast turnaround time, it easily takes around ten seconds to compile even an empty code base. It seems that there's a constant overhead of that size. While there's an upward trend line as I added more and more code, and more tests, actually running the tests takes almost no time. The initial 'average' feedback time was around eight seconds, and 1100 observations later, the trend sits around 11.5 seconds. At this time, I had more than 200 test cases.
</p>
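<p>
As a back-of-the-envelope check (not a fit to the recorded data), the two trend values quoted above imply a very gentle slope. The function name below is made up for the illustration, and the calculation assumes a straight line between the two quoted values.
</p>

```python
def seconds_added_per_run(first_trend, last_trend, observations):
    # Slope of a straight line between the two trend values quoted in the text.
    return (last_trend - first_trend) / observations


slope = seconds_added_per_run(8.0, 11.5, 1100)

# Expressed in milliseconds: roughly 3.2 ms of extra feedback time per run.
milliseconds_per_run = round(slope * 1000, 1)
```

<p>
Growth of a few milliseconds per run is small compared to a starting point of around eight seconds.
</p>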
<p>
You may also notice that the observations vary quite a bit. You occasionally see sub-second times, but also turnaround times over thirty seconds. There's an explanation for both.
</p>
<p>
The sub-second times usually happen if I run the test suite twice without changing any code. In that case, the Haskell Stack correctly skips recompiling the code and instead just reruns the tests. This only highlights that I'm not waiting for the tests to execute. The tests are fast. It's the compiler that causes over 90% of the delay.
</p>
<p>
(Why would I rerun a test suite without changing any code? This mostly happens when I take a break from programming, or if I get distracted by another task. In such cases, when I return to the code, I usually run the test suite in order to remind myself of the state in which I left it. Sometimes, it turns out, I'd left the code in a state where the last thing I did was to run all tests.)
</p>
<p>
The other extremes have a different explanation.
</p>
<h3 id="8fa24383e06349718fca6cb70f95c98f">
IDE woes <a href="#8fa24383e06349718fca6cb70f95c98f">#</a>
</h3>
<p>
Why do I have to suffer through those turnaround times over twenty seconds? A few times over thirty?
</p>
<p>
The short answer is that these represent complete rebuilds. Most of these are caused by problems with the <a href="https://en.wikipedia.org/wiki/Integrated_development_environment">IDE</a>. For Haskell development, I use <a href="https://code.visualstudio.com/">Visual Studio Code</a> with the <a href="https://marketplace.visualstudio.com/items?itemName=haskell.haskell">Haskell extension</a>.
</p>
<p>
Perhaps it's only my setup that's messed up, but whenever I change a function in the System Under Test (SUT), I can. not. make. VS Code pick up that the API changed. Even if I correct my tests so that they still compile and run successfully from the command line, VS Code will keep insisting that the code is wrong.
</p>
<p>
This is, of course, annoying. One of the major selling points of statically typed languages is that a good IDE can tell you if you made mistakes. Well, if it operates on an outdated view of what the SUT looks like, this no longer works.
</p>
<p>
I've tried restarting the Haskell Language Server, but that doesn't work. The only thing that works, as far as I've been able to discover, is to close VS Code, delete <code>.stack-work</code>, recompile, and reopen VS Code. Yes, that takes a minute or two, so not something I like doing too often.
</p>
<p>
Deleting <code>.stack-work</code> does trigger a full rebuild, which is why we see those long build times.
</p>
<h3 id="81ef3b41740145af98094e9335a130eb">
Striking a good balance <a href="#81ef3b41740145af98094e9335a130eb">#</a>
</h3>
<p>
What bothers me about dynamic languages is that I find discoverability and encapsulation so hard. I can't just look at the type of an operation and deduce what inputs it might take, or what the output might look like.
</p>
<p>
To be honest, if you give me a plain text file with F# or Haskell, I can't do that either. A static type system doesn't magically surface that kind of information. Instead, you may rely on an IDE to provide such information at your fingertips. The Haskell extension, for example, gives you a little automatic type annotation above your functions, as discussed in the article <a href="/2024/11/04/pendulum-swing-no-haskell-type-annotation-by-default">Pendulum swing: no Haskell type annotation by default</a>, and shown in a figure reprinted here for your convenience:
</p>
<p>
<img src="/content/binary/haskell-code-with-inferred-type-displayed-by-vs-code.png" alt="Screen shot of a Haskell function in Visual Studio Code with the function's type automatically displayed above it by the Haskell extension.">
</p>
<p>
If this is to work well, this information must be immediate and responsive. On my system it isn't.
</p>
<p>
It may, again, be that there's some problem with my particular tool chain setup. Or perhaps a four-year-old Lenovo X1 Carbon is just too puny a machine to effectively run such a tool.
</p>
<p>
On the other hand, I don't have similar issues with C# in Visual Studio (not VS Code). When I make changes, the IDE quickly responds and tells me if I've made a mistake. To be honest, even here, I feel that <a href="/2023/07/24/is-software-getting-worse">it was faster and more responsive a decade ago</a>, but compared to Haskell programming, the feedback I get with C# is close to immediate.
</p>
<p>
My experience with F# is somewhere in between. Visual Studio is quicker to pick up changes in F# code than VS Code is to reflect changes in Haskell, but it's not as fast as C#.
</p>
<p>
With Python, what little IDE integration is available is usually not trustworthy. Essentially, when suggesting callable operations, the IDE is mostly guessing, based on what it's already seen.
</p>
<p>
But, good Lord! The tests run fast.
</p>
<h3 id="4ea6dd100fbf4cc1b9c2941520a051bf">
Conclusion <a href="#4ea6dd100fbf4cc1b9c2941520a051bf">#</a>
</h3>
<p>
My recent experiences with both Haskell and Python programming are giving me a better understanding of the balances and trade-offs involved with picking a language. While I still favour statically typed languages, I'm beginning to see some attractive qualities on the other side.
</p>
<p>
Particularly, if you buy the argument that TDD suites should run in 10 seconds or less, this effectively means that I can't do TDD in Haskell. Not with the hardware I'm running. Python, on the other hand, seems eminently well-suited for TDD.
</p>
<p>
That doesn't sit too well with me, but on the other hand, I'm glad. I've learned about a benefit of a dynamically typed language. While you may consider all of this ordinary and trite, it feels like a small breakthrough to me. I've been trying hard to see past my own limitations, and it finally feels as though I've found a few chinks in the armour of my biases.
</p>
<p>
I'll keep pushing those envelopes to see what else I may learn.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="b760e53201c74532a33f1ae4a10407f9">
<div class="comment-author">Daniel Tartaglia <a href="#b760e53201c74532a33f1ae4a10407f9">#</a></div>
<div class="comment-content">
<p>An interesting insight, but if you consider that the compiler is effectively an additional test suite that is verifying the types are being used correctly, that extra compilation time is really just a whole suite of tests that you didn't have to write. I can't help but wonder how long it would take to manually implement all the tests that would be required to satisfy those checks in Python, and how much slower the Python test suite would then be.</p>
<p>Like you, I have a strong bias for typesafe languages (or at least moderately typesafe ones). The way I've always explained it is as follows: Developers tend to work faster when writing with dynamically typed languages because they don't have to explain as much to a compiler. This literally means less code to write. However, because the developer <i>hasn't</i> fully explained themself, any follow-on developer does not have as much context to work with.</p>
<p>After all, whether the language requires it or not, the developers need to define and consider types. The only question is, do they have to <i>write it down</i>?</p>
</div>
<div class="comment-date">2025-01-01 01:26 UTC</div>
</div>
<div class="comment" id="d9b64e35daa34be7b5a5c34c55043583">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#d9b64e35daa34be7b5a5c34c55043583">#</a></div>
<div class="comment-content">
<p>
Daniel, thank you for writing. I'm <a href="/2011/04/29/Feedbackmechanismsandtradeoffs">well aware that a type checker is a 'first line of defence'</a>, and I agree that if we truly had to replicate everything that a type checker does, as tests, it would take a long time. It would take a long time to write all those tests, and it would probably also take a long time to execute them all.
</p>
<p>
That said, I think that any sane proponent of dynamically typed languages would counter that that's an unreasonable demand. After all, in most cases, it's hardly the case that the code was written by <a href="https://en.wikipedia.org/wiki/Infinite_monkey_theorem">a monkey with a typewriter</a>, but rather by a well-meaning human who did his or her best to write correct code.
</p>
<p>
In the end, however, it's all a question about context. <a href="/2018/11/12/what-to-test-and-not-to-test">How important is correctness</a>, after all?
<a href="https://dannorth.net/about/">Dan North</a> once kindly pointed out to me that in many cases, the software owner doesn't even know what he or she wants. It's only through a series of iterations that we learn what a business system is supposed to do. Until we reach that point, correctness is, at best, a secondary priority. On the other hand, you <a href="https://en.wikipedia.org/wiki/Mars_Climate_Orbiter">should really test your outer space probe software</a>.
</p>
<p>
But you're right. The <a href="https://lexi-lambda.github.io/blog/2020/01/19/no-dynamic-type-systems-are-not-inherently-more-open/">types are still there</a>, either way.
</p>
<p>
The last word in this debate has hardly been said yet, but you may also find my recent article series <a href="/2024/12/09/implementation-and-usage-mindsets">Implementation and usage mindsets</a> interesting.
</p>
</div>
<div class="comment-date">2025-01-07 06:53 UTC</div>
</div>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.Implementing rod-cuttinghttps://blog.ploeh.dk/2024/12/23/implementing-rod-cutting2024-12-23T08:53:00+00:00Mark Seemann
<div id="post">
<p>
<em>From pseudocode to implementation in three languages.</em>
</p>
<p>
This article picks up where <a href="/2024/12/09/implementation-and-usage-mindsets">Implementation and usage mindsets</a> left off, examining how <a href="https://www.infoq.com/presentations/Simple-Made-Easy/">easy</a> it is to implement an algorithm in three different programming languages.
</p>
<p>
As an example, I'll use the bottom-up rod-cutting algorithm from <a href="/ref/clrs">Introduction to Algorithms</a>.
</p>
<h3 id="0a09280df48e48c7b5257346dc98eab3">
Rod-cutting <a href="#0a09280df48e48c7b5257346dc98eab3">#</a>
</h3>
<p>
The problem is simple:
</p>
<blockquote>
<p>
"Serling Enterprises buys long steel rods and cuts them into shorter rods, which it then sells. Each cut is free. The management of Serling Enterprises wants to know the best way to cut up the rods."
</p>
<footer><cite><a href="/ref/clrs">Introduction to Algorithms. Fourth edition</a>, ch. 14.1</cite></footer>
</blockquote>
<p>
You're given an array of prices, or rather revenues, that each size is worth. The example from the book is given as a table:
</p>
<table>
<tbody>
<tr>
<td>length <em>i</em></td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
<td>7</td>
<td>8</td>
<td>9</td>
<td>10</td>
</tr>
<tr>
<td>price <em>p<sub>i</sub></em></td>
<td>1</td>
<td>5</td>
<td>8</td>
<td>9</td>
<td>10</td>
<td>17</td>
<td>17</td>
<td>20</td>
<td>24</td>
<td>30</td>
</tr>
</tbody>
</table>
<p>
Notice that while this implies an array like <code>[1, 5, 8, 9, 10, 17, 17, 20, 24, 30]</code>, the array is understood to be one-indexed, as is the most common case in the book. Most languages, including all three languages in this article, have zero-indexed arrays, but it turns out that we can get around the issue by adding a leading zero to the array: <code>[0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]</code>.
</p>
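<p>
As a minimal sketch of that leading-zero trick, here shown in Python, you can prepend a zero to the table's prices so that <code>p[i]</code> is the price of a rod of length <em>i</em>. The names <code>prices</code> and <code>p</code> are only chosen for illustration.
</p>

```python
# Prices from the table, in table order (length 1 first).
prices = [1, 5, 8, 9, 10, 17, 17, 20, 24, 30]

# Prepending a zero shifts every price one position to the right,
# so that a zero-indexed list can be read as if it were one-indexed.
p = [0] + prices

print(p[1])   # 1
print(p[10])  # 30
```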
<p>
Thus, given that price array, the best you can do with a rod of length <em>10</em> is to leave it uncut, yielding a revenue of <em>30</em>.
</p>
<p>
<img src="/content/binary/rod-size-10-no-cut.png" alt="A rod divided into 10 segments, left uncut, with the number 30 above it." width="400">
</p>
<p>
On the other hand, if you have a rod of length <em>7</em>, you can cut it into two rods of lengths <em>1</em> and <em>6</em>.
</p>
<p>
<img src="/content/binary/rod-size-7-cut-into-2.png" alt="Two rods, one of a single segment, and one made from six segments. Above the single segment is the number 1, and above the six segments is the number 17." width="320">
</p>
<p>
Another solution for a rod of length <em>7</em> is to cut it into three rods of sizes <em>2</em>, <em>2</em>, and <em>3</em>. Both solutions yield a total revenue of <em>18</em>. Thus, while more than one optimal solution exists, the algorithm given here only identifies one of them.
</p>
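<p>
The arithmetic is easy to check. The following Python sketch sums the prices of the pieces for each way of cutting a rod of length <em>7</em>. The <code>revenue</code> helper is a hypothetical function introduced only for this check; it's not part of the algorithm.
</p>

```python
# Price array from the book, with a leading zero so that p[i] is the
# price of a piece of length i.
p = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]


def revenue(p, pieces):
    """Total revenue from selling pieces of the given lengths."""
    return sum(p[size] for size in pieces)


print(revenue(p, [1, 6]))     # 18
print(revenue(p, [2, 2, 3]))  # 18
print(revenue(p, [7]))        # 17
```

<p>
Leaving the rod uncut yields only <em>17</em>, so both cuts beat that option.
</p>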
<p>
<pre>Extended-Bottom-Up-Cut-Rod(p, n)
let r[0:n] and s[1:n] be new arrays
r[0] = 0
for j = 1 to n                      // for increasing rod length j
    q = -∞
    for i = 1 to j                  // i is the position of the first cut
        if q < p[i] + r[j - i]
            q = p[i] + r[j - i]
            s[j] = i                // best cut location so far for length j
    r[j] = q                        // remember the solution value for length j
return r and s</pre>
</p>
<p>
Which programming language is this? It's no particular language, but rather pseudocode.
</p>
<p>
The reason that the function is called <code>Extended-Bottom-Up-Cut-Rod</code> is that the book pedagogically goes through a few other algorithms before arriving at this one. Going forward, I don't intend to keep that rather verbose name, but instead just call the function <code>cut_rod</code>, <code>cutRod</code>, or <code>Rod.cut</code>.
</p>
<p>
The <code>p</code> parameter is a one-indexed price (or revenue) array, as explained above, and <code>n</code> is a rod size (e.g. <code>10</code> or <code>7</code>, reflecting the above examples).
</p>
<p>
Given the above price array and <code>n = 10</code>, the algorithm returns two arrays, <code>r</code> for the maximum possible revenue for each rod length, and <code>s</code> for the size of the first piece in a maximizing cut.
</p>
<table>
<thead>
<tr>
<td><em>i</em></td>
<td>0</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
<td>7</td>
<td>8</td>
<td>9</td>
<td>10</td>
</tr>
</thead>
<tbody>
<tr>
<td><em>r</em>[<em>i</em>]</td>
<td>0</td>
<td>1</td>
<td>5</td>
<td>8</td>
<td>10</td>
<td>13</td>
<td>17</td>
<td>18</td>
<td>22</td>
<td>25</td>
<td>30</td>
</tr>
<tr>
<td><em>s</em>[<em>i</em>]</td>
<td></td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>2</td>
<td>2</td>
<td>6</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>10</td>
</tr>
</tbody>
</table>
<p>
Such output doesn't really give a <em>solution</em>, but rather the raw data to find a solution. For example, for <code>n = 10</code> (= <em>i</em>), you consult the table for (one-indexed) index <em>10</em>, and see that you can get the revenue <em>30</em> from making a cut at position <em>10</em> (which effectively means no cut). For <code>n = 7</code>, you consult the table for index 7 and observe that you can get the total revenue <em>18</em> by making a cut at position <em>1</em>. This leaves you with two rods, and you again consult the table. For <code>n = 1</code>, you can get the revenue <em>1</em> by making a cut at position <em>1</em>; i.e. no further cut. For <code>n = 7 - 1 = 6</code> you consult the table and observe that you can get the revenue <em>17</em> by making a cut at position <em>6</em>, again indicating that no further cut is necessary.
</p>
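<p>
That table lookup can be sketched in a few lines of Python. The list <code>s</code> below is copied from the table, with <code>0</code> as a placeholder at index <em>0</em>, and <code>cut_sizes</code> is a hypothetical helper that collects the pieces in a list instead of printing them.
</p>

```python
# The s row from the table above; index 0 is an unused placeholder.
s = [0, 1, 2, 3, 2, 2, 6, 1, 2, 3, 10]


def cut_sizes(s, n):
    """Follow the table from length n down to 0, collecting piece sizes."""
    sizes = []
    while n > 0:
        sizes.append(s[n])  # size of the next piece to cut off
        n -= s[n]           # length of the remainder of the rod
    return sizes


print(cut_sizes(s, 10))  # [10]
print(cut_sizes(s, 7))   # [1, 6]
```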
<p>
Another procedure prints the solution, using the above process:
</p>
<p>
<pre>Print-Cut-Rod-Solution(p, n)
(r, s) = Extended-Bottom-Up-Cut-Rod(p, n)
while n > 0
    print s[n]      // cut location for length n
    n = n - s[n]    // length of the remainder of the rod</pre>
</p>
<p>
Again, the procedure is given as pseudocode.
</p>
<p>
How easy is it to translate this algorithm into code in a real programming language? Not surprisingly, this depends on the language.
</p>
<h3 id="36447b3aa2a14becbb895fd70fdd9d4a">
Translation to Python <a href="#36447b3aa2a14becbb895fd70fdd9d4a">#</a>
</h3>
<p>
The hypothesis of the <a href="/2024/12/09/implementation-and-usage-mindsets">previous</a> article is that dynamically typed languages may be more suited for implementation tasks. The dynamically typed language that I know best is <a href="https://www.python.org/">Python</a>, so let's try that.
</p>
<p>
<pre><span style="color:blue;">def</span> <span style="color:#2b91af;">cut_rod</span>(p, n):
    r = [0] * (n + 1)
    s = [0] * (n + 1)
    r[0] = 0
    <span style="color:blue;">for</span> j <span style="color:blue;">in</span> <span style="color:blue;">range</span>(1, n + 1):
        q = <span style="color:#2b91af;">float</span>(<span style="color:#a31515;">'-inf'</span>)
        <span style="color:blue;">for</span> i <span style="color:blue;">in</span> <span style="color:blue;">range</span>(1, j + 1):
            <span style="color:blue;">if</span> q < p[i] + r[j - i]:
                q = p[i] + r[j - i]
                s[j] = i
        r[j] = q
    <span style="color:blue;">return</span> r, s</pre>
</p>
<p>
That does, indeed, turn out to be straightforward. I had to figure out the syntax for initializing arrays, and how to represent negative infinity, but a combination of <a href="https://github.com/features/copilot">GitHub Copilot</a> and a few web searches quickly cleared that up.
</p>
<p>
The same is true for the <code>Print-Cut-Rod-Solution</code> procedure.
</p>
<p>
<pre><span style="color:blue;">def</span> <span style="color:#2b91af;">print_cut_rod_solution</span>(p, n):
    r, s = cut_rod(p, n)
    <span style="color:blue;">while</span> n > 0:
        <span style="color:blue;">print</span>(s[n])
        n = n - s[n]</pre>
</p>
<p>
Apart from minor syntactical differences, the pseudocode translates directly to Python.
</p>
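<p>
To see the Python translation in use, the following self-contained sketch repeats the <code>cut_rod</code> function and runs it against the price array from the book. The expected lists are the <em>r</em> and <em>s</em> rows from the table above, with <code>0</code> filling the unused slot <code>s[0]</code>.
</p>

```python
def cut_rod(p, n):
    r = [0] * (n + 1)  # r[j]: best revenue for a rod of length j
    s = [0] * (n + 1)  # s[j]: size of the first piece in a best cut
    for j in range(1, n + 1):
        q = float('-inf')
        for i in range(1, j + 1):
            if q < p[i] + r[j - i]:
                q = p[i] + r[j - i]
                s[j] = i
        r[j] = q
    return r, s


# Price array from the book, with a leading zero for one-based indexing.
p = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]
r, s = cut_rod(p, 10)

print(r)  # [0, 1, 5, 8, 10, 13, 17, 18, 22, 25, 30]
print(s)  # [0, 1, 2, 3, 2, 2, 6, 1, 2, 3, 10]
```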
<p>
So far, the hypothesis seems to hold. This particular dynamically typed language, at least, easily implements that particular algorithm. If we must speculate about underlying reasons, we may argue that a dynamically typed language is <a href="/2019/12/16/zone-of-ceremony">low on ceremony</a>. You don't have to get side-tracked by declaring types of parameters, variables, or return values.
</p>
<p>
That, at least, is a common complaint about statically typed languages that I hear when I discuss with lovers of dynamically typed languages.
</p>
<p>
Let us, then, try to implement the rod-cutting algorithm in a statically typed language.
</p>
<h3 id="a55c4ff33cf247f0b57ae58aa6795343">
Translation to Java <a href="#a55c4ff33cf247f0b57ae58aa6795343">#</a>
</h3>
<p>
Together with other <a href="https://en.wikipedia.org/wiki/C_(programming_language)">C</a>-based languages, <a href="https://www.java.com/">Java</a> is infamous for requiring a high amount of ceremony to get anything done. How easy is it to translate the rod-cutting pseudocode to Java? Not surprisingly, it turns out that one has to jump through a few more hoops.
</p>
<p>
First, of course, one has to set up a code base and choose a build system. I'm not well-versed in Java development, but here I (more or less) arbitrarily chose <a href="https://gradle.org/">Gradle</a>. When you're new to an ecosystem, this can be a significant barrier, but I know from decades of C# programming that tooling alleviates much of that pain. Still, a single <code>.py</code> file this isn't.
</p>
<p>
Apart from that, the biggest hurdle turned out to be that, as far as I can tell, Java doesn't have native tuple support. Thus, in order to return two arrays, I would have to either pick a reusable package that implements tuples, or define a custom class for that purpose. Object-oriented programmers often argue that tuples represent poor design, since a tuple doesn't really communicate the role or intent of each element. Given that the rod-cutting algorithm returns two integer arrays, I'd be inclined to agree. You can't even tell them apart based on their types. For that reason, I chose to define a class to hold the result of the algorithm.
</p>
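<p>
For comparison only, and not as part of the Java translation, here's a sketch of how the same concern might be addressed in Python. It assumes Python 3.9 or later and uses <code>typing.NamedTuple</code> from the standard library. The field names mirror the Java class, and the sample values are the first three entries of the <em>r</em> and <em>s</em> rows in the table above.
</p>

```python
from typing import NamedTuple


class RodCuttingSolution(NamedTuple):
    """A tuple whose elements also have names that convey their roles."""
    revenues: list[int]
    sizes: list[int]


solution = RodCuttingSolution(revenues=[0, 1, 5], sizes=[0, 1, 2])

# Access by name, which communicates intent...
print(solution.sizes)  # [0, 1, 2]

# ...or unpack by position, like an ordinary tuple.
revenues, sizes = solution
print(revenues)  # [0, 1, 5]
```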
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">RodCuttingSolution</span> {
    <span style="color:blue;">private</span> <span style="color:blue;">int</span>[] revenues;
    <span style="color:blue;">private</span> <span style="color:blue;">int</span>[] sizes;
    <span style="color:blue;">public</span> <span style="color:#2b91af;">RodCuttingSolution</span>(<span style="color:blue;">int</span>[] revenues, <span style="color:blue;">int</span>[] sizes) {
        <span style="color:blue;">this</span>.revenues = revenues;
        <span style="color:blue;">this</span>.sizes = sizes;
    }
    <span style="color:blue;">public</span> <span style="color:blue;">int</span>[] <span style="color:#2b91af;">getRevenues</span>() {
        <span style="color:blue;">return</span> revenues;
    }
    <span style="color:blue;">public</span> <span style="color:blue;">int</span>[] <span style="color:#2b91af;">getSizes</span>() {
        <span style="color:blue;">return</span> sizes;
    }
}</pre>
</p>
<p>
Armed with this return type, the rest of the translation went smoothly.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">RodCuttingSolution</span> <span style="color:#2b91af;">cutRod</span>(<span style="color:blue;">int</span>[] p, <span style="color:blue;">int</span> n) {
    var r = <span style="color:blue;">new</span> <span style="color:blue;">int</span>[n + 1];
    var s = <span style="color:blue;">new</span> <span style="color:blue;">int</span>[n + 1];
    r[0] = 0;
    <span style="color:blue;">for</span> (<span style="color:blue;">int</span> j = 1; j <= n; j++) {
        var q = <span style="color:blue;">Integer</span>.MIN_VALUE;
        <span style="color:blue;">for</span> (<span style="color:blue;">int</span> i = 1; i <= j; i++) {
            <span style="color:blue;">if</span> (q < p[i] + r[j - i]) {
                q = p[i] + r[j - i];
                s[j] = i;
            }
        }
        r[j] = q;
    }
    <span style="color:blue;">return</span> <span style="color:blue;">new</span> <span style="color:blue;">RodCuttingSolution</span>(r, s);
}</pre>
</p>
<p>
Granted, there's a bit more ceremony involved compared to the Python code, since one must declare the types of both input parameters and method return type. You also have to declare the type of the arrays when initializing them, and you could argue that the <code>for</code> loop syntax is more complicated than Python's <code>for ... in range ...</code> syntax. One may also complain that all the brackets and parentheses make it harder to read the code.
</p>
<p>
While I'm used to such C-like code, I'm not immune to such criticism. I actually do find the Python code more readable.
</p>
<p>
Translating the <code>Print-Cut-Rod-Solution</code> pseudocode is a bit easier:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">void</span> <span style="color:#2b91af;">printCutRodSolution</span>(<span style="color:blue;">int</span>[] p, <span style="color:blue;">int</span> n) {
    var result = cutRod(p, n);
    <span style="color:blue;">while</span> (n > 0) {
        <span style="color:blue;">System</span>.out.println(result.getSizes()[n]);
        n = n - result.getSizes()[n];
    }
}</pre>
</p>
<p>
The overall structure of the code remains intact, but again we're burdened with extra ceremony. We have to declare input and output types, and call that awkward <code>getSizes</code> method to retrieve the array of cut sizes.
</p>
<p>
It's possible that my Java isn't perfectly <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a>. After all, although I've read many books with Java examples over the years, I rarely write Java code. Additionally, you may argue that <code>static</code> methods exhibit a code smell like <a href="https://wiki.c2.com/?FeatureEnvySmell">Feature Envy</a>. I might agree, but the purpose of the current example is to examine how easy or difficult it is to implement a particular algorithm in various languages. Now that we have an implementation in Java, we might wish to refactor to a more object-oriented design, but that's outside the scope of this article.
</p>
<p>
Given that the rod-cutting algorithm isn't the most complex algorithm that exists, we may jump to the conclusion that Java isn't <em>that bad</em> compared to Python. Consider, however, how the extra ceremony on display here impacts your work if you have to implement a larger algorithm, or if you need to iterate to find an algorithm on your own.
</p>
<p>
To be clear, C# would require a similar amount of ceremony, and I don't even want to think about doing this in C.
</p>
<p>
All that said, it'd be irresponsible to extrapolate from only a few examples. You'd need both more languages and more problems before it even seems reasonable to draw any conclusions. I don't, however, intend the present example to constitute a full argument. Rather, it's an illustration of an idea that I haven't pulled out of thin air.
</p>
<p>
One of the points of <a href="/2019/12/16/zone-of-ceremony">Zone of Ceremony</a> is that the degree of awkwardness isn't necessarily correlated to whether types are dynamically or statically defined. While I'm sure that I miss lots of points made by 'dynamists', this is a point that I often feel is missed by that camp. One language that exemplifies that 'beyond-ceremony' zone is <a href="https://fsharp.org/">F#</a>.
</p>
<h3 id="0bb95c0e7967419680fe3e6fcc9aed41">
Translation to F# <a href="#0bb95c0e7967419680fe3e6fcc9aed41">#</a>
</h3>
<p>
If I'm right, we should be able to translate the rod-cutting pseudocode to F# with approximately the same amount of trouble as when translating to Python. How do we fare?
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:#74531f;">cut</span> (<span style="font-weight:bold;color:#1f377f;">p</span> : _ <span style="color:#2b91af;">array</span>) <span style="font-weight:bold;color:#1f377f;">n</span> =
    <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">r</span> = <span style="color:#2b91af;">Array</span>.<span style="color:#74531f;">zeroCreate</span> (<span style="font-weight:bold;color:#1f377f;">n</span> + 1)
    <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">s</span> = <span style="color:#2b91af;">Array</span>.<span style="color:#74531f;">zeroCreate</span> (<span style="font-weight:bold;color:#1f377f;">n</span> + 1)
    <span style="font-weight:bold;color:#1f377f;">r</span>[0] <span style="color:blue;"><-</span> 0
    <span style="color:blue;">for</span> <span style="font-weight:bold;color:#1f377f;">j</span> = 1 <span style="color:blue;">to</span> <span style="font-weight:bold;color:#1f377f;">n</span> <span style="color:blue;">do</span>
        <span style="color:blue;">let</span> <span style="color:blue;">mutable</span> <span style="color:#a08000;">q</span> = <span style="color:#2b91af;">Int32</span>.MinValue
        <span style="color:blue;">for</span> <span style="font-weight:bold;color:#1f377f;">i</span> = 1 <span style="color:blue;">to</span> <span style="font-weight:bold;color:#1f377f;">j</span> <span style="color:blue;">do</span>
            <span style="color:blue;">if</span> <span style="color:#a08000;">q</span> < <span style="font-weight:bold;color:#1f377f;">p</span>[<span style="font-weight:bold;color:#1f377f;">i</span>] + <span style="font-weight:bold;color:#1f377f;">r</span>[<span style="font-weight:bold;color:#1f377f;">j</span> - <span style="font-weight:bold;color:#1f377f;">i</span>] <span style="color:blue;">then</span>
                <span style="color:#a08000;">q</span> <span style="color:blue;"><-</span> <span style="font-weight:bold;color:#1f377f;">p</span>[<span style="font-weight:bold;color:#1f377f;">i</span>] + <span style="font-weight:bold;color:#1f377f;">r</span>[<span style="font-weight:bold;color:#1f377f;">j</span> - <span style="font-weight:bold;color:#1f377f;">i</span>]
                <span style="font-weight:bold;color:#1f377f;">s</span>[<span style="font-weight:bold;color:#1f377f;">j</span>] <span style="color:blue;"><-</span> <span style="font-weight:bold;color:#1f377f;">i</span>
        <span style="font-weight:bold;color:#1f377f;">r</span>[<span style="font-weight:bold;color:#1f377f;">j</span>] <span style="color:blue;"><-</span> <span style="color:#a08000;">q</span>
    <span style="font-weight:bold;color:#1f377f;">r</span>, <span style="font-weight:bold;color:#1f377f;">s</span></pre>
</p>
<p>
Fairly well, as it turns out, although we <em>do</em> have to annotate <code>p</code> by indicating that it's an array. Still, the underscore in front of the <code>array</code> keyword indicates that we're happy to let the compiler infer the type of array (which is <code>int array</code>).
</p>
<p>
(We <em>can</em> get around that issue by writing <code>Array.item i p</code> instead of <code>p[i]</code>, but that's verbose in a different way.)
</p>
<p>
Had we chosen to instead implement the algorithm based on an input list or map, we wouldn't have needed the type hint. One could therefore argue that the reason that the hint is even required is because arrays aren't the most idiomatic data structure for a functional language like F#.
</p>
<p>
Otherwise, I don't find that this translation was much harder than translating to Python, and I personally prefer <code><span style="color:blue;">for</span> <span style="color:#1f377f;">j</span> = 1 <span style="color:blue;">to</span> <span style="color:#1f377f;">n</span> <span style="color:blue;">do</span></code> over <code><span style="color:blue;">for</span> j <span style="color:blue;">in</span> <span style="color:blue;">range</span>(1, n + 1):</code>.
</p>
<p>
We also need to add the <code>mutable</code> keyword to allow <code>q</code> to change during the loop. You could argue that this is another example of additional ceremony. While I agree, it's not much related to static versus dynamic typing, but more to how values are immutable by default in F#. If I recall correctly, JavaScript similarly distinguishes between <code>let</code>, <code>var</code>, and <code>const</code>.
</p>
<p>
Translating <code>Print-Cut-Rod-Solution</code> requires, again due to values being immutable by default, a bit more effort than Python, but not much:
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:#74531f;">printSolution</span> <span style="font-weight:bold;color:#1f377f;">p</span> <span style="font-weight:bold;color:#1f377f;">n</span> =
    <span style="color:blue;">let</span> _, <span style="font-weight:bold;color:#1f377f;">s</span> = <span style="color:#74531f;">cut</span> <span style="font-weight:bold;color:#1f377f;">p</span> <span style="font-weight:bold;color:#1f377f;">n</span>
    <span style="color:blue;">let</span> <span style="color:blue;">mutable</span> <span style="color:#a08000;">n</span> = <span style="font-weight:bold;color:#1f377f;">n</span>
    <span style="color:blue;">while</span> <span style="color:#a08000;">n</span> > 0 <span style="color:blue;">do</span>
        <span style="color:#74531f;">printfn</span> <span style="color:#a31515;">"</span><span style="color:#2b91af;">%i</span><span style="color:#a31515;">"</span> <span style="font-weight:bold;color:#1f377f;">s</span>[<span style="color:#a08000;">n</span>]
        <span style="color:#a08000;">n</span> <span style="color:blue;"><-</span> <span style="color:#a08000;">n</span> - <span style="font-weight:bold;color:#1f377f;">s</span>[<span style="color:#a08000;">n</span>]</pre>
</p>
<p>
I had to shadow the <code>n</code> parameter with a <code>mutable</code> variable to stay as close to the pseudocode as possible. Again, one may argue that the overall problem here isn't the static type system, but that programming based on mutation isn't idiomatic for F# (or other functional programming languages). As you'll see in the next article, a more idiomatic implementation is even simpler than this one.
</p>
<p>
Notice, however, that the <code>printSolution</code> action requires no type declarations or annotations.
</p>
<p>
Let's see it all in use:
</p>
<p>
<pre>> let p = [|0; 1; 5; 8; 9; 10; 17; 17; 20; 24; 30|];;
val p: int array = [|0; 1; 5; 8; 9; 10; 17; 17; 20; 24; 30|]
> Rod.printSolution p 7;;
1
6</pre>
</p>
<p>
This little interactive session reproduces the example illustrated in the beginning of this article: when given the price array from the book and a rod of size <em>7</em>, the solution printed indicates cuts at positions <em>1</em> and <em>6</em>.
</p>
<p>
I find it telling that the translation to F# is on par with the translation to Python, even though the structure of the pseudocode is quite imperative.
</p>
<h3 id="eb28d2ce98b34628b2ec4d0df8905492">
Conclusion <a href="#eb28d2ce98b34628b2ec4d0df8905492">#</a>
</h3>
<p>
You could, perhaps, say that if your mindset is predominantly imperative, implementing an algorithm using Python is likely easier than either F# or Java. If, on the other hand, you're mostly in an implementation mindset, but not strongly attached to whether the implementation should be imperative, object-oriented, or functional, I'd offer the conjecture that a language like F# is as implementation-friendly as a language like Python.
</p>
<p>
If, on the other hand, you're more focused on encapsulating and documenting how an existing API works, perhaps that shift of perspective suggests another evaluation of dynamically versus statically typed languages.
</p>
<p>
In any case, the F# code shown here is hardly idiomatic, so it might be illuminating to see what happens if we refactor it.
</p>
<p>
<strong>Next:</strong> <a href="/2025/01/06/encapsulating-rod-cutting">Encapsulating rod-cutting</a>.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.A restaurant sandwichhttps://blog.ploeh.dk/2024/12/16/a-restaurant-sandwich2024-12-16T19:11:00+00:00Mark Seemann
<div id="post">
<p>
<em>An Impureim Sandwich example in C#.</em>
</p>
<p>
When learning functional programming (FP) people often struggle with how to organize code. How do you <a href="/2020/02/24/discerning-and-maintaining-purity">discern and maintain purity</a>? <a href="/2017/02/02/dependency-rejection">How do you do Dependency Injection in FP?</a> What does <a href="/2018/11/19/functional-architecture-a-definition">a functional architecture</a> look like?
</p>
<p>
A common FP design pattern is the <a href="/2020/03/02/impureim-sandwich">Impureim Sandwich</a>. The entry point of an application is always impure, so you push all impure actions to the boundary of the system. This is also known as <a href="https://www.destroyallsoftware.com/screencasts/catalog/functional-core-imperative-shell">Functional Core, Imperative Shell</a>. If you have a <a href="/2017/07/10/pure-interactions">micro-operation-based architecture</a>, which includes all web-based systems, you can often get by with a 'sandwich'. Perform impure actions to collect all the data you need. Pass all data to a <a href="https://en.wikipedia.org/wiki/Pure_function">pure function</a>. Finally, use impure actions to handle the <a href="https://en.wikipedia.org/wiki/Referential_transparency">referentially transparent</a> return value from the pure function.
</p>
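<p>
The shape of such a sandwich can be sketched in a few lines. The following Python example is only an illustration under stated assumptions, not code from the sample code base: the names <code>can_accept</code> and <code>try_reserve</code> are hypothetical, and a plain dictionary stands in for a database.
</p>

```python
def can_accept(capacity, reserved_quantities, requested_quantity):
    """Pure function: the decision depends only on the arguments."""
    return sum(reserved_quantities) + requested_quantity <= capacity


def try_reserve(db, date, quantity, capacity=10):
    # Impure: collect the data needed for the decision.
    reserved = db.get(date, [])
    # Pure: make the decision.
    accepted = can_accept(capacity, reserved, quantity)
    # Impure: act on the result of the decision.
    if accepted:
        db.setdefault(date, []).append(quantity)
    return accepted


# A dictionary stands in for the database (an assumption of this sketch).
db = {"2024-12-16": [4, 4]}
print(try_reserve(db, "2024-12-16", 2))  # True
print(try_reserve(db, "2024-12-16", 1))  # False
```

<p>
Since <code>can_accept</code> is pure, you can test the decision logic without any database at all; only the thin <code>try_reserve</code> shell touches state.
</p>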
<p>
No design pattern applies universally, and neither does this one. In my experience, however, it's surprisingly often possible to apply this architecture. We're far past the <a href="https://en.wikipedia.org/wiki/Pareto_principle">Pareto principle</a>'s 80 percent.
</p>
<p>
Examples may help illustrate the pattern, as well as explore its boundaries. In this article you'll see how I refactored an entry point of a <a href="https://en.wikipedia.org/wiki/REST">REST</a> API, specifically the <code>PUT</code> handler in the sample code base that accompanies <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>.
</p>
<h3 id="67463d3ade684a2b9de807b261ebb03c">
Starting point <a href="#67463d3ade684a2b9de807b261ebb03c">#</a>
</h3>
<p>
As discussed in the book, the architecture of the sample code base is, in fact, Functional Core, Imperative Shell. This isn't, however, the main theme of the book, and the code doesn't explicitly apply the Impureim Sandwich. In spirit, that's actually what's going on, but it isn't clear from looking at the code. This was a deliberate choice I made, because I wanted to highlight other software engineering practices. This does have the effect, though, that the Impureim Sandwich is invisible.
</p>
<p>
For example, the book follows <a href="/2019/11/04/the-80-24-rule">the 80/24 rule</a> closely. This was a didactic choice on my part. Most code bases I've seen in the wild have far too big methods, so I wanted to hammer home the message that it's possible to develop and maintain a non-trivial code base with small code blocks. This meant, however, that I had to split up HTTP request handlers (in ASP.NET known as <em>action methods</em> on Controllers).
</p>
<p>
The most complex HTTP handler is the one that handles <code>PUT</code> requests for reservations. Clients use this action when they want to make changes to a restaurant reservation.
</p>
<p>
The action method actually invoked by an HTTP request is this <code>Put</code> method:
</p>
<p>
<pre>[<span style="color:#2b91af;">HttpPut</span>(<span style="color:#a31515;">"restaurants/{restaurantId}/reservations/{id}"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">ActionResult</span>> <span style="font-weight:bold;color:#74531f;">Put</span>(
    <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">restaurantId</span>,
    <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">id</span>,
    <span style="color:#2b91af;">ReservationDto</span> <span style="font-weight:bold;color:#1f377f;">dto</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">dto</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">ArgumentNullException</span>(<span style="color:blue;">nameof</span>(<span style="font-weight:bold;color:#1f377f;">dto</span>));
    <span style="font-weight:bold;color:#8f08c4;">if</span> (!<span style="color:#2b91af;">Guid</span>.<span style="color:#74531f;">TryParse</span>(<span style="font-weight:bold;color:#1f377f;">id</span>, <span style="color:blue;">out</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">rid</span>))
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">NotFoundResult</span>();
    <span style="color:#2b91af;">Reservation</span>? <span style="font-weight:bold;color:#1f377f;">reservation</span> = <span style="font-weight:bold;color:#1f377f;">dto</span>.<span style="font-weight:bold;color:#74531f;">Validate</span>(<span style="font-weight:bold;color:#1f377f;">rid</span>);
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">reservation</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">BadRequestResult</span>();
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">restaurant</span> = <span style="color:blue;">await</span> RestaurantDatabase
        .<span style="font-weight:bold;color:#74531f;">GetRestaurant</span>(<span style="font-weight:bold;color:#1f377f;">restaurantId</span>).<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">restaurant</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">NotFoundResult</span>();
    <span style="font-weight:bold;color:#8f08c4;">return</span>
        <span style="color:blue;">await</span> <span style="font-weight:bold;color:#74531f;">TryUpdate</span>(<span style="font-weight:bold;color:#1f377f;">restaurant</span>, <span style="font-weight:bold;color:#1f377f;">reservation</span>).<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
}</pre>
</p>
<p>
Since I, for pedagogical reasons, wanted to fit each method inside an 80x24 box, I made a few somewhat unnatural design choices. The above code is one of them. While I don't consider it completely indefensible, this method does a bit of up-front input validation and verification, and then delegates execution to the <code>TryUpdate</code> method.
</p>
<p>
This may seem all fine and dandy until you realize that the only caller of <code>TryUpdate</code> is that <code>Put</code> method. A similar thing happens in <code>TryUpdate</code>: It calls a method that has only that one caller. We may try to inline those two methods to see if we can spot the Impureim Sandwich.
</p>
<h3 id="dab8cd4011a5493ea55b47cb2240839b">
Inlined Transaction Script <a href="#dab8cd4011a5493ea55b47cb2240839b">#</a>
</h3>
<p>
Inlining those two methods leaves us with a larger, <a href="https://martinfowler.com/eaaCatalog/transactionScript.html">Transaction Script</a>-like entry point:
</p>
<p>
<pre>[<span style="color:#2b91af;">HttpPut</span>(<span style="color:#a31515;">"restaurants/{restaurantId}/reservations/{id}"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">ActionResult</span>> <span style="font-weight:bold;color:#74531f;">Put</span>(
    <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">restaurantId</span>,
    <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">id</span>,
    <span style="color:#2b91af;">ReservationDto</span> <span style="font-weight:bold;color:#1f377f;">dto</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">dto</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">ArgumentNullException</span>(<span style="color:blue;">nameof</span>(<span style="font-weight:bold;color:#1f377f;">dto</span>));
    <span style="font-weight:bold;color:#8f08c4;">if</span> (!<span style="color:#2b91af;">Guid</span>.<span style="color:#74531f;">TryParse</span>(<span style="font-weight:bold;color:#1f377f;">id</span>, <span style="color:blue;">out</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">rid</span>))
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">NotFoundResult</span>();
    <span style="color:#2b91af;">Reservation</span>? <span style="font-weight:bold;color:#1f377f;">reservation</span> = <span style="font-weight:bold;color:#1f377f;">dto</span>.<span style="font-weight:bold;color:#74531f;">Validate</span>(<span style="font-weight:bold;color:#1f377f;">rid</span>);
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">reservation</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">BadRequestResult</span>();
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">restaurant</span> = <span style="color:blue;">await</span> RestaurantDatabase
        .<span style="font-weight:bold;color:#74531f;">GetRestaurant</span>(<span style="font-weight:bold;color:#1f377f;">restaurantId</span>).<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">restaurant</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">NotFoundResult</span>();
    <span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">scope</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">TransactionScope</span>(
        <span style="color:#2b91af;">TransactionScopeAsyncFlowOption</span>.Enabled);
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">existing</span> = <span style="color:blue;">await</span> Repository
        .<span style="font-weight:bold;color:#74531f;">ReadReservation</span>(<span style="font-weight:bold;color:#1f377f;">restaurant</span>.Id, <span style="font-weight:bold;color:#1f377f;">reservation</span>.Id)
        .<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">existing</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">NotFoundResult</span>();
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">reservations</span> = <span style="color:blue;">await</span> Repository
        .<span style="font-weight:bold;color:#74531f;">ReadReservations</span>(<span style="font-weight:bold;color:#1f377f;">restaurant</span>.Id, <span style="font-weight:bold;color:#1f377f;">reservation</span>.At)
        .<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
    <span style="font-weight:bold;color:#1f377f;">reservations</span> =
        <span style="font-weight:bold;color:#1f377f;">reservations</span>.<span style="font-weight:bold;color:#74531f;">Where</span>(<span style="font-weight:bold;color:#1f377f;">r</span> => <span style="font-weight:bold;color:#1f377f;">r</span>.Id <span style="font-weight:bold;color:#74531f;">!=</span> <span style="font-weight:bold;color:#1f377f;">reservation</span>.Id).<span style="font-weight:bold;color:#74531f;">ToList</span>();
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">now</span> = Clock.<span style="font-weight:bold;color:#74531f;">GetCurrentDateTime</span>();
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">ok</span> = <span style="font-weight:bold;color:#1f377f;">restaurant</span>.MaitreD.<span style="font-weight:bold;color:#74531f;">WillAccept</span>(
        <span style="font-weight:bold;color:#1f377f;">now</span>,
        <span style="font-weight:bold;color:#1f377f;">reservations</span>,
        <span style="font-weight:bold;color:#1f377f;">reservation</span>);
    <span style="font-weight:bold;color:#8f08c4;">if</span> (!<span style="font-weight:bold;color:#1f377f;">ok</span>)
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#74531f;">NoTables500InternalServerError</span>();
    <span style="color:blue;">await</span> Repository.<span style="font-weight:bold;color:#74531f;">Update</span>(<span style="font-weight:bold;color:#1f377f;">restaurant</span>.Id, <span style="font-weight:bold;color:#1f377f;">reservation</span>)
        .<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
    <span style="font-weight:bold;color:#1f377f;">scope</span>.<span style="font-weight:bold;color:#74531f;">Complete</span>();
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">OkObjectResult</span>(<span style="font-weight:bold;color:#1f377f;">reservation</span>.<span style="font-weight:bold;color:#74531f;">ToDto</span>());
}</pre>
</p>
<p>
While I've definitely seen longer methods in the wild, this variation is already so big that it no longer fits on my laptop screen. I have to scroll up and down to read the whole thing. When looking at the bottom of the method, I have to <em>remember</em> what was at the top, because I can no longer see it.
</p>
<p>
A major point of <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a> is that what limits programmer productivity is human cognition. If you have to scroll your screen because you can't see the whole method at once, does that fit in your brain? Chances are, it doesn't.
</p>
<p>
Can you spot the Impureim Sandwich now?
</p>
<p>
If you can't, that's understandable. It's not really clear because there are quite a few small decisions being made in this code. You could argue, for example, that this decision is referentially transparent:
</p>
<p>
<pre><span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">existing</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">NotFoundResult</span>();</pre>
</p>
<p>
These two lines of code are deterministic and have no side effects. The branch only returns a <code>NotFoundResult</code> when <code>existing is null</code>. Additionally, these two lines of code are surrounded by impure actions both before and after. Is this the Sandwich, then?
</p>
<p>
No, it's not. This is how <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> imperative code looks. To borrow a diagram from <a href="/2020/03/23/repeatable-execution">another article</a>, pure and impure code is interleaved without discipline:
</p>
<p>
<img src="/content/binary/impure-with-stripes-of-purity.png" alt="A box of mostly impure (red) code with vertical stripes of green symbolising pure code.">
</p>
<p>
Even so, the above <code>Put</code> method implements the Functional Core, Imperative Shell architecture. The <code>Put</code> method <em>is</em> the Imperative Shell, but where's the Functional Core?
</p>
<h3 id="e9ccab8ae8234c139934b87238dcf672">
Shell perspective <a href="#e9ccab8ae8234c139934b87238dcf672">#</a>
</h3>
<p>
One thing to be aware of is that when looking at the Imperative Shell code, the Functional Core is close to invisible. This is because it's typically only a single function call.
</p>
<p>
In the above <code>Put</code> method, this is the Functional Core:
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">ok</span> = <span style="font-weight:bold;color:#1f377f;">restaurant</span>.MaitreD.<span style="font-weight:bold;color:#74531f;">WillAccept</span>(
    <span style="font-weight:bold;color:#1f377f;">now</span>,
    <span style="font-weight:bold;color:#1f377f;">reservations</span>,
    <span style="font-weight:bold;color:#1f377f;">reservation</span>);
<span style="font-weight:bold;color:#8f08c4;">if</span> (!<span style="font-weight:bold;color:#1f377f;">ok</span>)
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#74531f;">NoTables500InternalServerError</span>();</pre>
</p>
<p>
It's only a few lines of code, and had I not given myself the constraint of staying within an 80 character line width, I could have instead laid it out like this and inlined the <code>ok</code> flag:
</p>
<p>
<pre><span style="font-weight:bold;color:#8f08c4;">if</span> (!<span style="font-weight:bold;color:#1f377f;">restaurant</span>.MaitreD.<span style="font-weight:bold;color:#74531f;">WillAccept</span>(<span style="font-weight:bold;color:#1f377f;">now</span>, <span style="font-weight:bold;color:#1f377f;">reservations</span>, <span style="font-weight:bold;color:#1f377f;">reservation</span>))
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#74531f;">NoTables500InternalServerError</span>();</pre>
</p>
<p>
Now that I try this, in fact, it turns out that this actually still stays within 80 characters. To be honest, I don't know exactly why I had that former code instead of this, but perhaps I found the latter alternative too dense. Or perhaps I simply didn't think of it. Code is rarely perfect. Usually when I revisit a piece of code after having been away from it for some time, I find something that I want to change.
</p>
<p>
In any case, that's beside the point. What matters here is that when you're looking through the Imperative Shell code, the Functional Core looks insignificant. Blink and you'll miss it. Even if we ignore all the other small pure decisions (the <code>if</code> statements) and pretend that we already have an Impureim Sandwich, from this viewpoint, the architecture <em>looks</em> like this:
</p>
<p>
<img src="/content/binary/impure-tiny-pure-impure-sandwich-box.png" alt="A box with a big red section on top, a thin green sliver middle, and another big red part at the bottom.">
</p>
<p>
It's tempting to ask, then: What's all the fuss about? Why even bother?
</p>
<p>
This is a natural experience for a code reader. After all, if you don't know a code base well, you often start at the entry point to try to understand how the application handles a certain stimulus. Such as an HTTP <code>PUT</code> request. When you do that, you see all of the Imperative Shell code before you see the Functional Core code. This could give you the wrong impression about the balance of responsibility.
</p>
<p>
After all, code like the above <code>Put</code> method has inlined most of the impure code so that it's right in your face. Granted, there's still some code hiding behind, say, <code>Repository.ReadReservations</code>, but a substantial fraction of the imperative code is visible in the method.
</p>
<p>
On the other hand, the Functional Core is just a single function call. If we inlined all of that code, too, the picture might rather look like this:
</p>
<p>
<img src="/content/binary/impure-pure-impure-sandwich-box.png" alt="A box with a thin red slice on top, a thick green middle, and a thin red slice at the bottom.">
</p>
<p>
This obviously depends on the de-facto ratio of pure to imperative code. In any case, inlining the pure code is a thought experiment only, because the whole point of functional architecture is that <a href="/2021/07/28/referential-transparency-fits-in-your-head">a referentially transparent function fits in your head</a>. Regardless of the complexity and amount of code hiding behind that <code>MaitreD.WillAccept</code> function, the return value is <em>equal</em> to the function call. It's the ultimate abstraction.
</p>
<h3 id="f3019f4107254a82b4280753cbbfab5f">
Standard combinators <a href="#f3019f4107254a82b4280753cbbfab5f">#</a>
</h3>
<p>
As I've already suggested, the inlined <code>Put</code> method looks like a Transaction Script. The <a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">cyclomatic complexity</a> fortunately hovers around <a href="https://en.wikipedia.org/wiki/The_Magical_Number_Seven,_Plus_or_Minus_Two">the magical number seven</a>, and branching is exclusively organized around <a href="https://en.wikipedia.org/wiki/Guard_(computer_science)">Guard Clauses</a>. Apart from that, there are no nested <code>if</code> statements or <code>for</code> loops.
</p>
<p>
Apart from the Guard Clauses, this mostly looks like a procedure that runs in a straight line from top to bottom. The exception is all those small conditionals that may cause the procedure to exit prematurely. Conditions like this:
</p>
<p>
<pre><span style="font-weight:bold;color:#8f08c4;">if</span> (!<span style="color:#2b91af;">Guid</span>.<span style="color:#74531f;">TryParse</span>(<span style="font-weight:bold;color:#1f377f;">id</span>, <span style="color:blue;">out</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">rid</span>))
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">NotFoundResult</span>();</pre>
</p>
<p>
or
</p>
<p>
<pre><span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">reservation</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">BadRequestResult</span>();</pre>
</p>
<p>
Such checks occur throughout the method. Each of them is actually a small pure island amidst all the imperative code, but each is ad hoc. Each checks if it's possible for the procedure to continue, and returns a kind of error value if it decides that it's not.
</p>
<p>
Is there a way to model such 'derailments' from the main flow?
</p>
<p>
If you've ever encountered Scott Wlaschin's <a href="https://fsharpforfunandprofit.com/rop/">Railway Oriented Programming</a> you may already see where this is going. Railway-oriented programming is a fantastic metaphor, because it gives you a way to visualize that you have, indeed, a main track, but then you have a side track that you may shunt some trains to. And once the train is on the side track, it can't go back to the main track.
</p>
<p>
That's how the <a href="/2022/05/09/an-either-monad">Either monad</a> works. Instead of all those ad-hoc <code>if</code> statements, we should be able to replace them with what we may call <em>standard combinators</em>. The most important of these combinators is <a href="/2022/03/28/monads">monadic bind</a>. Composing a Transaction Script like <code>Put</code> with standard combinators will 'hide away' those small decisions, and make the Sandwich nature more apparent.
</p>
<p>
If we had had pure code, we could just have composed Either-valued functions. Unfortunately, most of what's going on in the <code>Put</code> method happens in a Task-based context. Thankfully, Either is one of those monads that nest well, implying that we can <a href="/2024/11/25/nested-monads">turn the combination into a composed TaskEither monad</a>. The linked article shows the core <code>TaskEither</code> <code>SelectMany</code> implementations.
</p>
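<p>
To make the idea of monadic bind over the nested monad a little more concrete, here's a minimal sketch. It's not the code from the code base or the linked article; the <code>Either&lt;L, R&gt;</code> class, the <code>TaskEither</code> and <code>Demo</code> classes, and the <code>ParseQuantity</code> and <code>CheckCapacity</code> functions are all hypothetical names that I've made up for illustration. It only assumes that an Either value can be consumed by supplying one function for each track.
</p>
<p>
<pre>using System;
using System.Threading.Tasks;

// Hypothetical, minimal Either type for illustration only.
public sealed class Either&lt;L, R&gt;
{
    private readonly bool isRight;
    private readonly L left;
    private readonly R right;

    private Either(bool isRight, L left, R right)
    {
        this.isRight = isRight;
        this.left = left;
        this.right = right;
    }

    public static Either&lt;L, R&gt; Left(L value) =>
        new Either&lt;L, R&gt;(false, value, default!);

    public static Either&lt;L, R&gt; Right(R value) =>
        new Either&lt;L, R&gt;(true, default!, value);

    // The only way to get a value out is to handle both tracks.
    public T Match&lt;T&gt;(Func&lt;L, T&gt; onLeft, Func&lt;R, T&gt; onRight) =>
        isRight ? onRight(right) : onLeft(left);
}

public static class TaskEither
{
    // Monadic bind for the nested monad: a left value skips the selector
    // (the side track), while a right value is passed on to it (the main track).
    public static async Task&lt;Either&lt;L, R1&gt;&gt; SelectMany&lt;L, R, R1&gt;(
        this Task&lt;Either&lt;L, R&gt;&gt; source,
        Func&lt;R, Task&lt;Either&lt;L, R1&gt;&gt;&gt; selector)
    {
        var either = await source.ConfigureAwait(false);
        return await either.Match&lt;Task&lt;Either&lt;L, R1&gt;&gt;&gt;(
            l => Task.FromResult(Either&lt;L, R1&gt;.Left(l)),
            selector).ConfigureAwait(false);
    }
}

public static class Demo
{
    // Made-up steps; each may derail the flow with an error message.
    public static Task&lt;Either&lt;string, int&gt;&gt; ParseQuantity(string s) =>
        Task.FromResult(int.TryParse(s, out var q)
            ? Either&lt;string, int&gt;.Right(q)
            : Either&lt;string, int&gt;.Left("Not a number."));

    public static Task&lt;Either&lt;string, int&gt;&gt; CheckCapacity(int q) =>
        Task.FromResult(q > 10
            ? Either&lt;string, int&gt;.Left("No tables available.")
            : Either&lt;string, int&gt;.Right(q));

    public static async Task&lt;string&gt; Run(string s)
    {
        var result = await ParseQuantity(s)
            .SelectMany(q => CheckCapacity(q)).ConfigureAwait(false);
        return result.Match(error => error, q => $"Reserved for {q}.");
    }
}</pre>
</p>
<p>
With these definitions, <code>Demo.Run("4")</code> stays on the main track and produces <code>"Reserved for 4."</code>, whereas <code>Demo.Run("x")</code> derails at the first step and <code>Demo.Run("12")</code> at the second. Notice that <code>Run</code> contains no <code>if</code> statements; the decision about which track to follow lives in <code>SelectMany</code>.
</p>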
<p>
The way to encode all those small decisions between 'main track' or 'side track', then, is to wrap 'naked' values in the desired <code><span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R</span>>></code> <a href="https://bartoszmilewski.com/2014/01/14/functors-are-containers/">container</a>:
</p>
<p>
<pre><span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">FromResult</span>(<span style="font-weight:bold;color:#1f377f;">id</span>.<span style="font-weight:bold;color:#74531f;">TryParseGuid</span>().<span style="font-weight:bold;color:#74531f;">OnNull</span>((<span style="color:#2b91af;">ActionResult</span>)<span style="color:blue;">new</span> <span style="color:#2b91af;">NotFoundResult</span>()))</pre>
</p>
<p>
This little code snippet makes use of a few small building blocks that we also need to introduce. First, .NET's standard <code>TryParse</code> APIs don't compose, but since <a href="/2019/07/15/tester-doer-isomorphisms">they're isomorphic to Maybe-valued functions</a>, you can write an adapter like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">Guid</span>? <span style="color:#74531f;">TryParseGuid</span>(<span style="color:blue;">this</span> <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">candidate</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="color:#2b91af;">Guid</span>.<span style="color:#74531f;">TryParse</span>(<span style="font-weight:bold;color:#1f377f;">candidate</span>, <span style="color:blue;">out</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">guid</span>))
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">guid</span>;
    <span style="font-weight:bold;color:#8f08c4;">else</span>
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
}</pre>
</p>
<p>
In this code base, I treat <a href="https://learn.microsoft.com/dotnet/csharp/language-reference/builtin-types/nullable-reference-types">nullable reference types</a> as equivalent to the <a href="/2022/04/25/the-maybe-monad">Maybe monad</a>, but if your language doesn't have that feature, you can use Maybe instead.
</p>
<p>
To implement the <code>Put</code> method, however, we don't want nullable (or Maybe) values. We need Either values, so we may introduce a <a href="/2022/07/18/natural-transformations">natural transformation</a>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R</span>> <span style="color:#74531f;">OnNull</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R</span>>(<span style="color:blue;">this</span> <span style="color:#2b91af;">R</span>? <span style="font-weight:bold;color:#1f377f;">candidate</span>, <span style="color:#2b91af;">L</span> <span style="font-weight:bold;color:#1f377f;">left</span>) <span style="color:blue;">where</span> <span style="color:#2b91af;">R</span> : <span style="color:blue;">struct</span>
{
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">candidate</span>.HasValue)
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#74531f;">Right</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R</span>>(<span style="font-weight:bold;color:#1f377f;">candidate</span>.Value);
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#74531f;">Left</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R</span>>(<span style="font-weight:bold;color:#1f377f;">left</span>);
}</pre>
</p>
<p>
In <a href="https://www.haskell.org/">Haskell</a> one might just make use of the <a href="https://hackage.haskell.org/package/base/docs/Data-Maybe.html#v:maybe">built-in</a> <a href="/2019/05/20/maybe-catamorphism">Maybe catamorphism</a>:
</p>
<p>
<pre>ghci> maybe (Left "boo!") Right $ Just 123
Right 123
ghci> maybe (Left "boo!") Right $ Nothing
Left "boo!"</pre>
</p>
<p>
Such conversions from <code>Maybe</code> to <code>Either</code> hover just around the <a href="https://wiki.haskell.org/Fairbairn_threshold">Fairbairn threshold</a>, but since we are going to need it more than once, it makes sense to add a specialized <code>OnNull</code> transformation to the C# code base. The one shown here handles <a href="https://learn.microsoft.com/dotnet/csharp/language-reference/builtin-types/nullable-value-types">nullable value types</a>, but the code base also includes an overload that handles nullable reference types. It's almost identical.
</p>
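<p>
For readers wondering how the two overloads can coexist, here's a hedged sketch of what such a pair might look like. This is my guess at the shape, not a copy of the code base: the <code>Either</code>, <code>Left</code>, and <code>Right</code> records are a stand-in for whichever Either type you use, and <code>NullableExtensions</code> is a made-up class name. The sketch assumes C# 9 or later with nullable reference types enabled.
</p>
<p>
<pre>#nullable enable

// Stand-in Either type for illustration only (hypothetical).
public abstract record Either&lt;L, R&gt;;
public sealed record Left&lt;L, R&gt;(L Value) : Either&lt;L, R&gt;;
public sealed record Right&lt;L, R&gt;(R Value) : Either&lt;L, R&gt;;

public static class NullableExtensions
{
    // Nullable value types: here, R? means Nullable&lt;R&gt;.
    public static Either&lt;L, R&gt; OnNull&lt;L, R&gt;(this R? candidate, L left)
        where R : struct
    {
        if (candidate.HasValue)
            return new Right&lt;L, R&gt;(candidate.Value);
        return new Left&lt;L, R&gt;(left);
    }

    // Nullable reference types: here, R? is R with a nullability annotation.
    public static Either&lt;L, R&gt; OnNull&lt;L, R&gt;(this R? candidate, L left)
        where R : class
    {
        if (candidate is null)
            return new Left&lt;L, R&gt;(left);
        return new Right&lt;L, R&gt;(candidate);
    }
}</pre>
</p>
<p>
The two methods can live side by side because their first parameters have different underlying types, even though both are written as <code>R?</code>. The only real difference in the bodies is that the value-type version uses <code>HasValue</code> and <code>Value</code>, while the reference-type version can return the <code>candidate</code> itself once it's known not to be <code>null</code>.
</p>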
<h3 id="f158fc8250db4f07b1419a044fe23f91">
Support for query syntax <a href="#f158fc8250db4f07b1419a044fe23f91">#</a>
</h3>
<p>
There's more than one way to consume monadic values in C#. While many C# developers like <a href="https://learn.microsoft.com/dotnet/csharp/linq/">LINQ</a>, most seem to prefer the familiar <em>method call syntax</em>; that is, just call the <code>Select</code>, <code>SelectMany</code>, and <code>Where</code> methods as the normal <a href="https://learn.microsoft.com/dotnet/csharp/programming-guide/classes-and-structs/extension-methods">extension methods</a> they are. Another option, however, is to use <a href="https://learn.microsoft.com/dotnet/csharp/linq/get-started/query-expression-basics">query syntax</a>. This is what I'm aiming for here, since it'll make it easier to spot the Impureim Sandwich.
</p>
<p>
You'll see the entire sandwich later in the article. Before that, I'll highlight details and explain how to implement them. You can always scroll down to see the end result, and then scroll back here, if that's more to your liking.
</p>
<p>
The sandwich starts by parsing the <code>id</code> into a <a href="https://learn.microsoft.com/dotnet/api/system.guid">GUID</a> using the above building blocks:
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sandwich</span> =
    <span style="color:blue;">from</span> rid <span style="color:blue;">in</span> <span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">FromResult</span>(<span style="font-weight:bold;color:#1f377f;">id</span>.<span style="font-weight:bold;color:#74531f;">TryParseGuid</span>().<span style="font-weight:bold;color:#74531f;">OnNull</span>((<span style="color:#2b91af;">ActionResult</span>)<span style="color:blue;">new</span> <span style="color:#2b91af;">NotFoundResult</span>()))</pre>
</p>
<p>
It then immediately proceeds to <code>Validate</code> (<a href="https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/">parse</a>, really) the <code>dto</code> into a proper Domain Model:
</p>
<p>
<pre><span style="color:blue;">from</span> reservation <span style="color:blue;">in</span> <span style="font-weight:bold;color:#1f377f;">dto</span>.<span style="font-weight:bold;color:#74531f;">Validate</span>(rid).<span style="font-weight:bold;color:#74531f;">OnNull</span>((<span style="color:#2b91af;">ActionResult</span>)<span style="color:blue;">new</span> <span style="color:#2b91af;">BadRequestResult</span>())</pre>
</p>
<p>
Notice that the second <code>from</code> expression doesn't wrap the result with <code>Task.FromResult</code>. How does that work? Is the return value of <code>dto.Validate</code> already a <code>Task</code>? No, this works because I added 'degenerate' <code>SelectMany</code> overloads:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R1</span>>> <span style="color:#74531f;">SelectMany</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R</span>, <span style="color:#2b91af;">R1</span>>(
    <span style="color:blue;">this</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R</span>>> <span style="font-weight:bold;color:#1f377f;">source</span>,
    <span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">R</span>, <span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R1</span>>> <span style="font-weight:bold;color:#1f377f;">selector</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">source</span>.<span style="font-weight:bold;color:#74531f;">SelectMany</span>(<span style="font-weight:bold;color:#1f377f;">x</span> => <span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">FromResult</span>(<span style="font-weight:bold;color:#1f377f;">selector</span>(<span style="font-weight:bold;color:#1f377f;">x</span>)));
}

<span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R1</span>>> <span style="color:#74531f;">SelectMany</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">U</span>, <span style="color:#2b91af;">R</span>, <span style="color:#2b91af;">R1</span>>(
    <span style="color:blue;">this</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R</span>>> <span style="font-weight:bold;color:#1f377f;">source</span>,
    <span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">R</span>, <span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">U</span>>> <span style="font-weight:bold;color:#1f377f;">k</span>,
    <span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">R</span>, <span style="color:#2b91af;">U</span>, <span style="color:#2b91af;">R1</span>> <span style="font-weight:bold;color:#1f377f;">s</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">source</span>.<span style="font-weight:bold;color:#74531f;">SelectMany</span>(<span style="font-weight:bold;color:#1f377f;">x</span> => <span style="font-weight:bold;color:#1f377f;">k</span>(<span style="font-weight:bold;color:#1f377f;">x</span>).<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">y</span> => <span style="font-weight:bold;color:#1f377f;">s</span>(<span style="font-weight:bold;color:#1f377f;">x</span>, <span style="font-weight:bold;color:#1f377f;">y</span>)));
}</pre>
</p>
<p>
Notice that the <code>selector</code> only produces an <code><span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R1</span>></code> value, rather than <code><span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R1</span>>></code>. This allows query syntax to 'pick up' the previous value (<code>rid</code>, which is 'really' a <code><span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">ActionResult</span>, <span style="color:#2b91af;">Guid</span>>></code>) and continue with a function that doesn't produce a <code>Task</code>, but rather just an <code>Either</code> value. The first of these two overloads then takes that <code>Either</code> value and wraps it with <code>Task.FromResult</code>. The second overload is just the usual <a href="/2019/12/16/zone-of-ceremony">ceremony</a> that enables query syntax.
</p>
<p>
Why, then, doesn't the <code>sandwich</code> use the same trick for <code>rid</code>? Why does it explicitly call <code>Task.FromResult</code>?
</p>
<p>
As far as I can tell, this is because of type inference. It looks as though the C# compiler infers the monad's type from the first expression. If I change the first expression to
</p>
<p>
<pre><span style="color:blue;">from</span> rid <span style="color:blue;">in</span> <span style="font-weight:bold;color:#1f377f;">id</span>.<span style="font-weight:bold;color:#74531f;">TryParseGuid</span>().<span style="font-weight:bold;color:#74531f;">OnNull</span>((<span style="color:#2b91af;">ActionResult</span>)<span style="color:blue;">new</span> <span style="color:#2b91af;">NotFoundResult</span>())</pre>
</p>
<p>
the compiler thinks that the query expression is based on <code><span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R</span>></code>, rather than <code><span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R</span>>></code>. This means that once we run into the first <code>Task</code> value, the entire expression no longer works.
</p>
<p>
By explicitly wrapping the first expression in a <code>Task</code>, the compiler correctly infers the monad we'd like it to. If there's a more elegant way to do this, I'm not aware of it.
</p>
<h3 id="066473b442cc4dd4904b43dccd257fa4">
Values that don't fail <a href="#066473b442cc4dd4904b43dccd257fa4">#</a>
</h3>
<p>
The sandwich proceeds to query various databases, using the now-familiar <code>OnNull</code> combinators to transform nullable values to <code>Either</code> values.
</p>
<p>
<pre><span style="color:blue;">from</span> restaurant <span style="color:blue;">in</span> RestaurantDatabase
.<span style="font-weight:bold;color:#74531f;">GetRestaurant</span>(<span style="font-weight:bold;color:#1f377f;">restaurantId</span>)
.<span style="font-weight:bold;color:#74531f;">OnNull</span>((<span style="color:#2b91af;">ActionResult</span>)<span style="color:blue;">new</span> <span style="color:#2b91af;">NotFoundResult</span>())
<span style="color:blue;">from</span> existing <span style="color:blue;">in</span> Repository
.<span style="font-weight:bold;color:#74531f;">ReadReservation</span>(restaurant.Id, reservation.Id)
.<span style="font-weight:bold;color:#74531f;">OnNull</span>((<span style="color:#2b91af;">ActionResult</span>)<span style="color:blue;">new</span> <span style="color:#2b91af;">NotFoundResult</span>())</pre>
</p>
<p>
This works like before because both <code>GetRestaurant</code> and <code>ReadReservation</code> are queries that may fail to return a value. Here's the interface definition of <code>ReadReservation</code>:
</p>
<p>
<pre><span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Reservation</span>?> <span style="font-weight:bold;color:#74531f;">ReadReservation</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">restaurantId</span>, <span style="color:#2b91af;">Guid</span> <span style="font-weight:bold;color:#1f377f;">id</span>);</pre>
</p>
<p>
Notice the question mark that indicates that the result may be <code>null</code>.
</p>
<p>
The <code>GetRestaurant</code> method is similar.
</p>
<p>
The next query that the sandwich has to perform, however, is different. The return type of the <code>ReadReservations</code> method is <code><span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">Reservation</span>>></code>. Notice that the type contained in the <code>Task</code> is <em>not</em> nullable. Barring database connection errors, this query <a href="/2024/01/29/error-categories-and-category-errors">can't fail</a>. If it finds no data, it returns an empty collection.
</p>
<p>
Since the value isn't nullable, we can't use <code>OnNull</code> to turn it into a <code><span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R</span>>></code> value. We could try to use the <code>Right</code> creation function for that.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R</span>> <span style="color:#74531f;">Right</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R</span>>(<span style="color:#2b91af;">R</span> <span style="font-weight:bold;color:#1f377f;">right</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R</span>>.<span style="color:#74531f;">Right</span>(<span style="font-weight:bold;color:#1f377f;">right</span>);
}</pre>
</p>
<p>
This works, but is awkward:
</p>
<p>
<pre><span style="color:blue;">from</span> reservations <span style="color:blue;">in</span> Repository
.<span style="font-weight:bold;color:#74531f;">ReadReservations</span>(restaurant.Id, reservation.At)
.<span style="font-weight:bold;color:#74531f;">Traverse</span>(<span style="font-weight:bold;color:#1f377f;">rs</span> => <span style="color:#2b91af;">Either</span>.<span style="color:#74531f;">Right</span><<span style="color:#2b91af;">ActionResult</span>, <span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">Reservation</span>>>(<span style="font-weight:bold;color:#1f377f;">rs</span>))</pre>
</p>
<p>
The problem with calling <code>Either.Right</code> is that while the compiler can infer which type to use for <code>R</code>, it doesn't know what the <code>L</code> type is. Thus, we have to tell it, and we can't tell it what <code>L</code> is without <em>also</em> telling it what <code>R</code> is. Even though it already knows that.
</p>
<p>
In such scenarios, the <a href="https://fsharp.org/">F#</a> compiler can usually figure it out, and <a href="https://en.wikipedia.org/wiki/Glasgow_Haskell_Compiler">GHC</a> always can (unless you add some exotic language extensions to your code). C# doesn't have any syntax that enables you to tell the compiler about only the type that it doesn't know about, and let it infer the rest.
</p>
<p>
All is not lost, though, because there's a little trick you can use in cases such as this. You <em>can</em> let the C# compiler infer the <code>R</code> type so that you only have to tell it what <code>L</code> is. It's a two-stage process. First, define an extension method on <code>R</code>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">RightBuilder</span><<span style="color:#2b91af;">R</span>> <span style="color:#74531f;">ToRight</span><<span style="color:#2b91af;">R</span>>(<span style="color:blue;">this</span> <span style="color:#2b91af;">R</span> <span style="font-weight:bold;color:#1f377f;">right</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">RightBuilder</span><<span style="color:#2b91af;">R</span>>(<span style="font-weight:bold;color:#1f377f;">right</span>);
}</pre>
</p>
<p>
The only type argument on this <code>ToRight</code> method is <code>R</code>, and since the <code>right</code> parameter is of the type <code>R</code>, the C# compiler can always infer the type of <code>R</code> from the type of <code>right</code>.
</p>
<p>
What's <code><span style="color:#2b91af;">RightBuilder</span><<span style="color:#2b91af;">R</span>></code>? It's this little auxiliary class:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">RightBuilder</span><<span style="color:#2b91af;">R</span>>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">R</span> right;
<span style="color:blue;">public</span> <span style="color:#2b91af;">RightBuilder</span>(<span style="color:#2b91af;">R</span> <span style="font-weight:bold;color:#1f377f;">right</span>)
{
<span style="color:blue;">this</span>.right = <span style="font-weight:bold;color:#1f377f;">right</span>;
}
<span style="color:blue;">public</span> <span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R</span>> <span style="font-weight:bold;color:#74531f;">WithLeft</span><<span style="color:#2b91af;">L</span>>()
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">Either</span>.<span style="color:#74531f;">Right</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R</span>>(right);
}
}</pre>
</p>
<p>
The code base for <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a> was written on .NET Core 3.1, but today you could have made this a <a href="https://learn.microsoft.com/dotnet/csharp/language-reference/builtin-types/record">record</a> instead. The only purpose of this class is to break the type inference into two steps so that the <code>R</code> type can be automatically inferred. In this way, you only need to tell the compiler what the <code>L</code> type is.
</p>
<p>
<pre><span style="color:blue;">from</span> reservations <span style="color:blue;">in</span> Repository
.<span style="font-weight:bold;color:#74531f;">ReadReservations</span>(restaurant.Id, reservation.At)
.<span style="font-weight:bold;color:#74531f;">Traverse</span>(<span style="font-weight:bold;color:#1f377f;">rs</span> => <span style="font-weight:bold;color:#1f377f;">rs</span>.<span style="font-weight:bold;color:#74531f;">ToRight</span>().<span style="font-weight:bold;color:#74531f;">WithLeft</span><<span style="color:#2b91af;">ActionResult</span>>())</pre>
</p>
<p>
As indicated, this style of programming isn't language-neutral. Even if you find this little trick neat, I'd much rather have the compiler just figure it out for me. The entire <code>sandwich</code> query expression is already defined as working with <code><span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">ActionResult</span>, <span style="color:#2b91af;">R</span>>></code>, and the <code>L</code> type can't change like the <code>R</code> type can. Functional compilers can figure this out, and while I intend this article to show object-oriented programmers how functional programming sometimes works, I don't wish to pretend that it's a good idea to write code like this in C#. I've <a href="/2019/03/18/the-programmer-as-decision-maker">covered that ground already</a>.
</p>
<p>
Not surprisingly, there's a mirror-image <code>ToLeft</code>/<code>WithRight</code> combo, too.
</p>
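<p>
That pair isn't shown here, but a minimal sketch of it could look like the following. The names <code>LeftBuilder</code> and <code>EitherBuilders</code> are assumptions, and the small <code>Either</code> class is only a stand-in that lets the example compile on its own; the actual code base may well differ.
</p>
<p>
<pre>using System;

// Stand-in for the Either type; an assumption that keeps the sketch self-contained.
public abstract class Either<L, R>
{
    public abstract T Match<T>(Func<L, T> onLeft, Func<R, T> onRight);

    public static Either<L, R> Left(L left) => new LeftCase(left);
    public static Either<L, R> Right(R right) => new RightCase(right);

    private sealed class LeftCase : Either<L, R>
    {
        private readonly L left;
        public LeftCase(L left) { this.left = left; }
        public override T Match<T>(Func<L, T> onLeft, Func<R, T> onRight) => onLeft(left);
    }

    private sealed class RightCase : Either<L, R>
    {
        private readonly R right;
        public RightCase(R right) { this.right = right; }
        public override T Match<T>(Func<L, T> onLeft, Func<R, T> onRight) => onRight(right);
    }
}

// Hypothetical mirror image of RightBuilder: it captures the left value,
// so that only the R type has to be supplied explicitly.
public sealed class LeftBuilder<L>
{
    private readonly L left;
    public LeftBuilder(L left) { this.left = left; }

    public Either<L, R> WithRight<R>() => Either<L, R>.Left(left);
}

public static class EitherBuilders
{
    // L is inferred from the argument, just as R is inferred by ToRight.
    public static LeftBuilder<L> ToLeft<L>(this L left) => new LeftBuilder<L>(left);
}

public static class Program
{
    public static void Main()
    {
        Either<string, int> e = "no tables".ToLeft().WithRight<int>();
        Console.WriteLine(e.Match(l => "Left: " + l, r => "Right: " + r)); // prints "Left: no tables"
    }
}</pre>
</p>
<p>
As with <code>ToRight</code>, the point of the two steps is that the compiler infers the type of the value it is given, and the caller only names the type it can't see.
</p>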
<h3 id="8628f41e4a8d4c6e8d282c5a64ad1c44">
Working with Commands <a href="#8628f41e4a8d4c6e8d282c5a64ad1c44">#</a>
</h3>
<p>
The ultimate goal with the <code>Put</code> method is to modify a row in the database. The method to do that has this interface definition:
</p>
<p>
<pre><span style="color:#2b91af;">Task</span> <span style="font-weight:bold;color:#74531f;">Update</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">restaurantId</span>, <span style="color:#2b91af;">Reservation</span> <span style="font-weight:bold;color:#1f377f;">reservation</span>);</pre>
</p>
<p>
I usually call that non-generic <a href="https://learn.microsoft.com/dotnet/api/system.threading.tasks.task">Task</a> class 'asynchronous <code>void</code>' when explaining it to non-C# programmers. The <code>Update</code> method is an asynchronous <a href="https://en.wikipedia.org/wiki/Command%E2%80%93query_separation">Command</a>.
</p>
<p>
<code>Task</code> and <code>void</code> aren't legal values for use with LINQ query syntax, so you have to find a way to work around that limitation. In this case I defined a local helper method to make it look like a Query:
</p>
<p>
<pre><span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Reservation</span>> <span style="font-weight:bold;color:#74531f;">RunUpdate</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">restaurantId</span>, <span style="color:#2b91af;">Reservation</span> <span style="font-weight:bold;color:#1f377f;">reservation</span>, <span style="color:#2b91af;">TransactionScope</span> <span style="font-weight:bold;color:#1f377f;">scope</span>)
{
<span style="color:blue;">await</span> Repository.<span style="font-weight:bold;color:#74531f;">Update</span>(<span style="font-weight:bold;color:#1f377f;">restaurantId</span>, <span style="font-weight:bold;color:#1f377f;">reservation</span>).<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
<span style="font-weight:bold;color:#1f377f;">scope</span>.<span style="font-weight:bold;color:#74531f;">Complete</span>();
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">reservation</span>;
}</pre>
</p>
<p>
It just echoes back the <code>reservation</code> parameter once the <code>Update</code> has completed. This makes it composable in the larger query expression.
</p>
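<p>
The same echo idea can be sketched as a general-purpose extension method. This is only an illustration: <code>ThenReturn</code> and <code>CommandExtensions</code> are hypothetical names that aren't part of the code base, and unlike <code>RunUpdate</code> the helper knows nothing about completing a <code>TransactionScope</code>.
</p>
<p>
<pre>using System;
using System.Threading.Tasks;

public static class CommandExtensions
{
    // Hypothetical helper: awaits the Command, then echoes back the supplied
    // value, so that the result can take part in a query expression.
    public static async Task<T> ThenReturn<T>(this Task command, T value)
    {
        await command.ConfigureAwait(false);
        return value;
    }
}

public static class Program
{
    public static async Task Main()
    {
        // Task.CompletedTask stands in for a Command such as Update.
        string echoed = await Task.CompletedTask.ThenReturn("reservation");
        Console.WriteLine(echoed); // prints "reservation"
    }
}</pre>
</p>
<p>
Under these assumptions, a call such as <code>Repository.Update(restaurantId, reservation).ThenReturn(reservation)</code> would produce a <code>Task<Reservation></code> that composes like any other Query.
</p>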
<p>
You'll probably not be surprised when I tell you that both F# and Haskell handle this scenario gracefully, without requiring any hoop-jumping.
</p>
<h3 id="d7c6feabfcb74e2eb5174a9ad3dd9c7f">
Full sandwich <a href="#d7c6feabfcb74e2eb5174a9ad3dd9c7f">#</a>
</h3>
<p>
Those are all the building blocks. Here's the full <code>sandwich</code> definition, colour-coded like the examples in <a href="/2020/03/02/impureim-sandwich">Impureim sandwich</a>.
</p>
<p>
<pre><span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">ActionResult</span>, <span style="color:#2b91af;">OkObjectResult</span>>> <span style="font-weight:bold;color:#1f377f;">sandwich</span> =
<span style="background-color: palegreen;"> <span style="color:blue;">from</span> rid <span style="color:blue;">in</span> <span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">FromResult</span>(
<span style="font-weight:bold;color:#1f377f;">id</span>.<span style="font-weight:bold;color:#74531f;">TryParseGuid</span>().<span style="font-weight:bold;color:#74531f;">OnNull</span>((<span style="color:#2b91af;">ActionResult</span>)<span style="color:blue;">new</span> <span style="color:#2b91af;">NotFoundResult</span>()))
<span style="color:blue;">from</span> reservation <span style="color:blue;">in</span>
<span style="font-weight:bold;color:#1f377f;">dto</span>.<span style="font-weight:bold;color:#74531f;">Validate</span>(rid).<span style="font-weight:bold;color:#74531f;">OnNull</span>(
(<span style="color:#2b91af;">ActionResult</span>)<span style="color:blue;">new</span> <span style="color:#2b91af;">BadRequestResult</span>())</span>
<span style="background-color: lightsalmon;"> <span style="color:blue;">from</span> restaurant <span style="color:blue;">in</span> RestaurantDatabase
.<span style="font-weight:bold;color:#74531f;">GetRestaurant</span>(<span style="font-weight:bold;color:#1f377f;">restaurantId</span>)
.<span style="font-weight:bold;color:#74531f;">OnNull</span>((<span style="color:#2b91af;">ActionResult</span>)<span style="color:blue;">new</span> <span style="color:#2b91af;">NotFoundResult</span>())
<span style="color:blue;">from</span> existing <span style="color:blue;">in</span> Repository
.<span style="font-weight:bold;color:#74531f;">ReadReservation</span>(restaurant.Id, reservation.Id)
.<span style="font-weight:bold;color:#74531f;">OnNull</span>((<span style="color:#2b91af;">ActionResult</span>)<span style="color:blue;">new</span> <span style="color:#2b91af;">NotFoundResult</span>())
<span style="color:blue;">from</span> reservations <span style="color:blue;">in</span> Repository
.<span style="font-weight:bold;color:#74531f;">ReadReservations</span>(restaurant.Id, reservation.At)
.<span style="font-weight:bold;color:#74531f;">Traverse</span>(<span style="font-weight:bold;color:#1f377f;">rs</span> => <span style="font-weight:bold;color:#1f377f;">rs</span>.<span style="font-weight:bold;color:#74531f;">ToRight</span>().<span style="font-weight:bold;color:#74531f;">WithLeft</span><<span style="color:#2b91af;">ActionResult</span>>())
<span style="color:blue;">let</span> now = Clock.<span style="font-weight:bold;color:#74531f;">GetCurrentDateTime</span>()</span>
<span style="background-color: palegreen;"> <span style="color:blue;">let</span> reservations2 =
reservations.<span style="font-weight:bold;color:#74531f;">Where</span>(<span style="font-weight:bold;color:#1f377f;">r</span> => <span style="font-weight:bold;color:#1f377f;">r</span>.Id <span style="font-weight:bold;color:#74531f;">!=</span> reservation.Id)
<span style="color:blue;">let</span> ok = restaurant.MaitreD.<span style="font-weight:bold;color:#74531f;">WillAccept</span>(
now,
reservations2,
reservation)
<span style="color:blue;">from</span> reservation2 <span style="color:blue;">in</span>
ok
? reservation.<span style="font-weight:bold;color:#74531f;">ToRight</span>().<span style="font-weight:bold;color:#74531f;">WithLeft</span><<span style="color:#2b91af;">ActionResult</span>>()
: <span style="color:#74531f;">NoTables500InternalServerError</span>().<span style="font-weight:bold;color:#74531f;">ToLeft</span>().<span style="font-weight:bold;color:#74531f;">WithRight</span><<span style="color:#2b91af;">Reservation</span>>()</span>
<span style="background-color: lightsalmon;"> <span style="color:blue;">from</span> reservation3 <span style="color:blue;">in</span>
<span style="font-weight:bold;color:#74531f;">RunUpdate</span>(restaurant.Id, reservation2, <span style="font-weight:bold;color:#1f377f;">scope</span>)
.<span style="font-weight:bold;color:#74531f;">Traverse</span>(<span style="font-weight:bold;color:#1f377f;">r</span> => <span style="font-weight:bold;color:#1f377f;">r</span>.<span style="font-weight:bold;color:#74531f;">ToRight</span>().<span style="font-weight:bold;color:#74531f;">WithLeft</span><<span style="color:#2b91af;">ActionResult</span>>())</span>
<span style="color:blue;">select</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">OkObjectResult</span>(reservation3.<span style="font-weight:bold;color:#74531f;">ToDto</span>());</pre>
</p>
<p>
As is evident from the colour-coding, this isn't quite a sandwich. The structure is honestly more accurately depicted like this:
</p>
<p>
<img src="/content/binary/pure-impure-pure-impure-box.png" alt="A box with green, red, green, and red horizontal tiers.">
</p>
<p>
As I've previously argued, <a href="/2023/10/09/whats-a-sandwich">while the metaphor becomes strained, this still works well as a functional-programming architecture</a>.
</p>
<p>
As defined here, the <code>sandwich</code> value is a <code>Task</code> that must be awaited.
</p>
<p>
<pre><span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">ActionResult</span>, <span style="color:#2b91af;">OkObjectResult</span>> <span style="font-weight:bold;color:#1f377f;">either</span> = <span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">sandwich</span>.<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">either</span>.<span style="font-weight:bold;color:#74531f;">Match</span>(<span style="font-weight:bold;color:#1f377f;">x</span> => <span style="font-weight:bold;color:#1f377f;">x</span>, <span style="font-weight:bold;color:#1f377f;">x</span> => <span style="font-weight:bold;color:#1f377f;">x</span>);</pre>
</p>
<p>
By awaiting the task, we get an <code>Either</code> value. The <code>Put</code> method, on the other hand, must return an <code>ActionResult</code>. How do you turn an <code>Either</code> object into a single object?
</p>
<p>
By pattern matching on it, as the code snippet shows. The <code>L</code> type is already an <code>ActionResult</code>, so we return it without changing it. If C# had had a built-in identity function, I'd have used that, but idiomatically, we instead use the <code><span style="font-weight:bold;color:#1f377f;">x</span> => <span style="font-weight:bold;color:#1f377f;">x</span></code> lambda expression.
</p>
<p>
The same is the case for the <code>R</code> type, because <code>OkObjectResult</code> inherits from <code>ActionResult</code>. The identity expression automatically performs the type conversion for us.
</p>
<p>
This, by the way, is a recurring pattern with Either values that I run into in all languages. You've essentially computed an <code>Either<T, T></code>, with the same type on both sides, and now you just want to return whichever <code>T</code> value is contained in the Either value. You'd think this is such a common pattern that Haskell has a nice abstraction for it, but <a href="https://hoogle.haskell.org/?hoogle=Either%20a%20a%20-%3E%20a">even Hoogle fails to suggest a commonly-accepted function that does this</a>. Apparently, <code>either id id</code> is considered below the Fairbairn threshold, too.
</p>
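<p>
If the pattern shows up often enough in a C# code base, one could wrap it in a small helper. The following is a sketch under stated assumptions: <code>Merge</code> is a hypothetical name, and the <code>Either</code> class is a stand-in that only assumes a <code>Match</code> method like the one used above.
</p>
<p>
<pre>using System;

// Stand-in for the Either type; an assumption that keeps the sketch self-contained.
public abstract class Either<L, R>
{
    public abstract T Match<T>(Func<L, T> onLeft, Func<R, T> onRight);

    public static Either<L, R> Left(L left) => new LeftCase(left);
    public static Either<L, R> Right(R right) => new RightCase(right);

    private sealed class LeftCase : Either<L, R>
    {
        private readonly L left;
        public LeftCase(L left) { this.left = left; }
        public override T Match<T>(Func<L, T> onLeft, Func<R, T> onRight) => onLeft(left);
    }

    private sealed class RightCase : Either<L, R>
    {
        private readonly R right;
        public RightCase(R right) { this.right = right; }
        public override T Match<T>(Func<L, T> onLeft, Func<R, T> onRight) => onRight(right);
    }
}

public static class EitherExtensions
{
    // Hypothetical helper. The constraint R : L means that a right value
    // can be returned as an L without an explicit cast.
    public static L Merge<L, R>(this Either<L, R> source) where R : L
    {
        return source.Match<L>(l => l, r => r);
    }
}

public static class Program
{
    public static void Main()
    {
        Either<object, string> either = Either<object, string>.Right("ok");
        object merged = either.Merge();
        Console.WriteLine(merged); // prints "ok"
    }
}</pre>
</p>
<p>
With such a helper, and assuming the constraint holds as it does for <code>OkObjectResult</code> and <code>ActionResult</code>, the last line of the <code>Put</code> method could read <code>return either.Merge();</code>. Whether that's worth an extra method is debatable, given how short the two identity lambdas are.
</p>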
<h3 id="fc8f5dd20a494cc297a55ce57c865212">
Conclusion <a href="#fc8f5dd20a494cc297a55ce57c865212">#</a>
</h3>
<p>
This article presents an example of a non-trivial Impureim Sandwich. When I introduced the pattern, I gave a few examples. I'd deliberately chosen these examples to be simple so that they highlighted the structure of the idea. The downside of that didactic choice is that some commenters found the examples too simplistic. Therefore, I think that there's value in going through more complex examples.
</p>
<p>
The code base that accompanies <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a> is complex enough that it borders on the realistic. It was deliberately written that way, and since I assume that the code base is familiar to readers of the book, I thought it'd be a good resource to show how an Impureim Sandwich might look. I explicitly chose to refactor the <code>Put</code> method, since it's easily the most complicated process in the code base.
</p>
<p>
The benefit of that code base is that it's written in a programming language that reaches a large audience. Thus, for the reader curious about functional programming I thought that this could also be a useful introduction to some intermediate concepts.
</p>
<p>
As I've commented along the way, however, I wouldn't expect anyone to write production C# code like this. If you're able to do this, you're also able to do it in a language better suited for this programming paradigm.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Implementation and usage mindsets
https://blog.ploeh.dk/2024/12/09/implementation-and-usage-mindsets
2024-12-09T21:45:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A one-dimensional take on the enduring static-versus-dynamic debate.</em>
</p>
<p>
It recently occurred to me that one possible explanation for the standing, and probably never-ending, debate about static versus dynamic types may be that each camp has a disjoint perspective on the kinds of problems their favourite languages help them solve. In short, my hypothesis is that perhaps lovers of dynamically-typed languages often approach a problem from an implementation mindset, whereas proponents of static types emphasize usage.
</p>
<p>
<img src="/content/binary/implementation-versus-usage.png" alt="A question mark in the middle. An arrow from left labelled 'implementation' points to the question mark from a figure indicating a person. Another arrow from the right labelled 'usage' points to the question mark from another figure indicating a person.">
</p>
<p>
I'll expand on this idea here, and then provide examples in two subsequent articles.
</p>
<h3 id="d748f29ae31543fbb6db537711800c62">
Background <a href="#d748f29ae31543fbb6db537711800c62">#</a>
</h3>
<p>
For years I've struggled to understand 'the other side'. While I'm firmly in the statically typed camp, I realize that many highly skilled programmers and thought leaders enjoy, or get great use out of, dynamically typed languages. This worries me, because it <a href="/2021/08/09/am-i-stuck-in-a-local-maximum">might indicate that I'm stuck in a local maximum</a>.
</p>
<p>
In other words, just because I, personally, prefer static types, it doesn't follow that static types are universally better than dynamic types.
</p>
<p>
In reality, it's probably rather the case that we're dealing with a false dichotomy, and that the problem is really multi-dimensional.
</p>
<blockquote>
<p>
"Let me stop you right there: I don't think there is a real dynamic typing versus static typing debate.
</p>
<p>
"What such debates normally are is language X vs language Y debates (where X happens to be dynamic and Y happens to be static)."
</p>
<footer><cite><a href="https://twitter.com/KevlinHenney/status/1425513161252278280">Kevlin Henney</a></cite></footer>
</blockquote>
<p>
Even so, I can't help thinking about such things. Am I missing something?
</p>
<p>
For the past few years, I've dabbled with <a href="https://www.python.org/">Python</a> to see what writing in a popular dynamically typed language is like. It's not a bad language, and I can clearly see how it's attractive. Even so, I'm still frustrated every time I return to some Python code after a few weeks or more. The lack of static types makes it hard for me to pick up, or revisit, old code.
</p>
<h3 id="8b6d87e0536d40b6aaec28d8e6356553">
A question of perspective? <a href="#8b6d87e0536d40b6aaec28d8e6356553">#</a>
</h3>
<p>
Whenever I run into a difference of opinion, I often interpret it as a difference in perspective. Perhaps it's my academic background as an economist, but I consider it a given that people have different motivations, and that incentives influence actions.
</p>
<p>
A related kind of analysis deals with problem definitions. Are we even trying to solve the same problem?
</p>
<p>
I've <a href="/2021/08/09/am-i-stuck-in-a-local-maximum">discussed such questions before, but in a different context</a>. Here, it strikes me that perhaps programmers who gravitate toward dynamically typed languages are focused on a different problem than the other group.
</p>
<p>
Again, I'd like to emphasize that I don't consider the world so black and white in reality. Some developers straddle the two camps, and as the above Kevlin Henney quote suggests, there really aren't only two kinds of languages. <a href="https://en.wikipedia.org/wiki/C_(programming_language)">C</a> and <a href="https://www.haskell.org/">Haskell</a> are both statically typed, but the similarities stop there. Likewise, I don't know if it's fair to put JavaScript and <a href="https://clojure.org/">Clojure</a> in the same bucket.
</p>
<p>
That said, I'd still like to offer the following hypothesis, in the spirit that although <a href="https://en.wikipedia.org/wiki/All_models_are_wrong">all models are wrong</a>, some are useful.
</p>
<p>
The idea is that if you're trying to solve a problem related to <em>implementation</em>, dynamically typed languages may be more suitable. If you're trying to implement an algorithm, or even trying to invent one, a dynamic language seems useful. One year, I did a good chunk of <a href="https://adventofcode.com/">Advent of Code</a> in Python, and didn't find it harder than in Haskell. (I ultimately ran out of steam for reasons unrelated to Python.)
</p>
<p>
On the other hand, if your main focus is <em>usage</em> of your code, perhaps you'll find a statically typed language more useful. At least, I do. I can use the static type system to communicate how my APIs work. How to instantiate my classes. How to call my functions. How return values are shaped. In other words, the preconditions, invariants, and postconditions of my reusable code: <a href="/encapsulation-and-solid/">Encapsulation</a>.
</p>
<h3 id="f0cbf02e11484e9a8c8d0fab9a6463f2">
Examples <a href="#f0cbf02e11484e9a8c8d0fab9a6463f2">#</a>
</h3>
<p>
Some examples may be in order. In the next two articles, I'll first examine how easy it is to implement an algorithm in various programming languages. Then I'll discuss how to encapsulate that algorithm.
</p>
<ul>
<li><a href="/2024/12/23/implementing-rod-cutting">Implementing rod-cutting</a></li>
<li><a href="/2025/01/06/encapsulating-rod-cutting">Encapsulating rod-cutting</a></li>
</ul>
<p>
The articles will both discuss the rod-cutting problem from <a href="/ref/clrs">Introduction to Algorithms</a>, but I'll introduce the problem in the next article.
</p>
<h3 id="97b3e722024b4228924faa2d6ff5d188">
Conclusion <a href="#97b3e722024b4228924faa2d6ff5d188">#</a>
</h3>
<p>
I'd be naive if I believed that a single model can fully explain why some people prefer dynamically typed languages, and others rather like statically typed languages. Even so, suggesting a model helps me understand how to analyze problems.
</p>
<p>
My hypothesis is that dynamically typed languages may be suitable for implementing algorithms, whereas statically typed languages offer better encapsulation.
</p>
<p>
This may be used as a heuristic for 'picking the right tool for the job'. If I need to suss out an algorithm, perhaps I should do it in Python. If, on the other hand, I need to publish a reusable library, perhaps Haskell is a better choice.
</p>
<p>
<strong>Next:</strong> <a href="/2024/12/23/implementing-rod-cutting">Implementing rod-cutting</a>.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Short-circuiting an asynchronous traversal
https://blog.ploeh.dk/2024/12/02/short-circuiting-an-asynchronous-traversal
2024-12-02T09:32:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Another C# example.</em>
</p>
<p>
This article is a continuation of <a href="/2024/11/18/collecting-and-handling-result-values">an earlier post</a> about refactoring a piece of imperative code to a <a href="/2018/11/19/functional-architecture-a-definition">functional architecture</a>. It all started with <a href="https://stackoverflow.com/q/79112836/126014">a Stack Overflow question</a>, but read the previous article, and you'll be up to speed.
</p>
<h3 id="2bf66b90d3ba4dfe980538175b647070">
Imperative outset <a href="#2bf66b90d3ba4dfe980538175b647070">#</a>
</h3>
<p>
To begin, consider this mostly imperative code snippet:
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">storedItems</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">List</span><<span style="color:#2b91af;">ShoppingListItem</span>>();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">failedItems</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">List</span><<span style="color:#2b91af;">ShoppingListItem</span>>();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">state</span> = (<span style="font-weight:bold;color:#1f377f;">storedItems</span>, <span style="font-weight:bold;color:#1f377f;">failedItems</span>, hasError: <span style="color:blue;">false</span>);
<span style="font-weight:bold;color:#8f08c4;">foreach</span> (<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">item</span> <span style="font-weight:bold;color:#8f08c4;">in</span> <span style="font-weight:bold;color:#1f377f;">itemsToUpdate</span>)
{
<span style="color:#2b91af;">OneOf</span><<span style="color:#2b91af;">ShoppingListItem</span>, <span style="color:#2b91af;">NotFound</span>, <span style="color:#2b91af;">Error</span>> <span style="font-weight:bold;color:#1f377f;">updateResult</span> = <span style="color:blue;">await</span> <span style="color:#74531f;">UpdateItem</span>(<span style="font-weight:bold;color:#1f377f;">item</span>, <span style="font-weight:bold;color:#1f377f;">dbContext</span>);
<span style="font-weight:bold;color:#1f377f;">state</span> = <span style="font-weight:bold;color:#1f377f;">updateResult</span>.<span style="font-weight:bold;color:#74531f;">Match</span><(<span style="color:#2b91af;">List</span><<span style="color:#2b91af;">ShoppingListItem</span>>, <span style="color:#2b91af;">List</span><<span style="color:#2b91af;">ShoppingListItem</span>>, <span style="color:blue;">bool</span>)>(
<span style="font-weight:bold;color:#1f377f;">storedItem</span> => { <span style="font-weight:bold;color:#1f377f;">storedItems</span>.<span style="font-weight:bold;color:#74531f;">Add</span>(<span style="font-weight:bold;color:#1f377f;">storedItem</span>); <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">state</span>; },
<span style="font-weight:bold;color:#1f377f;">notFound</span> => { <span style="font-weight:bold;color:#1f377f;">failedItems</span>.<span style="font-weight:bold;color:#74531f;">Add</span>(<span style="font-weight:bold;color:#1f377f;">item</span>); <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">state</span>; },
<span style="font-weight:bold;color:#1f377f;">error</span> => { <span style="font-weight:bold;color:#1f377f;">state</span>.hasError = <span style="color:blue;">true</span>; <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">state</span>; }
);
<span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">state</span>.hasError)
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">Results</span>.<span style="color:#74531f;">BadRequest</span>();
}
<span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">dbContext</span>.<span style="font-weight:bold;color:#74531f;">SaveChangesAsync</span>();
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">Results</span>.<span style="color:#74531f;">Ok</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">BulkUpdateResult</span>([.. <span style="font-weight:bold;color:#1f377f;">storedItems</span>], [.. <span style="font-weight:bold;color:#1f377f;">failedItems</span>]));</pre>
</p>
<p>
Apart from one crucial detail, this code is similar to the code in the previous article, so I'll only recap a few points. One has to infer most of the types and APIs, since the original post didn't show more code than that. If you're used to engaging with Stack Overflow questions, however, it's not too hard to figure out what most of the moving parts do.
</p>
<p>
The least obvious detail is that the code uses a library called <a href="https://github.com/mcintyre321/OneOf/">OneOf</a>, which supplies general-purpose, but rather abstract, sum types. The container type <code>OneOf</code> and the two indicator types <code>NotFound</code> and <code>Error</code> are all defined in that library.
</p>
<p>
The <code>Match</code> method implements standard <a href="/2018/05/22/church-encoding">Church encoding</a>, which enables the code to pattern-match on the three alternative values that <code>UpdateItem</code> returns.
</p>
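<p>
To illustrate the idea, here's a minimal sketch of a Church-encoded three-way sum type. The <code>Sum3</code> class and its nested case classes are hypothetical names that I've made up for this illustration. This is not how the OneOf library is implemented, but the <code>Match</code> method has the same shape: one function per case, and exactly one of them runs.
</p>
<p>
<pre>using System;

Sum3<int, string, bool> value = new Sum3<int, string, bool>.Case1("foo");
string description = value.Match(
    i => "a number",
    s => "a string",
    b => "a Boolean");
Console.WriteLine(description); // a string

// Hypothetical stand-in for a three-way sum type; not the OneOf source code.
public abstract class Sum3<T0, T1, T2>
{
    // Private constructor: only the nested classes below can derive from Sum3.
    private Sum3() { }

    public abstract TResult Match<TResult>(
        Func<T0, TResult> f0,
        Func<T1, TResult> f1,
        Func<T2, TResult> f2);

    public sealed class Case0 : Sum3<T0, T1, T2>
    {
        private readonly T0 value;
        public Case0(T0 value) => this.value = value;
        public override TResult Match<TResult>(
            Func<T0, TResult> f0, Func<T1, TResult> f1, Func<T2, TResult> f2) => f0(value);
    }

    public sealed class Case1 : Sum3<T0, T1, T2>
    {
        private readonly T1 value;
        public Case1(T1 value) => this.value = value;
        public override TResult Match<TResult>(
            Func<T0, TResult> f0, Func<T1, TResult> f1, Func<T2, TResult> f2) => f1(value);
    }

    public sealed class Case2 : Sum3<T0, T1, T2>
    {
        private readonly T2 value;
        public Case2(T2 value) => this.value = value;
        public override TResult Match<TResult>(
            Func<T0, TResult> f0, Func<T1, TResult> f1, Func<T2, TResult> f2) => f2(value);
    }
}</pre>
</p>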
<p>
One more detail also warrants an explicit description: The <code>itemsToUpdate</code> object is an input argument of the type <code><span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">ShoppingListItem</span>></code>.
</p>
<p>
The major difference from before is that now the update process short-circuits on the first <code>Error</code>. If an error occurs, it stops processing the rest of the items. In that case, it now returns <code>Results.BadRequest()</code>, and it <em>doesn't</em> save the changes to <code>dbContext</code>.
</p>
<p>
The implementation makes use of mutable state and undisciplined I/O. How do you refactor it to a more functional design?
</p>
<h3 id="d5b47b3ebb0345ea9b1d2879755bec12">
Short-circuiting traversal <a href="#d5b47b3ebb0345ea9b1d2879755bec12">#</a>
</h3>
<p>
<a href="/2024/11/11/traversals">The standard Traverse function</a> isn't lazy, or rather, it does consume the entire input sequence. Even various <a href="https://www.haskell.org/">Haskell</a> data structures I investigated do that. And yes, I even tried to <code>traverse</code> <a href="https://hackage.haskell.org/package/list-t/docs/ListT.html">ListT</a>. If there's a data structure that you can <code>traverse</code> with deferred execution of I/O-bound actions, I'm not aware of it.
</p>
<p>
That said, all is not lost, but you'll need to implement a more specialized traversal. While consuming the input sequence, the function needs to know when to stop. It can't do that on just any <a href="https://learn.microsoft.com/dotnet/api/system.collections.generic.ienumerable-1">IEnumerable<T></a>, because it has no information about <code>T</code>.
</p>
<p>
If, on the other hand, you specialize the traversal to a sequence of items with more information, you can stop processing if it encounters a particular condition. You could generalize this to, say, <code>IEnumerable<Either<L, R>></code>, but since I already have the OneOf library in scope, I'll use that, instead of implementing or pulling in a general-purpose <a href="/2018/06/11/church-encoded-either">Either</a> data type.
</p>
<p>
In fact, I'll just use a three-way <code>OneOf</code> type compatible with the one that <code>UpdateItem</code> returns.
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">static</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">OneOf</span><<span style="color:#2b91af;">T1</span>, <span style="color:#2b91af;">T2</span>, <span style="color:#2b91af;">Error</span>>>> <span style="color:#74531f;">Sequence</span><<span style="color:#2b91af;">T1</span>, <span style="color:#2b91af;">T2</span>>(
<span style="color:blue;">this</span> <span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">OneOf</span><<span style="color:#2b91af;">T1</span>, <span style="color:#2b91af;">T2</span>, <span style="color:#2b91af;">Error</span>>>> <span style="font-weight:bold;color:#1f377f;">tasks</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">results</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">List</span><<span style="color:#2b91af;">OneOf</span><<span style="color:#2b91af;">T1</span>, <span style="color:#2b91af;">T2</span>, <span style="color:#2b91af;">Error</span>>>();
<span style="font-weight:bold;color:#8f08c4;">foreach</span> (<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">task</span> <span style="font-weight:bold;color:#8f08c4;">in</span> <span style="font-weight:bold;color:#1f377f;">tasks</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">result</span> = <span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">task</span>;
<span style="font-weight:bold;color:#1f377f;">results</span>.<span style="font-weight:bold;color:#74531f;">Add</span>(<span style="font-weight:bold;color:#1f377f;">result</span>);
<span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">result</span>.IsT2)
<span style="font-weight:bold;color:#8f08c4;">break</span>;
}
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">results</span>;
}</pre>
</p>
<p>
This implementation doesn't care what <code>T1</code> or <code>T2</code> is, so they're free to be <code>ShoppingListItem</code> and <code>NotFound</code>. The third type argument, on the other hand, must be <code>Error</code>.
</p>
<p>
The <code>if</code> conditional looks a bit odd, but as I wrote, the types that ship with the OneOf library have rather abstract APIs. A three-way <code>OneOf</code> value comes with three case tests called <code>IsT0</code>, <code>IsT1</code>, and <code>IsT2</code>. Notice that the library uses a zero-indexed naming convention for its type parameters. <code>IsT2</code> returns <code>true</code> if the value is the <em>third</em> kind, in this case <code>Error</code>. If a <code>task</code> turns out to produce an <code>Error</code>, the <code>Sequence</code> method adds that one error, but then stops processing any remaining items.
</p>
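<p>
If the zero-indexed naming seems confusing, a small experiment may help. This sketch assumes that the OneOf NuGet package is referenced, and it uses <code>string</code> as a stand-in for <code>ShoppingListItem</code>.
</p>
<p>
<pre>using System;
using OneOf;
using OneOf.Types;

// Implicit conversions pick the case that matches the type of the value.
OneOf<string, NotFound, Error> stored = "milk";
OneOf<string, NotFound, Error> failed = new Error();

Console.WriteLine(stored.IsT0); // True: the first case (string)
Console.WriteLine(stored.IsT2); // False
Console.WriteLine(failed.IsT2); // True: the third case (Error)
Console.WriteLine(failed.Index); // 2</pre>
</p>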
<p>
Some readers may complain that the entire implementation of <code>Sequence</code> is imperative. It hardly matters that much, since the mutation doesn't escape the method. The behaviour is as functional as it's possible to make it. It's fundamentally I/O-bound, so we can't consider it a <a href="https://en.wikipedia.org/wiki/Pure_function">pure function</a>. That said, if we hypothetically imagine that all the <code>tasks</code> are deterministic and have no side effects, the <code>Sequence</code> function does become a pure function when viewed as a black box. From the outside, you can't tell that the implementation is imperative.
</p>
<p>
It <em>is</em> possible to implement <code>Sequence</code> in a proper functional style, and it might make <a href="/2020/01/13/on-doing-katas">a good exercise</a>. I think, however, that it'll be difficult in C#. In <a href="https://fsharp.org/">F#</a> or Haskell I'd use recursion, and while you <em>can</em> do that in C#, I admit that I've lost sight of whether or not <a href="/2015/12/22/tail-recurse">tail recursion</a> is supported by the C# compiler.
</p>
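<p>
For what it's worth, here's a sketch of what a recursive variant might look like in C#. I've made some assumptions that aren't part of the code above: it recurses over an indexed <code>IReadOnlyList<T1></code> rather than an arbitrary <code>IEnumerable</code>, it folds the selector into the recursion, and it accumulates results in an <code>ImmutableList</code>. The names <code>TraverseRec</code> and <code>Go</code> are hypothetical.
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Threading.Tasks;
using OneOf;
using OneOf.Types;

internal static class RecursiveTraversal
{
    // Hypothetical recursive variant; name and shape are illustrative only.
    internal static Task<ImmutableList<OneOf<TResult, T2, Error>>> TraverseRec<T1, T2, TResult>(
        this IReadOnlyList<T1> items,
        Func<T1, Task<OneOf<TResult, T2, Error>>> selector)
    {
        return Go(items, selector, 0, ImmutableList<OneOf<TResult, T2, Error>>.Empty);
    }

    private static async Task<ImmutableList<OneOf<TResult, T2, Error>>> Go<T1, T2, TResult>(
        IReadOnlyList<T1> items,
        Func<T1, Task<OneOf<TResult, T2, Error>>> selector,
        int index,
        ImmutableList<OneOf<TResult, T2, Error>> acc)
    {
        // Base case: no more items to process.
        if (index >= items.Count)
            return acc;

        var result = await selector(items[index]);
        var next = acc.Add(result);

        // Stop at the first Error; otherwise recurse on the rest of the list.
        return result.IsT2
            ? next
            : await Go(items, selector, index + 1, next);
    }
}</pre>
</p>
<p>
This sketch doesn't rely on tail-call elimination, so I'd only consider it for short lists.
</p>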
<p>
Be that as it may, the traversal implementation doesn't change.
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">OneOf</span><<span style="color:#2b91af;">TResult</span>, <span style="color:#2b91af;">T2</span>, <span style="color:#2b91af;">Error</span>>>> <span style="color:#74531f;">Traverse</span><<span style="color:#2b91af;">T1</span>, <span style="color:#2b91af;">T2</span>, <span style="color:#2b91af;">TResult</span>>(
<span style="color:blue;">this</span> <span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">T1</span>> <span style="font-weight:bold;color:#1f377f;">items</span>,
<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T1</span>, <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">OneOf</span><<span style="color:#2b91af;">TResult</span>, <span style="color:#2b91af;">T2</span>, <span style="color:#2b91af;">Error</span>>>> <span style="font-weight:bold;color:#1f377f;">selector</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">items</span>.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">selector</span>).<span style="font-weight:bold;color:#74531f;">Sequence</span>();
}</pre>
</p>
<p>
You can now <code>Traverse</code> the <code>itemsToUpdate</code>:
</p>
<p>
<pre><span style="color:green;">// Impure</span>
<span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">OneOf</span><<span style="color:#2b91af;">ShoppingListItem</span>, <span style="color:#2b91af;">NotFound</span><<span style="color:#2b91af;">ShoppingListItem</span>>, <span style="color:#2b91af;">Error</span>>> <span style="font-weight:bold;color:#1f377f;">results</span> =
<span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">itemsToUpdate</span>.<span style="font-weight:bold;color:#74531f;">Traverse</span>(<span style="font-weight:bold;color:#1f377f;">item</span> => <span style="color:#74531f;">UpdateItem</span>(<span style="font-weight:bold;color:#1f377f;">item</span>, <span style="font-weight:bold;color:#1f377f;">dbContext</span>));</pre>
</p>
<p>
As the <code>// Impure</code> comment may suggest, this constitutes the first impure layer of an <a href="/2020/03/02/impureim-sandwich">Impureim Sandwich</a>.
</p>
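<p>
The short-circuiting behaviour hinges on the deferred execution of <code>Select</code>: the <code>selector</code> isn't called for an item until <code>Sequence</code> pulls the corresponding task, so once the loop breaks, no further updates start. The following sketch illustrates this. It assumes that the OneOf NuGet package is referenced, repeats the <code>Sequence</code> and <code>Traverse</code> methods from above so that it's self-contained, and uses a hypothetical <code>FakeUpdate</code> function and plain strings as stand-ins for <code>UpdateItem</code> and <code>ShoppingListItem</code>.
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using OneOf;
using OneOf.Types;

var calls = new List<string>();

// Hypothetical stand-in for UpdateItem: records each call and fails on "bread".
Task<OneOf<string, NotFound, Error>> FakeUpdate(string item)
{
    calls.Add(item);
    if (item == "bread")
        return Task.FromResult<OneOf<string, NotFound, Error>>(new Error());
    return Task.FromResult<OneOf<string, NotFound, Error>>(item);
}

var results = await new[] { "milk", "bread", "eggs" }
    .Traverse(item => FakeUpdate(item));

Console.WriteLine(string.Join(", ", calls)); // milk, bread
Console.WriteLine(results.Count());          // 2

internal static class Traversal
{
    // Same logic as the Sequence method shown above.
    internal static async Task<IEnumerable<OneOf<T1, T2, Error>>> Sequence<T1, T2>(
        this IEnumerable<Task<OneOf<T1, T2, Error>>> tasks)
    {
        var results = new List<OneOf<T1, T2, Error>>();
        foreach (var task in tasks)
        {
            var result = await task;
            results.Add(result);
            if (result.IsT2)
                break;
        }
        return results;
    }

    // Same logic as the Traverse method shown above.
    internal static Task<IEnumerable<OneOf<TResult, T2, Error>>> Traverse<T1, T2, TResult>(
        this IEnumerable<T1> items,
        Func<T1, Task<OneOf<TResult, T2, Error>>> selector)
    {
        return items.Select(selector).Sequence();
    }
}</pre>
</p>
<p>
Notice that <code>FakeUpdate</code> is never called with <code>"eggs"</code>.
</p>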
<h3 id="e7d6b741e8e1406b9588a5788df0ff9b">
Aggregating the results <a href="#e7d6b741e8e1406b9588a5788df0ff9b">#</a>
</h3>
<p>
Since the above statement awaits the traversal, the <code>results</code> object is a 'pure' object that can be passed to a pure function. This does, however, assume that <code>ShoppingListItem</code> is an immutable object.
</p>
<p>
The next step must collect results and <code>NotFound</code>-related failures, but contrary to the previous article, it must short-circuit if it encounters an <code>Error</code>. This again suggests an Either-like data structure, but again I'll repurpose a <code>OneOf</code> container. I'll start by defining a <code>seed</code> for an aggregation (a <em>left fold</em>).
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">seed</span> =
<span style="color:#2b91af;">OneOf</span><(<span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">ShoppingListItem</span>>, <span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">ShoppingListItem</span>>), <span style="color:#2b91af;">Error</span>>
.<span style="color:#74531f;">FromT0</span>(([], []));</pre>
</p>
<p>
This type can be either a tuple or an error. The .NET tendency is often to define an explicit <code>Result<TSuccess, TFailure></code> type, where <code>TSuccess</code> is defined to the left of <code>TFailure</code>. This, for example, is <a href="https://learn.microsoft.com/dotnet/fsharp/language-reference/results">how F# defines Result types</a>, and other .NET libraries tend to emulate that design. That's also what I've done here, although I admit that I'm regularly confused when going back and forth between F# and Haskell, where the <code>Right</code> case is <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatically</a> considered to indicate success.
</p>
<p>
As already discussed, OneOf follows a zero-indexed naming convention for type parameters, so <code>FromT0</code> indicates the first (or leftmost) case. The seed is thus initialized with a tuple that contains two empty sequences.
</p>
<p>
As in the previous article, you can now use the <a href="https://learn.microsoft.com/dotnet/api/system.linq.enumerable.aggregate">Aggregate</a> method to collect the result you want.
</p>
<p>
<pre><span style="color:#2b91af;">OneOf</span><<span style="color:#2b91af;">BulkUpdateResult</span>, <span style="color:#2b91af;">Error</span>> <span style="font-weight:bold;color:#1f377f;">result</span> = <span style="font-weight:bold;color:#1f377f;">results</span>
.<span style="font-weight:bold;color:#74531f;">Aggregate</span>(
<span style="font-weight:bold;color:#1f377f;">seed</span>,
(<span style="font-weight:bold;color:#1f377f;">state</span>, <span style="font-weight:bold;color:#1f377f;">result</span>) =>
<span style="font-weight:bold;color:#1f377f;">result</span>.<span style="font-weight:bold;color:#74531f;">Match</span>(
<span style="font-weight:bold;color:#1f377f;">storedItem</span> => <span style="font-weight:bold;color:#1f377f;">state</span>.<span style="font-weight:bold;color:#74531f;">MapT0</span>(
<span style="font-weight:bold;color:#1f377f;">t</span> => (<span style="font-weight:bold;color:#1f377f;">t</span>.Item1.<span style="font-weight:bold;color:#74531f;">Append</span>(<span style="font-weight:bold;color:#1f377f;">storedItem</span>), <span style="font-weight:bold;color:#1f377f;">t</span>.Item2)),
<span style="font-weight:bold;color:#1f377f;">notFound</span> => <span style="font-weight:bold;color:#1f377f;">state</span>.<span style="font-weight:bold;color:#74531f;">MapT0</span>(
<span style="font-weight:bold;color:#1f377f;">t</span> => (<span style="font-weight:bold;color:#1f377f;">t</span>.Item1, <span style="font-weight:bold;color:#1f377f;">t</span>.Item2.<span style="font-weight:bold;color:#74531f;">Append</span>(<span style="font-weight:bold;color:#1f377f;">notFound</span>.Item))),
<span style="font-weight:bold;color:#1f377f;">e</span> => <span style="font-weight:bold;color:#1f377f;">e</span>))
.<span style="font-weight:bold;color:#74531f;">MapT0</span>(<span style="font-weight:bold;color:#1f377f;">t</span> => <span style="color:blue;">new</span> <span style="color:#2b91af;">BulkUpdateResult</span>(<span style="font-weight:bold;color:#1f377f;">t</span>.Item1.<span style="font-weight:bold;color:#74531f;">ToArray</span>(), <span style="font-weight:bold;color:#1f377f;">t</span>.Item2.<span style="font-weight:bold;color:#74531f;">ToArray</span>()));</pre>
</p>
<p>
This expression is a two-step composition. I'll get back to the concluding <code>MapT0</code> in a moment, but let's first discuss what happens in the <code>Aggregate</code> step. Since the <code>state</code> is now a discriminated union, the big lambda expression not only has to <code>Match</code> on the <code>result</code>, but it also has to deal with the two mutually exclusive cases in which <code>state</code> can be.
</p>
<p>
Although it comes third in the code listing, it may be easiest to explain if we start with the error case. Keep in mind that the <code>seed</code> starts with the optimistic assumption that the operation is going to succeed. If, however, we encounter an error <code>e</code>, we now switch the <code>state</code> to the <code>Error</code> case. Once in that state, it stays there.
</p>
<p>
The two other <code>result</code> cases map over the first (i.e. the success) case, appending the result to the appropriate sequence in the tuple <code>t</code>. Since these expressions map over the first (zero-indexed) case, these updates only run as long as the <code>state</code> is in the success case. If the <code>state</code> is in the error state, these lambda expressions don't run, and the <code>state</code> doesn't change.
</p>
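<p>
You can convince yourself of this behaviour with a small experiment. This sketch assumes that the OneOf NuGet package is referenced, and uses <code>int</code> as a simple stand-in for the tuple of sequences.
</p>
<p>
<pre>using System;
using OneOf;
using OneOf.Types;

OneOf<int, Error> success = 1;
OneOf<int, Error> failure = new Error();

// MapT0 only runs the lambda expression when the value is in the first case.
OneOf<int, Error> a = success.MapT0(x => x + 1);
OneOf<int, Error> b = failure.MapT0(x => x + 1);

Console.WriteLine(a.Match(x => x.ToString(), _ => "Error")); // 2
Console.WriteLine(b.Match(x => x.ToString(), _ => "Error")); // Error</pre>
</p>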
<p>
After having collected the tuple of sequences, the final step is to map over the success case, turning the tuple <code>t</code> into a <code>BulkUpdateResult</code>. That's what <code>MapT0</code> does: It maps over the first (zero-indexed) case, which contains the tuple of sequences. It's a standard <a href="/2018/03/22/functors">functor</a> projection.
</p>
<h3 id="e4c3b20a30c34b4785ccdd886b20d197">
Saving the changes and returning the results <a href="#e4c3b20a30c34b4785ccdd886b20d197">#</a>
</h3>
<p>
The final, impure step in the sandwich is to save the changes and return the results:
</p>
<p>
<pre><span style="color:green;">// Impure</span>
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">result</span>.<span style="font-weight:bold;color:#74531f;">Match</span>(
<span style="color:blue;">async</span> <span style="font-weight:bold;color:#1f377f;">bulkUpdateResult</span> =>
{
<span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">dbContext</span>.<span style="font-weight:bold;color:#74531f;">SaveChangesAsync</span>();
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">Results</span>.<span style="color:#74531f;">Ok</span>(<span style="font-weight:bold;color:#1f377f;">bulkUpdateResult</span>);
},
<span style="font-weight:bold;color:#1f377f;">_</span> => <span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">FromResult</span>(<span style="color:#2b91af;">Results</span>.<span style="color:#74531f;">BadRequest</span>()));</pre>
</p>
<p>
Note that it only calls <code>dbContext.SaveChangesAsync()</code> in case the <code>result</code> is a success.
</p>
<h3 id="a6d28bd9d66a4e068bc4cd4ba21dde32">
Accumulating the bulk-update result <a href="#a6d28bd9d66a4e068bc4cd4ba21dde32">#</a>
</h3>
<p>
So far, I've assumed that the final <code>BulkUpdateResult</code> class is just a simple immutable container without much functionality. If, however, we add some copy-and-update functions to it, we can use that to aggregate the result, instead of an anonymous tuple.
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:#2b91af;">BulkUpdateResult</span> <span style="font-weight:bold;color:#74531f;">Store</span>(<span style="color:#2b91af;">ShoppingListItem</span> <span style="font-weight:bold;color:#1f377f;">item</span>) =>
<span style="color:blue;">new</span>([.. StoredItems, <span style="font-weight:bold;color:#1f377f;">item</span>], FailedItems);
<span style="color:blue;">internal</span> <span style="color:#2b91af;">BulkUpdateResult</span> <span style="font-weight:bold;color:#74531f;">Fail</span>(<span style="color:#2b91af;">ShoppingListItem</span> <span style="font-weight:bold;color:#1f377f;">item</span>) =>
<span style="color:blue;">new</span>(StoredItems, [.. FailedItems, <span style="font-weight:bold;color:#1f377f;">item</span>]);</pre>
</p>
<p>
I would have personally preferred the name <code>NotFound</code> instead of <code>Fail</code>, but I was going with the original post's <code>failedItems</code> terminology, and I thought that it made more sense to call a method <code>Fail</code> when it adds to a collection called <code>FailedItems</code>.
</p>
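<p>
The original post doesn't show <code>BulkUpdateResult</code>, so the following is only a sketch of how the whole class might look with the two methods added. The constructor, the property types, and the <code>ShoppingListItem</code> record are my assumptions. Like the methods above, it requires C# 12 for the collection expressions.
</p>
<p>
<pre>using System;
using System.Collections.Generic;

var empty = new BulkUpdateResult([], []);
var updated = empty
    .Store(new ShoppingListItem("milk"))
    .Fail(new ShoppingListItem("bread"));

Console.WriteLine(empty.StoredItems.Count);   // 0: the original is unchanged
Console.WriteLine(updated.StoredItems.Count); // 1
Console.WriteLine(updated.FailedItems.Count); // 1

// Hypothetical stand-in; the real ShoppingListItem isn't shown.
public sealed record ShoppingListItem(string Name);

// Assumed shape of the class; only Store and Fail come from the article.
public sealed class BulkUpdateResult
{
    public BulkUpdateResult(
        IReadOnlyCollection<ShoppingListItem> storedItems,
        IReadOnlyCollection<ShoppingListItem> failedItems)
    {
        StoredItems = storedItems;
        FailedItems = failedItems;
    }

    public IReadOnlyCollection<ShoppingListItem> StoredItems { get; }
    public IReadOnlyCollection<ShoppingListItem> FailedItems { get; }

    // Copy-and-update: each method returns a new object.
    internal BulkUpdateResult Store(ShoppingListItem item) =>
        new([.. StoredItems, item], FailedItems);

    internal BulkUpdateResult Fail(ShoppingListItem item) =>
        new(StoredItems, [.. FailedItems, item]);
}</pre>
</p>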
<p>
Adding these two instance methods to <code>BulkUpdateResult</code> simplifies the composing code:
</p>
<p>
<pre><span style="color:green;">// Pure</span>
<span style="color:#2b91af;">OneOf</span><<span style="color:#2b91af;">BulkUpdateResult</span>, <span style="color:#2b91af;">Error</span>> <span style="font-weight:bold;color:#1f377f;">result</span> = <span style="font-weight:bold;color:#1f377f;">results</span>
.<span style="font-weight:bold;color:#74531f;">Aggregate</span>(
<span style="color:#2b91af;">OneOf</span><<span style="color:#2b91af;">BulkUpdateResult</span>, <span style="color:#2b91af;">Error</span>>.<span style="color:#74531f;">FromT0</span>(<span style="color:blue;">new</span>([], [])),
(<span style="font-weight:bold;color:#1f377f;">state</span>, <span style="font-weight:bold;color:#1f377f;">result</span>) =>
<span style="font-weight:bold;color:#1f377f;">result</span>.<span style="font-weight:bold;color:#74531f;">Match</span>(
<span style="font-weight:bold;color:#1f377f;">storedItem</span> => <span style="font-weight:bold;color:#1f377f;">state</span>.<span style="font-weight:bold;color:#74531f;">MapT0</span>(<span style="font-weight:bold;color:#1f377f;">bur</span> => <span style="font-weight:bold;color:#1f377f;">bur</span>.<span style="font-weight:bold;color:#74531f;">Store</span>(<span style="font-weight:bold;color:#1f377f;">storedItem</span>)),
<span style="font-weight:bold;color:#1f377f;">notFound</span> => <span style="font-weight:bold;color:#1f377f;">state</span>.<span style="font-weight:bold;color:#74531f;">MapT0</span>(<span style="font-weight:bold;color:#1f377f;">bur</span> => <span style="font-weight:bold;color:#1f377f;">bur</span>.<span style="font-weight:bold;color:#74531f;">Fail</span>(<span style="font-weight:bold;color:#1f377f;">notFound</span>.Item)),
<span style="font-weight:bold;color:#1f377f;">e</span> => <span style="font-weight:bold;color:#1f377f;">e</span>));</pre>
</p>
<p>
This variation starts with an empty <code>BulkUpdateResult</code> and then uses <code>Store</code> or <code>Fail</code> as appropriate to update the state. The final, impure step of the sandwich remains the same.
</p>
<h3 id="ed88649e2d75403ab654fe7c034b6c1f">
Conclusion <a href="#ed88649e2d75403ab654fe7c034b6c1f">#</a>
</h3>
<p>
It's a bit trickier to implement a short-circuiting traversal than the standard traversal. You can still implement a specialized <code>Sequence</code> or <code>Traverse</code> method, but it requires that the input stream carries enough information to decide when to stop processing more items. In this article, I used a specialized three-way union, but you could generalize this to use a standard Either or Result type.
</p>
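<p>
Another way to supply that information, without committing to a particular union type, is to let the caller pass the stop condition as a predicate. This is only a sketch of that idea; the <code>SequenceUntil</code> name and the use of negative numbers as the stop signal are hypothetical.
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var started = 0;
// Select is lazy, so each task is only created when it's enumerated.
IEnumerable<Task<int>> tasks = new[] { 1, 2, -1, 4 }.Select(i =>
{
    started++;
    return Task.FromResult(i);
});

var results = await tasks.SequenceUntil(i => i < 0);

Console.WriteLine(string.Join(", ", results)); // 1, 2, -1
Console.WriteLine(started);                    // 3

internal static class ShortCircuit
{
    // Hypothetical generalization: the caller supplies the stop condition.
    internal static async Task<IReadOnlyList<T>> SequenceUntil<T>(
        this IEnumerable<Task<T>> tasks,
        Func<T, bool> shouldStop)
    {
        var results = new List<T>();
        foreach (var task in tasks)
        {
            var result = await task;
            results.Add(result);
            // Include the value that triggered the stop, then quit.
            if (shouldStop(result))
                break;
        }
        return results;
    }
}</pre>
</p>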
</div><hr>
Nested monads https://blog.ploeh.dk/2024/11/25/nested-monads 2024-11-25T07:31:00+00:00 Mark Seemann
<div id="post">
<p>
<em>You can stack some monads in such a way that the composition is also a monad.</em>
</p>
<p>
This article is part of <a href="/2022/07/11/functor-relationships">a series of articles about functor relationships</a>. In a previous article you learned that <a href="/2024/10/28/functor-compositions">nested functors form a functor</a>. You may have wondered if <a href="/2022/03/28/monads">monads</a> compose in the same way. Does a monad nested in a monad form a monad?
</p>
<p>
As far as I know, there's no universal rule like that, but some monads compose well. Fortunately, it's been my experience that the combinations that you need in practice are among those that exist and are well-known. In a <a href="https://www.haskell.org/">Haskell</a> context, it's often the case that you need to run some kind of 'effect' inside <code>IO</code>. Perhaps you want to use <code>Maybe</code> or <code>Either</code> nested within <code>IO</code>.
</p>
<p>
In .NET, you may run into a similar need to compose task-based programming with an effect. This happens more often in <a href="https://fsharp.org/">F#</a> than in C#, since F# comes with other native monads (<code>option</code> and <code>Result</code>, to name the most common).
</p>
<h3 id="d84f448d09124e31a8fbeb27abe3d826">
Abstract shape <a href="#d84f448d09124e31a8fbeb27abe3d826">#</a>
</h3>
<p>
You'll see some real examples in a moment, but as usual it helps to outline what it is that we're looking for. Imagine that you have a monad. We'll call it <code>F</code> in keeping with tradition. In this article series, you've seen how two or more <a href="/2018/03/22/functors">functors</a> compose. When discussing the abstract shapes of things, we've typically called our two abstract functors <code>F</code> and <code>G</code>. I'll stick to that naming scheme here, because monads are functors (<a href="/2022/03/28/monads">that you can flatten</a>).
</p>
<p>
Now imagine that you have a value that stacks two monads: <code>F<G<T>></code>. If the inner monad <code>G</code> is the 'right' kind of monad, that configuration itself forms a monad.
</p>
<p>
<img src="/content/binary/nested-monads-transformed-to-single-monad.png" alt="Nested monads depicted as concentric circles. To the left the circle F contains the circle G that again contains the circle a. To the right the wider circle FG contains the circle that contains a. An arrow points from the left circles to the right circles.">
</p>
<p>
In the diagram, I've simply named the combined monad <code>FG</code>, which is a naming strategy I've seen in the real world, too: <code>TaskResult</code>, etc.
</p>
<p>
As I've already mentioned, if there's a general theorem that says that this is always possible, I'm not aware of it. To the contrary, I seem to recall reading that this is distinctly not the case, but the source escapes me at the moment. One hint, though, is offered in the documentation of <a href="https://hackage.haskell.org/package/base/docs/Data-Functor-Compose.html">Data.Functor.Compose</a>:
</p>
<blockquote>
<p>
"The composition of applicative functors is always applicative, but the composition of monads is not always a monad."
</p>
</blockquote>
<p>
Thankfully, the monads that you mostly need to compose do, in fact, compose. They include <a href="/2022/04/25/the-maybe-monad">Maybe</a>, <a href="/2022/05/09/an-either-monad">Either</a>, <a href="/2022/06/20/the-state-monad">State</a>, <a href="/2022/11/14/the-reader-monad">Reader</a>, and <a href="/2022/05/16/the-identity-monad">Identity</a> (okay, that one maybe isn't that useful). In other words, any monad <code>F</code> that composes with e.g. <code>Maybe</code>, that is, <code>F<Maybe<T>></code>, also forms a monad.
</p>
<p>
Notice that it's the 'inner' monad that determines whether composition is possible. Not the 'outer' monad.
</p>
<p>
For what it's worth, I'm basing much of this on my personal experience, which was again helpfully guided by <a href="https://hackage.haskell.org/package/transformers/docs/Control-Monad-Trans-Class.html">Control.Monad.Trans.Class</a>. I don't, however, wish to turn this article into an article about monad transformers, because if you already know Haskell, you can read the documentation and look at examples. And if you don't know Haskell, the specifics of monad transformers don't readily translate to languages like C# or F#.
</p>
<p>
The conclusions do translate, but the specific language mechanics don't.
</p>
<p>
Let's look at some common examples.
</p>
<h3 id="51dcb0d54afc46d7b26b7f4021e08dbc">
TaskMaybe monad <a href="#51dcb0d54afc46d7b26b7f4021e08dbc">#</a>
</h3>
<p>
We'll start with a simple, yet realistic example. The article <a href="/2019/02/11/asynchronous-injection">Asynchronous Injection</a> shows a simple operation that involves reading from a database, making a decision, and potentially writing to the database. The final composition, repeated here for your convenience, is an asynchronous (that is, <code>Task</code>-based) process.
</p>
<p>
<pre><span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">await</span> Repository.<span style="font-weight:bold;color:#74531f;">ReadReservations</span>(<span style="font-weight:bold;color:#1f377f;">reservation</span>.Date)
.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">rs</span> => maîtreD.<span style="font-weight:bold;color:#74531f;">TryAccept</span>(<span style="font-weight:bold;color:#1f377f;">rs</span>, <span style="font-weight:bold;color:#1f377f;">reservation</span>))
.<span style="font-weight:bold;color:#74531f;">SelectMany</span>(<span style="font-weight:bold;color:#1f377f;">m</span> => <span style="font-weight:bold;color:#1f377f;">m</span>.<span style="font-weight:bold;color:#74531f;">Traverse</span>(Repository.<span style="font-weight:bold;color:#74531f;">Create</span>))
.<span style="font-weight:bold;color:#74531f;">Match</span>(<span style="font-weight:bold;color:#74531f;">InternalServerError</span>(<span style="color:#a31515;">"Table unavailable"</span>), <span style="font-weight:bold;color:#74531f;">Ok</span>);</pre>
</p>
<p>
The problem here is that <code>TryAccept</code> returns <code><span style="color:#2b91af;">Maybe</span><<span style="color:#2b91af;">Reservation</span>></code>, but since the overall workflow already 'runs in' an <a href="/2022/06/06/asynchronous-monads">asynchronous monad</a> (<code>Task</code>), the monads are now nested as <code><span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Maybe</span><<span style="color:#2b91af;">T</span>>></code>.
</p>
<p>
The way I dealt with that issue in the above code snippet was to rely on a <a href="/2024/11/11/traversals">traversal</a>, but it's actually an inelegant solution. The way that the <code>SelectMany</code> invocation maps over the <code><span style="color:#2b91af;">Maybe</span><<span style="color:#2b91af;">Reservation</span>></code> <code>m</code> is awkward. Instead of <a href="/2018/07/02/terse-operators-make-business-code-more-readable">composing a business process</a>, the scaffolding is on display, so to speak. Sometimes this is unavoidable, but at other times, there may be a better way.
</p>
<p>
In my defence, when I wrote that article in 2019 I had another pedagogical goal than teaching nested monads. It turns out, however, that you can rewrite the business process using the <code><span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Maybe</span><<span style="color:#2b91af;">T</span>>></code> stack as a monad in its own right.
</p>
<p>
A monad needs two functions: <em>return</em> and either <em>bind</em> or <em>join</em>. In C# or F#, you can often treat <em>return</em> as 'implied', in the sense that you can always wrap <code><span style="color:blue;">new</span> <span style="color:#2b91af;">Maybe</span><<span style="color:#2b91af;">T</span>></code> in a call to <a href="https://learn.microsoft.com/dotnet/api/system.threading.tasks.task.fromresult">Task.FromResult</a>. You'll see that in a moment.
</p>
<p>
While you can be cavalier about monadic <em>return</em>, you'll need to explicitly implement either <em>bind</em> or <em>join</em>. In this case, it turns out that the sample code base already had a <code>SelectMany</code> implementation:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Maybe</span><<span style="color:#2b91af;">TResult</span>>> <span style="color:#74531f;">SelectMany</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">TResult</span>>(
    <span style="color:blue;">this</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Maybe</span><<span style="color:#2b91af;">T</span>>> <span style="font-weight:bold;color:#1f377f;">source</span>,
    <span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Maybe</span><<span style="color:#2b91af;">TResult</span>>>> <span style="font-weight:bold;color:#1f377f;">selector</span>)
{
    <span style="color:#2b91af;">Maybe</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">m</span> = <span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">source</span>;
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">m</span>.<span style="font-weight:bold;color:#74531f;">Match</span>(
        <span style="font-weight:bold;color:#1f377f;">nothing</span>: <span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">FromResult</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">Maybe</span><<span style="color:#2b91af;">TResult</span>>()),
        <span style="font-weight:bold;color:#1f377f;">just</span>: <span style="font-weight:bold;color:#1f377f;">x</span> => <span style="font-weight:bold;color:#1f377f;">selector</span>(<span style="font-weight:bold;color:#1f377f;">x</span>));
}</pre>
</p>
<p>
The method first awaits the <code>Maybe</code> value, and then proceeds to <code>Match</code> on it. In the <code>nothing</code> case, you see the implicit <em>return</em> being used. In the <code>just</code> case, the <code>SelectMany</code> method calls <code>selector</code> with whatever <code>x</code> value was contained in the <code>Maybe</code> object. The result of calling <code>selector</code> already has the desired type <code><span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Maybe</span><<span style="color:#2b91af;">TResult</span>>></code>, so the implementation simply returns that value without further ado.
</p>
<p>
This enables you to rewrite the <code>SelectMany</code> call in the business process so that it instead looks like this:
</p>
<p>
<pre><span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">await</span> Repository.<span style="font-weight:bold;color:#74531f;">ReadReservations</span>(<span style="font-weight:bold;color:#1f377f;">reservation</span>.Date)
    .<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">rs</span> => maîtreD.<span style="font-weight:bold;color:#74531f;">TryAccept</span>(<span style="font-weight:bold;color:#1f377f;">rs</span>, <span style="font-weight:bold;color:#1f377f;">reservation</span>))
    .<span style="font-weight:bold;color:#74531f;">SelectMany</span>(<span style="font-weight:bold;color:#1f377f;">r</span> => Repository.<span style="font-weight:bold;color:#74531f;">Create</span>(<span style="font-weight:bold;color:#1f377f;">r</span>).<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">i</span> => <span style="color:blue;">new</span> <span style="color:#2b91af;">Maybe</span><<span style="color:blue;">int</span>>(<span style="font-weight:bold;color:#1f377f;">i</span>)))
    .<span style="font-weight:bold;color:#74531f;">Match</span>(<span style="font-weight:bold;color:#74531f;">InternalServerError</span>(<span style="color:#a31515;">"Table unavailable"</span>), <span style="font-weight:bold;color:#74531f;">Ok</span>);</pre>
</p>
<p>
At first glance, it doesn't look like much of an improvement. To be sure, the lambda expression within the <code>SelectMany</code> method no longer operates on a <code>Maybe</code> value, but rather on the <code>Reservation</code> Domain Model <code>r</code>. On the other hand, we're now saddled with that graceless <code><span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">i</span> => <span style="color:blue;">new</span> <span style="color:#2b91af;">Maybe</span><<span style="color:blue;">int</span>>(<span style="font-weight:bold;color:#1f377f;">i</span>))</code>.
</p>
<p>
Had this been Haskell, we could have made this more succinct by eta reducing the <code>Maybe</code> case constructor and using the <code><$></code> infix operator instead of <code>fmap</code>; something like <code>Just <$> create r</code>. In C#, on the other hand, we can do something that Haskell doesn't allow. We can overload the <code>SelectMany</code> method:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Maybe</span><<span style="color:#2b91af;">TResult</span>>> <span style="color:#74531f;">SelectMany</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">TResult</span>>(
    <span style="color:blue;">this</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Maybe</span><<span style="color:#2b91af;">T</span>>> <span style="font-weight:bold;color:#1f377f;">source</span>,
    <span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">TResult</span>>> <span style="font-weight:bold;color:#1f377f;">selector</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">source</span>.<span style="font-weight:bold;color:#74531f;">SelectMany</span>(<span style="font-weight:bold;color:#1f377f;">x</span> => <span style="font-weight:bold;color:#1f377f;">selector</span>(<span style="font-weight:bold;color:#1f377f;">x</span>).<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">y</span> => <span style="color:blue;">new</span> <span style="color:#2b91af;">Maybe</span><<span style="color:#2b91af;">TResult</span>>(<span style="font-weight:bold;color:#1f377f;">y</span>)));
}</pre>
</p>
<p>
This overload generalizes the 'pattern' exemplified by the above business process composition. Instead of a specific method call, it now works with any <code>selector</code> function that returns <code><span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">TResult</span>></code>. Since <code>selector</code> only returns a <code><span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">TResult</span>></code> value, and not a <code><span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Maybe</span><<span style="color:#2b91af;">TResult</span>>></code> value, as actually required in this nested monad, the overload has to map (that is, <code>Select</code>) the result by wrapping it in a <code><span style="color:blue;">new</span> <span style="color:#2b91af;">Maybe</span><<span style="color:#2b91af;">TResult</span>></code>.
</p>
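<p>
In Haskell, where you can't overload by signature, a comparable helper needs its own name. The following is only a sketch under that assumption: <code>bindLifted</code> is a hypothetical name, not a library function, and <code>IO</code> stands in for <code>Task</code>. It relies on the fact that traversing a <code>Maybe</code> value with an <code>IO</code>-returning function wraps the result in <code>Just</code>, and skips the action altogether in the <code>Nothing</code> case.
</p>
<p>
<pre>-- Hypothetical helper; IO plays the role of Task in the C# example.
-- The selector returns a plain IO value, so traverse rewraps it in Just.
bindLifted :: IO (Maybe a) -> (a -> IO b) -> IO (Maybe b)
bindLifted source selector = source >>= traverse selector</pre>
</p>
<p>
With this definition, <code>bindLifted (return (Just 2)) (\x -> return (x + 1))</code> produces <code>Just 3</code>, whereas a <code>Nothing</code> source never invokes the selector.
</p>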
<p>
This now enables you to improve the business process composition to something more readable.
</p>
<p>
<pre><span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">await</span> Repository.<span style="font-weight:bold;color:#74531f;">ReadReservations</span>(<span style="font-weight:bold;color:#1f377f;">reservation</span>.Date)
    .<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">rs</span> => maîtreD.<span style="font-weight:bold;color:#74531f;">TryAccept</span>(<span style="font-weight:bold;color:#1f377f;">rs</span>, <span style="font-weight:bold;color:#1f377f;">reservation</span>))
    .<span style="font-weight:bold;color:#74531f;">SelectMany</span>(Repository.<span style="font-weight:bold;color:#74531f;">Create</span>)
    .<span style="font-weight:bold;color:#74531f;">Match</span>(<span style="font-weight:bold;color:#74531f;">InternalServerError</span>(<span style="color:#a31515;">"Table unavailable"</span>), <span style="font-weight:bold;color:#74531f;">Ok</span>);</pre>
</p>
<p>
It even turned out to be possible to eta reduce the lambda expression instead of the (also valid, but more verbose) <code><span style="font-weight:bold;color:#1f377f;">r</span> => Repository.<span style="font-weight:bold;color:#74531f;">Create</span>(<span style="font-weight:bold;color:#1f377f;">r</span>)</code>.
</p>
<p>
If you're interested in the sample code, I've pushed a branch named <code>use-monad-stack</code> to <a href="https://github.com/ploeh/asynchronous-injection">the GitHub repository</a>.
</p>
<p>
Not surprisingly, the F# <code>bind</code> function is much terser:
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:#74531f;">bind</span> <span style="color:#74531f;">f</span> <span style="font-weight:bold;color:#1f377f;">x</span> = <span style="color:blue;">async</span> {
    <span style="color:blue;">match!</span> <span style="font-weight:bold;color:#1f377f;">x</span> <span style="color:blue;">with</span>
    | <span style="color:#2b91af;">Some</span> <span style="font-weight:bold;color:#1f377f;">x'</span> <span style="color:blue;">-></span> <span style="color:blue;">return!</span> <span style="color:#74531f;">f</span> <span style="font-weight:bold;color:#1f377f;">x'</span>
    | <span style="color:#2b91af;">None</span> <span style="color:blue;">-></span> <span style="color:blue;">return</span> <span style="color:#2b91af;">None</span> }</pre>
</p>
<p>
You can find that particular snippet in the code base that accompanies the article <a href="/2019/12/02/refactoring-registration-flow-to-functional-architecture">Refactoring registration flow to functional architecture</a>, although as far as I can tell, it's not actually in use in that code base. I probably just added it because I could.
</p>
<p>
You can find Haskell examples of combining <a href="https://hackage.haskell.org/package/transformers/docs/Control-Monad-Trans-Maybe.html">MaybeT</a> with <code>IO</code> in various articles on this blog. One of them is <a href="/2017/02/02/dependency-rejection">Dependency rejection</a>.
</p>
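<p>
For comparison, here's a sketch of what the same <em>bind</em> might look like in Haskell, with <code>IO</code> standing in for <code>Task</code>. The names <code>bindIOMaybe</code> and <code>bindViaMaybeT</code> are hypothetical and only serve this illustration. The first function spells out the pattern match by hand, much like the C# <code>SelectMany</code> method, while the second delegates to the <code>MaybeT</code> transformer from the <em>transformers</em> package.
</p>
<p>
<pre>import Control.Monad.Trans.Maybe (MaybeT (..))

-- Hand-written bind for the IO (Maybe a) stack (hypothetical name).
bindIOMaybe :: IO (Maybe a) -> (a -> IO (Maybe b)) -> IO (Maybe b)
bindIOMaybe source selector = do
  m <- source
  case m of
    Nothing -> return Nothing -- the 'implicit return' of the nothing case
    Just x -> selector x      -- already has the desired type

-- The same composition, expressed with MaybeT (hypothetical name).
bindViaMaybeT :: IO (Maybe a) -> (a -> IO (Maybe b)) -> IO (Maybe b)
bindViaMaybeT source selector =
  runMaybeT (MaybeT source >>= MaybeT . selector)</pre>
</p>
<p>
Both functions should behave identically. In practice you'd rarely write the first one, since wrapping the computation in <code>MaybeT</code> gives you <code>do</code> notation for free.
</p>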
<h3 id="74c0764ee623459596700a6462dd5452">
TaskResult monad <a href="#74c0764ee623459596700a6462dd5452">#</a>
</h3>
<p>
A similar, but slightly more complex, example involves nesting Either values in asynchronous workflows. In some languages, such as F#, Either is instead called <a href="https://learn.microsoft.com/dotnet/fsharp/language-reference/results">Result</a>, and asynchronous workflows are modelled by a <code>Task</code> <a href="https://bartoszmilewski.com/2014/01/14/functors-are-containers/">container</a>, as already demonstrated above. Thus, on .NET at least, this nested monad is often called <em>TaskResult</em>, but you may also see <em>AsyncResult</em>, <em>AsyncEither</em>, or other combinations. Depending on the programming language, such names may be used only for modules, and not for the container type itself. In C# or F# code, for example, you may look in vain for a class called <code>TaskResult<T></code>, and instead find a <code>TaskResult</code> static class or module.
</p>
<p>
In C# you can define monadic <em>bind</em> like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R1</span>>> <span style="color:#74531f;">SelectMany</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R</span>, <span style="color:#2b91af;">R1</span>>(
    <span style="color:blue;">this</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R</span>>> <span style="font-weight:bold;color:#1f377f;">source</span>,
    <span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">R</span>, <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R1</span>>>> <span style="font-weight:bold;color:#1f377f;">selector</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">source</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">ArgumentNullException</span>(<span style="color:blue;">nameof</span>(<span style="font-weight:bold;color:#1f377f;">source</span>));
    <span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R</span>> <span style="font-weight:bold;color:#1f377f;">x</span> = <span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">source</span>.<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">x</span>.<span style="font-weight:bold;color:#74531f;">Match</span>(
        <span style="font-weight:bold;color:#1f377f;">l</span> => <span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">FromResult</span>(<span style="color:#2b91af;">Either</span>.<span style="color:#74531f;">Left</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R1</span>>(<span style="font-weight:bold;color:#1f377f;">l</span>)),
        <span style="font-weight:bold;color:#1f377f;">selector</span>).<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
}</pre>
</p>
<p>
Here I've again passed the eta-reduced <code>selector</code> straight to the <em>right</em> case of the <code>Either</code> value, but <code><span style="font-weight:bold;color:#1f377f;">r</span> => <span style="font-weight:bold;color:#1f377f;">selector</span>(<span style="font-weight:bold;color:#1f377f;">r</span>)</code> works, too.
</p>
<p>
The <em>left</em> case shows another example of 'implicit monadic <em>return</em>'. I didn't bother defining an explicit <code>Return</code> function, but rather use <code><span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">FromResult</span>(<span style="color:#2b91af;">Either</span>.<span style="color:#74531f;">Left</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R1</span>>(<span style="font-weight:bold;color:#1f377f;">l</span>))</code> to return a <code><span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R1</span>>></code> value.
</p>
<p>
As is the case with C#, you'll also need to add a special overload to enable the syntactic sugar of <a href="https://learn.microsoft.com/dotnet/csharp/linq/get-started/query-expression-basics">query expressions</a>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R1</span>>> <span style="color:#74531f;">SelectMany</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">U</span>, <span style="color:#2b91af;">R</span>, <span style="color:#2b91af;">R1</span>>(
    <span style="color:blue;">this</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R</span>>> <span style="font-weight:bold;color:#1f377f;">source</span>,
    <span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">R</span>, <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Either</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">U</span>>>> <span style="font-weight:bold;color:#1f377f;">k</span>,
    <span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">R</span>, <span style="color:#2b91af;">U</span>, <span style="color:#2b91af;">R1</span>> <span style="font-weight:bold;color:#1f377f;">s</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">source</span>.<span style="font-weight:bold;color:#74531f;">SelectMany</span>(<span style="font-weight:bold;color:#1f377f;">x</span> => <span style="font-weight:bold;color:#1f377f;">k</span>(<span style="font-weight:bold;color:#1f377f;">x</span>).<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">y</span> => <span style="font-weight:bold;color:#1f377f;">s</span>(<span style="font-weight:bold;color:#1f377f;">x</span>, <span style="font-weight:bold;color:#1f377f;">y</span>)));
}</pre>
</p>
<p>
You'll see a comprehensive example using these functions in a future article.
</p>
<p>
In F# I'd often first define a module with a few functions including <code>bind</code>, and then use those implementations to define a <a href="https://learn.microsoft.com/dotnet/fsharp/language-reference/computation-expressions">computation expression</a>, but in <a href="/2016/04/11/async-as-surrogate-io">one article</a>, I jumped straight to the expression builder:
</p>
<p>
<pre><span style="color:blue;">type</span> <span style="color:#4ec9b0;">AsyncEitherBuilder</span> () =
    <span style="color:green;">// Async<Result<'a,'c>> * ('a -> Async<Result<'b,'c>>)</span>
    <span style="color:green;">// -> Async<Result<'b,'c>></span>
    <span style="color:blue;">member</span> this.<span style="color:navy;">Bind</span>(x, <span style="color:navy;">f</span>) =
        <span style="color:blue;">async</span> {
            <span style="color:blue;">let!</span> x' = x
            <span style="color:blue;">match</span> x' <span style="color:blue;">with</span>
            | <span style="color:navy;">Success</span> s <span style="color:blue;">-></span> <span style="color:blue;">return!</span> <span style="color:navy;">f</span> s
            | <span style="color:navy;">Failure</span> f <span style="color:blue;">-></span> <span style="color:blue;">return</span> <span style="color:navy;">Failure</span> f }
    <span style="color:green;">// 'a -> 'a</span>
    <span style="color:blue;">member</span> this.<span style="color:navy;">ReturnFrom</span> x = x
<span style="color:blue;">let</span> asyncEither = <span style="color:#4ec9b0;">AsyncEitherBuilder</span> ()</pre>
</p>
<p>
That article also shows usage examples. Another article, <a href="/2022/02/14/a-conditional-sandwich-example">A conditional sandwich example</a>, shows more examples of using this nested monad, although there, the computation expression is named <code>taskResult</code>.
</p>
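<p>
As a rough Haskell counterpart, and only as a sketch, the same <em>bind</em> over <code>IO (Either l r)</code> could look like the following. The names <code>bindIOEither</code> and <code>bindViaExceptT</code> are hypothetical. The second variant uses the <code>ExceptT</code> transformer from the <em>transformers</em> package, which is the usual way to work with this stack in Haskell.
</p>
<p>
<pre>import Control.Monad.Trans.Except (ExceptT (..), runExceptT)

-- Hand-written bind for the IO (Either l r) stack (hypothetical name).
bindIOEither :: IO (Either l r) -> (r -> IO (Either l r1)) -> IO (Either l r1)
bindIOEither source selector = do
  x <- source
  case x of
    Left l -> return (Left l) -- short-circuit, keeping the left value
    Right r -> selector r     -- continue with the right value

-- The same composition, expressed with ExceptT (hypothetical name).
bindViaExceptT :: IO (Either l r) -> (r -> IO (Either l r1)) -> IO (Either l r1)
bindViaExceptT source selector =
  runExceptT (ExceptT source >>= ExceptT . selector)</pre>
</p>
<p>
As in the C# method, a left value passes through untouched, and <code>selector</code> only runs in the right case.
</p>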
<h3 id="e6426619b2ae4f8d97d62edfe9cae0ca">
Stateful computations that may fail <a href="#e6426619b2ae4f8d97d62edfe9cae0ca">#</a>
</h3>
<p>
To be honest, you mostly run into a scenario where nested monads are useful when some kind of 'effect' (errors, mostly) is embedded in an <a href="https://en.wikipedia.org/wiki/Input/output">I/O</a>-bound computation. In Haskell, this means <code>IO</code>, in C# <code>Task</code>, and in F# either <code>Task</code> or <code>Async</code>.
</p>
<p>
Other combinations are possible, but I've rarely encountered a need for additional nested monads outside of Haskell. In multi-paradigmatic languages, you can usually find other good designs that address issues that you may occasionally run into in a purely functional language. The following example is a Haskell-only example. You can skip it if you don't know or care about Haskell.
</p>
<p>
Imagine that you want to keep track of some statistics related to a software service you offer. If the <a href="https://en.wikipedia.org/wiki/Variance">variance</a> of some number (say, response time) exceeds 10 then you want to issue an alert that the <a href="https://en.wikipedia.org/wiki/Service-level_agreement">SLA</a> was violated. Apparently, in your system, reliability means staying consistent.
</p>
<p>
You have millions of observations, and they keep arriving, so you need an <a href="https://en.wikipedia.org/wiki/Online_algorithm">online algorithm</a>. For average and variance we'll use <a href="https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance">Welford's algorithm</a>.
</p>
<p>
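The following code uses these imports. The record patterns in the later <code>monitor</code> examples, such as <code>Statistics { variance }</code>, additionally require the <code>NamedFieldPuns</code> language extension: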
</p>
<p>
<pre><span style="color:blue;">import</span> Control.Monad
<span style="color:blue;">import</span> Control.Monad.Trans.State.Strict
<span style="color:blue;">import</span> Control.Monad.Trans.Maybe</pre>
</p>
<p>
First, you can define a data structure to hold the aggregate values required for the algorithm, as well as an initial, empty value:
</p>
<p>
<pre><span style="color:blue;">data</span> Aggregate = Aggregate { count :: Int, meanA :: Double, m2 :: Double } <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Eq</span>, <span style="color:#2b91af;">Show</span>)
<span style="color:#2b91af;">emptyA</span> <span style="color:blue;">::</span> <span style="color:blue;">Aggregate</span>
emptyA = Aggregate 0 0 0</pre>
</p>
<p>
You can also define a function to update the aggregate values with a new observation:
</p>
<p>
<pre><span style="color:#2b91af;">update</span> <span style="color:blue;">::</span> <span style="color:blue;">Aggregate</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">Double</span> <span style="color:blue;">-></span> <span style="color:blue;">Aggregate</span>
update (Aggregate count mean m2) x =
  <span style="color:blue;">let</span> count' = count + 1
      delta = x - mean
      mean' = mean + delta / <span style="color:blue;">fromIntegral</span> count'
      delta2 = x - mean'
      m2' = m2 + delta * delta2
  <span style="color:blue;">in</span> Aggregate count' mean' m2'</pre>
</p>
<p>
Given an existing <code>Aggregate</code> record and a new observation, this function implements the algorithm to calculate a new <code>Aggregate</code> record.
</p>
<p>
The values in an <code>Aggregate</code> record, however, are only intermediary values that you can use to calculate statistics such as mean, variance, and sample variance. You'll need a data type and function to do that, as well:
</p>
<p>
<pre><span style="color:blue;">data</span> Statistics =
  Statistics
    { mean :: Double, variance :: Double, sampleVariance :: Maybe Double }
    <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Eq</span>, <span style="color:#2b91af;">Show</span>)
<span style="color:#2b91af;">extractStatistics</span> <span style="color:blue;">::</span> <span style="color:blue;">Aggregate</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">Maybe</span> <span style="color:blue;">Statistics</span>
extractStatistics (Aggregate count mean m2) =
  <span style="color:blue;">if</span> count < 1 <span style="color:blue;">then</span> Nothing
  <span style="color:blue;">else</span>
    <span style="color:blue;">let</span> variance = m2 / <span style="color:blue;">fromIntegral</span> count
        sampleVariance =
          <span style="color:blue;">if</span> count < 2 <span style="color:blue;">then</span> Nothing <span style="color:blue;">else</span> Just $ m2 / <span style="color:blue;">fromIntegral</span> (count - 1)
    <span style="color:blue;">in</span> Just $ Statistics mean variance sampleVariance</pre>
</p>
<p>
This is where the computation becomes 'failure-prone'. Granted, we only have a real problem when we have zero observations, but this still means that we need to return a <code>Maybe Statistics</code> value in order to avoid division by zero.
</p>
<p>
(There might be other designs that avoid that problem, or you might simply decide to tolerate that edge case and code around it in other ways. I've decided to design the <code>extractStatistics</code> function in this particular way in order to furnish an example. Work with me here.)
</p>
<p>
Let's say that as the next step, you'd like to compose these two functions into a single function that adds a new observation, computes the statistics, and also returns the updated <code>Aggregate</code>.
</p>
<p>
You <em>could</em> write it like this:
</p>
<p>
<pre><span style="color:#2b91af;">addAndCompute</span> <span style="color:blue;">::</span> <span style="color:#2b91af;">Double</span> <span style="color:blue;">-></span> <span style="color:blue;">Aggregate</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">Maybe</span> (<span style="color:blue;">Statistics</span>, <span style="color:blue;">Aggregate</span>)
addAndCompute x agg = <span style="color:blue;">do</span>
  <span style="color:blue;">let</span> agg' = update agg x
  stats <- extractStatistics agg'
  <span style="color:blue;">return</span> (stats, agg')</pre>
</p>
<p>
This implementation uses <code>do</code> notation to automate handling of <code>Nothing</code> values. Still, it's a bit inelegant with its two <code>agg</code> values only distinguishable by the prime sign after one of them, and the need to explicitly return a tuple of the value and the new state.
</p>
<p>
This is the kind of problem that the State monad addresses. You could instead write the function like this:
</p>
<p>
<pre><span style="color:#2b91af;">addAndCompute</span> <span style="color:blue;">::</span> <span style="color:#2b91af;">Double</span> <span style="color:blue;">-></span> <span style="color:blue;">State</span> <span style="color:blue;">Aggregate</span> (<span style="color:#2b91af;">Maybe</span> <span style="color:blue;">Statistics</span>)
addAndCompute x = <span style="color:blue;">do</span>
  modify $ <span style="color:blue;">flip</span> update x
  gets extractStatistics</pre>
</p>
<p>
You could also write it as a one-liner, but that's already a bit too terse for my liking:
</p>
<p>
<pre><span style="color:#2b91af;">addAndCompute</span> <span style="color:blue;">::</span> <span style="color:#2b91af;">Double</span> <span style="color:blue;">-></span> <span style="color:blue;">State</span> <span style="color:blue;">Aggregate</span> (<span style="color:#2b91af;">Maybe</span> <span style="color:blue;">Statistics</span>)
addAndCompute x = modify (`update` x) >> gets extractStatistics</pre>
</p>
<p>
And if you really hate your co-workers, you can always visit <a href="https://pointfree.io">pointfree.io</a> to entirely obscure that expression, but I digress.
</p>
<p>
The point is that the State monad <a href="/ref/doocautbm">amplifies the essential and eliminates the irrelevant</a>.
</p>
<p>
Now you'd like to add a function that issues an alert if the variance is greater than 10. Again, you <em>could</em> write it like this:
</p>
<p>
<pre><span style="color:#2b91af;">monitor</span> <span style="color:blue;">::</span> <span style="color:#2b91af;">Double</span> <span style="color:blue;">-></span> <span style="color:blue;">State</span> <span style="color:blue;">Aggregate</span> (<span style="color:#2b91af;">Maybe</span> <span style="color:#2b91af;">String</span>)
monitor x = <span style="color:blue;">do</span>
  stats <- addAndCompute x
  <span style="color:blue;">case</span> stats <span style="color:blue;">of</span>
    Just Statistics { variance } -> <span style="color:blue;">return</span> $
      <span style="color:blue;">if</span> 10 < variance
        <span style="color:blue;">then</span> Just <span style="color:#a31515;">"SLA Violation"</span>
        <span style="color:blue;">else</span> Nothing
    Nothing -> <span style="color:blue;">return</span> Nothing</pre>
</p>
<p>
But again, the code is graceless with its explicit handling of <code>Maybe</code> cases. Whenever you see code that matches <code>Maybe</code> cases and maps <code>Nothing</code> to <code>Nothing</code>, your spider sense should be tingling. Could you abstract that away with a functor or monad?
</p>
<p>
Yes you can! You can use the <code>MaybeT</code> monad transformer, which nests <code>Maybe</code> computations inside another monad. In this case <code>State</code>:
</p>
<p>
<pre><span style="color:#2b91af;">monitor</span> <span style="color:blue;">::</span> <span style="color:#2b91af;">Double</span> <span style="color:blue;">-></span> <span style="color:blue;">State</span> <span style="color:blue;">Aggregate</span> (<span style="color:#2b91af;">Maybe</span> <span style="color:#2b91af;">String</span>)
monitor x = runMaybeT $ <span style="color:blue;">do</span>
  Statistics { variance } <- MaybeT $ addAndCompute x
  guard (10 < variance)
  <span style="color:blue;">return</span> <span style="color:#a31515;">"SLA Violation"</span></pre>
</p>
<p>
The function type is the same, but the implementation is much simpler. First, the code lifts the <code>Maybe</code>-valued <code>addAndCompute</code> result into <code>MaybeT</code> and pattern-matches on the <code>variance</code>. Since the code is now 'running in' a <code>Maybe</code>-like context, this line of code only executes if there's a <code>Statistics</code> value to extract. If, on the other hand, <code>addAndCompute</code> returns <code>Nothing</code>, the function already short-circuits there.
</p>
<p>
The <code>guard</code> works just like imperative <a href="https://en.wikipedia.org/wiki/Guard_(computer_science)">Guard Clauses</a>. The third line of code only runs if the <code>variance</code> is greater than 10. In that case, it returns an alert message.
</p>
<p>
The entire <code>do</code> workflow gets unwrapped with <code>runMaybeT</code> so that we return to a normal stateful computation that may fail.
</p>
<p>
Let's try it out:
</p>
<p>
<pre>ghci> (evalState $ monitor 1 >> monitor 7) emptyA
Nothing
ghci> (evalState $ monitor 1 >> monitor 8) emptyA
Just "SLA Violation"</pre>
</p>
<p>
Good, rigorous testing suggests that it's working.
</p>
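<p>
To see why the observations 1 and 7 stay silent while 1 and 8 trigger the alert, it may help to compute the population variance directly. The following sketch is a simplified restatement of the Welford update shown earlier. It uses a plain tuple instead of the <code>Aggregate</code> record, and the names <code>step</code> and <code>populationVariance</code> are hypothetical helpers introduced only for this illustration.
</p>
<p>
<pre>import Data.List (foldl')

-- One Welford step over a (count, mean, m2) tuple (hypothetical helper).
step :: (Int, Double, Double) -> Double -> (Int, Double, Double)
step (n, mean, m2) x =
  let n' = n + 1
      delta = x - mean
      mean' = mean + delta / fromIntegral n'
  in (n', mean', m2 + delta * (x - mean'))

-- Population variance of the observations; Nothing for an empty list.
populationVariance :: [Double] -> Maybe Double
populationVariance xs =
  case foldl' step (0, 0, 0) xs of
    (0, _, _) -> Nothing
    (n, _, m2) -> Just (m2 / fromIntegral n)</pre>
</p>
<p>
Here, <code>populationVariance [1, 7]</code> evaluates to <code>Just 9.0</code>, which is below the threshold of 10, whereas <code>populationVariance [1, 8]</code> evaluates to <code>Just 12.25</code>, which exceeds it.
</p>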
<h3 id="e67fa8bc1b40459c91c1c8b45595c379">
Conclusion <a href="#e67fa8bc1b40459c91c1c8b45595c379">#</a>
</h3>
<p>
You sometimes run into situations where monads are nested. This mostly happens in I/O-bound computations, where you may have a Maybe or Either value embedded inside <code>Task</code> or <code>IO</code>. This can sometimes make working with the 'inner' monad awkward, but in many cases there's a good solution at hand.
</p>
<p>
Some monads, like Maybe, Either, State, Reader, and Identity, nest nicely inside other monads. Thus, if your 'inner' monad is one of those, you can turn the nested arrangement into a monad in its own right. This may help simplify your code base.
</p>
<p>
In addition to the common monads listed here, there are a few more exotic ones that also play well in a nested configuration. Additionally, if your 'inner' monad is a custom data structure of your own creation, it's up to you to investigate if it nests nicely in another monad. As far as I can tell, though, if you can make it nest in one monad (e.g. Task, Async, or IO) you can probably make it nest in any monad.
</p>
<p>
<strong>Next:</strong> <a href="/2018/01/08/software-design-isomorphisms">Software design isomorphisms</a>.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
<p>
<strong>Collecting and handling result values</strong><br>
<a href="https://blog.ploeh.dk/2024/11/18/collecting-and-handling-result-values">https://blog.ploeh.dk/2024/11/18/collecting-and-handling-result-values</a><br>
2024-11-18T07:39:00+00:00, Mark Seemann
</p>
<div id="post">
<p>
<em>The answer is traverse. It's always traverse.</em>
</p>
<p>
I recently came across <a href="https://stackoverflow.com/q/79112836/126014">a Stack Overflow question</a> about collecting and handling <a href="https://en.wikipedia.org/wiki/Tagged_union">sum types</a> (AKA discriminated unions or, in this case, result types). While the question was tagged <em>functional-programming</em>, the overall structure of the code was so imperative, with so much interleaved <a href="https://en.wikipedia.org/wiki/Input/output">I/O</a>, that it hardly <a href="/2018/11/19/functional-architecture-a-definition">qualified as functional architecture</a>.
</p>
<p>
Instead, I gave <a href="https://stackoverflow.com/a/79112992/126014">an answer which involved a minimal change to the code</a>. Subsequently, the original poster asked to see a more functional version of the code. That's a bit too large a task for a Stack Overflow answer, I think, so I'll do it here on the blog instead.
</p>
<p>
Further comments and discussion on the original post reveal that the poster is interested in two alternatives. I'll start with the alternative that's only discussed, but not shown, in the question. The motivation for this ordering is that this variation is easier to implement than the other one, and I consider it pedagogical to start with the simplest case.
</p>
<p>
I'll do that in this article, and then follow up with another article that covers the short-circuiting case.
</p>
<h3 id="9b3987ad5daf4df48c8155a54fb39318">
Imperative outset <a href="#9b3987ad5daf4df48c8155a54fb39318">#</a>
</h3>
<p>
To begin, consider this mostly imperative code snippet:
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">storedItems</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">List</span><<span style="color:#2b91af;">ShoppingListItem</span>>();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">failedItems</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">List</span><<span style="color:#2b91af;">ShoppingListItem</span>>();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">errors</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">List</span><<span style="color:#2b91af;">Error</span>>();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">state</span> = (<span style="font-weight:bold;color:#1f377f;">storedItems</span>, <span style="font-weight:bold;color:#1f377f;">failedItems</span>, <span style="font-weight:bold;color:#1f377f;">errors</span>);
<span style="font-weight:bold;color:#8f08c4;">foreach</span> (<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">item</span> <span style="font-weight:bold;color:#8f08c4;">in</span> <span style="font-weight:bold;color:#1f377f;">itemsToUpdate</span>)
{
    <span style="color:#2b91af;">OneOf</span><<span style="color:#2b91af;">ShoppingListItem</span>, <span style="color:#2b91af;">NotFound</span>, <span style="color:#2b91af;">Error</span>> <span style="font-weight:bold;color:#1f377f;">updateResult</span> = <span style="color:blue;">await</span> <span style="color:#74531f;">UpdateItem</span>(<span style="font-weight:bold;color:#1f377f;">item</span>, <span style="font-weight:bold;color:#1f377f;">dbContext</span>);
    <span style="font-weight:bold;color:#1f377f;">state</span> = <span style="font-weight:bold;color:#1f377f;">updateResult</span>.<span style="font-weight:bold;color:#74531f;">Match</span><(<span style="color:#2b91af;">List</span><<span style="color:#2b91af;">ShoppingListItem</span>>, <span style="color:#2b91af;">List</span><<span style="color:#2b91af;">ShoppingListItem</span>>, <span style="color:#2b91af;">List</span><<span style="color:#2b91af;">Error</span>>)>(
        <span style="font-weight:bold;color:#1f377f;">storedItem</span> => { <span style="font-weight:bold;color:#1f377f;">storedItems</span>.<span style="font-weight:bold;color:#74531f;">Add</span>(<span style="font-weight:bold;color:#1f377f;">storedItem</span>); <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">state</span>; },
        <span style="font-weight:bold;color:#1f377f;">notFound</span> => { <span style="font-weight:bold;color:#1f377f;">failedItems</span>.<span style="font-weight:bold;color:#74531f;">Add</span>(<span style="font-weight:bold;color:#1f377f;">item</span>); <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">state</span>; },
        <span style="font-weight:bold;color:#1f377f;">error</span> => { <span style="font-weight:bold;color:#1f377f;">errors</span>.<span style="font-weight:bold;color:#74531f;">Add</span>(<span style="font-weight:bold;color:#1f377f;">error</span>); <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">state</span>; }
    );
}
<span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">dbContext</span>.<span style="font-weight:bold;color:#74531f;">SaveChangesAsync</span>();
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">Results</span>.<span style="color:#74531f;">Ok</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">BulkUpdateResult</span>([.. <span style="font-weight:bold;color:#1f377f;">storedItems</span>], [.. <span style="font-weight:bold;color:#1f377f;">failedItems</span>], [.. <span style="font-weight:bold;color:#1f377f;">errors</span>]));</pre>
</p>
<p>
There are quite a few things to take in, and one has to infer most of the types and APIs, since the original post didn't show more code than that. If you're used to engaging with Stack Overflow questions, however, it's not too hard to figure out what most of the moving parts do.
</p>
<p>
The least obvious detail is that the code uses a library called <a href="https://github.com/mcintyre321/OneOf/">OneOf</a>, which supplies general-purpose, but rather abstract, sum types. The container type <code>OneOf</code>, as well as the two indicator types <code>NotFound</code> and <code>Error</code>, are defined in that library.
</p>
<p>
The <code>Match</code> method implements standard <a href="/2018/05/22/church-encoding">Church encoding</a>, which enables the code to pattern-match on the three alternative values that <code>UpdateItem</code> returns.
</p>
<p>
One more detail also warrants an explicit description: The <code>itemsToUpdate</code> object is an input argument of the type <code><span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">ShoppingListItem</span>></code>.
</p>
<p>
The implementation makes use of mutable state and undisciplined I/O. How do you refactor it to a more functional design?
</p>
<h3 id="c4e1b030e919464aa22ade11a511414f">
Standard traversal <a href="#c4e1b030e919464aa22ade11a511414f">#</a>
</h3>
<p>
I'll pretend that we only need to turn the above code snippet into a functional design. Thus, I'm ignoring that the code is most likely part of a larger code base. Because of the implied database interaction, the method isn't a <a href="https://en.wikipedia.org/wiki/Pure_function">pure function</a>. Unless it's a top-level method (that is, at the boundary of the application), it doesn't exemplify larger-scale <a href="/2018/11/19/functional-architecture-a-definition">functional architecture</a>.
</p>
<p>
That said, my goal is to refactor the code to an <a href="/2020/03/02/impureim-sandwich">Impureim Sandwich</a>: Impure actions first, then the meat of the functionality as a pure function, and then some more impure actions to complete the functionality. This strongly suggests that the first step should be to map over <code>itemsToUpdate</code> and call <code>UpdateItem</code> for each.
</p>
<p>
If, however, you do that, you get this:
</p>
<p>
<pre><span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">OneOf</span><<span style="color:#2b91af;">ShoppingListItem</span>, <span style="color:#2b91af;">NotFound</span>, <span style="color:#2b91af;">Error</span>>>> <span style="font-weight:bold;color:#1f377f;">results</span> =
<span style="font-weight:bold;color:#1f377f;">itemsToUpdate</span>.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">item</span> => <span style="color:#74531f;">UpdateItem</span>(<span style="font-weight:bold;color:#1f377f;">item</span>, <span style="font-weight:bold;color:#1f377f;">dbContext</span>));</pre>
</p>
<p>
The <code>results</code> object is a sequence of tasks. If we consider <a href="/2020/07/27/task-asynchronous-programming-as-an-io-surrogate">Task as a surrogate for IO</a>, each task should be considered impure, as it's either non-deterministic, has side effects, or both. This means that we can't pass <code>results</code> to a pure function, and that frustrates the ambition to structure the code as an Impureim Sandwich.
</p>
<p>
This is one of the most common problems in functional programming, and the answer is usually: Use a <a href="/2024/11/11/traversals">traversal</a>.
</p>
<p>
<pre><span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">OneOf</span><<span style="color:#2b91af;">ShoppingListItem</span>, <span style="color:#2b91af;">NotFound</span><<span style="color:#2b91af;">ShoppingListItem</span>>, <span style="color:#2b91af;">Error</span>>> <span style="font-weight:bold;color:#1f377f;">results</span> =
<span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">itemsToUpdate</span>.<span style="font-weight:bold;color:#74531f;">Traverse</span>(<span style="font-weight:bold;color:#1f377f;">item</span> => <span style="color:#74531f;">UpdateItem</span>(<span style="font-weight:bold;color:#1f377f;">item</span>, <span style="font-weight:bold;color:#1f377f;">dbContext</span>));</pre>
</p>
<p>
Because this first, impure layer of the sandwich awaits the task, <code>results</code> is now an immutable value that can be passed to the pure step. This, by the way, assumes that <code>ShoppingListItem</code> is immutable, too.
</p>
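<p>
For readers who'd like to see the same idea outside of C#, here's a minimal Haskell sketch of the difference between mapping and traversing. The <code>Item</code> and <code>UpdateResult</code> types, as well as the <code>updateItem</code> action, are hypothetical stand-ins that I made up for illustration; they aren't part of the original question, and the action doesn't touch a real database.
</p>
<p>
<pre>-- Hypothetical stand-ins for the types in the C# example.
newtype Item = Item String deriving (Eq, Show)

data UpdateResult = Stored Item | NotFound Item | Failed String
  deriving (Eq, Show)

-- Hypothetical impure action. A real one would talk to a database;
-- this one only inspects the item so that the sketch is runnable.
updateItem :: Item -> IO UpdateResult
updateItem item@(Item name)
  | null name = pure (Failed "empty name")
  | name == "missing" = pure (NotFound item)
  | otherwise = pure (Stored item)

main :: IO ()
main = do
  let items = [Item "milk", Item "missing", Item ""]
  -- Mapping produces a list of actions that haven't run yet.
  let actions = fmap updateItem items :: [IO UpdateResult]
  print (length actions)
  -- Traversing produces one action that yields a list of results.
  results <- traverse updateItem items
  print results</pre>
</p>
<p>
The list of <code>actions</code> can't be handed to a pure function, whereas <code>results</code> is an ordinary list of values once the single action has run.
</p>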
<p>
Notice that I adjusted one of the cases of the discriminated union to <code><span style="color:#2b91af;">NotFound</span><<span style="color:#2b91af;">ShoppingListItem</span>></code> rather than just <code>NotFound</code>. While the OneOf library ships with a <code>NotFound</code> type, it doesn't have a generic container of that name, so I defined it myself:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">sealed</span> <span style="color:blue;">record</span> <span style="color:#2b91af;">NotFound</span><<span style="color:#2b91af;">T</span>>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">Item</span>);</pre>
</p>
<p>
I added it to make the next step simpler.
</p>
<h3 id="8f0e6fb0f34047ed99c59f6140a2b08f">
Aggregating the results <a href="#8f0e6fb0f34047ed99c59f6140a2b08f">#</a>
</h3>
<p>
The next step is to sort the <code>results</code> into three 'buckets', as it were.
</p>
<p>
<pre><span style="color:green;">// Pure</span>
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">seed</span> =
(
<span style="color:#2b91af;">Enumerable</span>.<span style="color:#74531f;">Empty</span><<span style="color:#2b91af;">ShoppingListItem</span>>(),
<span style="color:#2b91af;">Enumerable</span>.<span style="color:#74531f;">Empty</span><<span style="color:#2b91af;">ShoppingListItem</span>>(),
<span style="color:#2b91af;">Enumerable</span>.<span style="color:#74531f;">Empty</span><<span style="color:#2b91af;">Error</span>>()
);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">result</span> = <span style="font-weight:bold;color:#1f377f;">results</span>.<span style="font-weight:bold;color:#74531f;">Aggregate</span>(
<span style="font-weight:bold;color:#1f377f;">seed</span>,
(<span style="font-weight:bold;color:#1f377f;">state</span>, <span style="font-weight:bold;color:#1f377f;">result</span>) =>
<span style="font-weight:bold;color:#1f377f;">result</span>.<span style="font-weight:bold;color:#74531f;">Match</span>(
<span style="font-weight:bold;color:#1f377f;">storedItem</span> => (<span style="font-weight:bold;color:#1f377f;">state</span>.Item1.<span style="font-weight:bold;color:#74531f;">Append</span>(<span style="font-weight:bold;color:#1f377f;">storedItem</span>), <span style="font-weight:bold;color:#1f377f;">state</span>.Item2, <span style="font-weight:bold;color:#1f377f;">state</span>.Item3),
<span style="font-weight:bold;color:#1f377f;">notFound</span> => (<span style="font-weight:bold;color:#1f377f;">state</span>.Item1, <span style="font-weight:bold;color:#1f377f;">state</span>.Item2.<span style="font-weight:bold;color:#74531f;">Append</span>(<span style="font-weight:bold;color:#1f377f;">notFound</span>.Item), <span style="font-weight:bold;color:#1f377f;">state</span>.Item3),
<span style="font-weight:bold;color:#1f377f;">error</span> => (<span style="font-weight:bold;color:#1f377f;">state</span>.Item1, <span style="font-weight:bold;color:#1f377f;">state</span>.Item2, <span style="font-weight:bold;color:#1f377f;">state</span>.Item3.<span style="font-weight:bold;color:#74531f;">Append</span>(<span style="font-weight:bold;color:#1f377f;">error</span>))));</pre>
</p>
<p>
It's also possible to inline the <code>seed</code> value, but here I defined it in a separate expression in an attempt at making the code a little more readable. I don't know if I succeeded, because regardless of where it goes, it's hardly <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> to break tuple initialization over multiple lines. I had to, though, because otherwise the code would run <a href="/2019/11/04/the-80-24-rule">too far to the right</a>.
</p>
<p>
The lambda expression handles each <code>result</code> in <code>results</code> and uses <code>Match</code> to append the value to its proper 'bucket'. The outer <code>result</code> is a tuple of the three collections.
</p>
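<p>
The pure step is perhaps easier to see in a language with built-in sum types. What follows is only a sketch in Haskell, under the assumption that the three cases can be modelled as a hypothetical <code>UpdateResult</code> type. The names <code>collect</code> and <code>step</code> are mine, and aren't taken from the original code.
</p>
<p>
<pre>-- Hypothetical sum type standing in for the OneOf value.
data UpdateResult item err = Stored item | NotFound item | Failed err

-- Hypothetical counterpart to the three 'buckets'.
data BulkUpdateResult item err = BulkUpdateResult
  { storedItems :: [item]
  , failedItems :: [item]
  , errors :: [err]
  } deriving (Eq, Show)

-- Pure: sort each result into its bucket, preserving order.
collect :: [UpdateResult item err] -> BulkUpdateResult item err
collect = foldr step (BulkUpdateResult [] [] [])
  where
    step (Stored x) acc = acc { storedItems = x : storedItems acc }
    step (NotFound x) acc = acc { failedItems = x : failedItems acc }
    step (Failed e) acc = acc { errors = e : errors acc }

main :: IO ()
main =
  print (collect
    [Stored "milk", NotFound "eggs", Failed "timeout", Stored "tea"])
  -- BulkUpdateResult {storedItems = ["milk","tea"], failedItems = ["eggs"], errors = ["timeout"]}</pre>
</p>
<p>
Since <code>collect</code> is a right fold that prepends each value, the items in each bucket keep the order they had in the input.
</p>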
<h3 id="035012be047e431d8904686ec9915b8f">
Saving the changes and returning the results <a href="#035012be047e431d8904686ec9915b8f">#</a>
</h3>
<p>
The final, impure step in the sandwich is to save the changes and return the results:
</p>
<p>
<pre><span style="color:green;">// Impure</span>
<span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">dbContext</span>.<span style="font-weight:bold;color:#74531f;">SaveChangesAsync</span>();
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">OkResult</span>(
<span style="color:blue;">new</span> <span style="color:#2b91af;">BulkUpdateResult</span>([.. <span style="font-weight:bold;color:#1f377f;">result</span>.Item1], [.. <span style="font-weight:bold;color:#1f377f;">result</span>.Item2], [.. <span style="font-weight:bold;color:#1f377f;">result</span>.Item3]));</pre>
</p>
<p>
To be honest, the last line of code is pure, but <a href="/2023/10/09/whats-a-sandwich">that's not unusual</a> when it comes to Impureim Sandwiches.
</p>
<h3 id="178ff7d455e44a619b67d911a6aecba7">
Accumulating the bulk-update result <a href="#178ff7d455e44a619b67d911a6aecba7">#</a>
</h3>
<p>
So far, I've assumed that the final <code>BulkUpdateResult</code> class is just a simple immutable container without much functionality. If, however, we add some copy-and-update functions to it, we can use them to aggregate the result, instead of an anonymous tuple.
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:#2b91af;">BulkUpdateResult</span> <span style="font-weight:bold;color:#74531f;">Store</span>(<span style="color:#2b91af;">ShoppingListItem</span> <span style="font-weight:bold;color:#1f377f;">item</span>) =>
    <span style="color:blue;">new</span>([.. StoredItems, <span style="font-weight:bold;color:#1f377f;">item</span>], FailedItems, Errors);
<span style="color:blue;">internal</span> <span style="color:#2b91af;">BulkUpdateResult</span> <span style="font-weight:bold;color:#74531f;">Fail</span>(<span style="color:#2b91af;">ShoppingListItem</span> <span style="font-weight:bold;color:#1f377f;">item</span>) =>
    <span style="color:blue;">new</span>(StoredItems, [.. FailedItems, <span style="font-weight:bold;color:#1f377f;">item</span>], Errors);
<span style="color:blue;">internal</span> <span style="color:#2b91af;">BulkUpdateResult</span> <span style="font-weight:bold;color:#74531f;">Error</span>(<span style="color:#2b91af;">Error</span> <span style="font-weight:bold;color:#1f377f;">error</span>) =>
    <span style="color:blue;">new</span>(StoredItems, FailedItems, [.. Errors, <span style="font-weight:bold;color:#1f377f;">error</span>]);</pre>
</p>
<p>
I would have personally preferred the name <code>NotFound</code> instead of <code>Fail</code>, but I was going with the original post's <code>failedItems</code> terminology, and I thought that it made more sense to call a method <code>Fail</code> when it adds to a collection called <code>FailedItems</code>.
</p>
<p>
Adding these three instance methods to <code>BulkUpdateResult</code> simplifies the composing code:
</p>
<p>
<pre><span style="color:green;">// Impure</span>
<span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">OneOf</span><<span style="color:#2b91af;">ShoppingListItem</span>, <span style="color:#2b91af;">NotFound</span><<span style="color:#2b91af;">ShoppingListItem</span>>, <span style="color:#2b91af;">Error</span>>> <span style="font-weight:bold;color:#1f377f;">results</span> =
<span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">itemsToUpdate</span>.<span style="font-weight:bold;color:#74531f;">Traverse</span>(<span style="font-weight:bold;color:#1f377f;">item</span> => <span style="color:#74531f;">UpdateItem</span>(<span style="font-weight:bold;color:#1f377f;">item</span>, <span style="font-weight:bold;color:#1f377f;">dbContext</span>));
<span style="color:green;">// Pure</span>
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">result</span> = <span style="font-weight:bold;color:#1f377f;">results</span>.<span style="font-weight:bold;color:#74531f;">Aggregate</span>(
<span style="color:blue;">new</span> <span style="color:#2b91af;">BulkUpdateResult</span>([], [], []),
(<span style="font-weight:bold;color:#1f377f;">state</span>, <span style="font-weight:bold;color:#1f377f;">result</span>) =>
<span style="font-weight:bold;color:#1f377f;">result</span>.<span style="font-weight:bold;color:#74531f;">Match</span>(
<span style="font-weight:bold;color:#1f377f;">storedItem</span> => <span style="font-weight:bold;color:#1f377f;">state</span>.<span style="font-weight:bold;color:#74531f;">Store</span>(<span style="font-weight:bold;color:#1f377f;">storedItem</span>),
<span style="font-weight:bold;color:#1f377f;">notFound</span> => <span style="font-weight:bold;color:#1f377f;">state</span>.<span style="font-weight:bold;color:#74531f;">Fail</span>(<span style="font-weight:bold;color:#1f377f;">notFound</span>.Item),
<span style="font-weight:bold;color:#1f377f;">error</span> => <span style="font-weight:bold;color:#1f377f;">state</span>.<span style="font-weight:bold;color:#74531f;">Error</span>(<span style="font-weight:bold;color:#1f377f;">error</span>)));
<span style="color:green;">// Impure</span>
<span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">dbContext</span>.<span style="font-weight:bold;color:#74531f;">SaveChangesAsync</span>();
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">OkResult</span>(<span style="font-weight:bold;color:#1f377f;">result</span>);</pre>
</p>
<p>
This variation starts with an empty <code>BulkUpdateResult</code> and then uses <code>Store</code>, <code>Fail</code>, or <code>Error</code> as appropriate to update the state.
</p>
<h3 id="32e680ea1dbb4bc7bc097e8fcfcb90e9">
Parallel Sequence <a href="#32e680ea1dbb4bc7bc097e8fcfcb90e9">#</a>
</h3>
<p>
If the tasks you want to traverse are thread-safe, you might consider making the traversal concurrent. You can use <a href="https://learn.microsoft.com/dotnet/api/system.threading.tasks.task.whenall">Task.WhenAll</a> for that. It has nearly the same type as <code>Sequence</code> (it returns a task of an array rather than of an <code>IEnumerable</code>), so if you can live with the extra non-determinism that comes with parallel execution, you can use that instead:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">static</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">T</span>>> <span style="color:#74531f;">Sequence</span><<span style="color:#2b91af;">T</span>>(<span style="color:blue;">this</span> <span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">T</span>>> <span style="font-weight:bold;color:#1f377f;">tasks</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">await</span> <span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">WhenAll</span>(<span style="font-weight:bold;color:#1f377f;">tasks</span>);
}</pre>
</p>
<p>
Since the method signature doesn't change, the rest of the code remains unchanged.
</p>
<h3 id="a54fe20498bd4aca99d7d4184209a4df">
Conclusion <a href="#a54fe20498bd4aca99d7d4184209a4df">#</a>
</h3>
<p>
One of the most common stumbling blocks in functional programming is when you have a collection of values, and you need to perform an impure action (typically I/O) for each. This leaves you with a collection of impure values (<code>Task</code> in C#, <code>Task</code> or <code>Async</code> in <a href="https://fsharp.org/">F#</a>, <code>IO</code> in <a href="https://www.haskell.org/">Haskell</a>, etc.). What you actually need is a single impure value that contains the collection of results.
</p>
<p>
The solution to this kind of problem is to <em>traverse</em> the collection, rather than mapping over it (with <code>Select</code>, <code>map</code>, <code>fmap</code>, or similar). Note that computer scientists often talk about <em>traversing</em> a data structure like a <a href="https://en.wikipedia.org/wiki/Tree_(abstract_data_type)">tree</a>. This is a less well-defined use of the word, and not directly related. That said, you <em>can</em> also write <code>Traverse</code> and <code>Sequence</code> functions for trees.
</p>
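<p>
To illustrate that last point, here's a small Haskell sketch of a traversable tree. The <code>Tree</code> type and the <code>positive</code> function are hypothetical examples of my own, and I'm leaning on the <code>DeriveTraversable</code> language extension to have the compiler write the instances.
</p>
<p>
<pre>{-# LANGUAGE DeriveTraversable #-}

-- Hypothetical binary tree; the compiler derives the instances.
data Tree a = Leaf | Node (Tree a) a (Tree a)
  deriving (Eq, Show, Functor, Foldable, Traversable)

-- Hypothetical validation: only positive numbers are accepted.
positive :: Int -> Maybe Int
positive x = if x > 0 then Just x else Nothing

main :: IO ()
main = do
  let good = Node (Node Leaf 1 Leaf) 2 (Node Leaf 3 Leaf)
  let bad = Node (Node Leaf 1 Leaf) 0 Leaf
  -- Every value passes, so the whole tree comes back in a Just.
  print (traverse positive good == Just good)
  -- A single failing value makes the entire traversal Nothing.
  print (traverse positive bad)
  -- sequenceA flips Tree (Maybe a) to Maybe (Tree a).
  print (sequenceA (fmap positive good) == Just good)</pre>
</p>
<p>
Here the 'impure' functor is just <code>Maybe</code>, but the same <code>traverse</code> call would also work with <code>IO</code> actions at the nodes.
</p>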
<p>
This article used a Stack Overflow question as the starting point for an example showing how to refactor imperative code to an Impureim Sandwich.
</p>
<p>
This completes the first variation requested in the Stack Overflow question.
</p>
<p>
<strong>Next:</strong> <a href="/2024/12/02/short-circuiting-an-asynchronous-traversal">Short-circuiting an asynchronous traversal</a>.
</p>
</div><hr>
Traversals
https://blog.ploeh.dk/2024/11/11/traversals
2024-11-11T07:45:00+00:00
Mark Seemann
<div id="post">
<p>
<em>How to convert a list of tasks into an asynchronous list, and similar problems.</em>
</p>
<p>
This article is part of <a href="/2022/07/11/functor-relationships">a series of articles about functor relationships</a>. In a previous article you learned about <a href="/2022/07/18/natural-transformations">natural transformations</a>, and then how <a href="/2018/03/22/functors">functors</a> compose. You can skip several of them if you like, but you might find the one about <a href="/2024/10/28/functor-compositions">functor compositions</a> relevant. Still, this article can be read independently of the rest of the series.
</p>
<p>
You can go a long way with just a single functor or <a href="/2022/03/28/monads">monad</a>. Consider how useful C#'s LINQ API is, or similar kinds of APIs in other languages - typically <code>map</code> and <code>flatMap</code> methods. These APIs work exclusively with the <a href="/2022/04/19/the-list-monad">List monad</a> (which is also a functor). Working with lists, sequences, or collections is so useful that many languages have other kinds of special syntax specifically aimed at working with multiple values: <a href="https://en.wikipedia.org/wiki/List_comprehension">List comprehension</a>.
</p>
<p>
<a href="/2022/06/06/asynchronous-monads">Asynchronous monads</a> like <a href="https://docs.microsoft.com/dotnet/api/system.threading.tasks.task-1">Task<T></a> or <a href="https://fsharp.org/">F#</a>'s <a href="https://fsharp.github.io/fsharp-core-docs/reference/fsharp-control-fsharpasync-1.html">Async<'T></a> are another kind of functor so useful in their own right that languages have special <code>async</code> and <code>await</code> keywords to compose them.
</p>
<p>
Sooner or later, though, you run into situations where you'd like to combine two different functors.
</p>
<h3 id="ebf67a9789e44ad8997832e1ac7c17da">
Lists and tasks <a href="#ebf67a9789e44ad8997832e1ac7c17da" title="permalink">#</a>
</h3>
<p>
It's not unusual to combine collections and asynchrony. If you make an asynchronous database query, you could easily receive something like <code>Task<IEnumerable<Reservation>></code>. This, in isolation, hardly causes problems, but things get more interesting when you need to compose multiple reads.
</p>
<p>
Consider a query like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> Task<Foo> Read(<span style="color:blue;">int</span> id)</pre>
</p>
<p>
What happens if you have a collection of IDs that you'd like to read? This happens:
</p>
<p>
<pre><span style="color:blue;">var</span> ids = <span style="color:blue;">new</span>[] { 42, 1337, 2112 };
IEnumerable<Task<Foo>> fooTasks = ids.Select(id => Foo.Read(id));</pre>
</p>
<p>
You get a collection of Tasks, which may be awkward because you can't <code>await</code> it. Perhaps you'd prefer a single Task that contains a collection: <code>Task<IEnumerable<Foo>></code>. In other words, you'd like to flip the functors:
</p>
<p>
<pre>IEnumerable<Task<Foo>>
Task<IEnumerable<Foo>></pre>
</p>
<p>
The top type is what you have. The bottom type is what you'd like to have.
</p>
<p>
The combination of asynchrony and collections is so common that .NET has special methods to do that. I'll briefly mention one of these later, but what's the <em>general</em> solution to this problem?
</p>
<p>
Whenever you need to flip two functors, you need a <em>traversal</em>.
</p>
<h3 id="b962041a5e3d4eb9ba5101641407ca3f">
Sequence <a href="#b962041a5e3d4eb9ba5101641407ca3f" title="permalink">#</a>
</h3>
<p>
As is almost always the case, we can look to <a href="https://www.haskell.org/">Haskell</a> for a canonical definition of traversals - or, as the type class is called: <a href="https://hackage.haskell.org/package/base/docs/Data-Traversable.html">Traversable</a>.
</p>
<p>
A <em>traversable functor</em> is a functor that enables you to flip that functor and another functor, like the above C# example. In more succinct syntax:
</p>
<p>
<pre>t (f a) -> f (t a)</pre>
</p>
<p>
Here, <code>t</code> symbolises any traversable functor (like <code>IEnumerable<T></code> in the above C# example), and <code>f</code> is another functor (like <code>Task<T></code>, above). By flipping the functors I mean making <code>t</code> and <code>f</code> change places; just like <code>IEnumerable</code> and <code>Task</code>, above.
</p>
<p>
Thinking of <a href="https://bartoszmilewski.com/2014/01/14/functors-are-containers/">functors as containers</a> we might depict the function like this:
</p>
<p>
<img src="/content/binary/traversal-sequence.png" alt="Nested functors depicted as concentric circles. To the left the circle t contains the circle f that again contains the circle a. To the right the circle f contains the circle t that again contains the circle a. An arrow points from the left circles to the right circles.">
</p>
<p>
To the left, we have an outer functor <code>t</code> (e.g. <code>IEnumerable</code>) that contains another functor <code>f</code> (e.g. <code>Task</code>) that again 'contains' values of type <code>a</code> (in C# typically called <code>T</code>). We'd like to flip how the containers are nested so that <code>f</code> contains <code>t</code>.
</p>
<p>
Contrary to what you might expect, the function that does that isn't called <em>traverse</em>; it's called <em>sequence</em>. (For those readers who are interested in Haskell specifics, the function I'm going to be talking about is actually called <a href="https://hackage.haskell.org/package/base/docs/Data-Traversable.html#v:sequenceA">sequenceA</a>. There's also a function called <a href="https://hackage.haskell.org/package/base/docs/Data-Traversable.html#v:sequence">sequence</a>, but it's not as general. The reasons for the odd names are related to the evolution of various Haskell type classes.)
</p>
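<p>
If you have GHC at hand, you can get a feel for <code>sequenceA</code> with a few small values. This is only a sketch that uses functions from the Haskell <code>base</code> library; the sample values are arbitrary.
</p>
<p>
<pre>main :: IO ()
main = do
  -- Here t is the list functor and f is Maybe.
  print (sequenceA [Just 1, Just 2, Just 3 :: Maybe Int])  -- Just [1,2,3]
  print (sequenceA [Just 1, Nothing, Just 3 :: Maybe Int]) -- Nothing
  -- Here t is Maybe and f is the list functor.
  print (sequenceA (Just [1, 2, 3 :: Int])) -- [Just 1,Just 2,Just 3]
  -- Here t is the list functor and f is IO.
  xs <- sequenceA [pure "foo", pure "bar" :: IO String]
  print xs -- ["foo","bar"]</pre>
</p>
<p>
In each case the two functors change places, but notice that a single <code>Nothing</code> is enough to turn the entire result into <code>Nothing</code>.
</p>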
<p>
The <em>sequence</em> function doesn't work for any old functor. First, <code>t</code> has to be a <em>traversable functor</em>. We'll get back to that later. Second, <code>f</code> has to be an <a href="/2018/10/01/applicative-functors">applicative functor</a>. (To be honest, I'm not sure if this is <em>always</em> required, or if it's possible to produce an example of a specific functor that isn't applicative, but where it's still possible to implement a <em>sequence</em> function. The Haskell <code>sequenceA</code> function has <code>Applicative f</code> as a constraint, but as far as I can tell, this only means that this is a <em>sufficient</em> requirement - not that it's necessary.)
</p>
<p>
Since tasks (e.g. <code>Task<T></code>) are applicative functors (they are, because <a href="/2022/06/06/asynchronous-monads">they are monads</a>, and <a href="/2022/03/28/monads">all monads are applicative functors</a>), that second requirement is fulfilled for the above example. I'll show you how to implement a <code>Sequence</code> function in C# and how to use it, and then we'll return to the general discussion of what a traversable functor is:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> Task<IEnumerable<T>> Sequence<<span style="color:#2b91af;">T</span>>(
    <span style="color:blue;">this</span> IEnumerable<Task<T>> source)
{
    <span style="color:blue;">return</span> source.Aggregate(
        Task.FromResult(Enumerable.Empty<T>()),
        <span style="color:blue;">async</span> (acc, t) =>
        {
            <span style="color:blue;">var</span> xs = <span style="color:blue;">await</span> acc;
            <span style="color:blue;">var</span> x = <span style="color:blue;">await</span> t;
            <span style="color:blue;">return</span> xs.Concat(<span style="color:blue;">new</span>[] { x });
        });
}</pre>
</p>
<p>
This <code>Sequence</code> function enables you to flip any <code>IEnumerable<Task<T>></code> to a <code>Task<IEnumerable<T>></code>, including the above <code>fooTasks</code>:
</p>
<p>
<pre>Task<IEnumerable<Foo>> foosTask = fooTasks.Sequence();</pre>
</p>
<p>
You can also implement <code>sequence</code> in F#:
</p>
<p>
<pre><span style="color:green;">// Async<'a> list -> Async<'a list></span>
<span style="color:blue;">let</span> sequence asyncs =
    <span style="color:blue;">let</span> go acc t = async {
        <span style="color:blue;">let!</span> xs = acc
        <span style="color:blue;">let!</span> x = t
        <span style="color:blue;">return</span> List.append xs [x] }
    List.fold go (fromValue []) asyncs</pre>
</p>
<p>
and use it like this:
</p>
<p>
<pre><span style="color:blue;">let</span> fooTasks = ids |> List.map Foo.Read
<span style="color:blue;">let</span> foosTask = fooTasks |> Async.sequence</pre>
</p>
<p>
For this example, I put the <code>sequence</code> function in a local <code>Async</code> module; it's not part of any published <code>Async</code> module.
</p>
<p>
These C# and F# examples are specific translations: from a list of tasks to a task of a list. If you need another translation, you'll have to write a new function for that particular combination of functors. Haskell has more general capabilities, so that you don't have to write functions for all combinations. I'm not assuming that you know Haskell, however, so I'll proceed with the description.
</p>
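<p>
As a rough sketch of that generality (an illustration only, not code from the examples above), the standard <code>sequenceA</code> function from Haskell's <code>base</code> library flips any traversable functor that contains an applicative functor. The <code>readFoo</code> action is a hypothetical stand-in for <code>Foo.Read</code>:
</p>
<p>
<pre>-- Hypothetical stand-in for Foo.Read: produces a value for an ID.
readFoo :: Int -> IO String
readFoo i = pure ("foo" ++ show i)

-- A list of IO actions flipped to an IO action of a list.
foos :: IO [String]
foos = sequenceA (fmap readFoo [1, 2, 3])

-- The same sequenceA, now with Maybe as the traversable functor.
maybeFoo :: IO (Maybe String)
maybeFoo = sequenceA (fmap readFoo (Just 42))</pre>
</p>
<p>
No new function is needed for the second combination; only the types change.
</p>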
<h3 id="d63d059d841b4d9783f42c0360b21662">
Traversable functor <a href="#d63d059d841b4d9783f42c0360b21662" title="permalink">#</a>
</h3>
<p>
The <em>sequence</em> function requires that the 'other' functor (the one that's <em>not</em> the traversable functor) is an applicative functor, but what about the traversable functor itself? What does it take to be a traversable functor?
</p>
<p>
I have to admit that I have to rely on Haskell specifics to a greater extent than normal. For most other concepts and abstractions in <a href="/2017/10/04/from-design-patterns-to-category-theory">the overall article series</a>, I've been able to draw on various sources, chief of which is <a href="https://bartoszmilewski.com/2014/10/28/category-theory-for-programmers-the-preface/">Category Theory for Programmers</a>. In various articles, I've cited my sources whenever possible. While I've relied on Haskell libraries for 'canonical' ways to <em>represent</em> concepts in a programming language, I've tried to present ideas as having a more universal origin than just Haskell.
</p>
<p>
When it comes to traversable functors, I haven't come across universal reasoning like that which gives rise to concepts like <a href="/2017/10/06/monoids">monoids</a>, functors, <a href="/2018/05/22/church-encoding">Church encodings</a>, or <a href="/2019/04/29/catamorphisms">catamorphisms</a>. This is most likely a failing on my part.
</p>
<p>
Traversals of the Haskell kind are, however, so <em>useful</em> that I find it appropriate to describe them. When consulting, it's a common solution to a lot of problems that people are having with functional programming.
</p>
<p>
Thus, based on Haskell's <a href="https://hackage.haskell.org/package/base/docs/Data-Traversable.html">Data.Traversable</a>, a traversable functor must:
<ul>
<li>be a functor</li>
<li>be a 'foldable' functor</li>
<li>define a <em>sequence</em> or <em>traverse</em> function</li>
</ul>
You've already seen examples of <em>sequence</em> functions, and I'm also assuming that (since you've made it so far in the article already) you know what a functor is. But what's a <em>foldable</em> functor?
</p>
<p>
Haskell comes with a <a href="https://hackage.haskell.org/package/base/docs/Data-Foldable.html">Foldable</a> type class. It defines a class of data that has a particular type of <a href="/2019/04/29/catamorphisms">catamorphism</a>. As I've outlined in my article on catamorphisms, Haskell's notion of a <em>fold</em> sometimes coincides with a (or 'the') catamorphism for a type, and sometimes not. For <a href="/2019/05/20/maybe-catamorphism">Maybe</a> and <a href="/2019/05/27/list-catamorphism">List</a> they do coincide, while they don't for <a href="/2019/06/03/either-catamorphism">Either</a> or <a href="/2019/06/10/tree-catamorphism">Tree</a>. It's not that you can't define <code>Foldable</code> for <a href="/2018/06/11/church-encoded-either">Either</a> or <a href="/2018/08/06/a-tree-functor">Tree</a>, it's just that it's not 'the' <em>general</em> catamorphism for that type.
</p>
<p>
I can't tell whether <code>Foldable</code> is a universal abstraction, or if it's just an ad-hoc API that turns out to be useful in practice. It looks like the latter to me, but my knowledge is limited. Perhaps I'll be wiser in a year or two.
</p>
<p>
I will, however, take it as licence to treat this topic a little less formally than I've done with other articles. While there <em>are</em> laws associated with <code>Traversable</code>, they are rather complex, so I'm going to skip them.
</p>
<p>
The above requirements will enable you to define traversable functors if you run into some more exotic ones, but in practice, the common functors List, <a href="/2018/03/26/the-maybe-functor">Maybe</a>, <a href="/2019/01/14/an-either-functor">Either</a>, <a href="/2018/08/06/a-tree-functor">Tree</a>, and <a href="/2018/09/03/the-identity-functor">Identity</a> are all traversable. That is useful to know. If any of those functors is the outer functor in a composition of functors, then you can flip it to the inner position as long as the other functor is an applicative functor.
</p>
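<p>
To make the three requirements concrete, here's a minimal Haskell sketch. The <code>BinaryTree</code> type is a hypothetical example defined only for this illustration (it's not the Tree from the linked articles), and the instances are written by hand, although GHC could also derive them:
</p>
<p>
<pre>-- Hypothetical binary tree, for illustration only.
data BinaryTree a = Leaf | Node (BinaryTree a) a (BinaryTree a)
  deriving (Eq, Show)

-- Requirement 1: be a functor.
instance Functor BinaryTree where
  fmap _ Leaf = Leaf
  fmap f (Node l x r) = Node (fmap f l) (f x) (fmap f r)

-- Requirement 2: be foldable (here an in-order right fold).
instance Foldable BinaryTree where
  foldr _ z Leaf = z
  foldr f z (Node l x r) = foldr f (f x (foldr f z r)) l

-- Requirement 3: define traverse (sequenceA then follows by default).
instance Traversable BinaryTree where
  traverse _ Leaf = pure Leaf
  traverse f (Node l x r) = Node <$> traverse f l <*> f x <*> traverse f r</pre>
</p>
<p>
With these instances in place, a tree of <code>Maybe</code> or <code>IO</code> values can be flipped in the same way as a list.
</p>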
<p>
Since <code>IEnumerable<T></code> is traversable, and <code>Task<T></code> (or <code>Async<'T></code>) is an applicative functor, it's possible to use <code>Sequence</code> to convert <code>IEnumerable<Task<Foo>></code> to <code>Task<IEnumerable<Foo>></code>.
</p>
<h3 id="3346c092666c4dacb9a61cc1f622fc0f">
Traverse <a href="#3346c092666c4dacb9a61cc1f622fc0f" title="permalink">#</a>
</h3>
<p>
The C# and F# examples you've seen so far arrive at the desired type in a two-step process. First they produce the 'wrong' type with <code>ids.Select(Foo.Read)</code> or <code>ids |> List.map Foo.Read</code>, and then they use <code>Sequence</code> to arrive at the desired type.
</p>
<p>
When you use two expressions, you need two lines of code, and you also need to come up with a name for the intermediary value. It might be easier to chain the two function calls into a single expression:
</p>
<p>
<pre>Task<IEnumerable<Foo>> foosTask = ids.Select(Foo.Read).Sequence();</pre>
</p>
<p>
Or, in F#:
</p>
<p>
<pre><span style="color:blue;">let</span> foosTask = ids |> List.map Foo.Read |> Async.sequence</pre>
</p>
<p>
Chaining <code>Select</code>/<code>map</code> with <code>Sequence</code>/<code>sequence</code> is so common that it's a named function: <em>traverse</em>. In C#:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> Task<IEnumerable<TResult>> Traverse<<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">TResult</span>>(
<span style="color:blue;">this</span> IEnumerable<T> source,
Func<T, Task<TResult>> selector)
{
<span style="color:blue;">return</span> source.Select(selector).Sequence();
}</pre>
</p>
<p>
This makes usage a little easier:
</p>
<p>
<pre>Task<IEnumerable<Foo>> foosTask = ids.Traverse(Foo.Read);</pre>
</p>
<p>
In F# the implementation might be similar:
</p>
<p>
<pre><span style="color:green;">// ('a -> Async<'b>) -> 'a list -> Async<'b list></span>
<span style="color:blue;">let</span> traverse f xs = xs |> List.map f |> sequence</pre>
</p>
<p>
Usage then looks like this:
</p>
<p>
<pre><span style="color:blue;">let</span> foosTask = ids |> Async.traverse Foo.Read</pre>
</p>
<p>
As you can tell, if you've already implemented <em>sequence</em> you can always implement <em>traverse</em>. The converse is also true: If you've already implemented <em>traverse</em>, you can always implement <em>sequence</em>. You'll see an example of that later.
</p>
<h3 id="117fac3b686e4db8b6c3c4e0ac556929">
A reusable idea <a href="#117fac3b686e4db8b6c3c4e0ac556929" title="permalink">#</a>
</h3>
<p>
If you know the .NET Task Parallel Library (TPL), you may demur that my implementation of <code>Sequence</code> seems like an inefficient version of <a href="https://docs.microsoft.com/dotnet/api/system.threading.tasks.task.whenall">Task.WhenAll</a>, and that <code>Traverse</code> could be written like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">async</span> Task<IEnumerable<TResult>> Traverse<<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">TResult</span>>(
<span style="color:blue;">this</span> IEnumerable<T> source,
Func<T, Task<TResult>> selector)
{
<span style="color:blue;">return</span> <span style="color:blue;">await</span> Task.WhenAll(source.Select(selector));
}</pre>
</p>
<p>
This alternative is certainly possible. Whether it's more efficient I don't know; I haven't measured. As foreshadowed in the beginning of the article, the combination of collections and asynchrony is so common that .NET has special APIs to handle that. You may ask, then: <em>What's the point?</em>
</p>
<p>
The point is that a traversable functor is <em>a reusable idea</em>.
</p>
<p>
You may be able to find existing APIs like <code>Task.WhenAll</code> to deal with combinations of collections and asynchrony, but what if you need to deal with asynchronous Maybe or Either? Or a List of Maybes?
</p>
<p>
There may be no existing API to flip things around - before you add it. Now you know that there's a (dare I say it?) design pattern you can implement.
</p>
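<p>
As a hedged illustration of the 'List of Maybes' case, here's how it might look in Haskell, where the flip already exists in the <code>base</code> library. The <code>parseInt</code> and <code>validate</code> functions are hypothetical helpers invented for this sketch:
</p>
<p>
<pre>import Text.Read (readMaybe)

-- Hypothetical parser: Nothing if the input isn't an integer.
parseInt :: String -> Maybe Int
parseInt = readMaybe

-- [Maybe Int] flipped to Maybe [Int]: Just only if every element parses.
allParsed :: [String] -> Maybe [Int]
allParsed inputs = sequenceA (fmap parseInt inputs)

-- Hypothetical validation that reports the problem in a Left case.
validate :: Int -> Either String Int
validate n
  | n >= 0 = Right n
  | otherwise = Left ("negative: " ++ show n)

-- The same idea with Either: the first Left short-circuits the result.
allValid :: [Int] -> Either String [Int]
allValid = traverse validate</pre>
</p>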
<h3 id="f81375a0121247698f0ad5eac4deebff">
Asynchronous Maybe <a href="#f81375a0121247698f0ad5eac4deebff" title="permalink">#</a>
</h3>
<p>
Once people go beyond collections they often run into problems. You may, for example, decide to use the <a href="/2022/04/25/the-maybe-monad">Maybe monad</a> in order to model the presence or absence of a value. Then, once you combine Maybe-based decision values with asynchronous processing, you may run into problems.
</p>
<p>
For example, in my article <a href="/2019/02/11/asynchronous-injection">Asynchronous Injection</a> I modelled the core domain logic as returning <code>Maybe<Reservation></code>. When handling an HTTP request, the application should use that value to determine what to do next. If the return value is empty it should do nothing, but when the Maybe value is populated, it should save the reservation in a data store using this method:
</p>
<p>
<pre>Task<<span style="color:blue;">int</span>> Create(Reservation reservation)</pre>
</p>
<p>
Finally, if accepting the reservation, the HTTP handler (<code>ReservationsController</code>) should return the reservation ID, which is the <code>int</code> returned by <code>Create</code>. Please refer to the article for details. It also links to the sample code on GitHub.
</p>
<p>
The entire expression is, however, <code>Task</code>-based:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> Task<IActionResult> Post(Reservation reservation)
{
<span style="color:blue;">return</span> <span style="color:blue;">await</span> Repository.ReadReservations(reservation.Date)
.Select(rs => maîtreD.TryAccept(rs, reservation))
.SelectMany(m => m.Traverse(Repository.Create))
.Match(InternalServerError(<span style="color:#a31515;">"Table unavailable"</span>), Ok);
}</pre>
</p>
<p>
The <code>Select</code> and <code>SelectMany</code> methods are defined on the <code>Task</code> monad. The <code>m</code> in the <code>SelectMany</code> lambda expression is the <code>Maybe<Reservation></code> returned by <code>TryAccept</code>. What would happen if you didn't have a <code>Traverse</code> method?
</p>
<p>
<pre>Task<Maybe<Task<<span style="color:blue;">int</span>>>> whatIsThis = Repository.ReadReservations(reservation.Date)
.Select(rs => maîtreD.TryAccept(rs, reservation))
.Select(m => m.Select(Repository.Create));</pre>
</p>
<p>
Notice that <code>whatIsThis</code> (so named because it's a temporary variable used to investigate the type of the expression so far) has an awkward type: <code>Task<Maybe<Task<<span style="color:blue;">int</span>>>></code>. That's a Task within a Maybe within a Task.
</p>
<p>
This makes it difficult to continue the composition and return an HTTP result.
</p>
<p>
Instead, use <code>Traverse</code>:
</p>
<p>
<pre>Task<Task<Maybe<<span style="color:blue;">int</span>>>> whatIsThis = Repository.ReadReservations(reservation.Date)
.Select(rs => maîtreD.TryAccept(rs, reservation))
.Select(m => m.Traverse(Repository.Create));</pre>
</p>
<p>
This flips the inner <code>Maybe<Task<<span style="color:blue;">int</span>>></code> to <code>Task<Maybe<<span style="color:blue;">int</span>>></code>. Now you have a Maybe within a Task within a Task. The outer two Tasks are now nicely nested, and it's a job for a monad to remove one level of nesting. That's the reason that the final composition uses <code>SelectMany</code> instead of <code>Select</code>.
</p>
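<p>
The same shape can be sketched in Haskell, with <code>IO</code> playing the role of <code>Task</code> and <code>>>=</code> playing the role of <code>SelectMany</code>. All names below are hypothetical, simplified stand-ins (a reservation is reduced to a quantity, and the capacity of 10 is made up); this isn't the code from the linked article:
</p>
<p>
<pre>-- Hypothetical stand-in: existing reservation quantities for a date.
readReservations :: String -> IO [Int]
readReservations _ = pure [3, 4]

-- Hypothetical domain logic: accept if an assumed capacity of 10 allows it.
tryAccept :: [Int] -> Int -> Maybe Int
tryAccept existing quantity
  | sum existing + quantity <= 10 = Just quantity
  | otherwise = Nothing

-- Hypothetical stand-in for Create: returns a made-up ID.
create :: Int -> IO Int
create quantity = pure (quantity * 100)

-- fmap yields IO (Maybe Int); traverse create turns each Maybe Int into
-- IO (Maybe Int); >>= removes the extra IO layer.
post :: String -> Int -> IO (Maybe Int)
post date quantity =
  fmap (\rs -> tryAccept rs quantity) (readReservations date)
    >>= traverse create</pre>
</p>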
<p>
The <code>Traverse</code> function is implemented like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> Task<Maybe<TResult>> Traverse<<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">TResult</span>>(
<span style="color:blue;">this</span> Maybe<T> source,
Func<T, Task<TResult>> selector)
{
<span style="color:blue;">return</span> source.Match(
nothing: Task.FromResult(<span style="color:blue;">new</span> Maybe<TResult>()),
just: <span style="color:blue;">async</span> x => <span style="color:blue;">new</span> Maybe<TResult>(<span style="color:blue;">await</span> selector(x)));
}</pre>
</p>
<p>
The <em>idea</em> is reusable. You can also implement a similar traversal in F#:
</p>
<p>
<pre><span style="color:green;">// ('a -> Async<'b>) -> 'a option -> Async<'b option></span>
<span style="color:blue;">let</span> traverse f = <span style="color:blue;">function</span>
| Some x <span style="color:blue;">-></span> async {
<span style="color:blue;">let!</span> x' = f x
<span style="color:blue;">return</span> Some x' }
| None <span style="color:blue;">-></span> async { <span style="color:blue;">return</span> None }</pre>
</p>
<p>
You can see the F# function as well as a usage example in the article <a href="/2019/12/02/refactoring-registration-flow-to-functional-architecture">Refactoring registration flow to functional architecture</a>.
</p>
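<p>
For comparison, a hand-written Haskell version of the same traversal might look like the following sketch. Haskell's <code>base</code> library already ships a <code>Traversable</code> instance for <code>Maybe</code>, so the hypothetical <code>traverseMaybe</code> name is used here only to show the shape. Notice that it works for any applicative functor, not only for asynchronous computations:
</p>
<p>
<pre>-- Hand-written for illustration; equivalent to traverse specialised to Maybe.
traverseMaybe :: Applicative f => (a -> f b) -> Maybe a -> f (Maybe b)
traverseMaybe _ Nothing  = pure Nothing
traverseMaybe f (Just x) = Just <$> f x</pre>
</p>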
<h3 id="a9e25f8c3dc24d99b669f90a4e46afa0">
Sequence from traverse <a href="#a9e25f8c3dc24d99b669f90a4e46afa0" title="permalink">#</a>
</h3>
<p>
You've already seen that if you have a <em>sequence</em> function, you can implement <em>traverse</em>. I also claimed that the reverse is true: If you have <em>traverse</em> you can implement <em>sequence</em>.
</p>
<p>
When you've encountered these kinds of dual definitions a couple of times, you start to expect the ubiquitous identity function to make an appearance, and indeed it does:
</p>
<p>
<pre><span style="color:blue;">let</span> sequence x = traverse id x</pre>
</p>
<p>
That's the F# version where the identity function is built in as <code>id</code>. In C# you'd use a lambda expression:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> Task<Maybe<T>> Sequence<<span style="color:#2b91af;">T</span>>(<span style="color:blue;">this</span> Maybe<Task<T>> source)
{
<span style="color:blue;">return</span> source.Traverse(x => x);
}</pre>
</p>
<p>
Since C# doesn't come with a predefined identity function, it's <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> to use <code>x => x</code> instead.
</p>
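<p>
Both directions of that relationship can be written down generically in Haskell. In this sketch, the primed names are hypothetical and chosen only to avoid clashing with the functions that the <code>base</code> library already exports:
</p>
<p>
<pre>-- sequence from traverse: traverse with the identity function.
sequence' :: (Traversable t, Applicative f) => t (f a) -> f (t a)
sequence' = traverse id

-- traverse from sequence: map first, then flip.
traverse' :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
traverse' f = sequenceA . fmap f</pre>
</p>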
<h3 id="cc6c409706e24ea9b3ebefa49fcc3235">
Conclusion <a href="#cc6c409706e24ea9b3ebefa49fcc3235" title="permalink">#</a>
</h3>
<p>
Traversals are useful when you need to 'flip' the order of two different, nested functors. The outer one must be a traversable functor, and the inner an applicative functor.
</p>
<p>
Common traversable functors are List, Maybe, Either, Tree, and Identity, but there are more than those. In .NET you often need traversals when combining these functors with Tasks. In Haskell, they are useful when combined with <code>IO</code>.
</p>
<p>
<strong>Next:</strong> <a href="/2024/11/25/nested-monads">Nested monads</a>.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="c72c30e16cdd48419f95fd7ad5c74f81">
<div class="comment-author">qfilip <a href="#c72c30e16cdd48419f95fd7ad5c74f81">#</a></div>
<div class="comment-content">
<p>
Thanks for this one. You might be interested in <a href="https://andrewlock.net/working-with-the-result-pattern-part-1-replacing-exceptions-as-control-flow/">Andrew Lock's</a> take on the whole subject as well.
</p>
</div>
<div class="comment-date">2024-11-17 14:51 UTC</div>
</div>
</div>
<hr>
Pendulum swing: no Haskell type annotation by default https://blog.ploeh.dk/2024/11/04/pendulum-swing-no-haskell-type-annotation-by-default 2024-11-04T07:45:00+00:00 Mark Seemann
<div id="post">
<p>
<em>Are Haskell IDE plugins now good enough that you don't need explicit type annotations?</em>
</p>
<p>
More than three years ago, I published <a href="/2021/02/22/pendulum-swings">a small article series</a> to document that I'd changed my mind on various small practices. Belatedly, here comes a fourth article, which, frankly, is a cousin rather than a sibling. Still, it fits the overall theme well enough to become another instalment in the series.
</p>
<p>
Here, I consider using fewer <a href="https://www.haskell.org/">Haskell</a> type annotations, following a practice that I've always followed in <a href="https://fsharp.org/">F#</a>.
</p>
<p>
To be honest, though, it's not that I've already applied the following practice for a long time, and only now write about it. It's rather that I feel the need to write this article to kick an old habit and start a new one.
</p>
<h3 id="227874a509f24b93b9a091429b9ad03e">
Inertia <a href="#227874a509f24b93b9a091429b9ad03e">#</a>
</h3>
<p>
As I write in the dedication in <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>,
</p>
<blockquote>
<p>
"To my parents:
</p>
<p>
"My mother, Ulla Seemann, to whom I owe my attention to detail.
</p>
<p>
"My father, Leif Seemann, from whom I inherited my contrarian streak."
</p>
<footer><cite><a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a></cite>, dedication</footer>
</blockquote>
<p>
One should always be careful about reducing one's personality to a simple, easy-to-understand model, but a major point here is that I have two traits that pull in almost opposite directions.
</p>
<p>
<img src="/content/binary/neatness-contrariness-vector-sum.png" alt="Two vectors labelled respectively neatness and contrariness pulling in almost opposing directions, while still not quite cancelling each other out, leaving a short vector sum pointing to the right.">
</p>
<p>
Despite much work, I only make slow progress. My desire to make things neat and proper almost cancels out my tendency to go against the norms. I tend to automatically toe whatever line exists until the cognitive dissonance becomes so great that I can no longer ignore it.
</p>
<p>
I then write an article for the blog to clarify my thoughts.
</p>
<p>
You may read what comes next and ask, <em>what took you so long?!</em>
</p>
<p>
I can only refer to the above. I may look calm on the surface, but underneath I'm paddling like the dickens. Despite much work, though, only limited progress is visible.
</p>
<h3 id="a00a292d223a435b873f7cc1de1730c3">
Nudge <a href="#a00a292d223a435b873f7cc1de1730c3">#</a>
</h3>
<p>
Haskell is a statically typed language with the most powerful type system I know my way around. The types carry so much information that one can often infer <a href="/2022/10/24/encapsulation-in-functional-programming">a function's contract</a> from the type alone. This is also fortunate, since many Haskell libraries tend to have, shall we say, minimal documentation. Even so, I've often found myself able to figure out how to use an unfamiliar Haskell API by examining the various types that a library exports.
</p>
<p>
In fact, the type system is so powerful that it drives <a href="https://hoogle.haskell.org/">a specialized search engine</a>. If you need a function with the type <code>(<span style="color:#2b91af;">String</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">IO</span> <span style="color:#2b91af;">Int</span>) <span style="color:blue;">-></span> [<span style="color:#2b91af;">String</span>] <span style="color:blue;">-></span> <span style="color:#2b91af;">IO</span> [<span style="color:#2b91af;">Int</span>]</code> you can search for it. Hoogle will list all functions that match that type, including functions that are more abstract than your specialized need. You don't even have to imagine what the name might be.
</p>
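<p>
One function from <code>base</code> that fits that type is <code>traverse</code>, whose general type is <code>(Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)</code>. The following sketch specialises it to the type from the search; <code>countChars</code> and <code>countAll</code> are hypothetical names used only for illustration:
</p>
<p>
<pre>-- Hypothetical action with the type String -> IO Int.
countChars :: String -> IO Int
countChars s = pure (length s)

-- traverse, specialised to the searched-for type. The compiler accepts the
-- narrower signature because the general type unifies with it.
countAll :: (String -> IO Int) -> [String] -> IO [Int]
countAll = traverse</pre>
</p>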
<p>
Since the type system is so powerful, it's a major means of communication. Thus, it makes sense that <a href="https://en.wikipedia.org/wiki/Glasgow_Haskell_Compiler">GHC</a> regularly issues <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/using-warnings.html#ghc-flag--Wmissing-signatures">a warning</a> if a function lacks a type annotation.
</p>
<p>
While the compiler enables you to control which warnings are turned on, the <code>missing-signatures</code> warning is included in the popular <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/using-warnings.html#ghc-flag--Wall">all</a> flag that most people, I take it, use. I do, at least.
</p>
<p>
If you forget to declare the type of a function, the compiler will complain:
</p>
<p>
<pre>src\SecurityManager.hs:15:1: <span style="color:red;">warning</span>: [<span style="color:red;">GHC-38417</span>] [<span style="color:red;">-Wmissing-signatures</span>]
Top-level binding with no type signature:
createUser :: (Monad m, Text.Printf.PrintfArg b,
Text.Printf.PrintfArg (t a), Foldable t, Eq (t a)) =>
(String -> m ()) -> m (t a) -> (t a -> b) -> m ()
<span style="color:blue;"> |</span>
<span style="color:blue;">15 |</span> <span style="color:red;">createUser</span> writeLine readLine encrypt = do
<span style="color:blue;"> |</span> <span style="color:red;">^^^^^^^^^^</span></pre>
</p>
<p>
This is a strong nudge that you're supposed to give each function a type declaration, so I've been doing that for years. Neat and proper.
</p>
<p>
Of course, if you treat warnings as errors, as <a href="/code-that-fits-in-your-head">I recommend</a>, the nudge becomes a law.
</p>
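<p>
For readers who want to experiment, GHC also allows warning flags to be set per module with an <code>OPTIONS_GHC</code> pragma. The following is only a sketch of one possible configuration, not necessarily the setup used for the code in this article: it keeps <code>-Wall</code> and <code>-Werror</code>, but switches off the single <code>missing-signatures</code> warning. The <code>greeting</code> function is a hypothetical example:
</p>
<p>
<pre>{-# OPTIONS_GHC -Wall -Werror -Wno-missing-signatures #-}

-- No type signature; GHC infers String -> String, and with the pragma
-- above, the missing signature no longer fails the build.
greeting name = "Hello, " ++ name</pre>
</p>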
<h3 id="cf16318003ef46ed8c67d81217e56011">
Learning from F# <a href="#cf16318003ef46ed8c67d81217e56011">#</a>
</h3>
<p>
While I try to adopt the style and <a href="/2015/08/03/idiomatic-or-idiosyncratic">idioms</a> of any language I work in, it's always annoyed me that I had to add a type annotation to a Haskell function. After all, the compiler can usually infer the type. Frankly, adding a type signature feels like redundant ceremony. It's like having to declare a function in a header file before being able to implement it in another file.
</p>
<p>
This particularly bothers me because I've long since abandoned type annotations in F#. As far as I can tell, most of the F# community has, too.
</p>
<p>
When you implement an F# function, you just write the implementation and let the compiler infer the type. (Code example from <a href="/2019/12/16/zone-of-ceremony">Zone of Ceremony</a>.)
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:blue;">inline</span> <span style="color:#74531f;">consume</span> <span style="color:#1f377f;">quantity</span> =
<span style="color:blue;">let</span> <span style="color:#74531f;">go</span> (<span style="color:#1f377f;">acc</span>, <span style="color:#1f377f;">xs</span>) <span style="color:#1f377f;">x</span> =
<span style="color:blue;">if</span> <span style="color:#1f377f;">quantity</span> <= <span style="color:#1f377f;">acc</span>
<span style="color:blue;">then</span> (<span style="color:#1f377f;">acc</span>, <span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">append</span> <span style="color:#1f377f;">xs</span> (<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">singleton</span> <span style="color:#1f377f;">x</span>))
<span style="color:blue;">else</span> (<span style="color:#1f377f;">acc</span> + <span style="color:#1f377f;">x</span>, <span style="color:#1f377f;">xs</span>)
<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">fold</span> <span style="color:#74531f;">go</span> (<span style="color:#2b91af;">LanguagePrimitives</span>.GenericZero, <span style="color:#2b91af;">Seq</span>.empty) >> <span style="color:#74531f;">snd</span></pre>
</p>
<p>
Since F# often has to interact with .NET code written in C#, you regularly have to add <em>some</em> type annotations to help the compiler along:
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:#74531f;">average</span> (<span style="font-weight:bold;color:#1f377f;">timeSpans</span> : <span style="color:#2b91af;">NonEmpty</span><<span style="color:#2b91af;">TimeSpan</span>>) =
[ <span style="font-weight:bold;color:#1f377f;">timeSpans</span>.Head ] @ <span style="color:#2b91af;">List</span>.<span style="color:#74531f;">ofSeq</span> <span style="font-weight:bold;color:#1f377f;">timeSpans</span>.Tail
|> <span style="color:#2b91af;">List</span>.<span style="color:#74531f;">averageBy</span> (_.Ticks >> <span style="color:#74531f;">double</span>)
|> <span style="color:#74531f;">int64</span>
|> <span style="color:#2b91af;">TimeSpan</span>.<span style="font-weight:bold;color:#74531f;">FromTicks</span></pre>
</p>
<p>
Even so, I follow the rule of minimal annotations: Only add the type information required to compile, and let the compiler infer the rest. For example, the above <a href="/2024/05/06/conservative-codomain-conjecture">average function</a> has the inferred type <code><span style="color:#2b91af;">NonEmpty</span><span style="color:#2b91af;"><</span><span style="color:#2b91af;">TimeSpan</span><span style="color:#2b91af;">></span> <span style="color:blue;">-></span> <span style="color:#2b91af;">TimeSpan</span></code>. While I had to specify the input type in order to be able to use the <a href="https://learn.microsoft.com/dotnet/api/system.datetime.ticks">Ticks property</a>, I didn't have to specify the return type. So I didn't.
</p>
<p>
My impression from reading other people's F# code is that this is a common, albeit not universal, approach to type annotation.
</p>
<p>
This minimizes ceremony, since you only need to declare and maintain the types that the compiler can't infer. There's no reason to repeat the work that the compiler can already do, and in practice, if you do, it just gets in the way.
</p>
<h3 id="fdd9161164f64f438aa0bedf5ff6f9a8">
Motivation for explicit type definitions <a href="#fdd9161164f64f438aa0bedf5ff6f9a8">#</a>
</h3>
<p>
When I extol the merits of static types, proponents of dynamically typed languages often argue that the types are in the way. Granted, this is <a href="/2021/08/09/am-i-stuck-in-a-local-maximum">a discussion that I still struggle with</a>, but based on my understanding of the argument, it seems entirely reasonable. After all, if you have to spend time declaring the type of each and every parameter, as well as a function's return type, it does seem to be in the way. This is only exacerbated if you later change your mind.
</p>
<p>
Programming is, to a large extent, an explorative activity. You start with one notion of how your code should be structured, but as you progress, you learn. You'll often have to go back and change existing code. This, as far as I can tell, is much easier in, say, <a href="https://www.python.org/">Python</a> or <a href="https://clojure.org/">Clojure</a> than in C# or <a href="https://www.java.com/">Java</a>.
</p>
<p>
If, however, one extrapolates from the experience with Java or C# to all statically typed languages, that would be a logical fallacy. My point with <a href="/2019/12/16/zone-of-ceremony">Zone of Ceremony</a> was exactly that there's a group of languages 'to the right' of high-ceremony languages with low levels of ceremony. Even though they're statically typed.
</p>
<p>
I have to admit, however, that in that article I cheated a little in order to drive home a point. While you <em>can</em> write Haskell code in a low-ceremony style, the tooling (in the form of the <code>all</code> warning set, at least) encourages a high-ceremony style. Add those type definitions, even though they're redundant.
</p>
<p>
It's not that I don't understand some of the underlying motivation behind that rule. <a href="http://dmwit.com/">Daniel Wagner</a> enumerated several reasons in <a href="https://stackoverflow.com/a/19626857/126014">a 2013 Stack Overflow answer</a>. Some of the reasons still apply, but on the other hand, the world has also moved on in the intervening decade.
</p>
<p>
To be honest, the Haskell <a href="https://en.wikipedia.org/wiki/Integrated_development_environment">IDE</a> situation has always been precarious. One day, it works really well; the next day, I struggle with it. Over the years, though, things have improved.
</p>
<p>
There was a time when an explicit type definition was an indisputable help, because you couldn't rely on tools to light up and tell you what the inferred type was.
</p>
<p>
Today, on the other hand, the <a href="https://marketplace.visualstudio.com/items?itemName=haskell.haskell">Haskell extension for Visual Studio Code</a> automatically displays the inferred type above a function implementation:
</p>
<p>
<img src="/content/binary/haskell-code-with-inferred-type-displayed-by-vs-code.png" alt="Screen shot of a Haskell function in Visual Studio Code with the function's type automatically displayed above it by the Haskell extension.">
</p>
<p>
To be clear, the top line that shows the type definition is not part of the source code. It's just shown by Visual Studio Code as a code lens (I think it's called), and it automatically changes if I edit the code in such a way that the type changes.
</p>
<p>
If you can rely on such automatic type information, it seems that an explicit type declaration is less useful. It's at least one less reason to add type annotations to the source code.
</p>
<h3 id="367135868de54bcb8eebd2d9bc9a0f8c">
Ceremony example <a href="#367135868de54bcb8eebd2d9bc9a0f8c">#</a>
</h3>
<p>
In order to explain what I mean by <em>the types being in the way</em>, I'll give an example. Consider the code example from the article <a href="/2024/10/21/legacy-security-manager-in-haskell">Legacy Security Manager in Haskell</a>. In it, I described how every time I made a change to the <code>createUser</code> action, I had to effectively remove and re-add the type declaration.
</p>
<p>
It doesn't have to be like that. If instead I'd started without type annotations, I could have moved forward without being slowed down by having to edit type definitions. Take the first edit, breaking the dependency on the console, as an example. Without type annotations, the <code>createUser</code> action would look exactly as before, just without the type declaration. Its type would still be <code>IO ()</code>.
</p>
<p>
After the first edit, the first lines of the action now look like this:
</p>
<p>
<pre>createUser writeLine readLine = <span style="color:blue;">do</span>
<span style="color:blue;">()</span> <- writeLine <span style="color:#a31515;">"Enter a username"</span>
<span style="color:green;">-- ...</span></pre>
</p>
<p>
Even without a type definition, the action still has a type. The compiler infers it to be <code>(<span style="color:blue;">Monad</span> m, <span style="color:blue;">Eq</span> a, <span style="color:blue;">IsChar</span> a) <span style="color:blue;">=></span> (<span style="color:#2b91af;">String</span> <span style="color:blue;">-></span> m ()) <span style="color:blue;">-></span> m [a] <span style="color:blue;">-></span> m ()</code>, which is certainly a bit of a mouthful, but exactly what I had explicitly added in the other article.
</p>
<p>
The code doesn't compile until I also change the <code>main</code> method to pass the new parameters:
</p>
<p>
<pre>main = createUser <span style="color:blue;">putStrLn</span> <span style="color:blue;">getLine</span></pre>
</p>
<p>
You'd have to make a similar edit in, say, Python, although there'd be no compiler to remind you. My point isn't that this is better than a dynamically typed language, but rather that it's on par. The types aren't in the way.
</p>
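<p>
A self-contained sketch may make the point easier to try out. The <code>askName</code> action below is a hypothetical, cut-down stand-in for <code>createUser</code>, deliberately written without a type annotation. Because the inferred type is polymorphic in the monad, the same definition runs against the console and against a pure stand-in (pairs of the form <code>([String], a)</code>, which <code>base</code> treats as a monad that accumulates the first component):
</p>
<p>
<pre>-- Hypothetical, cut-down variant of createUser; no type signature on purpose.
askName writeLine readLine = do
  () <- writeLine "Enter a username"
  username <- readLine
  writeLine ("Hello, " ++ username)

-- Run against the real console.
runConsole :: IO ()
runConsole = askName putStrLn getLine

-- Run purely: written lines accumulate in the first element of the pair,
-- and the simulated user always 'types' alice.
runPure :: ([String], ())
runPure = askName (\line -> ([line], ())) ([], "alice")</pre>
</p>
<p>
Here, <code>runPure</code> evaluates to <code>(["Enter a username","Hello, alice"],())</code>, without any change to <code>askName</code> itself.
</p>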
<p>
We see a similar lack of required ceremony when the <code>createUser</code> action finally pulls in the <code>comparePasswords</code> and <code>validatePassword</code> functions:
</p>
<p>
<pre>createUser writeLine readLine encrypt = <span style="color:blue;">do</span>
<span style="color:blue;">()</span> <- writeLine <span style="color:#a31515;">"Enter a username"</span>
username <- readLine
writeLine <span style="color:#a31515;">"Enter your full name"</span>
fullName <- readLine
writeLine <span style="color:#a31515;">"Enter your password"</span>
password <- readLine
writeLine <span style="color:#a31515;">"Re-enter your password"</span>
confirmPassword <- readLine
writeLine $ either
<span style="color:blue;">id</span>
(printf <span style="color:#a31515;">"Saving Details for User (%s, %s, %s)"</span> username fullName . encrypt)
(validatePassword =<< comparePasswords password confirmPassword)</pre>
</p>
<p>
Again, there's no type annotation, and while the type actually <em>does</em> change to
</p>
<p>
<pre>(<span style="color:blue;">Monad</span> m, <span style="color:blue;">PrintfArg</span> b, <span style="color:blue;">PrintfArg</span> (t a), <span style="color:blue;">Foldable</span> t, <span style="color:blue;">Eq</span> (t a)) <span style="color:blue;">=></span>
(<span style="color:#2b91af;">String</span> <span style="color:blue;">-></span> m ()) <span style="color:blue;">-></span> m (t a) <span style="color:blue;">-></span> (t a <span style="color:blue;">-></span> b) <span style="color:blue;">-></span> m ()</pre>
</p>
<p>
it impacts none of the existing code. Again, the types aren't in the way, and no ceremony is required.
</p>
<p>
Compare that inferred type signature with the explicit final type annotation in <a href="/2024/10/21/legacy-security-manager-in-haskell">the previous article</a>. The inferred type is much more abstract and permissive than the explicit declaration, although I also grant that Daniel Wagner had a point that you can make explicit type definitions more reader-friendly.
</p>
<h3 id="d4469073def54f289edb56d1ca8417ee">
Flies in the ointment <a href="#d4469073def54f289edb56d1ca8417ee">#</a>
</h3>
<p>
Do the inferred types communicate intent? That's debatable. For example, it's not immediately clear that the above <code>t a</code> allows <code>String</code>.
</p>
<p>
Another thing that annoys me is that I had to add that <em>unit</em> binding on the first line:
</p>
<p>
<pre>createUser writeLine readLine encrypt = <span style="color:blue;">do</span>
<span style="color:blue;">()</span> <- writeLine <span style="color:#a31515;">"Enter a username"</span>
<span style="color:green;">-- ...</span></pre>
</p>
<p>
The reason is that without it (that is, if I just write <code>writeLine "Xyz"</code> all the way), the compiler infers the type of <code>writeLine</code> to be <code><span style="color:#2b91af;">String</span> <span style="color:blue;">-></span> m b2</code>, rather than just <code><span style="color:#2b91af;">String</span> <span style="color:blue;">-></span> m ()</code>. In effect, I want <code>b2 ~ ()</code>, but because the compiler thinks that <code>b2</code> may be anything, it issues an <a href="https://downloads.haskell.org/ghc/latest/docs/users_guide/using-warnings.html#ghc-flag--Wunused-do-bind">unused-do-bind</a> warning.
</p>
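<p>
Here's a minimal, self-contained sketch of that trade-off. The <code>greet</code> action is a hypothetical stand-in, not the <code>createUser</code> code from the other article, and the canned <code>pure "Ann"</code> reader is only there to keep the sketch runnable without console input:
</p>
<p>
<pre>{-# OPTIONS_GHC -Wall -Wno-missing-signatures #-}
module Main where

-- Hypothetical example action, deliberately without a type annotation.
greet writeLine readLine = do
  -- Matching on () forces the result of writeLine to be unit.
  -- Because writeLine is an ordinary function parameter, the same
  -- result type applies to every later use in this do block.
  () <- writeLine "Enter a username"
  name <- readLine
  writeLine ("Hello, " ++ name)

-- Swap (pure "Ann") for getLine to read from the console instead.
main = greet putStrLn (pure "Ann")</pre>
</p>
<p>
With the unit pattern in place, I'd expect GHC to infer a type along the lines of <code>Monad m => (String -> m ()) -> m String -> m ()</code> for <code>greet</code>. If you delete the <code>() <-</code> part, the result type of <code>writeLine</code> is left open, which is what provokes the warning.
</p>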
<p>
The idiomatic way to resolve that situation is to add a type definition, but that's the situation I'm trying to avoid. Thus, my desire to do without annotations pushes me to write unnatural implementation code. This reminds me of the notion of <a href="https://dhh.dk/2014/test-induced-design-damage.html">test-induced damage</a>. This is at best a disagreeable compromise.
</p>
<p>
It also annoys me that implementation details leak out to the inferred type, witnessed by the <code>PrintfArg</code> type constraint. What happens if I change the implementation to use list concatenation?
</p>
<p>
<pre>createUser writeLine readLine encrypt = <span style="color:blue;">do</span>
<span style="color:blue;">()</span> <- writeLine <span style="color:#a31515;">"Enter a username"</span>
username <- readLine
writeLine <span style="color:#a31515;">"Enter your full name"</span>
fullName <- readLine
writeLine <span style="color:#a31515;">"Enter your password"</span>
password <- readLine
writeLine <span style="color:#a31515;">"Re-enter your password"</span>
confirmPassword <- readLine
<span style="color:blue;">let</span> createMsg pwd =
<span style="color:#a31515;">"Saving Details for User ("</span> ++ username ++<span style="color:#a31515;">", "</span> ++ fullName ++ <span style="color:#a31515;">", "</span> ++ pwd ++<span style="color:#a31515;">")"</span>
writeLine $ either
<span style="color:blue;">id</span>
(createMsg . encrypt)
(validatePassword =<< comparePasswords password confirmPassword)</pre>
</p>
<p>
If I do that, the type also changes:
</p>
<p>
<pre><span style="color:blue;">Monad</span> m <span style="color:blue;">=></span> (<span style="color:#2b91af;">String</span> <span style="color:blue;">-></span> m ()) <span style="color:blue;">-></span> m [<span style="color:#2b91af;">Char</span>] <span style="color:blue;">-></span> ([<span style="color:#2b91af;">Char</span>] <span style="color:blue;">-></span> [<span style="color:#2b91af;">Char</span>]) <span style="color:blue;">-></span> m ()</pre>
</p>
<p>
While we get rid of the <code>PrintfArg</code> type constraint, the type becomes otherwise more concrete, now operating on <code>String</code> values (keeping in mind that <code>String</code> is a type synonym for <code>[Char]</code>).
</p>
<p>
The code still compiles, and all tests still pass, because the abstraction I've had in mind all along is essentially this last type.
</p>
<p>
The <code>writeLine</code> action should take a <code>String</code> and have some side effect, but return no data. The type <code><span style="color:#2b91af;">String</span> <span style="color:blue;">-></span> m ()</code> nicely models that, striking a fine balance between being sufficiently concrete to capture intent, but still abstract enough to be testable.
</p>
<p>
The <code>readLine</code> action should provide input <code>String</code> values, and again <code>m String</code> nicely models that concern.
</p>
<p>
Finally, <code>encrypt</code> is indeed a naked <code>String</code> <a href="https://en.wikipedia.org/wiki/Endomorphism">endomorphism</a>: <code>String -> String</code>.
</p>
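<p>
To illustrate why that abstraction is still testable, here's a small sketch under stated assumptions: <code>savePassword</code> and <code>runPure</code> are hypothetical names, and the action is a much-reduced stand-in for <code>createUser</code>. The test double uses the pair monad from <code>base</code>, which accumulates the written lines in the first element of a tuple:
</p>
<p>
<pre>module Main where

import Data.Char (toUpper)

-- Hypothetical action written against the intended abstraction.
savePassword :: Monad m => (String -> m ()) -> m String -> (String -> String) -> m ()
savePassword writeLine readLine encrypt = do
  writeLine "Enter your password"
  password <- readLine
  writeLine ("Saving " ++ encrypt password)

-- Pure test double: ((,) [String]) is a Monad because [String] is a Monoid.
-- Each write appends one line to the log; the reader logs nothing.
runPure :: String -> [String]
runPure input =
  fst (savePassword (\s -> ([s], ())) ([], input) (map toUpper))

main :: IO ()
main = print (runPure "secret")
-- prints ["Enter your password","Saving SECRET"]</pre>
</p>
<p>
In production code you'd instead pass <code>putStrLn</code>, <code>getLine</code>, and a real encryption function, which specialises <code>m</code> to <code>IO</code>.
</p>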
<p>
With my decades of experience with object-oriented design, it still strikes me as odd that implementation details can make a type more abstract, but once you think it over, it may be okay.
</p>
<h3 id="a82d4017be064ce980c40e22aa6f801e">
More liberal abstractions <a href="#a82d4017be064ce980c40e22aa6f801e">#</a>
</h3>
<p>
The inferred types are consistently more liberal than the abstraction I have in mind, which is
</p>
<p>
<pre><span style="color:blue;">Monad</span> m <span style="color:blue;">=></span> (<span style="color:#2b91af;">String</span> <span style="color:blue;">-></span> m ()) <span style="color:blue;">-></span> m <span style="color:#2b91af;">String</span> <span style="color:blue;">-></span> (<span style="color:#2b91af;">String</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">String</span>) <span style="color:blue;">-></span> m ()</pre>
</p>
<p>
In all cases, the inferred types include that type as a subset.
</p>
<p>
<img src="/content/binary/create-user-abstraction-sets.png" alt="Various sets of inferred types.">
</p>
<p>
I hope that I've created the above diagram so that it makes sense, but the point I'm trying to get across is that the two type definitions in the lower middle are equivalent, and are the most specific types. That's the intended abstraction. Thinking of <a href="/2021/11/15/types-as-sets">types as sets</a>, all the other inferred types are supersets of that type, in various ways. Even though implementation details leak out in the shape of <code>PrintfArg</code> and <code>IsChar</code>, these are effectively larger sets.
</p>
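<p>
One way to check the superset relationship is to ask the compiler to specialise an inferred type to the intended one. The following is only a sketch: <code>report</code> and <code>reportStrings</code> are hypothetical names, and the action is far simpler than <code>createUser</code>, so its inferred type won't match the ones shown above.
</p>
<p>
<pre>{-# OPTIONS_GHC -Wno-missing-signatures #-}
module Main where

import Text.Printf (printf)

-- Hypothetical action without an annotation. Because it uses printf,
-- the inferred type carries a PrintfArg constraint instead of fixing
-- the input and output types to String.
report writeLine readLine encrypt = do
  () <- writeLine "Enter your password"
  password <- readLine
  writeLine (printf "Saving (%s)" (encrypt password))

-- The intended abstraction. This definition only type-checks if the
-- inferred type of report can be specialised to this signature.
reportStrings :: Monad m => (String -> m ()) -> m String -> (String -> String) -> m ()
reportStrings = report

-- Run it with the pair monad from base as a pure test double.
main = print (fst (reportStrings (\s -> ([s], ())) ([], "secret") reverse))
-- prints ["Enter your password","Saving (terces)"]</pre>
</p>
<p>
That <code>reportStrings = report</code> compiles is the evidence: every use that the narrow signature permits is also permitted by the inferred one.
</p>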
<p>
This takes some getting used to: The implementation details are <em>more</em> liberal than the abstraction. This seems to be at odds with the <a href="https://en.wikipedia.org/wiki/Dependency_inversion_principle">Dependency Inversion Principle</a> (DIP), which suggests that abstractions shouldn't depend on implementation details. I'm not yet sure what to make of this, but I suspect that this is more of a problem of overlapping linguistic semantics than software design. What I mean is that I have a feeling that 'implementation detail' has more than one meaning. At least, in the perspective of the DIP, an implementation detail <em>limits</em> your options. For example, depending on a particular database technology is more constraining than depending on some abstract notion of what the persistence mechanism might be. Contrast this with an implementation detail such as the <code>PrintfArg</code> type constraint. It doesn't narrow your options; on the contrary, it makes the implementation more liberal.
</p>
<p>
Still, while an implementation should <a href="https://en.wikipedia.org/wiki/Robustness_principle">be liberal in what it accepts</a>, it's probably not a good idea to publish such a capability to the wider world. After all, if you do, <a href="https://www.hyrumslaw.com/">someone will eventually rely on it</a>.
</p>
<h3 id="42ffe5249c7542809ca55a95a8f15f6c">
For internal use only <a href="#42ffe5249c7542809ca55a95a8f15f6c">#</a>
</h3>
<p>
Going through all these considerations, I think I'll revise my position as the following.
</p>
<p>
I'll forgo type annotations as long as I explore a problem space. For internal application use, this may effectively mean forever, in the sense that how you compose an application from smaller building blocks is likely to be in permanent flux. Here I have in mind your average web asset or other public-facing service that's in constant development. You keep adding new features, or changing domain logic as the overall business evolves.
</p>
<p>
As I've also recently discussed, <a href="/2024/02/05/statically-and-dynamically-typed-scripts">Haskell is a great scripting language</a>, and I think that here, too, I'll dial down the type definitions.
</p>
<p>
If I ever do another <a href="https://adventofcode.com/">Advent of Code</a> in Haskell, I think I'll also eschew explicit type annotations.
</p>
<p>
On the other hand, I can see that once an API stabilizes, you may want to lock it down. This may also apply to internal abstractions if you're working in a team and you explicitly want to communicate what a contract is.
</p>
<p>
If the code is a reusable library, I think that explicit type definitions are still required. Both for the reasons outlined by Daniel Wagner, and also to avoid being the victim of <a href="https://www.hyrumslaw.com/">Hyrum's law</a>.
</p>
<p>
That's why I phrase this pendulum swing as a new <em>default</em>. I'll begin programming without type definitions, but add them as needed. The point is rather that there may be parts of a code base where they're never needed, and then it's okay to keep going without them.
</p>
<p>
You can use an <code>OPTIONS_GHC</code> pragma to opt out of the <code>missing-signatures</code> compiler warning on a module-by-module basis:
</p>
<p>
<pre>{-# <span style="color:gray;">OPTIONS_GHC</span> -Wno-missing-signatures #-}</pre>
</p>
<p>
This will enable me to rely on type inference in parts of the code base, while keeping the build clean of compiler warnings.
</p>
<h3 id="36e2b141fff548678e34d24eda5a3e03">
Conclusion <a href="#36e2b141fff548678e34d24eda5a3e03">#</a>
</h3>
<p>
I've always appreciated the F# compiler's ability to infer types and just let type changes automatically ripple through the code base. For that reason, the Haskell norm of explicitly adding a (redundant) type annotation has always vexed me.
</p>
<p>
It often takes me a long time to reach seemingly obvious conclusions, such as: Don't always add type definitions to Haskell functions. Let the type inference engine do its job.
</p>
<p>
The reason it takes me so long to take such a small step is that I want to follow 'best practice'; I want to write idiomatic code. When the standard compiler-warning set complains about missing type definitions, it takes me significant deliberation to discard such advice. I could imagine other programmers being in the same situation, which is one reason I wrote this article.
</p>
<p>
The point isn't that type definitions are a universally bad idea. They aren't. Rather, the point is only that it's also okay to do without them in parts of a code base. Perhaps only temporarily, but in some cases maybe permanently.
</p>
<p>
The <code>missing-signatures</code> warning shouldn't, I now believe, be considered an absolute law, but rather a contextual rule.
</p>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Functor compositions
https://blog.ploeh.dk/2024/10/28/functor-compositions
2024-10-28T06:58:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A functor nested within another functor forms a functor. With examples in C# and another language.</em>
</p>
<p>
This article is part of <a href="/2022/07/11/functor-relationships">a series of articles about functor relationships</a>. In this one you'll learn about a universal composition of functors. In short, if you have one functor nested within another functor, then this composition itself gives rise to a functor.
</p>
<p>
Together with other articles in this series, this result can help you answer questions such as: <em>Does this data structure form a functor?</em>
</p>
<p>
Since <a href="/2018/03/22/functors">functors</a> tend to be quite common, and since they're useful enough that many programming languages have special support or syntax for them, the ability to recognize a potential functor can be useful. Given a type like <code>Foo<T></code> (C# syntax) or <code>Bar<T1, T2></code>, being able to recognize it as a functor can come in handy. One scenario is if you yourself have just defined this data type. Recognizing that it's a functor strongly suggests that you should give it a <code>Select</code> method in C#, a <code>map</code> function in <a href="https://fsharp.org/">F#</a>, and so on.
</p>
<p>
Not all generic types give rise to a (covariant) functor. Some are rather <a href="/2021/09/02/contravariant-functors">contravariant functors</a>, and some are <a href="/2022/08/01/invariant-functors">invariant</a>.
</p>
<p>
If, on the other hand, you have a data type where one functor is nested within another functor, then the data type itself gives rise to a functor. You'll see some examples in this article.
</p>
<h3 id="a97b2f6471b74db6a83362a552ee5b03">
Abstract shape <a href="#a97b2f6471b74db6a83362a552ee5b03">#</a>
</h3>
<p>
Before we look at some examples found in other code, it helps if we know what we're looking for. Imagine that you have two functors <code>F</code> and <code>G</code>, and you're now considering a data structure that contains a value where <code>G</code> is nested inside of <code>F</code>.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">GInF</span><<span style="color:#2b91af;">T</span>>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">F</span><<span style="color:#2b91af;">G</span><<span style="color:#2b91af;">T</span>>> ginf;
<span style="color:blue;">public</span> <span style="color:#2b91af;">GInF</span>(<span style="color:#2b91af;">F</span><<span style="color:#2b91af;">G</span><<span style="color:#2b91af;">T</span>>> <span style="font-weight:bold;color:#1f377f;">ginf</span>)
{
<span style="color:blue;">this</span>.ginf = <span style="font-weight:bold;color:#1f377f;">ginf</span>;
}
<span style="color:green;">// Methods go here...</span></pre>
</p>
<p>
The <code><span style="color:#2b91af;">GInF</span><<span style="color:#2b91af;">T</span>></code> class has a single class field. The type of this field is an <code>F</code> <a href="https://bartoszmilewski.com/2014/01/14/functors-are-containers/">container</a>, but 'inside' <code>F</code> there's a <code>G</code> functor.
</p>
<p>
This kind of data structure gives rise to a functor. Knowing that, you can give it a <code>Select</code> method:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">GInF</span><<span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#74531f;">Select</span><<span style="color:#2b91af;">TResult</span>>(<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">selector</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">GInF</span><<span style="color:#2b91af;">TResult</span>>(ginf.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">g</span> => <span style="font-weight:bold;color:#1f377f;">g</span>.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">selector</span>)));
}</pre>
</p>
<p>
The composed <code>Select</code> method calls <code>Select</code> on the <code>F</code> functor, passing it a lambda expression that calls <code>Select</code> on the <code>G</code> functor. That nested <code>Select</code> call produces an <code><span style="color:#2b91af;">F</span><<span style="color:#2b91af;">G</span><<span style="color:#2b91af;">TResult</span>>></code> that the composed <code>Select</code> method finally wraps in a <code><span style="color:blue;">new</span> <span style="color:#2b91af;">GInF</span><<span style="color:#2b91af;">TResult</span>></code> object that it returns.
</p>
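<p>
For comparison, the same shape can be sketched in Haskell. This is a hand-rolled illustration, not code from any library; the field name <code>getGInF</code> is made up for the occasion:
</p>
<p>
<pre>module Main where

-- A value of g nested inside f, wrapped so that it can get its own instance.
newtype GInF f g a = GInF { getGInF :: f (g a) }

-- Map the outer functor with a function that maps the inner functor.
instance (Functor f, Functor g) => Functor (GInF f g) where
  fmap h (GInF x) = GInF (fmap (fmap h) x)

main :: IO ()
main = print (getGInF (fmap (+ 1) (GInF [Just 1, Nothing, Just (3 :: Int)])))
-- prints [Just 2,Nothing,Just 4]</pre>
</p>
<p>
Here the outer functor is a list and the inner functor is <code>Maybe</code>, but the instance works for any pair of functors.
</p>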
<p>
I'll have more to say about how this generalizes to a nested composition of more than two functors, but first, let's consider some examples.
</p>
<h3 id="fcd4126b51c24b10867de4280f5e8844">
Priority list <a href="#fcd4126b51c24b10867de4280f5e8844">#</a>
</h3>
<p>
A common configuration is when the 'outer' functor is a collection, and the 'inner' functor is some other kind of container. The article <a href="/2024/07/01/an-immutable-priority-collection">An immutable priority collection</a> shows a straightforward example. The <code><span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">T</span>></code> class composes a single class field:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>>[] priorities;</pre>
</p>
<p>
The <code>priorities</code> field is an array (a collection) of <code><span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>></code> objects. That type is a simple <a href="https://learn.microsoft.com/dotnet/csharp/language-reference/builtin-types/record">record</a> type:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">record</span> <span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">Item</span>, <span style="color:blue;">byte</span> <span style="font-weight:bold;color:#1f377f;">Priority</span>);</pre>
</p>
<p>
If we squint a little and consider only the parameter list, we may realize that this is fundamentally an 'embellished' tuple: <code>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">Item</span>, <span style="color:blue;">byte</span> <span style="font-weight:bold;color:#1f377f;">Priority</span>)</code>. <a href="/2018/12/31/tuple-bifunctor">A pair forms a bifunctor</a>, but in the <a href="https://www.haskell.org/">Haskell</a> <code>Prelude</code> a tuple is also a <code>Functor</code> instance over its rightmost element. In other words, if we'd swapped the <code><span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>></code> constructor parameters, it might have naturally looked like something we could <code>fmap</code>:
</p>
<p>
<pre>ghci> fmap (elem 'r') (55, "foo")
(55,False)</pre>
</p>
<p>
Here we have a tuple of an integer and a string. Imagine that the number <code>55</code> is the priority that we give to the label <code>"foo"</code>. This little ad-hoc example demonstrates how to map that tuple to another tuple with a priority, but now it instead holds a Boolean value indicating whether or not the string contained the character <code>'r'</code> (which it didn't).
</p>
<p>
You can easily swap the elements:
</p>
<p>
<pre>ghci> import Data.Tuple
ghci> swap (55, "foo")
("foo",55)</pre>
</p>
<p>
This looks just like the <code><span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>></code> parameter list. This also implies that if you originally have the parameter list in that order, you could <code>swap</code> it, map it, and swap it again:
</p>
<p>
<pre>ghci> swap $ fmap (elem 'r') $ swap ("foo", 55)
(False,55)</pre>
</p>
<p>
My point is only that <code><span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>></code> is isomorphic to a known functor. In reality you rarely need to analyze things that thoroughly to come to that realization, but the bottom line is that you can give <code><span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>></code> a lawful <code>Select</code> method:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">record</span> <span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">Item</span>, <span style="color:blue;">byte</span> <span style="font-weight:bold;color:#1f377f;">Priority</span>)
{
<span style="color:blue;">public</span> <span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#74531f;">Select</span><<span style="color:#2b91af;">TResult</span>>(<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">selector</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span>(<span style="font-weight:bold;color:#1f377f;">selector</span>(Item), Priority);
}
}</pre>
</p>
<p>
Hardly surprising, but since this article postulates that a functor of a functor is a functor, and since we already know that collections give rise to a functor, we should deduce that we can give <code><span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">T</span>></code> a <code>Select</code> method. And we can:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#74531f;">Select</span><<span style="color:#2b91af;">TResult</span>>(<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">selector</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">TResult</span>>(
priorities.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">p</span> => <span style="font-weight:bold;color:#1f377f;">p</span>.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">selector</span>)).<span style="font-weight:bold;color:#74531f;">ToArray</span>());
}</pre>
</p>
<p>
Notice how much this implementation looks like the above <code><span style="color:#2b91af;">GInF</span><<span style="color:#2b91af;">T</span>></code> 'shape' implementation.
</p>
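<p>
If it helps to see the same composition outside of C#, here's a rough Haskell rendition. It's a sketch that only models the two data types and their mapping functions; it assumes <code>Word8</code> as the counterpart of <code>byte</code> and leaves out everything else the C# classes do:
</p>
<p>
<pre>module Main where

import Data.Word (Word8)

-- An item paired with a priority; only the item is mapped.
data Prioritized a = Prioritized a Word8 deriving (Show, Eq)

instance Functor Prioritized where
  fmap f (Prioritized x p) = Prioritized (f x) p

-- A collection (the outer functor) of Prioritized values (the inner functor).
newtype PriorityCollection a = PriorityCollection [Prioritized a]
  deriving (Show, Eq)

instance Functor PriorityCollection where
  fmap f (PriorityCollection ps) = PriorityCollection (fmap (fmap f) ps)

main :: IO ()
main = print (fmap (elem 'r')
  (PriorityCollection [Prioritized "foo" 55, Prioritized "bar" 10]))
-- prints PriorityCollection [Prioritized False 55,Prioritized True 10]</pre>
</p>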
<h3 id="32b4e828d4584c3d8cda81a9682aee34">
Tree <a href="#32b4e828d4584c3d8cda81a9682aee34">#</a>
</h3>
<p>
An example only marginally more complicated than the above is shown in <a href="/2018/08/06/a-tree-functor">A Tree functor</a>. The <code><span style="color:#2b91af;">Tree</span><<span style="color:#2b91af;">T</span>></code> class shown in that article contains two constituents:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">Tree</span><<span style="color:#2b91af;">T</span>>> children;
<span style="color:blue;">public</span> <span style="color:#2b91af;">T</span> Item { <span style="color:blue;">get</span>; }</pre>
</p>
<p>
Just like <code><span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">T</span>></code> there's a collection, as well as a 'naked' <code>T</code> value. The main difference is that here, the collection is of the same type as the object itself: <code><span style="color:#2b91af;">Tree</span><<span style="color:#2b91af;">T</span>></code>.
</p>
<p>
You've seen a similar example in <a href="/2024/10/14/functor-sums">the previous article</a>, which also had a recursive data structure. If you assume, however, that <code><span style="color:#2b91af;">Tree</span><<span style="color:#2b91af;">T</span>></code> gives rise to a functor, then so does the nested composition of putting it in a collection. This means, from the 'theorem' put forth in this article, that <code><span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">Tree</span><<span style="color:#2b91af;">T</span>>></code> composes as a functor. Finally you have a product of a <code>T</code> (which is isomorphic to the <a href="/2018/09/03/the-identity-functor">Identity functor</a>) and that composed functor. From <a href="/2024/09/16/functor-products">Functor products</a> it follows that that's a functor too, which explains why <code><span style="color:#2b91af;">Tree</span><<span style="color:#2b91af;">T</span>></code> forms a functor. <a href="/2018/08/06/a-tree-functor">The article</a> shows the <code>Select</code> implementation.
</p>
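<p>
The argument can be traced in a few lines of Haskell. This is a simplified sketch rather than a port of the linked article's class: a tree is an item (the Identity part) together with a list of sub-trees (the composed part), and the mapping function treats each part accordingly.
</p>
<p>
<pre>module Main where

-- A product of a 'naked' value and a collection of trees.
data Tree a = Tree a [Tree a] deriving (Show, Eq)

instance Functor Tree where
  -- f x maps the item; fmap (fmap f) maps the list of trees,
  -- that is, a functor (Tree) nested inside a functor (list).
  fmap f (Tree x children) = Tree (f x) (fmap (fmap f) children)

main :: IO ()
main = print (fmap (* 2) (Tree 1 [Tree 2 [], Tree (3 :: Int) []]))
-- prints Tree 2 [Tree 4 [],Tree 6 []]</pre>
</p>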
<h3 id="17209725eab64da598ba924342dafbd0">
Binary tree Zipper <a href="#17209725eab64da598ba924342dafbd0">#</a>
</h3>
<p>
In both previous articles you've seen pieces of the puzzle explaining why the <a href="/2024/09/09/a-binary-tree-zipper-in-c">binary tree Zipper</a> gives rise to a functor. There's one missing piece, however, that we can now finally address.
</p>
<p>
Recall that <code><span style="color:#2b91af;">BinaryTreeZipper</span><<span style="color:#2b91af;">T</span>></code> composes these two objects:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>> Tree { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> <span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">Crumb</span><<span style="color:#2b91af;">T</span>>> Breadcrumbs { <span style="color:blue;">get</span>; }</pre>
</p>
<p>
We've already established that both <code><span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>></code> and <code><span style="color:#2b91af;">Crumb</span><<span style="color:#2b91af;">T</span>></code> form functors. In this article you've learned that a functor in a functor is a functor, which applies to <code><span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">Crumb</span><<span style="color:#2b91af;">T</span>>></code>. Both of the above read-only properties are functors, then, which means that the entire class is a product of functors. The <code>Select</code> method follows:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">BinaryTreeZipper</span><<span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#74531f;">Select</span><<span style="color:#2b91af;">TResult</span>>(<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">selector</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">BinaryTreeZipper</span><<span style="color:#2b91af;">TResult</span>>(
Tree.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">selector</span>),
Breadcrumbs.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">c</span> => <span style="font-weight:bold;color:#1f377f;">c</span>.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">selector</span>)));
}</pre>
</p>
<p>
Notice that this <code>Select</code> implementation calls <code>Select</code> on the 'outer' <code>Breadcrumbs</code>, passing it a lambda expression that calls <code>Select</code> on each <code><span style="color:#2b91af;">Crumb</span><<span style="color:#2b91af;">T</span>></code>. This is similar to the previous examples in this article.
</p>
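<p>
A compact Haskell sketch shows the same product of functors. The data type definitions below are assumptions made for illustration (the usual textbook shape of a binary tree Zipper), not a transcription of the C# classes:
</p>
<p>
<pre>{-# LANGUAGE DeriveFunctor #-}
module Main where

-- Assumed shapes; the compiler derives the two 'inner' Functor instances.
data BinaryTree a = Empty | Node a (BinaryTree a) (BinaryTree a)
  deriving (Show, Eq, Functor)

data Crumb a = LeftCrumb a (BinaryTree a) | RightCrumb a (BinaryTree a)
  deriving (Show, Eq, Functor)

-- A product of a tree and a list of crumbs.
data BinaryTreeZipper a = BinaryTreeZipper (BinaryTree a) [Crumb a]
  deriving (Show, Eq)

instance Functor BinaryTreeZipper where
  -- The breadcrumbs are a functor (Crumb) inside a functor (list).
  fmap f (BinaryTreeZipper tree crumbs) =
    BinaryTreeZipper (fmap f tree) (fmap (fmap f) crumbs)

main :: IO ()
main = print (fmap (+ 10)
  (BinaryTreeZipper (Node 2 Empty Empty) [LeftCrumb (1 :: Int) Empty]))
-- prints BinaryTreeZipper (Node 12 Empty Empty) [LeftCrumb 11 Empty]</pre>
</p>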
<h3 id="800728c4c9c54aec815c62352843d52b">
Other nested containers <a href="#800728c4c9c54aec815c62352843d52b">#</a>
</h3>
<p>
There are plenty of other examples of functors that contain other functor values. Asynchronous programming supplies its own family of examples.
</p>
<p>
The way that C# and many other languages model asynchronous or I/O-bound actions is to wrap them in a <a href="https://learn.microsoft.com/dotnet/api/system.threading.tasks.task-1">Task</a> container. If the value inside the <code>Task<T></code> container is itself a functor, you can make that a functor, too. Examples include <code>Task<IEnumerable<T>></code>, <code>Task<Maybe<T>></code> (or its close cousin <code>Task<T?></code>; notice <a href="https://learn.microsoft.com/dotnet/csharp/language-reference/builtin-types/nullable-reference-types">the question mark</a>), <code>Task<Result<T1, T2>></code>, etc. You'll run into such types every time you have an I/O-bound or concurrent operation that returns <code>IEnumerable<T></code>, <code>Maybe<T></code> etc. as an asynchronous result.
</p>
<p>
While you <em>can</em> make such a nested task functor a functor in its own right, you rarely need that in languages with native <code>async</code> and <code>await</code> features, since those languages nudge you in other directions.
</p>
<p>
You can, however, run into other issues with task-based programming. You'll see examples and solutions in <a href="/2024/11/11/traversals">a future article</a>.
</p>
<p>
You'll run into other examples of nested containers with many property-based testing libraries. They typically define <a href="/2017/09/18/the-test-data-generator-functor">Test Data Generators</a>, often called <code>Gen<T></code>. For .NET, <a href="https://fscheck.github.io/FsCheck/">FsCheck</a>, <a href="https://github.com/hedgehogqa/fsharp-hedgehog">Hedgehog</a>, and <a href="https://github.com/AnthonyLloyd/CsCheck">CsCheck</a> all do this. For Haskell, <a href="https://hackage.haskell.org/package/QuickCheck">QuickCheck</a>, too, defines <code>Gen a</code>.
</p>
<p>
You often need to generate random collections, in which case you'd work with <code>Gen<IEnumerable<T>></code> or a similar collection type. If you need random <a href="/2018/03/26/the-maybe-functor">Maybe</a> values, you'll work with <code>Gen<Maybe<T>></code>, and so on.
</p>
<p>
On the other hand, <a href="/2016/06/28/roman-numerals-via-property-based-tdd">sometimes you need</a> to work with a collection of generators, such as <code>seq<Gen<'a>></code>.
</p>
<p>
These are all examples of functors within functors. It's not a given that you <em>must</em> treat such a combination as a functor in its own right. To be honest, typically, you don't. On the other hand, if you find yourself writing <code>Select</code> within <code>Select</code>, or <code>map</code> within <code>map</code>, depending on your language, it might make your code more succinct and readable if you give that combination a specialized functor affordance.
</p>
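<p>
As a minimal sketch of that last point, assuming nothing beyond Haskell's base library, compare mapping within a map to wrapping the nested value in <code>Compose</code>. The names <code>nested</code>, <code>viaNestedMap</code>, and <code>viaCompose</code> are made up for the illustration.
</p>
<p>
<pre>import Data.Functor.Compose (Compose (..))

-- A list nested inside a Maybe: two functors, one inside the other.
nested :: Maybe [Int]
nested = Just [1, 2, 3]

-- Without a dedicated affordance, you map within a map.
viaNestedMap :: Maybe [Int]
viaNestedMap = fmap (fmap (* 2)) nested

-- Wrapped in Compose, a single fmap reaches the innermost values.
viaCompose :: Maybe [Int]
viaCompose = getCompose (fmap (* 2) (Compose nested))</pre>
</p>
<p>
Both expressions evaluate to <code>Just [2,4,6]</code>. Whether the wrapper is worth the extra ceremony depends on how often the nested mapping shows up in your code.
</p>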
<h3 id="bffe8909eb904260be8aa4ab1a22efb2">
Higher arities <a href="#bffe8909eb904260be8aa4ab1a22efb2">#</a>
</h3>
<p>
Like the previous two articles, the 'theorem' presented here generalizes to more than two functors. If you have a third <code>H</code> functor, then <code>F<G<H<T>>></code> also gives rise to a functor. You can easily prove this by simple induction. We may first consider the base case. With a single functor (<em>n = 1</em>) any functor (say, <code>F</code>) is trivially a functor.
</p>
<p>
In the induction step (<em>n > 1</em>), you then assume that the <em>n - 1</em> 'stack' of functors already gives rise to a functor, and then proceed to prove that the configuration where all those nested functors are wrapped by yet another functor also forms a functor. Since the 'inner stack' of functors forms a functor (by assumption), you only need to prove that a configuration of the outer functor, and that 'inner stack', gives rise to a functor. You've seen how this works in this article, but I admit that a few examples constitute no proof. I'll leave you with only a sketch of this step, but you may consider using equational reasoning <a href="https://bartoszmilewski.com/2015/01/20/functors/">as demonstrated by Bartosz Milewski</a> and then prove the functor laws for such a composition.
</p>
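<p>
For what it's worth, the equational reasoning for the two-functor step might look roughly like the following sketch. It assumes a hand-rolled wrapper, here called <code>Comp</code> so that it doesn't clash with the <code>Compose</code> type that ships with base, and it assumes that both <code>f</code> and <code>g</code> already obey the functor laws. The laws are argued in comments; the compiler doesn't check them.
</p>
<p>
<pre>-- Hypothetical wrapper around one functor nested inside another.
newtype Comp f g a = Comp { getComp :: f (g a) }

instance (Functor f, Functor g) => Functor (Comp f g) where
  fmap h (Comp x) = Comp (fmap (fmap h) x)

-- Identity law:
--   fmap id (Comp x)
-- = Comp (fmap (fmap id) x)                  -- definition of fmap for Comp
-- = Comp (fmap id x)                         -- identity law for g
-- = Comp x                                   -- identity law for f
--
-- Composition law:
--   fmap (p . q) (Comp x)
-- = Comp (fmap (fmap (p . q)) x)             -- definition of fmap for Comp
-- = Comp (fmap (fmap p . fmap q) x)          -- composition law for g
-- = Comp (fmap (fmap p) (fmap (fmap q) x))   -- composition law for f
-- = fmap p (Comp (fmap (fmap q) x))          -- definition, right to left
-- = fmap p (fmap q (Comp x))                 -- definition, right to left
-- = (fmap p . fmap q) (Comp x)

-- A spot check on one value; an example, not a proof.
spotCheck :: Bool
spotCheck =
  getComp (fmap (+ 1) (Comp [Just 1, Nothing, Just (3 :: Int)]))
    == [Just 2, Nothing, Just 4]</pre>
</p>
<p>
Since <code>Comp f g</code> is itself a <code>Functor</code> instance, it can take the place of <code>g</code> in a bigger <code>Comp</code>, which is the induction step in code form.
</p>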
<p>
The Haskell <a href="https://hackage.haskell.org/package/base/docs/Data-Functor-Compose.html">Data.Functor.Compose</a> module defines a general-purpose data type to compose functors. You may, for example, compose a tuple inside a Maybe inside a list:
</p>
<p>
<pre><span style="color:#2b91af;">thriceNested</span> <span style="color:blue;">::</span> <span style="color:blue;">Compose</span> [] (<span style="color:blue;">Compose</span> <span style="color:#2b91af;">Maybe</span> ((,) <span style="color:#2b91af;">Integer</span>)) <span style="color:#2b91af;">String</span>
thriceNested = Compose [Compose (Just (42, <span style="color:#a31515;">"foo"</span>)), Compose Nothing, Compose (Just (89, <span style="color:#a31515;">"ba"</span>))]</pre>
</p>
<p>
You can easily <code>fmap</code> that data structure, for example by evaluating whether the number of characters in each string is an odd number (if it's there at all):
</p>
<p>
<pre>ghci> fmap (odd . length) thriceNested
Compose [Compose (Just (42,True)),Compose Nothing,Compose (Just (89,False))]</pre>
</p>
<p>
The first element now has <code>True</code> as the second tuple element, since <code>"foo"</code> has an odd number of characters (3). The next element is <code>Nothing</code>, because <code>Nothing</code> maps to <code>Nothing</code>. The third element has <code>False</code> in the rightmost tuple element, since <code>"ba"</code> doesn't have an odd number of characters (it has 2).
</p>
<h3 id="8c6ca7bcdc554856bee94bd11981aa6f">
Relations to monads <a href="#8c6ca7bcdc554856bee94bd11981aa6f">#</a>
</h3>
<p>
A nested 'stack' of functors may remind you of the way that I prefer to teach <a href="/2022/03/28/monads">monads</a>: <em>A monad is a functor you can flatten</em>. In short, the definition is the ability to 'flatten' <code>F<F<T>></code> to <code>F<T></code>. A function that can do that is often called <code>join</code> or <code>Flatten</code>.
</p>
<p>
So far in this article, we've been looking at stacks of different functors, abstractly denoted <code>F<G<T>></code>. There's no rule, however, that says that <code>F</code> and <code>G</code> may not be the same. If <code>F = G</code> then <code>F<G<T>></code> is really <code>F<F<T>></code>. This starts to look like the <a href="https://en.wikipedia.org/wiki/Antecedent_(logic)">antecedent</a> of the monad definition.
</p>
<p>
While the starting point may be the same, these notions are not equivalent. Yes, <code>F<F<T>></code> <em>may</em> form a monad (if you can flatten it), but it does, universally, give rise to a functor. On the other hand, we can hardly talk about flattening <code>F<G<T>></code>, because that would imply that you'd have to somehow 'throw away' either <code>F</code> or <code>G</code>. There may be specific functors (e.g. Identity) for which this is possible, but there's no universal law to that effect.
</p>
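<p>
A small sketch may make the distinction concrete. It uses only <code>join</code> from <code>Control.Monad</code>; the names <code>flatList</code>, <code>flatMaybe</code>, and <code>mixed</code> are invented for the example.
</p>
<p>
<pre>import Control.Monad (join)

-- Same functor twice: join flattens the two layers into one.
flatList :: [Int]
flatList = join [[1, 2], [], [3]] -- [1,2,3]

flatMaybe :: Maybe String
flatMaybe = join (Just (Just "foo")) -- Just "foo"

-- Two different functors: still mappable (map within map), but join
-- doesn't type-check here, because the two layers aren't the same type.
mixed :: [Maybe Int]
mixed = fmap (fmap (+ 1)) [Just 1, Nothing] -- [Just 2,Nothing]</pre>
</p>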
<p>
Not all 'stacks' of functors are monads. <a href="/2022/03/28/monads">All monads, on the other hand, are functors</a>.
</p>
<h3 id="14f39729b7ab426e83a35a067cf8f3a1">
Conclusion <a href="#14f39729b7ab426e83a35a067cf8f3a1">#</a>
</h3>
<p>
A data structure that configures one type of functor inside of another functor itself forms a functor. The examples shown in this article are mostly constrained to two functors, but if you have a 'stack' of three, four, or more functors, that arrangement still gives rise to a functor.
</p>
<p>
This is useful to know, particularly if you're working in a language with only partial support for functors. Mainstream languages aren't going to automatically turn such stacks into functors, in the way that Haskell's <code>Compose</code> container almost does. Thus, knowing when you can safely give your generic types a <code>Select</code> method or <code>map</code> function may come in handy.
</p>
<p>
To be honest, though, this result is hardly the most important 'theorem' concerning stacks of functors. In reality, you often run into situations where you <em>do</em> have a stack of functors, but they're in the wrong order. You may have a collection of asynchronous tasks, but you really need an asynchronous task that contains a collection of values. The next article addresses that problem.
</p>
<p>
<strong>Next:</strong> <a href="/2024/11/11/traversals">Traversals</a>.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Legacy Security Manager in Haskell
https://blog.ploeh.dk/2024/10/21/legacy-security-manager-in-haskell
2024-10-21T06:14:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A translation of the kata, and my first attempt at it.</em>
</p>
<p>
In early 2013 Richard Dalton published an article about <a href="https://www.devjoy.com/blog/legacy-code-katas/">legacy code katas</a>. The idea is to present a piece of 'legacy code' that you have to somehow refactor or improve. Of course, in order to make the exercise manageable, it's necessary to reduce it to some essence of what we might regard as legacy code. It'll only be one aspect of true legacy code. For the legacy Security Manager exercise, the main problem is that the code is difficult to unit test.
</p>
<p>
The original kata presents the 'legacy code' in C#, which may exclude programmers who aren't familiar with that language and platform. Since I find the exercise useful, I've previously published <a href="https://github.com/ploeh/SecurityManagerPython">a port to Python</a>. In this article, I'll port the exercise to <a href="https://www.haskell.org/">Haskell</a>, as well as walk through one attempt at achieving the goals of the kata.
</p>
<h3 id="03ee8805b5a44e77b92f9f6d132513bf">
The legacy code <a href="#03ee8805b5a44e77b92f9f6d132513bf">#</a>
</h3>
<p>
The original C# code is a <code>static</code> procedure that uses the <a href="https://learn.microsoft.com/dotnet/api/system.console">Console</a> API to ask a user a few simple questions, do some basic input validation, and print a message to the standard output stream. That's easy enough to port to Haskell:
</p>
<p>
<pre><span style="color:blue;">module</span> SecurityManager (<span style="color:#2b91af;">createUser</span>) <span style="color:blue;">where</span>

<span style="color:blue;">import</span> Text.Printf (<span style="color:#2b91af;">printf</span>)

<span style="color:#2b91af;">createUser</span> <span style="color:blue;">::</span> <span style="color:#2b91af;">IO</span> ()
createUser = <span style="color:blue;">do</span>
  <span style="color:blue;">putStrLn</span> <span style="color:#a31515;">"Enter a username"</span>
  username <- <span style="color:blue;">getLine</span>
  <span style="color:blue;">putStrLn</span> <span style="color:#a31515;">"Enter your full name"</span>
  fullName <- <span style="color:blue;">getLine</span>
  <span style="color:blue;">putStrLn</span> <span style="color:#a31515;">"Enter your password"</span>
  password <- <span style="color:blue;">getLine</span>
  <span style="color:blue;">putStrLn</span> <span style="color:#a31515;">"Re-enter your password"</span>
  confirmPassword <- <span style="color:blue;">getLine</span>
  <span style="color:blue;">if</span> password /= confirmPassword
    <span style="color:blue;">then</span>
      <span style="color:blue;">putStrLn</span> <span style="color:#a31515;">"The passwords don't match"</span>
    <span style="color:blue;">else</span>
      <span style="color:blue;">if</span> <span style="color:blue;">length</span> password < 8
        <span style="color:blue;">then</span>
          <span style="color:blue;">putStrLn</span> <span style="color:#a31515;">"Password must be at least 8 characters in length"</span>
        <span style="color:blue;">else</span> <span style="color:blue;">do</span>
          <span style="color:green;">-- Encrypt the password (just reverse it, should be secure)</span>
          <span style="color:blue;">let</span> array = <span style="color:blue;">reverse</span> password
          <span style="color:blue;">putStrLn</span> $
            printf <span style="color:#a31515;">"Saving Details for User (%s, %s, %s)"</span> username fullName array</pre>
</p>
<p>
Notice how the Haskell code seems to suffer slightly from the <a href="https://wiki.c2.com/?ArrowAntiPattern">Arrow code smell</a>, which is a problem that the C# code actually doesn't exhibit. The reason is that when using Haskell in an 'imperative style' (which you can, after a fashion, with <code>do</code> notation), you can't 'exit early' from an <code>if</code> check. The problem is that you can't have <code>if</code>-<code>then</code> without <code>else</code>.
</p>
<p>
Haskell has other language features that enable you to get rid of Arrow code, but in the spirit of the exercise, this would take us too far away from the original C# code. Making the code prettier should be a task for the refactoring exercise, rather than the starting point.
</p>
<p>
I've <a href="https://github.com/ploeh/SecurityManagerHaskell">published the code to GitHub</a>, if you want a leg up.
</p>
<p>
Combined with Richard Dalton's original article, that's all you need to try your hand at the exercise. In the rest of this article, I'll go through my own attempt at the exercise. That said, while this was my first attempt at the Haskell version of it, I've done it multiple times in C#, and once in <a href="https://www.python.org/">Python</a>. In other words, this isn't my first rodeo.
</p>
<h3 id="b5098b724e8443c4afeaa56e92c2f0d2">
Break the dependency on the Console <a href="#b5098b724e8443c4afeaa56e92c2f0d2">#</a>
</h3>
<p>
As warned, the rest of the article is a walkthrough of the exercise, so if you'd like to try it yourself, stop reading now. On the other hand, if you want to read on, but follow along in the GitHub repository, I've pushed the rest of the code to a branch called <code>first-pass</code>.
</p>
<p>
The first part of the exercise is to <em>break the dependency on the console</em>. In a language like Haskell where functions are first-class citizens, this part is trivial. I removed the type declaration, moved <code>putStrLn</code> and <code>getLine</code> to parameters and renamed them. Finally, I asked the compiler what the new type is, and added the new type signature.
</p>
<p>
<pre><span style="color:blue;">import</span> Text.Printf (<span style="color:#2b91af;">printf</span>, <span style="color:blue;">IsChar</span>)

<span style="color:#2b91af;">createUser</span> <span style="color:blue;">::</span> (<span style="color:blue;">Monad</span> m, <span style="color:blue;">Eq</span> a, <span style="color:blue;">IsChar</span> a) <span style="color:blue;">=></span> (<span style="color:#2b91af;">String</span> <span style="color:blue;">-></span> m ()) <span style="color:blue;">-></span> m [a] <span style="color:blue;">-></span> m ()
createUser writeLine readLine = <span style="color:blue;">do</span>
  writeLine <span style="color:#a31515;">"Enter a username"</span>
  username <- readLine
  writeLine <span style="color:#a31515;">"Enter your full name"</span>
  fullName <- readLine
  writeLine <span style="color:#a31515;">"Enter your password"</span>
  password <- readLine
  writeLine <span style="color:#a31515;">"Re-enter your password"</span>
  confirmPassword <- readLine
  <span style="color:blue;">if</span> password /= confirmPassword
    <span style="color:blue;">then</span>
      writeLine <span style="color:#a31515;">"The passwords don't match"</span>
    <span style="color:blue;">else</span>
      <span style="color:blue;">if</span> <span style="color:blue;">length</span> password < 8
        <span style="color:blue;">then</span>
          writeLine <span style="color:#a31515;">"Password must be at least 8 characters in length"</span>
        <span style="color:blue;">else</span> <span style="color:blue;">do</span>
          <span style="color:green;">-- Encrypt the password (just reverse it, should be secure)</span>
          <span style="color:blue;">let</span> array = <span style="color:blue;">reverse</span> password
          writeLine $
            printf <span style="color:#a31515;">"Saving Details for User (%s, %s, %s)"</span> username fullName array</pre>
</p>
<p>
I also changed the <code>main</code> action of the program to pass <code>putStrLn</code> and <code>getLine</code> as arguments:
</p>
<p>
<pre><span style="color:blue;">import</span> SecurityManager (<span style="color:#2b91af;">createUser</span>)
<span style="color:#2b91af;">main</span> <span style="color:blue;">::</span> <span style="color:#2b91af;">IO</span> ()
main = createUser <span style="color:blue;">putStrLn</span> <span style="color:blue;">getLine</span></pre>
</p>
<p>
Manual testing indicates that I didn't break any functionality.
</p>
<h3 id="53e3144fa5b04528a8d54ae035dc40b8">
Get the password comparison feature under test <a href="#53e3144fa5b04528a8d54ae035dc40b8">#</a>
</h3>
<p>
The next task is to <em>get the password comparison feature under test</em>. Over a small series of Git commits, I added these <a href="/2018/05/07/inlined-hunit-test-lists">inlined, parametrized HUnit tests</a>:
</p>
<p>
<pre><span style="color:#a31515;">"Matching passwords"</span> ~: <span style="color:blue;">do</span>
  pw <- [<span style="color:#a31515;">"password"</span>, <span style="color:#a31515;">"12345678"</span>, <span style="color:#a31515;">"abcdefgh"</span>]
  <span style="color:blue;">let</span> actual = comparePasswords pw pw
  <span style="color:blue;">return</span> $ Right pw ~=? actual
,
<span style="color:#a31515;">"Non-matching passwords"</span> ~: <span style="color:blue;">do</span>
  (pw1, pw2) <-
    [
      (<span style="color:#a31515;">"password"</span>, <span style="color:#a31515;">"PASSWORD"</span>),
      (<span style="color:#a31515;">"12345678"</span>, <span style="color:#a31515;">"12345677"</span>),
      (<span style="color:#a31515;">"abcdefgh"</span>, <span style="color:#a31515;">"bacdefgh"</span>),
      (<span style="color:#a31515;">"aaa"</span>, <span style="color:#a31515;">"bbb"</span>)
    ]
  <span style="color:blue;">let</span> actual = comparePasswords pw1 pw2
  <span style="color:blue;">return</span> $ Left <span style="color:#a31515;">"The passwords don't match"</span> ~=? actual</pre>
</p>
<p>
The resulting implementation is this <code>comparePasswords</code> function:
</p>
<p>
<pre><span style="color:#2b91af;">comparePasswords</span> <span style="color:blue;">::</span> <span style="color:#2b91af;">String</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">String</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">Either</span> <span style="color:#2b91af;">String</span> <span style="color:#2b91af;">String</span>
comparePasswords pw1 pw2 =
  <span style="color:blue;">if</span> pw1 == pw2
    <span style="color:blue;">then</span> Right pw1
    <span style="color:blue;">else</span> Left <span style="color:#a31515;">"The passwords don't match"</span></pre>
</p>
<p>
You'll notice that I chose to implement it as an <code>Either</code>-valued function. While I consider <a href="/2020/12/14/validation-a-solved-problem">validation a solved problem</a>, the usual solution involves some <a href="/2018/11/05/applicative-validation">applicative validation</a> container. In this exercise, validation is already short-circuiting, which means that we can use the standard monadic composition that <code>Either</code> affords.
</p>
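<p>
To illustrate what short-circuiting means here, consider the following sketch. <code>comparePasswords</code> is repeated from above, while <code>noSpaces</code> and <code>check</code> are hypothetical additions that aren't part of the kata; they only exist to give the first function something to compose with.
</p>
<p>
<pre>-- Repeated from the article.
comparePasswords :: String -> String -> Either String String
comparePasswords pw1 pw2 =
  if pw1 == pw2
    then Right pw1
    else Left "The passwords don't match"

-- Hypothetical second rule, only here to have something to chain.
noSpaces :: String -> Either String String
noSpaces pw =
  if ' ' `elem` pw
    then Left "The password must not contain spaces"
    else Right pw

-- Monadic composition: noSpaces only runs if comparePasswords returns Right.
check :: String -> String -> Either String String
check pw1 pw2 = comparePasswords pw1 pw2 >>= noSpaces</pre>
</p>
<p>
With these definitions, <code>check "a b" "a c"</code> returns <code>Left "The passwords don't match"</code>. Both rules are violated, but only the first failure is reported, which is the behaviour that an applicative validation container would change by collecting all failures.
</p>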
<p>
At this point in the exercise, I just left the <code>comparePasswords</code> function there, without trying to use it within <code>createUser</code>. The reason for that is that <code>Either</code>-based composition is sufficiently different from <code>if</code>-<code>then</code>-<code>else</code> code that I wanted to get the entire system under test before I attempted that.
</p>
<h3 id="a1dc5d33f8eb4d5b80d015b197d1afc3">
Get the password validation feature under test <a href="#a1dc5d33f8eb4d5b80d015b197d1afc3">#</a>
</h3>
<p>
The third task of the exercise is to <em>get the password validation feature under test</em>. That's similar to the previous task. Once more, I'll show the tests first, and then the function driven by those tests, but I want to point out that both code artefacts came iteratively into existence through the usual <a href="/2019/10/21/a-red-green-refactor-checklist">red-green-refactor</a> cycle.
</p>
<p>
<pre><span style="color:#a31515;">"Validate short password"</span> ~: <span style="color:blue;">do</span>
  pw <- [<span style="color:#a31515;">""</span>, <span style="color:#a31515;">"1"</span>, <span style="color:#a31515;">"12"</span>, <span style="color:#a31515;">"abc"</span>, <span style="color:#a31515;">"1234"</span>, <span style="color:#a31515;">"gtrex"</span>, <span style="color:#a31515;">"123456"</span>, <span style="color:#a31515;">"1234567"</span>]
  <span style="color:blue;">let</span> actual = validatePassword pw
  <span style="color:blue;">return</span> $ Left <span style="color:#a31515;">"Password must be at least 8 characters in length"</span> ~=? actual
,
<span style="color:#a31515;">"Validate long password"</span> ~: <span style="color:blue;">do</span>
  pw <- [<span style="color:#a31515;">"12345678"</span>, <span style="color:#a31515;">"123456789"</span>, <span style="color:#a31515;">"abcdefghij"</span>, <span style="color:#a31515;">"elevenchars"</span>]
  <span style="color:blue;">let</span> actual = validatePassword pw
  <span style="color:blue;">return</span> $ Right pw ~=? actual</pre>
</p>
<p>
The resulting function is hardly surprising.
</p>
<p>
<pre><span style="color:#2b91af;">validatePassword</span> <span style="color:blue;">::</span> <span style="color:#2b91af;">String</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">Either</span> <span style="color:#2b91af;">String</span> <span style="color:#2b91af;">String</span>
validatePassword pw =
  <span style="color:blue;">if</span> <span style="color:blue;">length</span> pw < 8
    <span style="color:blue;">then</span> Left <span style="color:#a31515;">"Password must be at least 8 characters in length"</span>
    <span style="color:blue;">else</span> Right pw</pre>
</p>
<p>
As in the previous step, I chose to postpone <em>using</em> this function from within <code>createUser</code> until I had a set of characterization tests. That may not be entirely in the spirit of the four subtasks of the exercise, but on the other hand, I intended to do more than just those four activities. The code here is actually simple enough that I could easily refactor without full test coverage, but recalling that this is a legacy code exercise, I find it warranted to <em>pretend</em> that it's complicated.
</p>
<p>
To be fair to the exercise, there'd <em>also</em> be a valuable exercise in attempting to extract each feature piecemeal, because it's not always possible to add complete characterization test coverage to a piece of gnarly legacy code. Be that as it may, I've already done that kind of exercise in C# a few times, and I had a different agenda for the Haskell exercise. In short, I was curious about what sort of inferred type <code>createUser</code> would have, once I'd gone through all four subtasks. I'll return to that topic in a moment. First, I want to address the fourth subtask.
</p>
<h3 id="dc17b82e5e374cce8d59e2791eadfdfb">
Allow different encryption algorithms to be used <a href="#dc17b82e5e374cce8d59e2791eadfdfb">#</a>
</h3>
<p>
The final part of the exercise is to <em>add a feature to allow different encryption algorithms to be used</em>. Once again, when you're working in a language where functions are first-class citizens, and <a href="https://en.wikipedia.org/wiki/Higher-order_function">higher-order functions</a> are <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a>, one solution is easily at hand:
</p>
<p>
<pre><span style="color:#2b91af;">createUser</span> <span style="color:blue;">::</span> (<span style="color:blue;">Monad</span> m, <span style="color:blue;">Foldable</span> t, <span style="color:blue;">Eq</span> (t a), <span style="color:blue;">PrintfArg</span> (t a), <span style="color:blue;">PrintfArg</span> b)
  <span style="color:blue;">=></span> (<span style="color:#2b91af;">String</span> <span style="color:blue;">-></span> m ()) <span style="color:blue;">-></span> m (t a) <span style="color:blue;">-></span> (t a <span style="color:blue;">-></span> b) <span style="color:blue;">-></span> m ()
createUser writeLine readLine encrypt = <span style="color:blue;">do</span>
  writeLine <span style="color:#a31515;">"Enter a username"</span>
  username <- readLine
  writeLine <span style="color:#a31515;">"Enter your full name"</span>
  fullName <- readLine
  writeLine <span style="color:#a31515;">"Enter your password"</span>
  password <- readLine
  writeLine <span style="color:#a31515;">"Re-enter your password"</span>
  confirmPassword <- readLine
  <span style="color:blue;">if</span> password /= confirmPassword
    <span style="color:blue;">then</span>
      writeLine <span style="color:#a31515;">"The passwords don't match"</span>
    <span style="color:blue;">else</span>
      <span style="color:blue;">if</span> <span style="color:blue;">length</span> password < 8
        <span style="color:blue;">then</span>
          writeLine <span style="color:#a31515;">"Password must be at least 8 characters in length"</span>
        <span style="color:blue;">else</span> <span style="color:blue;">do</span>
          <span style="color:blue;">let</span> array = encrypt password
          writeLine $
            printf <span style="color:#a31515;">"Saving Details for User (%s, %s, %s)"</span> username fullName array</pre>
</p>
<p>
The only change I've made is to promote <code>encrypt</code> to a parameter. This, of course, ripples through the code that calls the action, but currently, that's only the <code>main</code> action, where I had to add <code>reverse</code> as a third argument:
</p>
<p>
<pre><span style="color:#2b91af;">main</span> <span style="color:blue;">::</span> <span style="color:#2b91af;">IO</span> ()
main = createUser <span style="color:blue;">putStrLn</span> <span style="color:blue;">getLine</span> <span style="color:blue;">reverse</span></pre>
</p>
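<p>
To see why a function parameter is enough to support other algorithms, here's a small sketch. Both <code>savingMessage</code> and <code>rot13</code> are hypothetical names that aren't in the repository; the first isolates the last step of <code>createUser</code> as a pure function, and the second is just another placeholder 'algorithm' that, like <code>reverse</code>, offers no real security.
</p>
<p>
<pre>import Data.Char (chr, isAsciiLower, ord)
import Text.Printf (printf)

-- Hypothetical helper: the final message of createUser as a pure function.
savingMessage :: (String -> String) -> String -> String -> String -> String
savingMessage encrypt username fullName password =
  printf "Saving Details for User (%s, %s, %s)" username fullName (encrypt password)

-- Hypothetical alternative: ROT13 on lower-case ASCII letters only.
-- A placeholder for illustration, not real cryptography.
rot13 :: String -> String
rot13 = map rotate
  where
    rotate c
      | isAsciiLower c = chr ((ord c - ord 'a' + 13) `mod` 26 + ord 'a')
      | otherwise = c</pre>
</p>
<p>
Here, <code>savingMessage reverse "just.inhale" "Justin Hale" "12345678"</code> ends with <code>87654321)</code>, while <code>savingMessage rot13 "i.lean.right" "Ilene Wright" "password"</code> ends with <code>cnffjbeq)</code>. Swapping the algorithm requires no change to the function that uses it.
</p>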
<p>
Before I made the change, I removed the type annotation from <code>createUser</code>, because adding a parameter causes the type to change. Keeping the type annotation would have caused a compilation error. Eschewing type annotations makes it easier to make changes. Once I'd made the change, I added the new annotation, inferred by the <a href="https://marketplace.visualstudio.com/items?itemName=haskell.haskell">Haskell Visual Studio Code extension</a>.
</p>
<p>
I was curious what kind of abstraction would arise. Would it be testable in some way?
</p>
<h3 id="da305705261f4c1fae7842d204097c6b">
Testability <a href="#da305705261f4c1fae7842d204097c6b">#</a>
</h3>
<p>
Consider the inferred type of <code>createUser</code> above. It's quite abstract, and I was curious if it was flexible enough to allow testability without adding <a href="https://dhh.dk/2014/test-induced-design-damage.html">test-induced damage</a>. In short, in object-oriented programming, you often need to add Dependency Injection to make code testable, and the valid criticism is that this makes code more complicated than it would otherwise have been. I consider such reproval justified, although I disagree with the conclusion. It's not the desire for testability that causes the damage, but rather that object-oriented design is at odds with testability.
</p>
<p>
That's my conjecture, anyway, so I'm always curious when working with other paradigms like functional programming. Is idiomatic code already testable, or do you need to 'do damage to it' in order to make it testable?
</p>
<p>
As a Haskell action goes, I would consider its type fairly idiomatic. The code, too, is straightforward, although perhaps rather naive. It looks like beginner Haskell, and as we'll see later, we can rewrite it to be more elegant.
</p>
<p>
Before I started the exercise, I wondered whether it'd be necessary to <a href="/2017/07/11/hello-pure-command-line-interaction">use free monads to model pure command-line interactions</a>. Since <code>createUser</code> returns <code>m ()</code>, where <code>m</code> is any <code>Monad</code> instance, using a free monad would be possible, but turns out to be overkill. After having thought about it a bit, I recalled that in many languages and platforms, you can <a href="https://stackoverflow.com/a/2139303/126014">redirect <em>standard in</em> and <em>standard out</em> for testing purposes</a>. The way you do that is typically by replacing each with some kind of text stream. Based on that knowledge, I thought I could use <a href="/2022/06/20/the-state-monad">the State monad</a> for characterization testing, with a list of strings for each text stream.
</p>
<p>
In other words, the code is already testable as it is. No test-induced damage here.
</p>
<h3 id="ae4ba5da448b4e248cb63f124b135834">
Characterization tests <a href="#ae4ba5da448b4e248cb63f124b135834">#</a>
</h3>
<p>
To use the State monad, I started by importing <a href="https://hackage.haskell.org/package/transformers/docs/Control-Monad-Trans-State-Lazy.html">Control.Monad.Trans.State.Lazy</a> into my test code. This enabled me to write the first characterization test:
</p>
<p>
<pre><span style="color:#a31515;">"Happy path"</span> ~: <span style="color:blue;">flip</span> evalState
    ([<span style="color:#a31515;">"just.inhale"</span>, <span style="color:#a31515;">"Justin Hale"</span>, <span style="color:#a31515;">"12345678"</span>, <span style="color:#a31515;">"12345678"</span>], <span style="color:blue;">[]</span>) $ <span style="color:blue;">do</span>
  <span style="color:blue;">let</span> writeLine x = modify (second (++ [x]))
  <span style="color:blue;">let</span> readLine = state (\(i, o) -> (<span style="color:blue;">head</span> i, (<span style="color:blue;">tail</span> i, o)))
  <span style="color:blue;">let</span> encrypt = <span style="color:blue;">reverse</span>
  createUser writeLine readLine encrypt
  actual <- gets <span style="color:blue;">snd</span>
  <span style="color:blue;">let</span> expected = [
        <span style="color:#a31515;">"Enter a username"</span>,
        <span style="color:#a31515;">"Enter your full name"</span>,
        <span style="color:#a31515;">"Enter your password"</span>,
        <span style="color:#a31515;">"Re-enter your password"</span>,
        <span style="color:#a31515;">"Saving Details for User (just.inhale, Justin Hale, 87654321)"</span>]
  <span style="color:blue;">return</span> $ expected ~=? actual</pre>
</p>
<p>
I consulted my earlier code from <a href="/2019/03/11/an-example-of-state-based-testing-in-haskell">An example of state-based testing in Haskell</a> instead of reinventing the wheel, so if you want a more detailed walkthrough, you may want to consult that article as well as this one.
</p>
<p>
The type of the state that the test makes use of is <code>([String], [String])</code>. As the lambda expression suggests by naming the elements <code>i</code> and <code>o</code>, the two string lists are used for respectively input and output. The test starts with an 'input stream' populated by 'user input' values, corresponding to each of the four answers a user might give to the questions asked.
</p>
<p>
The <code>readLine</code> function works by pulling the <code>head</code> off the input list <code>i</code>, while on the other hand not touching the output list <code>o</code>. Its type is <code>State ([a], b) a</code>, compatible with <code>createUser</code>, which requires its <code>readLine</code> parameter to have the type <code>m (t a)</code>, where <code>m</code> is a <code>Monad</code> instance, and <code>t</code> a <code>Foldable</code> instance. The effective type turns out to be <code>t a ~ [Char] = String</code>, so that <code>readLine</code> effectively has the type <code>State ([String], b) String</code>. Since <code>State ([String], b)</code> is a <code>Monad</code> instance, it fits the <code>m</code> type argument of the requirement.
</p>
<p>
The same kind of reasoning applies to <code>writeLine</code>, which appends the input value to the 'output stream', which is the second list in the I/O tuple.
</p>
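<p>
If the types seem abstract, the following reduced sketch may help. It assumes the <code>transformers</code> package that the test code already imports, and it replaces <code>createUser</code> with a hypothetical, much smaller <code>greet</code> action that's polymorphic in its monad in the same way. The <code>Console</code> type alias and <code>result</code> are likewise made up for the example.
</p>
<p>
<pre>import Control.Monad.Trans.State.Lazy (State, execState, modify, state)
import Data.Bifunctor (second)

-- The state is a pair: (remaining input lines, output lines so far).
type Console = State ([String], [String])

-- Pops the next 'typed' line off the input list; partial if input runs out.
readLine :: Console String
readLine = state (\(i, o) -> (head i, (tail i, o)))

-- Appends a line to the output list and leaves the input list untouched.
writeLine :: String -> Console ()
writeLine x = modify (second (++ [x]))

-- Hypothetical action, generic in its monad like createUser.
greet :: Monad m => (String -> m ()) -> m String -> m ()
greet write readL = do
  write "Who are you?"
  name <- readL
  write ("Hello, " ++ name)

-- Runs greet against canned input; returns unread input and written output.
result :: ([String], [String])
result = execState (greet writeLine readLine) (["Ann", "unused"], [])</pre>
</p>
<p>
The value of <code>result</code> is <code>(["unused"],["Who are you?","Hello, Ann"])</code>: one input line has been consumed, and two output lines have been recorded in order. The same <code>greet</code> action also runs in <code>IO</code> if you pass it <code>putStrLn</code> and <code>getLine</code>.
</p>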
<p>
The test runs the <code>createUser</code> action and then checks that the output list contains the <code>expected</code> values.
</p>
<p>
A similar test verifies the behaviour when the passwords don't match:
</p>
<p>
<pre><span style="color:#a31515;">"Mismatched passwords"</span> ~: <span style="color:blue;">flip</span> evalState
    ([<span style="color:#a31515;">"i.lean.right"</span>, <span style="color:#a31515;">"Ilene Wright"</span>, <span style="color:#a31515;">"password"</span>, <span style="color:#a31515;">"Password"</span>], <span style="color:blue;">[]</span>) $ <span style="color:blue;">do</span>
  <span style="color:blue;">let</span> writeLine x = modify (second (++ [x]))
  <span style="color:blue;">let</span> readLine = state (\(i, o) -> (<span style="color:blue;">head</span> i, (<span style="color:blue;">tail</span> i, o)))
  <span style="color:blue;">let</span> encrypt = <span style="color:blue;">reverse</span>
  createUser writeLine readLine encrypt
  actual <- gets <span style="color:blue;">snd</span>
  <span style="color:blue;">let</span> expected = [
        <span style="color:#a31515;">"Enter a username"</span>,
        <span style="color:#a31515;">"Enter your full name"</span>,
        <span style="color:#a31515;">"Enter your password"</span>,
        <span style="color:#a31515;">"Re-enter your password"</span>,
        <span style="color:#a31515;">"The passwords don't match"</span>]
  <span style="color:blue;">return</span> $ expected ~=? actual</pre>
</p>
<p>
You can see the third and final characterization test in the GitHub repository.
</p>
<h3 id="ba7601efc69a4b929e738396588dc69a">
Refactored action <a href="#ba7601efc69a4b929e738396588dc69a">#</a>
</h3>
<p>
With <a href="/2015/11/16/code-coverage-is-a-useless-target-measure">full test coverage</a> I could proceed to refactor the <code>createUser</code> action, pulling in the two functions I'd test-driven into existence earlier:
</p>
<p>
<pre><span style="color:#2b91af;">createUser</span> <span style="color:blue;">::</span> (<span style="color:blue;">Monad</span> m, <span style="color:blue;">PrintfArg</span> a)
  <span style="color:blue;">=></span> (<span style="color:#2b91af;">String</span> <span style="color:blue;">-></span> m ()) <span style="color:blue;">-></span> m <span style="color:#2b91af;">String</span> <span style="color:blue;">-></span> (<span style="color:#2b91af;">String</span> <span style="color:blue;">-></span> a) <span style="color:blue;">-></span> m ()
createUser writeLine readLine encrypt = <span style="color:blue;">do</span>
  writeLine <span style="color:#a31515;">"Enter a username"</span>
  username <- readLine
  writeLine <span style="color:#a31515;">"Enter your full name"</span>
  fullName <- readLine
  writeLine <span style="color:#a31515;">"Enter your password"</span>
  password <- readLine
  writeLine <span style="color:#a31515;">"Re-enter your password"</span>
  confirmPassword <- readLine
  writeLine $ either
    <span style="color:blue;">id</span>
    (printf <span style="color:#a31515;">"Saving Details for User (%s, %s, %s)"</span> username fullName . encrypt)
    (validatePassword =<< comparePasswords password confirmPassword)</pre>
</p>
<p>
Because <code>createUser</code> now calls <code>comparePasswords</code> and <code>validatePassword</code>, the type of the overall composition is also more concrete. That's really just an artefact of my (misguided?) decision to give each of the two helper functions types that are more concrete than necessary.
</p>
<p>
As you can see, I left the initial call-and-response sequence intact, since I didn't feel that it needed improvement.
</p>
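<p>
To make the final expression of <code>createUser</code> easier to follow in isolation, here's a minimal sketch of the <code>Either</code> pipeline. The two helper bodies are hypothetical reconstructions (the real definitions appear earlier in the article and may differ), the minimum-length message is an assumption, and <code>report</code> is a made-up name that fixes the encrypted password to <code>String</code> instead of any <code>PrintfArg</code>.
</p>

```haskell
-- Hypothetical reconstructions of the two helpers; the article's own
-- definitions may differ in details such as the error messages.
comparePasswords :: String -> String -> Either String String
comparePasswords pw1 pw2
  | pw1 == pw2 = Right pw1
  | otherwise = Left "The passwords don't match"

validatePassword :: String -> Either String String
validatePassword pw
  | length pw < 8 = Left "Password must be at least 8 characters in length"
  | otherwise = Right pw

-- Mirrors the last expression of createUser: a Left short-circuits the
-- pipeline and is reported as is; a Right is encrypted and formatted.
report :: (String -> String) -> String -> String -> String -> String -> String
report encrypt username fullName password confirmPassword =
  either
    id
    (\pw -> "Saving Details for User ("
      ++ username ++ ", " ++ fullName ++ ", " ++ encrypt pw ++ ")")
    (validatePassword =<< comparePasswords password confirmPassword)
```

<p>
With <code>reverse</code> standing in for encryption, as in the tests, mismatched passwords produce <code>"The passwords don't match"</code>, because <code>=<<</code> never calls <code>validatePassword</code> when <code>comparePasswords</code> returns a <code>Left</code> value.
</p>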
<h3 id="5dcbfa4c67c64780a76dc380fb64b138">
Conclusion <a href="#5dcbfa4c67c64780a76dc380fb64b138">#</a>
</h3>
<p>
I ported the Legacy Security Manager kata to Haskell because I thought it'd be easy enough to port the code itself, and I also found the exercise compelling enough in its own right.
</p>
<p>
The most interesting point, I think, is that the <code>createUser</code> action remains testable without making any other concession to testability than turning it into a higher-order function. For pure functions, we would expect this to be the case, since <a href="/2015/05/07/functional-design-is-intrinsically-testable">pure functions are intrinsically testable</a>, but for impure actions like <code>createUser</code>, this isn't a given. Interacting exclusively with the command-line API is, however, sufficiently simple that we can get by with the State monad. No free monad is needed, and so test-induced damage is kept at a minimum.
</p>
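<p>
The testing technique can be sketched on a smaller scale. The following assumes the <code>mtl</code> package for <code>Control.Monad.State</code>; <code>greet</code>, <code>Console</code>, and <code>runGreet</code> are hypothetical names invented for illustration, not part of the kata code.
</p>

```haskell
-- Assumes the mtl package, which provides Control.Monad.State.
import Control.Monad.State (State, execState, modify, state)

-- A hypothetical action, much simpler than createUser. Like createUser, its
-- only concession to testability is taking its effects as arguments.
greet :: Monad m => (String -> m ()) -> m String -> m ()
greet writeLine readLine = do
  writeLine "Enter a username"
  name <- readLine
  writeLine ("Hello, " ++ name)

-- Simulated console: pending input lines and the output written so far.
type Console = ([String], [String])

writeLineS :: String -> State Console ()
writeLineS x = modify (\(i, o) -> (i, o ++ [x]))

-- Pops the next input line; yields an empty string if input is exhausted.
readLineS :: State Console String
readLineS = state $ \(i, o) -> case i of
  (x : xs) -> (x, (xs, o))
  [] -> ("", ([], o))

-- Runs greet against canned input and returns everything it wrote.
runGreet :: [String] -> [String]
runGreet inputs = snd (execState (greet writeLineS readLineS) (inputs, []))
```

<p>
In production code the same <code>greet</code> function would receive <code>putStrLn</code> and <code>getLine</code>, so the simulation stays confined to the tests.
</p>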
</div><hr>
Functor sums
https://blog.ploeh.dk/2024/10/14/functor-sums
2024-10-14T18:26:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A choice of two or more functors gives rise to a functor. An article for object-oriented programmers.</em>
</p>
<p>
This article is part of <a href="/2022/07/11/functor-relationships">a series of articles about functor relationships</a>. In this one you'll learn about a universal composition of functors. In short, if you have a <a href="https://en.wikipedia.org/wiki/Tagged_union">sum type</a> of functors, that data structure itself gives rise to a functor.
</p>
<p>
Together with other articles in this series, this result can help you answer questions such as: <em>Does this data structure form a functor?</em>
</p>
<p>
Since <a href="/2018/03/22/functors">functors</a> tend to be quite common, and since they're useful enough that many programming languages have special support or syntax for them, the ability to recognize a potential functor can be useful. Given a type like <code>Foo<T></code> (C# syntax) or <code>Bar<T1, T2></code>, being able to recognize it as a functor can come in handy. One scenario is if you yourself have just defined this data type. Recognizing that it's a functor strongly suggests that you should give it a <code>Select</code> method in C#, a <code>map</code> function in <a href="https://fsharp.org/">F#</a>, and so on.
</p>
<p>
Not all generic types give rise to a (covariant) functor. Some are rather <a href="/2021/09/02/contravariant-functors">contravariant functors</a>, and some are <a href="/2022/08/01/invariant-functors">invariant</a>.
</p>
<p>
If, on the other hand, you have a data type which is a sum of two or more (covariant) functors <em>with the same type parameter</em>, then the data type itself gives rise to a functor. You'll see some examples in this article.
</p>
<h3 id="fd1c2960d14946008a49b07698151647">
Abstract shape in F# <a href="#fd1c2960d14946008a49b07698151647">#</a>
</h3>
<p>
Before we look at some examples found in other code, it helps if we know what we're looking for. You'll see a C# example in a minute, but since sum types require so much <a href="/2019/12/16/zone-of-ceremony">ceremony</a> in C#, we'll make a brief detour around F#.
</p>
<p>
Imagine that you have two lawful functors, <code>F</code> and <code>G</code>. Also imagine that you have a data structure that holds either an <code><span style="color:#2b91af;">F</span><<span style="color:#2b91af;">'a</span>></code> value or a <code><span style="color:#2b91af;">G</span><<span style="color:#2b91af;">'a</span>></code> value:
</p>
<p>
<pre><span style="color:blue;">type</span> <span style="color:#2b91af;">FOrG</span><<span style="color:#2b91af;">'a</span>> = <span style="color:#2b91af;">FOrGF</span> <span style="color:blue;">of</span> <span style="color:#2b91af;">F</span><<span style="color:#2b91af;">'a</span>> | <span style="color:#2b91af;">FOrGG</span> <span style="color:blue;">of</span> <span style="color:#2b91af;">G</span><<span style="color:#2b91af;">'a</span>></pre>
</p>
<p>
The name of the type is <code>FOrG</code>. In the <code>FOrGF</code> case, it holds an <code><span style="color:#2b91af;">F</span><<span style="color:#2b91af;">'a</span>></code> value, and in the <code>FOrGG</code> case it holds a <code><span style="color:#2b91af;">G</span><<span style="color:#2b91af;">'a</span>></code> value.
</p>
<p>
The point of this article is that since both <code>F</code> and <code>G</code> are (lawful) functors, then <code>FOrG</code> also gives rise to a functor. The composed <code>map</code> function can pattern-match on each case and call the respective <code>map</code> function that belongs to each of the two functors.
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:#74531f;">map</span> <span style="color:#74531f;">f</span> <span style="font-weight:bold;color:#1f377f;">forg</span> =
<span style="color:blue;">match</span> <span style="font-weight:bold;color:#1f377f;">forg</span> <span style="color:blue;">with</span>
| <span style="color:#2b91af;">FOrGF</span> <span style="font-weight:bold;color:#1f377f;">fa</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">FOrGF</span> (<span style="color:#2b91af;">F</span>.<span style="color:#74531f;">map</span> <span style="color:#74531f;">f</span> <span style="font-weight:bold;color:#1f377f;">fa</span>)
| <span style="color:#2b91af;">FOrGG</span> <span style="font-weight:bold;color:#1f377f;">ga</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">FOrGG</span> (<span style="color:#2b91af;">G</span>.<span style="color:#74531f;">map</span> <span style="color:#74531f;">f</span> <span style="font-weight:bold;color:#1f377f;">ga</span>)</pre>
</p>
<p>
For clarity I've named the values <code>fa</code> indicating <em>f of a</em> and <code>ga</code> indicating <em>g of a</em>.
</p>
<p>
Notice that it's an essential requirement that the individual functors (here <code>F</code> and <code>G</code>) are parametrized by the same type parameter (here <code>'a</code>). If your data structure contains <code><span style="color:#2b91af;">F</span><<span style="color:#2b91af;">'a</span>></code> and <code><span style="color:#2b91af;">G</span><<span style="color:#2b91af;">'b</span>></code>, the 'theorem' doesn't apply.
</p>
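<p>
For comparison, the same abstract shape might look like this in Haskell. This is only a sketch: here <code>f</code> and <code>g</code> are type parameters rather than two fixed functors, and the constructor names are carried over from the F# code.
</p>

```haskell
-- Either an 'f a' or a 'g a'. Both cases share the type parameter 'a'.
data FOrG f g a = FOrGF (f a) | FOrGG (g a)

-- If both f and g are functors, so is the sum: match on the case and
-- delegate to the fmap that belongs to the wrapped functor.
instance (Functor f, Functor g) => Functor (FOrG f g) where
  fmap h (FOrGF fa) = FOrGF (fmap h fa)
  fmap h (FOrGG ga) = FOrGG (fmap h ga)
```

<p>
The requirement of a shared type parameter is visible in the declaration: both constructors mention the same <code>a</code>.
</p>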
<h3 id="9ff2f85804104bf192941ec8634757b6">
Abstract shape in C# <a href="#9ff2f85804104bf192941ec8634757b6">#</a>
</h3>
<p>
The same kind of abstract shape requires much more boilerplate in C#. When defining a sum type in a language that doesn't support them, we may instead either <a href="/2018/06/25/visitor-as-a-sum-type">turn to the Visitor design pattern</a> or alternatively use <a href="/2018/05/22/church-encoding">Church encoding</a>. While the two are isomorphic, Church encoding is a bit simpler while the <a href="https://en.wikipedia.org/wiki/Visitor_pattern">Visitor pattern</a> seems more object-oriented. In this example I've chosen the simplicity of Church encoding.
</p>
<p>
Like in the above F# code, I've named the data structure the same, but it's now a class:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">FOrG</span><<span style="color:#2b91af;">T</span>></pre>
</p>
<p>
Two constructors enable you to initialize it with either an <code><span style="color:#2b91af;">F</span><<span style="color:#2b91af;">T</span>></code> or a <code><span style="color:#2b91af;">G</span><<span style="color:#2b91af;">T</span>></code> value.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">FOrG</span>(<span style="color:#2b91af;">F</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">f</span>)
<span style="color:blue;">public</span> <span style="color:#2b91af;">FOrG</span>(<span style="color:#2b91af;">G</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">g</span>)</pre>
</p>
<p>
Notice that <code><span style="color:#2b91af;">F</span><<span style="color:#2b91af;">T</span>></code> and <code><span style="color:#2b91af;">G</span><<span style="color:#2b91af;">T</span>></code> share the same type parameter <code>T</code>. If a class instead composed either <code><span style="color:#2b91af;">F</span><<span style="color:#2b91af;">T1</span>></code> or <code><span style="color:#2b91af;">G</span><<span style="color:#2b91af;">T2</span>></code>, the 'theorem' wouldn't apply.
</p>
<p>
Finally, a <code>Match</code> method completes the Church encoding.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">TResult</span> <span style="font-weight:bold;color:#74531f;">Match</span><<span style="color:#2b91af;">TResult</span>>(
<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">F</span><<span style="color:#2b91af;">T</span>>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenF</span>,
<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">G</span><<span style="color:#2b91af;">T</span>>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenG</span>)</pre>
</p>
<p>
Regardless of exactly what <code>F</code> and <code>G</code> are, you can add a <code>Select</code> method to <code><span style="color:#2b91af;">FOrG</span><<span style="color:#2b91af;">T</span>></code> like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">FOrG</span><<span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#74531f;">Select</span><<span style="color:#2b91af;">TResult</span>>(<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">selector</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#74531f;">Match</span>(
<span style="font-weight:bold;color:#1f377f;">whenF</span>: <span style="font-weight:bold;color:#1f377f;">f</span> => <span style="color:blue;">new</span> <span style="color:#2b91af;">FOrG</span><<span style="color:#2b91af;">TResult</span>>(<span style="font-weight:bold;color:#1f377f;">f</span>.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">selector</span>)),
<span style="font-weight:bold;color:#1f377f;">whenG</span>: <span style="font-weight:bold;color:#1f377f;">g</span> => <span style="color:blue;">new</span> <span style="color:#2b91af;">FOrG</span><<span style="color:#2b91af;">TResult</span>>(<span style="font-weight:bold;color:#1f377f;">g</span>.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">selector</span>)));
}</pre>
</p>
<p>
Since we assume that <code>F</code> and <code>G</code> are functors, which in C# <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatically</a> have a <code>Select</code> method, we pass the <code>selector</code> to their respective <code>Select</code> methods. <code>f.Select</code> returns a new <code>F</code> value, while <code>g.Select</code> returns a new <code>G</code> value, but there's a constructor for each case, so the composed <code>Select</code> method repackages those return values in <code><span style="color:blue;">new</span> <span style="color:#2b91af;">FOrG</span><<span style="color:#2b91af;">TResult</span>></code> objects.
</p>
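<p>
The Church encoding itself can also be sketched in Haskell, which may make the structure of <code>Match</code> and <code>Select</code> easier to see without the class boilerplate. This assumes the <code>RankNTypes</code> extension; <code>fromF</code> and <code>fromG</code> are hypothetical names playing the role of the two constructors.
</p>

```haskell
{-# LANGUAGE RankNTypes #-}

-- Church-encoded sum: a value is nothing but its own Match function.
newtype FOrG f g a = FOrG
  { match :: forall r. (f a -> r) -> (g a -> r) -> r }

-- Counterparts of the two C# constructors (hypothetical names).
fromF :: f a -> FOrG f g a
fromF fa = FOrG (\whenF _ -> whenF fa)

fromG :: g a -> FOrG f g a
fromG ga = FOrG (\_ whenG -> whenG ga)

-- Counterpart of Select: map inside each case, then repackage the result.
instance (Functor f, Functor g) => Functor (FOrG f g) where
  fmap h s = match s (fromF . fmap h) (fromG . fmap h)
```

<p>
As in the C# code, each branch maps with the wrapped functor's own mapping function and then wraps the result in a new sum value.
</p>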
<p>
I'll have more to say about how this generalizes to a sum of more than two alternatives, but first, let's consider some examples.
</p>
<h3 id="03a6f1ef94ca4ca2927b38d95e34c31f">
Open or closed endpoints <a href="#03a6f1ef94ca4ca2927b38d95e34c31f">#</a>
</h3>
<p>
The simplest example that I can think of is that of <a href="/2024/01/01/variations-of-the-range-kata">range</a> endpoints. A range may be open, closed, or a mix thereof. Some mathematical notations use <code>(1, 6]</code> to indicate the range between 1 and 6, where 1 is excluded from the range, but 6 is included. An alternative notation is <code>]1, 6]</code>.
</p>
<p>
A given endpoint (1 and 6, above) is either open or closed, which implies a sum type. <a href="/2024/01/15/a-range-kata-implementation-in-f">In F# I defined it like this</a>:
</p>
<p>
<pre><span style="color:blue;">type</span> Endpoint<'a> = Open <span style="color:blue;">of</span> 'a | Closed <span style="color:blue;">of</span> 'a</pre>
</p>
<p>
If you're at all familiar with F#, this is clearly a <a href="https://learn.microsoft.com/dotnet/fsharp/language-reference/discriminated-unions">discriminated union</a>, which is just what the F# documentation calls sum types.
</p>
<p>
The article <a href="/2024/02/12/range-as-a-functor">Range as a functor</a> goes through examples in <a href="https://www.haskell.org/">Haskell</a>, F#, and C#, demonstrating, among other points, how an endpoint sum type forms a functor.
</p>
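<p>
As a quick illustration, a Haskell version of the endpoint type could look like the following. It's a sketch with a hand-written instance, and not necessarily the code from the linked article.
</p>

```haskell
-- Both cases hold a 'naked' value of the same type, so each case can be
-- viewed as the Identity functor, and the whole type as a sum of functors.
data Endpoint a = Open a | Closed a deriving (Eq, Show)

instance Functor Endpoint where
  fmap f (Open x) = Open (f x)
  fmap f (Closed x) = Closed (f x)
```

<p>
Mapping changes the value but preserves whether the endpoint is open or closed.
</p>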
<h3 id="9cf974abd1fb497aa43087e7697bb982">
Binary tree <a href="#9cf974abd1fb497aa43087e7697bb982">#</a>
</h3>
<p>
The next example we'll consider is the binary tree from <a href="/2024/09/09/a-binary-tree-zipper-in-c">A Binary Tree Zipper in C#</a>. In the <a href="https://learnyouahaskell.com/zippers">original Haskell Zippers article</a>, the data type is defined like this:
</p>
<p>
<pre><span style="color:blue;">data</span> Tree a = Empty | Node a (Tree a) (Tree a) <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Show</span>)</pre>
</p>
<p>
Even if you're not familiar with Haskell syntax, the vertical bar (<code>|</code>) indicates a choice between the left-hand side and the right-hand side. Many programming languages use the <code>|</code> character for Boolean disjunction (<em>or</em>), so the syntax should be intuitive. In this definition, a binary tree is either empty or a node with a value and two subtrees. What interests us here is that it's a sum type.
</p>
<p>
One way this manifests in C# is in the choice of two alternative constructors:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">BinaryTree</span>() : <span style="color:blue;">this</span>(<span style="color:#2b91af;">Empty</span>.Instance)
{
}
<span style="color:blue;">public</span> <span style="color:#2b91af;">BinaryTree</span>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">value</span>, <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">left</span>, <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">right</span>)
: <span style="color:blue;">this</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">Node</span>(<span style="font-weight:bold;color:#1f377f;">value</span>, <span style="font-weight:bold;color:#1f377f;">left</span>.root, <span style="font-weight:bold;color:#1f377f;">right</span>.root))
{
}</pre>
</p>
<p>
<code><span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>></code> clearly has a generic type parameter. Does the class give rise to a functor?
</p>
<p>
It does if it's composed from a sum of two functors. Is that the case?
</p>
<p>
On the 'left' side, it seems that we have nothing. In the Haskell code, it's called <code>Empty</code>. In the C# code, this case is represented by the parameterless constructor (also known as the <em>default constructor</em>). There's no <code>T</code> there, so that doesn't look much like a functor.
</p>
<p>
All is, however, not lost. We may view this lack of data as a particular value ('nothing') wrapped in <a href="/2024/10/07/the-const-functor">the Const functor</a>. In Haskell and F# a value without data is called <em>unit</em> and written <code>()</code>. In C# or <a href="https://www.java.com/">Java</a> you may <a href="/2018/01/15/unit-isomorphisms">think of it as void</a>, although <em>unit</em> is a value that you can pass around, which isn't the case for <code>void</code>.
</p>
<p>
In Haskell, we could instead represent <code>Empty</code> as <code>Const ()</code>, which is a bona-fide <code>Functor</code> instance that you can <code>fmap</code>:
</p>
<p>
<pre>ghci> emptyNode = Const ()
ghci> fmap (+1) emptyNode
Const ()</pre>
</p>
<p>
This example pretends to 'increment' a number that isn't there. Not that you'd need to do this. I'm only showing you this to make the argument that the empty node forms a functor.
</p>
<p>
The 'right' side of the sum type is most succinctly summarized by the Haskell code:
</p>
<p>
<pre>Node a (Tree a) (Tree a)</pre>
</p>
<p>
It's a 'naked' generic value and two generic trees. In C# it's the parameter list
</p>
<p>
<pre>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">value</span>, <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">left</span>, <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">right</span>)</pre>
</p>
<p>
Does that make a functor? Yes, it's a triple of a 'naked' generic value and two recursive subtrees, all sharing the same <code>T</code>. Just like in the <a href="/2024/09/16/functor-products">previous article</a> we can view a 'naked' generic value as equivalent to <a href="/2018/09/03/the-identity-functor">the Identity functor</a>, so that parameter is a functor. The other ones are recursive types: They are of the same type as the type we're trying to evaluate, <code><span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>></code>. If we assume that that forms a functor, that triple is a product type of functors. From the previous article, we know that that gives rise to a functor.
</p>
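<p>
The argument can be made concrete with the functor combinators from Haskell's base library. In this sketch, <code>TreeLayer</code>, <code>toLayer</code>, and <code>fromLayer</code> are hypothetical names; they express one layer of the tree as a sum of <code>Const ()</code> and a product of <code>Identity</code> and two subtrees.
</p>

```haskell
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))
import Data.Functor.Product (Product (..))
import Data.Functor.Sum (Sum (..))

data Tree a = Empty | Node a (Tree a) (Tree a) deriving (Eq, Show)

-- One layer of the tree in canonical form: either 'nothing', or a 'naked'
-- value together with two subtrees.
type TreeLayer = Sum (Const ()) (Product Identity (Product Tree Tree))

toLayer :: Tree a -> TreeLayer a
toLayer Empty = InL (Const ())
toLayer (Node x l r) = InR (Pair (Identity x) (Pair l r))

fromLayer :: TreeLayer a -> Tree a
fromLayer (InL (Const ())) = Empty
fromLayer (InR (Pair (Identity x) (Pair l r))) = Node x l r

-- The instance is recursive: mapping the layer relies on Tree itself being
-- a functor, which is exactly the assumption made in the text.
instance Functor Tree where
  fmap f = fromLayer . fmap f . toLayer
```

<p>
All the actual mapping is done by the existing <code>Functor</code> instances of <code>Sum</code>, <code>Product</code>, <code>Const</code>, and <code>Identity</code>; the two conversion functions only repackage data.
</p>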
<p>
This means that in C#, for example, you can add the idiomatic <code>Select</code> method:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#74531f;">Select</span><<span style="color:#2b91af;">TResult</span>>(<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">selector</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#74531f;">Aggregate</span>(
<span style="font-weight:bold;color:#1f377f;">whenEmpty</span>: () => <span style="color:blue;">new</span> <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">TResult</span>>(),
<span style="font-weight:bold;color:#1f377f;">whenNode</span>: (<span style="font-weight:bold;color:#1f377f;">value</span>, <span style="font-weight:bold;color:#1f377f;">left</span>, <span style="font-weight:bold;color:#1f377f;">right</span>) =>
<span style="color:blue;">new</span> <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">TResult</span>>(<span style="font-weight:bold;color:#1f377f;">selector</span>(<span style="font-weight:bold;color:#1f377f;">value</span>), <span style="font-weight:bold;color:#1f377f;">left</span>, <span style="font-weight:bold;color:#1f377f;">right</span>));
}</pre>
</p>
<p>
In languages that support pattern-matching on sum types (such as F#), you'd have to match on each case and explicitly deal with the recursive mapping. Notice, however, that here I've used the <code>Aggregate</code> method to implement <code>Select</code>. The <code>Aggregate</code> method is the <code><span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>></code> class' <a href="/2019/04/29/catamorphisms">catamorphism</a>, and it already handles the recursion for us. In other words, <code>left</code> and <code>right</code> are already <code><span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">TResult</span>></code> objects.
</p>
<p>
What remains is only to tell <code>Aggregate</code> what to do when the tree is empty, and how to transform the 'naked' node <code>value</code>. The <code>Select</code> implementation handles the former by returning a new empty tree, and the latter by invoking <code><span style="font-weight:bold;color:#1f377f;">selector</span>(<span style="font-weight:bold;color:#1f377f;">value</span>)</code>.
</p>
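<p>
To illustrate the difference between the two approaches, here's a sketch in Haskell. <code>mapTree</code> deals with the recursion explicitly, while <code>mapTree'</code> delegates it to a catamorphism. The names <code>mapTree</code>, <code>mapTree'</code>, and <code>foldTree</code> are assumptions made for this example.
</p>

```haskell
data Tree a = Empty | Node a (Tree a) (Tree a) deriving (Eq, Show)

-- Pattern matching with explicit recursion into both subtrees.
mapTree :: (a -> b) -> Tree a -> Tree b
mapTree _ Empty = Empty
mapTree f (Node x l r) = Node (f x) (mapTree f l) (mapTree f r)

-- The catamorphism, playing the role of the C# Aggregate method.
foldTree :: b -> (a -> b -> b -> b) -> Tree a -> b
foldTree whenEmpty _ Empty = whenEmpty
foldTree whenEmpty whenNode (Node x l r) =
  whenNode x (foldTree whenEmpty whenNode l) (foldTree whenEmpty whenNode r)

-- Here l and r are already mapped subtrees, so only the value needs work.
mapTree' :: (a -> b) -> Tree a -> Tree b
mapTree' f = foldTree Empty (\x l r -> Node (f x) l r)
```

<p>
Both functions produce the same result; the second one simply reuses recursion that's already been written once.
</p>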
<p>
Not only does the binary tree form a functor, but it turns out that the <a href="/2024/08/19/zippers">Zipper</a> does as well, because the breadcrumbs also give rise to a functor.
</p>
<h3 id="02e7e55d7f6f4c0d94c50cf577238859">
Breadcrumbs <a href="#02e7e55d7f6f4c0d94c50cf577238859">#</a>
</h3>
<p>
The <a href="https://learnyouahaskell.com/zippers">original Haskell Zippers article</a> defines a breadcrumb for the binary tree Zipper like this:
</p>
<p>
<pre><span style="color:blue;">data</span> Crumb a = LeftCrumb a (Tree a) | RightCrumb a (Tree a) <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Show</span>)</pre>
</p>
<p>
That's another sum type with generics on the left as well as the right. In C# the two options may be best illustrated by these two creation methods:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">Crumb</span><<span style="color:#2b91af;">T</span>> <span style="color:#74531f;">Left</span><<span style="color:#2b91af;">T</span>>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">value</span>, <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">right</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">Crumb</span><<span style="color:#2b91af;">T</span>>.<span style="color:#74531f;">Left</span>(<span style="font-weight:bold;color:#1f377f;">value</span>, <span style="font-weight:bold;color:#1f377f;">right</span>);
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">Crumb</span><<span style="color:#2b91af;">T</span>> <span style="color:#74531f;">Right</span><<span style="color:#2b91af;">T</span>>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">value</span>, <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">left</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">Crumb</span><<span style="color:#2b91af;">T</span>>.<span style="color:#74531f;">Right</span>(<span style="font-weight:bold;color:#1f377f;">value</span>, <span style="font-weight:bold;color:#1f377f;">left</span>);
}</pre>
</p>
<p>
Notice that the <code>Left</code> and <code>Right</code> choices have the same structure: A 'naked' generic <code>T</code> value, and a <code><span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>></code> object. Only the names differ. This suggests that we only need to think about one of them, and then we can reuse our conclusion for the other.
</p>
<p>
As we've already done once, we consider a <code>T</code> value equivalent with <code><span style="color:#2b91af;">Identity</span><<span style="color:#2b91af;">T</span>></code>, which is a functor. We've also, just above, established that <code><span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>></code> forms a functor. We have a product (argument list, or tuple) of functors, so that combination forms a functor.
</p>
<p>
Since this is true for both alternatives, this sum type, too, gives rise to a functor. This enables you to implement a <code>Select</code> method:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">Crumb</span><<span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#74531f;">Select</span><<span style="color:#2b91af;">TResult</span>>(<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">selector</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#74531f;">Match</span>(
(<span style="font-weight:bold;color:#1f377f;">v</span>, <span style="font-weight:bold;color:#1f377f;">r</span>) => <span style="color:#2b91af;">Crumb</span>.<span style="color:#74531f;">Left</span>(<span style="font-weight:bold;color:#1f377f;">selector</span>(<span style="font-weight:bold;color:#1f377f;">v</span>), <span style="font-weight:bold;color:#1f377f;">r</span>.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">selector</span>)),
(<span style="font-weight:bold;color:#1f377f;">v</span>, <span style="font-weight:bold;color:#1f377f;">l</span>) => <span style="color:#2b91af;">Crumb</span>.<span style="color:#74531f;">Right</span>(<span style="font-weight:bold;color:#1f377f;">selector</span>(<span style="font-weight:bold;color:#1f377f;">v</span>), <span style="font-weight:bold;color:#1f377f;">l</span>.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">selector</span>)));
}</pre>
</p>
<p>
By now the pattern should be familiar. Call <code><span style="font-weight:bold;color:#1f377f;">selector</span>(<span style="font-weight:bold;color:#1f377f;">v</span>)</code> directly on the 'naked' values, and pass <code>selector</code> to any other functors' <code>Select</code> method.
</p>
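<p>
The corresponding Haskell instance, sketched here with hand-written <code>Functor</code> instances (neither is taken from the Zippers article), follows the same pattern.
</p>

```haskell
data Tree a = Empty | Node a (Tree a) (Tree a) deriving (Eq, Show)

instance Functor Tree where
  fmap _ Empty = Empty
  fmap f (Node x l r) = Node (f x) (fmap f l) (fmap f r)

data Crumb a = LeftCrumb a (Tree a) | RightCrumb a (Tree a)
  deriving (Eq, Show)

-- Apply f directly to the 'naked' value, and fmap it over the subtree.
instance Functor Crumb where
  fmap f (LeftCrumb x t) = LeftCrumb (f x) (fmap f t)
  fmap f (RightCrumb x t) = RightCrumb (f x) (fmap f t)
```

<p>
The two cases have identical bodies apart from the constructor name, which reflects that <code>Left</code> and <code>Right</code> share the same structure.
</p>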
<p>
That's <em>almost</em> all the building blocks we have to declare <code><span style="color:#2b91af;">BinaryTreeZipper</span><<span style="color:#2b91af;">T</span>></code> a functor as well, but we need one last theorem before we can do that. We'll conclude this work in <a href="/2024/10/28/functor-compositions">the next article</a>.
</p>
<h3 id="2b3a70f8791c41eb952ff160398fe441">
Higher arities <a href="#2b3a70f8791c41eb952ff160398fe441">#</a>
</h3>
<p>
Although we finally saw a 'real' triple product, all the sum types have involved binary choices between a 'left side' and a 'right side'. As was the case with functor products, the result generalizes to higher arities. A sum type with any number of cases forms a functor if all the cases give rise to a functor.
</p>
<p>
We can, again, use canonicalized forms to argue the case. (See <a href="https://thinkingwithtypes.com/">Thinking with Types</a> for a clear explanation of canonicalization of types.) A two-way choice is isomorphic to <a href="/2019/01/14/an-either-functor">Either</a>, and a three-way choice is isomorphic to <code>Either a (Either b c)</code>. Just like it's possible to build triples, quadruples, etc. by nesting pairs, we can construct n-ary choices by nesting Eithers. It's the same kind of inductive reasoning.
</p>
<p>
This is relevant because just as Haskell's <a href="https://hackage.haskell.org/package/base">base</a> library provides <a href="https://hackage.haskell.org/package/base/docs/Data-Functor-Product.html">Data.Functor.Product</a> for composing two (and thereby any number of) functors, it also provides <a href="https://hackage.haskell.org/package/base/docs/Data-Functor-Sum.html">Data.Functor.Sum</a> for composing functor sums.
</p>
<p>
The <code>Sum</code> type defines two case constructors: <code>InL</code> and <code>InR</code>, but it's isomorphic with <code>Either</code>:
</p>
<p>
<pre><span style="color:#2b91af;">canonizeSum</span> <span style="color:blue;">::</span> <span style="color:blue;">Sum</span> f g a <span style="color:blue;">-></span> <span style="color:#2b91af;">Either</span> (f a) (g a)
canonizeSum (InL x) = Left x
canonizeSum (InR y) = Right y
<span style="color:#2b91af;">summarizeEither</span> <span style="color:blue;">::</span> <span style="color:#2b91af;">Either</span> (f a) (g a) <span style="color:blue;">-></span> <span style="color:blue;">Sum</span> f g a
summarizeEither (Left x) = InL x
summarizeEither (Right y) = InR y</pre>
</p>
<p>
The point is that we can compose not only a choice of two, but of any number of functors, to a single functor type. A simple example is this choice between <a href="/2018/03/26/the-maybe-functor">Maybe</a>, list, or <a href="/2018/08/06/a-tree-functor">Tree</a>:
</p>
<p>
<pre><span style="color:#2b91af;">maybeOrListOrTree</span> <span style="color:blue;">::</span> <span style="color:blue;">Sum</span> (<span style="color:blue;">Sum</span> <span style="color:#2b91af;">Maybe</span> []) <span style="color:blue;">Tree</span> <span style="color:#2b91af;">String</span>
maybeOrListOrTree = InL (InL (Just <span style="color:#a31515;">"foo"</span>))</pre>
</p>
<p>
If we rather wanted to embed a list in that type, we can do that as well:
</p>
<p>
<pre><span style="color:#2b91af;">maybeOrListOrTree'</span> <span style="color:blue;">::</span> <span style="color:blue;">Sum</span> (<span style="color:blue;">Sum</span> <span style="color:#2b91af;">Maybe</span> []) <span style="color:blue;">Tree</span> <span style="color:#2b91af;">String</span>
maybeOrListOrTree' = InL (InR [<span style="color:#a31515;">"bar"</span>, <span style="color:#a31515;">"baz"</span>])</pre>
</p>
<p>
Both values have the same type, and since it's a <code>Functor</code> instance, you can <code>fmap</code> over it:
</p>
<p>
<pre>ghci> fmap (elem 'r') maybeOrListOrTree
InL (InL (Just False))
ghci> fmap (elem 'r') maybeOrListOrTree'
InL (InR [True,False])</pre>
</p>
<p>
These queries examine each <code>String</code> to determine whether or not it contains the letter <code>'r'</code>, which only <code>"bar"</code> does.
</p>
<p>
The point, anyway, is that sum types of any arity form a functor if all the cases do.
</p>
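<p>
The same reasoning applies to a custom sum type with three cases. In the following sketch, <code>Choice</code>, <code>toSum</code>, and <code>fromSum</code> are hypothetical names. The type is converted to nested <code>Sum</code> values, and its <code>Functor</code> instance is borrowed from that canonical form.
</p>

```haskell
import Data.Functor.Identity (Identity (..))
import Data.Functor.Sum (Sum (..))

-- A hypothetical three-way choice between functors over the same 'a'.
data Choice a = One (Identity a) | Opt (Maybe a) | Many [a]
  deriving (Eq, Show)

-- A three-way sum is a two-way sum whose right side is itself a sum.
toSum :: Choice a -> Sum Identity (Sum Maybe []) a
toSum (One x) = InL x
toSum (Opt m) = InR (InL m)
toSum (Many xs) = InR (InR xs)

fromSum :: Sum Identity (Sum Maybe []) a -> Choice a
fromSum (InL x) = One x
fromSum (InR (InL m)) = Opt m
fromSum (InR (InR xs)) = Many xs

-- The nested Sum is already a Functor, so Choice can reuse its fmap.
instance Functor Choice where
  fmap f = fromSum . fmap f . toSum
```

<p>
In practice you'd write the instance directly, or let the compiler derive it. The round trip only serves to show that nothing beyond nested binary sums is required.
</p>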
<h3 id="8545e09908fb4df4ace08e7b20ffc509">
Conclusion <a href="#8545e09908fb4df4ace08e7b20ffc509">#</a>
</h3>
<p>
In the previous article, you learned that a functor product gives rise to a functor. In this article, you learned that a functor sum does, too. If a data structure contains a choice of two or more functors, then that data type itself forms a functor.
</p>
<p>
As the previous article argues, this is useful to know, particularly if you're working in a language with only partial support for functors. Mainstream languages aren't going to automatically turn such sums into functors, in the way that Haskell's <code>Sum</code> <a href="https://bartoszmilewski.com/2014/01/14/functors-are-containers/">container</a> almost does. Thus, knowing when you can safely give your generic types a <code>Select</code> method or <code>map</code> function may come in handy.
</p>
<p>
There's one more rule like this one.
</p>
<p>
<strong>Next:</strong> <a href="/2024/10/28/functor-compositions">Functor compositions</a>.
</p>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.The Const functorhttps://blog.ploeh.dk/2024/10/07/the-const-functor2024-10-07T18:37:00+00:00Mark Seemann
<div id="post">
<p>
<em>Package a constant value, but make it look like a functor. An article for object-oriented programmers.</em>
</p>
<p>
This article is an instalment in <a href="/2018/03/22/functors">an article series about functors</a>. In previous articles, you've learned about useful functors such as <a href="/2018/03/26/the-maybe-functor">Maybe</a> and <a href="/2019/01/14/an-either-functor">Either</a>. You've also seen at least one less-than useful functor: <a href="/2018/09/03/the-identity-functor">The Identity functor</a>. In this article, you'll learn about another (practically) useless functor called <em>Const</em>. You can skip this article if you want.
</p>
<p>
Like Identity, the Const functor may not be that useful, but it nonetheless exists. You'll probably not need it for actual programming tasks, but knowing that it exists, like Identity, can be useful as an analysis tool. It may help you quickly evaluate whether a particular data structure affords various compositions. For example, it may enable you to quickly identify whether, say, a constant type and a list <a href="/2022/07/11/functor-relationships">may compose to a functor</a>.
</p>
<p>
This article starts with C#, then proceeds over <a href="https://fsharp.org/">F#</a> to finally discuss <a href="https://www.haskell.org/">Haskell</a>'s built-in Const functor. You can just skip the languages you don't care about.
</p>
<h3 id="050cc4bc478f449ca11c28a83f8a2fda">
C# Const class <a href="#050cc4bc478f449ca11c28a83f8a2fda">#</a>
</h3>
<p>
While C# supports <a href="https://learn.microsoft.com/dotnet/csharp/language-reference/builtin-types/record">records</a>, and you can implement Const as one, I here present it as a full-fledged class. For readers who may not be that familiar with modern C#, a normal class may be more recognizable.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Const</span><<span style="color:#2b91af;">T1</span>, <span style="color:#2b91af;">T2</span>>
{
<span style="color:blue;">public</span> <span style="color:#2b91af;">T1</span> Value { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> <span style="color:#2b91af;">Const</span>(<span style="color:#2b91af;">T1</span> <span style="font-weight:bold;color:#1f377f;">value</span>)
{
Value = <span style="font-weight:bold;color:#1f377f;">value</span>;
}
<span style="color:blue;">public</span> <span style="color:#2b91af;">Const</span><<span style="color:#2b91af;">T1</span>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#74531f;">Select</span><<span style="color:#2b91af;">TResult</span>>(<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T2</span>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">selector</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">Const</span><<span style="color:#2b91af;">T1</span>, <span style="color:#2b91af;">TResult</span>>(Value);
}
<span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">Equals</span>(<span style="color:blue;">object</span> <span style="font-weight:bold;color:#1f377f;">obj</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">obj</span> <span style="color:blue;">is</span> <span style="color:#2b91af;">Const</span><<span style="color:#2b91af;">T1</span>, <span style="color:#2b91af;">T2</span>> <span style="font-weight:bold;color:#1f377f;">@const</span> &&
<span style="color:#2b91af;">EqualityComparer</span><<span style="color:#2b91af;">T1</span>>.Default.<span style="font-weight:bold;color:#74531f;">Equals</span>(Value, <span style="font-weight:bold;color:#1f377f;">@const</span>.Value);
}
<span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">int</span> <span style="font-weight:bold;color:#74531f;">GetHashCode</span>()
{
<span style="font-weight:bold;color:#8f08c4;">return</span> -1584136870 + <span style="color:#2b91af;">EqualityComparer</span><<span style="color:#2b91af;">T1</span>>.Default.<span style="font-weight:bold;color:#74531f;">GetHashCode</span>(Value);
}
}</pre>
</p>
<p>
The point of the Const functor is to make a constant value look like a functor; that is, <a href="https://bartoszmilewski.com/2014/01/14/functors-are-containers/">a container</a> that you can map from one type to another. The difference from the Identity functor is that Const doesn't allow you to map the constant. Rather, it cheats and pretends to have a mappable type that, however, has no value associated with it; a <a href="https://wiki.haskell.org/Phantom_type">phantom type</a>.
</p>
<p>
In <code><span style="color:#2b91af;">Const</span><<span style="color:#2b91af;">T1</span>, <span style="color:#2b91af;">T2</span>></code>, the <code>T2</code> type parameter is the 'pretend' type. While the class contains a <code>T1</code> value, it contains no <code>T2</code> value. The <code>Select</code> method, on the other hand, maps <code>T2</code> to <code>TResult</code>. The operation is close to being a <a href="https://en.wikipedia.org/wiki/NOP_(code)">no-op</a>, but still not quite. While it doesn't do anything particularly practical, it <em>does</em> change the type of the returned value.
</p>
<p>
Here's a simple example of using the <code>Select</code> method:
</p>
<p>
<pre><span style="color:#2b91af;">Const</span><<span style="color:blue;">string</span>, <span style="color:blue;">double</span>> <span style="font-weight:bold;color:#1f377f;">c</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">Const</span><<span style="color:blue;">string</span>, <span style="color:blue;">int</span>>(<span style="color:#a31515;">"foo"</span>).<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">i</span> => <span style="color:#2b91af;">Math</span>.<span style="color:#74531f;">Sqrt</span>(<span style="font-weight:bold;color:#1f377f;">i</span>));</pre>
</p>
<p>
The new <code>c</code> value <em>also</em> contains <code>"foo"</code>. Only its type has changed.
</p>
<p>
If you find this peculiar, think of it as similar to mapping an empty list, or an empty Maybe value. In those cases, too, no <em>values</em> change; only the type changes. The difference between empty Maybe objects or empty lists, and the Const functor is that Const isn't empty. There <em>is</em> a value; it's just not the value being mapped.
</p>
<h3 id="3262b7a3818d46bca452500138f776b2">
Functor laws <a href="#3262b7a3818d46bca452500138f776b2">#</a>
</h3>
<p>
Although the Const functor doesn't really do anything, it still obeys the functor laws. To illustrate it (but not to prove it), here's an <a href="https://fscheck.github.io/FsCheck/">FsCheck</a> property that exercises the first functor law:
</p>
<p>
<pre>[<span style="color:#2b91af;">Property</span>(QuietOnSuccess = <span style="color:blue;">true</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">ConstObeysFirstFunctorLaw</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">i</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">left</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">Const</span><<span style="color:blue;">int</span>, <span style="color:blue;">string</span>>(<span style="font-weight:bold;color:#1f377f;">i</span>);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">right</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">Const</span><<span style="color:blue;">int</span>, <span style="color:blue;">string</span>>(<span style="font-weight:bold;color:#1f377f;">i</span>).<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">x</span> => <span style="font-weight:bold;color:#1f377f;">x</span>);
<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Equal</span>(<span style="font-weight:bold;color:#1f377f;">left</span>, <span style="font-weight:bold;color:#1f377f;">right</span>);
}</pre>
</p>
<p>
If you think it over for a minute, this makes sense. The test creates a <code><span style="color:#2b91af;">Const</span><<span style="color:blue;">int</span>, <span style="color:blue;">string</span>></code> that contains the integer <code>i</code>, and then proceeds to map <em>the string that isn't there</em> to 'itself'. Clearly, this doesn't change the <code>i</code> value contained in the <code><span style="color:#2b91af;">Const</span><<span style="color:blue;">int</span>, <span style="color:blue;">string</span>></code> container.
</p>
<p>
In the same spirit, a property demonstrates the second functor law:
</p>
<p>
<pre>[<span style="color:#2b91af;">Property</span>(QuietOnSuccess = <span style="color:blue;">true</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">ConstObeysSecondFunctorLaw</span>(
<span style="color:#2b91af;">Func</span><<span style="color:blue;">string</span>, <span style="color:blue;">byte</span>> <span style="font-weight:bold;color:#1f377f;">f</span>,
<span style="color:#2b91af;">Func</span><<span style="color:blue;">int</span>, <span style="color:blue;">string</span>> <span style="font-weight:bold;color:#1f377f;">g</span>,
<span style="color:blue;">short</span> <span style="font-weight:bold;color:#1f377f;">s</span>)
{
<span style="color:#2b91af;">Const</span><<span style="color:blue;">short</span>, <span style="color:blue;">byte</span>> <span style="font-weight:bold;color:#1f377f;">left</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">Const</span><<span style="color:blue;">short</span>, <span style="color:blue;">int</span>>(<span style="font-weight:bold;color:#1f377f;">s</span>).<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">g</span>).<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">f</span>);
<span style="color:#2b91af;">Const</span><<span style="color:blue;">short</span>, <span style="color:blue;">byte</span>> <span style="font-weight:bold;color:#1f377f;">right</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">Const</span><<span style="color:blue;">short</span>, <span style="color:blue;">int</span>>(<span style="font-weight:bold;color:#1f377f;">s</span>).<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">x</span> => <span style="font-weight:bold;color:#1f377f;">f</span>(<span style="font-weight:bold;color:#1f377f;">g</span>(<span style="font-weight:bold;color:#1f377f;">x</span>)));
<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Equal</span>(<span style="font-weight:bold;color:#1f377f;">left</span>, <span style="font-weight:bold;color:#1f377f;">right</span>);
}</pre>
</p>
<p>
Again, the same kind of almost-no-op takes place. The <code>g</code> function first changes the <code>int</code> type to <code>string</code>, and then <code>f</code> changes the <code>string</code> type to <code>byte</code>, but no <em>value</em> ever changes; only the second type parameter. Thus, <code>left</code> and <code>right</code> remain equal, since they both contain the same value <code>s</code>.
</p>
<h3 id="ca40bd6e23794a0b9de36b0835dce6cb">
F# Const <a href="#ca40bd6e23794a0b9de36b0835dce6cb">#</a>
</h3>
<p>
In F# we may <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatically</a> express Const as a single-case union:
</p>
<p>
<pre><span style="color:blue;">type</span> <span style="color:#2b91af;">Const</span><<span style="color:#2b91af;">'v</span>, <span style="color:#2b91af;">'a</span>> = <span style="color:#2b91af;">Const</span> <span style="color:blue;">of</span> <span style="color:#2b91af;">'v</span></pre>
</p>
<p>
Here I've chosen to name the first type parameter <code>'v</code> (for <em>value</em>) in order to keep the 'functor type parameter' name <code>'a</code>. This enables me to meaningfully annotate the functor mapping function with the type <code><span style="color:#2b91af;">'a</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">'b</span></code>:
</p>
<p>
<pre><span style="color:blue;">module</span> <span style="color:#2b91af;">Const</span> =
<span style="color:blue;">let</span> <span style="color:#74531f;">get</span> (<span style="color:#2b91af;">Const</span> <span style="font-weight:bold;color:#1f377f;">x</span>) = <span style="font-weight:bold;color:#1f377f;">x</span>
<span style="color:blue;">let</span> <span style="color:#74531f;">map</span> (<span style="color:#74531f;">f</span> : <span style="color:#2b91af;">'a</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">'b</span>) (<span style="color:#2b91af;">Const</span> <span style="font-weight:bold;color:#1f377f;">x</span> : <span style="color:#2b91af;">Const</span><<span style="color:#2b91af;">'v</span>, <span style="color:#2b91af;">'a</span>>) : <span style="color:#2b91af;">Const</span><<span style="color:#2b91af;">'v</span>, <span style="color:#2b91af;">'b</span>> = <span style="color:#2b91af;">Const</span> <span style="font-weight:bold;color:#1f377f;">x</span></pre>
</p>
<p>
Usually, you don't need to annotate F# functions like <code>map</code>, but in this case I added explicit types in order to make it a recognizable functor map.
</p>
<p>
I could also have defined <code>map</code> like this:
</p>
<p>
<pre><span style="color:green;">// 'a -> Const<'b,'c> -> Const<'b,'d></span>
<span style="color:blue;">let</span> <span style="color:#74531f;">map</span> <span style="color:#1f377f;">f</span> (<span style="color:#2b91af;">Const</span> <span style="font-weight:bold;color:#1f377f;">x</span>) = <span style="color:#2b91af;">Const</span> <span style="font-weight:bold;color:#1f377f;">x</span></pre>
</p>
<p>
This still works, but is less recognizable as a functor map, since <code>f</code> may be any <code>'a</code>. Notice that if type inference is left to its own devices, it names the input type <code>Const<'b,'c></code> and the return type <code>Const<'b,'d></code>. This also means that if you want to supply <code>f</code> as a mapping function, this is legal, because we may consider <code>'a ~ 'c -> 'd</code>. It's still a functor map, but a less familiar representation.
</p>
<p>
Similar to the above C# code, two FsCheck properties demonstrate that the <code>Const</code> type obeys the functor laws.
</p>
<p>
<pre>[<<span style="color:#2b91af;">Property</span>(QuietOnSuccess = <span style="color:blue;">true</span>)>]
<span style="color:blue;">let</span> <span style="color:#74531f;">``Const obeys first functor law``</span> (<span style="font-weight:bold;color:#1f377f;">i</span> : <span style="color:#2b91af;">int</span>) =
<span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">left</span> = <span style="color:#2b91af;">Const</span> <span style="font-weight:bold;color:#1f377f;">i</span>
<span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">right</span> = <span style="color:#2b91af;">Const</span> <span style="font-weight:bold;color:#1f377f;">i</span> |> <span style="color:#2b91af;">Const</span>.<span style="color:#74531f;">map</span> <span style="color:#74531f;">id</span>
<span style="font-weight:bold;color:#1f377f;">left</span> =! <span style="font-weight:bold;color:#1f377f;">right</span>
[<<span style="color:#2b91af;">Property</span>(QuietOnSuccess = <span style="color:blue;">true</span>)>]
<span style="color:blue;">let</span> <span style="color:#74531f;">``Const obeys second functor law``</span> (<span style="color:#74531f;">f</span> : <span style="color:#2b91af;">string</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">byte</span>) (<span style="color:#74531f;">g</span> : <span style="color:#2b91af;">int</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">string</span>) (<span style="font-weight:bold;color:#1f377f;">s</span> : <span style="color:#2b91af;">int16</span>) =
<span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">left</span> = <span style="color:#2b91af;">Const</span> <span style="font-weight:bold;color:#1f377f;">s</span> |> <span style="color:#2b91af;">Const</span>.<span style="color:#74531f;">map</span> <span style="color:#74531f;">g</span> |> <span style="color:#2b91af;">Const</span>.<span style="color:#74531f;">map</span> <span style="color:#74531f;">f</span>
<span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">right</span> = <span style="color:#2b91af;">Const</span> <span style="font-weight:bold;color:#1f377f;">s</span> |> <span style="color:#2b91af;">Const</span>.<span style="color:#74531f;">map</span> (<span style="color:#74531f;">g</span> >> <span style="color:#74531f;">f</span>)
<span style="font-weight:bold;color:#1f377f;">left</span> =! <span style="font-weight:bold;color:#1f377f;">right</span></pre>
</p>
<p>
The assertions use <a href="https://github.com/SwensenSoftware/unquote">Unquote</a>'s <code>=!</code> operator, which I usually read as <em>should equal</em> or <em>must equal</em>.
</p>
<h3 id="9474bc7665ed4f1da688dbb2484ccbf9">
Haskell Const <a href="#9474bc7665ed4f1da688dbb2484ccbf9">#</a>
</h3>
<p>
The Haskell <a href="https://hackage.haskell.org/package/base">base</a> library already comes with a <a href="https://hackage.haskell.org/package/base/docs/Control-Applicative.html#t:Const">Const</a> <code>newtype</code>.
</p>
<p>
You can easily create a new <code>Const</code> value:
</p>
<p>
<pre>ghci> Const "foo"
Const "foo"</pre>
</p>
<p>
If you inquire about its type, GHCi will tell you in a rather verbose way that the first type parameter is <code>String</code>, but the second may be any type <code>b</code>:
</p>
<p>
<pre>ghci> :t Const "foo"
Const "foo" :: forall {k} {b :: k}. Const String b</pre>
</p>
<p>
You can also map by 'incrementing' its non-existent second value:
</p>
<p>
<pre>ghci> (+1) <$> Const "foo"
Const "foo"
ghci> :t (+1) <$> Const "foo"
(+1) <$> Const "foo" :: Num b => Const String b</pre>
</p>
<p>
While the value remains <code>Const "foo"</code>, the type of <code>b</code> is now constrained to a <a href="https://hackage.haskell.org/package/base/docs/Prelude.html#t:Num">Num</a> instance, which follows from the use of the <code>+</code> operator.
</p>
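<p>
The same observation can be made in a source file. The following is a minimal sketch that mirrors the C# <code>Select</code> example with <code>Math.Sqrt</code>; the names <code>original</code> and <code>mapped</code> are invented for the illustration. Only the phantom type parameter changes, while <code>getConst</code> returns the same string before and after the mapping.
</p>

```haskell
import Data.Functor.Const (Const (..))

-- A constant String that pretends to contain an Int.
original :: Const String Int
original = Const "foo"

-- Mapping changes the phantom type from Int to Double.
-- The function is never applied, since there's no Int to apply it to.
mapped :: Const String Double
mapped = fmap (sqrt . fromIntegral) original

-- getConst unwraps the constant, regardless of the phantom type.
unwrapped :: (String, String)
unwrapped = (getConst original, getConst mapped)
```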
<h3 id="83eea33a91f84b2c9ff4d364b0c868d6">
Functor law proofs <a href="#83eea33a91f84b2c9ff4d364b0c868d6">#</a>
</h3>
<p>
If you look at the source code for the <code>Functor</code> instance, it looks much like its F# equivalent:
</p>
<p>
<pre>instance Functor (Const m) where
fmap _ (Const v) = Const v</pre>
</p>
<p>
We can use equational reasoning with <a href="https://bartoszmilewski.com/2015/01/20/functors/">the notation that Bartosz Milewski uses</a> to prove that both functor laws hold, starting with the first:
</p>
<p>
<pre> fmap id (Const x)
= { definition of fmap }
Const x</pre>
</p>
<p>
Clearly, there's not much to that part. What about the second functor law?
</p>
<p>
<pre> fmap (g . f) (Const x)
= { definition of fmap }
Const x
= { definition of fmap }
fmap g (Const x)
= { definition of fmap }
fmap g (fmap f (Const x))
= { definition of composition }
(fmap g . fmap f) (Const x)</pre>
</p>
<p>
While that proof takes a few more steps, most are as trivial as the first proof.
</p>
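<p>
The proofs settle the matter, but in the spirit of the C# and F# properties, here's a hedged sketch that spot-checks both laws for a handful of sample values. It doesn't use a property-based testing library, and the names <code>firstLawHolds</code>, <code>secondLawHolds</code>, and <code>lawsHoldForSamples</code> are made up for the occasion. Like the properties, it illustrates the laws without proving them.
</p>

```haskell
import Data.Functor.Const (Const (..))

-- First functor law: mapping id changes nothing.
firstLawHolds :: Int -> Bool
firstLawHolds i = fmap id c == c
  where
    c = Const i :: Const Int String

-- Second functor law: mapping a composition equals composing the maps.
secondLawHolds :: Int -> Bool
secondLawHolds s = fmap (f . g) c == (fmap f . fmap g) c
  where
    c = Const s :: Const Int Int
    g = show :: Int -> String
    f = length :: String -> Int

-- A few arbitrary sample values; this is a spot check, not a proof.
lawsHoldForSamples :: Bool
lawsHoldForSamples = all firstLawHolds samples && all secondLawHolds samples
  where
    samples = [-3 .. 3]
```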
<h3 id="e71a037a6f3f491ca3f755ce31809123">
Conclusion <a href="#e71a037a6f3f491ca3f755ce31809123">#</a>
</h3>
<p>
The Const functor is hardly a programming construct you'll use in your day-to-day work, but the fact that it exists can be used to generalize some results that involve functors. Now, whenever you have a result that involves a functor, you know that it also generalizes to constant values, just like the Identity functor teaches us that 'naked' type parameters can be thought of as functors.
</p>
<p>
To give a few examples, we may already know that <code>Tree<T></code> (C# syntax) is a functor, but a 'naked' generic type parameter <code>T</code> also gives rise to a functor (Identity), as does a non-generic type (such as <code>int</code> or <code>MyCustomClass</code>).
</p>
<p>
Thus, if you have a function that operates on any functor, it may also, conceivably, operate on data structures that have non-generic types. This may for example be interesting when we begin to consider <a href="/2022/07/11/functor-relationships">how functors compose</a>.
</p>
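<p>
As one possible illustration, consider a constant label paired with a list. The following sketch assumes <code>Data.Functor.Product</code> from <em>base</em> as the way of combining the two; the names <code>Labelled</code>, <code>labelled</code>, and <code>doubled</code> are hypothetical. Because both <code>Const String</code> and <code>[]</code> are functors, the pair is a functor, too.
</p>

```haskell
import Data.Functor.Const (Const (..))
import Data.Functor.Product (Product (..))

-- A constant String label together with a list of values.
type Labelled = Product (Const String) []

labelled :: Labelled Int
labelled = Pair (Const "scores") [1, 2, 3]

-- fmap leaves the label alone and maps every list element.
doubled :: Labelled Int
doubled = fmap (* 2) labelled
```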
<p>
<strong>Next:</strong> <a href="/2021/07/19/the-state-functor">The State functor</a>.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.Das verflixte Hunde-Spielhttps://blog.ploeh.dk/2024/10/03/das-verflixte-hunde-spiel2024-10-03T17:41:00+00:00Mark Seemann
<div id="post">
<p>
<em>A puzzle kata, and a possible solution.</em>
</p>
<p>
When I was a boy I had a nine-piece puzzle that I'd been gifted by the Swiss branch of my family. It's called <em>Das verflixte Hunde-Spiel</em>, which means something like <em>the confounded dog game</em> in English. And while a puzzle with nine pieces doesn't sound like much, it is, in fact, incredibly difficult.
</p>
<p>
It's just a specific incarnation of a kind of game that you've almost certainly encountered, too.
</p>
<p>
<img src="/content/binary/hunde-spiel.jpg" alt="A picture of the box of the puzzle, together with the tiles spread out in unordered fashion.">
</p>
<p>
There are nine tiles, each with two dog heads and two dog ends. A dog may be coloured in one of four different patterns. The object of the game is to lay out the nine tiles in a 3x3 square so that all dog halves line up.
</p>
<h3 id="ddf5aa390eed4147a55a35e95803b6ad">
Game details <a href="#ddf5aa390eed4147a55a35e95803b6ad">#</a>
</h3>
<p>
The game is from 1979. Two of the tiles are identical, and, according to the information on the back of the box, two possible solutions exist. Described from top clockwise, the tiles are the following:
</p>
<ul>
<li>Brown head, grey head, umber tail, spotted tail</li>
<li>Brown head, spotted head, brown tail, umber tail</li>
<li>Brown head, spotted head, grey tail, umber tail</li>
<li>Brown head, spotted head, grey tail, umber tail</li>
<li>Brown head, umber head, spotted tail, grey tail</li>
<li>Grey head, brown head, spotted tail, umber tail</li>
<li>Grey head, spotted head, brown tail, umber tail</li>
<li>Grey head, umber head, brown tail, spotted tail</li>
<li>Grey head, umber head, grey tail, spotted tail</li>
</ul>
<p>
I've taken the liberty of using a shorthand for the patterns. The grey dogs are actually also spotted, but since there's only one grey pattern, the <em>grey</em> label is unambiguous. The dogs I've named <em>umber</em> are actually rather <em>burnt umber</em>, but that's too verbose for my tastes, so I just named them <em>umber</em>. Finally, the label <em>spotted</em> indicates dogs that are actually burnt umber with brown blotches.
</p>
<p>
Notice that there are two tiles with a brown head, a spotted head, a grey tail, and an umber tail.
</p>
<p>
The object of the game is to lay down the tiles in a 3x3 square so that all dogs fit. For further reference, I've numbered each position from one to nine like this:
</p>
<p>
<img src="/content/binary/numbered-3x3-tiles.png" alt="Nine tiles arranged in a three-by-three square, numbered from 1 to 9 from top left to bottom right.">
</p>
<p>
What makes the game hard? There are nine cards, so if you start with the upper left corner, you have nine choices. If you just randomly put down the tiles, you now have eight left for the top middle position, and so on. Standard combinatorics indicate that there are at least 9! = 362,880 permutations.
</p>
<p>
That's not the whole story, however, since you can rotate each tile in four different ways. You can rotate the first tile four ways, the second tile four ways, etc. for a total of 4<sup>9</sup> = 262,144 ways. Multiply these two numbers together, and you get 4<sup>9</sup>9! = 95,126,814,720 combinations. No wonder this puzzle is hard if there are only two solutions.
</p>
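<p>
If you'd like to double-check the arithmetic, a few lines of Haskell will do. The names below are invented for this sketch.
</p>

```haskell
-- Ways to order nine tiles: 9!
permutationCount :: Integer
permutationCount = product [1 .. 9]

-- Ways to rotate nine tiles, four orientations each: 4^9
rotationCount :: Integer
rotationCount = 4 ^ (9 :: Int)

-- Every ordering combined with every rotation.
configurationCount :: Integer
configurationCount = permutationCount * rotationCount
```

<p>
The three values evaluate to 362,880, 262,144, and 95,126,814,720, respectively.
</p>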
<p>
When analysed this way, however, there are actually 16 solutions, but that still makes it incredibly unlikely to arrive at a solution by chance. I'll get back to why there are 16 solutions later. For now, you should have enough information to try your hand with this game, if you'd like.
</p>
<p>
I found that the game made for an interesting <a href="/2020/01/13/on-doing-katas">kata</a>: Write a program that finds all possible solutions to the puzzle.
</p>
<p>
If you'd like to try your hand at this exercise, I suggest that you pause reading here.
</p>
<p>
In the rest of the article, I'll outline my first attempt. Spoiler alert: I'll also show one of the solutions.
</p>
<h3 id="113acd886fde4791b10c4a2b6f394216">
Types <a href="#113acd886fde4791b10c4a2b6f394216">#</a>
</h3>
<p>
When you program in <a href="https://www.haskell.org/">Haskell</a>, it's natural to start by defining some types.
</p>
<p>
<pre><span style="color:blue;">data</span> Half = Head | Tail <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Show</span>, <span style="color:#2b91af;">Eq</span>)
<span style="color:blue;">data</span> Pattern = Brown | Grey | Spotted | Umber <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Show</span>, <span style="color:#2b91af;">Eq</span>)
<span style="color:blue;">data</span> Tile = Tile {
<span style="color:#2b91af;">top</span> <span style="color:blue;">::</span> (<span style="color:blue;">Pattern</span>, <span style="color:blue;">Half</span>),
<span style="color:#2b91af;">right</span> <span style="color:blue;">::</span> (<span style="color:blue;">Pattern</span>, <span style="color:blue;">Half</span>),
<span style="color:#2b91af;">bottom</span> <span style="color:blue;">::</span> (<span style="color:blue;">Pattern</span>, <span style="color:blue;">Half</span>),
<span style="color:#2b91af;">left</span> <span style="color:blue;">::</span> (<span style="color:blue;">Pattern</span>, <span style="color:blue;">Half</span>) }
<span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Show</span>, <span style="color:#2b91af;">Eq</span>)</pre>
</p>
<p>
Each tile describes what you find on its <code>top</code>, <code>right</code> side, <code>bottom</code>, and <code>left</code> side.
</p>
<p>
We're also going to need a function to evaluate whether two halves match:
</p>
<p>
<pre><span style="color:#2b91af;">matches</span> <span style="color:blue;">::</span> (<span style="color:blue;">Pattern</span>, <span style="color:blue;">Half</span>) <span style="color:blue;">-></span> (<span style="color:blue;">Pattern</span>, <span style="color:blue;">Half</span>) <span style="color:blue;">-></span> <span style="color:#2b91af;">Bool</span>
matches (p1, h1) (p2, h2) = p1 == p2 && h1 /= h2</pre>
</p>
<p>
This function demands that the patterns match, but that the halves are opposites.
</p>
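<p>
A few sample calls may make the rule concrete. The following sketch repeats the two types and the <code>matches</code> function from above so that it compiles on its own; the <code>examples</code> list is added for illustration only.
</p>

```haskell
data Half = Head | Tail deriving (Show, Eq)

data Pattern = Brown | Grey | Spotted | Umber deriving (Show, Eq)

matches :: (Pattern, Half) -> (Pattern, Half) -> Bool
matches (p1, h1) (p2, h2) = p1 == p2 && h1 /= h2

-- Illustrative calls; only the first pair forms a whole dog.
examples :: [Bool]
examples =
  [ matches (Brown, Head) (Brown, Tail) -- same pattern, opposite halves
  , matches (Brown, Head) (Brown, Head) -- same pattern, but two heads
  , matches (Brown, Head) (Grey, Tail) -- opposite halves, different patterns
  ]
```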
<p>
You can use the <code>Tile</code> type and its constituents to define the nine tiles of the game:
</p>
<p>
<pre><span style="color:#2b91af;">tiles</span> <span style="color:blue;">::</span> [<span style="color:blue;">Tile</span>]
tiles =
[
Tile (Brown, Head) (Grey, Head) (Umber, Tail) (Spotted, Tail),
Tile (Brown, Head) (Spotted, Head) (Brown, Tail) (Umber, Tail),
Tile (Brown, Head) (Spotted, Head) (Grey, Tail) (Umber, Tail),
Tile (Brown, Head) (Spotted, Head) (Grey, Tail) (Umber, Tail),
Tile (Brown, Head) (Umber, Head) (Spotted, Tail) (Grey, Tail),
Tile (Grey, Head) (Brown, Head) (Spotted, Tail) (Umber, Tail),
Tile (Grey, Head) (Spotted, Head) (Brown, Tail) (Umber, Tail),
Tile (Grey, Head) (Umber, Head) (Brown, Tail) (Spotted, Tail),
Tile (Grey, Head) (Umber, Head) (Grey, Tail) (Spotted, Tail)
]</pre>
</p>
<p>
Because I'm the neatnik that I am, I've sorted the tiles in lexicographic order, but the solution below doesn't rely on that.
</p>
<h3 id="1568796e41484e21bae6bb5734f996eb">
Brute force doesn't work <a href="#1568796e41484e21bae6bb5734f996eb">#</a>
</h3>
<p>
Before I started, I cast around the internet to see if there was an appropriate algorithm for the problem. While I found a few answers on <a href="https://stackoverflow.com/">Stack Overflow</a>, none of them gave me any indication that a sophisticated algorithm was available. (Even so, there may be, and I just didn't find it.)
</p>
<p>
It seems clear, however, that you can implement some kind of recursive search-tree algorithm that cuts a branch off as soon as it realizes that it doesn't work. I'll get back to that later, so let's leave that for now.
</p>
<p>
Since I'd planned on writing the code in Haskell, I decided to first try something that might look like brute force. Because Haskell is lazily evaluated, you can sometimes get away with techniques that look wasteful when you're used to strict/eager evaluation. In this case, it turned out to not work, but it's often quicker to just make the attempt than trying to analyze the problem.
</p>
<p>
As already outlined, I first attempted a purely brute-force solution, betting that Haskell's lazy evaluation would be enough to skip over the unnecessary calculations:
</p>
<p>
<pre>allRotationsOf9 = replicateM 9 [0..3]
<span style="color:#2b91af;">allRotations</span> <span style="color:blue;">::</span> [<span style="color:blue;">Tile</span>] <span style="color:blue;">-></span> [[<span style="color:blue;">Tile</span>]]
allRotations ts = <span style="color:blue;">fmap</span> (\rs -> (\(r, t) -> rotations t !! r) <$> <span style="color:blue;">zip</span> rs ts) allRotationsOf9
<span style="color:#2b91af;">allConfigurations</span> <span style="color:blue;">::</span> [[<span style="color:blue;">Tile</span>]]
allConfigurations = permutations tiles >>= allRotations
solutions = <span style="color:blue;">filter</span> isSolution allConfigurations</pre>
</p>
<p>
My idea with the <code>allConfigurations</code> value was that it's supposed to enumerate all 95 billion combinations. Whether it actually does that, I was never able to verify, because if I try to run that code, my poor laptop runs for a couple of hours before it eventually runs out of memory. In other words, the GHCi process crashes.
</p>
<p>
I haven't shown <code>isSolution</code> or <code>rotations</code>, because I consider the implementations irrelevant. This attempt doesn't work anyway.
</p>
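<p>
For readers who want something concrete to experiment with, here's one way <code>rotations</code> <em>could</em> look. This is a guess based on how the code above indexes into it with <code>!!</code> and a number from 0 to 3, not the implementation from the repository, and <code>rotateClockwise</code> is a hypothetical helper. The types are repeated so that the sketch compiles on its own.
</p>

```haskell
data Half = Head | Tail deriving (Show, Eq)

data Pattern = Brown | Grey | Spotted | Umber deriving (Show, Eq)

data Tile = Tile
  { top :: (Pattern, Half)
  , right :: (Pattern, Half)
  , bottom :: (Pattern, Half)
  , left :: (Pattern, Half)
  }
  deriving (Show, Eq)

-- Hypothetical helper: turn a tile a quarter turn clockwise,
-- so that what was on the left ends up on top, and so on.
rotateClockwise :: Tile -> Tile
rotateClockwise (Tile t r b l) = Tile l t r b

-- Assumed shape of rotations: the four orientations of a tile,
-- starting with the tile as given.
rotations :: Tile -> [Tile]
rotations = take 4 . iterate rotateClockwise
```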
<p>
Now that I look at it, it's quite clear why this isn't a good strategy. There's little to be gained from lazy evaluation when the final step just applies <code>filter</code> to a list. Even with lazy evaluation, the code still has to run through all 95 billion combinations.
</p>
<p>
Things might have been different if I just had to find one solution. With a little luck, it might be that the first solution appears after, say, a hundred million iterations, and lazy evaluation would then have meant that the remaining combinations would never run. Not so here, but hindsight is 20-20.
</p>
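<p>
The difference between finding one result and finding all of them can be shown with a toy example that has nothing to do with the puzzle. In this sketch, the names <code>firstMatch</code> and <code>allMatchesUpTo</code> are invented. Thanks to lazy evaluation, <code>find</code> stops at the first hit, even on an infinite list, while <code>filter</code> has to visit every element before the full result is known.
</p>

```haskell
import Data.List (find)

-- Stops as soon as a match turns up, so an infinite list is fine.
firstMatch :: Maybe Integer
firstMatch = find (\n -> n * n > 1000000) [1 ..]

-- Collecting every match means traversing the whole (finite) list.
allMatchesUpTo :: Integer -> [Integer]
allMatchesUpTo limit = filter (\n -> n `mod` 1000 == 0) [1 .. limit]
```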
<h3 id="93754ac1a84e4a42b87253f1ffded97b">
Search tree <a href="#93754ac1a84e4a42b87253f1ffded97b">#</a>
</h3>
<p>
Back to the search tree idea. It goes like this: Start from the top left position and pick a random tile and rotation. Now pick an arbitrary tile <em>that fits</em> and place it to the right of it, and so on. As far as I can tell, you can always place the first four cards, but from there, you can easily encounter a combination that allows no further tiles. Here's an example:
</p>
<p>
<img src="/content/binary/hunde-spiel-no-fifth-tile.jpg" alt="Four matching tiles put down, with the remaining five tiles arranged to show that none of them fit the fifth position.">
</p>
<p>
None of the remaining five tiles fit in the fifth position. This means that we don't have to do <em>any</em> permutations that involve these four tiles in that combination. While the algorithm has to search through all five remaining tiles and rotations to discover that none fit in position 5, once it knows that, it doesn't have to go through the remaining four positions. That's 4<sup>4</sup>4! = 6,144 combinations that it can skip every time it discovers an impossible beginning. That doesn't sound like that much, but if we assume that this happens more often than not, it's still an improvement by orders of magnitude.
</p>
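<p>
As a quick check of that number, here's the calculation as a sketch. The helper name <code>arrangements</code> is hypothetical.
</p>

```haskell
-- The number of ways to lay out n tiles in n positions when each
-- tile can also take one of four rotations: 4^n * n!
arrangements :: Integer -> Integer
arrangements n = 4 ^ n * product [1 .. n]

-- Combinations skipped whenever the four positions after
-- position 5 never need to be examined.
skippedPerDeadEnd :: Integer
skippedPerDeadEnd = arrangements 4
```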
<p>
We may think of this algorithm as constructing a search tree, but immediately pruning all branches that aren't viable, as close to the root as possible.
</p>
<h3 id="76d4e7e1898a4da89de0d7afabbdc4e8">
Matches <a href="#76d4e7e1898a4da89de0d7afabbdc4e8">#</a>
</h3>
<p>
Before we get to the algorithm proper we need a few simple helper functions. One kind of function is a predicate that determines if a particular tile can occupy a given position. Since we may place any tile in any rotation in the first position, we don't need to write a predicate for that, but if we wanted to generalize, <code>const True</code> would do.
</p>
<p>
Whether or not we can place a given tile in the second position depends exclusively on the tile in the first position:
</p>
<p>
<pre><span style="color:#2b91af;">tile2Matches</span> <span style="color:blue;">::</span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">Bool</span>
tile2Matches t1 t2 = right t1 `matches` left t2</pre>
</p>
<p>
If the <code>right</code> dog part of the first tile <code>matches</code> the <code>left</code> part of the second tile, the return value is <code>True</code>; otherwise, it's <code>False</code>. Note that I'm using infix notation for <code>matches</code>. I could also have written the function as
</p>
<p>
<pre><span style="color:#2b91af;">tile2Matches</span> <span style="color:blue;">::</span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">Bool</span>
tile2Matches t1 t2 = matches (right t1) (left t2)</pre>
</p>
<p>
but it doesn't read as well.
</p>
<p>
In any case, the corresponding matching functions for the third and fourth tile look similar:
</p>
<p>
<pre><span style="color:#2b91af;">tile3Matches</span> <span style="color:blue;">::</span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">Bool</span>
tile3Matches t2 t3 = right t2 `matches` left t3
<span style="color:#2b91af;">tile4Matches</span> <span style="color:blue;">::</span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">Bool</span>
tile4Matches t1 t4 = bottom t1 `matches` top t4</pre>
</p>
<p>
Notice that <code>tile4Matches</code> compares the fourth tile with the first tile rather than the third tile, because position 4 is directly beneath position 1, rather than to the right of position 3 (cf. the grid above). For that reason it also compares the <code>bottom</code> of tile 1 to the <code>top</code> of the fourth tile.
</p>
<p>
The matcher for the fifth tile is different:
</p>
<p>
<pre><span style="color:#2b91af;">tile5Matches</span> <span style="color:blue;">::</span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">Bool</span>
tile5Matches t2 t4 t5 = bottom t2 `matches` top t5 && right t4 `matches` left t5</pre>
</p>
<p>
This is the first predicate that depends on two, rather than one, previous tiles. In position 5 we need to examine both the tile in position 2 and the one in position 4.
</p>
<p>
The same is true for position 6:
</p>
<p>
<pre><span style="color:#2b91af;">tile6Matches</span> <span style="color:blue;">::</span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">Bool</span>
tile6Matches t3 t5 t6 = bottom t3 `matches` top t6 && right t5 `matches` left t6</pre>
</p>
<p>
but then the matcher for position 7 looks like the predicate for position 4:
</p>
<p>
<pre><span style="color:#2b91af;">tile7Matches</span> <span style="color:blue;">::</span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">Bool</span>
tile7Matches t4 t7 = bottom t4 `matches` top t7</pre>
</p>
<p>
This is, of course, because the tile in position 7 only has to consider the tile in position 4. Finally, not surprisingly, the two remaining predicates look like something we've already seen:
</p>
<p>
<pre><span style="color:#2b91af;">tile8Matches</span> <span style="color:blue;">::</span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">Bool</span>
tile8Matches t5 t7 t8 = bottom t5 `matches` top t8 && right t7 `matches` left t8
<span style="color:#2b91af;">tile9Matches</span> <span style="color:blue;">::</span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">Bool</span>
tile9Matches t6 t8 t9 = bottom t6 `matches` top t9 && right t8 `matches` left t9</pre>
</p>
<p>
You may suggest that it'd be possible to reduce the number of predicates. After all, there's effectively only three different predicates: One that only looks at the tile to the left, one that only looks at the tile above, and one that looks both to the left and above.
</p>
<p>
Indeed, I could have boiled it down to just three functions:
</p>
<p>
<pre><span style="color:#2b91af;">matchesHorizontally</span> <span style="color:blue;">::</span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">Bool</span>
matchesHorizontally x y = right x `matches` left y
<span style="color:#2b91af;">matchesVertically</span> <span style="color:blue;">::</span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">Bool</span>
matchesVertically x y = bottom x `matches` top y
<span style="color:#2b91af;">matchesBoth</span> <span style="color:blue;">::</span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">Bool</span>
matchesBoth x y z = matchesVertically x z && matchesHorizontally y z</pre>
</p>
<p>
but I now run the risk of calling the wrong predicate from my implementation of the algorithm. As you'll see, I'll call each predicate by name at each appropriate step, but if I had only these three functions, there's a risk that I might mistakenly use <code>matchesHorizontally</code> when I should have used <code>matchesVertically</code>, or vice versa. Reducing eight one-liners to three one-liners doesn't really seem to warrant the risk.
</p>
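<p>
A possible middle ground, which is not what the code in this article does, is to keep the eight position-specific names but define them as aliases of the three general predicates. The following sketch uses simplified stand-ins for <code>Tile</code> and <code>matches</code> (edges are plain numbers that match when they sum to zero), so treat those two definitions as assumptions.
</p>

```haskell
-- Simplified stand-ins: an edge is a number, and two edges
-- match when they sum to zero (e.g. 2 and -2).
data Tile = Tile { top :: Int, right :: Int, bottom :: Int, left :: Int }
  deriving (Eq, Show)

matches :: Int -> Int -> Bool
matches x y = x + y == 0

matchesHorizontally :: Tile -> Tile -> Bool
matchesHorizontally x y = right x `matches` left y

matchesVertically :: Tile -> Tile -> Bool
matchesVertically x y = bottom x `matches` top y

-- First argument: the tile above. Second: the tile to the left.
matchesBoth :: Tile -> Tile -> Tile -> Bool
matchesBoth x y z = matchesVertically x z && matchesHorizontally y z

-- Position-specific names, so that call sites still read by position.
tile2Matches, tile3Matches, tile4Matches, tile7Matches :: Tile -> Tile -> Bool
tile2Matches = matchesHorizontally
tile3Matches = matchesHorizontally
tile4Matches = matchesVertically
tile7Matches = matchesVertically

tile5Matches, tile6Matches, tile8Matches, tile9Matches
  :: Tile -> Tile -> Tile -> Bool
tile5Matches = matchesBoth
tile6Matches = matchesBoth
tile8Matches = matchesBoth
tile9Matches = matchesBoth
```

<p>
This only moves the risk of mixing up horizontal and vertical from the call sites to the alias definitions, so whether it's an improvement is debatable.
</p>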
<h3 id="01bf66f9df1947d296a004e93638450d">
Rotations <a href="#01bf66f9df1947d296a004e93638450d">#</a>
</h3>
<p>
In addition to examining whether a given tile fits in a given position, we also need to be able to rotate any tile:
</p>
<p>
<pre><span style="color:#2b91af;">rotateClockwise</span> <span style="color:blue;">::</span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:blue;">Tile</span>
rotateClockwise (Tile t r b l) = Tile l t r b
<span style="color:#2b91af;">rotateCounterClockwise</span> <span style="color:blue;">::</span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:blue;">Tile</span>
rotateCounterClockwise (Tile t r b l) = Tile r b l t
<span style="color:#2b91af;">upend</span> <span style="color:blue;">::</span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> <span style="color:blue;">Tile</span>
upend (Tile t r b l) = Tile b l t r</pre>
</p>
<p>
What is really needed, it turns out, is to enumerate all four rotations of a tile:
</p>
<p>
<pre><span style="color:#2b91af;">rotations</span> <span style="color:blue;">::</span> <span style="color:blue;">Tile</span> <span style="color:blue;">-></span> [<span style="color:blue;">Tile</span>]
rotations t = [t, rotateClockwise t, upend t, rotateCounterClockwise t]</pre>
</p>
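<p>
A few properties of these functions are easy to state in code. The sketch below repeats the rotation functions with a simplified stand-in <code>Tile</code> type (edges are just characters), and adds a hypothetical alternative, <code>rotations'</code>, that produces the same list by repeated clockwise rotation.
</p>

```haskell
-- Simplified stand-in: each edge is just a character.
data Tile = Tile { top :: Char, right :: Char, bottom :: Char, left :: Char }
  deriving (Eq, Show)

rotateClockwise :: Tile -> Tile
rotateClockwise (Tile t r b l) = Tile l t r b

rotateCounterClockwise :: Tile -> Tile
rotateCounterClockwise (Tile t r b l) = Tile r b l t

upend :: Tile -> Tile
upend (Tile t r b l) = Tile b l t r

rotations :: Tile -> [Tile]
rotations t = [t, rotateClockwise t, upend t, rotateCounterClockwise t]

-- Alternative formulation: two quarter turns clockwise equal upend,
-- and three equal one quarter turn counter-clockwise.
rotations' :: Tile -> [Tile]
rotations' = take 4 . iterate rotateClockwise

sample :: Tile
sample = Tile 'a' 'b' 'c' 'd'
```

<p>
For any tile, <code>rotations</code> and <code>rotations'</code> return the same four tiles in the same order, and a fourth clockwise rotation brings the tile back to where it started.
</p>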
<p>
Since this, like everything else here, is a pure function, I experimented with defining a 'memoized tile' type that embedded all four rotations upon creation, so that the algorithm doesn't need to call the <code>rotations</code> function millions of times, but I couldn't measure any discernible performance improvement from it. There's no reason to make things more complicated than they need to be, so I didn't keep that change. (Since I do, however, <a href="https://stackoverflow.blog/2022/12/19/use-git-tactically/">use Git tactically</a>, I did, of course, <a href="https://git-scm.com/docs/git-stash">stash</a> the experiment.)
</p>
<h3 id="8e7ce1c1bb9c4403abd72d5d3d87bf02">
Permutations <a href="#8e7ce1c1bb9c4403abd72d5d3d87bf02">#</a>
</h3>
<p>
While I couldn't make things work by enumerating all 95 billion combinations, enumerating all 362,880 permutations of non-rotated tiles is well within the realm of the possible:
</p>
<p>
<pre><span style="color:#2b91af;">allPermutations</span> <span style="color:blue;">::</span> [(<span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>)]
allPermutations =
(\[t1, t2, t3, t4, t5, t6, t7, t8, t9] -> (t1, t2, t3, t4, t5, t6, t7, t8, t9))
<$> permutations tiles</pre>
</p>
<p>
Doing this in GHCi on my old laptop takes 300 milliseconds, which is good enough compared to what comes next.
</p>
<p>
This list value uses <a href="https://hackage.haskell.org/package/base/docs/Data-List.html#v:permutations">permutations</a> to enumerate all the permutations. You may already have noticed that it converts the result into a nine-tuple. The reason for that is that this enables the algorithm to pattern-match into specific positions without having to resort to the <a href="https://hackage.haskell.org/package/base/docs/Data-List.html#v:-33--33-">index operator</a>, which is both partial and requires iteration of the list to reach the indexed element. Granted, the list is only nine elements long, and often the algorithm will only need to index to the fourth or fifth element. On the other hand, it's going to do it <em>a lot</em>. Perhaps it's a premature optimization, but if it is, it's at least one that makes the code more, rather than less, readable.
</p>
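<p>
Strictly speaking, the lambda expression that pattern-matches on a nine-element list is itself partial, although that's harmless here because every permutation of nine tiles has exactly nine elements. If you'd rather have a total conversion, a sketch could look like the following. The names <code>Nine</code>, <code>toNine</code>, and <code>allPermutationsOf</code> are hypothetical, and the function is generic so that it can be tried with numbers instead of tiles.
</p>

```haskell
import Data.List (permutations)
import Data.Maybe (mapMaybe)

type Nine a = (a, a, a, a, a, a, a, a, a)

-- Total conversion: Nothing unless the list has exactly nine elements.
toNine :: [a] -> Maybe (Nine a)
toNine [t1, t2, t3, t4, t5, t6, t7, t8, t9] =
  Just (t1, t2, t3, t4, t5, t6, t7, t8, t9)
toNine _ = Nothing

-- All permutations of the input as nine-tuples. Inputs of any other
-- length yield the empty list instead of a pattern-match failure.
allPermutationsOf :: [a] -> [Nine a]
allPermutationsOf = mapMaybe toNine . permutations
```

<p>
With this definition, <code>allPermutationsOf [1 .. 9]</code> has 362,880 elements, while <code>allPermutationsOf [1, 2, 3]</code> is empty.
</p>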
<h3 id="3f0af3d6c91a4cd68b026a0ccf93a0e2">
Algorithm <a href="#3f0af3d6c91a4cd68b026a0ccf93a0e2">#</a>
</h3>
<p>
I found it easiest to begin at the 'bottom' of what is effectively a recursive algorithm, even though I didn't implement it that way. At the 'bottom', I imagine that I'm almost done: That I've found eight tiles that match, and now I only need to examine if I can rotate the final tile so that it matches:
</p>
<p>
<pre><span style="color:#2b91af;">solve9th</span> <span style="color:blue;">::</span> (a, b, c, d, e, <span style="color:blue;">Tile</span>, g, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>)
<span style="color:blue;">-></span> [(a, b, c, d, e, <span style="color:blue;">Tile</span>, g, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>)]
solve9th (t1, t2, t3, t4, t5, t6, t7, t8, t9) = <span style="color:blue;">do</span>
match <- <span style="color:blue;">filter</span> (tile9Matches t6 t8) $ rotations t9
<span style="color:blue;">return</span> (t1, t2, t3, t4, t5, t6, t7, t8, match)</pre>
</p>
<p>
Recalling that Haskell functions compose from right to left, the function starts by enumerating the four <code>rotations</code> of the ninth and final tile <code>t9</code>. It then filters those four rotations by the <code>tile9Matches</code> predicate.
</p>
<p>
The <code>match</code> value is a rotation of <code>t9</code> that matches <code>t6</code> and <code>t8</code>. Whenever <code>solve9th</code> finds such a match, it returns the entire nine-tuple, because the assumption is that the eight first tiles are already valid.
</p>
<p>
Notice that the function uses <code>do</code> notation in the list monad, so it's quite possible that the first <code>filter</code> expression produces no <code>match</code>. In that case, the second line of code never runs, and instead, the function returns the empty list.
</p>
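<p>
If the behaviour of <code>do</code> notation in the list monad is unfamiliar, a tiny example without tiles may help. The function names are made up for illustration.
</p>

```haskell
-- Keep only the even numbers, then pair each with its half.
halves :: [Int] -> [(Int, Int)]
halves xs = do
  n <- filter even xs    -- zero, one, or many matches
  return (n, n `div` 2)  -- runs once per match; never, if there are none

-- The same function without do notation. For lists, >>= applies the
-- function to every element and concatenates the resulting lists.
halves' :: [Int] -> [(Int, Int)]
halves' xs = filter even xs >>= \n -> return (n, n `div` 2)
```

<p>
Here, <code>halves [1, 2, 3, 4]</code> is <code>[(2,1),(4,2)]</code>, while <code>halves [1, 3, 5]</code> is the empty list, because the filter leaves nothing to bind to <code>n</code>.
</p>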
<p>
How do we find a tuple where the first eight elements are valid? Well, if we have seven valid tiles, we may consider the eighth and subsequently call <code>solve9th</code>:
</p>
<p>
<pre><span style="color:#2b91af;">solve8th</span> <span style="color:blue;">::</span> (a, b, c, d, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>)
<span style="color:blue;">-></span> [(a, b, c, d, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>)]
solve8th (t1, t2, t3, t4, t5, t6, t7, t8, t9) = <span style="color:blue;">do</span>
match <- <span style="color:blue;">filter</span> (tile8Matches t5 t7) $ rotations t8
solve9th (t1, t2, t3, t4, t5, t6, t7, match, t9)</pre>
</p>
<p>
This function looks a lot like <code>solve9th</code>, but it instead enumerates the four <code>rotations</code> of the eighth tile <code>t8</code> and filters with the <code>tile8Matches</code> predicate. Due to the <code>do</code> notation, it'll only call <code>solve9th</code> if it finds a <code>match</code>.
</p>
<p>
Once more, this function assumes that the first seven tiles are already in a legal constellation. How do we find seven valid tiles? The same way we find eight: By assuming that we have six valid tiles, and then finding the seventh, and so on:
</p>
<p>
<pre><span style="color:#2b91af;">solve7th</span> <span style="color:blue;">::</span> (a, b, c, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>)
<span style="color:blue;">-></span> [(a, b, c, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>)]
solve7th (t1, t2, t3, t4, t5, t6, t7, t8, t9) = <span style="color:blue;">do</span>
match <- <span style="color:blue;">filter</span> (tile7Matches t4) $ rotations t7
solve8th (t1, t2, t3, t4, t5, t6, match, t8, t9)
<span style="color:#2b91af;">solve6th</span> <span style="color:blue;">::</span> (a, b, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>)
<span style="color:blue;">-></span> [(a, b, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>)]
solve6th (t1, t2, t3, t4, t5, t6, t7, t8, t9) = <span style="color:blue;">do</span>
match <- <span style="color:blue;">filter</span> (tile6Matches t3 t5) $ rotations t6
solve7th (t1, t2, t3, t4, t5, match, t7, t8, t9)
<span style="color:#2b91af;">solve5th</span> <span style="color:blue;">::</span> (a, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>)
<span style="color:blue;">-></span> [(a, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>)]
solve5th (t1, t2, t3, t4, t5, t6, t7, t8, t9) = <span style="color:blue;">do</span>
match <- <span style="color:blue;">filter</span> (tile5Matches t2 t4) $ rotations t5
solve6th (t1, t2, t3, t4, match, t6, t7, t8, t9)
<span style="color:#2b91af;">solve4th</span> <span style="color:blue;">::</span> (<span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>)
<span style="color:blue;">-></span> [(<span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>)]
solve4th (t1, t2, t3, t4, t5, t6, t7, t8, t9) = <span style="color:blue;">do</span>
match <- <span style="color:blue;">filter</span> (tile4Matches t1) $ rotations t4
solve5th (t1, t2, t3, match, t5, t6, t7, t8, t9)
<span style="color:#2b91af;">solve3rd</span> <span style="color:blue;">::</span> (<span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>)
<span style="color:blue;">-></span> [(<span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>)]
solve3rd (t1, t2, t3, t4, t5, t6, t7, t8, t9) = <span style="color:blue;">do</span>
match <- <span style="color:blue;">filter</span> (tile3Matches t2) $ rotations t3
solve4th (t1, t2, match, t4, t5, t6, t7, t8, t9)
<span style="color:#2b91af;">solve2nd</span> <span style="color:blue;">::</span> (<span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>)
<span style="color:blue;">-></span> [(<span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>)]
solve2nd (t1, t2, t3, t4, t5, t6, t7, t8, t9) = <span style="color:blue;">do</span>
match <- <span style="color:blue;">filter</span> (tile2Matches t1) $ rotations t2
solve3rd (t1, match, t3, t4, t5, t6, t7, t8, t9)</pre>
</p>
<p>
You'll observe that <code>solve7th</code> down to <code>solve2nd</code> are very similar. The only things that really vary are the predicates, and the positions of the tile being examined, as well as its neighbours. Clearly I can generalize this code, but I'm not sure it's worth it. I wrote a few of these in the order I've presented them here, because it helped me think the problem through, and to be honest, once I had two or three of them, <a href="https://github.com/features/copilot">GitHub Copilot</a> picked up on the pattern and wrote the remaining functions for me.
</p>
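<p>
For the curious, here's one way such a generalization might look. This is a sketch under several assumptions, and not code from the repository: it uses lists instead of nine-tuples (and thus the index operator), it assumes a grid that's three tiles wide, and it uses simplified stand-ins for <code>Tile</code> and <code>matches</code>, where edges are numbers that match when they sum to zero. The names <code>fitsAt</code>, <code>solveFrom</code>, and <code>solveBoard</code> are hypothetical.
</p>

```haskell
-- Simplified stand-ins: an edge is a number, and two edges
-- match when they sum to zero (e.g. 2 and -2).
data Tile = Tile { top :: Int, right :: Int, bottom :: Int, left :: Int }
  deriving (Eq, Show)

matches :: Int -> Int -> Bool
matches x y = x + y == 0

rotations :: Tile -> [Tile]
rotations = take 4 . iterate rotateClockwise
  where
    rotateClockwise (Tile t r b l) = Tile l t r b

-- Does tile t fit at zero-based index i of a three-column grid,
-- given the tiles already placed at indices 0 to i - 1?
fitsAt :: Int -> [Tile] -> Tile -> Bool
fitsAt i placed t = leftOk && aboveOk
  where
    -- The first column has no left neighbour.
    leftOk = i `mod` 3 == 0 || right (placed !! (i - 1)) `matches` left t
    -- The first row has no neighbour above.
    aboveOk = i < 3 || bottom (placed !! (i - 3)) `matches` top t

-- Try every fitting rotation of the next tile, then recurse.
-- If no rotation fits, the do block yields the empty list,
-- which prunes that branch of the search tree.
solveFrom :: [Tile] -> [Tile] -> [[Tile]]
solveFrom placed [] = [placed]
solveFrom placed (t : rest) = do
  match <- filter (fitsAt (length placed) placed) (rotations t)
  solveFrom (placed ++ [match]) rest

-- Plays the role of solve1st for one particular ordering of tiles.
solveBoard :: [Tile] -> [[Tile]]
solveBoard = solveFrom []
```

<p>
The trade-off is that the explicit tile positions disappear into index arithmetic, which I find harder to verify by reading than the nine small functions above.
</p>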
<p>
Granted, <a href="/2018/09/17/typing-is-not-a-programming-bottleneck">typing isn't a programming bottleneck</a>, so we should rather ask if this kind of duplication looks like a maintenance problem. Given that this is a one-time exercise, I'll just leave it be and move on.
</p>
<p>
Particularly, if you're struggling to understand how this implements the 'truncated search tree', keep in mind that e.g. <code>solve5th</code> is likely to produce no valid <code>match</code>, in which case it'll never call <code>solve6th</code>. The same may happen in <code>solve6th</code>, etc.
</p>
<p>
The 'top' function is a bit different because it doesn't need to <code>filter</code> anything:
</p>
<p>
<pre><span style="color:#2b91af;">solve1st</span> <span style="color:blue;">::</span> (<span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>)
<span style="color:blue;">-></span> [(<span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>)]
solve1st (t1, t2, t3, t4, t5, t6, t7, t8, t9) = <span style="color:blue;">do</span>
match <- rotations t1
solve2nd (match, t2, t3, t4, t5, t6, t7, t8, t9)</pre>
</p>
<p>
In the first position, any tile in any rotation is legal, so <code>solve1st</code> only enumerates all four <code>rotations</code> of <code>t1</code> and calls <code>solve2nd</code> for each.
</p>
<p>
The final step is to compose <code>allPermutations</code> with <code>solve1st</code>:
</p>
<p>
<pre><span style="color:#2b91af;">solutions</span> <span style="color:blue;">::</span> [(<span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>, <span style="color:blue;">Tile</span>)]
solutions = allPermutations >>= solve1st</pre>
</p>
<p>
Running this in GHCi on my 4½-year old laptop produces all 16 solutions in approximately 22 seconds.
</p>
<h3 id="d3d8c77398334534b5a200a240d7bddc">
Evaluation <a href="#d3d8c77398334534b5a200a240d7bddc">#</a>
</h3>
<p>
Is that good performance? Well, it turns out that it's possible to substantially improve on the situation. As I've mentioned a couple of times, so far I've been running the program from GHCi, the Haskell <a href="https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop">REPL</a>. Most of the 22 seconds are spent interpreting or compiling the code.
</p>
<p>
If I compile the code with some optimizations turned on, the executable runs in approximately 300 ms. That seems quite decent, if I may say so.
</p>
<p>
I can think of a few tweaks to the code that might conceivably improve things even more, but when I test, there's no discernible difference. Thus, I'll keep the code as shown here.
</p>
<p>
Here's one of the solutions:
</p>
<p>
<img src="/content/binary/hunde-spiel-solution.jpg" alt="One of the game solutions.">
</p>
<p>
The information on the box claims that there's two solutions. Why does the code shown here produce 16 solutions?
</p>
<p>
There's a good explanation for that. Recall that two of the tiles are identical. In the above solution picture, it's tile 1 and 3, although they're rotated 90° in relation to each other. This implies that you could take tile 1, rotate it counter-clockwise and put it in position 3, while simultaneously taking tile 3, rotating it clockwise, and putting it in position 1. Visually, you can't tell the difference, so they don't count as two distinct solutions. The algorithm, however, doesn't make that distinction, so it enumerates what is effectively the same solution twice.
</p>
<p>
Not surprisingly, it turns out that all 16 solutions are doublets in that way. We can confirm that by evaluating <code>length $ <a href="https://hackage.haskell.org/package/base/docs/Data-List.html#v:nub">nub</a> solutions</code>, which returns <code>8</code>.
</p>
<p>
Eight solutions are, however, still four times more than two. Can you figure out what's going on?
</p>
<p>
The algorithm also enumerates four rotations of each solution. Once we take this into account, there's only two visually distinct solutions left. One of them is shown above. I also have a picture of the other one, but I'm not going to totally spoil things for you.
</p>
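<p>
If you want the code to discount rotated boards as well, one option is to map every solution to a canonical orientation before removing duplicates. The following is a sketch and not part of the code base: <code>Board</code>, <code>rotateBoard</code>, <code>canonical</code>, and <code>distinctUpToRotation</code> are hypothetical names, and <code>Tile</code> is a simplified stand-in with numeric edges and an added <code>Ord</code> instance.
</p>

```haskell
import Data.List (nub)

-- Simplified stand-in. Ord is only needed to pick a minimum below.
data Tile = Tile { top, right, bottom, left :: Int }
  deriving (Eq, Ord, Show)

type Board = (Tile, Tile, Tile, Tile, Tile, Tile, Tile, Tile, Tile)

rotateClockwise :: Tile -> Tile
rotateClockwise (Tile t r b l) = Tile l t r b

-- Turn the whole 3x3 board a quarter turn clockwise: the old left
-- column becomes the new top row, and every tile turns with the board.
rotateBoard :: Board -> Board
rotateBoard (t1, t2, t3, t4, t5, t6, t7, t8, t9) =
  ( r t7, r t4, r t1
  , r t8, r t5, r t2
  , r t9, r t6, r t3 )
  where
    r = rotateClockwise

-- Pick one representative among the four orientations of a board.
canonical :: Board -> Board
canonical = minimum . take 4 . iterate rotateBoard

-- Boards that are rotations of each other collapse into one.
distinctUpToRotation :: [Board] -> [Board]
distinctUpToRotation = nub . fmap canonical
```

<p>
If the reasoning above holds, applying <code>distinctUpToRotation</code> to <code>solutions</code> should leave two boards, but I haven't run this sketch against the actual tiles.
</p>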
<h3 id="f97500846a6e481ebe1706278f324979">
Conclusion <a href="#f97500846a6e481ebe1706278f324979">#</a>
</h3>
<p>
When I was eight, I might have had the time and the patience to actually lay the puzzle. Despite the incredibly bad odds, I vaguely remember finally solving it. There must be some more holistic processing going on in the brain, if even a kid can solve the puzzle, because it seems inconceivable that it should be done as described here.
</p>
<p>
Today, I don't care for that kind of puzzle in analog form, but I did, on the other hand, find it an interesting programming exercise.
</p>
<p>
The code could be smaller, but I like it as it is. While a bit on the verbose side, I think that it communicates well what's going on.
</p>
<p>
I was pleasantly surprised that I managed to get execution time down to 300 ms. I'd honestly not expected that when I started.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="fa087e5b49ce4a58936ac782cc44561b">
<div class="comment-author"><a href="https://github.com/anka-213">Andreas Källberg</a> <a href="#fa087e5b49ce4a58936ac782cc44561b">#</a></div>
<div class="comment-content">
<p>
Thanks for a nice blog post! I found the challenge interesting, so I have written my own version of the code that both tries to be faster and also removes the redundant solutions, so it only generates two solutions in total. The code is available <a href="https://github.com/anka-213/haskell_toy_experiments/blob/master/HundeSpiel.hs">here</a>. It executes in roughly 8 milliseconds both in ghci and compiled (and takes a second to compile and run using runghc) on my laptop.
</p>
<p>
In order to improve the performance, I start with a blank grid and one-by-one add tiles until it is no longer possible to do so, and then backtrack, kind of like how you would do it by hand. As a tiny bonus, that I haven't actually measured if it makes any practical difference, I also selected the order of filling in the grid so that they can constrain each other as much as possible, by filling 2-by-2 squares as early as possible. I have however calculated the number of boards explored in each of the two variations. With a spiral order, 6852 boards are explored, while with a linear order, 9332 boards are explored.
</p>
<p>
In order to eliminate rotational symmetry, I start by filling the center square and fixing its rotation, rather than trying all rotations for it, since we could view any initial rotation of the center square as equivalent to rotating the whole board. In order to eliminate the identical solutions from the two identical tiles, I changed the encoding to use a number next to the tile to say how many copies are left of it, so when we choose a tile, there is only a single way to choose each tile, even if there are multiple copies of it. Both of these would also in theory make the code slightly faster if the time wasn't already dominated by general IO and other unrelated things.
</p>
<p>
I also added various pretty printing and tracing utilities to the code, so you can see exactly how it executes and which partial solutions it explores.
</p>
</div>
<div class="comment-date">2024-10-16 00:32 UTC</div>
</div>
<div class="comment" id="984fc5acb2314c79b2f2d7ddfacea285">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#984fc5acb2314c79b2f2d7ddfacea285">#</a></div>
<div class="comment-content">
<p>
Thank you for writing. I did try filling the two-by-two square first, as you suggest, but in isolation it makes no discernible difference.
</p>
<p>
I haven't tried your two other optimizations. The one to eliminate rotations should, I guess, reduce the search space to a fourth of mine, unless I'm mistaken. That would reduce my 300 ms to approximately 75 ms.
</p>
<p>
I can't easily guess how much time the other optimization shaves off, but it could be the one that makes the bigger difference.
</p>
</div>
<div class="comment-date">2024-10-19 08:21 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.FSZipper in C#https://blog.ploeh.dk/2024/09/23/fszipper-in-c2024-09-23T06:13:00+00:00Mark Seemann
<div id="post">
<p>
<em>Another functional model of a file system, with code examples in C#.</em>
</p>
<p>
This article is part of <a href="/2024/08/19/zippers">a series about Zippers</a>. In this one, I port the <code>FSZipper</code> data structure from the <a href="https://learnyouahaskell.com/">Learn You a Haskell for Great Good!</a> article <a href="https://learnyouahaskell.com/zippers">Zippers</a>.
</p>
<p>
A word of warning: I'm assuming that you're familiar with the contents of that article, so I'll skip the pedagogical explanations; I can hardly do it better than it's done there. Additionally, I'll make heavy use of certain standard constructs to port <a href="https://www.haskell.org/">Haskell</a> code, most notably <a href="/2018/05/22/church-encoding">Church encoding</a> to model <a href="https://en.wikipedia.org/wiki/Tagged_union">sum types</a> in languages that don't natively have them. Such as C#. In some cases, I'll implement the Church encoding using the data structure's <a href="/2019/04/29/catamorphisms">catamorphism</a>. Since the <a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">cyclomatic complexity</a> of the resulting code is quite low, you may be able to follow what's going on even if you don't know what Church encoding or catamorphisms are, but if you want to understand the background and motivation for that style of programming, you can consult the cited resources.
</p>
<p>
The code shown in this article is <a href="https://github.com/ploeh/CSharpZippers">available on GitHub</a>.
</p>
<h3 id="dd4cbc996cfa4347afa4b9279c95f6e1">
File system item initialization and structure <a href="#dd4cbc996cfa4347afa4b9279c95f6e1">#</a>
</h3>
<p>
If you haven't already noticed, Haskell (and other statically typed functional programming languages like <a href="https://fsharp.org/">F#</a>) makes heavy use of <a href="https://en.wikipedia.org/wiki/Tagged_union">sum types</a>, and the <code>FSZipper</code> example is no exception. It starts with a one-liner to define a file system item, which may be either a file or a folder. In C# we must instead use a class:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">FSItem</span></pre>
</p>
<p>
Contrary to the two previous examples, the <code>FSItem</code> class has no generic type parameter. This is because I'm following the Haskell example code as closely as possible, but as I've previously shown, you can <a href="/2019/08/26/functional-file-system">model a file hierarchy with a general-purpose rose tree</a>.
</p>
<p>
Staying consistent with the two previous articles, I'll use Church encoding to model a sum type, and as discussed in <a href="/2024/09/09/a-binary-tree-zipper-in-c">the previous article</a> I use a <code>private</code> implementation for that.
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">IFSItem</span> imp;
<span style="color:blue;">private</span> <span style="color:#2b91af;">FSItem</span>(<span style="color:#2b91af;">IFSItem</span> <span style="font-weight:bold;color:#1f377f;">imp</span>)
{
<span style="color:blue;">this</span>.imp = <span style="font-weight:bold;color:#1f377f;">imp</span>;
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">FSItem</span> <span style="color:#74531f;">CreateFile</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">name</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">data</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">File</span>(<span style="font-weight:bold;color:#1f377f;">name</span>, <span style="font-weight:bold;color:#1f377f;">data</span>));
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">FSItem</span> <span style="color:#74531f;">CreateFolder</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">name</span>, <span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">FSItem</span>> <span style="font-weight:bold;color:#1f377f;">items</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">Folder</span>(<span style="font-weight:bold;color:#1f377f;">name</span>, <span style="font-weight:bold;color:#1f377f;">items</span>));
}</pre>
</p>
<p>
Two <code>static</code> creation methods enable client developers to create a single <code>FSItem</code> object, or an entire tree, like the example from the Haskell code, here ported to C#:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">static</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">FSItem</span> myDisk =
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFolder</span>(<span style="color:#a31515;">"root"</span>,
[
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"goat_yelling_like_man.wmv"</span>, <span style="color:#a31515;">"baaaaaa"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"pope_time.avi"</span>, <span style="color:#a31515;">"god bless"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFolder</span>(<span style="color:#a31515;">"pics"</span>,
[
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"ape_throwing_up.jpg"</span>, <span style="color:#a31515;">"bleargh"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"watermelon_smash.gif"</span>, <span style="color:#a31515;">"smash!!"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"skull_man(scary).bmp"</span>, <span style="color:#a31515;">"Yikes!"</span>)
]),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"dijon_poupon.doc"</span>, <span style="color:#a31515;">"best mustard"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFolder</span>(<span style="color:#a31515;">"programs"</span>,
[
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"fartwizard.exe"</span>, <span style="color:#a31515;">"10gotofart"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"owl_bandit.dmg"</span>, <span style="color:#a31515;">"mov eax, h00t"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"not_a_virus.exe"</span>, <span style="color:#a31515;">"really not a virus"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFolder</span>(<span style="color:#a31515;">"source code"</span>,
[
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"best_hs_prog.hs"</span>, <span style="color:#a31515;">"main = print (fix error)"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"random.hs"</span>, <span style="color:#a31515;">"main = print 4"</span>)
])
])
]);</pre>
</p>
<p>
Since the <code>imp</code> class field is just a <code>private</code> implementation detail, a client developer needs a way to query an <code>FSItem</code> object about its contents.
</p>
<h3 id="246063d761a94eb880a079f1d31b817d">
File system item catamorphism <a href="#246063d761a94eb880a079f1d31b817d">#</a>
</h3>
<p>
Just like the previous article, I'll start with the catamorphism. This is essentially the <a href="/2019/08/05/rose-tree-catamorphism">rose tree catamorphism</a>, just less generic, since <code>FSItem</code> doesn't have a generic type parameter.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">TResult</span> <span style="font-weight:bold;color:#74531f;">Aggregate</span><<span style="color:#2b91af;">TResult</span>>(
<span style="color:#2b91af;">Func</span><<span style="color:blue;">string</span>, <span style="color:blue;">string</span>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenFile</span>,
<span style="color:#2b91af;">Func</span><<span style="color:blue;">string</span>, <span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">TResult</span>>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenFolder</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> imp.<span style="font-weight:bold;color:#74531f;">Aggregate</span>(<span style="font-weight:bold;color:#1f377f;">whenFile</span>, <span style="font-weight:bold;color:#1f377f;">whenFolder</span>);
}</pre>
</p>
<p>
The <code>Aggregate</code> method delegates to its internal implementation class field, which is defined as the <code>private</code> nested interface <code>IFSItem</code>:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IFSItem</span>
{
<span style="color:#2b91af;">TResult</span> <span style="font-weight:bold;color:#74531f;">Aggregate</span><<span style="color:#2b91af;">TResult</span>>(
<span style="color:#2b91af;">Func</span><<span style="color:blue;">string</span>, <span style="color:blue;">string</span>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenFile</span>,
<span style="color:#2b91af;">Func</span><<span style="color:blue;">string</span>, <span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">TResult</span>>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenFolder</span>);
}</pre>
</p>
<p>
As discussed in the previous article, the interface is hidden away because it's only a vehicle for polymorphism. It's not intended to be used by client developers (although that would be benign) or implemented by them (which could break <a href="/encapsulation-and-solid">encapsulation</a>). There are only, and should only ever be, two implementations. The one that represents a file is the simplest:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">record</span> <span style="color:#2b91af;">File</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">Name</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">Data</span>) : <span style="color:#2b91af;">IFSItem</span>
{
<span style="color:blue;">public</span> <span style="color:#2b91af;">TResult</span> <span style="font-weight:bold;color:#74531f;">Aggregate</span><<span style="color:#2b91af;">TResult</span>>(
<span style="color:#2b91af;">Func</span><<span style="color:blue;">string</span>, <span style="color:blue;">string</span>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenFile</span>,
<span style="color:#2b91af;">Func</span><<span style="color:blue;">string</span>, <span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">TResult</span>>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenFolder</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">whenFile</span>(Name, Data);
}
}</pre>
</p>
<p>
The <code>File</code> record's <code>Aggregate</code> method unconditionally calls the supplied <code>whenFile</code> function argument with the <code>Name</code> and <code>Data</code> that were originally supplied via its constructor.
</p>
<p>
The <code>Folder</code> implementation is a bit trickier, mostly due to its recursive nature, but also because I wanted it to have structural equality.
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Folder</span> : <span style="color:#2b91af;">IFSItem</span>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">string</span> name;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">FSItem</span>> items;
<span style="color:blue;">public</span> <span style="color:#2b91af;">Folder</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">Name</span>, <span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">FSItem</span>> <span style="font-weight:bold;color:#1f377f;">Items</span>)
{
name = <span style="font-weight:bold;color:#1f377f;">Name</span>;
items = <span style="font-weight:bold;color:#1f377f;">Items</span>;
}
<span style="color:blue;">public</span> <span style="color:#2b91af;">TResult</span> <span style="font-weight:bold;color:#74531f;">Aggregate</span><<span style="color:#2b91af;">TResult</span>>(
<span style="color:#2b91af;">Func</span><<span style="color:blue;">string</span>, <span style="color:blue;">string</span>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenFile</span>,
<span style="color:#2b91af;">Func</span><<span style="color:blue;">string</span>, <span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">TResult</span>>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenFolder</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">whenFolder</span>(
name,
items.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">i</span> => <span style="font-weight:bold;color:#1f377f;">i</span>.<span style="font-weight:bold;color:#74531f;">Aggregate</span>(<span style="font-weight:bold;color:#1f377f;">whenFile</span>, <span style="font-weight:bold;color:#1f377f;">whenFolder</span>)).<span style="font-weight:bold;color:#74531f;">ToList</span>());
}
<span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">Equals</span>(<span style="color:blue;">object</span>? <span style="font-weight:bold;color:#1f377f;">obj</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">obj</span> <span style="color:blue;">is</span> <span style="color:#2b91af;">Folder</span> <span style="font-weight:bold;color:#1f377f;">folder</span> &&
name == <span style="font-weight:bold;color:#1f377f;">folder</span>.name &&
items.<span style="font-weight:bold;color:#74531f;">SequenceEqual</span>(<span style="font-weight:bold;color:#1f377f;">folder</span>.items);
}
<span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">int</span> <span style="font-weight:bold;color:#74531f;">GetHashCode</span>()
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">HashCode</span>.<span style="color:#74531f;">Combine</span>(name, items);
}
}</pre>
</p>
<p>
It, too, unconditionally calls one of the two functions passed to its <code>Aggregate</code> method, but this time <code>whenFolder</code>. It does that, however, by first <em>recursively</em> calling <code>Aggregate</code> within a <code>Select</code> expression. It needs to do that because the <code>whenFolder</code> function expects the subtrees to have already been converted to values of the <code>TResult</code> return type. This is a common pattern with catamorphisms, and takes a bit of time to get used to. You can see similar examples in the articles <a href="/2019/06/10/tree-catamorphism">Tree catamorphism</a>, <a href="/2019/08/05/rose-tree-catamorphism">Rose tree catamorphism</a>, and <a href="/2019/06/24/full-binary-tree-catamorphism">Full binary tree catamorphism</a>, as well as the previous one in this series.
</p>
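<p>
As a sketch of how client code might use the catamorphism, here's a function that counts the files in a tree. The <code>CountFiles</code> name and the <code>FSItemExamples</code> class are hypothetical and not part of the code base; the example assumes only the <code>FSItem</code> class and its <code>Aggregate</code> method as shown above.
</p>
<p>
<pre>using System.Linq;

public static class FSItemExamples
{
    // Hypothetical helper: each file counts as one, and each folder sums
    // the counts that Aggregate has already computed for its children.
    public static int CountFiles(FSItem item)
    {
        return item.Aggregate(
            whenFile: (_, _) => 1,
            whenFolder: (_, counts) => counts.Sum());
    }
}</pre>
</p>
<p>
Notice that the <code>whenFolder</code> function never sees an <code>FSItem</code> object. By the time it runs, every child has already been reduced to an <code>int</code>.
</p>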
<p>
I also had to make <code>Folder</code> a <code>class</code> rather than a <a href="https://learn.microsoft.com/dotnet/csharp/language-reference/builtin-types/record">record</a>, because I wanted the type to have structural equality, and you can't override <a href="https://learn.microsoft.com/dotnet/api/system.object.equals">Equals</a> on records (and if the base class library has any collection type with structural equality, I'm not aware of it).
</p>
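<p>
One thing to be aware of: <code>HashCode.Combine(name, items)</code> hashes the <code>items</code> collection object itself, and most .NET collection types, including arrays and lists, inherit reference-based hash codes. Two <code>Folder</code> objects that compare equal may therefore report different hash codes. That shouldn't affect the tests shown in this article, which only rely on <code>Equals</code>, but it could matter if such objects end up as dictionary keys. The following is a minimal sketch of an element-wise alternative. The <code>StructuralHash</code> class and its <code>OfSequence</code> method are hypothetical names, not part of the code base.
</p>
<p>
<pre>using System;
using System.Collections.Generic;

public static class StructuralHash
{
    // Folds every element's hash code into one value, in order, so that
    // two sequences that are SequenceEqual also produce the same hash.
    public static int OfSequence<T>(IEnumerable<T> items)
    {
        var hash = new HashCode();
        foreach (var item in items)
            hash.Add(item);
        return hash.ToHashCode();
    }
}</pre>
</p>
<p>
With such a helper, <code>GetHashCode</code> could combine <code>name</code> with <code>StructuralHash.OfSequence(items)</code> instead of with <code>items</code> directly.
</p>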
<h3 id="761c6a5ede2b4df68985e61f6664822f">
File system item Church encoding <a href="#761c6a5ede2b4df68985e61f6664822f">#</a>
</h3>
<p>
True to the structure of the previous article, the catamorphism doesn't look quite like a Church encoding, but it's possible to define the latter from the former.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">TResult</span> <span style="font-weight:bold;color:#74531f;">Match</span><<span style="color:#2b91af;">TResult</span>>(
<span style="color:#2b91af;">Func</span><<span style="color:blue;">string</span>, <span style="color:blue;">string</span>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenFile</span>,
<span style="color:#2b91af;">Func</span><<span style="color:blue;">string</span>, <span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">FSItem</span>>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenFolder</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#74531f;">Aggregate</span>(
<span style="font-weight:bold;color:#1f377f;">whenFile</span>: (<span style="font-weight:bold;color:#1f377f;">name</span>, <span style="font-weight:bold;color:#1f377f;">data</span>) =>
(item: <span style="color:#74531f;">CreateFile</span>(<span style="font-weight:bold;color:#1f377f;">name</span>, <span style="font-weight:bold;color:#1f377f;">data</span>), result: <span style="font-weight:bold;color:#1f377f;">whenFile</span>(<span style="font-weight:bold;color:#1f377f;">name</span>, <span style="font-weight:bold;color:#1f377f;">data</span>)),
<span style="font-weight:bold;color:#1f377f;">whenFolder</span>: (<span style="font-weight:bold;color:#1f377f;">name</span>, <span style="font-weight:bold;color:#1f377f;">pairs</span>) =>
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">items</span> = <span style="font-weight:bold;color:#1f377f;">pairs</span>.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">i</span> => <span style="font-weight:bold;color:#1f377f;">i</span>.item).<span style="font-weight:bold;color:#74531f;">ToList</span>();
<span style="font-weight:bold;color:#8f08c4;">return</span> (<span style="color:#74531f;">CreateFolder</span>(<span style="font-weight:bold;color:#1f377f;">name</span>, <span style="font-weight:bold;color:#1f377f;">items</span>), <span style="font-weight:bold;color:#1f377f;">whenFolder</span>(<span style="font-weight:bold;color:#1f377f;">name</span>, <span style="font-weight:bold;color:#1f377f;">items</span>));
}).result;
}</pre>
</p>
<p>
The trick is the same as in the previous article: Build up an intermediate tuple that contains both the current <code>item</code> as well as the <code>result</code> being accumulated. Once the <code>Aggregate</code> method returns, the <code>Match</code> method returns only the <code>result</code> part of the resulting tuple.
</p>
<p>
I implemented the <code>whenFolder</code> expression as a code block, because both tuple elements needed the <code>items</code> collection. You can inline the <code>Select</code> expression, but that would cause it to run twice. Avoiding that is probably a premature optimization, but the code block also made the code a bit shorter, and, one may hope, a bit more readable.
</p>
<h3 id="c2ffdb1994bc400d99f359ecf4edb312">
File system breadcrumb <a href="#c2ffdb1994bc400d99f359ecf4edb312">#</a>
</h3>
<p>
Finally, things seem to be becoming a little easier. The port of <code>FSCrumb</code> is straightforward.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">FSCrumb</span>
{
<span style="color:blue;">public</span> <span style="color:#2b91af;">FSCrumb</span>(
<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">name</span>,
<span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">FSItem</span>> <span style="font-weight:bold;color:#1f377f;">left</span>,
<span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">FSItem</span>> <span style="font-weight:bold;color:#1f377f;">right</span>)
{
Name = <span style="font-weight:bold;color:#1f377f;">name</span>;
Left = <span style="font-weight:bold;color:#1f377f;">left</span>;
Right = <span style="font-weight:bold;color:#1f377f;">right</span>;
}
<span style="color:blue;">public</span> <span style="color:blue;">string</span> Name { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> <span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">FSItem</span>> Left { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> <span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">FSItem</span>> Right { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">Equals</span>(<span style="color:blue;">object</span>? <span style="font-weight:bold;color:#1f377f;">obj</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">obj</span> <span style="color:blue;">is</span> <span style="color:#2b91af;">FSCrumb</span> <span style="font-weight:bold;color:#1f377f;">crumb</span> &&
Name == <span style="font-weight:bold;color:#1f377f;">crumb</span>.Name &&
Left.<span style="font-weight:bold;color:#74531f;">SequenceEqual</span>(<span style="font-weight:bold;color:#1f377f;">crumb</span>.Left) &&
Right.<span style="font-weight:bold;color:#74531f;">SequenceEqual</span>(<span style="font-weight:bold;color:#1f377f;">crumb</span>.Right);
}
<span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">int</span> <span style="font-weight:bold;color:#74531f;">GetHashCode</span>()
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">HashCode</span>.<span style="color:#74531f;">Combine</span>(Name, Left, Right);
}
}</pre>
</p>
<p>
The only reason this isn't a <code>record</code> is, once again, that I want to override <code>Equals</code> so that the type can have structural equality. <a href="https://visualstudio.microsoft.com/">Visual Studio</a> wants me to convert to a <a href="https://learn.microsoft.com/dotnet/csharp/programming-guide/classes-and-structs/instance-constructors">primary constructor</a>. That would simplify the code a bit, but actually not that much.
</p>
<p>
(I'm still somewhat conservative in my choice of new C# language features. Not that I have anything against primary constructors which, after all, F# has had forever. The reason I'm holding back is for didactic reasons. Not every reader is on the latest language version, and some readers may be using another programming language entirely. On the other hand, primary constructors seem natural and intuitive, so I may start using them here on the blog as well. I don't think that they're going to be much of a barrier to understanding.)
</p>
<p>
Now that we have both the data type we want to zip, as well as the breadcrumb type we need, we can proceed to add the Zipper.
</p>
<h3 id="bb452627a3c3420a95a412dd33ad0efa">
File system Zipper <a href="#bb452627a3c3420a95a412dd33ad0efa">#</a>
</h3>
<p>
The <code>FSZipper</code> C# class fills the position of the eponymous Haskell type alias. Data structure and initialization are straightforward.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">FSZipper</span>
{
<span style="color:blue;">private</span> <span style="color:#2b91af;">FSZipper</span>(<span style="color:#2b91af;">FSItem</span> <span style="font-weight:bold;color:#1f377f;">fSItem</span>, <span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">FSCrumb</span>> <span style="font-weight:bold;color:#1f377f;">breadcrumbs</span>)
{
FSItem = <span style="font-weight:bold;color:#1f377f;">fSItem</span>;
Breadcrumbs = <span style="font-weight:bold;color:#1f377f;">breadcrumbs</span>;
}
<span style="color:blue;">public</span> <span style="color:#2b91af;">FSZipper</span>(<span style="color:#2b91af;">FSItem</span> <span style="font-weight:bold;color:#1f377f;">fSItem</span>) : <span style="color:blue;">this</span>(<span style="font-weight:bold;color:#1f377f;">fSItem</span>, [])
{
}
<span style="color:blue;">public</span> <span style="color:#2b91af;">FSItem</span> FSItem { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> <span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">FSCrumb</span>> Breadcrumbs { <span style="color:blue;">get</span>; }
<span style="color:green;">// Methods follow here...</span></pre>
</p>
<p>
True to the style I've already established, I've made the master constructor <code>private</code> in order to highlight that the <code>Breadcrumbs</code> are the responsibility of the <code>FSZipper</code> class itself. It's not something client code need worry about.
</p>
<h3 id="5dd35bee62764a3fb455c406f1a63754">
Going down <a href="#5dd35bee62764a3fb455c406f1a63754">#</a>
</h3>
<p>
The Haskell Zippers article introduces <code>fsUp</code> before <code>fsTo</code>, but if we want to see some example code, we need to navigate <em>to</em> somewhere before we can navigate up. Thus, I'll instead start with the function that navigates to a child node.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">FSZipper</span>? <span style="font-weight:bold;color:#74531f;">GoTo</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">name</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> FSItem.<span style="font-weight:bold;color:#74531f;">Match</span>(
(<span style="font-weight:bold;color:#1f377f;">_</span>, <span style="font-weight:bold;color:#1f377f;">_</span>) => <span style="color:blue;">null</span>,
(<span style="font-weight:bold;color:#1f377f;">folderName</span>, <span style="font-weight:bold;color:#1f377f;">items</span>) =>
{
<span style="color:#2b91af;">FSItem</span>? <span style="font-weight:bold;color:#1f377f;">item</span> = <span style="color:blue;">null</span>;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">ls</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">List</span><<span style="color:#2b91af;">FSItem</span>>();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">rs</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">List</span><<span style="color:#2b91af;">FSItem</span>>();
<span style="font-weight:bold;color:#8f08c4;">foreach</span> (<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">i</span> <span style="font-weight:bold;color:#8f08c4;">in</span> <span style="font-weight:bold;color:#1f377f;">items</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">item</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span> && <span style="font-weight:bold;color:#1f377f;">i</span>.<span style="font-weight:bold;color:#74531f;">IsNamed</span>(<span style="font-weight:bold;color:#1f377f;">name</span>))
<span style="font-weight:bold;color:#1f377f;">item</span> = <span style="font-weight:bold;color:#1f377f;">i</span>;
<span style="font-weight:bold;color:#8f08c4;">else</span> <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">item</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="font-weight:bold;color:#1f377f;">ls</span>.<span style="font-weight:bold;color:#74531f;">Add</span>(<span style="font-weight:bold;color:#1f377f;">i</span>);
<span style="font-weight:bold;color:#8f08c4;">else</span>
<span style="font-weight:bold;color:#1f377f;">rs</span>.<span style="font-weight:bold;color:#74531f;">Add</span>(<span style="font-weight:bold;color:#1f377f;">i</span>);
}
<span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">item</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">FSZipper</span>(
<span style="font-weight:bold;color:#1f377f;">item</span>,
Breadcrumbs.<span style="font-weight:bold;color:#74531f;">Prepend</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">FSCrumb</span>(<span style="font-weight:bold;color:#1f377f;">folderName</span>, <span style="font-weight:bold;color:#1f377f;">ls</span>, <span style="font-weight:bold;color:#1f377f;">rs</span>)).<span style="font-weight:bold;color:#74531f;">ToList</span>());
});
}</pre>
</p>
<p>
This is by far the most complicated navigation we've seen so far, and I've even taken the liberty of writing an imperative implementation. It's not that I don't know how I could implement it in a purely functional fashion, but I've chosen this implementation for a couple of reasons. The first of which is that, frankly, it was easier this way.
</p>
<p>
This stems from the second reason: That the .NET base class library, as far as I know, offers no functionality like Haskell's <a href="https://hackage.haskell.org/package/base/docs/Data-List.html#v:break">break</a> function. I could have written such a function myself, but felt that it was too much of a digression, even for me. Maybe I'll do that another day. It might make for <a href="/2020/01/13/on-doing-katas">a nice little exercise</a>.
</p>
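<p>
To give a rough idea of what such a function could look like, here's a minimal sketch of a <code>Break</code> extension method modelled on Haskell's <code>break</code>. It's not part of the code base; the class name, method name, and signature are all assumptions made for illustration.
</p>
<p>
<pre>using System;
using System.Collections.Generic;

public static class SequenceExtensions
{
    // Splits source at the first element that satisfies predicate.
    // Like Haskell's break, the matching element (if any) becomes
    // the first element of the second list.
    public static (IReadOnlyList<T> before, IReadOnlyList<T> rest) Break<T>(
        this IEnumerable<T> source,
        Func<T, bool> predicate)
    {
        var before = new List<T>();
        var rest = new List<T>();
        var found = false;
        foreach (var x in source)
        {
            // Once a match is found, the predicate is no longer evaluated.
            found = found || predicate(x);
            if (found)
                rest.Add(x);
            else
                before.Add(x);
        }
        return (before, rest);
    }
}</pre>
</p>
<p>
With something like this, <code>items.Break(i => i.IsNamed(name))</code> would return the left siblings as the first tuple element, while the second element would start with the sought item, if there is one, followed by the right siblings.
</p>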
<p>
The third reason is that <a href="/2011/10/11/CheckingforexactlyoneiteminasequenceusingCandF">C# doesn't afford pattern matching on sequences</a>, in the shape of destructuring the head and the tail of a list. (Not that I know of, anyway, but that language changes rapidly at the moment, and it does have <em>some</em> pattern-matching features now.) This means that I have to check <code>item</code> for <code>null</code> anyway.
</p>
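<p>
For what it's worth, C# 11 added list patterns, which can destructure an array into a head and a tail. The language requires the type to be countable and indexable (and sliceable in order to capture the tail), so the pattern doesn't apply directly to <code>IReadOnlyCollection</code> values like <code>items</code>. The following is only a sketch of the language feature, with a hypothetical <code>Describe</code> function; it's not how <code>GoTo</code> is implemented.
</p>
<p>
<pre>public static class ListPatternExample
{
    public static string Describe(string[] names)
    {
        // The slice pattern (..) captures everything after the first element.
        if (names is [var head, .. var tail])
            return $"head: {head}, tail length: {tail.Length}";

        // Reached when the array is empty (or null).
        return "empty";
    }
}</pre>
</p>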
<p>
In any case, while the implementation is imperative, an external caller can't tell. The <code>GoTo</code> method is still <a href="https://en.wikipedia.org/wiki/Referential_transparency">referentially transparent</a>. Which means that <a href="/2021/07/28/referential-transparency-fits-in-your-head">it fits in your head</a>.
</p>
<p>
You may have noticed that the implementation calls <code>IsNamed</code>, which is also new.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">IsNamed</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">name</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#74531f;">Match</span>((<span style="font-weight:bold;color:#1f377f;">n</span>, <span style="font-weight:bold;color:#1f377f;">_</span>) => <span style="font-weight:bold;color:#1f377f;">n</span> == <span style="font-weight:bold;color:#1f377f;">name</span>, (<span style="font-weight:bold;color:#1f377f;">n</span>, <span style="font-weight:bold;color:#1f377f;">_</span>) => <span style="font-weight:bold;color:#1f377f;">n</span> == <span style="font-weight:bold;color:#1f377f;">name</span>);
}</pre>
</p>
<p>
This is an instance method I added to <code>FSItem</code>.
</p>
<p>
In summary, the <code>GoTo</code> method enables client code to navigate down in the file hierarchy, as this unit test demonstrates:
</p>
<p>
<pre>[<span style="color:#2b91af;">Fact</span>]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">GoToSkullMan</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">FSZipper</span>(myDisk);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = <span style="font-weight:bold;color:#1f377f;">sut</span>.<span style="font-weight:bold;color:#74531f;">GoTo</span>(<span style="color:#a31515;">"pics"</span>)?.<span style="font-weight:bold;color:#74531f;">GoTo</span>(<span style="color:#a31515;">"skull_man(scary).bmp"</span>);
<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">NotNull</span>(<span style="font-weight:bold;color:#1f377f;">actual</span>);
<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Equal</span>(
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"skull_man(scary).bmp"</span>, <span style="color:#a31515;">"Yikes!"</span>),
<span style="font-weight:bold;color:#1f377f;">actual</span>.FSItem);
}</pre>
</p>
<p>
The example is elementary. First go to the <code>pics</code> folder, and from there to the <code>skull_man(scary).bmp</code> file.
</p>
<h3 id="be9c842baa2c4cbb8afc50fdb9ea13c7">
Going up <a href="#be9c842baa2c4cbb8afc50fdb9ea13c7">#</a>
</h3>
<p>
Going back up the hierarchy isn't as complicated.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">FSZipper</span>? <span style="font-weight:bold;color:#74531f;">GoUp</span>()
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (Breadcrumbs.Count == 0)
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">head</span> = Breadcrumbs.<span style="font-weight:bold;color:#74531f;">First</span>();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">tail</span> = Breadcrumbs.<span style="font-weight:bold;color:#74531f;">Skip</span>(1);
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">FSZipper</span>(
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFolder</span>(<span style="font-weight:bold;color:#1f377f;">head</span>.Name, [.. <span style="font-weight:bold;color:#1f377f;">head</span>.Left, FSItem, .. <span style="font-weight:bold;color:#1f377f;">head</span>.Right]),
<span style="font-weight:bold;color:#1f377f;">tail</span>.<span style="font-weight:bold;color:#74531f;">ToList</span>());
}</pre>
</p>
<p>
If the <code>Breadcrumbs</code> collection is empty, we're already at the root, in which case we can't go further up. In that case, the <code>GoUp</code> method returns <code>null</code>, as does the <code>GoTo</code> method if it can't find an item with the desired name. This possibility is explicitly indicated by the <code><span style="color:#2b91af;">FSZipper</span>?</code> return type; notice the question mark, <a href="https://learn.microsoft.com/dotnet/csharp/nullable-references">which indicates that the value may be null</a>. If you're working in a context or language where that feature isn't available, you may instead consider taking advantage of the <a href="/2022/04/25/the-maybe-monad">Maybe monad</a> (which is also what you'd <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatically</a> do in Haskell).
</p>
<p>
If <code>Breadcrumbs</code> is <em>not</em> empty, it means that there's a place to go up to. It also implies that the previous operation navigated down, and the only way that's possible is if the previous node was a folder. Thus, the <code>GoUp</code> method knows that it needs to reconstitute a folder, and from the <code>head</code> breadcrumb, it knows that folder's name, and what was originally to the <code>Left</code> and <code>Right</code> of the Zipper's <code>FSItem</code> property.
</p>
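<p>
The reassembly step can be sketched in isolation. The following uses plain strings as stand-ins for <code>FSItem</code> values, and the <code>Crumb</code> record and <code>Reassemble</code> function are hypothetical simplifications, not the types used in the article's code base.
</p>
<p>
<pre>using System.Collections.Generic;

// Simplified, hypothetical breadcrumb: the parent folder's name and the
// siblings to the left and right of the item that was in focus.
public sealed record Crumb(
    string Name,
    IReadOnlyList<string> Left,
    IReadOnlyList<string> Right);

public static class CrumbDemo
{
    // Puts the focused item back between its original siblings.
    public static IReadOnlyList<string> Reassemble(Crumb crumb, string focus) =>
        [.. crumb.Left, focus, .. crumb.Right];
}</pre>
</p>
<p>
Because the left siblings, the focus, and the right siblings are spliced back in that order, the reconstituted folder lists its items in their original order.
</p>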
<p>
This unit test demonstrates how client code may use the <code>GoUp</code> method:
</p>
<p>
<pre>[<span style="color:#2b91af;">Fact</span>]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">GoUpFromSkullMan</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">FSZipper</span>(myDisk);
<span style="color:green;">// This is the same as the GoToSkullMan test</span>
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">newFocus</span> = <span style="font-weight:bold;color:#1f377f;">sut</span>.<span style="font-weight:bold;color:#74531f;">GoTo</span>(<span style="color:#a31515;">"pics"</span>)?.<span style="font-weight:bold;color:#74531f;">GoTo</span>(<span style="color:#a31515;">"skull_man(scary).bmp"</span>);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = <span style="font-weight:bold;color:#1f377f;">newFocus</span>?.<span style="font-weight:bold;color:#74531f;">GoUp</span>()?.<span style="font-weight:bold;color:#74531f;">GoTo</span>(<span style="color:#a31515;">"watermelon_smash.gif"</span>);
<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">NotNull</span>(<span style="font-weight:bold;color:#1f377f;">actual</span>);
<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Equal</span>(
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"watermelon_smash.gif"</span>, <span style="color:#a31515;">"smash!!"</span>),
<span style="font-weight:bold;color:#1f377f;">actual</span>.FSItem);
}</pre>
</p>
<p>
This test first repeats the navigation also performed by the other test, then uses <code>GoUp</code> to go one level up, which finally enables it to navigate to the <code>watermelon_smash.gif</code> file.
</p>
<h3 id="7c96d9a847f04adfb660973e66246d13">
Renaming a file or folder <a href="#7c96d9a847f04adfb660973e66246d13">#</a>
</h3>
<p>
A Zipper enables you to navigate a data structure, but you can also use it to modify the element in focus. One option is to rename a file or folder.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">FSZipper</span> <span style="font-weight:bold;color:#74531f;">Rename</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">newName</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">FSZipper</span>(
FSItem.<span style="font-weight:bold;color:#74531f;">Match</span>(
(<span style="font-weight:bold;color:#1f377f;">_</span>, <span style="font-weight:bold;color:#1f377f;">dat</span>) => <span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="font-weight:bold;color:#1f377f;">newName</span>, <span style="font-weight:bold;color:#1f377f;">dat</span>),
(<span style="font-weight:bold;color:#1f377f;">_</span>, <span style="font-weight:bold;color:#1f377f;">items</span>) => <span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFolder</span>(<span style="font-weight:bold;color:#1f377f;">newName</span>, <span style="font-weight:bold;color:#1f377f;">items</span>)),
Breadcrumbs);
}</pre>
</p>
<p>
The <code>Rename</code> method 'pattern-matches' on the 'current' <code>FSItem</code> and in both cases creates a new file or folder with the new name. Since it doesn't need the old name for anything, it uses the wildcard pattern to ignore that value. This operation is always possible, so the return type is <code>FSZipper</code>, without a question mark, indicating that the method never returns <code>null</code>.
</p>
<p>
The following unit test replicates the Haskell article's example by renaming the <code>pics</code> folder to <code>cspi</code>.
</p>
<p>
<pre>[<span style="color:#2b91af;">Fact</span>]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">RenamePics</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">FSZipper</span>(myDisk);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = <span style="font-weight:bold;color:#1f377f;">sut</span>.<span style="font-weight:bold;color:#74531f;">GoTo</span>(<span style="color:#a31515;">"pics"</span>)?.<span style="font-weight:bold;color:#74531f;">Rename</span>(<span style="color:#a31515;">"cspi"</span>).<span style="font-weight:bold;color:#74531f;">GoUp</span>();
<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">NotNull</span>(<span style="font-weight:bold;color:#1f377f;">actual</span>);
<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Empty</span>(<span style="font-weight:bold;color:#1f377f;">actual</span>.Breadcrumbs);
<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Equal</span>(
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFolder</span>(<span style="color:#a31515;">"root"</span>,
[
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"goat_yelling_like_man.wmv"</span>, <span style="color:#a31515;">"baaaaaa"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"pope_time.avi"</span>, <span style="color:#a31515;">"god bless"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFolder</span>(<span style="color:#a31515;">"cspi"</span>,
[
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"ape_throwing_up.jpg"</span>, <span style="color:#a31515;">"bleargh"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"watermelon_smash.gif"</span>, <span style="color:#a31515;">"smash!!"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"skull_man(scary).bmp"</span>, <span style="color:#a31515;">"Yikes!"</span>)
]),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"dijon_poupon.doc"</span>, <span style="color:#a31515;">"best mustard"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFolder</span>(<span style="color:#a31515;">"programs"</span>,
[
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"fartwizard.exe"</span>, <span style="color:#a31515;">"10gotofart"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"owl_bandit.dmg"</span>, <span style="color:#a31515;">"mov eax, h00t"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"not_a_virus.exe"</span>, <span style="color:#a31515;">"really not a virus"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFolder</span>(<span style="color:#a31515;">"source code"</span>,
[
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"best_hs_prog.hs"</span>, <span style="color:#a31515;">"main = print (fix error)"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"random.hs"</span>, <span style="color:#a31515;">"main = print 4"</span>)
])
])
]),
<span style="font-weight:bold;color:#1f377f;">actual</span>.FSItem);
}</pre>
</p>
<p>
Since the test uses <code>GoUp</code> after <code>Rename</code>, the <code>actual</code> value contains the entire tree, while the <code>Breadcrumbs</code> collection is empty.
</p>
<h3 id="827bcbd5632844fa97b2a92e8beb17cf">
Adding a new file <a href="#827bcbd5632844fa97b2a92e8beb17cf">#</a>
</h3>
<p>
Finally, we can add a new file to a folder.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">FSZipper</span>? <span style="font-weight:bold;color:#74531f;">Add</span>(<span style="color:#2b91af;">FSItem</span> <span style="font-weight:bold;color:#1f377f;">item</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> FSItem.<span style="font-weight:bold;color:#74531f;">Match</span><<span style="color:#2b91af;">FSZipper</span>?>(
<span style="font-weight:bold;color:#1f377f;">whenFile</span>: (<span style="font-weight:bold;color:#1f377f;">_</span>, <span style="font-weight:bold;color:#1f377f;">_</span>) => <span style="color:blue;">null</span>,
<span style="font-weight:bold;color:#1f377f;">whenFolder</span>: (<span style="font-weight:bold;color:#1f377f;">name</span>, <span style="font-weight:bold;color:#1f377f;">items</span>) => <span style="color:blue;">new</span> <span style="color:#2b91af;">FSZipper</span>(
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFolder</span>(<span style="font-weight:bold;color:#1f377f;">name</span>, <span style="font-weight:bold;color:#1f377f;">items</span>.<span style="font-weight:bold;color:#74531f;">Prepend</span>(<span style="font-weight:bold;color:#1f377f;">item</span>).<span style="font-weight:bold;color:#74531f;">ToList</span>()),
Breadcrumbs));
}</pre>
</p>
<p>
This operation may fail, since we can't add a file to a file. This is, again, clearly indicated by the return type, which allows <code>null</code>.
</p>
<p>
This implementation adds the file to the start of the folder, but it would also be possible to add it at the end. I would consider that slightly more idiomatic in C#, but here I've followed the Haskell example code, which conses the new <code>item</code> to the beginning of the list, as is idiomatic in Haskell.
</p>
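<p>
The difference between the two options comes down to LINQ's <code>Prepend</code> and <code>Append</code> methods. Here's a small sketch that uses strings instead of <code>FSItem</code> objects; the <code>AddDemo</code> class and its methods are hypothetical.
</p>
<p>
<pre>using System.Collections.Generic;
using System.Linq;

public static class AddDemo
{
    // Mirrors the Haskell example: the new item becomes the first element.
    public static IReadOnlyList<string> AddFirst(
        IReadOnlyList<string> items, string item) =>
        items.Prepend(item).ToList();

    // The alternative: the new item becomes the last element.
    public static IReadOnlyList<string> AddLast(
        IReadOnlyList<string> items, string item) =>
        items.Append(item).ToList();
}</pre>
</p>
<p>
Neither method mutates the input list; both return a new list, which is in keeping with the immutable design of the Zipper.
</p>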
<p>
The following unit test reproduces the Haskell article's example.
</p>
<p>
<pre>[<span style="color:#2b91af;">Fact</span>]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">AddPic</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">FSZipper</span>(myDisk);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = <span style="font-weight:bold;color:#1f377f;">sut</span>.<span style="font-weight:bold;color:#74531f;">GoTo</span>(<span style="color:#a31515;">"pics"</span>)?.<span style="font-weight:bold;color:#74531f;">Add</span>(<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"heh.jpg"</span>, <span style="color:#a31515;">"lol"</span>))?.<span style="font-weight:bold;color:#74531f;">GoUp</span>();
<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">NotNull</span>(<span style="font-weight:bold;color:#1f377f;">actual</span>);
<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Equal</span>(
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFolder</span>(<span style="color:#a31515;">"root"</span>,
[
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"goat_yelling_like_man.wmv"</span>, <span style="color:#a31515;">"baaaaaa"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"pope_time.avi"</span>, <span style="color:#a31515;">"god bless"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFolder</span>(<span style="color:#a31515;">"pics"</span>,
[
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"heh.jpg"</span>, <span style="color:#a31515;">"lol"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"ape_throwing_up.jpg"</span>, <span style="color:#a31515;">"bleargh"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"watermelon_smash.gif"</span>, <span style="color:#a31515;">"smash!!"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"skull_man(scary).bmp"</span>, <span style="color:#a31515;">"Yikes!"</span>)
]),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"dijon_poupon.doc"</span>, <span style="color:#a31515;">"best mustard"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFolder</span>(<span style="color:#a31515;">"programs"</span>,
[
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"fartwizard.exe"</span>, <span style="color:#a31515;">"10gotofart"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"owl_bandit.dmg"</span>, <span style="color:#a31515;">"mov eax, h00t"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"not_a_virus.exe"</span>, <span style="color:#a31515;">"really not a virus"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFolder</span>(<span style="color:#a31515;">"source code"</span>,
[
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"best_hs_prog.hs"</span>, <span style="color:#a31515;">"main = print (fix error)"</span>),
<span style="color:#2b91af;">FSItem</span>.<span style="color:#74531f;">CreateFile</span>(<span style="color:#a31515;">"random.hs"</span>, <span style="color:#a31515;">"main = print 4"</span>)
])
])
]),
<span style="font-weight:bold;color:#1f377f;">actual</span>.FSItem);
<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Empty</span>(<span style="font-weight:bold;color:#1f377f;">actual</span>.Breadcrumbs);
}</pre>
</p>
<p>
This example also follows the edit with a <code>GoUp</code> call, with the effect that the Zipper is once more focused on the entire tree. The assertion verifies that the new <code>heh.jpg</code> file is the first file in the <code>pics</code> folder.
</p>
<h3 id="18720e9e88d94384921a2b664b4e0a7a">
Conclusion <a href="#18720e9e88d94384921a2b664b4e0a7a">#</a>
</h3>
<p>
The code for <code>FSZipper</code> is actually a bit simpler than for the binary tree. This, I think, is mostly attributable to the <code>FSZipper</code> having fewer constituent sum types. While sum types are trivial, and extraordinarily useful in languages that natively support them, they require a lot of boilerplate in a language like C#.
</p>
<p>
Do you need something like <code>FSZipper</code> in C#? Probably not. As I've already discussed, this article series mostly exists as a programming exercise.
</p>
</div><hr>
Functor products https://blog.ploeh.dk/2024/09/16/functor-products 2024-09-16T06:08:00+00:00 Mark Seemann
<div id="post">
<p>
<em>A tuple or class of functors is also a functor. An article for object-oriented developers.</em>
</p>
<p>
This article is part of <a href="/2022/07/11/functor-relationships">a series of articles about functor relationships</a>. In this one you'll learn about a universal composition of <a href="/2018/03/22/functors">functors</a>. In short, if you have a <a href="https://en.wikipedia.org/wiki/Product_type">product type</a> of functors, that data structure itself gives rise to a functor.
</p>
<p>
Together with other articles in this series, this result can help you answer questions such as: <em>Does this data structure form a functor?</em>
</p>
<p>
Since functors tend to be quite common, and since they're useful enough that many programming languages have special support or syntax for them, the ability to recognize a potential functor can be useful. Given a type like <code>Foo<T></code> (C# syntax) or <code>Bar<T1, T2></code>, being able to recognize it as a functor can come in handy. One scenario is if you yourself have just defined such a data type. Recognizing that it's a functor strongly suggests that you should give it a <code>Select</code> method in C#, a <code>map</code> function in <a href="https://fsharp.org/">F#</a>, and so on.
</p>
<p>
Not all generic types give rise to a (covariant) functor. Some are rather <a href="/2021/09/02/contravariant-functors">contravariant functors</a>, and some are <a href="/2022/08/01/invariant-functors">invariant</a>.
</p>
<p>
If, on the other hand, you have a data type which is a product of two or more (covariant) functors <em>with the same type parameter</em>, then the data type itself gives rise to a functor. You'll see some examples in this article.
</p>
<h3 id="9fc25288b4504ff3b4fabe932ecf2ea2">
Abstract shape <a href="#9fc25288b4504ff3b4fabe932ecf2ea2">#</a>
</h3>
<p>
Before we look at some examples found in other code, it helps if we know what we're looking for. Most (if not all?) languages support product types. In canonical form, they're just tuples of values, but in an object-oriented language like C#, such types are typically classes.
</p>
<p>
Imagine that you have two functors <code>F</code> and <code>G</code>, and you're now considering a data structure that contains a value of both types.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">FAndG</span><<span style="color:#2b91af;">T</span>>
{
<span style="color:blue;">public</span> <span style="color:#2b91af;">FAndG</span>(<span style="color:#2b91af;">F</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">f</span>, <span style="color:#2b91af;">G</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">g</span>)
{
F = <span style="font-weight:bold;color:#1f377f;">f</span>;
G = <span style="font-weight:bold;color:#1f377f;">g</span>;
}
<span style="color:blue;">public</span> <span style="color:#2b91af;">F</span><<span style="color:#2b91af;">T</span>> F { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> <span style="color:#2b91af;">G</span><<span style="color:#2b91af;">T</span>> G { <span style="color:blue;">get</span>; }
<span style="color:green;">// Methods go here...</span></pre>
</p>
<p>
The name of the type is <code><span style="color:#2b91af;">FAndG</span><<span style="color:#2b91af;">T</span>></code> because it contains both an <code><span style="color:#2b91af;">F</span><<span style="color:#2b91af;">T</span>></code> object and a <code><span style="color:#2b91af;">G</span><<span style="color:#2b91af;">T</span>></code> object.
</p>
<p>
Notice that it's an essential requirement that the individual functors (here <code>F</code> and <code>G</code>) are parametrized by the same type parameter (here <code>T</code>). If your data structure contains <code><span style="color:#2b91af;">F</span><<span style="color:#2b91af;">T1</span>></code> and <code><span style="color:#2b91af;">G</span><<span style="color:#2b91af;">T2</span>></code>, the following 'theorem' doesn't apply.
</p>
<p>
The point of this article is that such an <code><span style="color:#2b91af;">FAndG</span><<span style="color:#2b91af;">T</span>></code> data structure forms a functor. The <code>Select</code> implementation is quite unsurprising:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">FAndG</span><<span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#74531f;">Select</span><<span style="color:#2b91af;">TResult</span>>(<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">selector</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">FAndG</span><<span style="color:#2b91af;">TResult</span>>(F.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">selector</span>), G.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">selector</span>));
}</pre>
</p>
<p>
Since we've assumed that both <code>F</code> and <code>G</code> already are functors, they must come with some projection function. In C# it's <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatically</a> called <code>Select</code>, while in F# it'd typically be called <code>map</code>:
</p>
<p>
<pre><span style="color:green;">// ('a -> 'b) -> FAndG<'a> -> FAndG<'b></span>
<span style="color:blue;">let</span> <span style="color:#74531f;">map</span> <span style="color:#74531f;">f</span> <span style="font-weight:bold;color:#1f377f;">fandg</span> = { F = <span style="color:#2b91af;">F</span>.<span style="color:#74531f;">map</span> <span style="color:#74531f;">f</span> <span style="font-weight:bold;color:#1f377f;">fandg</span>.F; G = <span style="color:#2b91af;">G</span>.<span style="color:#74531f;">map</span> <span style="color:#74531f;">f</span> <span style="font-weight:bold;color:#1f377f;">fandg</span>.G }</pre>
</p>
<p>
assuming a record type like
</p>
<p>
<pre><span style="color:blue;">type</span> <span style="color:#2b91af;">FAndG</span><<span style="color:#2b91af;">'a</span>> = { F : <span style="color:#2b91af;">F</span><<span style="color:#2b91af;">'a</span>>; G : <span style="color:#2b91af;">G</span><<span style="color:#2b91af;">'a</span>> }</pre>
</p>
<p>
In both the C# <code>Select</code> example and the F# <code>map</code> function, the composed functor passes the function argument (<code>selector</code> or <code>f</code>) to both <code>F</code> and <code>G</code> and uses it to map both constituents. It then composes a new product from these individual results.
</p>
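<p>
Since <code>F</code> and <code>G</code> are placeholders, the above code doesn't compile as is. As a concrete, self-contained sketch of the same shape, consider a product of a sequence and a lazily computed value. The <code>SeqAndLazy<T></code> class and the <code>Select</code> extension method for <code>Lazy<T></code> are hypothetical, introduced here only for illustration.
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyFunctor
{
    // Assumed functor mapping for Lazy<T>; not part of the base class library.
    public static Lazy<TResult> Select<T, TResult>(
        this Lazy<T> source,
        Func<T, TResult> selector)
    {
        return new Lazy<TResult>(() => selector(source.Value));
    }
}

public sealed class SeqAndLazy<T>
{
    public SeqAndLazy(IEnumerable<T> items, Lazy<T> deferred)
    {
        Items = items;
        Deferred = deferred;
    }

    public IEnumerable<T> Items { get; }
    public Lazy<T> Deferred { get; }

    // Maps both constituents with the same function, then recombines them.
    public SeqAndLazy<TResult> Select<TResult>(Func<T, TResult> selector)
    {
        return new SeqAndLazy<TResult>(
            Items.Select(selector),
            Deferred.Select(selector));
    }
}</pre>
</p>
<p>
Mapping a <code>SeqAndLazy<int></code> with <code>i => i.ToString()</code> produces a <code>SeqAndLazy<string></code> where every element of <code>Items</code>, as well as the <code>Deferred</code> value, has been converted to a string.
</p>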
<p>
I'll have more to say about how this generalizes to a product of more than two functors, but first, let's consider some examples.
</p>
<h3 id="e3b18df7ac4440d7aada000ce27044f3">
List Zipper <a href="#e3b18df7ac4440d7aada000ce27044f3">#</a>
</h3>
<p>
One of the simplest examples I can think of is a List Zipper, which <a href="https://learnyouahaskell.com/zippers">in Haskell</a> is nothing but a type alias of a tuple of lists:
</p>
<p>
<pre><span style="color:blue;">type</span> ListZipper a = ([a],[a])</pre>
</p>
<p>
In the article <a href="/2024/08/26/a-list-zipper-in-c">A List Zipper in C#</a> you saw how the <code><span style="color:#2b91af;">ListZipper</span><<span style="color:#2b91af;">T</span>></code> class composes two <code><span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">T</span>></code> objects.
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">T</span>> values;
<span style="color:blue;">public</span> <span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">T</span>> Breadcrumbs { <span style="color:blue;">get</span>; }
<span style="color:blue;">private</span> <span style="color:#2b91af;">ListZipper</span>(<span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">values</span>, <span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">breadcrumbs</span>)
{
<span style="color:blue;">this</span>.values = <span style="font-weight:bold;color:#1f377f;">values</span>;
Breadcrumbs = <span style="font-weight:bold;color:#1f377f;">breadcrumbs</span>;
}</pre>
</p>
<p>
Since we already know that sequences like <code><span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">T</span>></code> form functors, we now know that so must <code><span style="color:#2b91af;">ListZipper</span><<span style="color:#2b91af;">T</span>></code>. And indeed, the <code>Select</code> implementation looks similar to the above 'shape outline'.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">ListZipper</span><<span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#74531f;">Select</span><<span style="color:#2b91af;">TResult</span>>(<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">selector</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">ListZipper</span><<span style="color:#2b91af;">TResult</span>>(values.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">selector</span>), Breadcrumbs.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">selector</span>));
}</pre>
</p>
<p>
It passes the <code>selector</code> function to the <code>Select</code> method of both <code>values</code> and <code>Breadcrumbs</code>, and composes the results into a <code><span style="color:blue;">new</span> <span style="color:#2b91af;">ListZipper</span><<span style="color:#2b91af;">TResult</span>></code>.
</p>
<p>
While this example is straightforward, it may not be the most compelling, because <code><span style="color:#2b91af;">ListZipper</span><<span style="color:#2b91af;">T</span>></code> composes two identical functors: <code><span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">T</span>></code>. The knowledge that functors compose is more general than that.
</p>
<h3 id="051a4cc14ad74c2ca2f62fd12051f97c">
Non-empty collection <a href="#051a4cc14ad74c2ca2f62fd12051f97c">#</a>
</h3>
<p>
After the List Zipper above, the next-simplest example I can think of is a non-empty list. On this blog I originally introduced it in the article <a href="/2017/12/11/semigroups-accumulate">Semigroups accumulate</a>, but here I'll use the variant from <a href="/2023/08/07/nonempty-catamorphism">NonEmpty catamorphism</a>. It composes a single value of the type <code>T</code> with an <code><span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">T</span>></code>.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">NonEmptyCollection</span>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">head</span>, <span style="color:blue;">params</span> <span style="color:#2b91af;">T</span>[] <span style="font-weight:bold;color:#1f377f;">tail</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">head</span> == <span style="color:blue;">null</span>)
<span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">ArgumentNullException</span>(<span style="color:blue;">nameof</span>(<span style="font-weight:bold;color:#1f377f;">head</span>));
<span style="color:blue;">this</span>.Head = <span style="font-weight:bold;color:#1f377f;">head</span>;
<span style="color:blue;">this</span>.Tail = <span style="font-weight:bold;color:#1f377f;">tail</span>;
}
<span style="color:blue;">public</span> <span style="color:#2b91af;">T</span> Head { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> <span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">T</span>> Tail { <span style="color:blue;">get</span>; }</pre>
</p>
<p>
The <code>Tail</code>, being an <code><span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">T</span>></code>, easily forms a functor, since it's a kind of list. But what about <code>Head</code>, which is a 'naked' <code>T</code> value? Does that form a functor? If so, which one?
</p>
<p>
Indeed, a 'naked' <code>T</code> value is isomorphic to <a href="/2018/09/03/the-identity-functor">the Identity functor</a>. This situation is an example of how knowing about the Identity functor is useful, even if you never actually write code that uses it. Once you realize that <code>T</code> is equivalent to a functor, you've established that <code><span style="color:#2b91af;">NonEmptyCollection</span><<span style="color:#2b91af;">T</span>></code> composes two functors. Therefore, it must itself form a functor, and you can give it a <code>Select</code> method.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">NonEmptyCollection</span><<span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#74531f;">Select</span><<span style="color:#2b91af;">TResult</span>>(<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">selector</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">NonEmptyCollection</span><<span style="color:#2b91af;">TResult</span>>(<span style="font-weight:bold;color:#1f377f;">selector</span>(Head), Tail.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">selector</span>).<span style="font-weight:bold;color:#74531f;">ToArray</span>());
}</pre>
</p>
<p>
Notice that even though we understand that <code>T</code> is equivalent to the Identity functor, there's no reason to actually wrap <code>Head</code> in an <code><span style="color:#2b91af;">Identity</span><<span style="color:#2b91af;">T</span>></code> <a href="https://bartoszmilewski.com/2014/01/14/functors-are-containers/">container</a> just to call <code>Select</code> on it and unwrap the result. Rather, the above <code>Select</code> implementation directly invokes <code>selector</code> with <code>Head</code>. It is, after all, a function that takes a <code>T</code> value as input and returns a <code>TResult</code> object as output.
</p>
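<p>
To make the isomorphism concrete, here is a small Haskell sketch. The <code>NonEmptyCollection</code> type and the <code>toProduct</code> and <code>fromProduct</code> helpers are hypothetical names chosen for illustration; only <code>Identity</code>, <code>Product</code>, and <code>Pair</code> come from the base library:
</p>
<p>
<pre>import Data.Functor.Identity (Identity (..))
import Data.Functor.Product (Product (..))

-- Hypothetical type: a head value and a (possibly empty) tail.
data NonEmptyCollection a = NonEmptyCollection a [a] deriving (Show, Eq)

instance Functor NonEmptyCollection where
  -- Apply f directly to the head; no Identity wrapper is needed.
  fmap f (NonEmptyCollection h t) = NonEmptyCollection (f h) (fmap f t)

-- The isomorphism with a product of Identity and list, made explicit.
toProduct :: NonEmptyCollection a -> Product Identity [] a
toProduct (NonEmptyCollection h t) = Pair (Identity h) t

fromProduct :: Product Identity [] a -> NonEmptyCollection a
fromProduct (Pair (Identity h) t) = NonEmptyCollection h t</pre>
</p>
<p>
Under these assumptions, <code>fromProduct (fmap f (toProduct xs))</code> and <code>fmap f xs</code> produce the same value for any <code>f</code> and <code>xs</code>, which is the sense in which the hand-written instance is the product functor without the wrapping.
</p>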
<h3 id="d721c5ba6eda4016be1417ea01105bea">
Ranges <a href="#d721c5ba6eda4016be1417ea01105bea">#</a>
</h3>
<p>
It's hard to come up with an example that's both somewhat compelling and realistic, and at the same time prototypically pure. Stripped of all 'noise', functor products are just tuples, but that hardly makes for a compelling example. On the other hand, most other examples I can think of combine several results about functors, because they compose in more than one way: not only as products, but also as <a href="/2024/10/14/functor-sums">sums of functors</a>, as well as nested compositions. You'll be able to read about these in future articles, but for the next examples, you'll have to accept some claims about functors at face value.
</p>
<p>
In <a href="/2024/02/12/range-as-a-functor">Range as a functor</a> you saw how both <code><span style="color:#2b91af;">Endpoint</span><<span style="color:#2b91af;">T</span>></code> and <code><span style="color:#2b91af;">Range</span><<span style="color:#2b91af;">T</span>></code> are functors. The article shows functor implementations for each, in C#, F#, and <a href="https://www.haskell.org/">Haskell</a>. For now we'll ignore the deeper underlying reason why <code><span style="color:#2b91af;">Endpoint</span><<span style="color:#2b91af;">T</span>></code> forms a functor, and instead focus on <code><span style="color:#2b91af;">Range</span><<span style="color:#2b91af;">T</span>></code>.
</p>
<p>
In Haskell I never defined an explicit <code>Range</code> type, but rather just treated ranges as tuples. As stated repeatedly already, tuples are the essential products, so if you accept that <code>Endpoint</code> gives rise to a functor, then a 'range tuple' does, too.
</p>
<p>
In F# <code>Range</code> is defined like this:
</p>
<p>
<pre><span style="color:blue;">type</span> Range<'a> = { LowerBound : Endpoint<'a>; UpperBound : Endpoint<'a> }</pre>
</p>
<p>
Such a record type is also easily identified as a product type. In a sense, we can think of a record type as a 'tuple with metadata', where the metadata contains <em>names</em> of elements.
</p>
<p>
In C# <code><span style="color:#2b91af;">Range</span><<span style="color:#2b91af;">T</span>></code> is a class with two <code><span style="color:#2b91af;">Endpoint</span><<span style="color:#2b91af;">T</span>></code> fields.
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">Endpoint</span><<span style="color:#2b91af;">T</span>> min;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">Endpoint</span><<span style="color:#2b91af;">T</span>> max;
<span style="color:blue;">public</span> <span style="color:#2b91af;">Range</span>(<span style="color:#2b91af;">Endpoint</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">min</span>, <span style="color:#2b91af;">Endpoint</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">max</span>)
{
<span style="color:blue;">this</span>.min = <span style="font-weight:bold;color:#1f377f;">min</span>;
<span style="color:blue;">this</span>.max = <span style="font-weight:bold;color:#1f377f;">max</span>;
}</pre>
</p>
<p>
In a sense, you can think of such an immutable class as equivalent to a record type, only requiring substantial <a href="/2019/12/16/zone-of-ceremony">ceremony</a>. The point is that because a range is a product of two functors, it itself gives rise to a functor. You can see all the implementations in <a href="/2024/02/12/range-as-a-functor">Range as a functor</a>.
</p>
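<p>
As a sketch only, here is what the product looks like when spelled out in Haskell with an explicit record. Both the <code>Endpoint</code> definition and the <code>Range</code> record below are assumptions made for this illustration; the implementations in the linked article may differ:
</p>
<p>
<pre>{-# LANGUAGE DeriveFunctor #-}

-- Assumed Endpoint definition, for illustration only.
data Endpoint a = Open a | Closed a deriving (Show, Eq, Functor)

-- A record of two Endpoint values, mirroring the F# record.
data Range a = Range { lowerBound :: Endpoint a, upperBound :: Endpoint a }
  deriving (Show, Eq)

-- Because both fields are functors, mapping the product maps each field.
instance Functor Range where
  fmap f (Range lo hi) = Range (fmap f lo) (fmap f hi)</pre>
</p>
<p>
With these definitions, <code>fmap (* 2) (Range (Closed 1) (Open 5))</code> equals <code>Range (Closed 2) (Open 10)</code>.
</p>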
<h3 id="25e4dea36f644217ba1e28f2a509f3ab">
Binary tree Zipper <a href="#25e4dea36f644217ba1e28f2a509f3ab">#</a>
</h3>
<p>
In <a href="/2024/09/09/a-binary-tree-zipper-in-c">A Binary Tree Zipper in C#</a> you saw that the <code><span style="color:#2b91af;">BinaryTreeZipper</span><<span style="color:#2b91af;">T</span>></code> class has two read-only properties:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>> Tree { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> <span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">Crumb</span><<span style="color:#2b91af;">T</span>>> Breadcrumbs { <span style="color:blue;">get</span>; }</pre>
</p>
<p>
Both have the same generic type parameter <code>T</code>, so the question is whether <code><span style="color:#2b91af;">BinaryTreeZipper</span><<span style="color:#2b91af;">T</span>></code> may form a functor. We now know that the answer is affirmative if <code><span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>></code> and <code><span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">Crumb</span><<span style="color:#2b91af;">T</span>>></code> are both functors.
</p>
<p>
For now, believe me when I claim that this is the case. This means that you can add a <code>Select</code> method to the class:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">BinaryTreeZipper</span><<span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#74531f;">Select</span><<span style="color:#2b91af;">TResult</span>>(<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">selector</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">BinaryTreeZipper</span><<span style="color:#2b91af;">TResult</span>>(
Tree.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">selector</span>),
Breadcrumbs.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">c</span> => <span style="font-weight:bold;color:#1f377f;">c</span>.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">selector</span>)));
}</pre>
</p>
<p>
By now, this should hardly be surprising: Call <code>Select</code> on each constituent functor and create a proper return value from the results.
</p>
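<p>
The same shape can be sketched in Haskell. The <code>Tree</code> and <code>Crumb</code> definitions follow my recollection of the Learn You a Haskell Zippers chapter, while the <code>Zipper</code> data type is an illustrative assumption (the chapter uses a plain tuple):
</p>
<p>
<pre>{-# LANGUAGE DeriveFunctor #-}

data Tree a = Empty | Node a (Tree a) (Tree a) deriving (Show, Eq, Functor)

-- A breadcrumb records the parent value and the subtree not taken.
data Crumb a = LeftCrumb a (Tree a) | RightCrumb a (Tree a)
  deriving (Show, Eq, Functor)

-- A product of Tree and a list of Crumb values.
data Zipper a = Zipper (Tree a) [Crumb a] deriving (Show, Eq)

instance Functor Zipper where
  -- The second field is a list of functors, so it needs a nested fmap.
  fmap f (Zipper tree crumbs) = Zipper (fmap f tree) (fmap (fmap f) crumbs)</pre>
</p>
<p>
The nested <code>fmap (fmap f)</code> plays the same role as the lambda expression passed to <code>Breadcrumbs.Select</code> in the C# code.
</p>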
<h3 id="1179fc33850d430780c843583c16adcb">
Higher arities <a href="#1179fc33850d430780c843583c16adcb">#</a>
</h3>
<p>
All examples have involved products of only two functors, but the result generalizes to higher arities. To gain an understanding of why, consider that it's always possible to rewrite tuples of higher arities as nested pairs. As an example, a triple like <code>(42, <span style="color:#a31515;">"foo"</span>, True)</code> can be rewritten as <code>(42, (<span style="color:#a31515;">"foo"</span>, True))</code> without loss of information. The latter representation is a pair (a two-tuple) where the first element is <code>42</code>, but the second element is another pair. These two representations are isomorphic, meaning that we can go back and forth without losing data.
</p>
<p>
By induction you can generalize this result to any arity. The point is that the only data type you need to describe a product is a pair.
</p>
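<p>
The isomorphism for triples can be written down directly. The function names <code>toNested</code> and <code>fromNested</code> are hypothetical, chosen only for this sketch:
</p>
<p>
<pre>-- Rewrite a triple as a nested pair.
toNested :: (a, b, c) -> (a, (b, c))
toNested (x, y, z) = (x, (y, z))

-- Rewrite a nested pair as a triple.
fromNested :: (a, (b, c)) -> (a, b, c)
fromNested (x, (y, z)) = (x, y, z)</pre>
</p>
<p>
Since <code>fromNested . toNested</code> and <code>toNested . fromNested</code> both return their input unchanged, no information is lost in either direction.
</p>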
<p>
Haskell's <a href="https://hackage.haskell.org/package/base">base</a> library defines a specialized container called <a href="https://hackage.haskell.org/package/base/docs/Data-Functor-Product.html">Product</a> for this very purpose: If you have two <code>Functor</code> instances, you can <code>Pair</code> them up, and they become a single <code>Functor</code>.
</p>
<p>
Let's start with a <code>Pair</code> of <code>Maybe</code> and a list:
</p>
<p>
<pre>ghci> Pair (Just "foo") ["bar", "baz", "qux"]
Pair (Just "foo") ["bar","baz","qux"]</pre>
</p>
<p>
This is a single 'object', if you will, that composes those two <code>Functor</code> instances. This means that you can map over it:
</p>
<p>
<pre>ghci> elem 'b' <$> Pair (Just "foo") ["bar", "baz", "qux"]
Pair (Just False) [True,True,False]</pre>
</p>
<p>
Here I've used the infix <code><$></code> operator as an alternative to <code>fmap</code>. By mapping <code>elem 'b'</code> over the container, I'm asking every value inside it whether or not it contains the character <code>b</code>. The <code>Maybe</code> value doesn't, while the first two list elements do.
</p>
<p>
If you want to compose three, rather than two, <code>Functor</code> instances, you just nest the <code>Pairs</code>, just like you can nest tuples:
</p>
<p>
<pre>ghci> elem 'b' <$> Pair (Identity "quux") (Pair (Just "foo") ["bar", "baz", "qux"])
Pair (Identity False) (Pair (Just False) [True,True,False])</pre>
</p>
<p>
This example now introduces the <code>Identity</code> container as a third <code>Functor</code> instance. I could have used any other <code>Functor</code> instance instead of <code>Identity</code>, but some of them are more awkward to create or display. For example, the <a href="/2021/08/30/the-reader-functor">Reader</a> or <a href="/2021/07/19/the-state-functor">State</a> functors have no <code>Show</code> instances in Haskell, meaning that GHCi doesn't know how to print them as values. Other <code>Functor</code> instances didn't work as well for the example, since they tend to be more awkward to create. As an example, any non-trivial <a href="https://hackage.haskell.org/package/containers/docs/Data-Tree.html#t:Tree">Tree</a> requires substantial editor space to express.
</p>
<h3 id="329c3274f8f54171905d747867fc293b">
Conclusion <a href="#329c3274f8f54171905d747867fc293b">#</a>
</h3>
<p>
A product of functors may itself be made a functor. Most of the examples shown in this article are constrained to two functors, but if you have a product of three, four, or more functors, that product still gives rise to a functor.
</p>
<p>
This is useful to know, particularly if you're working in a language with only partial support for functors. Mainstream languages aren't going to automatically turn such products into functors, in the way that Haskell's <code>Product</code> container almost does. Thus, knowing when you can safely give your generic types a <code>Select</code> method or <code>map</code> function may come in handy.
</p>
<p>
There are more rules like this one. The next article examines another.
</p>
<p>
<strong>Next:</strong> <a href="/2024/10/14/functor-sums">Functor sums</a>.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
<p>
<strong>A Binary Tree Zipper in C#</strong><br>
<a href="https://blog.ploeh.dk/2024/09/09/a-binary-tree-zipper-in-c">https://blog.ploeh.dk/2024/09/09/a-binary-tree-zipper-in-c</a><br>
2024-09-09T06:09:00+00:00, Mark Seemann
</p>
<div id="post">
<p>
<em>A port of another Haskell example, still just because.</em>
</p>
<p>
This article is part of <a href="/2024/08/19/zippers">a series about Zippers</a>. In this one, I port the <code>Zipper</code> data structure from the <a href="https://learnyouahaskell.com/">Learn You a Haskell for Great Good!</a> article also called <a href="https://learnyouahaskell.com/zippers">Zippers</a>.
</p>
<p>
A word of warning: I'm assuming that you're familiar with the contents of that article, so I'll skip the pedagogical explanations; I can hardly do it better than it's done there. Additionally, I'll make heavy use of certain standard constructs to port <a href="https://www.haskell.org/">Haskell</a> code, most notably <a href="/2018/05/22/church-encoding">Church encoding</a> to model <a href="https://en.wikipedia.org/wiki/Tagged_union">sum types</a> in languages that don't natively have them. Such as C#. In some cases, I'll implement the Church encoding using the data structure's <a href="/2019/04/29/catamorphisms">catamorphism</a>. Since the <a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">cyclomatic complexity</a> of the resulting code is quite low, you may be able to follow what's going on even if you don't know what Church encoding or catamorphisms are, but if you want to understand the background and motivation for that style of programming, you can consult the cited resources.
</p>
<p>
The code shown in this article is <a href="https://github.com/ploeh/CSharpZippers">available on GitHub</a>.
</p>
<h3 id="e612adde6ff2487ebd026c858f36233f">
Binary tree initialization and structure <a href="#e612adde6ff2487ebd026c858f36233f">#</a>
</h3>
<p>
In the Haskell code, the binary <code>Tree</code> type is a recursive <a href="https://en.wikipedia.org/wiki/Tagged_union">sum type</a>, defined on a single line of code. C#, on the other hand, has no built-in language construct that supports sum types, so a more elaborate solution is required. At least two options are available to us. One is to <a href="/2018/06/25/visitor-as-a-sum-type">model a sum type as a Visitor</a>. Another is to use <a href="/2018/05/22/church-encoding">Church encoding</a>. In this article, I'll do the latter.
</p>
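<p>
For reference, the one-line Haskell definition looks like the following, quoted from memory of the Zippers article. The <code>example</code> value is my own addition for illustration:
</p>
<p>
<pre>-- Recursive sum type: a tree is either Empty or a Node with two subtrees.
data Tree a = Empty | Node a (Tree a) (Tree a) deriving (Show)

-- Illustrative value: a root with an empty left child and a leaf on the right.
example :: Tree Int
example = Node 42 Empty (Node 2 Empty Empty)</pre>
</p>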
<p>
I find the type name (<code>Tree</code>) used in the Zippers article a bit too vague, and since I consider <a href="https://peps.python.org/pep-0020/">explicit better than implicit</a>, I'll use a more precise class name:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>></pre>
</p>
<p>
Even so, there are different kinds of binary trees. In <a href="/2019/06/24/full-binary-tree-catamorphism">a previous article</a> I've shown a catamorphism for a <em>full <a href="https://en.wikipedia.org/wiki/Binary_tree">binary tree</a></em>. This variation is not as strict, since it allows a node to have zero, one, or two children. Or, strictly speaking, a node always has exactly two children, but both, or one of them, may be empty. <code><span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>></code> uses Church encoding to distinguish between an empty node and a node with a value, but we'll return to that in a moment.
</p>
<p>
First, we'll examine how the class allows initialization:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">IBinaryTree</span> root;
<span style="color:blue;">private</span> <span style="color:#2b91af;">BinaryTree</span>(<span style="color:#2b91af;">IBinaryTree</span> <span style="font-weight:bold;color:#1f377f;">root</span>)
{
<span style="color:blue;">this</span>.root = <span style="font-weight:bold;color:#1f377f;">root</span>;
}
<span style="color:blue;">public</span> <span style="color:#2b91af;">BinaryTree</span>() : <span style="color:blue;">this</span>(<span style="color:#2b91af;">Empty</span>.Instance)
{
}
<span style="color:blue;">public</span> <span style="color:#2b91af;">BinaryTree</span>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">value</span>, <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">left</span>, <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">right</span>)
: <span style="color:blue;">this</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">Node</span>(<span style="font-weight:bold;color:#1f377f;">value</span>, <span style="font-weight:bold;color:#1f377f;">left</span>.root, <span style="font-weight:bold;color:#1f377f;">right</span>.root))
{
}</pre>
</p>
<p>
The class uses a <code>private</code> <code>root</code> object to implement behaviour, and constructor chaining for initialization. The master constructor is <code>private</code>, since the <code>IBinaryTree</code> interface is <code>private</code>. The parameterless constructor implicitly indicates an empty node, whereas the other <code>public</code> constructor indicates a node with a value and two children. Yes, I know that I just wrote that explicit is better than implicit, but it turns out that with the <a href="https://learn.microsoft.com/dotnet/csharp/language-reference/operators/new-operator">target-typed <code>new</code></a> operator feature in C#, constructing trees in code becomes easier with this design choice:
</p>
<p>
<pre><span style="color:#2b91af;">BinaryTree</span><<span style="color:blue;">int</span>> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span>(
42,
<span style="color:blue;">new</span>(),
<span style="color:blue;">new</span>(2, <span style="color:blue;">new</span>(), <span style="color:blue;">new</span>()));</pre>
</p>
<p>
As <a href="/2020/11/30/name-by-role">the variable name suggests</a>, I've taken this code example from a unit test.
</p>
<h3 id="57fddbbeebc44489b3ebc0c4fd7c0d9f">
Private interface <a href="#57fddbbeebc44489b3ebc0c4fd7c0d9f">#</a>
</h3>
<p>
The class delegates method calls to the <code>root</code> field, which is an instance of the <code>private</code>, nested <code>IBinaryTree</code> interface:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IBinaryTree</span>
{
<span style="color:#2b91af;">TResult</span> <span style="font-weight:bold;color:#74531f;">Aggregate</span><<span style="color:#2b91af;">TResult</span>>(
<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenEmpty</span>,
<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">TResult</span>, <span style="color:#2b91af;">TResult</span>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenNode</span>);
}</pre>
</p>
<p>
Why is <code>IBinaryTree</code> a <code>private</code> interface? Why does that interface even exist?
</p>
<p>
To be frank, I could have chosen another implementation strategy. Since there are only two mutually exclusive alternatives (<em>node</em> or <em>empty</em>), I could also have indicated which is which with a Boolean flag. You can see an example of that implementation tactic in the <code>Table</code> class in the sample code that accompanies <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>.
</p>
<p>
Using a Boolean flag, however, only works when there are exactly two choices. If you have three or more, things become more complicated. You could try to use an <a href="https://en.wikipedia.org/wiki/Enumerated_type">enum</a>, but in most languages, these tend to be nothing but glorified integers, and are typically not type-safe. If you define a three-way enum, there's no guarantee that a value of that type takes only one of these three values, and a good compiler will typically insist that you check for any other value as well. The C# compiler certainly does.
</p>
<p>
Church encoding offers a better alternative, but since it makes use of polymorphism, the most <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> choice in C# is either an interface or a base class. Since I favour interfaces over base classes, that's what I've chosen here, but for the purposes of this little digression, it makes no difference: The following argument applies to base classes as well.
</p>
<p>
An interface (or base class) suggests to users of an API that they can implement it in order to extend behaviour. That's an impression I don't wish to give client developers. The purpose of the interface is exclusively to enable <a href="https://en.wikipedia.org/wiki/Double_dispatch">double dispatch</a> to work. There are only two implementations of the <code>IBinaryTree</code> interface, and under no circumstances should there be more.
</p>
<p>
The interface is an implementation detail, which is why both it, and its implementations, are <code>private</code>.
</p>
<h3 id="72ecf86f028f482ebcdb02e914e4cd06">
Binary tree catamorphism <a href="#72ecf86f028f482ebcdb02e914e4cd06">#</a>
</h3>
<p>
The <code>IBinaryTree</code> interface defines a catamorphism for the <code><span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>></code> class. Since we may often view a catamorphism as a sort of 'generalized fold', and since these kinds of operations in C# are typically called <code>Aggregate</code>, that's what I've called the method.
</p>
<p>
An aggregate function affords a way to traverse a data structure and collect information into a single value, here of type <code>TResult</code>. The return type may, however, be a complex type, including another <code><span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>></code>. You'll see examples of complex return values later in this article.
</p>
<p>
As already discussed, there are exactly two implementations of <code>IBinaryTree</code>. The one representing an empty node is the simplest:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Empty</span> : <span style="color:#2b91af;">IBinaryTree</span>
{
<span style="color:blue;">public</span> <span style="color:blue;">readonly</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">Empty</span> Instance = <span style="color:blue;">new</span>();
<span style="color:blue;">private</span> <span style="color:#2b91af;">Empty</span>()
{
}
<span style="color:blue;">public</span> <span style="color:#2b91af;">TResult</span> <span style="font-weight:bold;color:#74531f;">Aggregate</span><<span style="color:#2b91af;">TResult</span>>(
<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenEmpty</span>,
<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">TResult</span>, <span style="color:#2b91af;">TResult</span>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenNode</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">whenEmpty</span>();
}
}</pre>
</p>
<p>
The <code>Aggregate</code> implementation unconditionally calls the supplied <code>whenEmpty</code> function, which returns some <code>TResult</code> value unknown to the <code>Empty</code> class.
</p>
<p>
Although not strictly necessary, I've made the class a <a href="https://en.wikipedia.org/wiki/Singleton_pattern">Singleton</a>. Since I like to <a href="/2021/05/03/structural-equality-for-better-tests">take advantage of structural equality to write better tests</a>, it was either that, or overriding <code>Equals</code> and <code>GetHashCode</code>.
</p>
<p>
The other implementation gets around that problem by being a <a href="https://learn.microsoft.com/dotnet/csharp/language-reference/builtin-types/record">record</a>:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">record</span> <span style="color:#2b91af;">Node</span>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">Value</span>, <span style="color:#2b91af;">IBinaryTree</span> <span style="font-weight:bold;color:#1f377f;">Left</span>, <span style="color:#2b91af;">IBinaryTree</span> <span style="font-weight:bold;color:#1f377f;">Right</span>) : <span style="color:#2b91af;">IBinaryTree</span>
{
<span style="color:blue;">public</span> <span style="color:#2b91af;">TResult</span> <span style="font-weight:bold;color:#74531f;">Aggregate</span><<span style="color:#2b91af;">TResult</span>>(
<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenEmpty</span>,
<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">TResult</span>, <span style="color:#2b91af;">TResult</span>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenNode</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">whenNode</span>(
Value,
Left.<span style="font-weight:bold;color:#74531f;">Aggregate</span>(<span style="font-weight:bold;color:#1f377f;">whenEmpty</span>, <span style="font-weight:bold;color:#1f377f;">whenNode</span>),
Right.<span style="font-weight:bold;color:#74531f;">Aggregate</span>(<span style="font-weight:bold;color:#1f377f;">whenEmpty</span>, <span style="font-weight:bold;color:#1f377f;">whenNode</span>));
}
}</pre>
</p>
<p>
It, too, unconditionally calls one of the two functions passed to its <code>Aggregate</code> method, but this time <code>whenNode</code>. It does that, however, by first <em>recursively</em> calling <code>Aggregate</code> on both <code>Left</code> and <code>Right</code>. It needs to do that because the <code>whenNode</code> function expects the subtrees to have been already converted to values of the <code>TResult</code> return type. This is a common pattern with catamorphisms, and takes a bit of time getting used to. You can see similar examples in the articles <a href="/2019/06/10/tree-catamorphism">Tree catamorphism</a>, <a href="/2019/08/05/rose-tree-catamorphism">Rose tree catamorphism</a>, and <a href="/2019/06/24/full-binary-tree-catamorphism">Full binary tree catamorphism</a>.
</p>
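<p>
If it helps to see the recursion without the class machinery, here is a Haskell sketch of the same catamorphism. The names <code>foldTree</code>, <code>countNodes</code>, and <code>sumTree</code> are hypothetical and not taken from the Zippers article:
</p>
<p>
<pre>data Tree a = Empty | Node a (Tree a) (Tree a) deriving (Show, Eq)

-- Catamorphism: whenNode receives the already-folded left and right results.
foldTree :: r -> (a -> r -> r -> r) -> Tree a -> r
foldTree whenEmpty _ Empty = whenEmpty
foldTree whenEmpty whenNode (Node x l r) =
  whenNode x (foldTree whenEmpty whenNode l) (foldTree whenEmpty whenNode r)

-- Count the nodes that carry a value.
countNodes :: Tree a -> Int
countNodes = foldTree 0 (\_ l r -> 1 + l + r)

-- Add up all values in the tree.
sumTree :: Num a => Tree a -> a
sumTree = foldTree 0 (\x l r -> x + l + r)</pre>
</p>
<p>
For the tree <code>Node 42 Empty (Node 2 Empty Empty)</code>, <code>countNodes</code> returns <code>2</code> and <code>sumTree</code> returns <code>44</code>. In both lambda expressions, <code>l</code> and <code>r</code> are results, not subtrees.
</p>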
<p>
The <code><span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>></code> class defines a <code>public</code> <code>Aggregate</code> method that delegates to its <code>root</code> field:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">TResult</span> <span style="font-weight:bold;color:#74531f;">Aggregate</span><<span style="color:#2b91af;">TResult</span>>(
<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenEmpty</span>,
<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">TResult</span>, <span style="color:#2b91af;">TResult</span>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenNode</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> root.<span style="font-weight:bold;color:#74531f;">Aggregate</span>(<span style="font-weight:bold;color:#1f377f;">whenEmpty</span>, <span style="font-weight:bold;color:#1f377f;">whenNode</span>);
}</pre>
</p>
<p>
The astute reader may now remark that the <code>Aggregate</code> method doesn't look like a Church encoding.
</p>
<h3 id="e99e9074e04e416c82a7345574d4944b">
Binary tree Church encoding <a href="#e99e9074e04e416c82a7345574d4944b">#</a>
</h3>
<p>
A Church encoding will typically have a <code>Match</code> method that enables client code to match on all the alternative cases in the sum type, without those confusing already-converted <code>TResult</code> values. It turns out that you can implement the desired <code>Match</code> method with the <code>Aggregate</code> method.
</p>
<p>
One of the advantages of doing meaningless coding exercises like this one is that you can pursue various ideas that interest you. One idea that interests me is the potential universality of catamorphisms. I conjecture that a catamorphism is an <a href="https://en.wikipedia.org/wiki/Algebraic_data_type">algebraic data type</a>'s universal API, and that you can implement all other methods or functions with it. I admit that I haven't done much research in the form of perusing existing literature, but at least it seems to be the case conspicuously often.
</p>
<p>
As it is here.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">TResult</span> <span style="font-weight:bold;color:#74531f;">Match</span><<span style="color:#2b91af;">TResult</span>>(
<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenEmpty</span>,
<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>>, <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenNode</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> root
.<span style="font-weight:bold;color:#74531f;">Aggregate</span>(
() => (tree: <span style="color:blue;">new</span> <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>>(), result: <span style="font-weight:bold;color:#1f377f;">whenEmpty</span>()),
(<span style="font-weight:bold;color:#1f377f;">x</span>, <span style="font-weight:bold;color:#1f377f;">l</span>, <span style="font-weight:bold;color:#1f377f;">r</span>) => (
<span style="color:blue;">new</span> <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>>(<span style="font-weight:bold;color:#1f377f;">x</span>, <span style="font-weight:bold;color:#1f377f;">l</span>.tree, <span style="font-weight:bold;color:#1f377f;">r</span>.tree),
<span style="font-weight:bold;color:#1f377f;">whenNode</span>(<span style="font-weight:bold;color:#1f377f;">x</span>, <span style="font-weight:bold;color:#1f377f;">l</span>.tree, <span style="font-weight:bold;color:#1f377f;">r</span>.tree)))
.result;
}</pre>
</p>
<p>
Now, I readily admit that it took me a couple of hours tossing and turning in my bed before this solution came to me. I don't find it intuitive at all, but it works.
</p>
<p>
The <code>Aggregate</code> method requires that the <code>whenNode</code> function's <em>left</em> and <em>right</em> values are of <em>the same</em> <code>TResult</code> type as the return type. How do we reconcile that requirement with the <code>Match</code> method's variation, where <em>its</em> <code>whenNode</code> function requires the <em>left</em> and <em>right</em> values to be <code><span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>></code> values, while the return type is still <code>TResult</code>?
</p>
<p>
The way out of this conundrum, it turns out, is to combine both in a tuple. Thus, when <code>Match</code> calls <code>Aggregate</code>, the implied <code>TResult</code> type is <em>not</em> the <code>TResult</code> visible in the <code>Match</code> method declaration. Rather, it's inferred to be of the type <code>(<span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>>, <span style="color:#2b91af;">TResult</span>)</code>. That is, a tuple where the first element is a <code><span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>></code> value, and the second element is a <code><span style="color:#2b91af;">TResult</span></code> value. The C# compiler's type inference engine then figures out that <code>(<span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>>, <span style="color:#2b91af;">TResult</span>)</code> must also be the return type of the <code>Aggregate</code> method call.
</p>
<p>
That's not what <code>Match</code> should return, but the second tuple element contains a value of the correct type, so it returns that. Since I've given the tuple elements names, the <code>Match</code> implementation accomplishes that by returning the <code>result</code> tuple field.
</p>
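<p>
To make the tuple trick easier to experiment with in isolation, here's a minimal, self-contained sketch. It's not the article's <code>BinaryTree<T></code> class: the <code>Tree<T></code> record, its nested <code>Empty</code> and <code>Node</code> cases, and the <code>TreeDemo</code> helper are hypothetical stand-ins that use records and pattern matching instead of Church encoding, so that <code>Aggregate</code> fits in a few lines. Only the shape of <code>Match</code> follows the code above.
</p>
<p>
<pre>using System;

// Hypothetical stand-in for the article's BinaryTree<T>.
public abstract record Tree<T>
{
    public sealed record Empty : Tree<T>;
    public sealed record Node(T Value, Tree<T> Left, Tree<T> Right) : Tree<T>;

    // Catamorphism: both children arrive already folded to TResult.
    public TResult Aggregate<TResult>(
        Func<TResult> whenEmpty,
        Func<T, TResult, TResult, TResult> whenNode)
    {
        return this switch
        {
            Node n => whenNode(
                n.Value,
                n.Left.Aggregate(whenEmpty, whenNode),
                n.Right.Aggregate(whenEmpty, whenNode)),
            _ => whenEmpty()
        };
    }

    // Match expressed via Aggregate: every step carries a pair of
    // (rebuilt subtree, result), and only the result is returned.
    public TResult Match<TResult>(
        Func<TResult> whenEmpty,
        Func<T, Tree<T>, Tree<T>, TResult> whenNode)
    {
        return Aggregate<(Tree<T> tree, TResult result)>(
            () => (new Empty(), whenEmpty()),
            (x, l, r) => (
                new Node(x, l.tree, r.tree),
                whenNode(x, l.tree, r.tree)))
            .result;
    }
}

public static class TreeDemo
{
    // A node is a leaf when both of its (unconverted) subtrees are empty.
    public static bool IsLeaf(Tree<int> tree) => tree.Match(
        () => false,
        (_, l, r) => l is Tree<int>.Empty && r is Tree<int>.Empty);
}</pre>
</p>
<p>
One consequence that's easy to see in this sketch: since <code>Aggregate</code> folds the whole tree bottom-up, <code>whenNode</code> runs for every node and the tree is rebuilt along the way, even though only the root's <code>result</code> is kept. That's a reasonable price for an exercise, but probably worth knowing before reusing the idea on large trees.
</p>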
<h3 id="816773c095624bfcb5cced827ba76455">
Breadcrumbs <a href="#816773c095624bfcb5cced827ba76455">#</a>
</h3>
<p>
That's just the tree that we want to zip. So far, we can only move from root to branches, but not the other way. Before we can define a Zipper for the tree, we need a data structure to store breadcrumbs (the navigation log, if you will).
</p>
<p>
In Haskell it's just another one-liner, but in C# this requires another full-fledged class:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Crumb</span><<span style="color:#2b91af;">T</span>></pre>
</p>
<p>
It's another sum type, so once more, I make the constructor private and use a <code>private</code> class field for the implementation:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">ICrumb</span> imp;
<span style="color:blue;">private</span> <span style="color:#2b91af;">Crumb</span>(<span style="color:#2b91af;">ICrumb</span> <span style="font-weight:bold;color:#1f377f;">imp</span>)
{
<span style="color:blue;">this</span>.imp = <span style="font-weight:bold;color:#1f377f;">imp</span>;
}
<span style="color:blue;">internal</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">Crumb</span><<span style="color:#2b91af;">T</span>> <span style="color:#74531f;">Left</span>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">value</span>, <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">right</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">LeftCrumb</span>(<span style="font-weight:bold;color:#1f377f;">value</span>, <span style="font-weight:bold;color:#1f377f;">right</span>));
}
<span style="color:blue;">internal</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">Crumb</span><<span style="color:#2b91af;">T</span>> <span style="color:#74531f;">Right</span>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">value</span>, <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">left</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">RightCrumb</span>(<span style="font-weight:bold;color:#1f377f;">value</span>, <span style="font-weight:bold;color:#1f377f;">left</span>));
}</pre>
</p>
<p>
To stay consistent throughout the code base, I also use Church encoding to distinguish between a <code>Left</code> and <code>Right</code> breadcrumb, and the technique is similar. First, define a <code>private</code> interface:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">ICrumb</span>
{
<span style="color:#2b91af;">TResult</span> <span style="font-weight:bold;color:#74531f;">Match</span><<span style="color:#2b91af;">TResult</span>>(
<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenLeft</span>,
<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenRight</span>);
}</pre>
</p>
<p>
Then, use <code>private</code> nested types to implement the interface.
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">record</span> <span style="color:#2b91af;">LeftCrumb</span>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">Value</span>, <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">Right</span>) : <span style="color:#2b91af;">ICrumb</span>
{
<span style="color:blue;">public</span> <span style="color:#2b91af;">TResult</span> <span style="font-weight:bold;color:#74531f;">Match</span><<span style="color:#2b91af;">TResult</span>>(
<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenLeft</span>,
<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenRight</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">whenLeft</span>(Value, Right);
}
}</pre>
</p>
<p>
The <code>RightCrumb</code> record is essentially just the 'mirror image' of the <code>LeftCrumb</code> record, and just as was the case with <code><span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>></code>, the <code><span style="color:#2b91af;">Crumb</span><<span style="color:#2b91af;">T</span>></code> class exposes an externally accessible <code>Match</code> method that just delegates to the <code>private</code> class field:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">TResult</span> <span style="font-weight:bold;color:#74531f;">Match</span><<span style="color:#2b91af;">TResult</span>>(
<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenLeft</span>,
<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>>, <span style="color:#2b91af;">TResult</span>> <span style="font-weight:bold;color:#1f377f;">whenRight</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> imp.<span style="font-weight:bold;color:#74531f;">Match</span>(<span style="font-weight:bold;color:#1f377f;">whenLeft</span>, <span style="font-weight:bold;color:#1f377f;">whenRight</span>);
}</pre>
</p>
<p>
Finally, all the building blocks are ready for the actual Zipper.
</p>
<h3 id="f345665355144ccfbbc5767d75f48ece">
Zipper data structure and initialization <a href="#f345665355144ccfbbc5767d75f48ece">#</a>
</h3>
<p>
In the Haskell code, the Zipper is another one-liner, and really just a type alias. In C#, once more, we're going to need a full class.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">BinaryTreeZipper</span><<span style="color:#2b91af;">T</span>></pre>
</p>
<p>
The Haskell article simply calls this type alias <code>Zipper</code>, but I find that name too general, since there's more than one kind of Zipper. I think I understand that the article chooses that name for didactic reasons, but here I've chosen a more consistent disambiguation scheme, so I've named the class <code><span style="color:#2b91af;">BinaryTreeZipper</span><<span style="color:#2b91af;">T</span>></code>.
</p>
<p>
The Haskell example is just a type alias for a tuple, and the C# class is similar, although with significantly more <a href="/2019/12/16/zone-of-ceremony">ceremony</a>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>> Tree { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> <span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">Crumb</span><<span style="color:#2b91af;">T</span>>> Breadcrumbs { <span style="color:blue;">get</span>; }
<span style="color:blue;">private</span> <span style="color:#2b91af;">BinaryTreeZipper</span>(
<span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">tree</span>,
<span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">Crumb</span><<span style="color:#2b91af;">T</span>>> <span style="font-weight:bold;color:#1f377f;">breadcrumbs</span>)
{
Tree = <span style="font-weight:bold;color:#1f377f;">tree</span>;
Breadcrumbs = <span style="font-weight:bold;color:#1f377f;">breadcrumbs</span>;
}
<span style="color:blue;">public</span> <span style="color:#2b91af;">BinaryTreeZipper</span>(<span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">tree</span>) : <span style="color:blue;">this</span>(<span style="font-weight:bold;color:#1f377f;">tree</span>, [])
{
}</pre>
</p>
<p>
Here, I've chosen to add an extra bit of <a href="/2022/10/24/encapsulation-in-functional-programming">encapsulation</a> by making the master constructor <code>private</code>. This prevents client code from creating an arbitrary object with breadcrumbs without having navigated through the tree. To be honest, I don't think it would violate any contract if we allowed this, but it at least highlights that the role of <code>Breadcrumbs</code> is to keep a log of what previously happened to the object.
</p>
<h3 id="9cafe5fe05cd4d619b8d50cd3a86f549">
Navigation <a href="#9cafe5fe05cd4d619b8d50cd3a86f549">#</a>
</h3>
<p>
We can now reproduce the navigation functions from the Haskell article.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">BinaryTreeZipper</span><<span style="color:#2b91af;">T</span>>? <span style="font-weight:bold;color:#74531f;">GoLeft</span>()
{
<span style="font-weight:bold;color:#8f08c4;">return</span> Tree.<span style="font-weight:bold;color:#74531f;">Match</span><<span style="color:#2b91af;">BinaryTreeZipper</span><<span style="color:#2b91af;">T</span>>?>(
<span style="font-weight:bold;color:#1f377f;">whenEmpty</span>: () => <span style="color:blue;">null</span>,
<span style="font-weight:bold;color:#1f377f;">whenNode</span>: (<span style="font-weight:bold;color:#1f377f;">x</span>, <span style="font-weight:bold;color:#1f377f;">l</span>, <span style="font-weight:bold;color:#1f377f;">r</span>) => <span style="color:blue;">new</span> <span style="color:#2b91af;">BinaryTreeZipper</span><<span style="color:#2b91af;">T</span>>(
<span style="font-weight:bold;color:#1f377f;">l</span>,
Breadcrumbs.<span style="font-weight:bold;color:#74531f;">Prepend</span>(<span style="color:#2b91af;">Crumb</span>.<span style="color:#74531f;">Left</span>(<span style="font-weight:bold;color:#1f377f;">x</span>, <span style="font-weight:bold;color:#1f377f;">r</span>))));
}</pre>
</p>
<p>
Going left 'pattern-matches' on the <code>Tree</code> and, if not empty, constructs a new <code>BinaryTreeZipper</code> object with the left tree, and a <code>Left</code> breadcrumb that stores the 'current' node value and the right subtree. If the 'current' node is empty, on the other hand, the method returns <code>null</code>. This possibility is explicitly indicated by the <code><span style="color:#2b91af;">BinaryTreeZipper</span><<span style="color:#2b91af;">T</span>>?</code> return type; notice the question mark, <a href="https://learn.microsoft.com/dotnet/csharp/nullable-references">which indicates that the value may be null</a>. If you're working in a context or language where that feature isn't available, you may instead consider taking advantage of the <a href="/2022/04/25/the-maybe-monad">Maybe monad</a> (which is also what you'd <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatically</a> do in Haskell).
</p>
<p>
The <code>GoRight</code> method is similar to <code>GoLeft</code>.
</p>
<p>
We may also attempt to navigate up in the tree, undoing our last downward move:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">BinaryTreeZipper</span><<span style="color:#2b91af;">T</span>>? <span style="font-weight:bold;color:#74531f;">GoUp</span>()
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (!Breadcrumbs.<span style="font-weight:bold;color:#74531f;">Any</span>())
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">head</span> = Breadcrumbs.<span style="font-weight:bold;color:#74531f;">First</span>();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">tail</span> = Breadcrumbs.<span style="font-weight:bold;color:#74531f;">Skip</span>(1);
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">head</span>.<span style="font-weight:bold;color:#74531f;">Match</span>(
<span style="font-weight:bold;color:#1f377f;">whenLeft</span>: (<span style="font-weight:bold;color:#1f377f;">x</span>, <span style="font-weight:bold;color:#1f377f;">r</span>) => <span style="color:blue;">new</span> <span style="color:#2b91af;">BinaryTreeZipper</span><<span style="color:#2b91af;">T</span>>(
<span style="color:blue;">new</span> <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>>(<span style="font-weight:bold;color:#1f377f;">x</span>, Tree, <span style="font-weight:bold;color:#1f377f;">r</span>),
<span style="font-weight:bold;color:#1f377f;">tail</span>),
<span style="font-weight:bold;color:#1f377f;">whenRight</span>: (<span style="font-weight:bold;color:#1f377f;">x</span>, <span style="font-weight:bold;color:#1f377f;">l</span>) => <span style="color:blue;">new</span> <span style="color:#2b91af;">BinaryTreeZipper</span><<span style="color:#2b91af;">T</span>>(
<span style="color:blue;">new</span> <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>>(<span style="font-weight:bold;color:#1f377f;">x</span>, <span style="font-weight:bold;color:#1f377f;">l</span>, Tree),
<span style="font-weight:bold;color:#1f377f;">tail</span>));
}</pre>
</p>
<p>
This is another operation that may fail. If we're already at the root of the tree, there are no <code>Breadcrumbs</code>, in which case the only option is to return a value indicating that the operation failed; here, <code>null</code>, but in other languages perhaps <code>None</code> or <code>Nothing</code>.
</p>
<p>
If, on the other hand, there's at least one breadcrumb, the <code>GoUp</code> method uses the most recent one (<code>head</code>) to construct a new <code><span style="color:#2b91af;">BinaryTreeZipper</span><<span style="color:#2b91af;">T</span>></code> object that reconstitutes the opposite (sibling) subtree and the parent node. It does that by 'pattern-matching' on the <code>head</code> breadcrumb, which enables it to distinguish a left breadcrumb from a right breadcrumb.
</p>
<p>
Finally, we may keep trying to <code>GoUp</code> until we reach the root:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">BinaryTreeZipper</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#74531f;">TopMost</span>()
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#74531f;">GoUp</span>()?.<span style="font-weight:bold;color:#74531f;">TopMost</span>() ?? <span style="color:blue;">this</span>;
}</pre>
</p>
<p>
You'll see an example of that a little later.
</p>
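<p>
The navigation methods can also be tried out without the rest of the classes in this article. The following is a rough, self-contained sketch under stated assumptions, not the code base shown here: <code>Tree<T></code>, <code>Crumb<T></code>, <code>Zipper<T></code>, <code>AtRoot</code>, <code>WentLeft</code>, <code>WentRight</code>, and <code>ZipperDemo</code> are all hypothetical names. It uses records with pattern matching rather than Church encoding, and an <code>ImmutableStack</code> from <code>System.Collections.Immutable</code> instead of <code>IEnumerable<Crumb<T>></code> for the breadcrumbs.
</p>
<p>
<pre>#nullable enable
using System.Collections.Immutable;

// Hypothetical stand-ins for the article's BinaryTree<T> and Crumb<T>.
public abstract record Tree<T>
{
    public sealed record Empty : Tree<T>;
    public sealed record Node(T Value, Tree<T> Left, Tree<T> Right) : Tree<T>;
}

public abstract record Crumb<T>
{
    // Went left: remember the parent's value and the right sibling.
    public sealed record WentLeft(T Value, Tree<T> Right) : Crumb<T>;
    // Went right: remember the parent's value and the left sibling.
    public sealed record WentRight(T Value, Tree<T> Left) : Crumb<T>;
}

public sealed record Zipper<T>(Tree<T> Focus, ImmutableStack<Crumb<T>> Crumbs)
{
    public static Zipper<T> AtRoot(Tree<T> tree) =>
        new(tree, ImmutableStack<Crumb<T>>.Empty);

    // Returns null when the focus is empty, so there's nowhere to go.
    public Zipper<T>? GoLeft() => Focus switch
    {
        Tree<T>.Node n => new Zipper<T>(
            n.Left,
            Crumbs.Push(new Crumb<T>.WentLeft(n.Value, n.Right))),
        _ => null
    };

    public Zipper<T>? GoRight() => Focus switch
    {
        Tree<T>.Node n => new Zipper<T>(
            n.Right,
            Crumbs.Push(new Crumb<T>.WentRight(n.Value, n.Left))),
        _ => null
    };

    // Returns null at the root, where no breadcrumbs exist.
    public Zipper<T>? GoUp()
    {
        if (Crumbs.IsEmpty)
            return null;

        var rest = Crumbs.Pop(out var head);
        return head switch
        {
            Crumb<T>.WentLeft c =>
                new Zipper<T>(new Tree<T>.Node(c.Value, Focus, c.Right), rest),
            Crumb<T>.WentRight c =>
                new Zipper<T>(new Tree<T>.Node(c.Value, c.Left, Focus), rest),
            _ => null
        };
    }

    public Zipper<T> TopMost() => GoUp()?.TopMost() ?? this;
}

public static class ZipperDemo
{
    // Going left and then up again should reconstitute the original tree.
    public static bool RoundTrips()
    {
        Tree<char> leaf =
            new Tree<char>.Node('L', new Tree<char>.Empty(), new Tree<char>.Empty());
        Tree<char> tree = new Tree<char>.Node('P', leaf, new Tree<char>.Empty());

        var back = Zipper<char>.AtRoot(tree).GoLeft()?.GoUp();
        return back is not null && back.Focus == tree && back.Crumbs.IsEmpty;
    }
}</pre>
</p>
<p>
The stack is a design choice of the sketch, not a correction of the code above. It makes the 'most recent crumb first' order explicit, and <code>Push</code> and <code>Pop</code> return new stacks instead of mutating, so each <code>Zipper<T></code> value remains immutable. Because records compare by value, the round-trip check can compare the reconstituted tree directly with the original.
</p>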
<h3 id="56a16be50dc4405d8931e9210895b5a0">
Modifications <a href="#56a16be50dc4405d8931e9210895b5a0">#</a>
</h3>
<p>
Continuing the port of the Haskell code, we can <code>Modify</code> the current node with a function:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">BinaryTreeZipper</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#74531f;">Modify</span>(<span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">f</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">BinaryTreeZipper</span><<span style="color:#2b91af;">T</span>>(
Tree.<span style="font-weight:bold;color:#74531f;">Match</span>(
<span style="font-weight:bold;color:#1f377f;">whenEmpty</span>: () => <span style="color:blue;">new</span> <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>>(),
<span style="font-weight:bold;color:#1f377f;">whenNode</span>: (<span style="font-weight:bold;color:#1f377f;">x</span>, <span style="font-weight:bold;color:#1f377f;">l</span>, <span style="font-weight:bold;color:#1f377f;">r</span>) => <span style="color:blue;">new</span> <span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>>(<span style="font-weight:bold;color:#1f377f;">f</span>(<span style="font-weight:bold;color:#1f377f;">x</span>), <span style="font-weight:bold;color:#1f377f;">l</span>, <span style="font-weight:bold;color:#1f377f;">r</span>)),
Breadcrumbs);
}</pre>
</p>
<p>
This operation always succeeds, since it chooses to ignore the change if the tree is empty. Thus, there's no question mark on the return type, indicating that the method never returns <code>null</code>.
</p>
<p>
Finally, we may replace a node with a new subtree:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">BinaryTreeZipper</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#74531f;">Attach</span>(<span style="color:#2b91af;">BinaryTree</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">tree</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">BinaryTreeZipper</span><<span style="color:#2b91af;">T</span>>(<span style="font-weight:bold;color:#1f377f;">tree</span>, Breadcrumbs);
}</pre>
</p>
<p>
The following unit test demonstrates a combination of several of the methods shown above:
</p>
<p>
<pre>[<span style="color:#2b91af;">Fact</span>]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">AttachAndGoTopMost</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">BinaryTreeZipper</span><<span style="color:blue;">char</span>>(freeTree);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">farLeft</span> = <span style="font-weight:bold;color:#1f377f;">sut</span>.<span style="font-weight:bold;color:#74531f;">GoLeft</span>()?.<span style="font-weight:bold;color:#74531f;">GoLeft</span>()?.<span style="font-weight:bold;color:#74531f;">GoLeft</span>()?.<span style="font-weight:bold;color:#74531f;">GoLeft</span>();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = <span style="font-weight:bold;color:#1f377f;">farLeft</span>?.<span style="font-weight:bold;color:#74531f;">Attach</span>(<span style="color:blue;">new</span>(<span style="color:#a31515;">'Z'</span>, <span style="color:blue;">new</span>(), <span style="color:blue;">new</span>())).<span style="font-weight:bold;color:#74531f;">TopMost</span>();
<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">NotNull</span>(<span style="font-weight:bold;color:#1f377f;">actual</span>);
<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Equal</span>(
<span style="color:blue;">new</span>(<span style="color:#a31515;">'P'</span>,
<span style="color:blue;">new</span>(<span style="color:#a31515;">'O'</span>,
<span style="color:blue;">new</span>(<span style="color:#a31515;">'L'</span>,
<span style="color:blue;">new</span>(<span style="color:#a31515;">'N'</span>,
<span style="color:blue;">new</span>(<span style="color:#a31515;">'Z'</span>, <span style="color:blue;">new</span>(), <span style="color:blue;">new</span>()),
<span style="color:blue;">new</span>()),
<span style="color:blue;">new</span>(<span style="color:#a31515;">'T'</span>, <span style="color:blue;">new</span>(), <span style="color:blue;">new</span>())),
<span style="color:blue;">new</span>(<span style="color:#a31515;">'Y'</span>,
<span style="color:blue;">new</span>(<span style="color:#a31515;">'S'</span>, <span style="color:blue;">new</span>(), <span style="color:blue;">new</span>()),
<span style="color:blue;">new</span>(<span style="color:#a31515;">'A'</span>, <span style="color:blue;">new</span>(), <span style="color:blue;">new</span>()))),
<span style="color:blue;">new</span>(<span style="color:#a31515;">'L'</span>,
<span style="color:blue;">new</span>(<span style="color:#a31515;">'W'</span>,
<span style="color:blue;">new</span>(<span style="color:#a31515;">'C'</span>, <span style="color:blue;">new</span>(), <span style="color:blue;">new</span>()),
<span style="color:blue;">new</span>(<span style="color:#a31515;">'R'</span>, <span style="color:blue;">new</span>(), <span style="color:blue;">new</span>())),
<span style="color:blue;">new</span>(<span style="color:#a31515;">'A'</span>,
<span style="color:blue;">new</span>(<span style="color:#a31515;">'A'</span>, <span style="color:blue;">new</span>(), <span style="color:blue;">new</span>()),
<span style="color:blue;">new</span>(<span style="color:#a31515;">'C'</span>, <span style="color:blue;">new</span>(), <span style="color:blue;">new</span>())))),
<span style="font-weight:bold;color:#1f377f;">actual</span>.Tree);
<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Empty</span>(<span style="font-weight:bold;color:#1f377f;">actual</span>.Breadcrumbs);
}</pre>
</p>
<p>
The test starts with <code>freeTree</code> (not shown) and first navigates to the leftmost empty node. Here it uses <code>Attach</code> to add a new 'singleton' subtree with the value <code>'Z'</code>. Finally, it uses <code>TopMost</code> to return to the root node.
</p>
<p>
In <a href="/2013/06/24/a-heuristic-for-formatting-code-according-to-the-aaa-pattern">the Assert phase</a>, the test verifies that the <code>actual</code> object contains the expected values.
</p>
<h3 id="8eaa9438655f4bcbb9447796a7ed7154">
Conclusion <a href="#8eaa9438655f4bcbb9447796a7ed7154">#</a>
</h3>
<p>
The Tree Zipper shown here is a port of the example given in the Haskell <a href="https://learnyouahaskell.com/zippers">Zippers article</a>. As I've already discussed in the <a href="/2024/08/19/zippers">introduction article</a>, this data structure doesn't make much sense in C#, where you can easily implement a navigable tree with two-way links. Even if this requires state mutation, you can package such a data structure in a proper object with good <a href="/encapsulation-and-solid">encapsulation</a>, so that operations don't leave any dangling pointers or the like.
</p>
<p>
As far as I can tell, the code shown in this article isn't useful in production code, but I hope that, at least, you still learned something from it. I always learn a new thing or two from <a href="/2020/01/13/on-doing-katas">doing programming exercises</a> and writing about them, and this was no exception.
</p>
<p>
In the next article, I continue with the last of the Haskell article's three examples.
</p>
<p>
<strong>Next:</strong> <a href="/2024/09/23/fszipper-in-c">FSZipper in C#</a>.
</p>
</div><hr>
Keeping cross-cutting concerns out of application code
https://blog.ploeh.dk/2024/09/02/keeping-cross-cutting-concerns-out-of-application-code
2024-09-02T06:19:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Don't inject third-party dependencies. Use Decorators.</em>
</p>
<p>
I recently came across <a href="https://stackoverflow.com/q/78887199/126014">a Stack Overflow question</a> that reminded me of a topic I've been meaning to write about for a long time: <a href="https://en.wikipedia.org/wiki/Cross-cutting_concern">Cross-cutting concerns</a>.
</p>
<p>
When it comes to <a href="https://en.wikipedia.org/wiki/Casablanca_(film)">the usual suspects</a> (logging, fault tolerance, caching), the best solution is usually to apply the <a href="https://en.wikipedia.org/wiki/Decorator_pattern">Decorator pattern</a>.
</p>
<p>
I often see code that uses Dependency Injection (DI) to inject, say, a logging interface into application code. You can see an example of that in <a href="/2020/03/23/repeatable-execution">Repeatable execution</a>, as well as a suggestion for a better design. Not surprisingly, the better design involves logging Decorators.
</p>
<p>
The Stack Overflow question isn't about logging, but rather about fault tolerance; <a href="https://martinfowler.com/bliki/CircuitBreaker.html">Circuit Breaker</a>, retry policies, timeouts, etc.
</p>
<h3 id="02d07297ea6341c6aef55c0fcb76678c">
Injected concern <a href="#02d07297ea6341c6aef55c0fcb76678c">#</a>
</h3>
<p>
The question does a good job of presenting a minimal, reproducible example. At the outset, the code looks like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">MyApi</span>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">ResiliencePipeline</span> pipeline;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">IOrganizationService</span> service;
<span style="color:blue;">public</span> <span style="color:#2b91af;">MyApi</span>(<span style="color:#2b91af;">ResiliencePipelineProvider</span><<span style="color:blue;">string</span>> <span style="font-weight:bold;color:#1f377f;">provider</span>, <span style="color:#2b91af;">IOrganizationService</span> <span style="font-weight:bold;color:#1f377f;">service</span>)
{
<span style="color:blue;">this</span>.pipeline = <span style="font-weight:bold;color:#1f377f;">provider</span>.<span style="font-weight:bold;color:#74531f;">GetPipeline</span>(<span style="color:#a31515;">"retry-pipeline"</span>);
<span style="color:blue;">this</span>.service = <span style="font-weight:bold;color:#1f377f;">service</span>;
}
<span style="color:blue;">public</span> <span style="color:#2b91af;">List</span><<span style="color:blue;">string</span>> <span style="font-weight:bold;color:#74531f;">GetSomething</span>(<span style="color:#2b91af;">QueryByAttribute</span> <span style="font-weight:bold;color:#1f377f;">query</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">result</span> = <span style="color:blue;">this</span>.pipeline.<span style="font-weight:bold;color:#74531f;">Execute</span>(() => service.<span style="font-weight:bold;color:#74531f;">RetrieveMultiple</span>(<span style="font-weight:bold;color:#1f377f;">query</span>));
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">result</span>.Entities.<span style="font-weight:bold;color:#74531f;">Cast</span><<span style="color:blue;">string</span>>().<span style="font-weight:bold;color:#74531f;">ToList</span>();
}
}</pre>
</p>
<p>
The Stack Overflow question asks how to test this implementation, but I'd rather take the example as an opportunity to discuss design alternatives. Not surprisingly, it turns out that with a more decoupled design, testing becomes easier, too.
</p>
<p>
Before we proceed, a few words about this example code. I assume that this isn't <a href="https://stackoverflow.com/users/3805597">Andy Cooke</a>'s actual production code. Rather, I interpret it as a reduced example that highlights the actual question. This is important because you might ask: <em>Why bother testing two lines of code?</em>
</p>
<p>
Indeed, as presented, the <code>GetSomething</code> method is <a href="/2018/11/12/what-to-test-and-not-to-test">so simple that you may consider not testing it</a>. Thus, I interpret the second line of code as a stand-in for more complicated production code. Hold on to that thought, because once I'm done, that's all that's going to be left, and you may then think that it's so simple that it really doesn't warrant all this hoo-ha.
</p>
<h3 id="32211a755e0a4b9bbd04a049ddbba0c8">
Coupling <a href="#32211a755e0a4b9bbd04a049ddbba0c8">#</a>
</h3>
<p>
As shown, the <code>MyApi</code> class is coupled to <a href="https://www.thepollyproject.org/">Polly</a>, because <code>ResiliencePipeline</code> is defined by that library. To be clear, everything I've heard indicates that Polly is a fine library. I've used it for a few projects myself, but I also admit that I don't have that much experience with it. I'd probably use it again the next time I need a Circuit Breaker or similar, so the following discussion isn't a denouncement of Polly. Rather, it applies to all third-party dependencies, or perhaps even dependencies that are part of your language's base library.
</p>
<p>
Coupling is a major cause of <a href="https://en.wikipedia.org/wiki/Spaghetti_code">spaghetti code</a> and code rot in general. To write sustainable code, you should be cognizant of coupling. The most decoupled code is <a href="/2022/11/21/decouple-to-delete">code that you can easily delete</a>.
</p>
<p>
This doesn't mean that you shouldn't use high-quality third-party libraries like Polly. Among myriads of software engineering heuristics, we know that we should be aware of the <a href="https://en.wikipedia.org/wiki/Not_invented_here">not-invented-here syndrome</a>.
</p>
<p>
When it comes to classic cross-cutting concerns, the Decorator pattern is usually a better design than injecting the concern into application code. The above example clearly looks innocuous, but imagine injecting a <code>ResiliencePipeline</code>, a logger, and perhaps a caching service as well, and your real application code eventually disappears in 'infrastructure code'.
</p>
<p>
It's not that we don't want to have these third-party dependencies, but rather that we want to move them somewhere else.
</p>
<h3 id="67a215289ba944b984b4d113b10e419c">
Resilient Decorator <a href="#67a215289ba944b984b4d113b10e419c">#</a>
</h3>
<p>
The concern in the above example is the desire to make the <code>IOrganizationService</code> dependency more resilient. The <code>MyApi</code> class only becomes more resilient as a transitive effect. The first refactoring step, then, is to introduce a resilient Decorator.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ResilientOrganizationService</span>(
<span style="color:#2b91af;">ResiliencePipeline</span> <span style="font-weight:bold;color:#1f377f;">pipeline</span>,
<span style="color:#2b91af;">IOrganizationService</span> <span style="font-weight:bold;color:#1f377f;">inner</span>) : <span style="color:#2b91af;">IOrganizationService</span>
{
<span style="color:blue;">public</span> <span style="color:#2b91af;">QueryResult</span> <span style="font-weight:bold;color:#74531f;">RetrieveMultiple</span>(<span style="color:#2b91af;">QueryByAttribute</span> <span style="font-weight:bold;color:#1f377f;">query</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">pipeline</span>.<span style="font-weight:bold;color:#74531f;">Execute</span>(() => <span style="font-weight:bold;color:#1f377f;">inner</span>.<span style="font-weight:bold;color:#74531f;">RetrieveMultiple</span>(<span style="font-weight:bold;color:#1f377f;">query</span>));
}
}</pre>
</p>
<p>
As Decorators must, this class composes another <code>IOrganizationService</code> while also implementing that interface itself. It does so by being an <a href="https://en.wikipedia.org/wiki/Adapter_pattern">Adapter</a> over the Polly API.
</p>
<p>
I've applied <a href="https://vuscode.wordpress.com/2009/10/16/inversion-of-control-single-responsibility-principle-and-nikola-s-laws-of-dependency-injection/">Nikola Malovic's 4th law of DI</a>:
</p>
<blockquote>
<p>
"Every constructor of a class being resolved should not have any implementation other then accepting a set of its own dependencies."
</p>
<footer><cite><a href="https://vuscode.wordpress.com/2009/10/16/inversion-of-control-single-responsibility-principle-and-nikola-s-laws-of-dependency-injection/">Inversion Of Control, Single Responsibility Principle and Nikola’s laws of dependency injection</a></cite>, Nikola Malovic, 2009</footer>
</blockquote>
<p>
Instead of injecting a <code><span style="color:#2b91af;">ResiliencePipelineProvider</span><<span style="color:blue;">string</span>></code> only to call <code>GetPipeline</code> on it, it just receives a <code>ResiliencePipeline</code> and saves the object for use in the <code>RetrieveMultiple</code> method. It does that via a <a href="https://learn.microsoft.com/dotnet/csharp/programming-guide/classes-and-structs/instance-constructors#primary-constructors">primary constructor</a>, which is a recent C# language addition. It's just syntactic sugar for Constructor Injection, and as usual <a href="https://fsharp.org/">F#</a> developers should feel right at home.
</p>
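<p>
Because a Decorator implements the same interface that it wraps, other cross-cutting concerns can be layered in the same way. The following is only a sketch under stated assumptions: <code>QueryByAttribute</code>, <code>QueryResult</code>, and <code>IOrganizationService</code> are minimal stand-ins for the types in the question (their real definitions aren't shown here), and <code>LoggingOrganizationService</code> and <code>FixedOrganizationService</code> are hypothetical classes invented for illustration.
</p>

```csharp
using System;
using System.Collections.Generic;

// Stand-ins for the types from the question; their real shapes are assumptions.
public sealed record QueryByAttribute;
public sealed record QueryResult(IReadOnlyList<object> Entities);

public interface IOrganizationService
{
    QueryResult RetrieveMultiple(QueryByAttribute query);
}

// Hypothetical logging Decorator. Like ResilientOrganizationService, it wraps
// another IOrganizationService and implements the same interface itself.
public sealed class LoggingOrganizationService(
    Action<string> log,
    IOrganizationService inner) : IOrganizationService
{
    public QueryResult RetrieveMultiple(QueryByAttribute query)
    {
        log("RetrieveMultiple called");
        var result = inner.RetrieveMultiple(query);
        log($"RetrieveMultiple returned {result.Entities.Count} entities");
        return result;
    }
}

// Hypothetical innermost service, only here to make the sketch runnable.
public sealed class FixedOrganizationService(QueryResult result) : IOrganizationService
{
    public QueryResult RetrieveMultiple(QueryByAttribute query) => result;
}
```

<p>
Since each Decorator only knows about the interface, you could nest them in any order, for example by passing a <code>ResilientOrganizationService</code> as the <code>inner</code> argument of the logging Decorator. Consumers such as <code>MyApi</code> wouldn't need to change.
</p>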
<h3 id="8e967ac0c4ea4323b280e7a665825903">
Simplifying MyApi <a href="#8e967ac0c4ea4323b280e7a665825903">#</a>
</h3>
<p>
Now that you have a resilient version of <code>IOrganizationService</code> you don't need to have any Polly code in <code>MyApi</code>. Remove it and simplify:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">MyApi</span>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">IOrganizationService</span> service;
<span style="color:blue;">public</span> <span style="color:#2b91af;">MyApi</span>(<span style="color:#2b91af;">IOrganizationService</span> <span style="font-weight:bold;color:#1f377f;">service</span>)
{
<span style="color:blue;">this</span>.service = <span style="font-weight:bold;color:#1f377f;">service</span>;
}
<span style="color:blue;">public</span> <span style="color:#2b91af;">List</span><<span style="color:blue;">string</span>> <span style="font-weight:bold;color:#74531f;">GetSomething</span>(<span style="color:#2b91af;">QueryByAttribute</span> <span style="font-weight:bold;color:#1f377f;">query</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">result</span> = service.<span style="font-weight:bold;color:#74531f;">RetrieveMultiple</span>(<span style="font-weight:bold;color:#1f377f;">query</span>);
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">result</span>.Entities.<span style="font-weight:bold;color:#74531f;">Cast</span><<span style="color:blue;">string</span>>().<span style="font-weight:bold;color:#74531f;">ToList</span>();
}
}</pre>
</p>
<p>
As promised, there's almost nothing left of it now, but I'll remind you that I consider the second line of <code>GetSomething</code> as a stand-in for something more complicated that you might need to test. As it is now, though, testing it is trivial:
</p>
<p>
<pre>[<span style="color:#2b91af;">Theory</span>]
[<span style="color:#2b91af;">InlineData</span>(<span style="color:#a31515;">"foo"</span>, <span style="color:#a31515;">"bar"</span>, <span style="color:#a31515;">"baz"</span>)]
[<span style="color:#2b91af;">InlineData</span>(<span style="color:#a31515;">"qux"</span>, <span style="color:#a31515;">"quux"</span>, <span style="color:#a31515;">"corge"</span>)]
[<span style="color:#2b91af;">InlineData</span>(<span style="color:#a31515;">"grault"</span>, <span style="color:#a31515;">"garply"</span>, <span style="color:#a31515;">"waldo"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">GetSomething</span>(<span style="color:blue;">params</span> <span style="color:blue;">string</span>[] <span style="font-weight:bold;color:#1f377f;">expected</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">service</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">Mock</span><<span style="color:#2b91af;">IOrganizationService</span>>();
<span style="font-weight:bold;color:#1f377f;">service</span>
.<span style="font-weight:bold;color:#74531f;">Setup</span>(<span style="font-weight:bold;color:#1f377f;">s</span> => <span style="font-weight:bold;color:#1f377f;">s</span>.<span style="font-weight:bold;color:#74531f;">RetrieveMultiple</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">QueryByAttribute</span>()))
.<span style="font-weight:bold;color:#74531f;">Returns</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">QueryResult</span>(<span style="font-weight:bold;color:#1f377f;">expected</span>));
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">MyApi</span>(<span style="font-weight:bold;color:#1f377f;">service</span>.Object);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = <span style="font-weight:bold;color:#1f377f;">sut</span>.<span style="font-weight:bold;color:#74531f;">GetSomething</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">QueryByAttribute</span>());
<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Equal</span>(<span style="font-weight:bold;color:#1f377f;">expected</span>, <span style="font-weight:bold;color:#1f377f;">actual</span>);
}</pre>
</p>
<p>
The larger point, however, is that not only have you now managed to keep third-party dependencies out of your application code, you've also simplified it and made it easier to test.
</p>
<h3 id="c9dd80b39e234c6595ef31de1fea30c2">
Composition <a href="#c9dd80b39e234c6595ef31de1fea30c2">#</a>
</h3>
<p>
You can still create a resilient <code>MyApi</code> object in your <a href="/2011/07/28/CompositionRoot">Composition Root</a>:
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">service</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">ResilientOrganizationService</span>(<span style="font-weight:bold;color:#1f377f;">pipeline</span>, <span style="font-weight:bold;color:#1f377f;">inner</span>);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">myApi</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">MyApi</span>(<span style="font-weight:bold;color:#1f377f;">service</span>);</pre>
</p>
<p>
Decomposing the problem in this way, you decouple your application code from third-party dependencies. You can define <code>ResilientOrganizationService</code> in the application's Composition Root, which also keeps the Polly dependency there. Even so, you can implement <code>MyApi</code> as part of your application layer.
</p>
<p>
<img src="/content/binary/polly-in-outer-shell.png" alt="Three circles arranged in layers. In the outer layer, there's a box labelled 'ResilientOrganizationService' and another box labelled 'Polly'. An arrow points from 'ResilientOrganizationService' to 'Polly'. In the second layer, there's a box labelled 'MyApi'. The inner circle is empty." >
</p>
<p>
I usually illustrate <a href="/2013/12/03/layers-onions-ports-adapters-its-all-the-same">Ports and Adapters</a>, or, if you will, <a href="/ref/clean-architecture">Clean Architecture</a> as concentric circles, but in this diagram I've skewed the circles to make space for the boxes. In other words, the diagram is 'not to scale'. Ideally, the outermost layer is much smaller and thinner than any of the other layers. I've also included an inner green layer which indicates the architecture's Domain Model, but since I assume that <code>MyApi</code> is part of some application layer, I've left the Domain Model empty.
</p>
<h3 id="ba7304112c214dd6be84011aea811dbf">
Reasons to decouple <a href="#ba7304112c214dd6be84011aea811dbf">#</a>
</h3>
<p>
Why is it important to decouple application code from Polly? First, keep in mind that in this discussion Polly is just a stand-in for any third-party dependency. It's up to you as a software architect to decide how you'll structure your code, but third-party dependencies are one of the first things I look for. A third-party component changes with time, and often independently of your base platform. You may have to deal with breaking changes or security patches at inopportune times. The organization that maintains the component may cease to operate. This happens to commercial entities and open-source contributors alike, although for different reasons.
</p>
<p>
Second, even a top-tier library like Polly will undergo changes. If your time horizon is five to ten years, you'll be surprised how much things change. You may protest that no-one designs software systems with such a long view, but I think that if you ask the business people involved with your software, they most certainly expect your system to last a long time.
</p>
<p>
I believe that I heard on a podcast that some Microsoft teams had taken a dependency on Polly. Assuming, for the sake of argument, that this is true, while we may not wish to depend on some random open-source component, depending on Polly is safe, right? In the long run, it isn't. Five years ago, you had the same situation with <a href="https://www.newtonsoft.com/json">Json.NET</a>, but then Microsoft hired James Newton-King and had him make a JSON API as part of the .NET base library. While Json.NET isn't dead by any means, now you have two competing JSON libraries, and Microsoft uses their own in the frameworks and libraries that they release.
</p>
<p>
Deciding to decouple your application code from a third-party component is ultimately a question of risk management. It's up to you to make the bet. Do you pay the up-front cost of decoupling, or do you postpone it, hoping it'll never be necessary?
</p>
<p>
I usually do the former, because the cost is low, and there are other benefits as well. As I've already touched on, unit testing becomes easier.
</p>
<h3 id="f126ff285a014a6d85cff276436321c8">
Configuration <a href="#f126ff285a014a6d85cff276436321c8">#</a>
</h3>
<p>
Since Polly only lives in the Composition Root, you'll also need to define the <code>ResiliencePipeline</code> there. You can write the code that creates that pipeline wherever you like, but it might be natural to make it a creation function on the <code>ResilientOrganizationService</code> class:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">ResiliencePipeline</span> <span style="color:#74531f;">CreatePipeline</span>()
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">ResiliencePipelineBuilder</span>()
.<span style="font-weight:bold;color:#74531f;">AddRetry</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">RetryStrategyOptions</span>
{
MaxRetryAttempts = 4
})
.<span style="font-weight:bold;color:#74531f;">AddTimeout</span>(<span style="color:#2b91af;">TimeSpan</span>.<span style="color:#74531f;">FromSeconds</span>(1))
.<span style="font-weight:bold;color:#74531f;">Build</span>();
}</pre>
</p>
<p>
That's just an example, and perhaps not what you'd like to do. Perhaps you rather want some of these values to be defined in a configuration file. Thus, this isn't what you <em>have</em> to do, but rather what you <em>could</em> do.
</p>
<p>
If you use this option, however, you could take the return value of this method and inject it into the <code>ResilientOrganizationService</code> constructor.
</p>
<h3 id="11bbc9df98474c33a6ce0902b13178d4">
Conclusion <a href="#11bbc9df98474c33a6ce0902b13178d4">#</a>
</h3>
<p>
Cross-cutting concerns, like caching, logging, security, or, in this case, fault tolerance, are usually best addressed with the Decorator pattern. In this article, you saw an example of using the Decorator pattern to decouple the concern of fault tolerance from the consumer of the service that you need to handle in a fault-tolerant manner.
</p>
<p>
The specific example dealt with the Polly library, but the point isn't that Polly is a particularly nasty third-party component that you need to protect yourself against. Rather, it just so happened that I came across a Stack Overflow question that used Polly, and I thought it was a nice example.
</p>
<p>
As far as I can tell, Polly is actually one of the top .NET open-source packages, so this article is not a denouncement of Polly. It's just a sketch of how to move useful dependencies around in your code base to make sure that they impact your application code as little as possible.
</p>
</div><hr>
A List Zipper in C#
https://blog.ploeh.dk/2024/08/26/a-list-zipper-in-c
2024-08-26T13:19:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A port of a Haskell example, just because.</em>
</p>
<p>
This article is part of <a href="/2024/08/19/zippers">a series about Zippers</a>. In this one, I port the <code>ListZipper</code> data structure from the <a href="https://learnyouahaskell.com/">Learn You a Haskell for Great Good!</a> article also called <a href="https://learnyouahaskell.com/zippers">Zippers</a>.
</p>
<p>
A word of warning: I'm assuming that you're familiar with the contents of that article, so I'll skip the pedagogical explanations; I can hardly do it better than it's done there.
</p>
<p>
The code shown in this article is <a href="https://github.com/ploeh/CSharpZippers">available on GitHub</a>.
</p>
<h3 id="04e3cad425414735aff6a3a0507a9855">
Initialization and structure <a href="#04e3cad425414735aff6a3a0507a9855">#</a>
</h3>
<p>
In the Haskell code, <code>ListZipper</code> is just a type alias, but C# doesn't have that, so instead, we'll have to introduce a class.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ListZipper</span><<span style="color:#2b91af;">T</span>> : <span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">T</span>></pre>
</p>
<p>
Since it implements <code><span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">T</span>></code>, it may be used like any other sequence, but it also comes with some special operations that enable client code to move forward and backward, as well as inserting and removing values.
</p>
<p>
The class has the following fields, properties, and constructors:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">T</span>> values;
<span style="color:blue;">public</span> <span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">T</span>> Breadcrumbs { <span style="color:blue;">get</span>; }
<span style="color:blue;">private</span> <span style="color:#2b91af;">ListZipper</span>(<span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">values</span>, <span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">breadcrumbs</span>)
{
<span style="color:blue;">this</span>.values = <span style="font-weight:bold;color:#1f377f;">values</span>;
Breadcrumbs = <span style="font-weight:bold;color:#1f377f;">breadcrumbs</span>;
}
<span style="color:blue;">public</span> <span style="color:#2b91af;">ListZipper</span>(<span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">values</span>) : <span style="color:blue;">this</span>(<span style="font-weight:bold;color:#1f377f;">values</span>, [])
{
}
<span style="color:blue;">public</span> <span style="color:#2b91af;">ListZipper</span>(<span style="color:blue;">params</span> <span style="color:#2b91af;">T</span>[] <span style="font-weight:bold;color:#1f377f;">values</span>) : <span style="color:blue;">this</span>(<span style="font-weight:bold;color:#1f377f;">values</span>.<span style="font-weight:bold;color:#74531f;">AsEnumerable</span>())
{
}</pre>
</p>
<p>
It uses constructor chaining to initialize a <code>ListZipper</code> object with proper <a href="/encapsulation-and-solid">encapsulation</a>. Notice that the master constructor is private. This prevents client code from initializing an object with arbitrary <code>Breadcrumbs</code>. Rather, the <code>Breadcrumbs</code> (the log, if you will) is going to be the result of various operations performed by client code, and only the <code>ListZipper</code> class itself can use this constructor.
</p>
<p>
You may consider the constructor that takes a single <code><span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">T</span>></code> as the 'main' <code>public</code> constructor, and the other one as a convenience that enables a client developer to write code like <code><span style="color:blue;">new</span> <span style="color:#2b91af;">ListZipper</span><<span style="color:blue;">string</span>>(<span style="color:#a31515;">"foo"</span>, <span style="color:#a31515;">"bar"</span>, <span style="color:#a31515;">"baz"</span>)</code>.
</p>
<p>
The class' <code><span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">T</span>></code> implementation only enumerates the <code>values</code>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">IEnumerator</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#74531f;">GetEnumerator</span>()
{
<span style="font-weight:bold;color:#8f08c4;">return</span> values.<span style="font-weight:bold;color:#74531f;">GetEnumerator</span>();
}</pre>
</p>
<p>
In other words, when enumerating a <code>ListZipper</code>, you only get the 'forward' <code>values</code>. Client code may still examine the <code>Breadcrumbs</code>, since this is a <code>public</code> property, but it should have little need for that.
</p>
<p>
(I admit that making <code>Breadcrumbs</code> public is a concession to testability, since it enabled me to write assertions against this property. It's a form of <a href="/2013/04/04/structural-inspection">structural inspection</a>, which is a technique that I use much less than I did a decade ago. Still, in this case, while you may argue that it violates <a href="https://en.wikipedia.org/wiki/Information_hiding">information hiding</a>, it at least doesn't allow client code to put an object in an invalid state. Had the <code>ListZipper</code> class been a part of a reusable library, I would probably have hidden that data, too, but since this is exercise code, I found this an acceptable compromise. Notice, too, that in the original Haskell code, the breadcrumbs are available to client code.)
</p>
<p>
Regular readers of this blog may be aware that <a href="/2013/07/20/linq-versus-the-lsp">I usually favour IReadOnlyCollection<T> over IEnumerable<T></a>. Here, on the other hand, I've allowed <code>values</code> to be any <code><span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">T</span>></code>, which includes infinite sequences. I decided to do that because Haskell lists, too, may be infinite, and as far as I can tell, <code>ListZipper</code> actually does work with infinite sequences. I have, at least, written a few tests with infinite sequences, and they pass. (I may still have missed an edge case or two. I can't rule that out.)
</p>
<h3 id="908d0fe3cf5d453da3541127ae365d00">
Movement <a href="#908d0fe3cf5d453da3541127ae365d00">#</a>
</h3>
<p>
It's not much fun just being able to initialize an object. You also want to be able to do something with it, such as moving forward:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">ListZipper</span><<span style="color:#2b91af;">T</span>>? <span style="font-weight:bold;color:#74531f;">GoForward</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">head</span> = values.<span style="font-weight:bold;color:#74531f;">Take</span>(1);
<span style="font-weight:bold;color:#8f08c4;">if</span> (!<span style="font-weight:bold;color:#1f377f;">head</span>.<span style="font-weight:bold;color:#74531f;">Any</span>())
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">tail</span> = values.<span style="font-weight:bold;color:#74531f;">Skip</span>(1);
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">ListZipper</span><<span style="color:#2b91af;">T</span>>(<span style="font-weight:bold;color:#1f377f;">tail</span>, <span style="font-weight:bold;color:#1f377f;">head</span>.<span style="font-weight:bold;color:#74531f;">Concat</span>(Breadcrumbs));
}</pre>
</p>
<p>
You can move forward through any <code>IEnumerable</code>, so why make things so complicated? The benefit of this <code>GoForward</code> method (<a href="https://en.wikipedia.org/wiki/Pure_function">function</a>, really) is that it records where it came from, which means that moving backwards becomes an option:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">ListZipper</span><<span style="color:#2b91af;">T</span>>? <span style="font-weight:bold;color:#74531f;">GoBack</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">head</span> = Breadcrumbs.<span style="font-weight:bold;color:#74531f;">Take</span>(1);
<span style="font-weight:bold;color:#8f08c4;">if</span> (!<span style="font-weight:bold;color:#1f377f;">head</span>.<span style="font-weight:bold;color:#74531f;">Any</span>())
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">tail</span> = Breadcrumbs.<span style="font-weight:bold;color:#74531f;">Skip</span>(1);
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">ListZipper</span><<span style="color:#2b91af;">T</span>>(<span style="font-weight:bold;color:#1f377f;">head</span>.<span style="font-weight:bold;color:#74531f;">Concat</span>(values), <span style="font-weight:bold;color:#1f377f;">tail</span>);
}</pre>
</p>
<p>
This test may serve as an example of client code that makes use of those two operations:
</p>
<p>
<pre>[<span style="color:#2b91af;">Fact</span>]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">GoBack1</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">ListZipper</span><<span style="color:blue;">int</span>>(1, 2, 3, 4);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = <span style="font-weight:bold;color:#1f377f;">sut</span>.<span style="font-weight:bold;color:#74531f;">GoForward</span>()?.<span style="font-weight:bold;color:#74531f;">GoForward</span>()?.<span style="font-weight:bold;color:#74531f;">GoForward</span>()?.<span style="font-weight:bold;color:#74531f;">GoBack</span>();
<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Equal</span>([3, 4], <span style="font-weight:bold;color:#1f377f;">actual</span>);
<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Equal</span>([2, 1], <span style="font-weight:bold;color:#1f377f;">actual</span>?.Breadcrumbs);
}</pre>
</p>
<p>
Going forward takes the first element off <code>values</code> and adds it to the front of <code>Breadcrumbs</code>. Going backwards is nearly symmetrical: It takes the first element off the <code>Breadcrumbs</code> and adds it back to the front of the <code>values</code>. Used in this way, <code>Breadcrumbs</code> works as a <a href="https://en.wikipedia.org/wiki/Stack_(abstract_data_type)">stack</a>.
</p>
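<p>
None of these steps enumerates the whole sequence, which is presumably why the zipper also copes with infinite sequences, as mentioned earlier. LINQ's <code>Take</code>, <code>Skip</code>, and <code>Concat</code> are deferred, and <code>Any</code> stops after the first element. The following sketch replays the steps of <code>GoForward</code> outside the class. The <code>Naturals</code> generator and the local variable names are assumptions made for illustration.
</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical infinite sequence: 0, 1, 2, ...
static IEnumerable<int> Naturals()
{
    for (var i = 0; ; i++)
        yield return i;
}

var values = Naturals();
IEnumerable<int> breadcrumbs = [];

// The same steps as GoForward, applied to an infinite sequence.
var head = values.Take(1);                     // deferred; nothing is enumerated yet
var canGoForward = head.Any();                 // pulls at most one element
var tail = values.Skip(1);                     // deferred
var newBreadcrumbs = head.Concat(breadcrumbs); // deferred

Console.WriteLine(canGoForward);                        // True
Console.WriteLine(string.Join(", ", tail.Take(3)));     // 1, 2, 3
Console.WriteLine(string.Join(", ", newBreadcrumbs));   // 0
```

<p>
One consequence of this deferred style is that every step wraps the source in another iterator, so reading from a zipper that has moved far into a sequence enumerates the source from the start again. For exercise code that's probably fine, but it's worth keeping in mind.
</p>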
<p>
Notice that both <code>GoForward</code> and <code>GoBack</code> admit the possibility of failure. If <code>values</code> is empty, you can't go forward. If <code>Breadcrumbs</code> is empty, you can't go back. In both cases, the functions return <code>null</code>, which is also indicated by the <code><span style="color:#2b91af;">ListZipper</span><<span style="color:#2b91af;">T</span>>?</code> return types; notice the question mark, <a href="https://learn.microsoft.com/dotnet/csharp/nullable-references">which indicates that the value may be null</a>. If you're working in a context or language where that feature isn't available, you may instead consider taking advantage of the <a href="/2022/04/25/the-maybe-monad">Maybe monad</a> (which is also what you'd <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatically</a> do in Haskell).
</p>
<p>
To be clear, the <a href="https://learnyouahaskell.com/zippers">Zippers article</a> does discuss handling failures using Maybe, but only applies it to its binary tree example. Thus, the error handling shown here is my own addition.
</p>
<h3 id="704f23586ead4b199b171baa50dfd1da">
Modifications <a href="#704f23586ead4b199b171baa50dfd1da">#</a>
</h3>
<p>
In addition to moving back and forth in the list, we can also modify it. The following operations are also not in the Zippers article, but are rather my own contributions. Adding a new element is easy:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">ListZipper</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#74531f;">Insert</span>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">value</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">ListZipper</span><<span style="color:#2b91af;">T</span>>(values.<span style="font-weight:bold;color:#74531f;">Prepend</span>(<span style="font-weight:bold;color:#1f377f;">value</span>), Breadcrumbs);
}</pre>
</p>
<p>
Notice that this operation is always possible. Even if the list is empty, we can <code>Insert</code> a value. In that case, it just becomes the list's first and only element.
</p>
<p>
A simple test demonstrates usage:
</p>
<p>
<pre>[<span style="color:#2b91af;">Fact</span>]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">InsertAtFocus</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">ListZipper</span><<span style="color:blue;">string</span>>(<span style="color:#a31515;">"foo"</span>, <span style="color:#a31515;">"bar"</span>);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = <span style="font-weight:bold;color:#1f377f;">sut</span>.<span style="font-weight:bold;color:#74531f;">GoForward</span>()?.<span style="font-weight:bold;color:#74531f;">Insert</span>(<span style="color:#a31515;">"ploeh"</span>).<span style="font-weight:bold;color:#74531f;">GoBack</span>();
<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">NotNull</span>(<span style="font-weight:bold;color:#1f377f;">actual</span>);
<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Equal</span>([<span style="color:#a31515;">"foo"</span>, <span style="color:#a31515;">"ploeh"</span>, <span style="color:#a31515;">"bar"</span>], <span style="font-weight:bold;color:#1f377f;">actual</span>);
<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Empty</span>(<span style="font-weight:bold;color:#1f377f;">actual</span>.Breadcrumbs);
}</pre>
</p>
<p>
Likewise, we may attempt to remove an element from the list:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">ListZipper</span><<span style="color:#2b91af;">T</span>>? <span style="font-weight:bold;color:#74531f;">Remove</span>()
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (!values.<span style="font-weight:bold;color:#74531f;">Any</span>())
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">ListZipper</span><<span style="color:#2b91af;">T</span>>(values.<span style="font-weight:bold;color:#74531f;">Skip</span>(1), Breadcrumbs);
}</pre>
</p>
<p>
Contrary to <code>Insert</code>, the <code>Remove</code> operation will fail if <code>values</code> is empty. Notice that this doesn't necessarily imply that the list as such is empty, but only that the focus is at the end of the list (which, of course, never happens if <code>values</code> is infinite):
</p>
<p>
<pre>[<span style="color:#2b91af;">Fact</span>]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">RemoveAtEnd</span>()
{
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">ListZipper</span><<span style="color:blue;">string</span>>(<span style="color:#a31515;">"foo"</span>, <span style="color:#a31515;">"bar"</span>).<span style="font-weight:bold;color:#74531f;">GoForward</span>()?.<span style="font-weight:bold;color:#74531f;">GoForward</span>();
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = <span style="font-weight:bold;color:#1f377f;">sut</span>?.<span style="font-weight:bold;color:#74531f;">Remove</span>();
    <span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Null</span>(<span style="font-weight:bold;color:#1f377f;">actual</span>);
    <span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">NotNull</span>(<span style="font-weight:bold;color:#1f377f;">sut</span>);
    <span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Empty</span>(<span style="font-weight:bold;color:#1f377f;">sut</span>);
    <span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Equal</span>([<span style="color:#a31515;">"bar"</span>, <span style="color:#a31515;">"foo"</span>], <span style="font-weight:bold;color:#1f377f;">sut</span>.Breadcrumbs);
}</pre>
</p>
<p>
In this example, the focus is at the end of the list, so there's nothing to remove. The list, however, is not empty, but all the data currently reside in the <code>Breadcrumbs</code>.
</p>
<p>
Finally, we can combine insertion and removal to implement a replacement operation:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">ListZipper</span><<span style="color:#2b91af;">T</span>>? <span style="font-weight:bold;color:#74531f;">Replace</span>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">newValue</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#74531f;">Remove</span>()?.<span style="font-weight:bold;color:#74531f;">Insert</span>(<span style="font-weight:bold;color:#1f377f;">newValue</span>);
}</pre>
</p>
<p>
As the name implies, this operation replaces the value currently in focus with a completely different value. Here's an example:
</p>
<p>
<pre>[<span style="color:#2b91af;">Fact</span>]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">ReplaceAtFocus</span>()
{
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">ListZipper</span><<span style="color:blue;">string</span>>(<span style="color:#a31515;">"foo"</span>, <span style="color:#a31515;">"bar"</span>, <span style="color:#a31515;">"baz"</span>);
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = <span style="font-weight:bold;color:#1f377f;">sut</span>.<span style="font-weight:bold;color:#74531f;">GoForward</span>()?.<span style="font-weight:bold;color:#74531f;">Replace</span>(<span style="color:#a31515;">"qux"</span>)?.<span style="font-weight:bold;color:#74531f;">GoBack</span>();
    <span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">NotNull</span>(<span style="font-weight:bold;color:#1f377f;">actual</span>);
    <span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Equal</span>([<span style="color:#a31515;">"foo"</span>, <span style="color:#a31515;">"qux"</span>, <span style="color:#a31515;">"baz"</span>], <span style="font-weight:bold;color:#1f377f;">actual</span>);
    <span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Empty</span>(<span style="font-weight:bold;color:#1f377f;">actual</span>.Breadcrumbs);
}</pre>
</p>
<p>
Once more, this may fail if the current focus is empty, so <code>Replace</code> also returns a nullable value.
</p>
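<p>
To make the failure case concrete, here's a minimal, self-contained sketch. It doesn't use the real <code>ListZipper&lt;T&gt;</code> class, but a hypothetical stand-in called <code>MiniZipper&lt;T&gt;</code> that only mimics the members shown above, under the assumption that <code>GoForward</code> moves the head of <code>values</code> onto the breadcrumbs.
</p>
<p>
<pre>#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Move the focus past both elements, so that nothing is left to replace.
var atEnd = new MiniZipper&lt;string&gt;(new[] { "foo", "bar" }, Array.Empty&lt;string&gt;())
    .GoForward()?.GoForward();
var replaced = atEnd?.Replace("qux");

Console.WriteLine(atEnd is null);    // False: the Zipper still exists
Console.WriteLine(replaced is null); // True: Remove failed, so Replace fails too

// Hypothetical stand-in for the article's ListZipper&lt;T&gt;. Illustration only.
public sealed class MiniZipper&lt;T&gt;
{
    private readonly IEnumerable&lt;T&gt; values;

    public MiniZipper(IEnumerable&lt;T&gt; values, IEnumerable&lt;T&gt; breadcrumbs)
    {
        this.values = values;
        Breadcrumbs = breadcrumbs;
    }

    public IEnumerable&lt;T&gt; Breadcrumbs { get; }

    // Assumed behaviour: the head of values becomes the newest breadcrumb.
    public MiniZipper&lt;T&gt;? GoForward() =>
        values.Any()
            ? new MiniZipper&lt;T&gt;(values.Skip(1), Breadcrumbs.Prepend(values.First()))
            : null;

    public MiniZipper&lt;T&gt; Insert(T value) =>
        new MiniZipper&lt;T&gt;(values.Prepend(value), Breadcrumbs);

    public MiniZipper&lt;T&gt;? Remove() =>
        values.Any() ? new MiniZipper&lt;T&gt;(values.Skip(1), Breadcrumbs) : null;

    // Short-circuits to null when Remove has nothing to remove.
    public MiniZipper&lt;T&gt;? Replace(T newValue) => Remove()?.Insert(newValue);
}</pre>
</p>
<p>
The null-conditional operator in <code>Replace</code> is what propagates the failure: when <code>Remove</code> returns <code>null</code>, <code>Insert</code> is never called.
</p>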
<h3 id="5979ae4ab42543f79df4e572f7f5c2c3">
Conclusion <a href="#5979ae4ab42543f79df4e572f7f5c2c3">#</a>
</h3>
<p>
For a C# developer, the <code><span style="color:#2b91af;">ListZipper</span><<span style="color:#2b91af;">T</span>></code> class looks odd. Why would you ever want to use this data structure? Why not just use <a href="https://learn.microsoft.com/dotnet/api/system.collections.generic.list-1">List<T></a>?
</p>
<p>
As I hope I've made clear in the <a href="/2024/08/19/zippers">introduction article</a>, I can't, indeed, think of a good reason.
</p>
<p>
I've gone through this exercise <a href="/2020/01/13/on-doing-katas">to hone my skills</a>, and to prepare myself for the more intimidating exercise of implementing a binary tree Zipper.
</p>
<p>
<strong>Next:</strong> <a href="/2024/09/09/a-binary-tree-zipper-in-c">A Binary Tree Zipper in C#</a>.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Zippers
https://blog.ploeh.dk/2024/08/19/zippers
2024-08-19T14:13:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Some functional programming examples ported to C#, just because.</em>
</p>
<p>
Many algorithms rely on data structures that enable the implementation to move in more than one way. A simple example is a <a href="https://en.wikipedia.org/wiki/Doubly_linked_list">doubly-linked list</a>, where an algorithm can move both forward and backward from a given element. Other examples are various tree-based algorithms, such as <a href="https://en.wikipedia.org/wiki/Red%E2%80%93black_tree">red-black trees</a> where certain operations trigger reorganization of the tree. Yet other data structures, such as <a href="https://en.wikipedia.org/wiki/Fibonacci_heap">Fibonacci heaps</a>, combine doubly-linked lists with trees that allow navigation in more than one direction.
</p>
<p>
In an imperative programming language, you can easily implement such data structures, as long as the language allows data mutation. Here's a simple example:
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">node1</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">Node</span><<span style="color:blue;">string</span>>(<span style="color:#a31515;">"foo"</span>);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">node2</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">Node</span><<span style="color:blue;">string</span>>(<span style="color:#a31515;">"bar"</span>) { Previous = <span style="font-weight:bold;color:#1f377f;">node1</span> };
<span style="font-weight:bold;color:#1f377f;">node1</span>.Next = <span style="font-weight:bold;color:#1f377f;">node2</span>;</pre>
</p>
<p>
It's possible to double-link <code>node1</code> to <code>node2</code> by first creating <code>node1</code>. At that point, <code>node2</code> still doesn't exist, so you can't yet assign <code><span style="font-weight:bold;color:#1f377f;">node1</span>.Next</code>, but once you've initialized <code>node2</code>, you can mutate the state of <code>node1</code> by changing its <code>Next</code> property.
</p>
<p>
When data structures are immutable (as they must be in functional programming) this is no longer possible. How may you get around that limitation?
</p>
<h3 id="3b3c3d4cba1f4ae8bef462b28047860a">
Alternatives <a href="#3b3c3d4cba1f4ae8bef462b28047860a">#</a>
</h3>
<p>
Some languages get around this problem in various ways. <a href="https://www.haskell.org/">Haskell</a>, because of its lazy evaluation, enables a technique called <a href="https://wiki.haskell.org/Tying_the_Knot">tying the knot</a> that, frankly, makes my head hurt.
</p>
<p>
Even though I write a decent amount of Haskell code, that's not something that I make use of. Usually, it turns out, you can solve most problems by thinking about them differently. By choosing another perspective, and another data structure, you can often arrive at a good, functional solution to your problem.
</p>
<p>
One family of general-purpose data structures is called Zippers. The general idea is that the data structure has a natural 'focus' (e.g. the head of a list), but it also keeps a record of 'breadcrumbs', that is, where the caller has previously been. This enables client code to 'go back' or 'go up', if the natural direction is to 'go forward' or 'go down'. It's a bit like <a href="https://martinfowler.com/eaaDev/EventSourcing.html">Event Sourcing</a>, in that every operation leaves a log entry that can later be used to reconstruct what happened. <a href="/2020/03/23/repeatable-execution">Repeatable Execution</a> also comes to mind, although it's not quite the same.
</p>
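<p>
As a rough illustration of the focus-and-breadcrumbs idea, consider the following sketch. The <code>Cursor&lt;T&gt;</code> record and its member names are made up for this example, and, as a simplification, it stays in place at either end of the list instead of signalling failure.
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Linq;

var start = new Cursor&lt;string&gt;(new[] { "foo", "bar", "baz" }, Array.Empty&lt;string&gt;());

var moved = start.Forward().Forward();
Console.WriteLine(string.Join(",", moved.Values));      // baz
Console.WriteLine(string.Join(",", moved.Breadcrumbs)); // bar,foo

var back = moved.Back();
Console.WriteLine(string.Join(",", back.Values));       // bar,baz
Console.WriteLine(string.Join(",", back.Breadcrumbs));  // foo

// Hypothetical, immutable list cursor. The head of Values is the focus.
public sealed record Cursor&lt;T&gt;(IEnumerable&lt;T&gt; Values, IEnumerable&lt;T&gt; Breadcrumbs)
{
    // Going forward leaves a breadcrumb: the element that was in focus.
    public Cursor&lt;T&gt; Forward() =>
        Values.Any()
            ? new Cursor&lt;T&gt;(Values.Skip(1), Breadcrumbs.Prepend(Values.First()))
            : this;

    // Going back picks up the most recent breadcrumb and puts it in focus.
    public Cursor&lt;T&gt; Back() =>
        Breadcrumbs.Any()
            ? new Cursor&lt;T&gt;(Values.Prepend(Breadcrumbs.First()), Breadcrumbs.Skip(1))
            : this;
}</pre>
</p>
<p>
Nothing is mutated here. Each move returns a new value, and the breadcrumbs carry enough information to retrace the steps.
</p>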
<p>
For an introduction to Zippers, I recommend the excellent and highly readable article <a href="https://learnyouahaskell.com/zippers">Zippers</a>. In this article series, I'm going to assume that you're familiar with the contents of that article.
</p>
<h3 id="8ec371f87d2f468ea6ebbc3a2e420cbb">
C# ports <a href="#8ec371f87d2f468ea6ebbc3a2e420cbb">#</a>
</h3>
<p>
While I may add more articles to this series in the future, as I'm writing this, I have nothing more planned than writing about how it's possible to implement the article's three Zippers in C#.
</p>
<ul>
<li><a href="/2024/08/26/a-list-zipper-in-c">A List Zipper in C#</a></li>
<li><a href="/2024/09/09/a-binary-tree-zipper-in-c">A Binary Tree Zipper in C#</a></li>
<li><a href="/2024/09/23/fszipper-in-c">FSZipper in C#</a></li>
</ul>
<p>
Why would you want to do this?
</p>
<p>
To be honest, for production code, I can't think of a good reason. I did it for a few reasons, most of them didactic. Additionally, <a href="/2020/01/13/on-doing-katas">writing code for exercise</a> helps you improve. If you know enough Haskell to understand what's going on in the <a href="https://learnyouahaskell.com/zippers">Zippers article</a>, you may consider porting some of it to your favourite language, as an exercise.
</p>
<p>
It may help you <a href="/ref/stranger-in-a-strange-land">grok</a> functional programming.
</p>
<p>
That's really it, though. There's no reason to use Zippers in a language like C#, which <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatically</a> makes use of mutation. If you want a doubly-linked list, you can just write code as shown in the beginning of this article.
</p>
<p>
If you're interested in an <a href="https://fsharp.org/">F#</a> perspective on Zippers, <a href="https://tomasp.net/">Tomas Petricek</a> has a cool article: <a href="https://tomasp.net/blog/tree-zipper-query.aspx/">Processing trees with F# zipper computation</a>.
</p>
<h3 id="8a124e3b10aa4b0b889efe866f63dc91">
Conclusion <a href="#8a124e3b10aa4b0b889efe866f63dc91">#</a>
</h3>
<p>
Zippers constitute a family of data structures that enables you to move in multiple directions. Left and right in a list. Up or down in a tree. For an imperative programmer, that's literally just another day at the office, but in disciplined functional programming, making cyclic graphs can be surprisingly tricky.
</p>
<p>
Even in functional programming, I rarely reach for a Zipper, since I can often find a library with a higher level of abstraction that does what I need it to do. Still, learning new ways to solve problems never seems a waste to me.
</p>
<p>
In the next three articles, I'll go through the examples from <a href="https://learnyouahaskell.com/zippers">the Zipper article</a> and show how I ported them to C#. While that article starts with a <a href="https://en.wikipedia.org/wiki/Binary_tree">binary tree</a>, I'll instead begin with the doubly-linked list, since it's the simplest of the three.
</p>
<p>
<strong>Next:</strong> <a href="/2024/08/26/a-list-zipper-in-c">A List Zipper in C#</a>.
</p>
</div><hr>
Using only a Domain Model to persist restaurant table configurations
https://blog.ploeh.dk/2024/08/12/using-only-a-domain-model-to-persist-restaurant-table-configurations
2024-08-12T12:57:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A data architecture example in C# and ASP.NET.</em>
</p>
<p>
This is part of a <a href="/2024/07/25/three-data-architectures-for-the-server">small article series on data architectures</a>. In this, the third instalment, you'll see an alternative way of modelling data in a server-based application. One that doesn't rely on statically typed classes to model data. As the introductory article explains, the example code shows how to create a new restaurant table configuration, or how to display an existing resource. The sample code base is an ASP.NET 8.0 <a href="https://en.wikipedia.org/wiki/REST">REST</a> API.
</p>
<p>
Keep in mind that while the sample code does store data in a relational database, the term <em>table</em> in this article mainly refers to physical tables, rather than database tables.
</p>
<p>
The idea is to use 'raw' serialization APIs to handle communication with external systems. For the presentation layer, the example even moves representation concerns to middleware, so that it's nicely abstracted away from the application layer.
</p>
<p>
An architecture diagram like this attempts to capture the design:
</p>
<p>
<img src="/content/binary/domain-model-only-data-architecture.png" alt="Architecture diagram showing a box labelled Domain Model with bidirectional arrows both above and below, pointing below towards a cylinder, and above towards a document.">
</p>
<p>
Here, the arrows indicate mappings, not dependencies.
</p>
<p>
Like in the <a href="/2024/07/29/using-ports-and-adapters-to-persist-restaurant-table-configurations">DTO-based Ports and Adapters architecture</a>, the goal is to be able to design Domain Models unconstrained by serialization concerns, while also being able to format external data unconstrained by Reflection-based serializers. Thus, while this architecture is centred on a Domain Model, there are no <a href="https://en.wikipedia.org/wiki/Data_transfer_object">Data Transfer Objects</a> (DTOs) to represent <a href="https://json.org/">JSON</a>, <a href="https://en.wikipedia.org/wiki/XML">XML</a>, or database rows.
</p>
<h3 id="799ef3debcb748079610a1ff818360e2">
HTTP interaction <a href="#799ef3debcb748079610a1ff818360e2">#</a>
</h3>
<p>
To establish the context of the application, here's how HTTP interactions may play out. The following is a copy of the identically named section in the article <a href="/2024/07/29/using-ports-and-adapters-to-persist-restaurant-table-configurations">Using Ports and Adapters to persist restaurant table configurations</a>, repeated here for your convenience.
</p>
<p>
A client can create a new table with a <code>POST</code> HTTP request:
</p>
<p>
<pre>POST /tables HTTP/1.1
content-type: application/json

{ <span style="color:#2e75b6;">"communalTable"</span>: { <span style="color:#2e75b6;">"capacity"</span>: 16 } }</pre>
</p>
<p>
Which might elicit a response like this:
</p>
<p>
<pre>HTTP/1.1 201 Created
Location: https://example.com/Tables/844581613e164813aa17243ff8b847af</pre>
</p>
<p>
Clients can later use the address indicated by the <code>Location</code> header to retrieve a representation of the resource:
</p>
<p>
<pre>GET /Tables/844581613e164813aa17243ff8b847af HTTP/1.1
accept: application/json</pre>
</p>
<p>
Which would result in this response:
</p>
<p>
<pre>HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

{<span style="color:#2e75b6;">"communalTable"</span>:{<span style="color:#2e75b6;">"capacity"</span>:16}}</pre>
</p>
<p>
By default, ASP.NET handles and returns JSON. Later in this article you'll see how well it deals with other data formats.
</p>
<h3 id="63cacca8023b4adb9534a34aba0c50ff">
Boundary <a href="#63cacca8023b4adb9534a34aba0c50ff">#</a>
</h3>
<p>
ASP.NET supports some variation of the <a href="https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller">model-view-controller</a> (MVC) pattern, and Controllers handle HTTP requests. At the outset, the <em>action method</em> that handles the <code>POST</code> request looks like this:
</p>
<p>
<pre>[<span style="color:#2b91af;">HttpPost</span>]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">IActionResult</span>> <span style="font-weight:bold;color:#74531f;">Post</span>(<span style="color:#2b91af;">Table</span> <span style="font-weight:bold;color:#1f377f;">table</span>)
{
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">id</span> = <span style="color:#2b91af;">Guid</span>.<span style="color:#74531f;">NewGuid</span>();
    <span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">repository</span>.<span style="font-weight:bold;color:#74531f;">Create</span>(<span style="font-weight:bold;color:#1f377f;">id</span>, <span style="font-weight:bold;color:#1f377f;">table</span>).<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">CreatedAtActionResult</span>(
        <span style="color:blue;">nameof</span>(<span style="font-weight:bold;color:#74531f;">Get</span>),
        <span style="color:blue;">null</span>,
        <span style="color:blue;">new</span> { id = <span style="font-weight:bold;color:#1f377f;">id</span>.<span style="font-weight:bold;color:#74531f;">ToString</span>(<span style="color:#a31515;">"N"</span>) },
        <span style="color:blue;">null</span>);
}</pre>
</p>
<p>
While this looks identical to the <code>Post</code> method for <a href="/2024/08/05/using-a-shared-data-model-to-persist-restaurant-table-configurations">the Shared Data Model architecture</a>, it's not, because it's not the same <code>Table</code> class. Not by a long shot. The <code>Table</code> class in use here is the one originally introduced in the article <a href="/2023/12/25/serializing-restaurant-tables-in-c">Serializing restaurant tables in C#</a>, with a few inconsequential differences.
</p>
<p>
How does a Controller <em>action method</em> receive an input parameter directly in the form of a Domain Model, keeping in mind that this particular Domain Model is far from serialization-friendly? The short answer is <em>middleware</em>, which we'll get to in a moment. Before we look at that, however, let's also look at the <code>Get</code> method that supports HTTP <code>GET</code> requests:
</p>
<p>
<pre>[<span style="color:#2b91af;">HttpGet</span>(<span style="color:#a31515;">"</span><span style="color:#0073ff;">{</span>id<span style="color:#0073ff;">}</span><span style="color:#a31515;">"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">IActionResult</span>> <span style="font-weight:bold;color:#74531f;">Get</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">id</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">if</span> (!<span style="color:#2b91af;">Guid</span>.<span style="color:#74531f;">TryParseExact</span>(<span style="font-weight:bold;color:#1f377f;">id</span>, <span style="color:#a31515;">"N"</span>, <span style="color:blue;">out</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">guid</span>))
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">BadRequestResult</span>();
    <span style="color:#2b91af;">Table</span>? <span style="font-weight:bold;color:#1f377f;">table</span> = <span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">repository</span>.<span style="font-weight:bold;color:#74531f;">Read</span>(<span style="font-weight:bold;color:#1f377f;">guid</span>).<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">table</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">NotFoundResult</span>();
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">OkObjectResult</span>(<span style="font-weight:bold;color:#1f377f;">table</span>);
}</pre>
</p>
<p>
This, too, looks exactly like the Shared Data Model architecture, again with the crucial difference that the <code>Table</code> class is completely different. The <code>Get</code> method just takes the <code>table</code> object and wraps it in an <code>OkObjectResult</code> and returns it.
</p>
<p>
The <code>Table</code> class is, in reality, extraordinarily opaque, and not at all friendly to serialization, so how does the service turn it into JSON?
</p>
<h3 id="38d9d73532fc4912834452fff3d33b3a">
JSON middleware <a href="#38d9d73532fc4912834452fff3d33b3a">#</a>
</h3>
<p>
Most web frameworks come with extensibility points where you can add middleware. A common need is to be able to add custom serializers. In ASP.NET they're called <em>formatters</em>, and can be added at application startup:
</p>
<p>
<pre><span style="font-weight:bold;color:#1f377f;">builder</span>.Services.<span style="font-weight:bold;color:#74531f;">AddControllers</span>(<span style="font-weight:bold;color:#1f377f;">opts</span> =>
{
    <span style="font-weight:bold;color:#1f377f;">opts</span>.InputFormatters.<span style="font-weight:bold;color:#74531f;">Insert</span>(0, <span style="color:blue;">new</span> <span style="color:#2b91af;">TableJsonInputFormatter</span>());
    <span style="font-weight:bold;color:#1f377f;">opts</span>.OutputFormatters.<span style="font-weight:bold;color:#74531f;">Insert</span>(0, <span style="color:blue;">new</span> <span style="color:#2b91af;">TableJsonOutputFormatter</span>());
});</pre>
</p>
<p>
As the names imply, <code>TableJsonInputFormatter</code> deserializes JSON input, while <code>TableJsonOutputFormatter</code> serializes strongly typed objects to JSON.
</p>
<p>
We'll look at each in turn, starting with <code>TableJsonInputFormatter</code>, which is responsible for deserializing JSON documents into <code>Table</code> objects, as used by, for example, the <code>Post</code> method.
</p>
<h3 id="9ff7ca45e5fd4d19bb9be29769e9a298">
JSON input formatter <a href="#9ff7ca45e5fd4d19bb9be29769e9a298">#</a>
</h3>
<p>
You create an input formatter by implementing the <a href="https://learn.microsoft.com/dotnet/api/microsoft.aspnetcore.mvc.formatters.iinputformatter">IInputFormatter</a> interface, although in this example code base, inheriting from <a href="https://learn.microsoft.com/dotnet/api/microsoft.aspnetcore.mvc.formatters.textinputformatter">TextInputFormatter</a> is enough:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">TableJsonInputFormatter</span> : <span style="color:#2b91af;">TextInputFormatter</span></pre>
</p>
<p>
You can use the constructor to define which media types and encodings the formatter will support:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">TableJsonInputFormatter</span>()
{
    SupportedMediaTypes.<span style="font-weight:bold;color:#74531f;">Add</span>(<span style="color:#2b91af;">MediaTypeHeaderValue</span>.<span style="color:#74531f;">Parse</span>(<span style="color:#a31515;">"application/json"</span>));
    SupportedEncodings.<span style="font-weight:bold;color:#74531f;">Add</span>(<span style="color:#2b91af;">Encoding</span>.UTF8);
    SupportedEncodings.<span style="font-weight:bold;color:#74531f;">Add</span>(<span style="color:#2b91af;">Encoding</span>.Unicode);
}</pre>
</p>
<p>
You'll also need to tell the formatter which .NET type it supports:
</p>
<p>
<pre><span style="color:blue;">protected</span> <span style="color:blue;">override</span> <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">CanReadType</span>(<span style="color:#2b91af;">Type</span> <span style="font-weight:bold;color:#1f377f;">type</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">type</span> <span style="font-weight:bold;color:#74531f;">==</span> <span style="color:blue;">typeof</span>(<span style="color:#2b91af;">Table</span>);
}</pre>
</p>
<p>
As far as I can tell, the ASP.NET framework will first determine which <em>action method</em> (that is, which Controller, and which method on that Controller) should handle a given HTTP request. For a <code>POST</code> request, as shown above, it'll determine that the appropriate <em>action method</em> is the <code>Post</code> method.
</p>
<p>
Since the <code>Post</code> method takes a <code>Table</code> object as input, the framework then goes through the registered formatters and asks them whether they can read from an HTTP request into that type. In this case, the <code>TableJsonInputFormatter</code> answers <code>true</code> only if the <code>type</code> is <code>Table</code>.
</p>
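<p>
As a rough mental model of that selection step, and emphatically not ASP.NET's actual implementation, you can think of it as picking the first registered formatter that answers <code>true</code>. All type names in this sketch (<code>IToyFormatter</code>, <code>TableFormatter</code>, <code>DefaultJsonFormatter</code>, and the empty <code>Table</code> class) are made up for the illustration.
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Linq;

// A catch-all formatter is already registered; the specialised one goes first.
var formatters = new List&lt;IToyFormatter&gt; { new DefaultJsonFormatter() };
formatters.Insert(0, new TableFormatter());

Console.WriteLine(Choose(formatters, typeof(Table)).Name);  // TableFormatter
Console.WriteLine(Choose(formatters, typeof(string)).Name); // DefaultJsonFormatter

// Assumed selection rule: the first formatter that can read the type wins.
static IToyFormatter Choose(IEnumerable&lt;IToyFormatter&gt; candidates, Type type) =>
    candidates.First(f => f.CanRead(type));

interface IToyFormatter
{
    string Name { get; }
    bool CanRead(Type type);
}

sealed class Table { }

sealed class TableFormatter : IToyFormatter
{
    public string Name => nameof(TableFormatter);
    public bool CanRead(Type type) => type == typeof(Table);
}

sealed class DefaultJsonFormatter : IToyFormatter
{
    public string Name => nameof(DefaultJsonFormatter);
    public bool CanRead(Type type) => true;
}</pre>
</p>
<p>
Under that assumption, inserting the custom formatter at index <code>0</code> gives it the first chance to claim <code>Table</code>, while every other type falls through to whatever is registered after it.
</p>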
<p>
When <code>CanReadType</code> answers <code>true</code>, the framework then invokes a method to turn the HTTP request into an object:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">InputFormatterResult</span>> <span style="font-weight:bold;color:#74531f;">ReadRequestBodyAsync</span>(
    <span style="color:#2b91af;">InputFormatterContext</span> <span style="font-weight:bold;color:#1f377f;">context</span>,
    <span style="color:#2b91af;">Encoding</span> <span style="font-weight:bold;color:#1f377f;">encoding</span>)
{
    <span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">rdr</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">StreamReader</span>(<span style="font-weight:bold;color:#1f377f;">context</span>.HttpContext.Request.Body, <span style="font-weight:bold;color:#1f377f;">encoding</span>);
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">json</span> = <span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">rdr</span>.<span style="font-weight:bold;color:#74531f;">ReadToEndAsync</span>().<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">table</span> = <span style="color:#2b91af;">TableJson</span>.<span style="color:#74531f;">Deserialize</span>(<span style="font-weight:bold;color:#1f377f;">json</span>);
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">table</span> <span style="color:blue;">is</span> { })
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">await</span> <span style="color:#2b91af;">InputFormatterResult</span>.<span style="color:#74531f;">SuccessAsync</span>(<span style="font-weight:bold;color:#1f377f;">table</span>).<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
    <span style="font-weight:bold;color:#8f08c4;">else</span>
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">await</span> <span style="color:#2b91af;">InputFormatterResult</span>.<span style="color:#74531f;">FailureAsync</span>().<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
}</pre>
</p>
<p>
The <code>ReadRequestBodyAsync</code> method reads the HTTP request body into a <code>string</code> value called <code>json</code>, and then passes the value to <code>TableJson.Deserialize</code>. You can see the implementation of the <code>Deserialize</code> method in the article <a href="/2023/12/25/serializing-restaurant-tables-in-c">Serializing restaurant tables in C#</a>. In short, it uses the default .NET JSON parser to probe a document object model. If it can turn the JSON document into a <code>Table</code> value, it does that. Otherwise, it returns <code>null</code>.
</p>
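<p>
To give an impression of what 'probing a document object model' can look like, here's a small sketch based on <code>System.Text.Json</code>. It is not the <code>TableJson.Deserialize</code> implementation from the linked article. The helper name <code>TryReadCommunalCapacity</code> is hypothetical, and it only extracts the capacity of a communal table, returning <code>null</code> for anything it doesn't recognise.
</p>
<p>
<pre>using System;
using System.Text.Json;

Console.WriteLine(TryReadCommunalCapacity("{ \"communalTable\": { \"capacity\": 16 } }")); // 16
Console.WriteLine(TryReadCommunalCapacity("{ \"foo\": 42 }") is null);                      // True
Console.WriteLine(TryReadCommunalCapacity("not json") is null);                             // True

// Hypothetical helper: probes the document step by step instead of binding it to a DTO.
static int? TryReadCommunalCapacity(string json)
{
    try
    {
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;

        // Check the kind of each element before asking for a property or a number.
        if (root.ValueKind != JsonValueKind.Object) return null;
        if (!root.TryGetProperty("communalTable", out var communal)) return null;
        if (communal.ValueKind != JsonValueKind.Object) return null;
        if (!communal.TryGetProperty("capacity", out var capacity)) return null;
        if (capacity.ValueKind != JsonValueKind.Number) return null;

        return capacity.TryGetInt32(out var value) ? value : (int?)null;
    }
    catch (JsonException)
    {
        // Malformed JSON is treated like any other unrecognised input.
        return null;
    }
}</pre>
</p>
<p>
The point of the sketch is the shape of the code: every step can fail, and failure at any step collapses to <code>null</code>, which is the same signal the formatter above checks for.
</p>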
<p>
The above <code>ReadRequestBodyAsync</code> method then checks if the return value from <code>TableJson.Deserialize</code> is <code>null</code>. If it's not, it wraps the result in a value that indicates success. If it's <code>null</code>, it uses <code>FailureAsync</code> to indicate a deserialization failure.
</p>
<p>
With this input formatter in place as middleware, any action method that takes a <code>Table</code> parameter will automatically receive a deserialized JSON object, if possible.
</p>
<h3 id="e04e265f6cdd48869aa8510e14092644">
JSON output formatter <a href="#e04e265f6cdd48869aa8510e14092644">#</a>
</h3>
<p>
The <code>TableJsonOutputFormatter</code> class works much in the same way, but instead derives from the <a href="https://learn.microsoft.com/dotnet/api/microsoft.aspnetcore.mvc.formatters.textoutputformatter">TextOutputFormatter</a> base class:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">TableJsonOutputFormatter</span> : <span style="color:#2b91af;">TextOutputFormatter</span></pre>
</p>
<p>
The constructor looks just like the <code>TableJsonInputFormatter</code> constructor, and instead of a <code>CanReadType</code> method, the class has a <code>CanWriteType</code> method that also looks identical.
</p>
<p>
The <code>WriteResponseBodyAsync</code> method serializes a <code>Table</code> object to JSON:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:#2b91af;">Task</span> <span style="font-weight:bold;color:#74531f;">WriteResponseBodyAsync</span>(
    <span style="color:#2b91af;">OutputFormatterWriteContext</span> <span style="font-weight:bold;color:#1f377f;">context</span>,
    <span style="color:#2b91af;">Encoding</span> <span style="font-weight:bold;color:#1f377f;">selectedEncoding</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">context</span>.Object <span style="color:blue;">is</span> <span style="color:#2b91af;">Table</span> <span style="font-weight:bold;color:#1f377f;">table</span>)
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">context</span>.HttpContext.Response.<span style="font-weight:bold;color:#74531f;">WriteAsync</span>(<span style="font-weight:bold;color:#1f377f;">table</span>.<span style="font-weight:bold;color:#74531f;">Serialize</span>(), <span style="font-weight:bold;color:#1f377f;">selectedEncoding</span>);
    <span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">InvalidOperationException</span>(<span style="color:#a31515;">"Expected a Table object."</span>);
}</pre>
</p>
<p>
If <code>context.Object</code> is, in fact, a <code>Table</code> object, the method calls <code>table.Serialize()</code>, which you can also see in the article <a href="/2023/12/25/serializing-restaurant-tables-in-c">Serializing restaurant tables in C#</a>. In short, it pattern-matches on the two possible kinds of tables and builds an appropriate <a href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">abstract syntax tree</a> or document object model that it then serializes to JSON.
</p>
<h3 id="7c935b48e0cf42369b5d0c55c688d5bf">
Data access <a href="#7c935b48e0cf42369b5d0c55c688d5bf">#</a>
</h3>
<p>
While the application stores data in <a href="https://en.wikipedia.org/wiki/Microsoft_SQL_Server">SQL Server</a>, it uses no <a href="https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping">object-relational mapper</a> (ORM). Instead, it simply uses ADO.NET, as also outlined in the article <a href="/2023/09/18/do-orms-reduce-the-need-for-mapping">Do ORMs reduce the need for mapping?</a>
</p>
<p>
At first glance, the <code>Create</code> method looks simple:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span> <span style="font-weight:bold;color:#74531f;">Create</span>(<span style="color:#2b91af;">Guid</span> <span style="font-weight:bold;color:#1f377f;">id</span>, <span style="color:#2b91af;">Table</span> <span style="font-weight:bold;color:#1f377f;">table</span>)
{
    <span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">conn</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">SqlConnection</span>(<span style="font-weight:bold;color:#1f377f;">connectionString</span>);
    <span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">cmd</span> = <span style="font-weight:bold;color:#1f377f;">table</span>.<span style="font-weight:bold;color:#74531f;">Accept</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">SqlInsertCommandVisitor</span>(<span style="font-weight:bold;color:#1f377f;">id</span>));
    <span style="font-weight:bold;color:#1f377f;">cmd</span>.Connection = <span style="font-weight:bold;color:#1f377f;">conn</span>;
    <span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">conn</span>.<span style="font-weight:bold;color:#74531f;">OpenAsync</span>().<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
    <span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">cmd</span>.<span style="font-weight:bold;color:#74531f;">ExecuteNonQueryAsync</span>().<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
}</pre>
</p>
<p>
The main work, however, is done by the nested <code>SqlInsertCommandVisitor</code> class:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">SqlInsertCommandVisitor</span>(<span style="color:#2b91af;">Guid</span> <span style="font-weight:bold;color:#1f377f;">id</span>) : <span style="color:#2b91af;">ITableVisitor</span><<span style="color:#2b91af;">SqlCommand</span>>
{
    <span style="color:blue;">public</span> <span style="color:#2b91af;">SqlCommand</span> <span style="font-weight:bold;color:#74531f;">VisitCommunal</span>(<span style="color:#2b91af;">NaturalNumber</span> <span style="font-weight:bold;color:#1f377f;">capacity</span>)
    {
        <span style="color:blue;">const</span> <span style="color:blue;">string</span> createCommunalSql = <span style="color:maroon;">@"
            INSERT INTO [dbo].[Tables] ([PublicId], [Capacity])
            VALUES (@PublicId, @Capacity)"</span>;
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">cmd</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">SqlCommand</span>(createCommunalSql);
        <span style="font-weight:bold;color:#1f377f;">cmd</span>.Parameters.<span style="font-weight:bold;color:#74531f;">AddWithValue</span>(<span style="color:#a31515;">"@PublicId"</span>, <span style="font-weight:bold;color:#1f377f;">id</span>);
        <span style="font-weight:bold;color:#1f377f;">cmd</span>.Parameters.<span style="font-weight:bold;color:#74531f;">AddWithValue</span>(<span style="color:#a31515;">"@Capacity"</span>, (<span style="color:blue;">int</span>)<span style="font-weight:bold;color:#1f377f;">capacity</span>);
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">cmd</span>;
    }
    <span style="color:blue;">public</span> <span style="color:#2b91af;">SqlCommand</span> <span style="font-weight:bold;color:#74531f;">VisitSingle</span>(<span style="color:#2b91af;">NaturalNumber</span> <span style="font-weight:bold;color:#1f377f;">capacity</span>, <span style="color:#2b91af;">NaturalNumber</span> <span style="font-weight:bold;color:#1f377f;">minimalReservation</span>)
    {
        <span style="color:blue;">const</span> <span style="color:blue;">string</span> createSingleSql = <span style="color:maroon;">@"
            INSERT INTO [dbo].[Tables] ([PublicId], [Capacity], [MinimalReservation])
            VALUES (@PublicId, @Capacity, @MinimalReservation)"</span>;
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">cmd</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">SqlCommand</span>(createSingleSql);
        <span style="font-weight:bold;color:#1f377f;">cmd</span>.Parameters.<span style="font-weight:bold;color:#74531f;">AddWithValue</span>(<span style="color:#a31515;">"@PublicId"</span>, <span style="font-weight:bold;color:#1f377f;">id</span>);
        <span style="font-weight:bold;color:#1f377f;">cmd</span>.Parameters.<span style="font-weight:bold;color:#74531f;">AddWithValue</span>(<span style="color:#a31515;">"@Capacity"</span>, (<span style="color:blue;">int</span>)<span style="font-weight:bold;color:#1f377f;">capacity</span>);
        <span style="font-weight:bold;color:#1f377f;">cmd</span>.Parameters.<span style="font-weight:bold;color:#74531f;">AddWithValue</span>(<span style="color:#a31515;">"@MinimalReservation"</span>, (<span style="color:blue;">int</span>)<span style="font-weight:bold;color:#1f377f;">minimalReservation</span>);
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">cmd</span>;
    }
}</pre>
</p>
<p>
It 'pattern-matches' on the two possible kinds of table and returns an appropriate <a href="https://learn.microsoft.com/dotnet/api/microsoft.data.sqlclient.sqlcommand">SqlCommand</a> that the <code>Create</code> method then executes. Notice that no 'Entity' class is needed. The code works straight on <code>SqlCommand</code>.
</p>
<p>
The same is true for the repository's <code>Read</code> method:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Table</span>?> <span style="font-weight:bold;color:#74531f;">Read</span>(<span style="color:#2b91af;">Guid</span> <span style="font-weight:bold;color:#1f377f;">id</span>)
{
    <span style="color:blue;">const</span> <span style="color:blue;">string</span> readByIdSql = <span style="color:maroon;">@"
        SELECT [Capacity], [MinimalReservation]
        FROM [dbo].[Tables]
        WHERE [PublicId] = @id"</span>;
    <span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">conn</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">SqlConnection</span>(<span style="font-weight:bold;color:#1f377f;">connectionString</span>);
    <span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">cmd</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">SqlCommand</span>(readByIdSql, <span style="font-weight:bold;color:#1f377f;">conn</span>);
    <span style="font-weight:bold;color:#1f377f;">cmd</span>.Parameters.<span style="font-weight:bold;color:#74531f;">AddWithValue</span>(<span style="color:#a31515;">"@id"</span>, <span style="font-weight:bold;color:#1f377f;">id</span>);
    <span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">conn</span>.<span style="font-weight:bold;color:#74531f;">OpenAsync</span>().<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
    <span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">rdr</span> = <span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">cmd</span>.<span style="font-weight:bold;color:#74531f;">ExecuteReaderAsync</span>().<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
    <span style="font-weight:bold;color:#8f08c4;">if</span> (!<span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">rdr</span>.<span style="font-weight:bold;color:#74531f;">ReadAsync</span>().<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>))
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">capacity</span> = (<span style="color:blue;">int</span>)<span style="font-weight:bold;color:#1f377f;">rdr</span>[<span style="color:#a31515;">"Capacity"</span>];
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">minimalReservation</span> = <span style="font-weight:bold;color:#1f377f;">rdr</span>[<span style="color:#a31515;">"MinimalReservation"</span>] <span style="color:blue;">as</span> <span style="color:blue;">int</span>?;
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">minimalReservation</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">Table</span>.<span style="color:#74531f;">TryCreateCommunal</span>(<span style="font-weight:bold;color:#1f377f;">capacity</span>);
    <span style="font-weight:bold;color:#8f08c4;">else</span>
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">Table</span>.<span style="color:#74531f;">TryCreateSingle</span>(<span style="font-weight:bold;color:#1f377f;">capacity</span>, <span style="font-weight:bold;color:#1f377f;">minimalReservation</span>.Value);
}</pre>
</p>
<p>
It works directly on <a href="https://learn.microsoft.com/dotnet/api/microsoft.data.sqlclient.sqldatareader">SqlDataReader</a>. Again, no extra 'Entity' class is required. If the data in the database makes sense, the <code>Read</code> method returns a well-<a href="/encapsulation-and-solid">encapsulated</a> <code>Table</code> object.
</p>
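<p>
The nullable cast is what separates the two kinds of table. As a minimal sketch of why it works (the <code>NullableColumn</code> helper is hypothetical and not part of the code base): ADO.NET represents a SQL <code>NULL</code> as <code>DBNull.Value</code>, and since that object isn't a boxed <code>int</code>, the <code>as</code> operator turns it into <code>null</code>.
</p>
<p>
<pre>using System;

// Hypothetical helper, for illustration only.
public static class NullableColumn
{
    // Mirrors the cast in Read: a boxed int converts to its value, while
    // DBNull.Value (ADO.NET's representation of a SQL NULL) becomes null.
    public static int? ToNullableInt(object columnValue)
    {
        return columnValue as int?;
    }
}</pre>
</p>
<p>
Under that assumption, a row without a <code>MinimalReservation</code> value takes the communal branch, while any other row takes the single-table branch:
</p>
<p>
<pre>int? communal = NullableColumn.ToNullableInt(DBNull.Value); // null
int? single = NullableColumn.ToNullableInt(3);              // 3</pre>
</p>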
<h3 id="2a52a80eef4a4168b281a376751415e2">
XML formats <a href="#2a52a80eef4a4168b281a376751415e2">#</a>
</h3>
<p>
That covers the basics, but how well does this kind of architecture stand up to changing requirements?
</p>
<p>
One axis of variation is when a service needs to support multiple representations. In this example, I'll imagine that the service also needs to support not just one, but two, XML formats.
</p>
<p>
Granted, you may not run into that particular requirement that often, but it's typical of a kind of change that you're likely to run into. In REST APIs, for example, <a href="/2015/06/22/rest-implies-content-negotiation">you should use content negotiation for versioning</a>, and that's the same kind of problem.
</p>
<p>
To be fair, application code also changes for a variety of other reasons, including new features, changes to business logic, etc. I can't possibly cover all, though, and many of these are much better described than changes in wire formats.
</p>
<p>
As described in the <a href="/2024/07/25/three-data-architectures-for-the-server">introduction article</a>, ideally the XML should support a format implied by these examples:
</p>
<p>
<pre><span style="color:blue;"><</span><span style="color:#a31515;">communal-table</span><span style="color:blue;">></span>
<span style="color:blue;"> <</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></span>12<span style="color:blue;"></</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></span>
<span style="color:blue;"></</span><span style="color:#a31515;">communal-table</span><span style="color:blue;">></span>
<span style="color:blue;"><</span><span style="color:#a31515;">single-table</span><span style="color:blue;">></span>
<span style="color:blue;"> <</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></span>4<span style="color:blue;"></</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></span>
<span style="color:blue;"> <</span><span style="color:#a31515;">minimal-reservation</span><span style="color:blue;">></span>3<span style="color:blue;"></</span><span style="color:#a31515;">minimal-reservation</span><span style="color:blue;">></span>
<span style="color:blue;"></</span><span style="color:#a31515;">single-table</span><span style="color:blue;">></span></pre>
</p>
<p>
Notice that while these two examples have different root elements, they're still considered to both represent <em>a table</em>. Although <a href="/2023/10/16/at-the-boundaries-static-types-are-illusory">at the boundaries, static types are illusory</a>, we may still, loosely speaking, consider both of those XML documents as belonging to the same 'type'.
</p>
<p>
With both of the previous architectures described in this article series, I've had to give up on this schema. The present data architecture, finally, is able to handle this requirement.
</p>
<h3 id="a0f5e6525743464c97e3fc090d04efbe">
HTTP interactions with element-biased XML <a href="#a0f5e6525743464c97e3fc090d04efbe">#</a>
</h3>
<p>
The service should support the new XML format when presented with the <code>"application/xml"</code> media type, either as a <code>content-type</code> header or <code>accept</code> header. An initial <code>POST</code> request may look like this:
</p>
<p>
<pre>POST /tables HTTP/1.1
content-type: application/xml

<span style="color:blue;"><</span><span style="color:#a31515;">communal-table</span><span style="color:blue;">><</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></span>12<span style="color:blue;"></</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></</span><span style="color:#a31515;">communal-table</span><span style="color:blue;">></span></pre>
</p>
<p>
Which produces a reply like this:
</p>
<p>
<pre>HTTP/1.1 201 Created
Location: https://example.com/Tables/a77ac3fd221e4a5caaca3a0fc2b83ffc</pre>
</p>
<p>
And just like before, a client can later use the address in the <code>Location</code> header to request the resource. By using the <code>accept</code> header, it can indicate that it wishes to receive the reply formatted as XML:
</p>
<p>
<pre>GET /Tables/a77ac3fd221e4a5caaca3a0fc2b83ffc HTTP/1.1
accept: application/xml</pre>
</p>
<p>
Which produces this response with XML content in the body:
</p>
<p>
<pre>HTTP/1.1 200 OK
Content-Type: application/xml; charset=utf-8

<span style="color:blue;"><</span><span style="color:#a31515;">communal-table</span><span style="color:blue;">><</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></span>12<span style="color:blue;"></</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></</span><span style="color:#a31515;">communal-table</span><span style="color:blue;">></span></pre>
</p>
<p>
How do you add support for this new format?
</p>
<h3 id="a94d98e6c9b6458d9754354f736c69be">
Element-biased XML formatters <a href="#a94d98e6c9b6458d9754354f736c69be">#</a>
</h3>
<p>
Not surprisingly, you can add support for the new format by adding new formatters.
</p>
<p>
<pre><span style="font-weight:bold;color:#1f377f;">opts</span>.InputFormatters.<span style="font-weight:bold;color:#74531f;">Add</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">ElementBiasedTableXmlInputFormatter</span>());
<span style="font-weight:bold;color:#1f377f;">opts</span>.OutputFormatters.<span style="font-weight:bold;color:#74531f;">Add</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">ElementBiasedTableXmlOutputFormatter</span>());</pre>
</p>
<p>
Importantly, and in stark contrast to <a href="/2024/07/29/using-ports-and-adapters-to-persist-restaurant-table-configurations">the DTO-based Ports and Adapters example</a>, you don't have to change the existing code to add XML support. If you're concerned about design heuristics such as the <a href="https://en.wikipedia.org/wiki/Single-responsibility_principle">Single Responsibility Principle</a>, you may consider this a win. Apart from the two lines of code adding the formatters, all other code to support this new feature is in new classes.
</p>
<p>
Both of the new formatters support the <code>"application/xml"</code> media type.
</p>
<h3 id="853eca2110634b5289ffb3fe16f9dacd">
Deserializing element-biased XML <a href="#853eca2110634b5289ffb3fe16f9dacd">#</a>
</h3>
<p>
The constructor and <code>CanReadType</code> implementation of <code>ElementBiasedTableXmlInputFormatter</code> are nearly identical to code you've already seen here, so I'll skip the repetition. The <code>ReadRequestBodyAsync</code> implementation is also conceptually similar, but of course differs in the details.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">InputFormatterResult</span>> <span style="font-weight:bold;color:#74531f;">ReadRequestBodyAsync</span>(
    <span style="color:#2b91af;">InputFormatterContext</span> <span style="font-weight:bold;color:#1f377f;">context</span>,
    <span style="color:#2b91af;">Encoding</span> <span style="font-weight:bold;color:#1f377f;">encoding</span>)
{
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">xml</span> = <span style="color:blue;">await</span> <span style="color:#2b91af;">XElement</span>
        .<span style="color:#74531f;">LoadAsync</span>(<span style="font-weight:bold;color:#1f377f;">context</span>.HttpContext.Request.Body, <span style="color:#2b91af;">LoadOptions</span>.None, <span style="color:#2b91af;">CancellationToken</span>.None)
        .<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">table</span> = <span style="color:#2b91af;">TableXml</span>.<span style="color:#74531f;">TryParseElementBiased</span>(<span style="font-weight:bold;color:#1f377f;">xml</span>);
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">table</span> <span style="color:blue;">is</span> { })
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">await</span> <span style="color:#2b91af;">InputFormatterResult</span>.<span style="color:#74531f;">SuccessAsync</span>(<span style="font-weight:bold;color:#1f377f;">table</span>).<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
    <span style="font-weight:bold;color:#8f08c4;">else</span>
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">await</span> <span style="color:#2b91af;">InputFormatterResult</span>.<span style="color:#74531f;">FailureAsync</span>().<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
}</pre>
</p>
<p>
As is also the case with the JSON input formatter, the <code>ReadRequestBodyAsync</code> method really only implements an <a href="https://en.wikipedia.org/wiki/Adapter_pattern">Adapter</a> over a more specialized parser function:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">Table</span>? <span style="color:#74531f;">TryParseElementBiased</span>(<span style="color:#2b91af;">XElement</span> <span style="font-weight:bold;color:#1f377f;">xml</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">xml</span>.Name <span style="font-weight:bold;color:#74531f;">==</span> <span style="color:#a31515;">"communal-table"</span>)
    {
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">capacity</span> = <span style="font-weight:bold;color:#1f377f;">xml</span>.<span style="font-weight:bold;color:#74531f;">Element</span>(<span style="color:#a31515;">"capacity"</span>)?.Value;
        <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">capacity</span> <span style="color:blue;">is</span> { })
        {
            <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="color:blue;">int</span>.<span style="color:#74531f;">TryParse</span>(<span style="font-weight:bold;color:#1f377f;">capacity</span>, <span style="color:blue;">out</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">c</span>))
                <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">Table</span>.<span style="color:#74531f;">TryCreateCommunal</span>(<span style="font-weight:bold;color:#1f377f;">c</span>);
        }
    }
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">xml</span>.Name <span style="font-weight:bold;color:#74531f;">==</span> <span style="color:#a31515;">"single-table"</span>)
    {
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">capacity</span> = <span style="font-weight:bold;color:#1f377f;">xml</span>.<span style="font-weight:bold;color:#74531f;">Element</span>(<span style="color:#a31515;">"capacity"</span>)?.Value;
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">minimalReservation</span> = <span style="font-weight:bold;color:#1f377f;">xml</span>.<span style="font-weight:bold;color:#74531f;">Element</span>(<span style="color:#a31515;">"minimal-reservation"</span>)?.Value;
        <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">capacity</span> <span style="color:blue;">is</span> { } && <span style="font-weight:bold;color:#1f377f;">minimalReservation</span> <span style="color:blue;">is</span> { })
        {
            <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="color:blue;">int</span>.<span style="color:#74531f;">TryParse</span>(<span style="font-weight:bold;color:#1f377f;">capacity</span>, <span style="color:blue;">out</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">c</span>) &&
                <span style="color:blue;">int</span>.<span style="color:#74531f;">TryParse</span>(<span style="font-weight:bold;color:#1f377f;">minimalReservation</span>, <span style="color:blue;">out</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">mr</span>))
                <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">Table</span>.<span style="color:#74531f;">TryCreateSingle</span>(<span style="font-weight:bold;color:#1f377f;">c</span>, <span style="font-weight:bold;color:#1f377f;">mr</span>);
        }
    }
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
}</pre>
</p>
<p>
In keeping with the common theme of the Domain Model Only data architecture, it deserializes by examining an <a href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax Tree</a> (AST) or document object model (DOM), specifically making use of the <a href="https://learn.microsoft.com/dotnet/api/system.xml.linq.xelement">XElement</a> API. This class is really part of the <a href="https://learn.microsoft.com/dotnet/standard/linq/linq-xml-overview">LINQ to XML</a> API, but you'll probably agree that the above code example makes little use of LINQ.
</p>
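<p>
The null checks in the parser rely on <code>XElement.Element</code> returning <code>null</code> when no child element with the given name exists. Here's a small sketch of that behaviour, using a hypothetical <code>ChildValue</code> helper that isn't part of the code base:
</p>
<p>
<pre>using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class XElementLookup
{
    // Returns the text of the named child element, or null when the
    // element has no such child. This is the same null-conditional
    // lookup that the parser performs inline.
    public static string? ChildValue(XElement xml, string name)
    {
        return xml.Element(name)?.Value;
    }
}</pre>
</p>
<p>
Given an element built in memory, a present child yields its text, and an absent child yields <code>null</code>, which is what sends the parser to its final <code>return null</code>:
</p>
<p>
<pre>var xml = new XElement("communal-table", new XElement("capacity", 12));
var isCommunal = xml.Name == "communal-table";                       // true
var capacity = XElementLookup.ChildValue(xml, "capacity");           // "12"
var missing = XElementLookup.ChildValue(xml, "minimal-reservation"); // null</pre>
</p>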
<h3 id="53fbee4be7ca45c090de48168de537fb">
Serializing element-biased XML <a href="#53fbee4be7ca45c090de48168de537fb">#</a>
</h3>
<p>
Unsurprisingly, turning a <code>Table</code> object into element-biased XML involves steps similar to converting it to JSON. The <code>ElementBiasedTableXmlOutputFormatter</code> class' <code>WriteResponseBodyAsync</code> method contains this implementation:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:#2b91af;">Task</span> <span style="font-weight:bold;color:#74531f;">WriteResponseBodyAsync</span>(
    <span style="color:#2b91af;">OutputFormatterWriteContext</span> <span style="font-weight:bold;color:#1f377f;">context</span>,
    <span style="color:#2b91af;">Encoding</span> <span style="font-weight:bold;color:#1f377f;">selectedEncoding</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">context</span>.Object <span style="color:blue;">is</span> <span style="color:#2b91af;">Table</span> <span style="font-weight:bold;color:#1f377f;">table</span>)
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">context</span>.HttpContext.Response.<span style="font-weight:bold;color:#74531f;">WriteAsync</span>(
            <span style="font-weight:bold;color:#1f377f;">table</span>.<span style="font-weight:bold;color:#74531f;">GenerateElementBiasedXml</span>(),
            <span style="font-weight:bold;color:#1f377f;">selectedEncoding</span>);
    <span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">InvalidOperationException</span>(<span style="color:#a31515;">"Expected a Table object."</span>);
}</pre>
</p>
<p>
Again, the heavy lifting is done by a specialized function:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">static</span> <span style="color:blue;">string</span> <span style="color:#74531f;">GenerateElementBiasedXml</span>(<span style="color:blue;">this</span> <span style="color:#2b91af;">Table</span> <span style="font-weight:bold;color:#1f377f;">table</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">table</span>.<span style="font-weight:bold;color:#74531f;">Accept</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">ElementBiasedTableVisitor</span>());
}
<span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ElementBiasedTableVisitor</span> : <span style="color:#2b91af;">ITableVisitor</span><<span style="color:blue;">string</span>>
{
    <span style="color:blue;">public</span> <span style="color:blue;">string</span> <span style="font-weight:bold;color:#74531f;">VisitCommunal</span>(<span style="color:#2b91af;">NaturalNumber</span> <span style="font-weight:bold;color:#1f377f;">capacity</span>)
    {
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">xml</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">XElement</span>(
            <span style="color:#a31515;">"communal-table"</span>,
            <span style="color:blue;">new</span> <span style="color:#2b91af;">XElement</span>(<span style="color:#a31515;">"capacity"</span>, (<span style="color:blue;">int</span>)<span style="font-weight:bold;color:#1f377f;">capacity</span>));
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">xml</span>.<span style="font-weight:bold;color:#74531f;">ToString</span>(<span style="color:#2b91af;">SaveOptions</span>.DisableFormatting);
    }
    <span style="color:blue;">public</span> <span style="color:blue;">string</span> <span style="font-weight:bold;color:#74531f;">VisitSingle</span>(
        <span style="color:#2b91af;">NaturalNumber</span> <span style="font-weight:bold;color:#1f377f;">capacity</span>,
        <span style="color:#2b91af;">NaturalNumber</span> <span style="font-weight:bold;color:#1f377f;">minimalReservation</span>)
    {
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">xml</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">XElement</span>(
            <span style="color:#a31515;">"single-table"</span>,
            <span style="color:blue;">new</span> <span style="color:#2b91af;">XElement</span>(<span style="color:#a31515;">"capacity"</span>, (<span style="color:blue;">int</span>)<span style="font-weight:bold;color:#1f377f;">capacity</span>),
            <span style="color:blue;">new</span> <span style="color:#2b91af;">XElement</span>(<span style="color:#a31515;">"minimal-reservation"</span>, (<span style="color:blue;">int</span>)<span style="font-weight:bold;color:#1f377f;">minimalReservation</span>));
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">xml</span>.<span style="font-weight:bold;color:#74531f;">ToString</span>(<span style="color:#2b91af;">SaveOptions</span>.DisableFormatting);
    }
}</pre>
</p>
<p>
True to form, <code>GenerateElementBiasedXml</code> assembles an appropriate AST for the kind of table in question, and finally converts it to a <code>string</code> value.
</p>
<h3 id="32ae551ed0b84bb6947e612d08ff7c50">
Attribute-biased XML <a href="#32ae551ed0b84bb6947e612d08ff7c50">#</a>
</h3>
<p>
I was curious how far I could take this kind of variation, so for the sake of exploration, I invented yet another XML format to support. Instead of making exclusive use of XML elements, this format uses XML attributes for primitive values.
</p>
<p>
<pre><span style="color:blue;"><</span><span style="color:#a31515;">communal-table</span><span style="color:blue;"> </span><span style="color:red;">capacity</span><span style="color:blue;">=</span>"<span style="color:blue;">12</span>"<span style="color:blue;"> /></span>
<span style="color:blue;"><</span><span style="color:#a31515;">single-table</span><span style="color:blue;"> </span><span style="color:red;">capacity</span><span style="color:blue;">=</span>"<span style="color:blue;">4</span>"<span style="color:blue;"> </span><span style="color:red;">minimal-reservation</span><span style="color:blue;">=</span>"<span style="color:blue;">3</span>"<span style="color:blue;"> /></span></pre>
</p>
<p>
In order to distinguish this XML format from the other, I invented the vendor media type <code>"application/vnd.ploeh.table+xml"</code>. The new formatters only handle this media type.
</p>
<p>
There's not much new to report. The new formatters work like the previous ones. A new function, still based on <code>XElement</code>, parses the new format:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">Table</span>? <span style="color:#74531f;">TryParseAttributeBiased</span>(<span style="color:#2b91af;">XElement</span> <span style="font-weight:bold;color:#1f377f;">xml</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">xml</span>.Name <span style="font-weight:bold;color:#74531f;">==</span> <span style="color:#a31515;">"communal-table"</span>)
    {
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">capacity</span> = <span style="font-weight:bold;color:#1f377f;">xml</span>.<span style="font-weight:bold;color:#74531f;">Attribute</span>(<span style="color:#a31515;">"capacity"</span>)?.Value;
        <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">capacity</span> <span style="color:blue;">is</span> { })
        {
            <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="color:blue;">int</span>.<span style="color:#74531f;">TryParse</span>(<span style="font-weight:bold;color:#1f377f;">capacity</span>, <span style="color:blue;">out</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">c</span>))
                <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">Table</span>.<span style="color:#74531f;">TryCreateCommunal</span>(<span style="font-weight:bold;color:#1f377f;">c</span>);
        }
    }
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">xml</span>.Name <span style="font-weight:bold;color:#74531f;">==</span> <span style="color:#a31515;">"single-table"</span>)
    {
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">capacity</span> = <span style="font-weight:bold;color:#1f377f;">xml</span>.<span style="font-weight:bold;color:#74531f;">Attribute</span>(<span style="color:#a31515;">"capacity"</span>)?.Value;
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">minimalReservation</span> = <span style="font-weight:bold;color:#1f377f;">xml</span>.<span style="font-weight:bold;color:#74531f;">Attribute</span>(<span style="color:#a31515;">"minimal-reservation"</span>)?.Value;
        <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">capacity</span> <span style="color:blue;">is</span> { } && <span style="font-weight:bold;color:#1f377f;">minimalReservation</span> <span style="color:blue;">is</span> { })
        {
            <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="color:blue;">int</span>.<span style="color:#74531f;">TryParse</span>(<span style="font-weight:bold;color:#1f377f;">capacity</span>, <span style="color:blue;">out</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">c</span>) &&
                <span style="color:blue;">int</span>.<span style="color:#74531f;">TryParse</span>(<span style="font-weight:bold;color:#1f377f;">minimalReservation</span>, <span style="color:blue;">out</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">mr</span>))
                <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">Table</span>.<span style="color:#74531f;">TryCreateSingle</span>(<span style="font-weight:bold;color:#1f377f;">c</span>, <span style="font-weight:bold;color:#1f377f;">mr</span>);
        }
    }
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
}</pre>
</p>
<p>
Likewise, converting a <code>Table</code> object to this format looks like code you've already seen:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">static</span> <span style="color:blue;">string</span> <span style="color:#74531f;">GenerateAttributeBiasedXml</span>(<span style="color:blue;">this</span> <span style="color:#2b91af;">Table</span> <span style="font-weight:bold;color:#1f377f;">table</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">table</span>.<span style="font-weight:bold;color:#74531f;">Accept</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">AttributedBiasedTableVisitor</span>());
}
<span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">AttributedBiasedTableVisitor</span> : <span style="color:#2b91af;">ITableVisitor</span><<span style="color:blue;">string</span>>
{
<span style="color:blue;">public</span> <span style="color:blue;">string</span> <span style="font-weight:bold;color:#74531f;">VisitCommunal</span>(<span style="color:#2b91af;">NaturalNumber</span> <span style="font-weight:bold;color:#1f377f;">capacity</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">xml</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">XElement</span>(
<span style="color:#a31515;">"communal-table"</span>,
<span style="color:blue;">new</span> <span style="color:#2b91af;">XAttribute</span>(<span style="color:#a31515;">"capacity"</span>, (<span style="color:blue;">int</span>)<span style="font-weight:bold;color:#1f377f;">capacity</span>));
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">xml</span>.<span style="font-weight:bold;color:#74531f;">ToString</span>(<span style="color:#2b91af;">SaveOptions</span>.DisableFormatting);
}
<span style="color:blue;">public</span> <span style="color:blue;">string</span> <span style="font-weight:bold;color:#74531f;">VisitSingle</span>(
<span style="color:#2b91af;">NaturalNumber</span> <span style="font-weight:bold;color:#1f377f;">capacity</span>,
<span style="color:#2b91af;">NaturalNumber</span> <span style="font-weight:bold;color:#1f377f;">minimalReservation</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">xml</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">XElement</span>(
<span style="color:#a31515;">"single-table"</span>,
<span style="color:blue;">new</span> <span style="color:#2b91af;">XAttribute</span>(<span style="color:#a31515;">"capacity"</span>, (<span style="color:blue;">int</span>)<span style="font-weight:bold;color:#1f377f;">capacity</span>),
<span style="color:blue;">new</span> <span style="color:#2b91af;">XAttribute</span>(<span style="color:#a31515;">"minimal-reservation"</span>, (<span style="color:blue;">int</span>)<span style="font-weight:bold;color:#1f377f;">minimalReservation</span>));
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">xml</span>.<span style="font-weight:bold;color:#74531f;">ToString</span>(<span style="color:#2b91af;">SaveOptions</span>.DisableFormatting);
}
}</pre>
</p>
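<p>
To see parsing and generation side by side, here's a minimal round-trip sketch. It's not code from the sample code base: the <code>AttributeBiasedSketch</code> class and its method names are hypothetical, and it uses plain integers instead of <code>Table</code> and <code>NaturalNumber</code>, so it skips the validation that <code>Table.TryCreateSingle</code> performs.
</p>
<p>
<pre>using System.Xml.Linq;

// Hypothetical stand-alone sketch; plain ints stand in for the article's
// Table and NaturalNumber types.
public static class AttributeBiasedSketch
{
    // Writes a single table as one element with two attributes.
    public static string GenerateSingle(int capacity, int minimalReservation)
    {
        var xml = new XElement(
            "single-table",
            new XAttribute("capacity", capacity),
            new XAttribute("minimal-reservation", minimalReservation));
        return xml.ToString(SaveOptions.DisableFormatting);
    }

    // Returns null when the element name or either attribute doesn't fit.
    // XElement.Parse throws on malformed XML; a real formatter would have
    // to decide how to handle that.
    public static (int Capacity, int MinimalReservation)? TryParseSingle(string text)
    {
        var xml = XElement.Parse(text);
        if (xml.Name != "single-table")
            return null;
        if (int.TryParse(xml.Attribute("capacity")?.Value, out var c) &&
            int.TryParse(xml.Attribute("minimal-reservation")?.Value, out var mr))
            return (c, mr);
        return null;
    }
}</pre>
</p>
<p>
Under those assumptions, <code>TryParseSingle(GenerateSingle(4, 3))</code> returns the tuple <code>(4, 3)</code>, while a document with another root element name, or with a missing or non-numeric attribute, yields <code>null</code>.
</p>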
<p>
Consistent with adding the first XML support, I didn't have to touch any of the existing Controller or data access code.
</p>
<h3 id="490e8625f0334deda0f377c8e8a3ad51">
Evaluation <a href="#490e8625f0334deda0f377c8e8a3ad51">#</a>
</h3>
<p>
If you're concerned with <a href="https://en.wikipedia.org/wiki/Separation_of_concerns">separation of concerns</a>, the Domain Model Only architecture gracefully handles variation in external formats without impacting application logic, Domain Model, or data access. You deal with each new format in a consistent and independent manner. The architecture offers the ultimate flexibility in data representation, since you can implement any format that you can write as a stream of bytes.
</p>
<p>
Since <a href="/2023/10/16/at-the-boundaries-static-types-are-illusory">at the boundary, static types are illusory</a>, this architecture is congruent with reality. For a REST service, at least, reality is what goes on the wire. While static types can also be used to model what wire formats look like, there's always a risk that you can use your <a href="https://en.wikipedia.org/wiki/Integrated_development_environment">IDE's</a> refactoring tools to change a DTO in such a way that the code still compiles, but you've now changed the wire format. This could easily break existing clients.
</p>
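<p>
As a small illustration of that risk, consider the following sketch. The two DTO classes are hypothetical and not part of the sample code base; they represent the same class before and after an IDE rename refactoring. The sketch assumes <code>System.Text.Json</code> with default options, which writes property names as they're declared.
</p>
<p>
<pre>using System.Text.Json;

// Hypothetical DTO before the rename.
public sealed class TableDtoBefore
{
    public int Capacity { get; set; }
}

// The same hypothetical DTO after renaming the property. All C# call sites
// would be updated by the refactoring tool, so everything still compiles.
public sealed class TableDtoAfter
{
    public int Seats { get; set; }
}

public static class WireFormatSketch
{
    // With default options the JSON member name follows the property name.
    public static string Before() =>
        JsonSerializer.Serialize(new TableDtoBefore { Capacity = 12 });

    public static string After() =>
        JsonSerializer.Serialize(new TableDtoAfter { Seats = 12 });
}</pre>
</p>
<p>
Here, <code>Before()</code> returns <code>{"Capacity":12}</code> and <code>After()</code> returns <code>{"Seats":12}</code>. The compiler is satisfied in both cases, but a client that reads the <code>Capacity</code> member is not.
</p>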
<p>
When wire compatibility is important, I test-drive enough <a href="/2021/01/25/self-hosted-integration-tests-in-aspnet">self-hosted tests</a> that directly use and verify the wire format to give me a good sense of stability. Without DTO classes, it becomes increasingly important to cover externally visible behaviour with a trustworthy test suite, but really, if compatibility is important, you should be doing that anyway.
</p>
<p>
It almost goes without saying that a requirement for this architecture is that your chosen web framework supports it. As you've seen here, ASP.NET does, but that's not a given in general. Most web frameworks worth their salt will come with mechanisms that enable you to add new wire formats, but the question is how opinionated such extensibility points are. Do they expect you to work with DTOs, or are they more flexible than that?
</p>
<p>
You may consider the pure Domain Model Only data architecture too specialized for everyday use. I may do that, too. As I wrote in the <a href="/2024/07/25/three-data-architectures-for-the-server">introduction article</a>, I don't intend these walk-throughs to be prescriptive. Rather, they explore what's possible, so that you and I have a bigger set of alternatives to choose from.
</p>
<h3 id="d3dc45b5c52741a99bee6788610cc69c">
Hybrid architectures <a href="#d3dc45b5c52741a99bee6788610cc69c">#</a>
</h3>
<p>
In the code base that accompanies <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>, I use a hybrid data architecture that I've used for years: ADO.NET for data access, as shown here, but DTOs for external JSON serialization. As demonstrated in the article <a href="/2024/07/29/using-ports-and-adapters-to-persist-restaurant-table-configurations">Using Ports and Adapters to persist restaurant table configurations</a>, using DTOs for the presentation layer may cause trouble if you need to support multiple wire formats. On the other hand, if you don't expect that this is a concern, you may decide to run that risk. I often do that.
</p>
<p>
When presenting these three architectures to a larger audience, one audience member told me that his team used another hybrid architecture: DTOs for the presentation layer, and separate DTOs for data access, but no Domain Model. I can see how this makes sense in a mostly <a href="https://en.wikipedia.org/wiki/Create,_read,_update_and_delete">CRUD</a>-heavy application where nonetheless you need to be able to vary user interfaces independently from the database schema.
</p>
<p>
Finally, I should point out that the Domain Model Only data architecture is, in reality, also a kind of Ports and Adapters architecture. It just uses more low-level Adapter implementations than you idiomatically see.
</p>
<h3 id="9b25fc7ec7754509b06585eaf757e0cb">
Conclusion <a href="#9b25fc7ec7754509b06585eaf757e0cb">#</a>
</h3>
<p>
The Domain Model Only data architecture emphasises modelling business logic as a strongly-typed, well-encapsulated Domain Model, while eschewing using statically-typed DTOs for communication with external processes. What I most like about this alternative is that it leaves little confusion as to where functionality goes.
</p>
<p>
When you have, say, <code>TableDto</code>, <code>Table</code>, and <code>TableEntity</code> classes, you need a sophisticated and mature team to trust all developers to add functionality in the right place. If there's only a single <code>Table</code> Domain Model, it may be more evident to developers that only business logic belongs there, and other concerns ought to be addressed in different ways.
</p>
<p>
Even so, you may consider all the low-level parsing code not to your liking, and instead decide to use DTOs. I may too, depending on context.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="ed975f7a8fc64d3e9ee35efaa8a6bcbc">
<div class="comment-author"><a href="https://github.com/JesHansen">Jes Hansen</a> <a href="#ed975f7a8fc64d3e9ee35efaa8a6bcbc">#</a></div>
<div class="comment-content">
<p>
In this version of the data architecture, let's suppose that the controller that now accepts a Domain Object directly is part of a larger REST API. How would you handle discoverability of the API, as the usual OpenAPI (Swagger et.al.) tools probably takes offence at this type of request object?
</p>
</div>
<div class="comment-date">2024-08-19 12:10 UTC</div>
</div>
<div class="comment" id="3b2bacb8fa274590ad45cb7864f19848">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#3b2bacb8fa274590ad45cb7864f19848">#</a></div>
<div class="comment-content">
<p>
Jes, thank you for writing. If by discoverability you mean 'documentation', I would handle that the same way I usually handle documentation requirements for REST APIs: by writing one or more documents that explain how the API works. If there are other possible uses of OpenAPI than that, and the GUI to perform ad-hoc experiments, I'm going to need to be taken to task, because then I'm not aware of them.
</p>
<p>
I've recently discussed <a href="/2024/05/13/gratification">my general misgivings about OpenAPI</a>, and they apply here as well. I'm aware that other people feel differently about this, and that's okay too.
</p>
<blockquote>
"the usual OpenAPI (Swagger et.al.) tools probably takes offence at this type of request object"
</blockquote>
<p>
You may be right, but I haven't tried, so I don't know if this is the case.
</p>
</div>
<div class="comment-date">2024-08-22 16:55 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Using a Shared Data Model to persist restaurant table configurations
https://blog.ploeh.dk/2024/08/05/using-a-shared-data-model-to-persist-restaurant-table-configurations
2024-08-05T06:14:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A data architecture example in C# and ASP.NET.</em>
</p>
<p>
This is part of a <a href="/2024/07/25/three-data-architectures-for-the-server">small article series on data architectures</a>. In this, the second instalment, you'll see a common attempt at addressing the mapping issue that I mentioned in the <a href="/2024/07/29/using-ports-and-adapters-to-persist-restaurant-table-configurations">previous article</a>. As the introductory article explains, the example code shows how to create a new restaurant table configuration, or how to display an existing resource. The sample code base is an ASP.NET 8.0 <a href="https://en.wikipedia.org/wiki/REST">REST</a> API.
</p>
<p>
Keep in mind that while the sample code does store data in a relational database, the term <em>table</em> in this article mainly refers to physical tables, rather than database tables.
</p>
<p>
The idea in this data architecture is to use a single, shared data model for each business object in the service. This is in contrast to the Ports and Adapters architecture, where you typically have a <a href="https://en.wikipedia.org/wiki/Data_transfer_object">Data Transfer Object</a> (DTO) for (<a href="https://json.org/">JSON</a> or <a href="https://en.wikipedia.org/wiki/XML">XML</a>) serialization, another class for the Domain Model, and a third to support an <a href="https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping">object-relational mapper</a>.
</p>
<p>
An architecture diagram may attempt to illustrate the idea like this:
</p>
<p>
<img src="/content/binary/shared-data-model-architecture.png" alt="Architecture diagram showing three vertically stacked layers named UI/data, business logic, and data access, with a vertical box labelled data model overlapping all three.">
</p>
<p>
While ostensibly keeping alive the idea of application <em>layers</em>, data models are allowed to cross layers to be used for database persistence, business logic, and the presentation layer alike.
</p>
<h3 id="880c5cdea9104efe99f0b816c6b2d632">
Data model <a href="#880c5cdea9104efe99f0b816c6b2d632">#</a>
</h3>
<p>
Since the goal is to use a single class to model all application concerns, it means that we also need to use it for database persistence. The most commonly used ORM in .NET is <a href="https://en.wikipedia.org/wiki/Entity_Framework">Entity Framework</a>, so I'll use that for the example. It's <a href="/2023/09/18/do-orms-reduce-the-need-for-mapping">not something that I normally do</a>, so it's possible that I could have done it better than what follows.
</p>
<p>
Still, assume that the database schema defines the <code>Tables</code> table like this:
</p>
<p>
<pre><span style="color:blue;">CREATE</span> <span style="color:blue;">TABLE</span> [dbo]<span style="color:gray;">.</span>[Tables]<span style="color:blue;"> </span><span style="color:gray;">(</span>
[Id] <span style="color:blue;">INT</span> <span style="color:gray;">NOT</span> <span style="color:gray;">NULL</span> <span style="color:blue;">IDENTITY</span> <span style="color:blue;">PRIMARY</span> <span style="color:blue;">KEY</span><span style="color:gray;">,</span>
[PublicId] <span style="color:blue;">UNIQUEIDENTIFIER</span> <span style="color:gray;">NOT</span> <span style="color:gray;">NULL</span> <span style="color:blue;">UNIQUE</span><span style="color:gray;">,</span>
[Capacity] <span style="color:blue;">INT</span> <span style="color:gray;">NOT</span> <span style="color:gray;">NULL,</span>
[MinimalReservation] <span style="color:blue;">INT</span> <span style="color:gray;">NULL</span>
<span style="color:gray;">)</span></pre>
</p>
<p>
I used a scaffolding tool to generate Entity Framework code from the database schema and then modified what it had created. This is the result:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">partial</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Table</span>
{
[<span style="color:#2b91af;">JsonIgnore</span>]
<span style="color:blue;">public</span> <span style="color:blue;">int</span> Id { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
[<span style="color:#2b91af;">JsonIgnore</span>]
<span style="color:blue;">public</span> <span style="color:#2b91af;">Guid</span> PublicId { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">string</span> Type => MinimalReservation.HasValue ? <span style="color:#a31515;">"single"</span> : <span style="color:#a31515;">"communal"</span>;
<span style="color:blue;">public</span> <span style="color:blue;">int</span> Capacity { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">int</span>? MinimalReservation { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
}</pre>
</p>
<p>
Notice that I added <code>[JsonIgnore]</code> attributes to two of the properties, since I didn't want to serialize them to JSON. I also added the calculated property <code>Type</code> to include a discriminator in the JSON documents.
</p>
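<p>
To make the effect of those attributes concrete, here's a sketch that serializes a copy of the class outside of ASP.NET. The names <code>TableSketch</code> and <code>SharedModelJsonSketch</code> are hypothetical, and the serializer options are assumptions on my part: <code>JsonSerializerDefaults.Web</code> to get camel-cased member names, and <code>WhenWritingNull</code> to leave out a missing minimal reservation. The sample service may be configured differently.
</p>
<p>
<pre>using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical copy of the shared data model, without Entity Framework.
public sealed class TableSketch
{
    [JsonIgnore]
    public int Id { get; set; }
    [JsonIgnore]
    public Guid PublicId { get; set; }
    // Calculated discriminator; it has no setter and no backing column.
    public string Type => MinimalReservation.HasValue ? "single" : "communal";
    public int Capacity { get; set; }
    public int? MinimalReservation { get; set; }
}

public static class SharedModelJsonSketch
{
    public static string Serialize(TableSketch table)
    {
        // Assumed options: camel-cased names, and null members left out.
        var options = new JsonSerializerOptions(JsonSerializerDefaults.Web)
        {
            DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull
        };
        return JsonSerializer.Serialize(table, options);
    }
}</pre>
</p>
<p>
With these options, serializing a <code>TableSketch</code> with <code>Capacity</code> set to 12 and no <code>MinimalReservation</code> gives a document with a <code>type</code> member of <code>"communal"</code> and a <code>capacity</code> member of 12, but neither of the two ignored identifiers.
</p>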
<h3 id="852e790115024988b43e57b369fe67e7">
HTTP interaction <a href="#852e790115024988b43e57b369fe67e7">#</a>
</h3>
<p>
A client can create a new table with a <code>POST</code> HTTP request:
</p>
<p>
<pre>POST /tables HTTP/1.1
content-type: application/json
{<span style="color:#2e75b6;">"type"</span>:<span style="color:#a31515;">"communal"</span>,<span style="color:#2e75b6;">"capacity"</span>:12}</pre>
</p>
<p>
Notice that the JSON document doesn't follow the desired schema described in the <a href="/2024/07/25/three-data-architectures-for-the-server">introduction article</a>. It can't, because the data architecture is bound to the shared <code>Table</code> class. Or at least, if it's possible to attain the desired format with a single class and only some strategically placed attributes, I'm not aware of it. As the article <a href="/2024/08/12/using-only-a-domain-model-to-persist-restaurant-table-configurations">Using only a Domain Model to persist restaurant table configurations</a> will show, it <em>is</em> possible to attain that goal with the appropriate middleware, but I consider doing that to be an example of the third architecture, so not something I will cover in this article.
</p>
<p>
The service will respond to the above request like this:
</p>
<p>
<pre>HTTP/1.1 201 Created
Location: https://example.com/Tables/777779466d2549d69f7e30b6c35bde3c</pre>
</p>
<p>
Clients can later use the address indicated by the <code>Location</code> header to retrieve a representation of the resource:
</p>
<p>
<pre>GET /Tables/777779466d2549d69f7e30b6c35bde3c HTTP/1.1
accept: application/json</pre>
</p>
<p>
Which elicits this response:
</p>
<p>
<pre>HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
{<span style="color:#2e75b6;">"type"</span>:<span style="color:#a31515;">"communal"</span>,<span style="color:#2e75b6;">"capacity"</span>:12}</pre>
</p>
<p>
The JSON format still doesn't conform to the desired format because the Controller in question deals exclusively with the shared <code>Table</code> data model.
</p>
<h3 id="7dd82d2b3f4348a98d1125547dc35684">
Boundary <a href="#7dd82d2b3f4348a98d1125547dc35684">#</a>
</h3>
<p>
At the boundary of the application, Controllers handle HTTP requests with <em>action methods</em> (an ASP.NET term). The framework matches requests by a combination of naming conventions and attributes. The <code>Post</code> action method handles incoming <code>POST</code> requests.
</p>
<p>
<pre>[<span style="color:#2b91af;">HttpPost</span>]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">IActionResult</span>> <span style="font-weight:bold;color:#74531f;">Post</span>(<span style="color:#2b91af;">Table</span> <span style="font-weight:bold;color:#1f377f;">table</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">id</span> = <span style="color:#2b91af;">Guid</span>.<span style="color:#74531f;">NewGuid</span>();
<span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">repository</span>.<span style="font-weight:bold;color:#74531f;">Create</span>(<span style="font-weight:bold;color:#1f377f;">id</span>, <span style="font-weight:bold;color:#1f377f;">table</span>).<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">CreatedAtActionResult</span>(
<span style="color:blue;">nameof</span>(<span style="font-weight:bold;color:#74531f;">Get</span>),
<span style="color:blue;">null</span>,
<span style="color:blue;">new</span> { id = <span style="font-weight:bold;color:#1f377f;">id</span>.<span style="font-weight:bold;color:#74531f;">ToString</span>(<span style="color:#a31515;">"N"</span>) },
<span style="color:blue;">null</span>);
}</pre>
</p>
<p>
Notice that the input parameter isn't a separate DTO, but rather the shared <code>Table</code> object. Since it's shared, the Controller can pass it directly to the <code>repository</code> without any mapping.
</p>
<p>
The same simplicity is on display in the <code>Get</code> method:
</p>
<p>
<pre>[<span style="color:#2b91af;">HttpGet</span>(<span style="color:#a31515;">"</span><span style="color:#0073ff;">{</span>id<span style="color:#0073ff;">}</span><span style="color:#a31515;">"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">IActionResult</span>> <span style="font-weight:bold;color:#74531f;">Get</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">id</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (!<span style="color:#2b91af;">Guid</span>.<span style="color:#74531f;">TryParseExact</span>(<span style="font-weight:bold;color:#1f377f;">id</span>, <span style="color:#a31515;">"N"</span>, <span style="color:blue;">out</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">guid</span>))
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">BadRequestResult</span>();
<span style="color:#2b91af;">Table</span>? <span style="font-weight:bold;color:#1f377f;">table</span> = <span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">repository</span>.<span style="font-weight:bold;color:#74531f;">Read</span>(<span style="font-weight:bold;color:#1f377f;">guid</span>).<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
<span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">table</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">NotFoundResult</span>();
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">OkObjectResult</span>(<span style="font-weight:bold;color:#1f377f;">table</span>);
}</pre>
</p>
<p>
Once the <code>Get</code> method has parsed the <code>id</code> it goes straight to the <code>repository</code>, retrieves the <code>table</code> and returns it if it's there. No mapping is required by the Controller. What about the <code>repository</code>?
</p>
<h3 id="a3154e68acfa4976b9cf91e4e1036834">
Data access <a href="#a3154e68acfa4976b9cf91e4e1036834">#</a>
</h3>
<p>
The <code>SqlTablesRepository</code> class reads data from and writes data to <a href="https://en.wikipedia.org/wiki/Microsoft_SQL_Server">SQL Server</a> using Entity Framework. The <code>Create</code> method is as simple as this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span> <span style="font-weight:bold;color:#74531f;">Create</span>(<span style="color:#2b91af;">Guid</span> <span style="font-weight:bold;color:#1f377f;">id</span>, <span style="color:#2b91af;">Table</span> <span style="font-weight:bold;color:#1f377f;">table</span>)
{
<span style="color:#2b91af;">ArgumentNullException</span>.<span style="color:#74531f;">ThrowIfNull</span>(<span style="font-weight:bold;color:#1f377f;">table</span>);
<span style="font-weight:bold;color:#1f377f;">table</span>.PublicId = <span style="font-weight:bold;color:#1f377f;">id</span>;
<span style="color:blue;">await</span> context.Tables.<span style="font-weight:bold;color:#74531f;">AddAsync</span>(<span style="font-weight:bold;color:#1f377f;">table</span>).<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
<span style="color:blue;">await</span> context.<span style="font-weight:bold;color:#74531f;">SaveChangesAsync</span>().<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
}</pre>
</p>
<p>
The <code>Read</code> method is even simpler - a one-liner:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Table</span>?> <span style="font-weight:bold;color:#74531f;">Read</span>(<span style="color:#2b91af;">Guid</span> <span style="font-weight:bold;color:#1f377f;">id</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">await</span> context.Tables
.<span style="font-weight:bold;color:#74531f;">SingleOrDefaultAsync</span>(<span style="font-weight:bold;color:#1f377f;">t</span> => <span style="font-weight:bold;color:#1f377f;">t</span>.PublicId <span style="font-weight:bold;color:#74531f;">==</span> <span style="font-weight:bold;color:#1f377f;">id</span>).<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
}</pre>
</p>
<p>
Again, no mapping. Just return the database Entity.
</p>
<h3 id="4f14688d058a469f858792ce8d129af9">
XML serialization <a href="#4f14688d058a469f858792ce8d129af9">#</a>
</h3>
<p>
Simple, so far, but how does this data architecture handle changing requirements?
</p>
<p>
One axis of variation is when a service needs to support multiple representations. In this example, I'll imagine that the service also needs to support XML.
</p>
<p>
Granted, you may not run into that particular requirement that often, but it's typical of a kind of change that you're likely to run into. In REST APIs, for example, <a href="/2015/06/22/rest-implies-content-negotiation">you should use content negotiation for versioning</a>, and that's the same kind of problem.
</p>
<p>
To be fair, application code also changes for a variety of other reasons, including new features, changes to business logic, etc. I can't possibly cover all, though, and many of these are much better described than changes in wire formats.
</p>
<p>
As was also the case in <a href="/2024/07/29/using-ports-and-adapters-to-persist-restaurant-table-configurations">the previous article</a>, it quickly turns out that it's not possible to support any of the desired XML formats described in the <a href="/2024/07/25/three-data-architectures-for-the-server">introduction article</a>. Instead, for the sake of exploring what <em>is</em> possible, I'll compromise and support XML documents like these examples:
</p>
<p>
<pre><span style="color:blue;"><</span><span style="color:#a31515;">table</span><span style="color:blue;">></span>
<span style="color:blue;"> <</span><span style="color:#a31515;">type</span><span style="color:blue;">></span>communal<span style="color:blue;"></</span><span style="color:#a31515;">type</span><span style="color:blue;">></span>
<span style="color:blue;"> <</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></span>12<span style="color:blue;"></</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></span>
<span style="color:blue;"></</span><span style="color:#a31515;">table</span><span style="color:blue;">></span>
<span style="color:blue;"><</span><span style="color:#a31515;">table</span><span style="color:blue;">></span>
<span style="color:blue;"> <</span><span style="color:#a31515;">type</span><span style="color:blue;">></span>single<span style="color:blue;"></</span><span style="color:#a31515;">type</span><span style="color:blue;">></span>
<span style="color:blue;"> <</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></span>4<span style="color:blue;"></</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></span>
<span style="color:blue;"> <</span><span style="color:#a31515;">minimal-reservation</span><span style="color:blue;">></span>3<span style="color:blue;"></</span><span style="color:#a31515;">minimal-reservation</span><span style="color:blue;">></span>
<span style="color:blue;"></</span><span style="color:#a31515;">table</span><span style="color:blue;">></span></pre>
</p>
<p>
This schema, it turns out, is the same as the element-biased format from <a href="/2024/07/29/using-ports-and-adapters-to-persist-restaurant-table-configurations">the previous article</a>. I could, instead, have chosen to support the attribute-biased format, but, because of the shared data model, not both.
</p>
<p>
Notice how using statically typed classes, attributes, and Reflection to guide serialization leads toward certain kinds of formats. You can't easily support any arbitrary JSON or XML schema, but are rather nudged into a more constrained subset of possible formats. There's nothing too bad about this. As usual, there are trade-offs involved. You concede flexibility, but gain convenience: Just slap some attributes on your DTO, and it works well enough for most purposes. I mostly point it out because this entire article series is about awareness of choice. There's always some cost to be paid.
</p>
<p>
That said, supporting that XML format is surprisingly easy:
</p>
<p>
<pre>[<span style="color:#2b91af;">XmlRoot</span>(<span style="color:#a31515;">"table"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">partial</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Table</span>
{
    [<span style="color:#2b91af;">JsonIgnore</span>, <span style="color:#2b91af;">XmlIgnore</span>]
    <span style="color:blue;">public</span> <span style="color:blue;">int</span> Id { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
    [<span style="color:#2b91af;">JsonIgnore</span>, <span style="color:#2b91af;">XmlIgnore</span>]
    <span style="color:blue;">public</span> <span style="color:#2b91af;">Guid</span> PublicId { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
    [<span style="color:#2b91af;">XmlElement</span>(<span style="color:#a31515;">"type"</span>), <span style="color:#2b91af;">NotMapped</span>]
    <span style="color:blue;">public</span> <span style="color:blue;">string</span>? Type { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
    [<span style="color:#2b91af;">XmlElement</span>(<span style="color:#a31515;">"capacity"</span>)]
    <span style="color:blue;">public</span> <span style="color:blue;">int</span> Capacity { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
    [<span style="color:#2b91af;">XmlElement</span>(<span style="color:#a31515;">"minimal-reservation"</span>)]
    <span style="color:blue;">public</span> <span style="color:blue;">int</span>? MinimalReservation { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
    <span style="color:blue;">public</span> <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">ShouldSerializeMinimalReservation</span>() =>
        MinimalReservation.HasValue;
    <span style="color:blue;">internal</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">InferType</span>()
    {
        Type = MinimalReservation.HasValue ? <span style="color:#a31515;">"single"</span> : <span style="color:#a31515;">"communal"</span>;
    }
}</pre>
</p>
<p>
Most of the changes are simple additions of the <code>XmlRoot</code>, <code>XmlElement</code>, and <code>XmlIgnore</code> attributes. In order to serialize the <code><span style="color:blue;"><</span><span style="color:#a31515;">type</span><span style="color:blue;">></span></code> element, however, I also had to convert the <code>Type</code> property to a read/write property, which had some ripple effects.
</p>
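<p>
To illustrate how the <code>ShouldSerialize</code> convention interacts with <code>XmlSerializer</code>, here's a minimal, self-contained sketch. The <code>XmlTableSketch</code> and <code>TableXmlSketch</code> names are hypothetical stand-ins, not part of the sample code base, and the class is stripped of everything related to JSON and Entity Framework.
</p>

```csharp
#nullable enable
using System.IO;
using System.Xml.Serialization;

// Hypothetical, reduced stand-in for the article's Table class.
[XmlRoot("table")]
public class XmlTableSketch
{
    [XmlElement("type")]
    public string? Type { get; set; }

    [XmlElement("capacity")]
    public int Capacity { get; set; }

    [XmlElement("minimal-reservation")]
    public int? MinimalReservation { get; set; }

    // XmlSerializer looks for a method named ShouldSerialize{PropertyName}
    // and skips the member when that method returns false.
    public bool ShouldSerializeMinimalReservation() =>
        MinimalReservation.HasValue;
}

// Hypothetical helper that serializes a table to an XML string.
public static class TableXmlSketch
{
    public static string Serialize(XmlTableSketch table)
    {
        var serializer = new XmlSerializer(typeof(XmlTableSketch));
        using var writer = new StringWriter();
        serializer.Serialize(writer, table);
        return writer.ToString();
    }
}
```

<p>
With this sketch, serializing an object that has no <code>MinimalReservation</code> leaves out the <code>minimal-reservation</code> element entirely, while an object with a value includes it. The serializer also adds an XML declaration and namespace attributes to the root element, which the examples in this article omit.
</p>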
<p>
For one, I had to add the <code>NotMapped</code> attribute to tell Entity Framework that it shouldn't try to save the value of that property in the database. As you can see in the above SQL schema, the <code>Tables</code> table has no <code>Type</code> column.
</p>
<p>
This also meant that I had to change the <code>Read</code> method in <code>SqlTablesRepository</code> to call the new <code>InferType</code> method:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Table</span>?> <span style="font-weight:bold;color:#74531f;">Read</span>(<span style="color:#2b91af;">Guid</span> <span style="font-weight:bold;color:#1f377f;">id</span>)
{
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">table</span> = <span style="color:blue;">await</span> context.Tables
        .<span style="font-weight:bold;color:#74531f;">SingleOrDefaultAsync</span>(<span style="font-weight:bold;color:#1f377f;">t</span> => <span style="font-weight:bold;color:#1f377f;">t</span>.PublicId <span style="font-weight:bold;color:#74531f;">==</span> <span style="font-weight:bold;color:#1f377f;">id</span>).<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
    <span style="font-weight:bold;color:#1f377f;">table</span>?.<span style="font-weight:bold;color:#74531f;">InferType</span>();
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">table</span>;
}</pre>
</p>
<p>
I'm not happy with this kind of <a href="https://en.wikipedia.org/wiki/Sequential_coupling">sequential coupling</a>, but to be honest, this data architecture inherently has an appalling lack of <a href="/encapsulation-and-solid">encapsulation</a>. Having to call <code>InferType</code> is just par for the course.
</p>
<p>
That said, despite a few stumbling blocks, adding XML support turned out to be surprisingly easy in this data architecture. Granted, I had to compromise on the schema, and could only support one XML schema, so we shouldn't really take this as an endorsement. To <a href="/ref/psychology-of-computer-programming">paraphrase Gerald Weinberg</a>, if it doesn't have to work, it's easy to implement.
</p>
<h3 id="cd22920ca9564df0b88bd67d6fe2eef6">
Evaluation <a href="#cd22920ca9564df0b88bd67d6fe2eef6">#</a>
</h3>
<p>
There's no denying that the Shared Data Model architecture is <em>simple</em>. There's no mapping between different layers, and it's easy to get started. Like the <a href="/2024/07/29/using-ports-and-adapters-to-persist-restaurant-table-configurations">DTO-based Ports and Adapters architecture</a>, you'll find many documentation examples and <em>getting-started</em> guides that work like this. In a sense, you can say that it's what the ASP.NET framework, or, perhaps particularly the Entity Framework (EF), 'wants you to do'. To be fair, I find ASP.NET to be reasonably unopinionated, so what inveigling you may run into may be mostly attributable to EF.
</p>
<p>
While it may feel nice that it's easy to get started, <a href="/2024/05/13/gratification">instant gratification often comes at a cost</a>. Consider the <code>Table</code> class shown here. Because of various constraints imposed by EF and the JSON and XML serializers, it has no encapsulation. One thing is that the sophisticated <a href="/2018/06/25/visitor-as-a-sum-type">Visitor-encoded</a> <code>Table</code> class introduced in the article <a href="/2023/12/25/serializing-restaurant-tables-in-c">Serializing restaurant tables in C#</a> is completely out of the question, but you can't even add a required constructor like this one:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">Table</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">capacity</span>)
{
    Capacity = <span style="font-weight:bold;color:#1f377f;">capacity</span>;
}</pre>
</p>
<p>
Granted, it seems to work with both EF and the JSON serializer, which I suppose is a recent development, but it doesn't work with the XML serializer, which requires that
</p>
<blockquote>
<p>
"A class must have a parameterless constructor to be serialized by <strong>XmlSerializer</strong>."
</p>
<footer><cite><a href="https://learn.microsoft.com/dotnet/standard/serialization/introducing-xml-serialization">XML serialization</a></cite>, Microsoft documentation, 2023-04-05, retrieved 2024-07-27, their emphasis</footer>
</blockquote>
<p>
Even if this, too, changes in the future, DTO-based designs are at odds with encapsulation. If you doubt the veracity of that statement, I challenge you to complete <a href="/2024/06/12/simpler-encapsulation-with-immutability">the Priority Collection kata</a> with serializable DTOs.
</p>
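<p>
If you'd like to see the parameterless-constructor requirement for yourself, the following sketch probes whether an <code>XmlSerializer</code> can be created for a given type. All type and method names here are hypothetical and exist only for illustration. The sketch relies on the constructor of <code>XmlSerializer</code> throwing an <code>InvalidOperationException</code> when it can't reflect over the type.
</p>

```csharp
using System;
using System.Xml.Serialization;

// Hypothetical class with a required constructor and no parameterless one.
public class RequiredCapacityTable
{
    public RequiredCapacityTable(int capacity)
    {
        Capacity = capacity;
    }

    public int Capacity { get; set; }
}

// Hypothetical class that keeps the implicit parameterless constructor.
public class ParameterlessTable
{
    public int Capacity { get; set; }
}

public static class SerializerProbe
{
    // Returns true if an XmlSerializer can be created for the type.
    public static bool CanCreateSerializer(Type type)
    {
        try
        {
            _ = new XmlSerializer(type);
            return true;
        }
        catch (InvalidOperationException)
        {
            // Thrown when XmlSerializer can't reflect over the type, for
            // example because it lacks a parameterless constructor.
            return false;
        }
    }
}
```

<p>
Calling <code>SerializerProbe.CanCreateSerializer</code> with <code>typeof(RequiredCapacityTable)</code> returns <code>false</code>, whereas <code>typeof(ParameterlessTable)</code> yields <code>true</code>.
</p>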
<p>
Another problem with the Shared Data Model architecture is that it so easily decays to a <a href="https://wiki.c2.com/?BigBallOfMud">Big Ball of Mud</a>. Even though the above architecture diagram hollowly insists that layering is still possible, a Shared Data Model is an attractor of behaviour. You'll soon find that a class like <code>Table</code> has methods that serve presentation concerns, others that implement business logic, and others again that address persistence issues. It has become a <a href="https://en.wikipedia.org/wiki/God_object">God Class</a>.
</p>
<p>
From these problems it doesn't follow that the architecture doesn't have merit. If you're developing a <a href="https://en.wikipedia.org/wiki/Create,_read,_update_and_delete">CRUD</a>-heavy application with a user interface (UI) that's merely a glorified table view, this could be a good choice. You <em>would</em> be coupling the UI to the database, so that if you need to change how the UI works, you might also have to modify the database schema, or vice versa.
</p>
<p>
This is still not an indictment, but merely an observation of consequences. If you can live with them, then choose the Shared Data Model architecture. I can easily imagine application types where that would be the case.
</p>
<h3 id="194d1086b32b49958ad88e5325b4e1f1">
Conclusion <a href="#194d1086b32b49958ad88e5325b4e1f1">#</a>
</h3>
<p>
In the Shared Data Model architecture you use a single model (here, a class) to handle all application concerns: UI, business logic, data access. While this shows a blatant disregard for the notion of separation of concerns, no law states that you must, always, separate concerns.
</p>
<p>
Sometimes it's okay to mix concerns, and then the Shared Data Model architecture is dead simple. Just make sure that you know when it's okay.
</p>
<p>
While this architecture is the ultimate in simplicity, it's also quite constrained. The third and final data architecture I'll cover, on the other hand, offers the ultimate in flexibility, at the expense (not surprisingly) of some complexity.
</p>
<p>
<strong>Next:</strong> <a href="/2024/08/12/using-only-a-domain-model-to-persist-restaurant-table-configurations">Using only a Domain Model to persist restaurant table configurations</a>.
</p>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Using Ports and Adapters to persist restaurant table configurations
https://blog.ploeh.dk/2024/07/29/using-ports-and-adapters-to-persist-restaurant-table-configurations
2024-07-29T08:05:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A data architecture example in C# and ASP.NET.</em>
</p>
<p>
This is part of a <a href="/2024/07/25/three-data-architectures-for-the-server">small article series on data architectures</a>. In this first instalment, you'll see the outline of a Ports and Adapters implementation. As the introductory article explains, the example code shows how to create a new restaurant table configuration, or how to display an existing resource. The sample code base is an ASP.NET 8.0 <a href="https://en.wikipedia.org/wiki/REST">REST</a> API.
</p>
<p>
Keep in mind that while the sample code does store data in a relational database, the term <em>table</em> in this article mainly refers to physical tables, rather than database tables.
</p>
<p>
While Ports and Adapters architecture diagrams are usually drawn as concentric circles, you can also draw (subsets of) it as more traditional layered diagrams:
</p>
<p>
<img src="/content/binary/three-layer-table-architecture.png" alt="Three-layer architecture diagram showing TableDto, Table, and TableEntity as three vertically stacked boxes, with arrows between them.">
</p>
<p>
Here, the arrows indicate mappings, not dependencies.
</p>
<h3 id="5e95ae96906c43f9aee53cfbd680a378">
HTTP interaction <a href="#5e95ae96906c43f9aee53cfbd680a378">#</a>
</h3>
<p>
A client can create a new table with a <code>POST</code> HTTP request:
</p>
<p>
<pre>POST /tables HTTP/1.1
content-type: application/json

{ <span style="color:#2e75b6;">"communalTable"</span>: { <span style="color:#2e75b6;">"capacity"</span>: 16 } }</pre>
</p>
<p>
Which might elicit a response like this:
</p>
<p>
<pre>HTTP/1.1 201 Created
Location: https://example.com/Tables/844581613e164813aa17243ff8b847af</pre>
</p>
<p>
Clients can later use the address indicated by the <code>Location</code> header to retrieve a representation of the resource:
</p>
<p>
<pre>GET /Tables/844581613e164813aa17243ff8b847af HTTP/1.1
accept: application/json</pre>
</p>
<p>
Which would result in this response:
</p>
<p>
<pre>HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

{<span style="color:#2e75b6;">"communalTable"</span>:{<span style="color:#2e75b6;">"capacity"</span>:16}}</pre>
</p>
<p>
By default, ASP.NET handles and returns JSON. Later in this article you'll see how well it deals with other data formats.
</p>
<h3 id="95914d7bd8c845d2b30d8ddc029c1711">
Boundary <a href="#95914d7bd8c845d2b30d8ddc029c1711">#</a>
</h3>
<p>
ASP.NET supports some variation of the <a href="https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller">model-view-controller</a> (MVC) pattern, and Controllers handle HTTP requests. At the outset, the <em>action method</em> that handles the <code>POST</code> request looks like this:
</p>
<p>
<pre>[<span style="color:#2b91af;">HttpPost</span>]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">IActionResult</span>> <span style="font-weight:bold;color:#74531f;">Post</span>(<span style="color:#2b91af;">TableDto</span> <span style="font-weight:bold;color:#1f377f;">dto</span>)
{
    <span style="color:#2b91af;">ArgumentNullException</span>.<span style="color:#74531f;">ThrowIfNull</span>(<span style="font-weight:bold;color:#1f377f;">dto</span>);
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">id</span> = <span style="color:#2b91af;">Guid</span>.<span style="color:#74531f;">NewGuid</span>();
    <span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">repository</span>.<span style="font-weight:bold;color:#74531f;">Create</span>(<span style="font-weight:bold;color:#1f377f;">id</span>, <span style="font-weight:bold;color:#1f377f;">dto</span>.<span style="font-weight:bold;color:#74531f;">ToTable</span>()).<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">CreatedAtActionResult</span>(<span style="color:blue;">nameof</span>(<span style="font-weight:bold;color:#74531f;">Get</span>), <span style="color:blue;">null</span>, <span style="color:blue;">new</span> { id = <span style="font-weight:bold;color:#1f377f;">id</span>.<span style="font-weight:bold;color:#74531f;">ToString</span>(<span style="color:#a31515;">"N"</span>) }, <span style="color:blue;">null</span>);
}</pre>
</p>
<p>
As is <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> in ASP.NET, input and output are modelled by <a href="https://en.wikipedia.org/wiki/Data_transfer_object">data transfer objects</a> (DTOs), in this case called <code>TableDto</code>. I've already covered this little object model in the article <a href="/2023/12/25/serializing-restaurant-tables-in-c">Serializing restaurant tables in C#</a>, so I'm not going to repeat it here.
</p>
<p>
The <code>ToTable</code> method, on the other hand, is a good example of how trying to cut corners leads to more complicated code:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:#2b91af;">Table</span> <span style="font-weight:bold;color:#74531f;">ToTable</span>()
{
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">candidate</span> =
        <span style="color:#2b91af;">Table</span>.<span style="color:#74531f;">TryCreateSingle</span>(SingleTable?.Capacity ?? -1, SingleTable?.MinimalReservation ?? -1);
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">candidate</span> <span style="color:blue;">is</span> { })
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">candidate</span>.Value;
    <span style="font-weight:bold;color:#1f377f;">candidate</span> = <span style="color:#2b91af;">Table</span>.<span style="color:#74531f;">TryCreateCommunal</span>(CommunalTable?.Capacity ?? -1);
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">candidate</span> <span style="color:blue;">is</span> { })
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">candidate</span>.Value;
    <span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">InvalidOperationException</span>(<span style="color:#a31515;">"Invalid TableDto."</span>);
}</pre>
</p>
<p>
Compare it to the <code>TryParse</code> method in the <a href="/2023/12/25/serializing-restaurant-tables-in-c">Serializing restaurant tables in C#</a> article. That one is simpler, and less error-prone.
</p>
<p>
I think that I wrote <code>ToTable</code> that way because I didn't want to deal with error handling in the Controller, and while I test-drove the code, I never wrote a test that supplies malformed input. I should have, and so should you, but this is demo code, and I never got around to it.
</p>
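<p>
For comparison, here's a rough sketch of what a non-throwing conversion could look like. It's not the <code>TryParse</code> method from the other article, but a simplified illustration under a few assumptions: the domain type is a plain record rather than a Visitor-encoded class, the only validation rule is that numbers must be positive, and all the <code>Sketch</code> names are hypothetical.
</p>

```csharp
#nullable enable

// Hypothetical, simplified domain model. The article's Table is
// Visitor-encoded and uses NaturalNumber instead of int.
public sealed record DomainTableSketch(int Capacity, int? MinimalReservation)
{
    public static DomainTableSketch? TryCreateCommunal(int capacity) =>
        capacity < 1 ? null : new DomainTableSketch(capacity, null);

    public static DomainTableSketch? TryCreateSingle(
        int capacity,
        int minimalReservation) =>
        capacity < 1 || minimalReservation < 1
            ? null
            : new DomainTableSketch(capacity, minimalReservation);
}

public sealed class CommunalTableDtoSketch
{
    public int Capacity { get; set; }
}

public sealed class SingleTableDtoSketch
{
    public int Capacity { get; set; }
    public int MinimalReservation { get; set; }
}

public sealed class TableDtoSketch
{
    public CommunalTableDtoSketch? CommunalTable { get; set; }
    public SingleTableDtoSketch? SingleTable { get; set; }

    // Returns null for malformed input instead of throwing, so that a
    // caller can decide how to report the error.
    public DomainTableSketch? TryParse()
    {
        if (SingleTable is { } single && CommunalTable is null)
            return DomainTableSketch.TryCreateSingle(
                single.Capacity,
                single.MinimalReservation);
        if (CommunalTable is { } communal && SingleTable is null)
            return DomainTableSketch.TryCreateCommunal(communal.Capacity);

        // Neither or both kinds of table were supplied.
        return null;
    }
}
```

<p>
With a method like this, a Controller could check for <code>null</code> and respond with <code>400 Bad Request</code> rather than letting an exception bubble up.
</p>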
<p>
Enough about that. The other action method handles <code>GET</code> requests:
</p>
<p>
<pre>[<span style="color:#2b91af;">HttpGet</span>(<span style="color:#a31515;">"</span><span style="color:#0073ff;">{</span>id<span style="color:#0073ff;">}</span><span style="color:#a31515;">"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">IActionResult</span>> <span style="font-weight:bold;color:#74531f;">Get</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">id</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">if</span> (!<span style="color:#2b91af;">Guid</span>.<span style="color:#74531f;">TryParseExact</span>(<span style="font-weight:bold;color:#1f377f;">id</span>, <span style="color:#a31515;">"N"</span>, <span style="color:blue;">out</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">guid</span>))
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">BadRequestResult</span>();
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">table</span> = <span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">repository</span>.<span style="font-weight:bold;color:#74531f;">Read</span>(<span style="font-weight:bold;color:#1f377f;">guid</span>).<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">table</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">NotFoundResult</span>();
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">OkObjectResult</span>(<span style="color:#2b91af;">TableDto</span>.<span style="color:#74531f;">From</span>(<span style="font-weight:bold;color:#1f377f;">table</span>.Value));
}</pre>
</p>
<p>
The static <code>TableDto.From</code> method is identical to the <code>ToDto</code> method from the <a href="/2023/12/25/serializing-restaurant-tables-in-c">Serializing restaurant tables in C#</a> article, just with a different name.
</p>
<p>
To summarize so far: At the boundary of the application, Controller methods receive or return <code>TableDto</code> objects, which are mapped to and from the Domain Model named <code>Table</code>.
</p>
<h3 id="b5cbc5042c01488da08a856d84d3f17e">
Domain Model <a href="#b5cbc5042c01488da08a856d84d3f17e">#</a>
</h3>
<p>
The Domain Model <code>Table</code> is also identical to the code shown in <a href="/2023/12/25/serializing-restaurant-tables-in-c">Serializing restaurant tables in C#</a>. In order to comply with the <a href="https://en.wikipedia.org/wiki/Dependency_inversion_principle">Dependency Inversion Principle</a> (DIP), mapping to and from <code>TableDto</code> is defined on the latter. The DTO, being an implementation detail, may depend on the abstraction (the Domain Model), but not the other way around.
</p>
<p>
In the same spirit, conversions to and from the database are defined entirely within the <code>repository</code> implementation.
</p>
<h3 id="84b01fa88e8541f5afbb9aa04f454645">
Data access layer <a href="#84b01fa88e8541f5afbb9aa04f454645">#</a>
</h3>
<p>
Keeping the example consistent, the code base also models data access with C# classes. It uses <a href="https://learn.microsoft.com/ef">Entity Framework</a> to read from and write to <a href="https://en.wikipedia.org/wiki/Microsoft_SQL_Server">SQL Server</a>. The class that models a row in the database is also a kind of DTO, even though here it's idiomatically called an <em>entity:</em>
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">partial</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">TableEntity</span>
{
    <span style="color:blue;">public</span> <span style="color:blue;">int</span> Id { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
    <span style="color:blue;">public</span> <span style="color:#2b91af;">Guid</span> PublicId { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
    <span style="color:blue;">public</span> <span style="color:blue;">int</span> Capacity { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
    <span style="color:blue;">public</span> <span style="color:blue;">int</span>? MinimalReservation { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
}</pre>
</p>
<p>
I had a command-line tool scaffold the code for me, and since <a href="/2023/09/18/do-orms-reduce-the-need-for-mapping">I don't usually live in that world</a>, I don't know why it's a <code>partial class</code>. It seems to be working, though.
</p>
<p>
The <code>SqlTablesRepository</code> class implements the mapping between <code>Table</code> and <code>TableEntity</code>. For instance, the <code>Create</code> method looks like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span> <span style="font-weight:bold;color:#74531f;">Create</span>(<span style="color:#2b91af;">Guid</span> <span style="font-weight:bold;color:#1f377f;">id</span>, <span style="color:#2b91af;">Table</span> <span style="font-weight:bold;color:#1f377f;">table</span>)
{
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">entity</span> = <span style="font-weight:bold;color:#1f377f;">table</span>.<span style="font-weight:bold;color:#74531f;">Accept</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">TableToEntityConverter</span>(<span style="font-weight:bold;color:#1f377f;">id</span>));
    <span style="color:blue;">await</span> context.Tables.<span style="font-weight:bold;color:#74531f;">AddAsync</span>(<span style="font-weight:bold;color:#1f377f;">entity</span>).<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
    <span style="color:blue;">await</span> context.<span style="font-weight:bold;color:#74531f;">SaveChangesAsync</span>().<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
}</pre>
</p>
<p>
That looks simple, but only because all the work is done by the <code>TableToEntityConverter</code>, which is a nested class:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">TableToEntityConverter</span> : <span style="color:#2b91af;">ITableVisitor</span><<span style="color:#2b91af;">TableEntity</span>>
{
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">Guid</span> id;
    <span style="color:blue;">public</span> <span style="color:#2b91af;">TableToEntityConverter</span>(<span style="color:#2b91af;">Guid</span> <span style="font-weight:bold;color:#1f377f;">id</span>)
    {
        <span style="color:blue;">this</span>.id = <span style="font-weight:bold;color:#1f377f;">id</span>;
    }
    <span style="color:blue;">public</span> <span style="color:#2b91af;">TableEntity</span> <span style="font-weight:bold;color:#74531f;">VisitCommunal</span>(<span style="color:#2b91af;">NaturalNumber</span> <span style="font-weight:bold;color:#1f377f;">capacity</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">TableEntity</span>
        {
            PublicId = id,
            Capacity = (<span style="color:blue;">int</span>)<span style="font-weight:bold;color:#1f377f;">capacity</span>,
        };
    }
    <span style="color:blue;">public</span> <span style="color:#2b91af;">TableEntity</span> <span style="font-weight:bold;color:#74531f;">VisitSingle</span>(
        <span style="color:#2b91af;">NaturalNumber</span> <span style="font-weight:bold;color:#1f377f;">capacity</span>,
        <span style="color:#2b91af;">NaturalNumber</span> <span style="font-weight:bold;color:#1f377f;">minimalReservation</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">TableEntity</span>
        {
            PublicId = id,
            Capacity = (<span style="color:blue;">int</span>)<span style="font-weight:bold;color:#1f377f;">capacity</span>,
            MinimalReservation = (<span style="color:blue;">int</span>)<span style="font-weight:bold;color:#1f377f;">minimalReservation</span>,
        };
    }
}</pre>
</p>
<p>
Mapping the other way is easier, so the <code>SqlTablesRepository</code> does it inline in the <code>Read</code> method:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">Table</span>?> <span style="font-weight:bold;color:#74531f;">Read</span>(<span style="color:#2b91af;">Guid</span> <span style="font-weight:bold;color:#1f377f;">id</span>)
{
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">entity</span> = <span style="color:blue;">await</span> context.Tables
        .<span style="font-weight:bold;color:#74531f;">SingleOrDefaultAsync</span>(<span style="font-weight:bold;color:#1f377f;">t</span> => <span style="font-weight:bold;color:#1f377f;">t</span>.PublicId <span style="font-weight:bold;color:#74531f;">==</span> <span style="font-weight:bold;color:#1f377f;">id</span>).<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">entity</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">entity</span>.MinimalReservation <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">Table</span>.<span style="color:#74531f;">TryCreateCommunal</span>(<span style="font-weight:bold;color:#1f377f;">entity</span>.Capacity);
    <span style="font-weight:bold;color:#8f08c4;">else</span>
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">Table</span>.<span style="color:#74531f;">TryCreateSingle</span>(
            <span style="font-weight:bold;color:#1f377f;">entity</span>.Capacity,
            <span style="font-weight:bold;color:#1f377f;">entity</span>.MinimalReservation.Value);
}</pre>
</p>
<p>
Similar to the case of the DTO, mapping between <code>Table</code> and <code>TableEntity</code> is the responsibility of the <code>SqlTablesRepository</code> class, since data persistence is an implementation detail. According to the DIP it shouldn't be part of the Domain Model, and it isn't.
</p>
<h3 id="77dda87b99ac49bf91d08079e252fc50">
XML formats <a href="#77dda87b99ac49bf91d08079e252fc50">#</a>
</h3>
<p>
That covers the basics, but how well does this kind of architecture stand up to changing requirements?
</p>
<p>
One axis of variation is when a service needs to support multiple representations. In this example, I'll imagine that the service also needs to support not just one, but two, XML formats.
</p>
<p>
Granted, you may not run into that particular requirement that often, but it's typical of a kind of change that you're likely to run into. In REST APIs, for example, <a href="/2015/06/22/rest-implies-content-negotiation">you should use content negotiation for versioning</a>, and that's the same kind of problem.
</p>
<p>
To be fair, application code also changes for a variety of other reasons, including new features, changes to business logic, etc. I can't possibly cover all, though, and many of these are much better described than changes in wire formats.
</p>
<p>
As described in the <a href="/2024/07/25/three-data-architectures-for-the-server">introduction article</a>, ideally the XML should support a format implied by these examples:
</p>
<p>
<pre><span style="color:blue;"><</span><span style="color:#a31515;">communal-table</span><span style="color:blue;">></span>
<span style="color:blue;"> <</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></span>12<span style="color:blue;"></</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></span>
<span style="color:blue;"></</span><span style="color:#a31515;">communal-table</span><span style="color:blue;">></span>
<span style="color:blue;"><</span><span style="color:#a31515;">single-table</span><span style="color:blue;">></span>
<span style="color:blue;"> <</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></span>4<span style="color:blue;"></</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></span>
<span style="color:blue;"> <</span><span style="color:#a31515;">minimal-reservation</span><span style="color:blue;">></span>3<span style="color:blue;"></</span><span style="color:#a31515;">minimal-reservation</span><span style="color:blue;">></span>
<span style="color:blue;"></</span><span style="color:#a31515;">single-table</span><span style="color:blue;">></span></pre>
</p>
<p>
Notice that while these two examples have different root elements, they're still considered to both represent <em>a table</em>. Although <a href="/2023/10/16/at-the-boundaries-static-types-are-illusory">at the boundaries, static types are illusory</a>, we may still, loosely speaking, consider both of those XML documents as belonging to the same 'type'.
</p>
<p>
To be honest, if there's a way to support this kind of <a href="/2024/04/15/services-share-schema-and-contract-not-class">schema</a> by defining DTOs to be serialized and deserialized, I don't know what it looks like. That's not meant to imply that it's impossible. There's often an epistemological problem associated with proving things impossible, so I'll just leave it there.
</p>
<p>
To be clear, it's not that I don't know how to support that kind of schema at all. I do, as the article <a href="/2024/08/12/using-only-a-domain-model-to-persist-restaurant-table-configurations">Using only a Domain Model to persist restaurant table configurations</a> will show. I just don't know how to do it with DTO-based serialisation.
</p>
<h3 id="6b0409d4-67fd-4e86-9a83-538fa5b690d4">
Element-biased XML <a href="#6b0409d4-67fd-4e86-9a83-538fa5b690d4">#</a>
</h3>
<p>
Instead of the above XML schema, I will explore how hard it is to support a variant schema, implied by these two examples:
</p>
<p>
<pre><span style="color:blue;"><</span><span style="color:#a31515;">table</span><span style="color:blue;">></span>
<span style="color:blue;"> <</span><span style="color:#a31515;">type</span><span style="color:blue;">></span>communal<span style="color:blue;"></</span><span style="color:#a31515;">type</span><span style="color:blue;">></span>
<span style="color:blue;"> <</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></span>12<span style="color:blue;"></</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></span>
<span style="color:blue;"></</span><span style="color:#a31515;">table</span><span style="color:blue;">></span>
<span style="color:blue;"><</span><span style="color:#a31515;">table</span><span style="color:blue;">></span>
<span style="color:blue;"> <</span><span style="color:#a31515;">type</span><span style="color:blue;">></span>single<span style="color:blue;"></</span><span style="color:#a31515;">type</span><span style="color:blue;">></span>
<span style="color:blue;"> <</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></span>4<span style="color:blue;"></</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></span>
<span style="color:blue;"> <</span><span style="color:#a31515;">minimal-reservation</span><span style="color:blue;">></span>3<span style="color:blue;"></</span><span style="color:#a31515;">minimal-reservation</span><span style="color:blue;">></span>
<span style="color:blue;"></</span><span style="color:#a31515;">table</span><span style="color:blue;">></span></pre>
</p>
<p>
This variation shares the same <code><span style="color:blue;"><</span><span style="color:#a31515;">table</span><span style="color:blue;">></span></code> root element and instead distinguishes between the two kinds of table with a <code><span style="color:blue;"><</span><span style="color:#a31515;">type</span><span style="color:blue;">></span></code> discriminator.
</p>
<p>
This kind of schema we can define with a DTO:
</p>
<p>
<pre>[<span style="color:#2b91af;">XmlRoot</span>(<span style="color:#a31515;">"table"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ElementBiasedTableXmlDto</span>
{
    [<span style="color:#2b91af;">XmlElement</span>(<span style="color:#a31515;">"type"</span>)]
    <span style="color:blue;">public</span> <span style="color:blue;">string</span>? Type { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
    [<span style="color:#2b91af;">XmlElement</span>(<span style="color:#a31515;">"capacity"</span>)]
    <span style="color:blue;">public</span> <span style="color:blue;">int</span> Capacity { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
    [<span style="color:#2b91af;">XmlElement</span>(<span style="color:#a31515;">"minimal-reservation"</span>)]
    <span style="color:blue;">public</span> <span style="color:blue;">int</span>? MinimalReservation { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
    <span style="color:blue;">public</span> <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">ShouldSerializeMinimalReservation</span>() =>
        MinimalReservation.HasValue;
    <span style="color:green;">// Mapping methods not shown here...</span>
}</pre>
</p>
<p>
As you may have already noticed, however, this isn't the same type as <code>TableJsonDto</code>, so how are we going to implement the Controller methods that receive and send objects of this type?
</p>
<h3 id="28e522bca7b04f94a68c00f15f0d7243">
Posting XML <a href="#28e522bca7b04f94a68c00f15f0d7243">#</a>
</h3>
<p>
The service should still accept JSON as shown above, but now, additionally, it should also support HTTP requests like this one:
</p>
<p>
<pre>POST /tables HTTP/1.1
content-type: application/xml

<span style="color:blue;"><</span><span style="color:#a31515;">table</span><span style="color:blue;">><</span><span style="color:#a31515;">type</span><span style="color:blue;">></span>communal<span style="color:blue;"></</span><span style="color:#a31515;">type</span><span style="color:blue;">><</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></span>12<span style="color:blue;"></</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></</span><span style="color:#a31515;">table</span><span style="color:blue;">></span></pre>
</p>
<p>
How do you implement this new feature?
</p>
<p>
My first thought was to add a <code>Post</code> overload to the Controller:
</p>
<p>
<pre>[<span style="color:#2b91af;">HttpPost</span>]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">IActionResult</span>> <span style="font-weight:bold;color:#74531f;">Post</span>(<span style="color:#2b91af;">ElementBiasedTableXmlDto</span> <span style="font-weight:bold;color:#1f377f;">dto</span>)
{
    <span style="color:#2b91af;">ArgumentNullException</span>.<span style="color:#74531f;">ThrowIfNull</span>(<span style="font-weight:bold;color:#1f377f;">dto</span>);
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">id</span> = <span style="color:#2b91af;">Guid</span>.<span style="color:#74531f;">NewGuid</span>();
    <span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">repository</span>.<span style="font-weight:bold;color:#74531f;">Create</span>(<span style="font-weight:bold;color:#1f377f;">id</span>, <span style="font-weight:bold;color:#1f377f;">dto</span>.<span style="font-weight:bold;color:#74531f;">ToTable</span>()).<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">CreatedAtActionResult</span>(
        <span style="color:blue;">nameof</span>(<span style="font-weight:bold;color:#74531f;">Get</span>),
        <span style="color:blue;">null</span>,
        <span style="color:blue;">new</span> { id = <span style="font-weight:bold;color:#1f377f;">id</span>.<span style="font-weight:bold;color:#74531f;">ToString</span>(<span style="color:#a31515;">"N"</span>) },
        <span style="color:blue;">null</span>);
}</pre>
</p>
<p>
I just copied and pasted the original <code>Post</code> method and changed the type of the <code>dto</code> parameter. I also had to add a <code>ToTable</code> conversion to <code>ElementBiasedTableXmlDto</code>:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:#2b91af;">Table</span> <span style="font-weight:bold;color:#74531f;">ToTable</span>()
{
    <span style="font-weight:bold;color:#8f08c4;">if</span> (Type == <span style="color:#a31515;">"single"</span>)
    {
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">t</span> = <span style="color:#2b91af;">Table</span>.<span style="color:#74531f;">TryCreateSingle</span>(Capacity, MinimalReservation ?? 0);
        <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">t</span> <span style="color:blue;">is</span> { })
            <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">t</span>.Value;
    }
    <span style="font-weight:bold;color:#8f08c4;">if</span> (Type == <span style="color:#a31515;">"communal"</span>)
    {
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">t</span> = <span style="color:#2b91af;">Table</span>.<span style="color:#74531f;">TryCreateCommunal</span>(Capacity);
        <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">t</span> <span style="color:blue;">is</span> { })
            <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">t</span>.Value;
    }
    <span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">InvalidOperationException</span>(<span style="color:#a31515;">"Invalid Table DTO."</span>);
}</pre>
</p>
<p>
While all of that compiles, it doesn't work.
</p>
<p>
When you attempt to <code>POST</code> a request against the service, the ASP.NET framework now throws an <code>AmbiguousMatchException</code> indicating that "The request matched multiple endpoints". Which is understandable.
</p>
<p>
This led me to the first round of <a href="/2023/10/02/dependency-whac-a-mole">Framework Whac-A-Mole</a>. What I'd like to do is to select the appropriate action method based on <code>content-type</code> or <code>accept</code> headers. How does one do that?
</p>
<p>
After some web searching, I came across <a href="https://stackoverflow.com/a/1045616/126014">a Stack Overflow answer</a> that seemed to indicate a way forward.
</p>
<h3 id="5f262b2b42c542148b62223f0fb3d9a9">
Selecting the right action method <a href="#5f262b2b42c542148b62223f0fb3d9a9">#</a>
</h3>
<p>
One way to address the issue is to implement a custom <a href="https://learn.microsoft.com/dotnet/api/microsoft.aspnetcore.mvc.actionconstraints.actionmethodselectorattribute">ActionMethodSelectorAttribute</a>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">SelectTableActionMethodAttribute</span> : <span style="color:#2b91af;">ActionMethodSelectorAttribute</span>
{
    <span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">IsValidForRequest</span>(<span style="color:#2b91af;">RouteContext</span> <span style="font-weight:bold;color:#1f377f;">routeContext</span>, <span style="color:#2b91af;">ActionDescriptor</span> <span style="font-weight:bold;color:#1f377f;">action</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">action</span> <span style="color:blue;">is</span> <span style="color:blue;">not</span> <span style="color:#2b91af;">ControllerActionDescriptor</span> <span style="font-weight:bold;color:#1f377f;">cad</span>)
            <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">false</span>;
        <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">cad</span>.Parameters.Count != 1)
            <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">false</span>;
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">dtoType</span> = <span style="font-weight:bold;color:#1f377f;">cad</span>.Parameters[0].ParameterType;
        <span style="color:green;">// Demo code only. This doesn't take into account a possible charset</span>
        <span style="color:green;">// parameter. See RFC 9110, section 8.3</span>
        <span style="color:green;">// (https://www.rfc-editor.org/rfc/rfc9110#field.content-type) for more</span>
        <span style="color:green;">// information.</span>
        <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">routeContext</span>?.HttpContext.Request.ContentType == <span style="color:#a31515;">"application/json"</span>)
            <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">dtoType</span> <span style="font-weight:bold;color:#74531f;">==</span> <span style="color:blue;">typeof</span>(<span style="color:#2b91af;">TableJsonDto</span>);
        <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">routeContext</span>?.HttpContext.Request.ContentType == <span style="color:#a31515;">"application/xml"</span>)
            <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">dtoType</span> <span style="font-weight:bold;color:#74531f;">==</span> <span style="color:blue;">typeof</span>(<span style="color:#2b91af;">ElementBiasedTableXmlDto</span>);
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">false</span>;
    }
}</pre>
</p>
<p>
As the code comment suggests, this isn't as robust as it should be. A <code>content-type</code> header may also look like this:
</p>
<p>
<pre>Content-Type: application/json; charset=utf-8</pre>
</p>
<p>
The exact string equality check shown above would fail in such a scenario, suggesting that a more sophisticated implementation is warranted. I'll skip that for now, since this demo code already compromises on the overall XML schema. For an example of more robust <a href="https://en.wikipedia.org/wiki/Content_negotiation">content negotiation</a> implementations, see <a href="/2024/08/12/using-only-a-domain-model-to-persist-restaurant-table-configurations">Using only a Domain Model to persist restaurant table configurations</a>.
</p>
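<p>
To give an idea of what a more forgiving check might involve, here's a minimal sketch that parses the header value instead of comparing raw strings. It's not taken from the example code base: <code>MediaTypes</code> and <code>Matches</code> are hypothetical names, and the sketch assumes that <code>MediaTypeHeaderValue</code> from <code>System.Net.Http.Headers</code> is a good enough parser for the purpose.
</p>
<p>
<pre>using System;
using System.Net.Http.Headers;

public static class MediaTypes
{
    // Hypothetical helper, not part of the example code base.
    // Returns true when the content-type header names the expected media
    // type, ignoring parameters such as charset, and ignoring case.
    public static bool Matches(string? contentType, string expected)
    {
        if (!MediaTypeHeaderValue.TryParse(contentType, out var parsed))
            return false;
        return string.Equals(
            parsed.MediaType,
            expected,
            StringComparison.OrdinalIgnoreCase);
    }
}</pre>
</p>
<p>
With such a helper, <code>MediaTypes.Matches("application/json; charset=utf-8", "application/json")</code> returns <code>true</code>, whereas a missing or malformed header value yields <code>false</code>.
</p>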
<p>
Adorn both <code>Post</code> action methods with this custom attribute, and the service now handles both formats:
</p>
<p>
<pre>[<span style="color:#2b91af;">HttpPost</span>, <span style="color:#2b91af;">SelectTableActionMethod</span>]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">IActionResult</span>> <span style="font-weight:bold;color:#74531f;">Post</span>(<span style="color:#2b91af;">TableJsonDto</span> <span style="font-weight:bold;color:#1f377f;">dto</span>)
<span style="color:green;">// ...</span>
[<span style="color:#2b91af;">HttpPost</span>, <span style="color:#2b91af;">SelectTableActionMethod</span>]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">IActionResult</span>> <span style="font-weight:bold;color:#74531f;">Post</span>(<span style="color:#2b91af;">ElementBiasedTableXmlDto</span> <span style="font-weight:bold;color:#1f377f;">dto</span>)
<span style="color:green;">// ...</span></pre>
</p>
<p>
While that handles <code>POST</code> requests, it doesn't implement content negotiation for <code>GET</code> requests.
</p>
<h3 id="4cc040875e3349d996b3711a6fea20c0">
Getting XML <a href="#4cc040875e3349d996b3711a6fea20c0">#</a>
</h3>
<p>
In order to <code>GET</code> an XML representation, clients can supply an <code>accept</code> header value:
</p>
<p>
<pre>GET /Tables/153f224c91fb4403988934118cc14024 HTTP/1.1
accept: application/xml</pre>
</p>
<p>
to which the service will reply with
</p>
<p>
<pre>HTTP/1.1 200 OK
Content-Length: 59
Content-Type: application/xml; charset=utf-8

<span style="color:blue;"><</span><span style="color:#a31515;">table</span><span style="color:blue;">><</span><span style="color:#a31515;">type</span><span style="color:blue;">></span>communal<span style="color:blue;"></</span><span style="color:#a31515;">type</span><span style="color:blue;">><</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></span>12<span style="color:blue;"></</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></</span><span style="color:#a31515;">table</span><span style="color:blue;">></span></pre>
</p>
<p>
How do we implement that?
</p>
<p>
Keep in mind that since this data-architecture variation uses two different DTOs to model JSON and XML, respectively, an action method can't just return an object of a single type and hope that the ASP.NET framework takes care of the rest. Again, I'm aware of middleware that'll deal nicely with this kind of problem, but not in this architecture; see <a href="/2024/08/12/using-only-a-domain-model-to-persist-restaurant-table-configurations">Using only a Domain Model to persist restaurant table configurations</a> for such a solution.
</p>
<p>
The best I could come up with, given the constraints I'd imposed on myself, then, was this:
</p>
<p>
<pre>[<span style="color:#2b91af;">HttpGet</span>(<span style="color:#a31515;">"</span><span style="color:#0073ff;">{</span>id<span style="color:#0073ff;">}</span><span style="color:#a31515;">"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">IActionResult</span>> <span style="font-weight:bold;color:#74531f;">Get</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">id</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">if</span> (!<span style="color:#2b91af;">Guid</span>.<span style="color:#74531f;">TryParseExact</span>(<span style="font-weight:bold;color:#1f377f;">id</span>, <span style="color:#a31515;">"N"</span>, <span style="color:blue;">out</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">guid</span>))
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">BadRequestResult</span>();
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">table</span> = <span style="color:blue;">await</span> <span style="font-weight:bold;color:#1f377f;">repository</span>.<span style="font-weight:bold;color:#74531f;">Read</span>(<span style="font-weight:bold;color:#1f377f;">guid</span>).<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>);
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">table</span> <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">NotFoundResult</span>();
    <span style="color:green;">// Demo code only. This doesn't take into account quality values.</span>
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">accept</span> =
        <span style="font-weight:bold;color:#1f377f;">httpContextAccessor</span>?.HttpContext?.Request.Headers.Accept.<span style="font-weight:bold;color:#74531f;">ToString</span>();
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">accept</span> == <span style="color:#a31515;">"application/json"</span>)
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">OkObjectResult</span>(<span style="color:#2b91af;">TableJsonDto</span>.<span style="color:#74531f;">From</span>(<span style="font-weight:bold;color:#1f377f;">table</span>.Value));
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">accept</span> == <span style="color:#a31515;">"application/xml"</span>)
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">OkObjectResult</span>(<span style="color:#2b91af;">ElementBiasedTableXmlDto</span>.<span style="color:#74531f;">From</span>(<span style="font-weight:bold;color:#1f377f;">table</span>.Value));
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">StatusCodeResult</span>((<span style="color:blue;">int</span>)<span style="color:#2b91af;">HttpStatusCode</span>.NotAcceptable);
}</pre>
</p>
<p>
As the comment suggests, this is once again code that barely passes the few tests that I have, but really isn't production-ready. An <code>accept</code> header may also look like this:
</p>
<p>
<pre>accept: application/xml; q=1.0,application/json; q=0.5</pre>
</p>
<p>
Given such an <code>accept</code> header, the service ought to return an XML representation with the <code>application/xml</code> content type, but instead, this <code>Get</code> method returns <code>406 Not Acceptable</code>.
</p>
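<p>
For illustration only, honouring quality values might start with something like the following sketch. It isn't part of the example code base: <code>AcceptHeader</code> and <code>SelectMediaType</code> are hypothetical names, and the sketch assumes that <code>MediaTypeWithQualityHeaderValue</code> from <code>System.Net.Http.Headers</code> can parse each comma-separated entry. It still cuts corners, since it splits naively on commas and ignores wildcards such as <code>*/*</code>.
</p>
<p>
<pre>using System;
using System.Linq;
using System.Net.Http.Headers;

public static class AcceptHeader
{
    // Hypothetical helper, not part of the example code base.
    // Returns the supported media type that the client prefers, or null
    // if none of them is acceptable. Still demo code: it splits naively
    // on commas and ignores wildcards such as */*.
    public static string? SelectMediaType(
        string? accept,
        params string[] supported)
    {
        return (accept ?? "")
            .Split(
                ',',
                StringSplitOptions.RemoveEmptyEntries |
                StringSplitOptions.TrimEntries)
            .Select(s =>
                MediaTypeWithQualityHeaderValue.TryParse(s, out var v)
                    ? v
                    : null)
            .Where(v => v is not null)
            // A missing q parameter means q=1; q=0 means 'not acceptable'.
            .Where(v => (v!.Quality ?? 1.0) > 0)
            // OrderByDescending is stable, so ties keep header order.
            .OrderByDescending(v => v!.Quality ?? 1.0)
            .Select(v => v!.MediaType)
            .FirstOrDefault(m =>
                supported.Contains(m, StringComparer.OrdinalIgnoreCase));
    }
}</pre>
</p>
<p>
Given the <code>accept</code> header shown above and the supported media types <code>application/json</code> and <code>application/xml</code>, this helper selects <code>application/xml</code>. A <code>Get</code> method could then branch on the selected media type rather than on the raw header string.
</p>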
<p>
As I've already outlined, I'm not going to fix this problem, as this is only an exploration. It seems that we can already conclude that this style of architecture is ill-suited to deal with this kind of problem. If that's the conclusion, then why spend time fixing outstanding problems?
</p>
<h3 id="6db23995819543c3920bc2e7c5d16bb0">
Attribute-biased XML <a href="#6db23995819543c3920bc2e7c5d16bb0">#</a>
</h3>
<p>
Even so, just to punish myself, apparently, I also tried to add support for an alternative XML format that uses attributes to record primitive values. Again, I couldn't make the schema described in the <a href="/2024/07/25/three-data-architectures-for-the-server">introductory article</a> work, but I did manage to add support for XML documents like these:
</p>
<p>
<pre><span style="color:blue;"><</span><span style="color:#a31515;">table</span><span style="color:blue;"> </span><span style="color:red;">type</span><span style="color:blue;">=</span>"<span style="color:blue;">communal</span>"<span style="color:blue;"> </span><span style="color:red;">capacity</span><span style="color:blue;">=</span>"<span style="color:blue;">12</span>"<span style="color:blue;"> /></span>
<span style="color:blue;"><</span><span style="color:#a31515;">table</span><span style="color:blue;"> </span><span style="color:red;">type</span><span style="color:blue;">=</span>"<span style="color:blue;">single</span>"<span style="color:blue;"> </span><span style="color:red;">capacity</span><span style="color:blue;">=</span>"<span style="color:blue;">4</span>"<span style="color:blue;"> </span><span style="color:red;">minimal-reservation</span><span style="color:blue;">=</span>"<span style="color:blue;">3</span>"<span style="color:blue;"> /></span></pre>
</p>
<p>
The code is similar to what I've already shown, so I'll only list the DTO:
</p>
<p>
<pre>[<span style="color:#2b91af;">XmlRoot</span>(<span style="color:#a31515;">"table"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">AttributeBiasedTableXmlDto</span>
{
    [<span style="color:#2b91af;">XmlAttribute</span>(<span style="color:#a31515;">"type"</span>)]
    <span style="color:blue;">public</span> <span style="color:blue;">string</span>? Type { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
    [<span style="color:#2b91af;">XmlAttribute</span>(<span style="color:#a31515;">"capacity"</span>)]
    <span style="color:blue;">public</span> <span style="color:blue;">int</span> Capacity { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
    [<span style="color:#2b91af;">XmlAttribute</span>(<span style="color:#a31515;">"minimal-reservation"</span>)]
    <span style="color:blue;">public</span> <span style="color:blue;">int</span> MinimalReservation { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
    <span style="color:blue;">public</span> <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">ShouldSerializeMinimalReservation</span>() => 0 < MinimalReservation;
    <span style="color:green;">// Mapping methods not shown here...</span>
}</pre>
</p>
<p>
This DTO looks a lot like the <code>ElementBiasedTableXmlDto</code> class, only it adorns properties with <code>XmlAttribute</code> rather than <code>XmlElement</code>.
</p>
<h3 id="b3b79b4075b1457dbe92a1a4aae944c6">
Evaluation <a href="#b3b79b4075b1457dbe92a1a4aae944c6">#</a>
</h3>
<p>
Even though I had to compromise on essential goals, I wasted an appalling amount of time and energy on <a href="https://projects.csail.mit.edu/gsb/old-archive/gsb-archive/gsb2000-02-11.html">yak shaving</a> and <a href="/2023/10/02/dependency-whac-a-mole">Framework Whac-A-Mole</a>. The DTO-based approach to modelling external resources really doesn't work when you need to do substantial content negotiation.
</p>
<p>
Even so, a DTO-based Ports and Adapters architecture may be useful when that's not a concern. If, instead of a REST API, you're developing a web site, you'll typically not need to vary representation independently of resource. In other words, a web page is likely to have at most one underlying model.
</p>
<p>
Compared to other large frameworks I've run into, ASP.NET is fairly unopinionated. Even so, the idiomatic way to use it is based on DTOs. DTOs to represent external data. DTOs to represent UI components. DTOs to represent database rows (although they're often called <em>entities</em> in that context). You'll find a ton of examples using this data architecture, so it's incredibly well-described. If you run into problems, odds are that someone has blazed a trail before you.
</p>
<p>
Even outside of .NET, <a href="https://alistair.cockburn.us/hexagonal-architecture/">this kind of architecture is well-known</a>. While I've learned a thing or two from experience, I've picked up a lot of my knowledge about software architecture from people like <a href="https://martinfowler.com/">Martin Fowler</a> and <a href="https://en.wikipedia.org/wiki/Robert_C._Martin">Robert C. Martin</a>.
</p>
<p>
When you also apply the Dependency Inversion Principle, you'll get good separations of concerns. This aspect of Ports and Adapters is most explicitly described in <a href="/ref/clean-architecture">Clean Architecture</a>. For example, a change to the UI generally doesn't affect the database. You may find that example ridiculous, because why should it, but consult the article <a href="/2024/08/05/using-a-shared-data-model-to-persist-restaurant-table-configurations">Using a Shared Data Model to persist restaurant table configurations</a> to see how this may happen.
</p>
<p>
The major drawback of the DTO-based data architecture is that much mapping is required. With three different DTOs (e.g. JSON DTO, Domain Model, and <a href="https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping">ORM</a> Entity), you need four-way translation as indicated in the above figure. People often complain about all that mapping, and no: <a href="/2023/09/18/do-orms-reduce-the-need-for-mapping">ORMs don't reduce the need for mapping</a>.
</p>
<p>
Another problem is that this style of architecture is <em>complicated</em>. As I've <a href="/2016/03/18/functional-architecture-is-ports-and-adapters">argued elsewhere</a>, Ports and Adapters often constitute an unstable equilibrium. While you can make it work, it requires a level of sophistication and maturity among team members that is not always present. And when it goes wrong, it may quickly deteriorate into a <a href="https://wiki.c2.com/?BigBallOfMud">Big Ball of Mud</a>.
</p>
<h3 id="162f6985f9674d6397869620b904de1f">
Conclusion <a href="#162f6985f9674d6397869620b904de1f">#</a>
</h3>
<p>
A DTO-based Ports and Adapters architecture is well-described and has good separation of concerns. In this article, however, we've seen that it doesn't deal successfully with realistic content negotiation. While that may seem like a shortcoming, it's a drawback that you may be able to live with. Particularly if you don't need to do content negotiation at all.
</p>
<p>
This way of organizing code around data is quite common. It's often the default data architecture, and I sometimes get the impression that a development team has 'chosen' to use it without considering alternatives.
</p>
<p>
It's not a bad architecture despite evidence to the contrary in this article. The scenario examined here may not be relevant. The main drawback of having all these objects playing different roles is all the mapping that's required.
</p>
<p>
The next data architecture attempts to address that concern.
</p>
<p>
<strong>Next:</strong> <a href="/2024/08/05/using-a-shared-data-model-to-persist-restaurant-table-configurations">Using a Shared Data Model to persist restaurant table configurations</a>.
</p>
</div><hr>
Three data architectures for the server https://blog.ploeh.dk/2024/07/25/three-data-architectures-for-the-server 2024-07-25T18:30:00+00:00 Mark Seemann
<div id="post">
<p>
<em>A comparison, for educational purposes.</em>
</p>
<p>
<em>Use the right tool for the job.</em> How often have you encountered that phrase when discussing software architecture?
</p>
<p>
There's nothing wrong with the sentiment per se, but it's almost devoid of meaning. It doesn't pass the 'not test'. Try to negate it and imagine if anyone would seriously hold that belief: <em>Don't use the right tool for the job,</em> said no-one ever.
</p>
<p>
Even so, the underlying idea is that there are better and worse ways to solve problems. In software architecture too. It follows that you should choose the better solution.
</p>
<p>
Doing that requires skill and experience. When planning a good software architecture, an important consideration is how it'll handle future requirements. This seems to indicate that an architect should be able to predict the future in order to pick the best architecture. Which is, in general, not possible. Predicting the future is not the topic of this article.
</p>
<p>
There is, however, a more practical issue related to the notion of using the right tool for the job. One that we <em>can</em> address.
</p>
<h3 id="19b9dea780d2475fb4e0311dc1cc6893">
Choice <a href="#19b9dea780d2475fb4e0311dc1cc6893">#</a>
</h3>
<p>
In order to choose the better solution, you need to be aware of alternatives. You can't choose if there's nothing to choose from. This seems obvious, but a flowchart may drive home the point in an even stronger fashion.
</p>
<p>
<img src="/content/binary/flowchart-without-choice.png" alt="A flowchart diagram, but without any choice at the decision shape.">
</p>
<p>
On the other hand, if you have options, you're now in a position to choose.
</p>
<p>
<img src="/content/binary/flowchart-with-choice.png" alt="A flowchart diagram, now with three options available from the decision shape.">
</p>
<p>
In order to make a decision, you must be able to identify alternatives. This is hardly earth-shattering, but perhaps a bit abstract. To make it concrete, in this article, I'll look at a particular example.
</p>
<h3 id="4b9045825d9a47c3a6d8f0af1de89a2c">
Default data architecture <a href="#4b9045825d9a47c3a6d8f0af1de89a2c">#</a>
</h3>
<p>
Many applications need some sort of persistent storage. Particularly when it comes to (relational) database-based systems, I've seen more than one organization defaulting to a single data architecture: A presentation layer with View Models, a business logic layer with Domain Models, and a data access layer with ORM objects. A few decades ago, you'd typically see that model illustrated with horizontal layers. This is no longer en vogue. Today, most organizations that I consult with will tell me that they've decided on Ports and Adapters. Even so, if you do it right, <a href="/2013/12/03/layers-onions-ports-adapters-its-all-the-same">it's the same architecture</a>.
</p>
<p>
Reusing a diagram from <a href="/2024/07/08/should-interfaces-be-asynchronous">a recent article</a>, we may draw it like this:
</p>
<p>
<img src="/content/binary/ports-and-adapters-dependency-graph-2.png" alt="Ports and Adapters diagram, with arrows pointing inward.">
</p>
<p>
The architect or senior developer who made that decision is obviously aware of some of the lore in the industry. He or she can often name that data architecture as either Ports and Adapters, <a href="https://alistair.cockburn.us/hexagonal-architecture/">Hexagonal Architecture</a>, <a href="/ref/clean-architecture">Clean Architecture</a>, or, more rarely, <a href="https://jeffreypalermo.com/2008/07/the-onion-architecture-part-1/">Onion Architecture</a>.
</p>
<p>
I still get the impression that this way of arranging code was chosen by default, without much deliberation. I see it so often that it strikes me as a 'default architecture'. Are architects aware of alternatives? Can they compare the benefits and drawbacks of each alternative?
</p>
<h3 id="38ef737999f04f2d8c8d9fe7a44b47be">
Three alternatives <a href="#38ef737999f04f2d8c8d9fe7a44b47be">#</a>
</h3>
<p>
As an example, I'll explore three alternative data architectures, one of them being Ports and Adapters. My goal with this is only to raise awareness. Since I rarely (if ever) see my customers use anything other than Ports and Adapters, I think some readers may benefit from seeing some alternatives.
</p>
<p>
I'll show three ways to organize data with code, but that doesn't imply that these are the only three options. At the very least, some hybrid combinations are also possible. It's also possible that a fourth or fifth alternative exists, and I'm just not aware of it.
</p>
<p>
In three articles, you'll see each data architecture explored in more detail.
</p>
<ul>
<li><a href="/2024/07/29/using-ports-and-adapters-to-persist-restaurant-table-configurations">Using Ports and Adapters to persist restaurant table configurations</a></li>
<li><a href="/2024/08/05/using-a-shared-data-model-to-persist-restaurant-table-configurations">Using a Shared Data Model to persist restaurant table configurations</a></li>
<li><a href="/2024/08/12/using-only-a-domain-model-to-persist-restaurant-table-configurations">Using only a Domain Model to persist restaurant table configurations</a></li>
</ul>
<p>
As the titles suggest, all three examples will attempt to address the same problem: How to persist table configurations for a restaurant. The scenario is the same as already outlined in the article <a href="/2023/12/04/serialization-with-and-without-reflection">Serialization with and without Reflection</a>, and the example code base also attempts to follow the external data format of those articles.
</p>
<h3 id="0b2358ba517444eeb990d1ff72613b82">
Data formats <a href="#0b2358ba517444eeb990d1ff72613b82">#</a>
</h3>
<p>
In JSON, a table may be represented like this:
</p>
<p>
<pre>{
<span style="color:#2e75b6;">"singleTable"</span>: {
<span style="color:#2e75b6;">"capacity"</span>: 16,
<span style="color:#2e75b6;">"minimalReservation"</span>: 10
}
}</pre>
</p>
<p>
Or like this:
</p>
<p>
<pre>{ <span style="color:#2e75b6;">"communalTable"</span>: { <span style="color:#2e75b6;">"capacity"</span>: 10 } }</pre>
</p>
<p>
But I'll also explore what happens if you need to support multiple external formats, such as <a href="https://en.wikipedia.org/wiki/XML">XML</a>. Generally speaking, a given XML specification may lean towards favouring a verbose style based on elements, or a terser style based on attributes. An example of the former could be:
</p>
<p>
<pre><span style="color:blue;"><</span><span style="color:#a31515;">communal-table</span><span style="color:blue;">></span>
<span style="color:blue;"> <</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></span>12<span style="color:blue;"></</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></span>
<span style="color:blue;"></</span><span style="color:#a31515;">communal-table</span><span style="color:blue;">></span></pre>
</p>
<p>
or
</p>
<p>
<pre><span style="color:blue;"><</span><span style="color:#a31515;">single-table</span><span style="color:blue;">></span>
<span style="color:blue;"> <</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></span>4<span style="color:blue;"></</span><span style="color:#a31515;">capacity</span><span style="color:blue;">></span>
<span style="color:blue;"> <</span><span style="color:#a31515;">minimal-reservation</span><span style="color:blue;">></span>3<span style="color:blue;"></</span><span style="color:#a31515;">minimal-reservation</span><span style="color:blue;">></span>
<span style="color:blue;"></</span><span style="color:#a31515;">single-table</span><span style="color:blue;">></span></pre>
</p>
<p>
while examples of the latter style include
</p>
<p>
<pre><span style="color:blue;"><</span><span style="color:#a31515;">communal-table</span><span style="color:blue;"> </span><span style="color:red;">capacity</span><span style="color:blue;">=</span>"<span style="color:blue;">12</span>"<span style="color:blue;"> /></span></pre>
</p>
<p>
and
</p>
<p>
<pre><span style="color:blue;"><</span><span style="color:#a31515;">single-table</span><span style="color:blue;"> </span><span style="color:red;">capacity</span><span style="color:blue;">=</span>"<span style="color:blue;">4</span>"<span style="color:blue;"> </span><span style="color:red;">minimal-reservation</span><span style="color:blue;">=</span>"<span style="color:blue;">3</span>"<span style="color:blue;"> /></span></pre>
</p>
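<p>
To make these formats concrete, here's a minimal sketch of how one set of types might be rendered as both the JSON shape and the attribute-based XML shape shown above. The type and method names (<code>Table</code>, <code>SingleTable</code>, <code>CommunalTable</code>, <code>TableFormats</code>) are assumptions made for illustration; they aren't taken from the example code base, and the three articles each organize this kind of code differently.
</p>

```csharp
using System;
using System.Text.Json.Nodes;
using System.Xml.Linq;

// Hypothetical types, named for illustration only.
public abstract record Table;
public sealed record SingleTable(int Capacity, int MinimalReservation) : Table;
public sealed record CommunalTable(int Capacity) : Table;

public static class TableFormats
{
    // Builds the JSON shape: an outer object keyed by the kind of table.
    public static JsonObject ToJson(Table table) => table switch
    {
        SingleTable s => new JsonObject
        {
            ["singleTable"] = new JsonObject
            {
                ["capacity"] = s.Capacity,
                ["minimalReservation"] = s.MinimalReservation
            }
        },
        CommunalTable c => new JsonObject
        {
            ["communalTable"] = new JsonObject { ["capacity"] = c.Capacity }
        },
        _ => throw new ArgumentException("Unknown kind of table.", nameof(table))
    };

    // Builds the terser XML style, where values are attributes.
    public static XElement ToAttributeXml(Table table) => table switch
    {
        SingleTable s => new XElement("single-table",
            new XAttribute("capacity", s.Capacity),
            new XAttribute("minimal-reservation", s.MinimalReservation)),
        CommunalTable c => new XElement("communal-table",
            new XAttribute("capacity", c.Capacity)),
        _ => throw new ArgumentException("Unknown kind of table.", nameof(table))
    };
}
```

<p>
In this sketch, calling <code>TableFormats.ToJson(new CommunalTable(10))</code> builds a <code>JsonObject</code> with the same structure as the second JSON example. Keeping each format in its own function is just one possible design choice; it means that adding another format doesn't require changing the types themselves.
</p>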
<p>
As it turns out, only one of the three data architectures is flexible enough to fully address such requirements.
</p>
<h3 id="76af298edac94997a28a92b865b2e508">
Comparisons <a href="#76af298edac94997a28a92b865b2e508">#</a>
</h3>
<p>
A <a href="https://en.wikipedia.org/wiki/REST">REST</a> API is the kind of application where data representation flexibility is most likely to be an issue. Thus, that only one of the three alternative architectures is able to exhibit enough expressive power in that dimension doesn't disqualify the other two. Each comes with its own benefits and drawbacks.
</p>
<table>
<thead>
<tr>
<td></td>
<td>Ports and Adapters</td>
<td>Shared Data Model</td>
<td>Domain Model only</td>
</tr>
</thead>
<tbody>
<tr>
<td>Advantages</td>
<td>
<ul>
<li>Separation of concerns</li>
<li>Well-described</li>
</ul>
</td>
<td>
<ul>
<li>Simple</li>
<li>No mapping</li>
</ul>
</td>
<td>
<ul>
<li>Flexible</li>
<li>Congruent with reality</li>
</ul>
</td>
</tr>
<tr>
<td>Disadvantages</td>
<td>
<ul>
<li>Much mapping</li>
<li>Easy to get wrong</li>
</ul>
</td>
<td>
<ul>
<li>Inflexible</li>
<li>God Class attractor</li>
</ul>
</td>
<td>
<ul>
<li>Requires non-opinionated framework</li>
<li>Requires more testing</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>
I'll discuss each alternative's benefits and drawbacks in their individual articles.
</p>
<p>
An important point of all this is that none of these articles are meant to be prescriptive. While I do have favourites, my biases are shaped by the kind of work I typically do. In other contexts, another alternative may prevail.
</p>
<h3 id="3db82a1056ff4f4fbcb9bb2dd9c4643c">
Example code <a href="#3db82a1056ff4f4fbcb9bb2dd9c4643c">#</a>
</h3>
<p>
As usual, example code is in C#. Of the three languages in which I'm most proficient (the other two being <a href="https://fsharp.org/">F#</a> and <a href="https://www.haskell.org/">Haskell</a>), this is the most easily digestible for a larger audience.
</p>
<p>
All three alternatives are written with ASP.NET 8.0, and it's unavoidable that there will be some framework-specific details. In <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a>, I made it an explicit point that while the examples in the book are in C#, the book (and the code in it) should be understandable by developers who normally use <a href="https://www.java.com/">Java</a>, <a href="https://en.wikipedia.org/wiki/C%2B%2B">C++</a>, <a href="https://www.typescriptlang.org/">TypeScript</a>, or similar C-based languages.
</p>
<p>
The book is, for that reason, light on .NET-specific details. Instead, I published <a href="/2021/06/14/new-book-code-that-fits-in-your-head">an article</a> that collects all the interesting .NET things I ran into while writing the book.
</p>
<p>
Not so here. The three articles cover enough ASP.NET particulars that readers who don't care about that framework are encouraged to skim-read.
</p>
<p>
I've developed the three examples as three branches of the same Git repository. The code is available upon request against a small <a href="/support">support donation</a> of 10 USD (or more). If you're one of my regular supporters, you have my gratitude and can get the code without further donation. <a href="/about#contact">Send me an email</a> in both cases.
</p>
<h3 id="83a76525d22a49d898609fc6c1963acf">
Conclusion <a href="#83a76525d22a49d898609fc6c1963acf">#</a>
</h3>
<p>
There's more than one way to organize a code base to deal with data. Depending on context, one may be a better choice than another. Thus, it pays to be aware of alternatives.
</p>
<p>
In the remaining articles in this series, you'll see three examples of how to deal with persistent data from a database. In order to establish a baseline, the first covers the well-known Ports and Adapters architecture.
</p>
<p>
<strong>Next:</strong> <a href="/2024/07/29/using-ports-and-adapters-to-persist-restaurant-table-configurations">Using Ports and Adapters to persist restaurant table configurations</a>.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
The end of trust?
https://blog.ploeh.dk/2024/07/15/the-end-of-trust
2024-07-15T19:07:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Software development in a globalized, hostile world.</em>
</p>
<p>
Imagine that you're perusing the thriller section in an airport book store and come across a book with the following back cover blurb:
</p>
<blockquote>
<p>
Programmers are dying.
</p>
<p>
Holly-Ann Kerr works as a data scientist for an NGO that fights workplace discrimination. While scrubbing input, she discovers an unusual pattern in the data. Some employees seem to have an unusually high fatal accident rate. Programmers are dying in traffic accidents, falling on stairs, defective electrical wiring, smoking in bed. They work for a variety of companies. Some for Big Tech, others for specialized component vendors, some for IT-related NGOs, others again for utility companies. The deaths seem to have nothing in common, until H-A uncovers a disturbing pattern.
</p>
<p>
All victims had recently started in a new position. And all were of Iranian descent.
</p>
<p>
Is a racist killer on the loose? But if so, why is he only targeting new hires? And why only software developers?
</p>
<p>
When H-A shares her discovery with the wrong people, she soon discovers that she'll be the next victim.
</p>
</blockquote>
<p>
Okay, I'm not a professional editor, so this could probably do with a bit of polish. Does it sound like an exciting piece of fiction, though?
</p>
<p>
<img src="/content/binary/the-long-game-cover.jpg" alt="Cover of the imaginary thriller, The Long Game.">
</p>
<p>
I'm going to spoil the plot, since the book doesn't exist anyway.
</p>
<h3 id="269cc12b04c24fadb740f64ef4045625">
An international plot <a href="#269cc12b04c24fadb740f64ef4045625">#</a>
</h3>
<p>
(Apologies to Iranian readers. I have nothing against Iranians, but find the regime despicable. In any case, nothing in the following hinges on the <a href="https://en.wikipedia.org/wiki/Council_for_Intelligence_Coordination">ICC</a>. You can replace it with another adversarial intelligence agency that you don't like, including, but not limited to <a href="https://en.wikipedia.org/wiki/Reconnaissance_General_Bureau">RGB</a>, <a href="https://en.wikipedia.org/wiki/Federal_Security_Service">FSB</a>, or a clandestine Chinese intelligence organization. You could probably even swap the roles and make <a href="https://en.wikipedia.org/wiki/Central_Intelligence_Agency">CIA</a>, <a href="https://en.wikipedia.org/wiki/MI5">MI5</a>, or <a href="https://en.wikipedia.org/wiki/Mossad">Mossad</a> be the bad guys, if your loyalties lie elsewhere.)
</p>
<p>
In the story, it turns out that clandestine Iranian special operations are attempting to recruit <a href="https://en.wikipedia.org/wiki/Mole_(espionage)">moles</a> in software organizations that constitute the supply chain of Western digital infrastructure.
</p>
<p>
Intelligence bureaus and software organizations that directly develop sensitive software tend to have good security measures. Planting a mole in such an organization is difficult. The entire supply chain of software dependencies, on the other hand, is much more vulnerable. If you can get an employee to install a <a href="https://en.wikipedia.org/wiki/Backdoor_(computing)">backdoor</a> in <a href="https://en.wikipedia.org/wiki/Npm_left-pad_incident">left-pad</a>, chances are that you may attain <a href="https://en.wikipedia.org/wiki/Arbitrary_code_execution">remote execution</a> capabilities on an ostensibly secure system.
</p>
<p>
In my hypothetical thriller, the Iranians kill those software developers that they <em>fail</em> to recruit. After all, one can't run a clandestine operation if people notify the police that they've been approached by a foreign power.
</p>
<h3 id="8979c9d3d6484a9b8356b887220a594f">
Long game <a href="#8979c9d3d6484a9b8356b887220a594f">#</a>
</h3>
<p>
Does that plot sound far-fetched?
</p>
<p>
I admit that I did <a href="https://en.wikipedia.org/wiki/Up_to_eleven">turn to 11</a> some plot elements. This is, after all, supposed to be a thriller.
</p>
<p>
The story is, however, 'loosely based on real events'. Earlier this year, <a href="https://arstechnica.com/security/2024/04/what-we-know-about-the-xz-utils-backdoor-that-almost-infected-the-world/">a Microsoft developer revealed a backdoor that someone had intentionally planted in xz Utils</a>. That version of the software was close to being merged into <a href="https://www.debian.org/">Debian</a> and <a href="https://www.redhat.com/">Red Hat</a> Linux distributions. It would have enabled an attacker to execute arbitrary code on an infected machine.
</p>
<p>
The attack was singularly sophisticated. It also looks as though it was initiated years ago by one or more persons who contributed real, useful work to an open-source project, apparently (in hindsight) with the sole intention of gaining the trust of the rest of the community.
</p>
<p>
This is such a long game that it reeks of an adversarial state actor. The linked article speculates on which foreign power may be behind the attack. No, not the Iranians, after all.
</p>
<p>
If you think about it, it's an entirely rational gambit for a foreign intelligence agency to make. It's not that the <a href="https://en.wikipedia.org/wiki/Stuxnet">NSA hasn't already tried something comparable</a>. If anything, the xz hack mostly seems far-fetched because it's so unnecessarily sophisticated.
</p>
<p>
Usually, the most effective hacking techniques utilize human trust or gullibility. Why spend enormous effort developing sophisticated buffer overrun exploits if you can get a (perhaps unwitting) insider to run arbitrary code for you?
</p>
<p>
It'd be much cheaper, and much more reliable, to recruit moles on the inside of software companies, and get them to add the backdoors you need. It doesn't necessarily have to be new hires, but perhaps (I'm speculating) it's easier to recruit people before they've developed any loyalty to their new team mates.
</p>
<h3 id="3a6d30419c8d4e869309502db610dfd6">
The soft underbelly <a href="#3a6d30419c8d4e869309502db610dfd6">#</a>
</h3>
<p>
Which software organizations are the most promising targets? If it were me, I'd particularly try to go after various component vendors. One category may be companies that produce <a href="https://en.wikipedia.org/wiki/Rapid_application_development">RAD</a> tools such as grid <a href="https://en.wikipedia.org/wiki/Graphical_user_interface">GUIs</a>, but also service providers that offer free <a href="https://en.wikipedia.org/wiki/Software_development_kit">SDKs</a> to, say, send email, generate invoices, send SMS, charge credit cards, etc.
</p>
<p>
I'm <em>not</em> implying that any such company has ill intent, but since such software runs on many machines, it's a juicy target if you can sneak a backdoor into one.
</p>
<p>
Why not open-source software (OSS)? Many OSS libraries run on even more machines, so wouldn't that be an even more attractive target for an adversary? Yes, but on the other hand, most popular open-source code is also scrutinized by many independent agents, so it's harder to sneak in a backdoor. As the attempted xz hack demonstrates, even a years-long sophisticated attack is at risk of being discovered.
</p>
<p>
Doesn't commercial or closed-source code receive the same level of scrutiny?
</p>
<p>
In my experience, not always. Of course, some development organizations use proper shared-code-ownership techniques like code reviews or pair programming, but others rely on siloed solo development. Programmers just check in code that no-one else ever looks at.
</p>
<p>
In such an organization, imagine how easy it'd be for a mole to add a backdoor to a widely-distributed library. He or she wouldn't even have to resort to sophisticated ways to obscure the backdoor, because no colleague would be likely to look at the code. Particularly not if you bury it in seven levels of nested <code>for</code> loops and call the class <code>MonitorManager</code> or similar. As long as the reusable library ships as compiled code, it's unlikely that customers will discover the backdoor before it's too late.
</p>
<h3 id="2987e2669c4c46c29a45281d3a6b3adc">
Trust <a href="#2987e2669c4c46c29a45281d3a6b3adc">#</a>
</h3>
<p>
Last year I published an article <a href="/2023/03/20/on-trust-in-software-development">on trust in software development</a>. The point of that piece wasn't that you should suspect your colleagues of ill intent, but rather that you can trust neither yourself nor your co-workers for the simple reason that people make mistakes.
</p>
<p>
Since then, I've been doing some work in the digital security space, and I've been forced to think about concerns like <a href="https://en.wikipedia.org/wiki/Supply_chain_attack">supply-chain attacks</a>. The implications are, unfortunately, that you can't automatically trust that your colleague has benign intentions.
</p>
<p>
This, obviously, will vary with context. If you're only writing a small web site for your HR department to use, it's hard to imagine how an adversarial state actor could take advantage of a backdoor in <em>your</em> code. If so, it's unlikely that anyone will go to the trouble of planting a mole in your organization.
</p>
<p>
On the other hand, if you're writing any kind of reusable library or framework, you just might be an interesting target. If so, you can no longer entirely trust your team mates.
</p>
<p>
As a Dane, that bothers me deeply. Denmark, along with the other Nordic countries, exhibits <a href="https://ourworldindata.org/trust">the highest levels of inter-societal trust in the world</a>. I was raised to trust strangers, and so far, it's worked well enough for me. A business transaction in Denmark is often just a short email exchange. It's a great benefit to the business environment, and the economy in general, that we don't have to waste a lot of resources filling out forms, contracts, agreements, etc. Trust is grease that makes society run smoother.
</p>
<p>
Even so, Scandinavians aren't <em>naive</em>. We don't believe that we can trust everyone. To a large degree, we rely on a lot of subtle social cues to assess a given situation. Some people shouldn't be trusted, and we're able to identify those situations, too.
</p>
<p>
What remains is that insisting that you can trust your colleague, just because he or she is your colleague, would be descending into teleology. I'm not a proponent of wishful thinking if good arguments suggest the contrary.
</p>
<h3 id="295deb8a2c1041678830fcf173f7abf4">
Shared code ownership <a href="#295deb8a2c1041678830fcf173f7abf4">#</a>
</h3>
<p>
Perhaps you shouldn't trust your colleagues. How does that impact software development?
</p>
<p>
The good news is that this is yet another argument for the beneficial practices of shared code ownership. Crucially, what this should entail is not just that everyone is allowed to edit any line of code, but rather that all team members take responsibility for the entire code base. No-one should be allowed to write code in splendid isolation.
</p>
<p>
There are several ways to address this concern. I often phrase it as follows: <em>There should be at least two pairs of eyes on every line of code before a merge to master</em>.
</p>
<p>
As I describe in <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>, you can achieve that goal with pair programming, ensemble programming, or code reviews (including <a href="/2021/06/21/agile-pull-requests">agile pull request</a> reviews). That's a broad enough palette that it should be possible for every developer in every organization to find a modus vivendi that fits any personality and context.
</p>
<p>
Just looking at each other's code could significantly raise the bar for a would-be mole to add a backdoor to the code base. As an added benefit, it might also raise the general code quality.
</p>
<p>
What this <em>does</em> suggest to me, however, is that a too simplistic notion of <em>running on trunk</em> may be dangerous. Letting everyone commit to <em>master</em> and trusting that everyone means well no longer strikes me as a good idea (again, given the context, and all that).
</p>
<p>
Or, if you do, you should consider having some sort of systematic <ins datetime="2024-07-26T08:09Z">posterior</ins> <del datetime="2024-07-26T08:09Z">post mortem</del> review process. I've read of organizations that do that, but specific sources escape me at the moment. With Git, however, it's absolutely within the realms of the possible to make a diff of all changes since the last ex-post review, and then go through those changes.
</p>
<h3 id="be306c291a644cd09762335becd1291e">
Conclusion <a href="#be306c291a644cd09762335becd1291e">#</a>
</h3>
<p>
The world is changed. I feel it in the <a href="https://owasp.org/www-project-top-ten/">OWASP top 10</a>. I sense it in the shifting geopolitical climate. I smell it on the code I review.
</p>
<p>
Much that once was, is lost. The dream of a global computer network with boundless trust is no more. There are countries whose interests no longer align with ours. Who pay full-time salaries to people whose job it is to wage 'cyber warfare' against us. We can't rule out that parts of such campaigns include planting moles in our midst. Moles whose task it is to weaken the foundations of our digital infrastructure.
</p>
<p>
In that light, should you always trust your colleagues?
</p>
<p>
Despite the depressing thought that I probably shouldn't, I'm likely to bounce back to my usual Danish most-people-are-to-be-trusted attitude tomorrow. On the other hand, I'll still insist that more than one person is involved with every line of code. Not only because every other person may be a foreign agent, but mostly, still, because humans are fallible, and two brains think better than one.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="95fab96f981647a9a852c8d960b7f824">
<div class="comment-author"><a href="https://about.me/tysonwilliams">Tyson Williams</a> <a href="#95fab96f981647a9a852c8d960b7f824">#</a></div>
<div class="comment-content">
<blockquote>
Or, if you do, you should consider having some sort of systematic post mortem review process. I've read of organizations that do that, but specific sources escape me at the moment.
</blockquote>
<p>
My company has a Google Docs template for postmortem analysis that we use when something goes especially wrong. The primary focus is stating what went wrong according to the "five whys technique". Our template links to <a href="http://www.startuplessonslearned.com/2008/11/five-whys.html">this post by Eric Ries</a>. There is also <a href="https://en.wikipedia.org/wiki/Five_whys">this Wikipedia article on the subject</a>. The section headings are "What happened" (one sentence), "Impact on Customers" (duration and severity), "What went wrong (5 Whys)", "What went right (optional)", "Corrective Actions" (and all of the content so far should be short enough to fit on one page), "Timeline" (a bulleted list asking for "Event beginning", "Time to Detect (monitoring)", "Time to Notify (alerting)", "Time to Respond (devops)", "Time to Troubleshoot (devops)", "Time to Mitigate (devops)", "Event end"), "Logs (optional)".
</p>
</div>
<div class="comment-date">2024-07-21 15:37 UTC</div>
</div>
<div class="comment" id="0c1f8083882c4de8a11be963869cc098">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#0c1f8083882c4de8a11be963869cc098">#</a></div>
<div class="comment-content">
<p>
Tyson, thank you for writing. I now realize that 'post mortem' was a poor choice of words on my part, since it implies that something went wrong. I should have written 'posterior' instead. I'll update the article.
</p>
<p>
I've been digging around a bit to see if I can find the article that originally made me aware of that option. I'm fairly sure that it wasn't <a href="https://itnext.io/optimizing-the-software-development-process-for-continuous-integration-and-flow-of-work-56cf614b3f59">Optimizing the Software development process for continuous integration and flow of work</a>, but that article, on the other hand, seems to be the source that other articles cite. It's fairly long, and also discusses other concepts; the relevant section here is the one called <em>Non-blocking reviews</em>.
</p>
<p>
A shorter overview of this kind of process can be found in <a href="https://thinkinglabs.io/articles/2023/05/02/non-blocking-continuous-code-reviews-a-case-study.html">Non-Blocking, Continuous Code Reviews - a case study</a>.
</p>
</div>
<div class="comment-date">2024-07-26 08:04 UTC</div>
</div>
<div class="comment" id="b3ae6a8c02584d80aaa69eb00ca39548">
<div class="comment-author">Jiehong <a href="#b3ae6a8c02584d80aaa69eb00ca39548">#</a></div>
<div class="comment-content">
<p>
In change management/risk control, your <em>There should be at least two pair of eyes on every line of code</em> is called the <a href="https://www.openriskmanual.org/wiki/Four_Eyes_Principle">four-eyes principle</a>,
and is a standard practice in my industry (IT services provider for the travel industry).
</p>
<p>
It goes further, and requires two more pairs of eyes for any change, from the code review to the loading of specific software in production.
</p>
<p>
It has a nice side-effect during code reviews: it's an automatic way to disseminate knowledge in the team, so the bus factor is never 1.
</p>
<p>
I think that <em>real</em> people can <em>mostly</em> be trusted. But, software is not always run by people.
Even when it is, a single non-trust-worthy person's action is amplified by software being run by mindless computers.
It's like one rotten apple is enough to poison the full bag.
</p>
<p>
In the end, and a bit counter-intuitively, trusting people less now is leading to being able to trust more soon:
people are forced to say "you can trust me, and here are the proofs". (Eg: the recently announced Apple's <a href="https://security.apple.com/blog/private-cloud-compute/">Private Cloud Compute</a>).
</p>
</div>
<div class="comment-date">2024-07-29 14:29 UTC</div>
</div>
<div class="comment" id="4655bc2aa6664d5ca10dfb069102bbfd">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#4655bc2aa6664d5ca10dfb069102bbfd">#</a></div>
<div class="comment-content">
<p>
Jiehong, thank you for writing. Indeed, in <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a> I discuss how shared code ownership reduces the bus factor.
</p>
<p>
From this article and previous discussions I've had, I can see that the term <em>trust</em> is highly charged. People really don't like the notion that trust may be misplaced, or that mistrust, even, might be appropriate. I can't tell if it's a cultural bias of which I'm not aware. While English isn't my native language, I believe that I'm sufficiently acquainted with Anglo-Saxon culture to know of its most obvious quirks. Still, I'm sometimes surprised.
</p>
<p>
I admit that I, too, <em>first</em> consider whether I'm dealing with a deliberate adversary if I'm asked whether I trust someone, but soon after, there's a secondary interpretation that originates from normal human fallibility. I've <a href="/2023/03/20/on-trust-in-software-development">already written about that</a>: No, I don't trust my colleagues to be infallible, as I don't trust myself to be so.
</p>
<p>
Fortunately, it seems to me that the remedies that may address such concerns are the same, regardless of the underlying reasons.
</p>
</div>
<div class="comment-date">2024-08-06 05:57 UTC</div>
</div>
</div>
<hr>
Should interfaces be asynchronous?
https://blog.ploeh.dk/2024/07/08/should-interfaces-be-asynchronous
2024-07-08T13:52:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Async and await are notorious for being contagious. Must all interfaces be Task-based, just in case?</em>
</p>
<p>
I recently came across this question on Mastodon:
</p>
<blockquote>
<p>
"To async or not to async?
</p>
<p>
"How would you define a library interface for a service that probably will be implemented with an in memory procedure - let's say returning a mapped value to a key you registered programmatically - and a user of your API might want to implement a decorator that needs a 'long running task' - for example you want to log a msg into your db or load additional mapping from a file?
</p>
<p>
"Would you define the interface to return a Task<string> or just a string?"
</p>
<footer><cite><a href="https://fosstodon.org/@Fandermill/112613967801632197">Fandermill</a></cite></footer>
</blockquote>
<p>
While seemingly a simple question, it's both fundamental and turns out to have deeper implications than you may at first realize.
</p>
<h3 id="e4c03ad0436340b4b510d51e14acd794">
Interpretation <a href="#e4c03ad0436340b4b510d51e14acd794">#</a>
</h3>
<p>
Before I proceed, I'll make my interpretation of the question more concrete. This is just how I <em>interpret</em> the question, so it doesn't necessarily reflect the original poster's views.
</p>
<p>
The post itself doesn't explicitly mention a particular language, and since several languages now have <code>async</code> and <code>await</code> features, the question may be of more general interest than a question constrained to a single language. On the other hand, in order to have something concrete to discuss, it'll be useful to have some real code examples. From perusing the discussion surrounding the original post, I get the impression that the language in question may be C#. That suits me well, since it's one of the languages with which I'm most familiar, and is also a language where programmers of other C-based languages should still be able to follow along.
</p>
<p>
My interpretation of the implementation, then, is this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">NameMap</span>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">Dictionary</span><<span style="color:#2b91af;">Guid</span>, <span style="color:blue;">string</span>> knownIds = <span style="color:blue;">new</span>()
{
{ <span style="color:blue;">new</span> <span style="color:#2b91af;">Guid</span>(<span style="color:#a31515;">"4778CA3D-FB1B-4665-AAC1-6649CEFA4F05"</span>), <span style="color:#a31515;">"Bob"</span> },
{ <span style="color:blue;">new</span> <span style="color:#2b91af;">Guid</span>(<span style="color:#a31515;">"8D3B9093-7D43-4DD2-B317-DCEE4C72D845"</span>), <span style="color:#a31515;">"Alice"</span> }
};
<span style="color:blue;">public</span> <span style="color:blue;">string</span> <span style="font-weight:bold;color:#74531f;">GetName</span>(<span style="color:#2b91af;">Guid</span> <span style="font-weight:bold;color:#1f377f;">guid</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> knownIds.<span style="font-weight:bold;color:#74531f;">TryGetValue</span>(<span style="font-weight:bold;color:#1f377f;">guid</span>, <span style="color:blue;">out</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">name</span>) ? <span style="font-weight:bold;color:#1f377f;">name</span> : <span style="color:#a31515;">"Trudy"</span>;
}
}</pre>
</p>
<p>
Nothing fancy, but, as <a href="https://fosstodon.org/@Fandermill">Fandermill</a> writes in a follow-up post:
</p>
<blockquote>
<p>
"Used examples that first came into mind, but it could be anything really."
</p>
<footer><cite><a href="https://fosstodon.org/@Fandermill/112613968890232099">Fandermill</a></cite></footer>
</blockquote>
<p>
The point, as I understand it, is that the intended implementation doesn't require asynchrony. A <a href="https://en.wikipedia.org/wiki/Decorator_pattern">Decorator</a>, on the other hand, may.
</p>
<p>
Should we, then, declare an interface like the following?
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">INameMap</span>
{
<span style="color:#2b91af;">Task</span><<span style="color:blue;">string</span>> <span style="font-weight:bold;color:#74531f;">GetName</span>(<span style="color:#2b91af;">Guid</span> <span style="font-weight:bold;color:#1f377f;">guid</span>);
}</pre>
</p>
<p>
If we do, the <code>NameMap</code> class can't automatically implement that interface because the return types of the two <code>GetName</code> methods don't match. What are the options?
</p>
<h3 id="9da822948cd049c0a625bb9a2c013d7b">
Conform <a href="#9da822948cd049c0a625bb9a2c013d7b">#</a>
</h3>
<p>
While the following may not be the 'best' answer, let's get the obvious solution out of the way first. Let the implementation conform to the interface:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">NameMap</span> : <span style="color:#2b91af;">INameMap</span>
{
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">Dictionary</span><<span style="color:#2b91af;">Guid</span>, <span style="color:blue;">string</span>> knownIds = <span style="color:blue;">new</span>()
    {
        { <span style="color:blue;">new</span> <span style="color:#2b91af;">Guid</span>(<span style="color:#a31515;">"4778CA3D-FB1B-4665-AAC1-6649CEFA4F05"</span>), <span style="color:#a31515;">"Bob"</span> },
        { <span style="color:blue;">new</span> <span style="color:#2b91af;">Guid</span>(<span style="color:#a31515;">"8D3B9093-7D43-4DD2-B317-DCEE4C72D845"</span>), <span style="color:#a31515;">"Alice"</span> }
    };

    <span style="color:blue;">public</span> <span style="color:#2b91af;">Task</span><<span style="color:blue;">string</span>> <span style="font-weight:bold;color:#74531f;">GetName</span>(<span style="color:#2b91af;">Guid</span> <span style="font-weight:bold;color:#1f377f;">guid</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">FromResult</span>(
            knownIds.<span style="font-weight:bold;color:#74531f;">TryGetValue</span>(<span style="font-weight:bold;color:#1f377f;">guid</span>, <span style="color:blue;">out</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">name</span>) ? <span style="font-weight:bold;color:#1f377f;">name</span> : <span style="color:#a31515;">"Trudy"</span>);
    }
}</pre>
</p>
<p>
This variation of the <code>NameMap</code> class conforms to the interface by making the <code>GetName</code> method look asynchronous.
</p>
<p>
We may even keep the synchronous implementation around as a public method if some client code might need it:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">NameMap</span> : <span style="color:#2b91af;">INameMap</span>
{
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">Dictionary</span><<span style="color:#2b91af;">Guid</span>, <span style="color:blue;">string</span>> knownIds = <span style="color:blue;">new</span>()
    {
        { <span style="color:blue;">new</span> <span style="color:#2b91af;">Guid</span>(<span style="color:#a31515;">"4778CA3D-FB1B-4665-AAC1-6649CEFA4F05"</span>), <span style="color:#a31515;">"Bob"</span> },
        { <span style="color:blue;">new</span> <span style="color:#2b91af;">Guid</span>(<span style="color:#a31515;">"8D3B9093-7D43-4DD2-B317-DCEE4C72D845"</span>), <span style="color:#a31515;">"Alice"</span> }
    };

    <span style="color:blue;">public</span> <span style="color:#2b91af;">Task</span><<span style="color:blue;">string</span>> <span style="font-weight:bold;color:#74531f;">GetName</span>(<span style="color:#2b91af;">Guid</span> <span style="font-weight:bold;color:#1f377f;">guid</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">FromResult</span>(<span style="font-weight:bold;color:#74531f;">GetNameSync</span>(<span style="font-weight:bold;color:#1f377f;">guid</span>));
    }

    <span style="color:blue;">public</span> <span style="color:blue;">string</span> <span style="font-weight:bold;color:#74531f;">GetNameSync</span>(<span style="color:#2b91af;">Guid</span> <span style="font-weight:bold;color:#1f377f;">guid</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> knownIds.<span style="font-weight:bold;color:#74531f;">TryGetValue</span>(<span style="font-weight:bold;color:#1f377f;">guid</span>, <span style="color:blue;">out</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">name</span>) ? <span style="font-weight:bold;color:#1f377f;">name</span> : <span style="color:#a31515;">"Trudy"</span>;
    }
}</pre>
</p>
<p>
Since C# doesn't support return-type-based overloading, we need to distinguish these two methods by giving them different names. In C# it might be more <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> to name the asynchronous method <code>GetNameAsync</code> and the synchronous method just <code>GetName</code>, but for reasons that would be too much of a digression now, I've never much liked that naming convention. In any case, I'm not going to go in this direction for much longer, so it hardly matters how we name these two methods.
</p>
<h3 id="821f1bb35dac41eeade7c5d7b28a3b18">
Kinds of interfaces <a href="#821f1bb35dac41eeade7c5d7b28a3b18">#</a>
</h3>
<p>
Another digression is, however, quite important. Before we can look at some more code, I'm afraid that we have to perform a bit of practical ontology, as it were. It starts with the question: <em>Why do we even need interfaces?</em>
</p>
<p>
I should also make clear, as a digression within a digression, that by 'interface' in this context, I'm really interested in any kind of mechanism that enables you to achieve polymorphism. In languages like C# or <a href="https://www.java.com/">Java</a>, we may in fact avail ourselves of the <code>interface</code> keyword, as in the above <code>INameMap</code> example, but we may equally well use a base class or perhaps just what C# calls a <a href="https://learn.microsoft.com/dotnet/csharp/programming-guide/delegates/">delegate</a>. In other languages, we may use function or action types, or even <a href="https://en.wikipedia.org/wiki/Function_pointer">function pointers</a>.
</p>
<p>
Regardless of specific language constructs, there are, as far as I can tell, two kinds of interfaces:
</p>
<ul>
<li>Interfaces that enable variability or extensibility in behaviour.</li>
<li>Interfaces that mostly or exclusively exist to support automated testing.</li>
</ul>
<p>
While there may be some overlap between these two kinds, in my experience, the intersection between the two tends to be surprisingly small. Interfaces tend to mostly belong to one of those two categories.
</p>
<h3 id="d990222c075942ab851c1455c8efcb95">
Strategies and higher-order functions <a href="#d990222c075942ab851c1455c8efcb95">#</a>
</h3>
<p>
In design-patterns parlance, examples of the first kind are <a href="https://en.wikipedia.org/wiki/Builder_pattern">Builder</a>, <a href="https://en.wikipedia.org/wiki/State_pattern">State</a>, <a href="https://en.wikipedia.org/wiki/Chain-of-responsibility_pattern">Chain of Responsibility</a>, <a href="https://en.wikipedia.org/wiki/Template_method_pattern">Template Method</a>, and, perhaps most starkly, the <a href="https://en.wikipedia.org/wiki/Strategy_pattern">Strategy</a> pattern. A Strategy is an encapsulated piece of behaviour that you pass around as a single 'thing' (an <em>object</em>).
</p>
<p>
And granted, you could also use a Strategy to access a database or make a web-service call, but that's not how the pattern was <a href="/ref/dp">originally described</a>. We'll return to that use case in the next section.
</p>
<p>
Rather, the first kind of interface exists to enable extensibility or variability in algorithms. Typical examples (from Design Patterns) include page layout, user interface component rendering, building a maze, finding the most appropriate help topic for a given application context, and so on. If we wish to relate this kind of interface to the <a href="https://en.wikipedia.org/wiki/SOLID">SOLID</a> principles, it mostly exists to support the <a href="https://en.wikipedia.org/wiki/Open%E2%80%93closed_principle">Open-closed principle</a>.
</p>
<p>
A good heuristic for identifying such interfaces is to consider the Reused Abstractions Principle (Jason Gorman, 2010; I'd link to it, but the page has fallen off the internet, so use your favourite web archive to read it). If your code base contains <em>multiple</em> production-ready implementations of the same interface, you're reusing the interface, most likely to vary the behaviour of a general-purpose data structure.
</p>
<p>
And before the functional-programming (FP) crowd becomes too smug: FP uses this kind of interface <em>all the time</em>. In the FP jargon, however, we rather talk about <a href="https://en.wikipedia.org/wiki/Higher-order_function">higher-order functions</a> and the interfaces we use to modify behaviour are typically modelled as functions and passed as <a href="https://en.wikipedia.org/wiki/Anonymous_function">lambda expressions</a>. So when you write <code>Cata((_, xs) => xs.Sum(), _ => 1)</code> (<a href="/2019/08/05/rose-tree-catamorphism">as one does</a>), you <a href="/2018/06/25/visitor-as-a-sum-type">might as well</a> just have passed a <a href="https://en.wikipedia.org/wiki/Visitor_pattern">Visitor</a> implementation to an <code>Accept</code> method.
</p>
<p>
This hints at a more quantifiable distinction: If the interface models something that's intended to be a <a href="https://en.wikipedia.org/wiki/Pure_function">pure function</a>, it'd typically be part of a higher-order API in FP, while we in object-oriented design (once again) lack the terminology to distinguish these interfaces from the other kind.
</p>
<p>
These days, in C# <a href="/2023/09/04/decomposing-ctfiyhs-sample-code-base">I mostly use these kinds of interfaces for the Visitor pattern</a>.
</p>
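<p>
As a minimal sketch of how an object-oriented Strategy and a higher-order function can express the same variability, consider the two <code>Sort</code> overloads on .NET's generic list. One takes an <code><span style="color:#2b91af;">IComparer</span><<span style="color:blue;">string</span>></code> object, the other a function. The <code>LengthComparer</code> class is a hypothetical example introduced only for illustration; it's not taken from any of the linked articles.
</p>
<p>
<pre><span style="color:blue;">using</span> System.Collections.Generic;

<span style="color:green;">// Hypothetical Strategy object: compares strings by length.</span>
<span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">LengthComparer</span> : <span style="color:#2b91af;">IComparer</span><<span style="color:blue;">string</span>>
{
    <span style="color:blue;">public</span> <span style="color:blue;">int</span> <span style="font-weight:bold;color:#74531f;">Compare</span>(<span style="color:blue;">string</span>? <span style="font-weight:bold;color:#1f377f;">x</span>, <span style="color:blue;">string</span>? <span style="font-weight:bold;color:#1f377f;">y</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> (<span style="font-weight:bold;color:#1f377f;">x</span>?.Length ?? 0).<span style="font-weight:bold;color:#74531f;">CompareTo</span>(<span style="font-weight:bold;color:#1f377f;">y</span>?.Length ?? 0);
    }
}

<span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Program</span>
{
    <span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">void</span> <span style="color:#74531f;">Main</span>()
    {
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">names</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">List</span><<span style="color:blue;">string</span>> { <span style="color:#a31515;">"Bob"</span>, <span style="color:#a31515;">"Alice"</span>, <span style="color:#a31515;">"Trudy"</span> };

        <span style="color:green;">// Object-oriented style: pass the behaviour as an object.</span>
        <span style="font-weight:bold;color:#1f377f;">names</span>.<span style="font-weight:bold;color:#74531f;">Sort</span>(<span style="color:blue;">new</span> <span style="color:#2b91af;">LengthComparer</span>());

        <span style="color:green;">// Functional style: pass the same behaviour as a lambda expression.</span>
        <span style="font-weight:bold;color:#1f377f;">names</span>.<span style="font-weight:bold;color:#74531f;">Sort</span>((<span style="font-weight:bold;color:#1f377f;">x</span>, <span style="font-weight:bold;color:#1f377f;">y</span>) => <span style="font-weight:bold;color:#1f377f;">x</span>.Length.<span style="font-weight:bold;color:#74531f;">CompareTo</span>(<span style="font-weight:bold;color:#1f377f;">y</span>.Length));
    }
}</pre>
</p>
<p>
In both calls the sorting algorithm stays the same, and only the comparison varies. Neither the comparer object nor the lambda expression exists to support testing.
</p>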
<h3 id="36e2464953344fa2a0b173f98bb260c9">
Seams <a href="#36e2464953344fa2a0b173f98bb260c9">#</a>
</h3>
<p>
The other kind of interface exists to afford automated testing. In <a href="/ref/wewlc">Working Effectively with Legacy Code</a>, Michael Feathers calls such interfaces <em>Seams</em>. Modern object-oriented code bases often use <a href="/dippp">Dependency Injection</a> (DI) to control which Strategies are in use in a given context. The production system may use an object that communicates with a relational database, while an automated test environment might replace that with a <a href="https://martinfowler.com/bliki/TestDouble.html">Test Double</a>.
</p>
<p>
Yes, I wrote <em>Strategies</em>. As I suggested above, a Strategy is really a replaceable object in its purest form. When you use DI you may call all those interfaces <code>IUserRepository</code>, <code>ICommandHandler</code>, <code>IEmailGateway</code>, and so on, but they're really all Strategies.
</p>
<p>
Contrary to the first kind of interface, you typically only find a single production implementation of each of these interfaces. If you find more than one, the rest are usually <a href="https://en.wikipedia.org/wiki/Decorator_pattern">Decorators</a> (one that logs, one that caches, one that works as a <a href="https://martinfowler.com/bliki/CircuitBreaker.html">Circuit Breaker</a>, etc.). All other implementations will be defined in the test code as dynamic mocks or <a href="http://xunitpatterns.com/Fake%20Object.html">Fakes</a>.
</p>
<p>
Code bases that rely heavily on DI in order to support testing rub many people the wrong way. In 2014 <a href="https://en.wikipedia.org/wiki/David_Heinemeier_Hansson">David Heinemeier Hansson</a> published a serious criticism of such <a href="https://dhh.dk/2014/test-induced-design-damage.html">test-induced damage</a>. For the record, I agree with the criticism, but <a href="/2020/08/17/unit-testing-is-fine">not with the conclusion</a>. While I still practice test-driven development, I <a href="https://stackoverflow.blog/2022/01/03/favor-real-dependencies-for-unit-testing/">only define interfaces for true architectural dependencies</a>. So, yes, my code bases may have an <code>IReservationsRepository</code> or <code>IEmailGateway</code>, but no <code>ICommandHandler</code> or <code>IUserManager</code>.
</p>
<p>
The bottom line, though, is that some interfaces exist to support testing. If there's a better way to make inherently non-deterministic systems behave deterministically in a test context, I've yet to discover it.
</p>
<p>
(As an aside, it's worth looking into tests that adopt non-deterministic behaviour as a driving principle, or at least an unavoidable state of affairs. Property-based testing is one such approach, but I also found the article <a href="https://arialdomartini.github.io/when-im-done-i-dont-clean-up">When I'm done, I don't clean up</a> by <a href="https://arialdomartini.github.io/">Arialdo Martini</a> interesting. You may also want to refer to my article <a href="/2021/01/11/waiting-to-happen">Waiting to happen</a> for a discussion of how to make tests independent of system time.)
</p>
<h3 id="f8bd1ea29b2a4ddbaf6538c238ec6979">
Where to define interfaces <a href="#f8bd1ea29b2a4ddbaf6538c238ec6979">#</a>
</h3>
<p>
The reason the above distinction is important is that it fundamentally determines where interfaces should be defined. In short, the first kind of interface is part of an object model's API, and should be defined together with that API. The second kind, on the other hand, is part of a particular application's architecture, and should be defined by the client code that talks to the interface.
</p>
<p>
As an example of the first kind, consider <a href="/2024/06/24/a-mutable-priority-collection">this recent example</a>, where the <code><span style="color:#2b91af;">IPriorityEditor</span><<span style="color:#2b91af;">T</span>></code> interface is part of the <code><span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">T</span>></code> API. You <em>must</em> ship the interface together with the class, because the <code>Edit</code> method takes an interface implementation as an argument. It's how client code interacts with the API.
</p>
<p>
Another example is <a href="/2023/12/25/serializing-restaurant-tables-in-c">this Table class</a> that comes with an <code>ITableVisitor<T></code> interface. In both cases, we'd expect interface implementations to be deterministic. These interfaces don't exist to support automated testing, but rather to afford a flexible programming model.
</p>
<p>
For the sake of argument, imagine that you package such APIs in reusable libraries that you publish via a package manager. In that case, it's obvious that the interface is as much part of the package as the class.
</p>
<p>
Contrast this with the other kind of interface, as described in the article <a href="/2023/09/04/decomposing-ctfiyhs-sample-code-base">Decomposing CTFiYH's sample code base</a> or showcased in the article <a href="/2019/04/01/an-example-of-state-based-testing-in-c">An example of state-based testing in C#</a>. In the latter example, the interfaces <code>IUserReader</code> and <code>IUserRepository</code> are <em>not</em> part of any pre-packaged library. Rather, they are defined by the application code to support application-specific needs.
</p>
<p>
This may be even more evident if you contemplate the diagram in <a href="/2023/09/04/decomposing-ctfiyhs-sample-code-base">Decomposing CTFiYH's sample code base</a>. Interfaces like <code>IPostOffice</code> and <code>IReservationsRepository</code> only exist to support the application. According to the <a href="https://en.wikipedia.org/wiki/Dependency_inversion_principle">Dependency Inversion Principle</a>,
</p>
<blockquote>
<p>
"clients [...] own the abstract interfaces"
</p>
<footer><cite>Robert C. Martin, <a href="/ref/appp">APPP</a>, chapter 11</cite></footer>
</blockquote>
<p>
In these code bases, only the Controllers (or rather the tests that exercise them) need these interfaces, so the Controllers get to define them.
</p>
<h3 id="8674154df1bf4c22b59f1fd3baf5996e">
Should it be asynchronous, then? <a href="#8674154df1bf4c22b59f1fd3baf5996e">#</a>
</h3>
<p>
Okay, so should <code>INameMap.GetName</code> return <code><span style="color:blue;">string</span></code> or <code><span style="color:#2b91af;">Task</span><<span style="color:blue;">string</span>></code>, then?
</p>
<p>
Hopefully, at this point, it should be clear that the answer depends on what kind of interface it is.
</p>
<p>
If it's the first kind, the return type should support the requirements of the API. If the object model doesn't need the return type to be asynchronous, it shouldn't be.
</p>
<p>
If it's the second kind of interface, the application code decides what <em>it</em> needs, and defines the interface accordingly.
</p>
<p>
In neither case, however, is it the concrete class' responsibility to second-guess what client code might need.
</p>
<p>
<em>But client code may need the method to be asynchronous. What's the harm of returning <code><span style="color:#2b91af;">Task</span><<span style="color:blue;">string</span>></code>, just in case?</em>
</p>
<p>
The problem, as you may well be aware, is that <a href="https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/">the asynchronous programming model is contagious</a>. Once you've made an API asynchronous, you can't easily make it synchronous, whereas if you have a synchronous API, you can easily make it asynchronous. This follows from <a href="https://en.wikipedia.org/wiki/Postel%27s_law">Postel's law</a>, in this case: Be conservative with what you send.
</p>
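<p>
To illustrate the asymmetry, here's a hedged sketch that converts in both directions. The <code>Conversions</code> class and its two methods are hypothetical names introduced for this example. Going from synchronous to asynchronous only takes <code>Task.FromResult</code>, while the opposite direction has to block the calling thread until the task completes.
</p>
<p>
<pre><span style="color:blue;">using</span> System;
<span style="color:blue;">using</span> System.Threading.Tasks;

<span style="color:green;">// Hypothetical helper class, for illustration only.</span>
<span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Conversions</span>
{
    <span style="color:green;">// Synchronous to asynchronous: always possible, and trivial.</span>
    <span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">Guid</span>, <span style="color:#2b91af;">Task</span><<span style="color:blue;">string</span>>> <span style="color:#74531f;">ToAsync</span>(
        <span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">Guid</span>, <span style="color:blue;">string</span>> <span style="font-weight:bold;color:#1f377f;">getName</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">guid</span> => <span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">FromResult</span>(<span style="font-weight:bold;color:#1f377f;">getName</span>(<span style="font-weight:bold;color:#1f377f;">guid</span>));
    }

    <span style="color:green;">// Asynchronous to synchronous: only possible by blocking the calling</span>
    <span style="color:green;">// thread, which may deadlock if a synchronization context is present.</span>
    <span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">Guid</span>, <span style="color:blue;">string</span>> <span style="color:#74531f;">ToSync</span>(
        <span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">Guid</span>, <span style="color:#2b91af;">Task</span><<span style="color:blue;">string</span>>> <span style="font-weight:bold;color:#1f377f;">getName</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">guid</span> => <span style="font-weight:bold;color:#1f377f;">getName</span>(<span style="font-weight:bold;color:#1f377f;">guid</span>).<span style="font-weight:bold;color:#74531f;">GetAwaiter</span>().<span style="font-weight:bold;color:#74531f;">GetResult</span>();
    }
}</pre>
</p>
<p>
The first conversion loses nothing. The second compiles, but it gives up the benefits of asynchrony, which is why a synchronous API is the more conservative thing to publish.
</p>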
<h3 id="17aebf480ad4468cac7eb1045dfa6041">
Library API <a href="#17aebf480ad4468cac7eb1045dfa6041">#</a>
</h3>
<p>
Imagine, for the sake of argument, that the <code>NameMap</code> class is defined in a reusable library, wrapped in a package and imported into your code base via a package manager (NuGet, Maven, pip, NPM, Hackage, RubyGems, etc.).
</p>
<p>
Clearly it shouldn't implement any interface in order to 'support unit testing', since such interfaces should be defined by application code.
</p>
<p>
It <em>could</em> implement one or more 'extensibility' interfaces, if such interfaces are part of the wider API offered by the library. In the case of the <code>NameMap</code> class, we don't really know if that's the case. To complete this part of the argument, then, I'd just leave it as shown in the first code example above. It doesn't need to implement any interface, and <code>GetName</code> can just return <code>string</code>.
</p>
<h3 id="66d0c20621ef43e5808f589a30314d6f">
Domain Model <a href="#66d0c20621ef43e5808f589a30314d6f">#</a>
</h3>
<p>
What if, instead of an external library, the <code>NameMap</code> class is part of an application's Domain Model?
</p>
<p>
In that case, you <em>could</em> define application-level interfaces as part of the Domain Model. In fact, most people do. Even so, I'd recommend that you don't, at least if you're aiming for a <a href="https://www.destroyallsoftware.com/screencasts/catalog/functional-core-imperative-shell">Functional Core, Imperative Shell</a> architecture, a <a href="/2018/11/19/functional-architecture-a-definition">functional architecture</a>, or even a <a href="/2013/12/03/layers-onions-ports-adapters-its-all-the-same">Ports and Adapters</a> or, if you will, <a href="/ref/clean-architecture">Clean Architecture</a>. The interfaces that exist only to support testing are application concerns, so keep them out of the Domain Model and instead define them in the Application Model.
</p>
<p>
<img src="/content/binary/ports-and-adapters-dependency-graph-2.png" alt="Ports and Adapters diagram, with arrows pointing inward.">
</p>
<p>
You don't have to follow my advice. If you want to define interfaces in the Domain Model, I can't stop you. But what if, as I recommend, you define application-specific interfaces in the Application Model? If you do that, your <code>NameMap</code> Domain Model can't implement your <code>INameMap</code> interface, because the dependencies point the other way, and most languages will not allow circular dependencies.
</p>
<p>
In that case, what do you do if, as the original toot suggested, you need to Decorate the <code>GetName</code> method with some asynchronous behaviour?
</p>
<p>
You can always introduce an <a href="https://en.wikipedia.org/wiki/Adapter_pattern">Adapter</a>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">NameMapAdapter</span> : <span style="color:#2b91af;">INameMap</span>
{
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">NameMap</span> imp;

    <span style="color:blue;">public</span> <span style="color:#2b91af;">NameMapAdapter</span>(<span style="color:#2b91af;">NameMap</span> <span style="font-weight:bold;color:#1f377f;">imp</span>)
    {
        <span style="color:blue;">this</span>.imp = <span style="font-weight:bold;color:#1f377f;">imp</span>;
    }

    <span style="color:blue;">public</span> <span style="color:#2b91af;">Task</span><<span style="color:blue;">string</span>> <span style="font-weight:bold;color:#74531f;">GetName</span>(<span style="color:#2b91af;">Guid</span> <span style="font-weight:bold;color:#1f377f;">guid</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">FromResult</span>(imp.<span style="font-weight:bold;color:#74531f;">GetName</span>(<span style="font-weight:bold;color:#1f377f;">guid</span>));
    }
}</pre>
</p>
<p>
Now any <code>NameMap</code> object can look like an <code>INameMap</code>. This is exactly the kind of problem that the Adapter pattern addresses.
</p>
<p>
<em>But,</em> you say, <em>that's too much trouble! I don't want to have to maintain two classes that are almost identical.</em>
</p>
<p>
I understand the concern, and it may even be appropriate. Maybe you're right. As usual, I don't really intend this article to be prescriptive. Rather, I'm offering ideas for your consideration, and you can choose to adopt them or ignore them as it best fits your context.
</p>
<p>
When it comes to whether or not an Adapter is an unwarranted extra complication, I'll return to that topic later in this article.
</p>
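<p>
With an Adapter in place, a Decorator with genuinely asynchronous behaviour can wrap it. The following is only a sketch under stated assumptions: the <code>FileNameMap</code> class, its constructor parameters, and the <code>guid,name</code> file format are all hypothetical, and the code relies on the <code>INameMap</code> interface declared earlier in this article.
</p>
<p>
<pre><span style="color:blue;">using</span> System;
<span style="color:blue;">using</span> System.IO;
<span style="color:blue;">using</span> System.Threading.Tasks;

<span style="color:green;">// Hypothetical Decorator. Assumes the INameMap interface shown earlier,</span>
<span style="color:green;">// and a text file where each line has the form "guid,name".</span>
<span style="color:green;">// For simplicity, it re-reads the file on every call.</span>
<span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">FileNameMap</span> : <span style="color:#2b91af;">INameMap</span>
{
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">string</span> path;
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">INameMap</span> inner;

    <span style="color:blue;">public</span> <span style="color:#2b91af;">FileNameMap</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">path</span>, <span style="color:#2b91af;">INameMap</span> <span style="font-weight:bold;color:#1f377f;">inner</span>)
    {
        <span style="color:blue;">this</span>.path = <span style="font-weight:bold;color:#1f377f;">path</span>;
        <span style="color:blue;">this</span>.inner = <span style="font-weight:bold;color:#1f377f;">inner</span>;
    }

    <span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:blue;">string</span>> <span style="font-weight:bold;color:#74531f;">GetName</span>(<span style="color:#2b91af;">Guid</span> <span style="font-weight:bold;color:#1f377f;">guid</span>)
    {
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">lines</span> = <span style="font-weight:bold;color:#8f08c4;">await</span> <span style="color:#2b91af;">File</span>.<span style="color:#74531f;">ReadAllLinesAsync</span>(path);
        <span style="font-weight:bold;color:#8f08c4;">foreach</span> (<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">line</span> <span style="font-weight:bold;color:#8f08c4;">in</span> <span style="font-weight:bold;color:#1f377f;">lines</span>)
        {
            <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">parts</span> = <span style="font-weight:bold;color:#1f377f;">line</span>.<span style="font-weight:bold;color:#74531f;">Split</span>(<span style="color:#a31515;">','</span>);
            <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">parts</span>.Length != 2 || !<span style="color:#2b91af;">Guid</span>.<span style="color:#74531f;">TryParse</span>(<span style="font-weight:bold;color:#1f377f;">parts</span>[0], <span style="color:blue;">out</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">id</span>))
                <span style="font-weight:bold;color:#8f08c4;">continue</span>;
            <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">id</span> == <span style="font-weight:bold;color:#1f377f;">guid</span>)
                <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">parts</span>[1];
        }

        <span style="color:green;">// Not found in the file, so ask the decorated object.</span>
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#8f08c4;">await</span> inner.<span style="font-weight:bold;color:#74531f;">GetName</span>(<span style="font-weight:bold;color:#1f377f;">guid</span>);
    }
}</pre>
</p>
<p>
You might then compose the objects as <code>new FileNameMap(path, new NameMapAdapter(new NameMap()))</code>. The synchronous <code>NameMap</code> class stays unaware of both the interface and the file.
</p>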
<h3 id="f51b8986514046e989556923bc19a063">
Application Model <a href="#f51b8986514046e989556923bc19a063">#</a>
</h3>
<p>
The final architectural option is when the concrete <code>NameMap</code> class is part of the Application Model, where you'd also define the application-specific <code>INameMap</code> interface. In that case, we must assume that the <code>NameMap</code> class implements some application-specific concern. If you want it to implement an interface so that you can wrap it in a Decorator, then do that. This means that the <code>GetName</code> method must conform to the interface, and if that means that it must be asynchronous, then so be it.
</p>
<p>
As <a href="https://en.wikipedia.org/wiki/Kent_Beck">Kent Beck</a> wrote in a Facebook article that used to be accessible without a Facebook account (but isn't any longer):
</p>
<blockquote>
<p>
"Things that change at the same rate belong together. Things that change at different rates belong apart."
</p>
<footer><cite><a href="https://www.facebook.com/notes/kent-beck/naming-from-the-outside-in/464270190272517">Naming From the Outside In</a>, Kent Beck, Facebook, 2012</cite></footer>
</blockquote>
<p>
If the concrete <code>NameMap</code> class and the <code>INameMap</code> interface are both part of the application model, it's not unreasonable to guess that they may change together. (Do be mindful of <a href="https://en.wikipedia.org/wiki/Shotgun_surgery">Shotgun Surgery</a>, though. If you expect the interface and the concrete class to frequently change, then perhaps another design might be more appropriate.)
</p>
<h3 id="9cf2a82474ea4af49b7f5b556a1e7fce">
Easier Adapters <a href="#9cf2a82474ea4af49b7f5b556a1e7fce">#</a>
</h3>
<p>
Before concluding this article, let's revisit the topic of introducing an Adapter for the sole purpose of 'architectural purity'. Should you really go to such lengths only to 'do it right'? You decide, but
</p>
<blockquote>
<p>
You can only be pragmatic if you know how to be dogmatic.
</p>
<footer><cite><a href="/2018/11/12/what-to-test-and-not-to-test">What to test and not to test</a></cite>, me</footer>
</blockquote>
<p>
I'm presenting a dogmatic solution for your consideration, so that you know what it might look like. Would I follow my own 'dogmatic' advice? Yes, I usually do, but then, <a href="/2020/03/23/repeatable-execution">I wouldn't log the return value of a pure function</a>, so I wouldn't introduce an interface for <em>that</em> purpose, at least. To be fair to Fandermill, he or she also wrote: "or load additional mapping from a file", which could be an appropriate motivation for introducing an interface. I'd probably go with an Adapter in that case.
</p>
<p>
Whether or not an Adapter is an unwarranted complication depends, however, on language specifics. In high-<a href="/2019/12/16/zone-of-ceremony">ceremony</a> languages like C#, Java, or <a href="https://en.wikipedia.org/wiki/C%2B%2B">C++</a>, adding an Adapter involves at least one new file, and dozens of lines of code.
</p>
<p>
Consider, on the other hand, a low-ceremony language like <a href="https://www.haskell.org/">Haskell</a>. The corresponding <code>getName</code> function might close over a statically defined map and have the type <code><span style="color:#2b91af;">getName</span> <span style="color:blue;">::</span> <span style="color:blue;">UUID</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">String</span></code>.
</p>
<p>
How do you adapt such a pure function to an API that returns <a href="/2020/06/08/the-io-container">IO</a> (which is <a href="/2020/07/27/task-asynchronous-programming-as-an-io-surrogate"><em>roughly</em> comparable to task-based programming</a>)? Trivially:
</p>
<p>
<pre><span style="color:#2b91af;">getNameM</span> <span style="color:blue;">::</span> <span style="color:blue;">Monad</span> m <span style="color:blue;">=></span> <span style="color:blue;">UUID</span> <span style="color:blue;">-></span> m <span style="color:#2b91af;">String</span>
getNameM = <span style="color:blue;">return</span> . getName</pre>
</p>
<p>
For didactic purposes I have here shown the 'Adapter' as an explicit function, but in idiomatic Haskell I'd consider this below the <a href="https://wiki.haskell.org/Fairbairn_threshold">Fairbairn threshold</a>; I'd usually just inline the composition <code><span style="color:blue;">return</span> . getName</code> if I needed to adapt the <code>getName</code> function to the <a href="/2022/04/04/kleisli-composition">Kleisli</a> category.
</p>
<p>
You can do the same in <a href="https://fsharp.org/">F#</a>, where the composition would be <code><span style="color:#74531f;">getName</span> >> <span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">FromResult</span></code>. F# compositions usually go in the (for Westerners) intuitive left-to-right direction, whereas Haskell compositions follow the mathematical right-to-left convention.
</p>
<p>
The point, however, is that there's nothing conceptually complicated about an Adapter. Unfortunately, some languages require substantial ceremony to implement one.
</p>
<h3 id="39dae93245f141a09754ff56305fe805">
Conclusion <a href="#39dae93245f141a09754ff56305fe805">#</a>
</h3>
<p>
Should an API return a Task-based (asynchronous) value 'just in case'? In general: No.
</p>
<p>
You can't predict all possible use cases, so don't make an API more complicated than it has to be. If you need to implement an application-specific interface, use the Adapter design pattern.
</p>
<p>
A possible exception to this rule is if the entire API (the concrete implementation <em>and</em> the interface) only exists to support a specific application. If the interface and its concrete implementation are both part of the Application Model, you may as well skip the Adapter step and consider the concrete implementation as its own Adapter.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
<p>
<a href="https://blog.ploeh.dk/2024/07/01/an-immutable-priority-collection">An immutable priority collection</a>, 2024-07-01T17:28:00+00:00, Mark Seemann
</p>
<div id="post">
<p>
<em>With examples in C# and F#.</em>
</p>
<p>
This article is part of a <a href="/2024/06/12/simpler-encapsulation-with-immutability">series about encapsulation and immutability</a>. After two attempts at an object-oriented, mutable implementation, I now turn toward immutability. As already suggested in the introductory article, immutability makes it easier to maintain invariants.
</p>
<p>
In the introductory article, I described the example problem in more detail, but in short, the exercise is to develop a class that holds a collection of prioritized items, with the invariant that the priorities must always sum to 100. It should be impossible to leave the object in a state where that's not true. It's quite an illuminating exercise, so if you have the time, you should try it for yourself before reading on.
</p>
<h3 id="8543ed3d0ac44d3a8d75145da7e10626">
Initialization <a href="#8543ed3d0ac44d3a8d75145da7e10626">#</a>
</h3>
<p>
Once again, I begin by figuring out how to initialize the object, and how to model it. Since it's a kind of collection, and since I now plan to keep it immutable, it seems natural to implement <a href="https://learn.microsoft.com/dotnet/api/system.collections.generic.ireadonlycollection-1">IReadOnlyCollection<T></a>.
</p>
<p>
In this, the third attempt, I'll reintroduce <code><span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>></code>, with one important difference. It's now an immutable record:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">record</span> <span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">Item</span>, <span style="color:blue;">byte</span> <span style="font-weight:bold;color:#1f377f;">Priority</span>);</pre>
</p>
<p>
If you're not on a version of C# that supports <a href="https://learn.microsoft.com/dotnet/csharp/language-reference/builtin-types/record">records</a> (which is also trivially true if you're not using C# at all), you can always define an immutable class by hand. It just requires more <a href="https://buttondown.email/hillelwayne/archive/why-do-we-call-it-boilerplate-code/">boilerplate</a> code.
</p>
<p>
<code><span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>></code> is going to be the <code>T</code> in the <code>IReadOnlyCollection<T></code> implementation:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">T</span>> : <span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>>></pre>
</p>
<p>
Since an invariant should always hold, it should also hold at initialization, so the <code><span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">T</span>></code> constructor must check that all is as it should be:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>>[] priorities;

<span style="color:blue;">public</span> <span style="color:#2b91af;">PriorityCollection</span>(<span style="color:blue;">params</span> <span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>>[] <span style="font-weight:bold;color:#1f377f;">priorities</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">priorities</span>.<span style="font-weight:bold;color:#74531f;">Sum</span>(<span style="font-weight:bold;color:#1f377f;">p</span> => <span style="font-weight:bold;color:#1f377f;">p</span>.Priority) != 100)
        <span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">ArgumentException</span>(
            <span style="color:#a31515;">"The sum of all priorities must be 100."</span>,
            <span style="color:blue;">nameof</span>(<span style="font-weight:bold;color:#1f377f;">priorities</span>));
    <span style="color:blue;">this</span>.priorities = <span style="font-weight:bold;color:#1f377f;">priorities</span>;
}</pre>
</p>
<p>
The rest of the class is the <code>IReadOnlyCollection<T></code> implementation, which simply delegates to the <code>priorities</code> field.
</p>
<p>
That's it, really. That's the API. We're done.
</p>
<h3 id="8b7c8d67ea144bcfbdc10823bee1e770">
Projection <a href="#8b7c8d67ea144bcfbdc10823bee1e770">#</a>
</h3>
<p>
<em>But,</em> you may ask, <em>how does one edit such a collection?</em>
</p>
<p>
<img src="/content/binary/immutable-edit-comic.jpg" alt="Bob: How do I edit an immutable object Other man: You don't, because it's a persistent data structure. Bob: Fine, it's persist. How do I edit it? Other man: You make it a monomorphic functor and compose it with projections. Bob: Did you just tell me to go fuck myself? Other man: I believe I did, Bob.">
</p>
<p>
(Comic originally by John Muellerleile.)
</p>
<p>
Humour aside, you don't edit an immutable object, but rather make a new object from a previous one. Most modern languages now come with built-in collection-projection APIs; in .NET, it's called <a href="https://learn.microsoft.com/dotnet/csharp/linq/">LINQ</a>. Here's an example. You begin with a collection with two items:
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">pc</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">PriorityCollection</span><<span style="color:blue;">string</span>>(
    <span style="color:blue;">new</span> <span style="color:#2b91af;">Prioritized</span><<span style="color:blue;">string</span>>(<span style="color:#a31515;">"foo"</span>, 60),
    <span style="color:blue;">new</span> <span style="color:#2b91af;">Prioritized</span><<span style="color:blue;">string</span>>(<span style="color:#a31515;">"bar"</span>, 40));</pre>
</p>
<p>
You'd now like to add a third item with priority 20:
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">newPriority</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">Prioritized</span><<span style="color:blue;">string</span>>(<span style="color:#a31515;">"baz"</span>, 20);</pre>
</p>
<p>
How should you make room for this new item? One option is to evenly reduce each of the existing priorities:
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">reduction</span> = <span style="font-weight:bold;color:#1f377f;">newPriority</span>.Priority / <span style="font-weight:bold;color:#1f377f;">pc</span>.Count;
<span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">Prioritized</span><<span style="color:blue;">string</span>>> <span style="font-weight:bold;color:#1f377f;">reduced</span> = <span style="font-weight:bold;color:#1f377f;">pc</span>
    .<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">p</span> => <span style="font-weight:bold;color:#1f377f;">p</span> <span style="color:blue;">with</span> { Priority = (<span style="color:blue;">byte</span>)(<span style="font-weight:bold;color:#1f377f;">p</span>.Priority - <span style="font-weight:bold;color:#1f377f;">reduction</span>) });</pre>
</p>
<p>
Notice that while the priorities in <code>reduced</code> no longer sum to 100, that's okay, because <code>reduced</code> isn't a <code>PriorityCollection</code> object. It's just an <code><span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">Prioritized</span><<span style="color:blue;">string</span>>></code>.
</p>
<p>
You can now <a href="https://learn.microsoft.com/dotnet/api/system.linq.enumerable.append">Append</a> the <code>newPriority</code> to the <code>reduced</code> sequence and repackage that in a <code>PriorityCollection</code>:
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">adjusted</span> = <span style="color:blue;">new</span> <span style="color:#2b91af;">PriorityCollection</span><<span style="color:blue;">string</span>>(<span style="font-weight:bold;color:#1f377f;">reduced</span>.<span style="font-weight:bold;color:#74531f;">Append</span>(<span style="font-weight:bold;color:#1f377f;">newPriority</span>).<span style="font-weight:bold;color:#74531f;">ToArray</span>());</pre>
</p>
<p>
Like the original <code>pc</code> object, the <code>adjusted</code> object is valid upon construction, and since it's immutable, it'll remain valid.
</p>
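<p>
To check the arithmetic, here's a small, self-contained F# sketch of the same projection. It models the data as a plain list of records rather than as the C# <code>PriorityCollection</code> class, and the names <code>original</code> and <code>total</code> are only there for illustration, so treat it as a way to verify the numbers rather than as part of the API:
</p>
<p>
<pre>type Prioritized<'a> = { Item: 'a; Priority: byte }

let original =
    [ { Item = "foo"; Priority = 60uy }; { Item = "bar"; Priority = 40uy } ]
let newPriority = { Item = "baz"; Priority = 20uy }

// Integer division: 20 / 2 = 10.
let reduction = newPriority.Priority / byte original.Length

// Each existing priority gives up 10: 60 becomes 50, and 40 becomes 30.
// At this point the priorities sum to only 80.
let reduced =
    original |> List.map (fun p -> { p with Priority = p.Priority - reduction })

// Appending the new item restores the sum: 50 + 30 + 20 = 100.
let adjusted = reduced @ [ newPriority ]
let total = adjusted |> List.sumBy (fun p -> int p.Priority)</pre>
</p>
<p>
The example works out because 20 is evenly divisible by 2. With three existing items, integer division would give a reduction of 6 each, the priorities would sum to 102, and the constructor would reject the result. An even reduction is only one possible strategy, and it's up to the caller to pick one that actually adds up.
</p>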
<h3 id="53475fec07ce46929951858e3d5be5ba">
Edit <a href="#53475fec07ce46929951858e3d5be5ba">#</a>
</h3>
<p>
If you think this process of unwrapping and rewrapping seems cumbersome, we can make it a bit more palatable by defining a wrapping <code>Edit</code> function, similar to the one in the <a href="/2024/06/24/a-mutable-priority-collection">previous article</a>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#74531f;">Edit</span>(
    <span style="color:#2b91af;">Func</span><<span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>>>, <span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>>>> <span style="font-weight:bold;color:#1f377f;">edit</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">T</span>>(<span style="font-weight:bold;color:#1f377f;">edit</span>(<span style="color:blue;">this</span>).<span style="font-weight:bold;color:#74531f;">ToArray</span>());
}</pre>
</p>
<p>
You can now write code equivalent to the above example like this:
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">adjusted</span> = <span style="font-weight:bold;color:#1f377f;">pc</span>.<span style="font-weight:bold;color:#74531f;">Edit</span>(<span style="font-weight:bold;color:#1f377f;">col</span> =>
{
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">reduced</span> = <span style="font-weight:bold;color:#1f377f;">col</span>.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">p</span> => <span style="font-weight:bold;color:#1f377f;">p</span> <span style="color:blue;">with</span> { Priority = (<span style="color:blue;">byte</span>)(<span style="font-weight:bold;color:#1f377f;">p</span>.Priority - <span style="font-weight:bold;color:#1f377f;">reduction</span>) });
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">reduced</span>.<span style="font-weight:bold;color:#74531f;">Append</span>(<span style="font-weight:bold;color:#1f377f;">newPriority</span>);
});</pre>
</p>
<p>
I'm not sure it's much of an improvement, though.
</p>
<h3 id="12f521756ae949f1bafc8294074446b4">
Using the right tool for the job <a href="#12f521756ae949f1bafc8294074446b4">#</a>
</h3>
<p>
While C# over the years has gained some functional-programming features, it's originally an object-oriented language, and working with immutable values may still seem a bit cumbersome. If so, consider using a language natively designed for this style of programming. On .NET, <a href="https://fsharp.org/">F#</a> is the obvious choice.
</p>
<p>
First, you define the required types:
</p>
<p>
<pre><span style="color:blue;">type</span> <span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">'a</span>> = { Item: <span style="color:#2b91af;">'a</span>; Priority: <span style="color:#2b91af;">byte</span> }
<span style="color:blue;">type</span> <span style="color:#2b91af;">PriorityList</span> = <span style="color:blue;">private</span> <span style="color:#2b91af;">PriorityList</span> <span style="color:blue;">of</span> <span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">string</span>> <span style="color:#2b91af;">list</span></pre>
</p>
<p>
Notice that <code>PriorityList</code> has a <code>private</code> constructor, so that client code can't just create any value. The type should protect its invariants, since <a href="/2022/10/24/encapsulation-in-functional-programming">encapsulation is also relevant in functional programming</a>. Since client code can't directly create <code>PriorityList</code> objects, you instead supply a function for that purpose:
</p>
<p>
<pre><span style="color:blue;">module</span> <span style="color:#2b91af;">PriorityList</span> =
    <span style="color:blue;">let</span> <span style="color:#74531f;">tryCreate</span> <span style="font-weight:bold;color:#1f377f;">priorities</span> =
        <span style="color:blue;">if</span> <span style="font-weight:bold;color:#1f377f;">priorities</span> |> <span style="color:#2b91af;">List</span>.<span style="color:#74531f;">sumBy</span> (_.Priority) = 100uy
        <span style="color:blue;">then</span> <span style="color:#2b91af;">Some</span> (<span style="color:#2b91af;">PriorityList</span> <span style="font-weight:bold;color:#1f377f;">priorities</span>)
        <span style="color:blue;">else</span> <span style="color:#2b91af;">None</span></pre>
</p>
<p>
That's really it, although you also need a way to work with the data. We supply two alternatives that correspond to the above C#:
</p>
<p>
<pre><span style="color:blue;">let</span> edit f (PriorityList priorities) = f priorities |> tryCreate
<span style="color:blue;">let</span> toList (PriorityList priorities) = priorities</pre>
</p>
<p>
These functions are also defined on the <code>PriorityList</code> module.
</p>
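<p>
Put together, a self-contained sketch of the module might look like the following. It's not the exact code from the repository: to keep it short, I've assumed a non-generic <code>Prioritized</code> record specialised to <code>string</code>, and <code>tryCreate</code> sums the priorities as <code>int</code> values, which is my simplification rather than what the code above does. The <code>invalid</code> and <code>items</code> names are likewise just for illustration. The sketch also shows what happens when an edit breaks the invariant:
</p>
<p>
<pre>// Simplified: the article's Prioritized type is generic.
type Prioritized = { Item: string; Priority: byte }
type PriorityList = private PriorityList of Prioritized list

module PriorityList =
    // Only lists whose priorities sum to 100 become PriorityList values.
    let tryCreate priorities =
        if priorities |> List.sumBy (fun p -> int p.Priority) = 100
        then Some (PriorityList priorities)
        else None

    // Unwrap, apply f, and re-validate the result.
    let edit f (PriorityList priorities) = f priorities |> tryCreate

    let toList (PriorityList priorities) = priorities

let pl =
    [ { Item = "foo"; Priority = 60uy }; { Item = "bar"; Priority = 40uy } ]
    |> PriorityList.tryCreate

// Appending without making room gives 60 + 40 + 20 = 120, so the result is None.
let invalid =
    pl
    |> Option.bind (PriorityList.edit (fun l ->
        l @ [ { Item = "baz"; Priority = 20uy } ]))

// Reading the items back out of a valid value.
let items = pl |> Option.map PriorityList.toList</pre>
</p>
<p>
Since <code>edit</code> goes through <code>tryCreate</code>, there's no way for a projection to produce an invalid <code>PriorityList</code>. The worst that can happen is that you get <code>None</code> back. Keep in mind that in a single script file, the <code>private</code> constructor is still visible to the code below the module, so to see the encapsulation at work, the types and the module should go in their own namespace.
</p>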
<p>
Here's the same adjustment example as shown above in C#:
</p>
<p>
<pre><span style="color:blue;">let</span> pl =
    [ { Item = <span style="color:#a31515;">"foo"</span>; Priority = 60uy }; { Item = <span style="color:#a31515;">"bar"</span>; Priority = 40uy } ]
    |> <span style="color:#2b91af;">PriorityList</span>.<span style="color:#74531f;">tryCreate</span>
<span style="color:blue;">let</span> newPriority = { Item = <span style="color:#a31515;">"baz"</span>; Priority = 20uy }
<span style="color:blue;">let</span> adjusted =
    pl
    |> <span style="color:#2b91af;">Option</span>.<span style="color:#74531f;">bind</span> (<span style="color:#2b91af;">PriorityList</span>.<span style="color:#74531f;">edit</span> (<span style="color:blue;">fun</span> <span style="font-weight:bold;color:#1f377f;">l</span> <span style="color:blue;">-></span>
        <span style="font-weight:bold;color:#1f377f;">l</span>
        |> <span style="color:#2b91af;">List</span>.<span style="color:#74531f;">map</span> (<span style="color:blue;">fun</span> <span style="font-weight:bold;color:#1f377f;">p</span> <span style="color:blue;">-></span>
            { <span style="font-weight:bold;color:#1f377f;">p</span> <span style="color:blue;">with</span> Priority = <span style="font-weight:bold;color:#1f377f;">p</span>.Priority - (newPriority.Priority / <span style="color:#74531f;">byte</span> <span style="font-weight:bold;color:#1f377f;">l</span>.Length) })
        |> <span style="color:#2b91af;">List</span>.<span style="color:#74531f;">append</span> [ newPriority ]))</pre>
</p>
<p>
The entire F# definition is 15 lines of code, including namespace declaration and blank lines.
</p>
<h3 id="653bf17ab4cc4353b165638d497b74f1">
Conclusion <a href="#653bf17ab4cc4353b165638d497b74f1">#</a>
</h3>
<p>
With an immutable data structure, you only need to check the invariants upon creation. Invariants therefore become preconditions. Once a value is created in a valid state, it stays valid because it never changes state.
</p>
<p>
If you're having trouble maintaining invariants in an object-oriented design, try making the object immutable. It's likely to make it easier to attain good encapsulation.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="67a3a8d099f04aee9e38633c24919e03">
<div class="comment-author">Jiehong <a href="#67a3a8d099f04aee9e38633c24919e03">#</a></div>
<div class="comment-content">
<p>
First, it's a nice series of articles.
</p>
<p>
I see that nowadays C# has a generic projection, which is a sort of <a href="https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/operators/with-expression">wither</a>
in Java parlance.
It should be usable instead of having to define the `Edit` one.
</p>
<p>
A way to make it more palatable would be to have a `tryAddAndRedistrube(Prioritized<T> element) : PriorityCollection | None` method to `PriorityCollection` that would try to reduce
priorities of elements, before adding the new one and returning a new `PriorityCollection` using that same `with` projection. This would allow the caller to
have a slightly simpler method to call, at the expense of having to store the new collection and assuming this is the intended way the caller wants to insert the element.
</p>
<p>
But it's usually not possible to anticipate all the ways clients want to add elements to something, so I think I prefer the open-ended way this API lets clients choose.
</p>
</div>
<div class="comment-date">2024-07-29 13:53 UTC</div>
</div>
<div class="comment" id="caea05695c184c8294874896fa8b7553">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#caea05695c184c8294874896fa8b7553">#</a></div>
<div class="comment-content">
<p>
Thank you for writing. Whether or not a <em>wither</em> works in this case depends on language implementation details. For example, the F# example code doesn't allow <a href="https://learn.microsoft.com/dotnet/fsharp/language-reference/copy-and-update-record-expressions">copy-and-update expressions</a> because the record constructor is <code>private</code>. This is as it should be, since otherwise, client code would be able to circumvent the encapsulation.
</p>
<p>
I haven't tried to refactor the C# class to a <a href="https://learn.microsoft.com/dotnet/csharp/language-reference/builtin-types/record">record</a>, and I don't recall whether C# <em>with expressions</em> respect custom constructors. That's a good exercise for any reader to try out; unfortunately, I don't have time for that at the moment.
</p>
<p>
As to your other point, it's definitely conceivable that a library developer could add more convenient methods to the <code><span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">T</span>></code> class, including one that uses a simple formula to redistribute existing priorities to make way for the new one. As far as I can tell, though, you'd be able to implement such more convenient APIs as extension methods that are implemented using the basic affordances already on display here. If so, we may consider the constructor and the <code><span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>>></code> interface as the fundamental API. Everything else, including the <code>Edit</code> method, could build off that.
</p>
</div>
<div class="comment-date">2024-07-30 06:46 UTC</div>
</div>
</div>
<hr>
<p>
A mutable priority collection, https://blog.ploeh.dk/2024/06/24/a-mutable-priority-collection, 2024-06-24T17:59:00+00:00, Mark Seemann
</p>
<div id="post">
<p>
<em>An encapsulated, albeit overly complicated, implementation.</em>
</p>
<p>
This is the second in a <a href="/2024/06/12/simpler-encapsulation-with-immutability">series of articles about encapsulation and immutability</a>. In the next article, you'll see how immutability makes encapsulation easier, but in order to appreciate that, you should see the alternative. This article, then, shows a working, albeit overly complicated, implementation that does maintain its invariants.
</p>
<p>
In the introductory article, I described the example problem in more detail, but in short, the exercise is to develop a class that holds a collection of prioritized items, with the invariant that the priorities must always sum to 100. It should be impossible to leave the object in a state where that's not true. It's quite an illuminating exercise, so if you have the time, you should try it for yourself before reading on.
</p>
<h3 id="9f40c96077664ab7acbc2705d9e0d2ea">
Initialization <a href="#9f40c96077664ab7acbc2705d9e0d2ea">#</a>
</h3>
<p>
As the <a href="/2024/06/17/a-failed-attempt-at-priority-collection-with-inheritance">previous article</a> demonstrated, inheriting directly from a base class seems like a dead end. Once you see the direction that I go in this article, you may argue that it'd be possible to also make that design work with an inherited collection. It may be, but I'm not convinced that it would improve anything. Thus, for this iteration, I decided to eschew inheritance.
</p>
<p>
On the other hand, we need an API to <a href="https://en.wikipedia.org/wiki/Command%E2%80%93query_separation">query</a> the object about its state, and I found that it made sense to implement the <a href="https://learn.microsoft.com/dotnet/api/system.collections.generic.ireadonlydictionary-2">IReadOnlyDictionary</a> interface.
</p>
<p>
As before, invariants are statements that are always true about an object, and that includes a newly initialized object. Thus, the <code><span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">T</span>></code> class should require enough information to safely initialize.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">T</span>> : <span style="color:#2b91af;">IReadOnlyDictionary</span><<span style="color:#2b91af;">T</span>, <span style="color:blue;">byte</span>> <span style="color:blue;">where</span> <span style="color:#2b91af;">T</span> : <span style="color:blue;">notnull</span>
{
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">Dictionary</span><<span style="color:#2b91af;">T</span>, <span style="color:blue;">byte</span>> dict;

    <span style="color:blue;">public</span> <span style="color:#2b91af;">PriorityCollection</span>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">initial</span>)
    {
        dict = <span style="color:blue;">new</span> <span style="color:#2b91af;">Dictionary</span><<span style="color:#2b91af;">T</span>, <span style="color:blue;">byte</span>> { { <span style="font-weight:bold;color:#1f377f;">initial</span>, 100 } };
    }

    <span style="color:green;">// IReadOnlyDictionary implemented by delegating to dict field...</span>
}</pre>
</p>
<p>
Several design decisions are different from the previous article. This design has no <code><span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>></code> class. Instead it treats the item (of type <code>T</code>) as a dictionary key, and the priority as the value. The most important motivation for this design decision was that this enables me to avoid the 'leaf node mutation' problem that I demonstrated in the previous article. Notice how, while the general design in this iteration will be object-oriented and mutable, I already take advantage of a bit of immutability to make the design simpler and safer.
</p>
<p>
Another difference is that you can't initialize a <code><span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">T</span>></code> object with a list. Instead, you only need to tell the constructor what the <code>initial</code> item is. The constructor will then infer that, since this is the only item so far, its priority must be 100. It can't be anything else, because that would violate the invariant. Thus, no assertion is required in the constructor.
</p>
<h3 id="f3c5abe45ba949e69c349dc8e21e959a">
Mutation API <a href="#f3c5abe45ba949e69c349dc8e21e959a">#</a>
</h3>
<p>
So far, the code only implements the <code>IReadOnlyDictionary</code> API, so we need to add some methods that will enable us to add new items and so on. As a start, we can add methods to add, remove, or update items:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Add</span>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">key</span>, <span style="color:blue;">byte</span> <span style="font-weight:bold;color:#1f377f;">value</span>)
{
    <span style="font-weight:bold;color:#74531f;">AssertInvariants</span>(dict.<span style="font-weight:bold;color:#74531f;">Append</span>(<span style="color:#2b91af;">KeyValuePair</span>.<span style="color:#74531f;">Create</span>(<span style="font-weight:bold;color:#1f377f;">key</span>, <span style="font-weight:bold;color:#1f377f;">value</span>)));
    dict.<span style="font-weight:bold;color:#74531f;">Add</span>(<span style="font-weight:bold;color:#1f377f;">key</span>, <span style="font-weight:bold;color:#1f377f;">value</span>);
}

<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Remove</span>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">key</span>)
{
    <span style="font-weight:bold;color:#74531f;">AssertInvariants</span>(dict.<span style="font-weight:bold;color:#74531f;">Where</span>(<span style="font-weight:bold;color:#1f377f;">kvp</span> => !<span style="font-weight:bold;color:#1f377f;">kvp</span>.Key.<span style="font-weight:bold;color:#74531f;">Equals</span>(<span style="font-weight:bold;color:#1f377f;">key</span>)));
    dict.<span style="font-weight:bold;color:#74531f;">Remove</span>(<span style="font-weight:bold;color:#1f377f;">key</span>);
}

<span style="color:blue;">public</span> <span style="color:blue;">byte</span> <span style="color:blue;">this</span>[<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">key</span>]
{
    <span style="color:blue;">get</span> { <span style="font-weight:bold;color:#8f08c4;">return</span> dict[<span style="font-weight:bold;color:#1f377f;">key</span>]; }
    <span style="color:blue;">set</span>
    {
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">l</span> = dict.<span style="font-weight:bold;color:#74531f;">ToDictionary</span>(<span style="font-weight:bold;color:#1f377f;">kvp</span> => <span style="font-weight:bold;color:#1f377f;">kvp</span>.Key, <span style="font-weight:bold;color:#1f377f;">kvp</span> => <span style="font-weight:bold;color:#1f377f;">kvp</span>.Value);
        <span style="font-weight:bold;color:#1f377f;">l</span>[<span style="font-weight:bold;color:#1f377f;">key</span>] = <span style="color:blue;">value</span>;
        <span style="font-weight:bold;color:#74531f;">AssertInvariants</span>(<span style="font-weight:bold;color:#1f377f;">l</span>);
        dict[<span style="font-weight:bold;color:#1f377f;">key</span>] = <span style="color:blue;">value</span>;
    }
}</pre>
</p>
<p>
I'm not going to show the <code>AssertInvariants</code> helper method yet, since it's going to change anyway.
</p>
<p>
At this point, the implementation suffers from the same problem as the example in the previous article. While you can add new items, you can only add an item with priority 0. You can only remove items if they have priority 0. And you can only 'update' an item if you set the priority to the same value as it already had.
</p>
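<p>
If it's not obvious why that is, it may help to look at the numbers in isolation. The following F# sketch doesn't use the <code>PriorityCollection</code> class at all. Instead, <code>isValid</code> is a hypothetical stand-in for the invariant, applied to plain lists of priorities that represent the state an operation would produce:
</p>
<p>
<pre>// Hypothetical stand-in for the invariant: priorities must sum to 100.
let isValid (priorities: int list) = List.sum priorities = 100

// State right after construction: a single item with priority 100.
let initial = [ 100 ]

// Adding an item only keeps the sum at 100 if the new priority is 0.
let addWithZero = isValid (initial @ [ 0 ])    // true
let addWithTwenty = isValid (initial @ [ 20 ]) // false

// Lowering the only item's priority to 50 leaves the sum at 50.
let updateToFifty = isValid [ 50 ]             // false

// Removing the item that holds all 100 leaves the sum at 0.
let removeOnlyItem = isValid []                // false</pre>
</p>
<p>
Every single-step change that actually does something takes the sum away from 100, so an API that checks the invariant after each step can't get from one valid state to another.
</p>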
<p>
We need to be able to add new items, change their priorities, and so on. How do we get around the above problem, without breaking the invariant?
</p>
<h3 id="f9400ef096f94b15b4c32ae2f35ba000">
Edit mode <a href="#f9400ef096f94b15b4c32ae2f35ba000">#</a>
</h3>
<p>
One way out of this conundrum is to introduce a kind of 'edit mode'. The idea is to temporarily turn off the maintenance of the invariant for long enough to allow edits.
</p>
<p>
At first glance, such an idea seems to go against the very definition of an invariant. After all, an invariant is a statement about the object that is <em>always</em> true. If you allow a client developer to turn off that guarantee, then, clearly, the guarantee is gone. Guarantees only work if you can trust them, and you can't trust them if they can be cancelled.
</p>
<p>
That idea in itself doesn't work, but if we can somehow encapsulate such an 'edit action' in an isolated scope that either succeeds or fails in its entirety, we may be getting somewhere. It's an idea similar to <a href="https://en.wikipedia.org/wiki/Unit_of_work">Unit of Work</a>, although here we're not involving an actual database. Still, an 'edit action' is a kind of in-memory transaction.
</p>
<p>
For didactic reasons, I'll move toward that design in a series of steps, where the intermediate steps fail to maintain the invariant. We'll get there eventually. The first step is to introduce 'edit mode'.
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">bool</span> isEditing;</pre>
</p>
<p>
While I could have made that flag <code>public</code>, I found it more natural to wrap access to it in two methods:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">BeginEdit</span>()
{
    isEditing = <span style="color:blue;">true</span>;
}

<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">EndEdit</span>()
{
    isEditing = <span style="color:blue;">false</span>;
}</pre>
</p>
<p>
This still doesn't accomplish anything in itself, but the final change in this step is to change the assertion so that it respects the flag:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">AssertInvariants</span>(<span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">KeyValuePair</span><<span style="color:#2b91af;">T</span>, <span style="color:blue;">byte</span>>> <span style="font-weight:bold;color:#1f377f;">candidate</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">if</span> (!isEditing && <span style="font-weight:bold;color:#1f377f;">candidate</span>.<span style="font-weight:bold;color:#74531f;">Sum</span>(<span style="font-weight:bold;color:#1f377f;">kvp</span> => <span style="font-weight:bold;color:#1f377f;">kvp</span>.Value) != 100)
        <span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">InvalidOperationException</span>(
            <span style="color:#a31515;">"The sum of all values must be 100."</span>);
}</pre>
</p>
<p>
Finally, you can add or change priorities, as this little <a href="https://fsharp.org/">F#</a> example shows:
</p>
<p>
<pre>sut.BeginEdit ()
sut[<span style="color:#a31515;">"foo"</span>] <span style="color:blue;"><-</span> 50uy
sut[<span style="color:#a31515;">"bar"</span>] <span style="color:blue;"><-</span> 50uy
sut.EndEdit ()</pre>
</p>
<p>
Even if you nominally 'don't read F#', this little example is almost like C# without semicolons. The <code><span style="color:blue;"><-</span></code> arrow is F#'s mutation or assignment operator, which in C# would be <code>=</code>, and the <code>uy</code> suffix is <a href="https://learn.microsoft.com/dotnet/fsharp/language-reference/literals">the F# way of stating that the literal is a <code>byte</code></a>.
</p>
<p>
The above example is well-behaved because the final state of the object is valid. The priorities sum to 100. Even so, no code in <code><span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">T</span>></code> actually checks that, so we could trivially leave the object in an invalid state.
</p>
<h3 id="bcd53701e2214e99adcb1090acf3536a">
Assert invariant at end of edit <a href="#bcd53701e2214e99adcb1090acf3536a">#</a>
</h3>
<p>
The first step toward remedying that problem is to add a check to the <code>EndEdit</code> method:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">EndEdit</span>()
{
    isEditing = <span style="color:blue;">false</span>;
    <span style="font-weight:bold;color:#74531f;">AssertInvariants</span>(dict);
}</pre>
</p>
<p>
The class is still not effectively protecting its invariants, because a client developer could forget to call <code>EndEdit</code>, or client code might pass around a collection in edit mode. Other code, receiving such an object as an argument, may not know whether or not it's in edit mode, so, again, it doesn't know whether it can trust the object.
</p>
<p>
We'll return to that problem shortly, but first, there's another, perhaps more pressing issue that we should attend to.
</p>
<h3 id="45cbf95701e14f08974cb5f1a585ad79">
Edit dictionary <a href="#45cbf95701e14f08974cb5f1a585ad79">#</a>
</h3>
<p>
The current implementation directly edits the collection, and even if a client developer remembers to call <code>EndEdit</code>, other code higher up in the call stack could circumvent the check and leave the object in an invalid state. Not that I expect client developers to be deliberately malicious, but the notion that someone might wrap a method call in a <code>try-catch</code> block seems realistic.
</p>
<p>
The following F# unit test demonstrates the issue:
</p>
<p>
<pre>[<<span style="color:#2b91af;">Fact</span>>]
<span style="color:blue;">let</span> <span style="color:#74531f;">``Attempt to circumvent``</span> () =
    <span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">string</span>> <span style="color:#a31515;">"foo"</span>
    <span style="color:blue;">try</span>
        <span style="font-weight:bold;color:#1f377f;">sut</span>.<span style="font-weight:bold;color:#74531f;">BeginEdit</span> ()
        <span style="font-weight:bold;color:#1f377f;">sut</span>[<span style="color:#a31515;">"foo"</span>] <span style="color:blue;"><-</span> 50uy
        <span style="font-weight:bold;color:#1f377f;">sut</span>[<span style="color:#a31515;">"bar"</span>] <span style="color:blue;"><-</span> 48uy
        <span style="font-weight:bold;color:#1f377f;">sut</span>.<span style="font-weight:bold;color:#74531f;">EndEdit</span> ()
    <span style="color:blue;">with</span> _ <span style="color:blue;">-></span> ()
    100uy =! <span style="font-weight:bold;color:#1f377f;">sut</span>[<span style="color:#a31515;">"foo"</span>]
    <span style="color:#74531f;">test</span> <@ <span style="font-weight:bold;color:#1f377f;">sut</span>.<span style="font-weight:bold;color:#74531f;">ContainsKey</span> <span style="color:#a31515;">"bar"</span> |> <span style="color:#74531f;">not</span> @></pre>
</p>
<p>
Again, let me walk you through it in case you're unfamiliar with F#.
</p>
<p>
The <a href="https://learn.microsoft.com/dotnet/fsharp/language-reference/exception-handling/the-try-with-expression">try-with</a> block works just like C# <code>try-catch</code> blocks. Inside of that <code>try-with</code> block, the test enters edit mode, changes the values in such a way that the sum of them is 98, and then calls <code>EndEdit</code>. While <code>EndEdit</code> throws an exception, those four lines of code are wrapped in a <code>try-with</code> block that suppresses all exceptions.
</p>
<p>
The test attempts to verify that, since the edit failed, the <code>"foo"</code> value should be 100, and there should be no <code>"bar"</code> value. This turns out not to be the case. The test fails. The edits persist, even though <code>EndEdit</code> throws an exception, because there's no roll-back.
</p>
<p>
You could probably resolve that defect in various ways, but I chose to address it by introducing two, instead of one, backing dictionaries. One holds the data that always maintains the invariant, and the other is a temporary dictionary for editing.
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:#2b91af;">Dictionary</span><<span style="color:#2b91af;">T</span>, <span style="color:blue;">byte</span>> current;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">Dictionary</span><<span style="color:#2b91af;">T</span>, <span style="color:blue;">byte</span>> encapsulated;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">Dictionary</span><<span style="color:#2b91af;">T</span>, <span style="color:blue;">byte</span>> editable;
<span style="color:blue;">private</span> <span style="color:blue;">bool</span> isEditing;
<span style="color:blue;">public</span> <span style="color:#2b91af;">PriorityCollection</span>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">initial</span>)
{
encapsulated = <span style="color:blue;">new</span> <span style="color:#2b91af;">Dictionary</span><<span style="color:#2b91af;">T</span>, <span style="color:blue;">byte</span>> { { <span style="font-weight:bold;color:#1f377f;">initial</span>, 100 } };
editable = [];
current = encapsulated;
}</pre>
</p>
<p>
There are two dictionaries: <code>encapsulated</code> holds the always-valid list of priorities, while <code>editable</code> is the dictionary that client code will be editing when in edit mode. Finally, <code>current</code> is either of these: <code>editable</code> when the object is in edit mode, and <code>encapsulated</code> when it's not. Most of the existing code shown so far now uses <code>current</code>, which before was called <code>dict</code>. The important changes are in <code>BeginEdit</code> and <code>EndEdit</code>.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">BeginEdit</span>()
{
isEditing = <span style="color:blue;">true</span>;
editable.<span style="font-weight:bold;color:#74531f;">Clear</span>();
<span style="font-weight:bold;color:#8f08c4;">foreach</span> (<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">kvp</span> <span style="font-weight:bold;color:#8f08c4;">in</span> current)
editable.<span style="font-weight:bold;color:#74531f;">Add</span>(<span style="font-weight:bold;color:#1f377f;">kvp</span>.Key, <span style="font-weight:bold;color:#1f377f;">kvp</span>.Value);
current = editable;
}</pre>
</p>
<p>
Besides setting the <code>isEditing</code> flag, <code>BeginEdit</code> now copies all data from <code>current</code> to <code>editable</code>, and then sets <code>current</code> to <code>editable</code>. Keep in mind that <code>encapsulated</code> still holds the original, valid values.
</p>
<p>
Now that I'm writing this, I'm not even sure if this method is re-entrant, in the following sense: What happens if client code calls <code>BeginEdit</code>, makes some changes, and then calls <code>BeginEdit</code> again? These are the kinds of questions that I don't feel intelligent enough to be sure that I always answer correctly. That's why I like functional programming better. I don't have to think so hard.
</p>
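<p>
To make that question concrete, here's a minimal sketch of what seems to happen. The <code>TwoDictionarySketch</code> class and its <code>Set</code> method are hypothetical stand-ins that copy only the <code>BeginEdit</code> logic shown above; they aren't part of the actual code base. After the first call, <code>current</code> and <code>editable</code> refer to the same object, so a second call clears both, and the loop that follows has nothing left to copy.
</p>

```csharp
using System.Collections.Generic;

// Hypothetical stand-in that mirrors only the BeginEdit logic from the article.
public sealed class TwoDictionarySketch
{
    private readonly Dictionary<string, byte> encapsulated =
        new Dictionary<string, byte> { { "foo", 100 } };
    private readonly Dictionary<string, byte> editable =
        new Dictionary<string, byte>();
    private Dictionary<string, byte> current;

    public TwoDictionarySketch()
    {
        current = encapsulated;
    }

    public int Count
    {
        get { return current.Count; }
    }

    // Assumed helper for the sketch: writes without checking any invariant.
    public void Set(string key, byte value)
    {
        current[key] = value;
    }

    public void BeginEdit()
    {
        // On a second call, current and editable are the same object,
        // so this Clear also empties current.
        editable.Clear();
        foreach (var kvp in current)
            editable.Add(kvp.Key, kvp.Value);
        current = editable;
    }
}
```

<p>
With this sketch, calling <code>BeginEdit</code>, then <code>Set("bar", 48)</code>, and then <code>BeginEdit</code> again leaves <code>Count</code> at zero: the pending edits are gone. In the real class, a subsequent <code>EndEdit</code> would then presumably fail the invariant check and fall back to the <code>encapsulated</code> values, unless the client code adds new values that sum to 100.
</p>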
<p>
Anyway, this will soon become irrelevant, since <code>BeginEdit</code> and <code>EndEdit</code> will eventually become <code>private</code> methods.
</p>
<p>
The <code>EndEdit</code> method performs the inverse manoeuvre:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">EndEdit</span>()
{
isEditing = <span style="color:blue;">false</span>;
<span style="font-weight:bold;color:#8f08c4;">try</span>
{
<span style="font-weight:bold;color:#74531f;">AssertInvariants</span>(current);
encapsulated.<span style="font-weight:bold;color:#74531f;">Clear</span>();
<span style="font-weight:bold;color:#8f08c4;">foreach</span> (<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">kvp</span> <span style="font-weight:bold;color:#8f08c4;">in</span> current)
encapsulated.<span style="font-weight:bold;color:#74531f;">Add</span>(<span style="font-weight:bold;color:#1f377f;">kvp</span>.Key, <span style="font-weight:bold;color:#1f377f;">kvp</span>.Value);
current = encapsulated;
}
<span style="font-weight:bold;color:#8f08c4;">catch</span>
{
current = encapsulated;
<span style="font-weight:bold;color:#8f08c4;">throw</span>;
}
}</pre>
</p>
<p>
It first checks the invariant, and only copies the edited values to the <code>encapsulated</code> dictionary if the invariant still holds. Otherwise, it restores the original <code>encapsulated</code> values and rethrows the exception.
</p>
<p>
This helps to make editing 'transactional' in nature, but it doesn't address the issue that the collection is in an invalid state during editing, or that a client developer may forget to call <code>EndEdit</code>.
</p>
<h3 id="d53f9fe83e944d1bbd58d478f9fc27bc">
Edit action <a href="#d53f9fe83e944d1bbd58d478f9fc27bc">#</a>
</h3>
<p>
As the next step towards addressing that problem, we may now introduce a 'wrapper method' for that little object protocol:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Edit</span>(<span style="color:#2b91af;">Action</span><<span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">T</span>>> <span style="font-weight:bold;color:#1f377f;">editAction</span>)
{
<span style="font-weight:bold;color:#74531f;">BeginEdit</span>();
<span style="font-weight:bold;color:#1f377f;">editAction</span>(<span style="color:blue;">this</span>);
<span style="font-weight:bold;color:#74531f;">EndEdit</span>();
}</pre>
</p>
<p>
As you can see, it just wraps that little call sequence so that you don't have to remember to call <code>BeginEdit</code> and <code>EndEdit</code>. My F# test code comes with this example:
</p>
<p>
<pre>sut.Edit (<span style="color:blue;">fun</span> col <span style="color:blue;">-></span>
col[<span style="color:#a31515;">"bar"</span>] <span style="color:blue;"><-</span> 55uy
col[<span style="color:#a31515;">"baz"</span>] <span style="color:blue;"><-</span> 45uy
col.Remove <span style="color:#a31515;">"foo"</span>
)</pre>
</p>
<p>
The <code><span style="color:blue;">fun</span> <span style="font-weight:bold;color:#1f377f;">col</span> <span style="color:blue;">-></span></code> part is just F# syntax for a lambda expression. In C#, you'd write it as <code>col =></code>.
</p>
<p>
We're close to a solution. What remains is to make <code>BeginEdit</code> and <code>EndEdit</code> <code>private</code>. This means that client code can only edit a <code><span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">T</span>></code> object through the <code>Edit</code> method.
</p>
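<p>
One scenario that the <code>Edit</code> method, as shown, doesn't appear to handle is an exception thrown by the edit action itself. In that case <code>EndEdit</code> never runs, so the object seems to stay in edit mode, with <code>current</code> still pointing at <code>editable</code>. The following is only a sketch of one possible remedy, assuming that discarding the pending edits is acceptable. <code>RollbackSketch</code> is a hypothetical stand-in with string keys and a simplified indexer, not the actual class.
</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in with string keys; not the article's actual class.
public sealed class RollbackSketch
{
    private readonly Dictionary<string, byte> encapsulated =
        new Dictionary<string, byte> { { "foo", 100 } };
    private readonly Dictionary<string, byte> editable =
        new Dictionary<string, byte>();
    private Dictionary<string, byte> current;
    private bool isEditing;

    public RollbackSketch()
    {
        current = encapsulated;
    }

    public byte this[string key]
    {
        get { return current[key]; }
        set
        {
            // Simplification: this sketch only allows writes in edit mode.
            if (!isEditing)
                throw new InvalidOperationException("Use Edit to change values.");
            current[key] = value;
        }
    }

    public void Edit(Action<RollbackSketch> editAction)
    {
        BeginEdit();
        try
        {
            editAction(this);
        }
        catch
        {
            // The action failed: leave edit mode and discard the scratch copy.
            isEditing = false;
            current = encapsulated;
            throw;
        }
        EndEdit();
    }

    private void BeginEdit()
    {
        isEditing = true;
        editable.Clear();
        foreach (var kvp in current)
            editable.Add(kvp.Key, kvp.Value);
        current = editable;
    }

    private void EndEdit()
    {
        isEditing = false;
        if (current.Sum(kvp => kvp.Value) != 100)
        {
            // Invalid edit: keep the old values and report the problem.
            current = encapsulated;
            throw new InvalidOperationException(
                "The sum of all values must be 100.");
        }
        encapsulated.Clear();
        foreach (var kvp in current)
            encapsulated.Add(kvp.Key, kvp.Value);
        current = encapsulated;
    }
}
```

<p>
If an edit action sets <code>"foo"</code> to 50 and then throws, the exception still reaches the caller, but a later read of <code>"foo"</code> returns 100, because <code>current</code> once more points at <code>encapsulated</code>.
</p>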
<h3 id="b04cc69a08c749f39d8e9276e444046a">
Replace action with interface <a href="#b04cc69a08c749f39d8e9276e444046a">#</a>
</h3>
<p>
You may complain that this solution isn't properly object-oriented, since it makes use of <a href="https://learn.microsoft.com/dotnet/api/system.action-1">Action<T></a> and requires that client code uses lambda expressions.
</p>
<p>
We can easily fix that.
</p>
<p>
Instead of the action, you can introduce a <a href="https://en.wikipedia.org/wiki/Command_pattern">Command</a> interface with the same signature:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IPriorityEditor</span><<span style="color:#2b91af;">T</span>> <span style="color:blue;">where</span> <span style="color:#2b91af;">T</span> : <span style="color:blue;">notnull</span>
{
<span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">EditPriorities</span>(<span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">priorities</span>);
}</pre>
</p>
<p>
Next, change the <code>Edit</code> method:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Edit</span>(<span style="color:#2b91af;">IPriorityEditor</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">editor</span>)
{
<span style="font-weight:bold;color:#74531f;">BeginEdit</span>();
<span style="font-weight:bold;color:#1f377f;">editor</span>.<span style="font-weight:bold;color:#74531f;">EditPriorities</span>(<span style="color:blue;">this</span>);
<span style="font-weight:bold;color:#74531f;">EndEdit</span>();
}</pre>
</p>
<p>
Now you have a nice, object-oriented design, with no lambda expressions in sight.
</p>
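<p>
As an illustration of what client code might then look like, here's a sketch of a Command that performs the same edit as the lambda expression in the F# example above. <code>ReplaceFooEditor</code> is a hypothetical name, and the <code>PriorityCollection</code> class in the listing is a trimmed copy, included only so that the example compiles on its own.
</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public interface IPriorityEditor<T> where T : notnull
{
    void EditPriorities(PriorityCollection<T> priorities);
}

// Trimmed copy of the article's class, included only so that the sketch
// compiles on its own. It leaves out Add and IReadOnlyDictionary<T, byte>.
public sealed class PriorityCollection<T> where T : notnull
{
    private Dictionary<T, byte> current;
    private readonly Dictionary<T, byte> encapsulated;
    private readonly Dictionary<T, byte> editable = new Dictionary<T, byte>();
    private bool isEditing;

    public PriorityCollection(T initial)
    {
        encapsulated = new Dictionary<T, byte> { { initial, 100 } };
        current = encapsulated;
    }

    public byte this[T key]
    {
        get { return current[key]; }
        set
        {
            var l = current.ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
            l[key] = value;
            AssertInvariants(l);
            current[key] = value;
        }
    }

    public bool ContainsKey(T key)
    {
        return current.ContainsKey(key);
    }

    public void Remove(T key)
    {
        AssertInvariants(current.Where(kvp => !kvp.Key.Equals(key)));
        current.Remove(key);
    }

    public void Edit(IPriorityEditor<T> editor)
    {
        BeginEdit();
        editor.EditPriorities(this);
        EndEdit();
    }

    private void BeginEdit()
    {
        isEditing = true;
        editable.Clear();
        foreach (var kvp in current)
            editable.Add(kvp.Key, kvp.Value);
        current = editable;
    }

    private void EndEdit()
    {
        isEditing = false;
        try
        {
            AssertInvariants(current);
            encapsulated.Clear();
            foreach (var kvp in current)
                encapsulated.Add(kvp.Key, kvp.Value);
            current = encapsulated;
        }
        catch
        {
            current = encapsulated;
            throw;
        }
    }

    private void AssertInvariants(IEnumerable<KeyValuePair<T, byte>> candidate)
    {
        if (!isEditing && candidate.Sum(kvp => kvp.Value) != 100)
            throw new InvalidOperationException(
                "The sum of all values must be 100.");
    }
}

// Hypothetical Command that mirrors the lambda expression from the F# test.
public sealed class ReplaceFooEditor : IPriorityEditor<string>
{
    public void EditPriorities(PriorityCollection<string> priorities)
    {
        priorities["bar"] = 55;
        priorities["baz"] = 45;
        priorities.Remove("foo");
    }
}
```

<p>
Client code would pass an instance to <code>Edit</code>, as in <code>sut.Edit(new ReplaceFooEditor())</code>. Afterwards, <code>"bar"</code> and <code>"baz"</code> hold 55 and 45, and <code>"foo"</code> is gone.
</p>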
<h3 id="e15b63b0a3ef4f229caedc15a2b64f39">
Full code dump <a href="#e15b63b0a3ef4f229caedc15a2b64f39">#</a>
</h3>
<p>
The final code is complex enough that it's easy to lose track of what it looks like, as I walk through my process. To make it easier, here's the full code for the collection class:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">T</span>> : <span style="color:#2b91af;">IReadOnlyDictionary</span><<span style="color:#2b91af;">T</span>, <span style="color:blue;">byte</span>>
<span style="color:blue;">where</span> <span style="color:#2b91af;">T</span> : <span style="color:blue;">notnull</span>
{
<span style="color:blue;">private</span> <span style="color:#2b91af;">Dictionary</span><<span style="color:#2b91af;">T</span>, <span style="color:blue;">byte</span>> current;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">Dictionary</span><<span style="color:#2b91af;">T</span>, <span style="color:blue;">byte</span>> encapsulated;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:#2b91af;">Dictionary</span><<span style="color:#2b91af;">T</span>, <span style="color:blue;">byte</span>> editable;
<span style="color:blue;">private</span> <span style="color:blue;">bool</span> isEditing;
<span style="color:blue;">public</span> <span style="color:#2b91af;">PriorityCollection</span>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">initial</span>)
{
encapsulated = <span style="color:blue;">new</span> <span style="color:#2b91af;">Dictionary</span><<span style="color:#2b91af;">T</span>, <span style="color:blue;">byte</span>> { { <span style="font-weight:bold;color:#1f377f;">initial</span>, 100 } };
editable = [];
current = encapsulated;
}
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Add</span>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">key</span>, <span style="color:blue;">byte</span> <span style="font-weight:bold;color:#1f377f;">value</span>)
{
<span style="font-weight:bold;color:#74531f;">AssertInvariants</span>(current.<span style="font-weight:bold;color:#74531f;">Append</span>(<span style="color:#2b91af;">KeyValuePair</span>.<span style="color:#74531f;">Create</span>(<span style="font-weight:bold;color:#1f377f;">key</span>, <span style="font-weight:bold;color:#1f377f;">value</span>)));
current.<span style="font-weight:bold;color:#74531f;">Add</span>(<span style="font-weight:bold;color:#1f377f;">key</span>, <span style="font-weight:bold;color:#1f377f;">value</span>);
}
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Remove</span>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">key</span>)
{
<span style="font-weight:bold;color:#74531f;">AssertInvariants</span>(current.<span style="font-weight:bold;color:#74531f;">Where</span>(<span style="font-weight:bold;color:#1f377f;">kvp</span> => !<span style="font-weight:bold;color:#1f377f;">kvp</span>.Key.<span style="font-weight:bold;color:#74531f;">Equals</span>(<span style="font-weight:bold;color:#1f377f;">key</span>)));
current.<span style="font-weight:bold;color:#74531f;">Remove</span>(<span style="font-weight:bold;color:#1f377f;">key</span>);
}
<span style="color:blue;">public</span> <span style="color:blue;">byte</span> <span style="color:blue;">this</span>[<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">key</span>]
{
<span style="color:blue;">get</span> { <span style="font-weight:bold;color:#8f08c4;">return</span> current[<span style="font-weight:bold;color:#1f377f;">key</span>]; }
<span style="color:blue;">set</span>
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">l</span> = current.<span style="font-weight:bold;color:#74531f;">ToDictionary</span>(<span style="font-weight:bold;color:#1f377f;">kvp</span> => <span style="font-weight:bold;color:#1f377f;">kvp</span>.Key, <span style="font-weight:bold;color:#1f377f;">kvp</span> => <span style="font-weight:bold;color:#1f377f;">kvp</span>.Value);
<span style="font-weight:bold;color:#1f377f;">l</span>[<span style="font-weight:bold;color:#1f377f;">key</span>] = <span style="color:blue;">value</span>;
<span style="font-weight:bold;color:#74531f;">AssertInvariants</span>(<span style="font-weight:bold;color:#1f377f;">l</span>);
current[<span style="font-weight:bold;color:#1f377f;">key</span>] = <span style="color:blue;">value</span>;
}
}
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Edit</span>(<span style="color:#2b91af;">IPriorityEditor</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">editor</span>)
{
<span style="font-weight:bold;color:#74531f;">BeginEdit</span>();
<span style="font-weight:bold;color:#1f377f;">editor</span>.<span style="font-weight:bold;color:#74531f;">EditPriorities</span>(<span style="color:blue;">this</span>);
<span style="font-weight:bold;color:#74531f;">EndEdit</span>();
}
<span style="color:blue;">private</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">BeginEdit</span>()
{
isEditing = <span style="color:blue;">true</span>;
editable.<span style="font-weight:bold;color:#74531f;">Clear</span>();
<span style="font-weight:bold;color:#8f08c4;">foreach</span> (<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">kvp</span> <span style="font-weight:bold;color:#8f08c4;">in</span> current)
editable.<span style="font-weight:bold;color:#74531f;">Add</span>(<span style="font-weight:bold;color:#1f377f;">kvp</span>.Key, <span style="font-weight:bold;color:#1f377f;">kvp</span>.Value);
current = editable;
}
<span style="color:blue;">private</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">EndEdit</span>()
{
isEditing = <span style="color:blue;">false</span>;
<span style="font-weight:bold;color:#8f08c4;">try</span>
{
<span style="font-weight:bold;color:#74531f;">AssertInvariants</span>(current);
encapsulated.<span style="font-weight:bold;color:#74531f;">Clear</span>();
<span style="font-weight:bold;color:#8f08c4;">foreach</span> (<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">kvp</span> <span style="font-weight:bold;color:#8f08c4;">in</span> current)
encapsulated.<span style="font-weight:bold;color:#74531f;">Add</span>(<span style="font-weight:bold;color:#1f377f;">kvp</span>.Key, <span style="font-weight:bold;color:#1f377f;">kvp</span>.Value);
current = encapsulated;
}
<span style="font-weight:bold;color:#8f08c4;">catch</span>
{
current = encapsulated;
<span style="font-weight:bold;color:#8f08c4;">throw</span>;
}
}
<span style="color:blue;">private</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">AssertInvariants</span>(<span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">KeyValuePair</span><<span style="color:#2b91af;">T</span>, <span style="color:blue;">byte</span>>> <span style="font-weight:bold;color:#1f377f;">candidate</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (!isEditing && <span style="font-weight:bold;color:#1f377f;">candidate</span>.<span style="font-weight:bold;color:#74531f;">Sum</span>(<span style="font-weight:bold;color:#1f377f;">kvp</span> => <span style="font-weight:bold;color:#1f377f;">kvp</span>.Value) != 100)
<span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">InvalidOperationException</span>(
<span style="color:#a31515;">"The sum of all values must be 100."</span>);
}
<span style="color:blue;">public</span> <span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">T</span>> Keys
{
<span style="color:blue;">get</span> { <span style="font-weight:bold;color:#8f08c4;">return</span> current.Keys; }
}
<span style="color:blue;">public</span> <span style="color:#2b91af;">IEnumerable</span><<span style="color:blue;">byte</span>> Values
{
<span style="color:blue;">get</span> { <span style="font-weight:bold;color:#8f08c4;">return</span> current.Values; }
}
<span style="color:blue;">public</span> <span style="color:blue;">int</span> Count
{
<span style="color:blue;">get</span> { <span style="font-weight:bold;color:#8f08c4;">return</span> current.Count; }
}
<span style="color:blue;">public</span> <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">ContainsKey</span>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">key</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> current.<span style="font-weight:bold;color:#74531f;">ContainsKey</span>(<span style="font-weight:bold;color:#1f377f;">key</span>);
}
<span style="color:blue;">public</span> <span style="color:#2b91af;">IEnumerator</span><<span style="color:#2b91af;">KeyValuePair</span><<span style="color:#2b91af;">T</span>, <span style="color:blue;">byte</span>>> <span style="font-weight:bold;color:#74531f;">GetEnumerator</span>()
{
<span style="font-weight:bold;color:#8f08c4;">return</span> current.<span style="font-weight:bold;color:#74531f;">GetEnumerator</span>();
}
<span style="color:blue;">public</span> <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">TryGetValue</span>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">key</span>, [<span style="color:#2b91af;">MaybeNullWhen</span>(<span style="color:blue;">false</span>)] <span style="color:blue;">out</span> <span style="color:blue;">byte</span> <span style="font-weight:bold;color:#1f377f;">value</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> current.<span style="font-weight:bold;color:#74531f;">TryGetValue</span>(<span style="font-weight:bold;color:#1f377f;">key</span>, <span style="color:blue;">out</span> <span style="font-weight:bold;color:#1f377f;">value</span>);
}
<span style="color:#2b91af;">IEnumerator</span> <span style="color:#2b91af;">IEnumerable</span>.<span style="font-weight:bold;color:#74531f;">GetEnumerator</span>()
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#74531f;">GetEnumerator</span>();
}
}</pre>
</p>
<p>
The <code><span style="color:#2b91af;">IPriorityEditor</span><<span style="color:#2b91af;">T</span>></code> interface remains as shown above.
</p>
<h3 id="7994e307ceaf4857901378fe711f3ceb">
Conclusion <a href="#7994e307ceaf4857901378fe711f3ceb">#</a>
</h3>
<p>
Given how simple the problem is, this solution is surprisingly complicated, and I'm fairly sure that it's not even thread-safe.
</p>
<p>
At least it does, as far as I can tell, protect the invariant that the sum of priorities must always be exactly 100. Even so, it's just complicated enough that I wouldn't be surprised if a bug is lurking somewhere. It'd be nice if a simpler design existed.
</p>
<p>
<strong>Next:</strong> <a href="/2024/07/01/an-immutable-priority-collection">An immutable priority collection</a>.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="667d3faaebff4009bfffcd15612280ac">
<div class="comment-author">Joker_vD <a href="#667d3faaebff4009bfffcd15612280ac">#</a></div>
<div class="comment-content">
<p>
Where does the notion come that a data structure invariant has to be true <i>at all times</i>? I am fairly certain that it's only required to be true at "quiescent" points of executions. That is, just as the loop invariant is only required to hold before and after each loop step but not inside the loop step, so the data structure invariant is only required to hold before and after each invocation of its public methods.
</p>
<p>
This definition actually has an interesting quirk which is absent in the loop invariant: a data structure's method can't, generally speaking, call other public methods of the very same data structure because the invariant might not hold at this particular point of execution! I've been personally bitten by this a couple of times, and I've seen others tripping over this subtle point as well. You yourself notice it when you muse about the re-entrancy of the <code>BeginEdit</code> method.
</p>
<p>
Now, this particular problem is quite similar to the problem with inner iteration, and can be solved the same way, with the outer editor, as you've done, although I would have probably provided each editor with its own, separate <code>editable</code> dictionary because right now, the editors cannot nest/compose... but that'd complicate implementation even further.
</p>
</div>
<div class="comment-date">2024-07-03 22:19 UTC</div>
</div>
<div class="comment" id="6a880573a3424f37b74bc78a8276d441">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#6a880573a3424f37b74bc78a8276d441">#</a></div>
<div class="comment-content">
<p>
Thank you for writing. Like so many other areas of knowledge, the wider field of software development suffers from the problem of overlapping or overloaded terminology. The word <em>invariant</em> is just one of them. In this context, <em>invariant</em> doesn't refer to loop invariants, or any other kind of invariants used in algorithmic analysis.
</p>
<p>
As outlined in the <a href="/2024/06/12/simpler-encapsulation-with-immutability">introduction article</a>, when discussing <a href="/encapsulation-and-solid">encapsulation</a>, I follow <a href="/ref/oosc">Object-Oriented Software Construction</a> (OOSC). In that seminal work, <a href="https://en.wikipedia.org/wiki/Bertrand_Meyer">Bertrand Meyer</a> proposes the notion of design-by-contract, and specifically decomposes a contract into three parts: preconditions, invariants, and postconditions.
</p>
<p>
Having actually read the book, I'm well aware that it uses <a href="https://en.wikipedia.org/wiki/Eiffel_(programming_language)">Eiffel</a> as an exemplar of the concept. This has led many readers to conflate design-by-contract with Eiffel, and (in yet another logical derailment) conclude that it doesn't apply to, say, Java or C# programming.
</p>
<p>
It turns out, however, to transfer easily to other languages, and it's a concept with much practical potential.
</p>
<p>
A major problem with object-oriented design is that most ideas about good design are too 'fluffy' to be of immediate use to most developers. Take the <a href="https://en.wikipedia.org/wiki/Single-responsibility_principle">Single Responsibility Principle</a> (SRP) as an example. It's seductively easy to grasp the overall idea, but turns out to be hard to apply. Being able to identify <em>reasons to change</em> requires more programming experience than most people have. Or rather, the SRP is mostly useful to programmers who already have that experience. Being too 'fluffy', it's not a good learning tool.
</p>
<p>
I've spent quite some time with development organizations and individual programmers eager to learn, but struggling to find useful, concrete design rules. The decomposition of encapsulation into preconditions, invariants, and postconditions works well as a concrete, almost quantifiable heuristic.
</p>
<p>
Does it encompass everything that encapsulation means? Probably not, but it's by far the most effective heuristic that I've found so far.
</p>
<p>
Since I'm currently travelling, I don't have my copy of OOSC with me, but as far as I remember, the notion that an invariant should be true at all times originates there.
</p>
<p>
In any case, if an invariant doesn't always hold, then of what value is it? The whole idea behind encapsulation (as I read Meyer) is that client developers should be able to use 'objects' without having intimate knowledge of their implementation details. The use of <em>contracts</em> proposes to achieve that ideal by decoupling affordances from implementation details by condensing the legal protocol between object and client code into a contract. This means that a client developer, when making programming decisions, should be able to trust that certain guarantees stipulated by a contract always hold. If a client developer can't trust those guarantees, they aren't really guarantees.
</p>
<blockquote>
<p>
"the data structure invariant is only required to hold before and after each invocation of its public methods"
</p>
</blockquote>
<p>
I can see how a literal reading of OOSC may leave one with that impression. One must keep in mind, however, that the book was written in the eighties, at a time when multithreading wasn't much of a concern. (Incidentally, this is an omission that also mars a much later book on API design, the first edition of the .NET <a href="/ref/fdg">Framework Design Guidelines</a>.)
</p>
<p>
In modern code, concurrent execution is a real possibility, so it's at least worth keeping in mind. I'm still most familiar with the .NET ecosystem, and in it, there are plenty of classes that are documented as <em>not</em> being thread-safe. You could say that such a statement is part of the contract, in which case what you wrote is true: The invariant is only required to hold before and after each method invocation.
</p>
<p>
If, on the other hand, you want to make the code thread-safe, you must be more rigorous than that. Then an invariant must truly always hold.
</p>
<p>
This is, of course, a design decision one may take. Just don't bother with thread-safety if it's not important.
</p>
<p>
Still, the overall thrust of this article series is that immutability makes encapsulation much simpler. This is also true when it comes to concurrency. Immutable data structures are automatically thread-safe.
</p>
</div>
<div class="comment-date">2024-07-06 8:07 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
A failed attempt at priority collection with inheritance
https://blog.ploeh.dk/2024/06/17/a-failed-attempt-at-priority-collection-with-inheritance
2024-06-17T08:04:00+00:00
Mark Seemann
<div id="post">
<p>
<em>An instructive dead end.</em>
</p>
<p>
This article is part of <a href="/2024/06/12/simpler-encapsulation-with-immutability">a short series on encapsulation and immutability</a>. As the introductory article claims, object mutation makes it difficult to maintain invariants. In order to demonstrate the problem, I deliberately set out to do it wrong, and report on the result.
</p>
<p>
In subsequent articles in this series I will then show one way you can maintain the invariants in the face of mutation, as well as how much easier everything becomes if you choose an immutable design.
</p>
<p>
For now, however, I'll pretend to be naive and see how far I can get with that.
</p>
<p>
In the first article, I described the example problem in more detail, but in short, the exercise is to develop a class that holds a collection of prioritized items, with the invariant that the priorities must always sum to 100. It should be impossible to leave the object in a state where that's not true. It's quite an illuminating exercise, so if you have the time, you should try it for yourself before reading on.
</p>
<h3 id="e5060f80472a46a7b5aac3061558f993">
Initialization <a href="#e5060f80472a46a7b5aac3061558f993">#</a>
</h3>
<p>
In object-oriented design it's common to inherit from a base class. Since I'll try to implement a collection of prioritized items, it seems natural to inherit from <a href="https://learn.microsoft.com/dotnet/api/system.collections.objectmodel.collection-1">Collection<T></a>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">T</span>> : <span style="color:#2b91af;">Collection</span><<span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>>></pre>
</p>
<p>
Of course, I also had to define <code><span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>></code>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>>
{
<span style="color:blue;">public</span> <span style="color:#2b91af;">Prioritized</span>(<span style="color:#2b91af;">T</span> <span style="font-weight:bold;color:#1f377f;">item</span>, <span style="color:blue;">byte</span> <span style="font-weight:bold;color:#1f377f;">priority</span>)
{
Item = <span style="font-weight:bold;color:#1f377f;">item</span>;
Priority = <span style="font-weight:bold;color:#1f377f;">priority</span>;
}
<span style="color:blue;">public</span> <span style="color:#2b91af;">T</span> Item { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">byte</span> Priority { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
}</pre>
</p>
<p>
Since <code><span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>></code> is generic, it can be used to prioritize any kind of object. In the tests I wrote, however, I exclusively used strings.
</p>
<p>
A priority is a number between 0 and 100, so I chose to represent that with a <code>byte</code>. Not that this strongly protects invariants, because values can still exceed 100, but on the other hand, there's no reason to use a 32-bit integer to model a number between 0 and 100.
</p>
<p>
Now that I write this text, I realize that I could have added a Guard Clause to the <code><span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>></code> constructor to enforce that precondition, but as you can tell, I didn't think of doing that. This omission, however, doesn't change the conclusion, because the problems that we'll run into stem from another source.
</p>
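<p>
For reference, such a Guard Clause might look like the following sketch. It isn't part of the code base discussed here, and the exception type and message are assumptions. Note that since the <code>Priority</code> property keeps its public setter, the guard only protects construction.
</p>

```csharp
using System;

public sealed class Prioritized<T>
{
    public Prioritized(T item, byte priority)
    {
        // Guard Clause (not in the article's code): a byte can't be negative,
        // so only the upper bound needs checking.
        if (priority > 100)
            throw new ArgumentOutOfRangeException(
                nameof(priority),
                "A priority must be a number between 0 and 100.");

        Item = item;
        Priority = priority;
    }

    public T Item { get; set; }

    // The setter remains unguarded in this sketch.
    public byte Priority { get; set; }
}
```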
<p>
In any case, just inheriting from <code><span style="color:#2b91af;">Collection</span><<span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>>></code> isn't enough to guarantee the invariant that the sum of priorities must be 100. An invariant must always hold, even for a newly initialized object. Thus, we need something like this to ensure that this is the case:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">T</span>> : <span style="color:#2b91af;">Collection</span><<span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>>>
{
<span style="color:blue;">public</span> <span style="color:#2b91af;">PriorityCollection</span>(<span style="color:blue;">params</span> <span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>>[] <span style="font-weight:bold;color:#1f377f;">priorities</span>)
: <span style="color:blue;">base</span>(<span style="font-weight:bold;color:#1f377f;">priorities</span>)
{
<span style="font-weight:bold;color:#74531f;">AssertSumIsOneHundred</span>();
}
<span style="color:blue;">private</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">AssertSumIsOneHundred</span>()
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="color:blue;">this</span>.<span style="font-weight:bold;color:#74531f;">Sum</span>(<span style="font-weight:bold;color:#1f377f;">p</span> => <span style="font-weight:bold;color:#1f377f;">p</span>.Priority) != 100)
<span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">InvalidOperationException</span>(
<span style="color:#a31515;">"The sum of all priorities must be 100."</span>);
}
}</pre>
</p>
<p>
So far, there's no real need to have a separate <code>AssertSumIsOneHundred</code> helper method; I could have kept that check in the constructor, and that would have been simpler. I did, however, anticipate that I'd need the helper method in other parts of the code base. As it turned out, I did, but not without having to change it.
</p>
<h3 id="1326a8ef64124d138a084c4511f3899a">
Protecting overrides <a href="#1326a8ef64124d138a084c4511f3899a">#</a>
</h3>
<p>
The <code>Collection<T></code> base class offers normal collection methods like <a href="https://learn.microsoft.com/dotnet/api/system.collections.objectmodel.collection-1.add">Add</a>, <a href="https://learn.microsoft.com/dotnet/api/system.collections.objectmodel.collection-1.insert">Insert</a>, <a href="https://learn.microsoft.com/dotnet/api/system.collections.objectmodel.collection-1.remove">Remove</a> and so on. The default implementation allows client code to make arbitrary changes to the collection, including <a href="https://learn.microsoft.com/dotnet/api/system.collections.objectmodel.collection-1.clear">clearing it</a>. The <code><span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">T</span>></code> class can't allow that, because such edits could easily violate the invariants.
</p>
<p>
<code>Collection<T></code> is explicitly designed to be a base class, so it offers various <code>virtual</code> methods that inheritors can override to change the behaviour. In this case, this is necessary.
</p>
<p>
As it turned out, I quickly realized that I had to change my assertion helper method to check the invariant in various cases:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">static</span> <span style="color:blue;">void</span> <span style="color:#74531f;">AssertSumIsOneHundred</span>(<span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>>> <span style="font-weight:bold;color:#1f377f;">priorities</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">priorities</span>.<span style="font-weight:bold;color:#74531f;">Sum</span>(<span style="font-weight:bold;color:#1f377f;">p</span> => <span style="font-weight:bold;color:#1f377f;">p</span>.Priority) != 100)
<span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">InvalidOperationException</span>(
<span style="color:#a31515;">"The sum of all priorities must be 100."</span>);
}</pre>
</p>
<p>
Taking the sequence of <code>priorities</code> as an input argument enables me to simulate what would happen if I made a change to the actual collection, for example when adding an item to the collection:
</p>
<p>
<pre><span style="color:blue;">protected</span> <span style="color:blue;">override</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">InsertItem</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">index</span>, <span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">item</span>)
{
<span style="color:#74531f;">AssertSumIsOneHundred</span>(<span style="color:blue;">this</span>.<span style="font-weight:bold;color:#74531f;">Append</span>(<span style="font-weight:bold;color:#1f377f;">item</span>));
<span style="color:blue;">base</span>.<span style="font-weight:bold;color:#74531f;">InsertItem</span>(<span style="font-weight:bold;color:#1f377f;">index</span>, <span style="font-weight:bold;color:#1f377f;">item</span>);
}</pre>
</p>
<p>
By using <a href="https://learn.microsoft.com/dotnet/api/system.linq.enumerable.append">Append</a>, the <code>InsertItem</code> method creates a sequence of values that simulates what the collection would look like if we add the candidate <code>item</code>. The <code>Append</code> function returns a new sequence, so this operation doesn't change the actual <code><span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">T</span>></code>. That only happens if we get past the assertion and call <code>base.InsertItem</code>.
</p>
<p>
Likewise, I can protect the invariant in the other overrides:
</p>
<p>
<pre><span style="color:blue;">protected</span> <span style="color:blue;">override</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">RemoveItem</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">index</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">l</span> = <span style="color:blue;">this</span>.ToList();
<span style="font-weight:bold;color:#1f377f;">l</span>.RemoveAt(<span style="font-weight:bold;color:#1f377f;">index</span>);
<span style="color:#74531f;">AssertSumIsOneHundred</span>(<span style="font-weight:bold;color:#1f377f;">l</span>);
<span style="color:blue;">base</span>.<span style="font-weight:bold;color:#74531f;">RemoveItem</span>(<span style="font-weight:bold;color:#1f377f;">index</span>);
}
<span style="color:blue;">protected</span> <span style="color:blue;">override</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">SetItem</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">index</span>, Prioritized<<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">item</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">l</span> = <span style="color:blue;">this</span>.ToList();
<span style="font-weight:bold;color:#1f377f;">l</span>[<span style="font-weight:bold;color:#1f377f;">index</span>] = <span style="font-weight:bold;color:#1f377f;">item</span>;
<span style="color:#74531f;">AssertSumIsOneHundred</span>(<span style="font-weight:bold;color:#1f377f;">l</span>);
<span style="color:blue;">base</span>.<span style="font-weight:bold;color:#74531f;">SetItem</span>(<span style="font-weight:bold;color:#1f377f;">index</span>, <span style="font-weight:bold;color:#1f377f;">item</span>);
}</pre>
</p>
<p>
I can even use it in the implementation of <code>ClearItems</code>, although that may seem a tad redundant:
</p>
<p>
<pre><span style="color:blue;">protected</span> <span style="color:blue;">override</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">ClearItems</span>()
{
<span style="color:#74531f;">AssertSumIsOneHundred</span>([]);
}</pre>
</p>
<p>
I could also just have thrown an exception directly from this method, since it's never okay to clear the collection. Clearing it would violate the invariant, because the sum of an empty collection of priorities is zero.
</p>
<p>
As far as I recall, the entire API of <code>Collection<T></code> is (transitively) based on those four <code>virtual</code> methods, so now that I've protected the invariant in all four, the <code><span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">T</span>></code> class maintains the invariant, right?
</p>
<p>
Not yet. See if you can spot the problem.
</p>
<p>
There are, in fact, at least two remaining problems. One that we can recover from, and one that is insurmountable with this design. I'll get back to the serious problem later, but see if you can spot it already.
</p>
<h3 id="0179acb780534b558c947b35c7f9c137">
Leaf mutation <a href="#0179acb780534b558c947b35c7f9c137">#</a>
</h3>
<p>
In the <a href="/2024/06/12/simpler-encapsulation-with-immutability">introductory article</a> I wrote:
</p>
<blockquote>
<p>
"If the mutation happens on a leaf node in an object graph, the leaf may have to notify its parent, so that the parent can recheck the invariants."
</p>
</blockquote>
<p>
I realize that this may sound abstract, but the current code presents a simple example. What happens if you change the <code>Priority</code> of an item after you've initialized the collection?
</p>
<p>
Consider the following example. For various reasons, I wrote the examples (that is, the unit tests) for this exercise in <a href="https://fsharp.org/">F#</a>, but even if you're not an F# developer, you can probably understand what's going on. First, we create a <code><span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">string</span>></code> object and use it to initialize a <code><span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">string</span>></code> object <a href="/2020/11/30/name-by-role">named sut</a>:
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="font-weight:bold;color:#1f377f;">item</span> = <span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">string</span>> (<span style="color:#a31515;">"foo"</span>, 40uy)
<span style="color:blue;">let</span> <span style="color:#1f377f;">sut</span> = <span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">string</span>> (<span style="font-weight:bold;color:#1f377f;">item</span>, <span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">string</span>> (<span style="color:#a31515;">"bar"</span>, 60uy))</pre>
</p>
<p>
The <code>item</code> has a priority of <code>40</code> (the <code>uy</code> suffix is <a href="https://learn.microsoft.com/dotnet/fsharp/language-reference/literals">the F# way of stating that the literal is a <code>byte</code></a>), and the other unnamed value has a priority of <code>60</code>, so all is good so far; the sum is 100.
</p>
<p>
Since, however, <code>item</code> is a mutable object, we can now change its <code>Priority</code>:
</p>
<p>
<pre><span style="font-weight:bold;color:#1f377f;">item</span>.Priority <span style="color:blue;"><-</span> 50uy</pre>
</p>
<p>
This changes <code>item.Priority</code> to 50, but since none of the four <code>virtual</code> base class methods of <code>Collection<T></code> are involved, the <code>sut</code> never notices, the assertion never runs, and the object is now in an invalid state.
</p>
<p>
That's what I meant when I discussed mutations in leaf nodes. You can think of a collection as a rather flat and boring <a href="https://en.wikipedia.org/wiki/Tree_(data_structure)">tree</a>. The collection object itself is the root, each of the items is a leaf, and no further nesting is allowed.
</p>
<p>
When you edit a leaf, the root isn't automatically aware of such an event. You explicitly have to wire the object graph up so that this happens.
</p>
<h3 id="2edbce9f883448e89d118233bfceefef">
Event propagation <a href="#2edbce9f883448e89d118233bfceefef">#</a>
</h3>
<p>
One possible way to address this issue is to take advantage of .NET's event system. If you're reading along, but you normally write in another language, you can also use the <a href="https://en.wikipedia.org/wiki/Observer_pattern">Observer pattern</a>, or even <a href="https://reactivex.io/">ReactiveX</a>.
</p>
<p>
We need to have <code><span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>></code> raise events, and one option is to let it implement <a href="https://learn.microsoft.com/dotnet/api/system.componentmodel.inotifypropertychanging">INotifyPropertyChanging</a>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>> : <span style="color:#2b91af;">INotifyPropertyChanging</span></pre>
</p>
<p>
A <code><span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>></code> object can now raise its <code>PropertyChanging</code> event before accepting an edit:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">byte</span> Priority
{
<span style="color:blue;">get</span> => priority;
<span style="color:blue;">set</span>
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (PropertyChanging <span style="color:blue;">is</span> { })
PropertyChanging(
<span style="color:blue;">this</span>,
<span style="color:blue;">new</span> <span style="color:#2b91af;">PriorityChangingEventArgs</span>(<span style="color:blue;">value</span>));
priority = <span style="color:blue;">value</span>;
}
}</pre>
</p>
<p>
where <code>PriorityChangingEventArgs</code> is a little helper class that carries the proposed <code>value</code> around:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">PriorityChangingEventArgs</span>(<span style="color:blue;">byte</span> <span style="font-weight:bold;color:#1f377f;">proposal</span>)
: <span style="color:#2b91af;">PropertyChangingEventArgs</span>(<span style="color:blue;">nameof</span>(Priority))
{
<span style="color:blue;">public</span> <span style="color:blue;">byte</span> Proposal { <span style="color:blue;">get</span>; } = <span style="font-weight:bold;color:#1f377f;">proposal</span>;
}</pre>
</p>
<p>
A <code><span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">T</span>></code> object can now subscribe to that event on each of the values it keeps track of, so that it can protect the invariant against leaf node mutations.
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Priority_PropertyChanging</span>(<span style="color:blue;">object</span>? <span style="font-weight:bold;color:#1f377f;">sender</span>, <span style="color:#2b91af;">PropertyChangingEventArgs</span> <span style="font-weight:bold;color:#1f377f;">e</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="font-weight:bold;color:#1f377f;">sender</span> <span style="color:blue;">is</span> <span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>> <span style="font-weight:bold;color:#1f377f;">p</span> &&
<span style="font-weight:bold;color:#1f377f;">e</span> <span style="color:blue;">is</span> <span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>>.<span style="color:#2b91af;">PriorityChangingEventArgs</span> <span style="font-weight:bold;color:#1f377f;">pcea</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">l</span> = <span style="color:blue;">this</span>.<span style="font-weight:bold;color:#74531f;">ToList</span>();
<span style="font-weight:bold;color:#1f377f;">l</span>[<span style="font-weight:bold;color:#1f377f;">l</span>.<span style="font-weight:bold;color:#74531f;">IndexOf</span>(<span style="font-weight:bold;color:#1f377f;">p</span>)] = <span style="color:blue;">new</span> <span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>>(<span style="font-weight:bold;color:#1f377f;">p</span>.Item, <span style="font-weight:bold;color:#1f377f;">pcea</span>.Proposal);
<span style="color:#74531f;">AssertSumIsOneHundred</span>(<span style="font-weight:bold;color:#1f377f;">l</span>);
}
}</pre>
</p>
<p>
Such a solution comes with its own built-in complexity, because the <code><span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">T</span>></code> class must be careful to subscribe to the <code>PropertyChanging</code> event in various different places. A new <code><span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>></code> object may be added to the collection during initialization, or via the <code>InsertItem</code> or <code>SetItem</code> methods. Furthermore, the collection should make sure to unsubscribe from the event if an item is removed from the collection.
</p>
<p>
To be honest, I didn't bother to implement these extra checks, because the point is moot anyway.
</p>
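<p>
To give an impression of the bookkeeping involved, here's a rough sketch that isn't taken from the code base. The <code>Leaf</code> and <code>SubscribingCollection</code> types are hypothetical stand-ins, the invariant check is left out, and the handler only counts notifications instead of validating anything. The sketch also assumes that items are only added through the collection methods, and not through a constructor:
</p>

```csharp
#nullable enable
using System.Collections.ObjectModel;
using System.ComponentModel;

// Hypothetical stand-in for Prioritized<T>; it only raises the event.
public sealed class Leaf : INotifyPropertyChanging
{
    private byte priority;

    public Leaf(byte priority) { this.priority = priority; }

    public event PropertyChangingEventHandler? PropertyChanging;

    public byte Priority
    {
        get => priority;
        set
        {
            // Notify subscribers before accepting the new value.
            PropertyChanging?.Invoke(
                this, new PropertyChangingEventArgs(nameof(Priority)));
            priority = value;
        }
    }
}

// Hypothetical collection that only shows the subscription bookkeeping.
public sealed class SubscribingCollection : Collection<Leaf>
{
    // Counts notifications, so that the wiring can be observed.
    public int Notifications { get; private set; }

    private void OnChanging(object? sender, PropertyChangingEventArgs e)
    {
        Notifications++;
    }

    protected override void InsertItem(int index, Leaf item)
    {
        base.InsertItem(index, item);
        item.PropertyChanging += OnChanging;
    }

    protected override void SetItem(int index, Leaf item)
    {
        // Stop listening to the item that is being replaced.
        this[index].PropertyChanging -= OnChanging;
        base.SetItem(index, item);
        item.PropertyChanging += OnChanging;
    }

    protected override void RemoveItem(int index)
    {
        this[index].PropertyChanging -= OnChanging;
        base.RemoveItem(index);
    }

    protected override void ClearItems()
    {
        foreach (var item in this)
            item.PropertyChanging -= OnChanging;
        base.ClearItems();
    }
}
```

<p>
Every one of the four overrides has to take part, and forgetting to unsubscribe in just one of them leaves the collection listening to objects it no longer contains.
</p>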
<h3 id="bebd0dc43a03466e9709a551bbd563ba">
Fatal flaw <a href="#bebd0dc43a03466e9709a551bbd563ba">#</a>
</h3>
<p>
The design shown here comes with a fatal flaw. Can you tell what it is?
</p>
<p>
Since the invariant is that the priorities must always sum to exactly 100, it's impossible to add, remove, or change any items after initialization.
</p>
<p>
Or, rather, you can add new <code><span style="color:#2b91af;">Prioritized</span><<span style="color:#2b91af;">T</span>></code> objects as long as their <code>Priority</code> is 0. Any other value breaks the invariant.
</p>
<p>
Likewise, the only item you can remove is one with a <code>Priority</code> of 0. Again, if you remove an item with any other <code>Priority</code>, you'd be violating the invariant.
</p>
<p>
A similar situation arises with editing an existing item. While you can change the <code>Priority</code> of an item, you can only 'change' it to the same value. So you can change 0 to 0, 42 to 42, or 100 to 100, but that's it.
</p>
<p>
<em>But</em>, I can hear you say, <em>I'll only change 60 to 40 because I intend to add a new item with a 20 priority! In the end, the sum will be 100!</em>
</p>
<p>
Yes, but this design doesn't know that, and you have no way of telling it.
</p>
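<p>
To make the restriction concrete, consider a stripped-down sketch of the design. The <code>Entry</code>, <code>SketchCollection</code>, and <code>FatalFlawDemo</code> names are hypothetical stand-ins, and the constructor copies the incoming array to a list, which is an assumption made to keep the sketch self-contained. Only <code>InsertItem</code> is overridden here:
</p>

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

// Hypothetical, immutable stand-in for Prioritized<string>.
public sealed record Entry(string Item, byte Priority);

public sealed class SketchCollection : Collection<Entry>
{
    public SketchCollection(params Entry[] entries)
        : base(entries.ToList()) // copy to a growable list (assumption)
    {
        AssertSumIsOneHundred(this);
    }

    private static void AssertSumIsOneHundred(IEnumerable<Entry> entries)
    {
        if (entries.Sum(e => e.Priority) != 100)
            throw new InvalidOperationException(
                "The sum of all priorities must be 100.");
    }

    protected override void InsertItem(int index, Entry item)
    {
        // Check the candidate state before changing the real collection.
        AssertSumIsOneHundred(this.Append(item));
        base.InsertItem(index, item);
    }
}

public static class FatalFlawDemo
{
    // Reports whether an item with the given priority can be added to
    // a valid collection with priorities 40 and 60.
    public static bool CanAdd(byte priority)
    {
        var sut = new SketchCollection(
            new Entry("foo", 40), new Entry("bar", 60));
        try
        {
            sut.Add(new Entry("baz", priority));
            return true;
        }
        catch (InvalidOperationException)
        {
            return false;
        }
    }
}
```

<p>
With this sketch, <code>FatalFlawDemo.CanAdd(0)</code> returns <code>true</code>, while <code>FatalFlawDemo.CanAdd(20)</code> returns <code>false</code>, because 40 + 60 + 20 is 120. There's no sequence of single-step edits that moves 20 points from one item to a new one without passing through an invalid state.
</p>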
<p>
While we may be able to rectify the situation, I consider this design so compromised that I think it better to start afresh with this realization. Thus, I'll abandon this version of <code><span style="color:#2b91af;">PriorityCollection</span><<span style="color:#2b91af;">T</span>></code> in favour of a fresh start in the next article.
</p>
<h3 id="6f81f5d6efb5476b9072f9d8e5b01031">
Conclusion <a href="#6f81f5d6efb5476b9072f9d8e5b01031">#</a>
</h3>
<p>
While I've titled this article "A failed attempt", the actual purpose was to demonstrate how 'aggregate' requirements make it difficult to maintain class invariants.
</p>
<p>
I've seen many code bases with poor encapsulation. As far as I can tell, a major reason for that is that the usual 'small-scale' object-oriented design techniques like Guard Clauses fall short when an invariant involves the interplay of multiple objects. And in real business logic, that's the rule rather than the exception.
</p>
<p>
Not all is lost, however. In the next article, I'll develop an alternative object-oriented solution to the priority collection problem.
</p>
<p>
<strong>Next:</strong> <a href="/2024/06/24/a-mutable-priority-collection">A mutable priority collection</a>.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="b86cf94255c34db29d0a6275787f18d8">
<div class="comment-author">Daniel Frost <a href="#b86cf94255c34db29d0a6275787f18d8">#</a></div>
<div class="comment-content">
<p>
2 things.
</p>
<p>
I had a difficult time getting this to work as a mutable type and the only two things I could come up with (I spent some hours on it, it was in fact hard!) were
<br><br>
1. To throw an exception when the items in the collection didn't sum up to the budget. That violates the invariant because you can add and remove items all you want.
<br>
2. Another try, which I didn't finish, is to add some kind of result-object that could tell about the validity of the collection and not expose the collection items before the result is valid.
I haven't tried this and it doesn't resemble a collection but it could perhaps be a way to go.
<br>
I am also leaning towards a wrapper around the item type, making it immutable, so the items cannot change afterwards. Cheating ?
<br>
I tried the events approach but, as you put it yourself, it's not a very friendly type you end up with.
</p>
</div>
<div class="comment-date">2024-06-18 11:54 UTC</div>
</div>
<div class="comment" id="e9143a083d5449448e3bd69bfb8fde85">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#e9143a083d5449448e3bd69bfb8fde85">#</a></div>
<div class="comment-content">
<p>
Daniel, thank you for writing. You'll be interested in the next articles in the series, then.
</p>
</div>
<div class="comment-date">2024-06-18 13:55 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Simpler encapsulation with immutability
https://blog.ploeh.dk/2024/06/12/simpler-encapsulation-with-immutability
2024-06-12T15:33:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A worked example.</em>
</p>
<p>
I've noticed that many software organizations struggle with <a href="/encapsulation-and-solid">encapsulation</a> when it comes to 'bigger' problems. It may be easy enough to understand and apply the idea when you <a href="/2022/08/22/can-types-replace-validation">define a NaturalNumber type</a>, or ensure that a minimum value is less than a maximum value, and so on. How do you, however, guarantee invariants once the scope of the problem becomes bigger and more complex?
</p>
<p>
In this series of articles, I'll attempt to illustrate how and why this worthy design goal seems elusive, and what you can do to achieve it.
</p>
<h3 id="ff18a3c70d46435f9c301f20ce6bb830">
Contracts <a href="#ff18a3c70d46435f9c301f20ce6bb830">#</a>
</h3>
<p>
As usual, when I discuss <em>encapsulation</em>, I first need to establish what I mean by the term. It is, after all, one of the most misunderstood concepts in software development. As regular readers will know, I follow the lead of <a href="/ref/oosc">Object-Oriented Software Construction</a>. In that perspective, encapsulation is the appropriate modelling and application of <em>preconditions</em>, <em>invariants</em>, and <em>postconditions</em>.
</p>
<p>
Particularly when it comes to invariants, things seem to fall apart as the problem being modelled grows in complexity. Teams eventually give up guaranteeing any invariants, leaving client developers with no recourse but <a href="/2013/07/08/defensive-coding">defensive coding</a>, which again leads to code duplication, bugs, and maintenance problems.
</p>
<p>
If you need a reminder, an <em>invariant</em> is an assertion about an object that is always true. The more invariants an object has, the better guarantees it gives, and the more you can trust it. The more you can trust it, the less defensive coding you have to write. You don't have to check if return values are null, strings empty, numbers negative, collections empty, and so on.
</p>
<p>
<img src="/content/binary/contract-pre-post-invariant.png" alt="The three sets of preconditions, postconditions, and invariants, embedded in their common superset labeled contract.">
</p>
<p>
All together, I usually denote the collection of invariants, pre-, and postconditions as a type's <em>contract</em>.
</p>
<p>
For a simple example like modelling a natural number, or <a href="/2024/01/01/variations-of-the-range-kata">a range</a>, or a user name, most people are able to produce sensible and coherent designs. Once, however, the problem becomes more complex, and the invariants involve multiple interacting values, maintaining the contract becomes harder.
</p>
<h3 id="8d70ef6ac6054c87b920c972cf26b3a5">
Immutability to the rescue <a href="#8d70ef6ac6054c87b920c972cf26b3a5">#</a>
</h3>
<p>
I'm not going to bury the lede any longer. It strikes me that <em>mutation</em> is a major source of complexity. It's not that hard to check a set of conditions when you create a value (or object or record). What makes it hard to maintain invariants is that objects are allowed to change. This implies that for every possible change, the object needs to examine its current state in order to decide whether or not it should allow the operation.
</p>
<p>
If the mutation happens on a leaf node in an object graph, the leaf may have to notify its parent, so that the parent can recheck the invariants. If the <a href="https://en.wikipedia.org/wiki/Directed_graph">graph</a> has cycles it becomes more complicated still, and if you want to make the problem truly formidable, try making the object thread-safe.
</p>
<p>
Making the object immutable makes most of these problems go away. You don't have to worry about thread-safety, because immutable values are automatically thread-safe; there's no state for any thread to change.
</p>
<p>
Even better, though, is that an immutable object's contract is smaller and simpler. It still has preconditions, because there are rules that govern what has to be true before you can create such an object. Furthermore, there may also be rules that stipulate what must be true before you can call a method on it.
</p>
<p>
Likewise, postconditions are still relevant. If you call a method on the object, it may give you guarantees about what it returns.
</p>
<p>
There are, however, no independent invariants.
</p>
<p>
<img src="/content/binary/contract-pre-post.png" alt="The two sets of preconditions and postconditions, embedded in their common superset labeled contract.">
</p>
<p>
Or rather, the invariants for an immutable object entirely coincide with its preconditions. If it was valid at creation, it remains valid.
</p>
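<p>
As a minimal sketch of that idea, and explicitly not the design developed later in this series, consider a hypothetical <code>ImmutableBudget</code> class. The name, the <code>byte</code> representation, and the choice of <code>ArgumentException</code> are all assumptions made for illustration. The only check lives in the constructor:
</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical immutable type: the precondition is checked once, and
// since nothing can change afterwards, it doubles as the invariant.
public sealed class ImmutableBudget
{
    public ImmutableBudget(params byte[] priorities)
    {
        // Defensive copy, so that later changes to the caller's array
        // can't affect this object.
        var copy = priorities.ToArray();

        if (copy.Sum(p => (int)p) != 100)
            throw new ArgumentException(
                "The sum of all priorities must be 100.",
                nameof(priorities));

        // Expose the copy through a read-only wrapper.
        Priorities = Array.AsReadOnly(copy);
    }

    public IReadOnlyList<byte> Priorities { get; }
}
```

<p>
There are no overrides to protect and no events to subscribe to. Once <code>new ImmutableBudget(40, 60)</code> has returned, the sum can never be anything but 100.
</p>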
<h3 id="d281ea9242574c65bf85399178b2ce28">
Priority collection <a href="#d281ea9242574c65bf85399178b2ce28">#</a>
</h3>
<p>
As promised, I'll work through a problem to demonstrate what I mean. I'll first showcase how mutation makes the problem hard, and then how trivial it becomes with an immutable design.
</p>
<p>
The problem is this: Design and implement a class (or just a <em>data structure</em> if you don't want to do Object-Oriented programming) that models a priority list (not a <a href="https://en.wikipedia.org/wiki/Priority_queue">Priority Queue</a>) of the kind you sometimes run into in surveys. You know, one of those survey questions that ask you to distribute 100 points among various options:
</p>
<ul>
<li>Option F: 30%</li>
<li>Option A: 25%</li>
<li>Option C: 25%</li>
<li>Option E: 20%</li>
<li>Option B: 0%</li>
<li>Option D: 0%</li>
</ul>
<p>
If you have the time, I suggest that you <a href="/2020/01/13/on-doing-katas">treat this problem as a kata</a>. Try to do the exercise before reading the next articles in this series. You can assume the following, which is what I did.
</p>
<ul>
<li>The budget is 100. (You could make it configurable, but the problem is gnarly enough even with a hard-coded value.)</li>
<li>You don't need to include items with priority value 0, but you should allow it.</li>
<li>The sum of priorities must be exactly 100. This is the invariant.</li>
</ul>
<p>
The difficult part is that last invariant. Let me stress this requirement: At any time, the object should be in a consistent state; i.e. at any time, the sum of priorities should be exactly 100. Not 101 or 99, but 100. Good luck with that.
</p>
<p>
The object should also be valid at initialization.
</p>
<p>
Of course, having read this far, you understand that all you have to do is to make the object immutable, but just for the sake of argument, try designing a mutable object with this invariant. Once you've tried your hand with that, read on.
</p>
<h3 id="e891a12040df4d81957d50b9110a6957">
Attempts <a href="#e891a12040df4d81957d50b9110a6957">#</a>
</h3>
<p>
There's educational value in going through even failed attempts. When I thought of this example, I fairly quickly outlined in my head one approach that was unlikely to ever work, one that could work, and the nice immutable solution that trivially works.
</p>
<p>
I'll cover each in turn:
</p>
<ul>
<li><a href="/2024/06/17/a-failed-attempt-at-priority-collection-with-inheritance">A failed attempt at priority collection with inheritance</a></li>
<li><a href="/2024/06/24/a-mutable-priority-collection">A mutable priority collection</a></li>
<li><a href="/2024/07/01/an-immutable-priority-collection">An immutable priority collection</a></li>
</ul>
<p>
It's surprising how hard even a simple exercise like this one turns out to be, if you try to do it the object-oriented way.
</p>
<p>
In reality, business rules are much more involved than what's on display here. For only a taste of how bad it might get, read <a href="https://buttondown.email/hillelwayne/archive/making-illegal-states-unrepresentable/">Hillel Wayne's suggestions regarding a similar kind of problem</a>.
</p>
<h3 id="01740af0c0cc4afcac2cb2361689e209">
Conclusion <a href="#01740af0c0cc4afcac2cb2361689e209">#</a>
</h3>
<p>
If you've lived all your programming life with mutation as an ever-present possibility, you may not realize how much easier immutability makes everything. This includes invariants.
</p>
<p>
When you have immutable data, object graphs tend to be simpler. You can't easily define cyclic graphs (although <a href="https://www.haskell.org/">Haskell</a>, due to its laziness, surprisingly does enable this), and invariants essentially coincide with preconditions.
</p>
<p>
In the following articles, I'll show how mutability makes even simple invariants difficult to implement, and how immutability easily addresses the issue.
</p>
<p>
<strong>Next:</strong> <a href="/2024/06/17/a-failed-attempt-at-priority-collection-with-inheritance">A failed attempt at priority collection with inheritance</a>.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="667d3faaebff4009beefcd15612280ac">
<div class="comment-author">Marken Foo <a href="#667d3faaebff4009beefcd15612280ac">#</a></div>
<div class="comment-content">
<p>
I've been enjoying going through your articles in the past couple months, and I really like the very pedagogic treatment of functional programming and adjacent topics.
</p>
<p>
The kata here is an interesting one, but I don't think I'd link it with the concept of immutability/mutability. My immediate thought was a naïve struct that can represent illegal values and whose validity is managed through functions containing some tricky logic, but that didn't seem promising whether it was done immutably or not.
</p>
<p>
Instead, the phrase "distribute 100 points" triggered an association with the <a href="https://en.wikipedia.org/wiki/Stars_and_bars_(combinatorics)">stars and bars</a> method for similar problems. The idea is that we have N=100 points in a row, and insert dividers to break them into (numOptions) groups. Concretely, our data structure is (dividers: int array), which is a sorted array of length (numOptions + 1) where the first element is 0 and the last element is N=100. The priorities are then exactly the differences between adjacent elements of the array. The example in the kata (A=25, B=0, C=25, D=0, E=20, F=30) is then represented by the array [| 0; 25; 25; 50; 50; 70; 100|].
</p>
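<p>
A minimal Haskell sketch of this representation may make it concrete. The names <code>priorities</code> and <code>isValid</code> are hypothetical and don't come from any linked code. The sketch assumes that a divider list is valid when it starts at 0, ends at the budget, and never decreases.
</p>
<pre>
module Main where

-- Priorities are the gaps between adjacent dividers.
priorities :: [Int] -> [Int]
priorities dividers = zipWith (-) (drop 1 dividers) dividers

-- Assumed validity rule: first divider is 0, last divider is the budget,
-- and no gap is negative (that is, the list is sorted).
isValid :: Int -> [Int] -> Bool
isValid _ [] = False
isValid budget dividers@(first : _) =
  and [first == 0, last dividers == budget, all (>= 0) (priorities dividers)]

main :: IO ()
main = do
  let example = [0, 25, 25, 50, 50, 70, 100]
  print (priorities example)           -- [25,0,25,0,20,30]
  print (isValid 100 example)          -- True
  print (isValid 100 [0, 60, 40, 100]) -- False: dividers aren't sorted
</pre>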
<p>
This solution seems to respect the invariant, has a configurable budget, can work with other numerical types, and works well whether immutable or not (if mutable, just ensure the array remains sorted, has min 0, and max N). The invariant is encoded in the representation of the data, which seems to me to be the more relevant point than mutability.
</p>
<p>
And a somewhat disjoint thought, the kata reminded me of a WinForms TableLayoutPanel (or MS Word table) whose column widths all must fit within the container's width...
</p>
</div>
<div class="comment-date">2024-06-13 13:55 UTC</div>
</div>
<div class="comment" id="3b2e1954394849c4970a1ab30f692192">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#3b2e1954394849c4970a1ab30f692192">#</a></div>
<div class="comment-content">
<p>
Thank you for writing. The danger of writing these article series is always that as soon as I've published the first one, someone comes by and puts a big hole through my premise. Well, I write this blog for a couple of independent reasons, and one of them is to learn.
</p>
<p>
And you just taught me something. Thank you. That is, at least, an elegant implementation.
</p>
<p>
How would you design the API encapsulating that implementation?
</p>
<p>
Clearly, arrays already have APIs, so you could obviously define an array-like API that performs the appropriate boundary checks. That, however, doesn't seem to model the given problem. Rather, it reveals the implementation, and forces a client developer to think in terms of the data structure, rather than the problem (s)he has to solve.
</p>
<p>
Ideally, again channelling Bertrand Meyer, an object should present as an Abstract Data Type (ADT) that doesn't require client developers to understand the implementation details. I'm curious what such an API would look like.
</p>
<p>
You've already surprised me once, and please do so once again. I'm always happy to learn something new, and that little stars-and-bars concept I've now added to my tool belt.
</p>
<p>
All that said, this article makes a more general claim, although it's possible that the example it showcases is a tad too simple and naive to be a truly revealing one. The claim is that this kind of 'aggregate constraint' often causes so much trouble in the face of arbitrary state mutation that most programmers give up on encapsulation.
</p>
<p>
What happens if we instead expand the requirements a bit? Let's say that we will require the user to spend at least 90% of the budget, but no more than 100%. Also, there must be at least three prioritized items, and no individual item can receive more than a third of the budget.
</p>
</div>
<div class="comment-date">2024-06-14 14:22 UTC</div>
</div>
<div class="comment" id="7c5c157868fe46f3aae343e5e145a5eb">
<div class="comment-author">Marken Foo <a href="#7c5c157868fe46f3aae343e5e145a5eb">#</a></div>
<div class="comment-content">
<p>
Thank you for the response. Here are my thoughts - it's a bit of a wall of text, I might be wrong in any of the following, and the conclusion may be disappointing. When you ask how I'd design the API, I'd say it depends on how the priority list is going to be used. The implementation trick with stars and bars might just be a one-off trick that happens to work here, but it doesn't (shouldn't) affect the contract with the outside world.
</p>
<p>
If we're considering survey questions or budgets, the interest is in the priority values. So I think the problem then <em>is</em> about a list of priorities with an aggregate constraint. So I would define... an array-like API that performs the appropriate boundary checks (wow), but for the item priorities. My approach would be to go for "private data, public functions", and rely on a legal starting state and preserving the legality through the public API. In pseudocode:
</p>
<pre>
type PriorityList = { budget: int; dividers: int list }
create :: numItems: int -> budget: int -> PriorityList
// Returns priorities.
getAll :: plist: PriorityList -> int list
get :: itemIdx: int -> plist: PriorityList -> int
// *Sets the priority for an item (taking the priority from other items, starting from the back).
set :: itemIdx: int -> priority: int -> plist: PriorityList -> PriorityList
// *Adds a new item to (the end of) the PriorityList (with priority zero).
addItem :: plist: PriorityList -> PriorityList
// *Removes an item from the PriorityList (and assigns its priority to the last item).
removeItem :: itemIdx: int -> plist: PriorityList -> PriorityList
// Utility functions: see text
_toPriorities :: dividers: int list -> int list
_toDividers :: priorities: int list -> int list
</pre>
<p>
Crucially: since <code>set</code>, <code>addItem</code>, and <code>removeItem</code> must maintain the invariants, they must have "side effects" of altering other priorities. I think this is unavoidable here because we have aggregate/global constraints, rather than just elementwise/local constraints. (Is this why resizing rows and columns in WinForms TableLayoutPanels and MS Word tables is so tedious?) This will manifest in the API - the client needs to know what "side effects" there are (suggested behaviour in parentheses in the pseudocode comments above). See <a href="https://gist.github.com/Marken-Foo/d1e1a32afa91790f84f151c429c042cd">my crude attempt at implementation</a>.
</p>
<p>
You may already see where this is going. If I accept that boundary checks are needed, then my secondary goal in encapsulation is to express the constraints as clearly as possible, and hopefully not spread the checking logic all over the code.
</p>
<p>
Whence the utility functions: it turned out to be useful to convert from a list of dividers to priorities, and vice versa. This is because the elementwise operations/invariants like the individual priority values are easier to express in terms of raw priorities, while the aggregate ones like the total budget are easier in terms of "dividers" (the <em>cumulative</em> priorities). There is a runtime cost to the conversion, but the code becomes clearer. This smells similar to feature envy...
</p>
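<p>
As a rough illustration of the two utility functions (not the code from the linked gist), here's how the conversions could look in Haskell. The names mirror the pseudocode, but the implementations are an assumption: dividers are taken to be the running totals of the priorities, starting at 0.
</p>
<pre>
module Main where

-- Running totals, starting at 0: priorities to dividers.
toDividers :: [Int] -> [Int]
toDividers = scanl (+) 0

-- Adjacent differences: dividers to priorities.
toPriorities :: [Int] -> [Int]
toPriorities dividers = zipWith (-) (drop 1 dividers) dividers

main :: IO ()
main = do
  let ps = [25, 0, 25, 0, 20, 30]
  print (toDividers ps)                      -- [0,25,25,50,50,70,100]
  print (toPriorities (toDividers ps) == ps) -- True
</pre>
<p>
Each conversion is a single pass over the list, which is the runtime cost of switching between the two views.
</p>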
<p>
So why not just have the underlying implementation hold a list of priorities in the first place?! Almost everything in the implementation needs translation back to that anyway. D'oh! I refactored myself back to the naïve approach. The original representation seemed elegant, but I couldn't find a way to manipulate it that clients would find intuitive and useful in the given problem.
</p>
<p>
But... if I approach the design from the angle "what advantages does the cumulative priority model offer?", I might come up with the following candidate API functions, which could be implemented cleanly in the "divider" space:
</p>
<pre>
// (same type, create, get, getAll, addItem as above)
// Removes the item and merges its priority with the item before it.
merge :: itemIdx: int -> PriorityList
// Sets the priority of an item to zero and gives it to the item after it.
collapse :: itemIdx: int -> PriorityList
// Swaps the priority of an item and the one after it (e.g. to "bubble" a priority value forwards or backwards, although this is easier in the "priority" space)
swap :: itemIdx: int -> PriorityList
// Sets (alternative: adds to) the priority of an item, taking the priority from the items after it in sequence ("consuming" them in the forward direction)
consume :: itemIdx: int -> priority: int -> PriorityList
// Splits the item into 2 smaller items each with half the priority (could be generalised to n items)
split :: itemIdx: int -> PriorityList
// etc.
</pre>
<p>
And this seems like a more fitting API for that table column width example I keep bringing up. What's interesting to me is that despite the data structures of the budget/survey question and the table column widths being isomorphic, we can come up with rather different APIs depending on which view we consider. I think this is my main takeaway from this exploration, actually.
</p>
<p>
As for the additional requirements, individually each constraint is easy to handle, but their composition is tricky. If it's easy to transform an illegal PriorityList to make it respect the invariants, we can just apply the transformation after every create/set/add/remove. Something like:
</p>
<pre>
type PriorityList =
    { budget: int
      dividers: int list
      budgetCondition: int -> bool
      maxPriority: int
      minChoices: int }

let _enforceBudget (predicate: int -> bool) (defaultBudget: int) (dividers: int list) : int list =
    if (List.last dividers |> predicate) then
        dividers
    else
        List.take (dividers.Length - 1) dividers @ [ defaultBudget ]

let _enforceMaxPriority (maxPriority: int) (dividers: int list) : int list =
    _toPriorities dividers |> List.map (fun p -> min p maxPriority) |> _toDividers
<p>
The problem is that those transforms may not preserve each other's invariants. Life would be easy if we could write a single transform to preserve everything (I haven't found one - notice that the two above are operating on different int lists so it's tricky). Otherwise, we could write validations instead of transformations, then let create/set/add/remove fail by returning Option.None (explicitly fail) or the original list (silently fail). This comes at the cost of making the API less friendly.
</p>
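<p>
To make the clash concrete, here's a small Haskell sketch under simplifying assumptions. It works directly on priorities rather than dividers, and <code>capPriorities</code> and <code>spendsEnough</code> are hypothetical stand-ins for the max-priority transform and the budget condition (between 90% and 100% of the budget spent).
</p>
<pre>
module Main where

-- Clamp every priority to the allowed maximum.
capPriorities :: Int -> [Int] -> [Int]
capPriorities maxPriority = map (min maxPriority)

-- Assumed budget condition: between 90% and 100% of the budget is spent.
spendsEnough :: Int -> [Int] -> Bool
spendsEnough budget ps =
  let total = sum ps
  in and [10 * total >= 9 * budget, budget >= total]

main :: IO ()
main = do
  let before = [50, 30, 20]
      after = capPriorities 33 before
  print (spendsEnough 100 before) -- True
  print after                     -- [33,30,20]
  print (spendsEnough 100 after)  -- False: only 83 of 100 points are spent
</pre>
<p>
Capping the largest priority repairs one invariant, but the points it removes aren't redistributed, so the budget condition no longer holds.
</p>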
<p>
Ultimately with this approach I can't see a way to make all illegal states unrepresentable without sprinkling ad-hoc checks everywhere in the code. The advantages of the "cumulative priorities" representation I can think of are (a) it makes the total budget invariant obvious, and (b) it maps nicely to a UI where you click and drag segments around. Since you might have gone down a different path in the series, I'm curious to see how that shapes up.
</p>
</div>
<div class="comment-date">2024-06-15 14:48 UTC</div>
</div>
<div class="comment" id="595bcdcf19e446c7a23531b93b6d5a1c">
<div class="comment-author">Aliaksei Saladukhin <a href="#595bcdcf19e446c7a23531b93b6d5a1c">#</a></div>
<div class="comment-content">
<p>
Hello and thank you for your blog. It is really informative and provides great food for thought.
</p>
<p>
What if it were impossible to compile and run a program that would lead to an illegal (list) state?
</p>
<p>
I've tried to implement a priority collection in Rust, and what I've ended up with is a heterogeneous priority list with compile-time priority validation.
The idea behind this implementation is simple: you declare a recursive generic struct, which holds the current element and a tail (another list or the unit type).
</p>
<pre>
struct PriorityList<const B: usize, const P: usize, H, T> {
    head: H,
    tail: T,
}
</pre>
<p>
If, for example, we need a list of two Strings with budget 100 and a 30/70 priority split, it will have the following type:
<code>PriorityList<100, 30, String, PriorityList<100, 70, String, ()>></code>
Note that information about the list budget and the current element's priority is contained in the generic arguments B and P, respectively.
These are compile-time "variables", and will be replaced by their values in the compiled program.
</p>
<p>
Since each element of such a list is a list itself, and the budget is the same for each element, all elements except the first are invalid priority lists.
So, in order to make it possible to create lists other than those containing one element, or only one element with >0 priority, the validity check should be targeted and deferred.
In order to target invariant validation at the first element of the list, I've included validation in the list methods (except the set_priority method).
Every time a list method is called, the compiler recursively computes the priority sum and compares it with the list budget, giving a compile-time error if there is a mismatch.
Consider the following example, which will compile and run:
</p>
<pre>
let list = ListBuilder::new::<10, 10>("Hello");
let list = list.set_priority::<5>();
</pre>
<p>
It seems as though the invariants have been violated, since the sum of priorities is less than the budget.
But if we try to manipulate this list in any way other than adding an element or changing a priority, the program won't compile:
</p>
<pre>
// Won't compile
let _ = list.pop();
// Won't compile
let list = list.push::<4>("Hi");
// Will do
let list = list.push::<5>("Hello there");
</pre>
<p>
This implementation may not be as practical as it could be due to verbose compilation error messages, but it is a good showcase and exercise.
I've also uploaded the full source code to GitLab: https://gitlab.com/studiedlist/priority-collection
</p>
</div>
<div class="comment-date">2024-06-18 08:47 UTC</div>
</div>
<div class="comment" id="7f2ed4d386144b378cae2206e8269a6d">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#7f2ed4d386144b378cae2206e8269a6d">#</a></div>
<div class="comment-content">
<p>
Marken, thank you for writing. It's always interesting to learn new techniques, and, as I previously mentioned, the array-based implementation certainly seems to <a href="https://blog.janestreet.com/effective-ml-video/">make illegal states unrepresentable</a>. And then, as we'll see in the last (yet unpublished) article in this little series, if we also make the data structure immutable, we'll have a truly simple and easy-to-understand API to work with.
</p>
<p>
I've tried experimenting with the <a href="https://fsharp.org/">F#</a> script you linked, but I must admit that I'm having trouble understanding how to use it. You did write that it was a crude attempt, so I'm not complaining, but on the other hand, it doesn't work well as an example of good encapsulation. The following may seem as though I'm moving the goalpost, so apologies for that in advance.
</p>
<p>
Usually, when I consult development organizations about software architecture, the failure to maintain invariants is so fundamental that I usually have to start with that problem. That's the reason that this article series is so narrow-mindedly focused on contract, and seemingly not much else. We must not, though, lose sight of what ultimately motivates us to consider encapsulation beneficial. This is what I've tried to outline in <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>: That the human brain is ill-suited to keep all implementation details in mind at the same time. One way we may attempt to address this problem is to hide implementation details behind an API which, additionally, comes with some guarantees. Thus (and this is where you may, reasonably, accuse me of moving the goal post), not only should an object fulfil its contract, it should also be possible to interact with its API without understanding implementation details.
</p>
<p>
The API you propose seems to have problems, some of which may be rectifiable:
</p>
<ul>
<li>At a fundamental level, it's not really clear to me how to use the various functions in the script file.</li>
<li>The API doesn't keep track of <em>what</em> is being prioritized. This could probably be fixed.</li>
<li>It's not clear whether it's possible to transition from one arbitrary valid distribution to another arbitrary valid distribution.</li>
</ul>
<p>
I'll briefly expand on each.
</p>
<p>
As an example of the API being less than clear to me, I can't say that I understand what's going on here:
</p>
<p>
<pre>> create 1 100 |> set 1 50 |> addItem |> set 1 30;;
val it: PriorityList = { budget = 100
dividers = [0; 50; 100] }</pre>
</p>
<p>
As for what's being prioritized, you could probably mend that shortcoming by letting the array be an array of tuples.
</p>
<p>
The last part I'm not sure of, but you write:
</p>
<blockquote>
<p>
"Crucially: since <code>set</code>, <code>addItem</code>, and <code>removeItem</code> must maintain the invariants, they must have "side effects" of altering other priorities."
</p>
</blockquote>
<p>
As <a href="/2024/06/24/a-mutable-priority-collection">the most recent article in this series demonstrates</a>, this isn't an overall limitation imposed by the invariant, but rather by your chosen API design. Specifically, assuming that you initially have a <em>23, 31, 46</em> distribution, how do you transition to a <em>19, 29, 43, 7, 2</em> distribution?
</p>
</div>
<div class="comment-date">2024-06-27 6:42 UTC</div>
</div>
<div class="comment" id="388933ccbe5a4c71ba0b443a223e08ca">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#388933ccbe5a4c71ba0b443a223e08ca">#</a></div>
<div class="comment-content">
<p>
Aliaksei, thank you for writing. I've never programmed in Rust, so I didn't know it had that capability. At first I thought it was dependent typing, but after reading up on it, it seems as though it's not quite that.
</p>
<p>
An exercise like the one in this article series is useful because it can help shed light on options and their various combinations of benefits and drawbacks. Thus, there are no entirely right or wrong solutions to such an exercise.
</p>
<p>
Since I don't know Rust, I can't easily distinguish what might be possible drawbacks here. I usually regard making illegal states unrepresentable as a benefit, but we must always be careful not to go too far in that direction. One thing is to reject invalid states, but can we still represent all valid states? What if priority distributions are run-time values?
</p>
</div>
<div class="comment-date">2024-06-28 7:21 UTC</div>
</div>
</div>
<hr>
You'll regret using natural keys
https://blog.ploeh.dk/2024/06/03/youll-regret-using-natural-keys
2024-06-03T19:46:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Beating another dead horse.</em>
</p>
<p>
Although I live in Copenhagen and mostly walk or ride my bicycle in order to get around town, I do own an old car for getting around the rest of the country. In Denmark, cars go through mandatory official inspection every other year, and I've been through a few of these in my life. A few years ago, the mechanic doing the inspection informed me that my car's <a href="https://en.wikipedia.org/wiki/Vehicle_identification_number">chassis number</a> was incorrect.
</p>
<p>
This did make me a bit nervous, because I'd bought the car used, and I was suddenly concerned that things weren't really as I thought. Had I unwittingly bought a stolen car?
</p>
<p>
But the mechanic just walked over to his computer in order to correct the error. That's when a different kind of unease hit me. When you've programmed for some decades, you learn to foresee various typical failure modes. Since a chassis number is an obvious candidate for a <a href="https://en.wikipedia.org/wiki/Natural_key">natural key</a>, I already predicted that changing the number would prove to be either impossible, or have all sorts of cascading effects, ultimately terminating in official records no longer recognizing that the car is mine.
</p>
<p>
As it turned out, though, whoever made that piece of software knew what they were doing, because the mechanic just changed the chassis number, and that was that. This is now five or six years ago, and I still own the same car, and I've never had any problems with the official ownership records.
</p>
<h3 id="8a38dcbfae1146f8868fe8a408ffe5d8">
Uniqueness <a href="#8a38dcbfae1146f8868fe8a408ffe5d8">#</a>
</h3>
<p>
The reason I related this story is that I'm currently following an undergraduate course in databases and information systems. Since this course is aimed at students with no real-world experience, it wisely moves forward in a pedagogical progression. In order to teach database keys, it starts with natural keys. From a didactic perspective, this makes sense, but the result, so far, is that the young people I work with now propose database designs with natural keys.
</p>
<p>
I'm not blaming anyone. You have to learn to crawl before you can walk.
</p>
<p>
Still, this situation made me reflect on the following question: <em>Are natural keys ever a good idea?</em>
</p>
<p>
Let's consider an example. For a little project we're doing, we've created a database of <a href="https://www.theworlds50best.com/">the World's 50 best restaurants</a>. My fellow students suggest a table design like this:
</p>
<p>
<pre><span style="color:blue;">CREATE</span> <span style="color:blue;">TABLE</span> Restaurants<span style="color:blue;"> </span><span style="color:gray;">(</span>
<span style="color:magenta;">year</span> <span style="color:blue;">TEXT</span> <span style="color:gray;">NOT</span> <span style="color:gray;">NULL,</span>
<span style="color:magenta;">rank</span> <span style="color:blue;">TEXT</span> <span style="color:gray;">NOT</span> <span style="color:gray;">NULL,</span>
restaurantName <span style="color:blue;">TEXT</span> <span style="color:gray;">NOT</span> <span style="color:gray;">NULL,</span>
cityName <span style="color:blue;">TEXT</span> <span style="color:gray;">NOT</span> <span style="color:gray;">NULL</span>
<span style="color:gray;">);</span></pre>
</p>
<p>
Granted, at this point, this table definition defines no key at all. I'm not complaining about that. After all, a month ago, the students probably hadn't seen a database table.
</p>
<p>
From following the course curriculum, it'd be natural, however, to define a key for the <code>Restaurants</code> table as the combination of <code>restaurantName</code>, <code>cityName</code>, and <code>year</code>. The assumption is that name and city uniquely identify a restaurant.
</p>
<p>
In this particular example, this assumption may actually turn out to hold. So far. After all, the data set isn't that big, and it's important for restaurants in that league to have recognizable names. If I had to guess, I'd say that there's probably only one <a href="https://www.nobelhartundschmutzig.com/">Nobelhart & Schmutzig</a> in the world.
</p>
<p>
Still, a good software architect should challenge the underlying assumptions. Is name and city a <em>natural</em> key? It's easy to imagine that it's not. What if we expand the key to include the country as well? Okay, but what if we had a restaurant named <em>China Wok</em> in Springfield, USA? Hardly unique. Add the state, you say? Probably still not unique.
</p>
<h3 id="778bd217e62d4ffca25f487a39d34c1a">
Identity <a href="#778bd217e62d4ffca25f487a39d34c1a">#</a>
</h3>
<p>
Ensuring uniqueness is only the first of many problems with natural keys. You may quickly reach the conclusion that for a restaurant database, a <a href="https://en.wikipedia.org/wiki/Surrogate_key">synthetic key</a> is probably the best choice.
</p>
<p>
But what about 'natural' natural keys, so to speak? An example may be a car's chassis number. This is already an opaque number, and it probably originates from a database somewhere. Or how about a personal identification number? In Denmark we have the <a href="https://en.wikipedia.org/wiki/Personal_identification_number_(Denmark)">CPR number</a>, and I understand that the US <a href="https://en.wikipedia.org/wiki/Social_Security_number">Social Security Number</a> is vaguely analogous.
</p>
<p>
If you're designing a database that already includes such a personal identification number, you might be tempted to use it as a natural key. After all, it's already a key somewhere else, so it's guaranteed to be unique, right?
</p>
<p>
Yes, the number may uniquely identify a person, but the converse may not be true. A person may have more than one identification number. At least when time is a factor.
</p>
<p>
As an example, for technical-historical reasons, the Danish CPR number carries information (which keys shouldn't do), such as a person's date of birth and sex. Since 2014 a new law enables transsexual citizens to get a new CPR number that reflects their perceived gender. The consequence is that the same person may have more than one CPR number. Perhaps not more than one at the same time, but definitely two during a lifetime.
</p>
<p>
Even if existing keys are guaranteed to be unique, you can't assume that the uniqueness gives rise to a <a href="https://en.wikipedia.org/wiki/Bijection">bijection</a>. If you use an external unique key, you may lose track of the entities that you're trying to keep track of.
</p>
<p>
This is true not only for people, but cars, bicycles (which also have chassis numbers), network cards, etc.
</p>
<h3 id="93d43ca6c76a4c6c83e9542bb671f39c">
Clerical errors <a href="#93d43ca6c76a4c6c83e9542bb671f39c">#</a>
</h3>
<p>
Finally, even if you've found a natural key that is guaranteed to be unique <em>and</em> track the actual entity that you want to keep track of, there's a final argument against using an externally defined key in your system: Data-entry errors.
</p>
<p>
Take the story about my car's chassis number. The mechanic who spotted the discrepancy clearly interpreted it as a clerical error.
</p>
<p>
After a few decades of programming, I've learned that sooner or later, there <em>will</em> be errors in your data. Either it's a clerical error, or the end-user mistyped, or there was a data conversion error when importing from an external system. Or even data conversion errors within the <em>same</em> system, as it goes through upgrades and migrations.
</p>
<p>
Your system should be designed to allow corrections to data. This includes corrections of external keys, such as chassis numbers, government IDs, etc. This means that you can't use such keys as database keys in your own system.
</p>
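<p>
As a loose illustration in Haskell rather than SQL, consider a sketch where a car is stored under a synthetic key and the chassis number is an ordinary, correctable attribute. All names here (<code>CarId</code>, <code>Car</code>, <code>correctChassisNumber</code>) and the sample values are made up for the example, which assumes the <code>containers</code> package that ships with GHC.
</p>
<pre>
module Main where

import qualified Data.Map.Strict as Map

-- Synthetic key: carries no information and never changes.
newtype CarId = CarId Int deriving (Eq, Ord, Show)

-- The chassis number is plain data, not an identifier in this system.
data Car = Car
  { chassisNumber :: String
  , owner :: String
  } deriving (Eq, Show)

-- Correct a data-entry error without touching the key.
correctChassisNumber :: CarId -> String -> Map.Map CarId Car -> Map.Map CarId Car
correctChassisNumber carId corrected =
  Map.adjust (\car -> car { chassisNumber = corrected }) carId

main :: IO ()
main = do
  let cars = Map.fromList [(CarId 1, Car "ABC123" "Some Owner")]
      fixed = correctChassisNumber (CarId 1) "ABC128" cars
  print (fmap chassisNumber (Map.lookup (CarId 1) fixed)) -- Just "ABC128"
  print (fmap owner (Map.lookup (CarId 1) fixed))         -- Just "Some Owner"
</pre>
<p>
Anything that refers to <code>CarId 1</code> still finds the same car after the correction, because the key itself was never part of the data being fixed.
</p>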
<h3 id="ca9efd721698452f856b089fc3f69ad1">
Heuristic <a href="#ca9efd721698452f856b089fc3f69ad1">#</a>
</h3>
<p>
Many were the times, earlier in my career, when I decided to use a 'natural key' as a key in my own database. As far as I recall, I've regretted it every single time.
</p>
<p>
These days I follow a hard heuristic: Always use synthetic keys for database tables.
</p>
<h3 id="c174062e2ad14b6a9ff4056b1de80a0f">
Conclusion <a href="#c174062e2ad14b6a9ff4056b1de80a0f">#</a>
</h3>
<p>
Is it ever a good idea to use natural keys in a database design? My experience tells me that it's not. Ultimately, regardless of how certain you can be that the natural key is stable and correctly tracks the entity that it's supposed to keep track of, data errors will occur. This includes errors in those natural keys.
</p>
<p>
You should be able to correct such errors without losing track of the involved entities. You'll regret using natural keys. Use synthetic keys.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="63e21d2f538e4d67a30094e33ff507a7">
<div class="comment-author"><a href="http://snape.me">James Snape</a> <a href="#63e21d2f538e4d67a30094e33ff507a7">#</a></div>
<div class="comment-content">
<p>
There are lots of different types of keys. I agree that using natural keys as physical primary keys is a bad idea but you really should be modelling your data logically with natural keys.
Thinking about uniqueness and identity is a part of your data design. Natural keys often end up as constraints, indexes and query plans.
When natural keys are not unique enough then you need to consider additional attributes in your design to ensure access to a specific record.
</p>
<p>
Considering natural keys during design can help elicit additional requirements and business rules. "Does a social security number uniquely identify a person? If not why?"
In the UK they recycle them so the natural key is a combination of national insurance number and birth year. You have to ask questions.
</p>
</div>
<div class="comment-date">2024-06-04 15:43 UTC</div>
</div>
<div class="comment" id="82ee5d39a4d34539899ddaf13d1336a9">
<div class="comment-author"><a href="https://thomascastiglione.com">Thomas Castiglione</a> <a href="#82ee5d39a4d34539899ddaf13d1336a9">#</a></div>
<div class="comment-content">
<img src="/content/binary/you-will-regret-this.png" border="0">
</div>
<div class="comment-date">2024-06-05 9:33 UTC</div>
</div>
<div class="comment" id="92ee5d39a4d34539899ddaf13d1336b7">
<div class="comment-author"><a href="https://processdecision.com">Nicholas Peterson</a> <a href="#92ee5d39a4d34539899ddaf13d1336b7">#</a></div>
<div class="comment-content">
<p>
I largely agree with James Snape, but wanted to throw in a few other thoughts on top.
Surrogates don't defend you from duplicate data, in fact they facilitate it, because the routine generating the surrogate key isn't influenced by any of the other data in the record.
The concept of being unable to correct a natural key is also odd, why can't you? Start a transaction, insert a new record with the correct key, update the related records to point to the new record, then delete the old record, done.
Want some crucial information about a related record but only have the surrogate to it? I guess you have to join it every time in order to get the columns the user actually wants to see.
A foreign key that uses a natural key often prevents the join entirely, because it tells the user what they wanted to know.
</p>
<p>
I find the problem with natural keys usually comes from another source entirely.
Developers write code and don't tend to prefer using SQL.
They typically interact with databases through ORM libraries.
ORMs are complicated and rely on conventions to uniformly deal with data.
It's not uncommon for ORMs to dictate the structure of tables to some degree, or what datatypes to prefer.
It's usually easier in an ORM to have a single datatype for keys (BIGINT?) and use it uniformly across all the tables.
</p>
</div>
<div class="comment-date">2024-06-05 12:42 UTC</div>
</div>
<div class="comment" id="2960b65bbaec4db8ade70f551e3f5062">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#2960b65bbaec4db8ade70f551e3f5062">#</a></div>
<div class="comment-content">
<p>
James, Nicholas, thank you for writing. I realize that there are some unstated assumptions and implied concerns that I should have made more explicit. I certainly have no problem with adding constraints and other rules to model data. For the Danish CPR number, for example, while I wouldn't make it a primary key (for the reasons outlined in the article), I'd definitely put a <code>UNIQUE</code> constraint on it.
</p>
<p>
Another unspoken context that I had in mind is that systems often exist in a wider context where ACID guarantees fall apart. I suppose it's true that if you look at a database in isolation, you may be able to update a foreign key with the help of some cascading changes rippling through the database, but if you've ever shared the old key outside of the database, you now have orphaned data.
</p>
<p>
A simple example could be sending out an email with a link that embeds the old key. If you change the key after sending out the email, but before the user clicks, the link no longer works.
</p>
<p>
That's just a simple and easy-to-explain example. The more integration (particularly system-to-system integration) you have, the worse this kind of problem becomes. I briefly discussed the CPR number example with my doctor wife, and she immediately confirmed that this is a real problem in the Danish health sector, where many independent software systems need to exchange patient data.
</p>
<p>
You can probably work around such problems in various ways, but if you had avoided using natural keys, you wouldn't have had to change the key in the first place.
</p>
</div>
<div class="comment-date">2024-06-06 6:56 UTC</div>
</div>
<div class="comment" id="f7d04f04aade40d1b668c94a56c7c189">
<div class="comment-author"><a href="https://github.com/bantling">Greg Hall</a> <a href="#f7d04f04aade40d1b668c94a56c7c189">#</a></div>
<div class="comment-content">
<p>
I think it is best to have two separate generated keys for each row:
</p>
<ul>
<li>A key used only for relationships between tables. I like to call this relid, and make it serialised, so it
is just an increasing number. This key is the primary key and should never be exposed outside the database.
</li>
<li>A key used only outside the database as a unique reference to which row to update. I like to call this id,
and make it a uuid, since it is well accepted to uniquely identify rows by a uuid, and to expose them to
the outside world - many public APIs do this. Theoretically, the same uuid should never be generated twice,
so this key doesn't necessarily have to be declared as unique.
</li>
</ul>
<p>
The relid can be used in simple foreign keys, and in bridging/join tables - tables that contain primary keys of
multiple tables. Generally speaking, the relid is far more readable than a uuid - it is easier to hold in your head
a simple integer, which usually is not that large, than a 36 character sequence that looks similar to other 36
character sequences. UUIDs generally look like a jumble.
</p>
<p>
A relid can be 32-bits for tables you're confident will never need more than 2.1 billion rows, which really is
99.99% of all tables ever created by 99.99% of applications. If this turns out to be wrong, it is possible to
upgrade the relids to 64-bit for a given table. It's a bit of a pain, especially if there are lots of references to
it, but it can be done.
</p>
<p>
The relid doesn't always have to be a serialised value, and you don't always have to call the column relid. Since
the primary key is never exposed publicly, it doesn't matter if different column types or names are used for different
use cases. For example, code tables might use one of the codes as the primary key.
</p>
<p>
I don't think it makes sense to be religious on key usage; just like everything else, there are valid reasons for
periodically varying how they work. I'm sure somebody has a valid case where a single key is better than two.
I just think it generally makes sense to have a pair of internal and external keys for most cases.
</p>
</div>
<div class="comment-date">2024-06-07 3:31 UTC</div>
</div>
<div class="comment" id="7a1067a9e6fb4b6293777a1408518429">
<div class="comment-author"><a href="http://snape.me">James Snape</a> <a href="#7a1067a9e6fb4b6293777a1408518429">#</a></div>
<div class="comment-content">
<p>
The thing with database keys is you really need to be precise on what you mean by a key. Any combination of attributes is a candidate key. There are also logical and physical representations of keys. For example, a SQL Server primary key is a physical record locator but logically a unique key constraint. Yes, these behave poorly when you use natural keys as the primary key for all the reasons you mention. They are a complete implementation detail. Users should never see these attributes though and you shouldn't share the values outside of your implementation. Sharing integer surrogate keys in URLs is a classic issue allowing enumeration attacks on your data if not secured properly.
</p>
<p>
Foreign keys are another logical and physical dual use concept. In SQL Server a physical foreign key constraint must reference the primary key from a parent table but logically that doesn't need to happen for relational theory to work.
</p>
<p>
Alternate keys are combinations of attributes that identify a record (or many records); these are often the natural keys you use in your user interface and where clauses etc. Alternate keys are also how systems communicate. Take your CPR number example, you cannot exchange patient data unless both systems agree on a common key. This can't be an internally generated surrogate value.
</p>
<p>
Natural keys also serve another purpose in parent-child relationships. By sharing natural key attributes with a parent you can ensure a child is not accidentally moved to a new parent plus you can query a child table without needing to join to the parent table.
</p>
<p>There isn't a one-size-fits-all when it comes to databases and keys. <a href="https://en.wikipedia.org/wiki/Joe_Celko">Joe Celko</a> has written extensively on the subject so maybe it's better to read the following than my small commentary:
<ul>
<li><a href="https://www.informationweek.com/it-sectors/celko-on-sql-natural-artificial-and-surrogate-keys-explained">Celko on SQL: Natural, Artificial and Surrogate Keys Explained</a></li>
<li><a href="https://www.informationweek.com/data-management/celko-on-sql-identifiers-and-the-properties-of-relational-keys">Celko On SQL: Identifiers and the Properties of Relational Keys</a></li>
</ul>
</p>
</div>
<div class="comment-date">2024-06-07 09:57 UTC</div>
</div>
<div class="comment" id="3ef814a896f34b5485aeeea739766fa9">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#3ef814a896f34b5485aeeea739766fa9">#</a></div>
<div class="comment-content">
<p>
Greg, thank you for writing. I agree with everything you wrote, and I've been using that kind of design for... wow, at least a decade, it looks, albeit <a href="/2014/08/11/cqs-versus-server-generated-ids">for a slightly different reason</a>. This kind of design seems, even if motivated by a different concern, congruent with what you describe.
</p>
<p>
Like you also imply, only a Sith speaks in absolutes. The irony of the article is that I originally intended it to be more open-ended, in the sense that I was curious if there were genuinely good reasons to use natural keys. As I wrote it, the article turned out more unconditional than I originally had in mind.
</p>
<p>
I am, in reality, quite ready to consider arguments to the contrary. But really, I was curious: <em>Is it ever a good idea to use natural keys as primary keys?</em> It sounds like a rhetorical question, but I don't mind if someone furnishes a counter-example.
</p>
<p>
As <a href="#92ee5d39a4d34539899ddaf13d1336b7">Nicholas Peterson intimated</a>, it's probably not a real problem if those keys never 'leave' the database. What I failed to make explicit in this article is that the problems I've consistently run into occur when a system has shared keys with external systems or users.
</p>
</div>
<div class="comment-date">2024-06-14 11:26 UTC</div>
</div>
<div class="comment" id="7f7d8205e5174c61b1b5e3d19482c0ab">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#7f7d8205e5174c61b1b5e3d19482c0ab">#</a></div>
<div class="comment-content">
<p>
James, thank you for writing. I think we're discussing issues at different levels of abstraction. This just underscores how difficult technical writing is. I should have made my context and assumptions more explicit. The error is mine.
</p>
<p>
Everything you write sounds correct to me. I <em>am</em> aware of both relational calculus and relational algebra, so I'm familiar with the claims you make, and I don't dispute them.
</p>
<p>
My focus is rather on systems architecture. Even an 'internal' system may actually be composed from multiple independent systems, and my concern is that using natural keys to exchange data between such systems ultimately turns out to make things more difficult than they could have been. The only statement of yours with which I think I disagree is that you can't exchange data between systems unless you use natural keys. You definitely can, although you need to appoint one of the systems to be a 'master key issuer'.
</p>
<p>
In practice, <a href="#f7d04f04aade40d1b668c94a56c7c189">like Greg Hall</a>, I'd prefer using GUIDs for that purpose, rather than sequential numbers. That also addresses the concern about enumeration attacks. (Somewhat tangentially, I also <a href="/2020/10/26/fit-urls">recommend signing URLs with a private key</a> in order to prevent reverse-engineering, or 'URL-hacking'.)
</p>
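<p>
As a rough illustration of the idea of signing URLs, here's a minimal sketch in Python. It assumes an HMAC-SHA256 signature with a shared secret key, appended as a hypothetical <code>sig</code> query parameter; the function names and the secret are invented for the example, and the linked article may use a different scheme.
</p>

```python
import hashlib
import hmac

# Hypothetical secret; in a real system it would come from configuration.
SECRET_KEY = b"not-a-real-secret"


def sign_url(url):
    """Append an HMAC-SHA256 signature of the URL as a 'sig' query parameter."""
    signature = hmac.new(SECRET_KEY, url.encode("utf-8"), hashlib.sha256).hexdigest()
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}sig={signature}"


def is_valid(signed_url):
    """Recompute the signature over the unsigned part and compare."""
    prefix, found, signature = signed_url.rpartition("sig=")
    if not found or prefix[-1:] not in ("?", "&"):
        return False  # no signature parameter at the end of the URL
    url = prefix[:-1]  # drop the '?' or '&' that introduced the signature
    expected = hmac.new(SECRET_KEY, url.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison, to avoid leaking information via timing.
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))


signed = sign_url("https://example.com/reservations/42")
tampered = signed.replace("/42", "/43")  # a client guessing a neighbouring ID
```

<p>
With this in place, a client that edits the ID in a signed URL ends up with a URL that no longer validates, so even sequential keys become harder to enumerate.
</p>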
</div>
<div class="comment-date">2024-06-14 11:55 UTC</div>
</div>
<div class="comment" id="5b3ac2db5a7a4b8697528e757652e6af">
<div class="comment-author"><a href="http://snape.me">James Snape</a> <a href="#5b3ac2db5a7a4b8697528e757652e6af">#</a></div>
<div class="comment-content">
<p>I think we are basically agreeing here because I would never use natural keys nor externally visible synthetic keys for <em>physical</em> primary keys. (I think this statement is even more restrictive than the article's main premise). Well, with a rule exception for configurable enum type tables because the overhead of joining to resolve a single column value is inefficient. I would however always use a natural key for a <em>logical</em> primary key.</p>
<p>The only reason why I'm slightly pedantic about this is due to the number of clients who have used surrogate keys in a logical model and then gone on to create databases where the concept of entity identity doesn't exist. This creates many of the issues <a href="https://processdecision.com">Nicholas Peterson</a> mentioned above: duplicates, historical change tracking, etc. Frankly, it doesn't help that lots of code examples for ORMs just start with an entity that has an ID attribute.</p>
<p>One final comment on sharing data based on a golden master synthetic key. The moment you do I would argue that you have now committed to maintaining that key through all types of data mergers and acquisitions. It must never collide, and always point to exactly the same record and only that record. Since users can use it to refer to an entity and it makes up part of your external API, it now meets the definition of a natural key. Whether you agree or not on my stretching the definition a little, you still should not use this attribute as the physical primary key (record locator) because we should not expose implementation details in our APIs. The first Celko article I linked to explains some of the difficulties for externally visible synthetic keys.</p>
</div>
<div class="comment-date">2024-06-14 13:45 UTC</div>
</div>
<div class="comment" id="fee47871b1494293b039b67d21187e6b">
<div class="comment-author">Julius H <a href="#fee47871b1494293b039b67d21187e6b">#</a></div>
<div class="comment-content">
<p>I'd like to comment with an example where using a synthetic key came back to bite me. My system had posts and users with synthetic IDs. Now I wanted to track an unread state across them. Naively, I designed just another entity:</p>
<pre>
public int ID { get; set; }
public int PostID { get; set; }
public int UserID { get; set; }
</pre>
<p>And it worked flawlessly for years. One day, however, a user complained that he always got an exception "Sequence contains more than one element". Of course I used SingleOrDefault() in application code because I expected 0 or 1 record per user and post.
The quick solution was deleting the spurious table row. As a permanent solution I removed the ID field (and column) so the unread state had its natural key as primary key (both columns). So if it happens again in the future, the app will error on insertion rather than querying.</p>
<p>Since my application is in control of the IDs and it's just a very simple join table I think it was the best solution. If the future requirements hold different kinds of unread state, I can always add the key again.</p>
</div>
<div class="comment-date">2024-07-22 14:40 UTC</div>
</div>
<div class="comment" id="c573d77b80904d64b697418a7d4440cd">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#c573d77b80904d64b697418a7d4440cd">#</a></div>
<div class="comment-content">
<p>
Julius, thank you for writing. I see what you mean, and would also tend to model this as just a table with two foreign keys. From the perspective of <a href="https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model">entity-relationship modelling</a>, such a table isn't even an entity, but rather a relationship. For that reason, it doesn't need its own key; not because the combination is 'natural', but rather because it's not really an independent 'thing'.
</p>
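<p>
A minimal sketch of such a relationship table, in Python with the built-in <code>sqlite3</code> module, could look like the following. The table and column names are assumptions based on the entity shown above, and the foreign-key references to the post and user tables are left out for brevity.
</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE unread (
        post_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        PRIMARY KEY (post_id, user_id)  -- the pair itself identifies the row
    )
    """
)


def mark_unread(post_id, user_id):
    """Return True if the row was added, False if the pair already exists."""
    try:
        conn.execute(
            "INSERT INTO unread (post_id, user_id) VALUES (?, ?)",
            (post_id, user_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


first = mark_unread(1, 7)       # new pair
second = mark_unread(1, 7)      # same pair again; rejected by the primary key
other_user = mark_unread(1, 8)  # same post, different user
row_count = conn.execute("SELECT COUNT(*) FROM unread").fetchone()[0]
```

<p>
The duplicate is rejected when it's written, so a query for a single post and user can never return more than one row.
</p>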
</div>
<div class="comment-date">2024-07-29 14:39 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Continuous delivery without a CI server
https://blog.ploeh.dk/2024/05/27/continuous-delivery-without-a-ci-server
2024-05-27T13:34:00+00:00
Mark Seemann
<div id="post">
<p>
<em>An illustrative example.</em>
</p>
<p>
More than a decade ago, I worked on a small project. It was a small <a href="https://en.wikipedia.org/wiki/Single-page_application">single-page application</a> (SPA) with a <a href="https://en.wikipedia.org/wiki/REST">REST</a> API backend, deployed to <a href="https://azure.microsoft.com/">Azure</a>. As far as I recall, the REST API used <a href="https://en.wikipedia.org/wiki/Object_storage">blob storage</a>, so all in all it wasn't a complex system.
</p>
<p>
We were two developers, and although we wanted to do <a href="https://en.wikipedia.org/wiki/Continuous_delivery">continuous delivery</a> (CD), we didn't have much development infrastructure. This was a little startup, and back then, there weren't a lot of free build services available. We were using <a href="https://github.com/">GitHub</a>, but it was before it had any free services to compile your code and run tests.
</p>
<p>
Given those constraints, we figured out a simple way to do CD, even though we didn't have a <a href="https://en.wikipedia.org/wiki/Continuous_integration">continuous integration</a> (CI) server.
</p>
<p>
I'll tell you how we did this.
</p>
<h3 id="d4bf05ee06c64650a15e5dae86a6efbd">
Shining an extraordinary light on the mundane <a href="#d4bf05ee06c64650a15e5dae86a6efbd">#</a>
</h3>
<p>
The reason I'm relating this little story isn't to convince you that you, too, should do it that way. Rather, it's a didactic device. By doing something extreme, we can sometimes learn about the ordinary.
</p>
<blockquote>
<p>
You can only be pragmatic if you know how to be dogmatic.
</p>
<footer><cite><a href="/2018/11/12/what-to-test-and-not-to-test">What to test and not to test</a></cite>, me</footer>
</blockquote>
<p>
From what I hear and read, it seems that there are a lot of organizations that believe that they're doing CI (or perhaps even CD) because they have a CI server. What the following tale will hopefully highlight is that, while build servers are useful, they aren't a requirement for CI or CD.
</p>
<h3 id="08dc5521141f4907bc44d43852716196">
Distributed CD <a href="#08dc5521141f4907bc44d43852716196">#</a>
</h3>
<p>
Dramatis personae: My colleague and me. Scene: One small SPA project with a REST API and blob storage, to be deployed to Azure. Code base in GitHub. Two laptops. Remote work.
</p>
<p>
One of us (let's say me) would start on implementing a feature, or fixing a bug. I'd use test-driven development (TDD) to get feedback on API ideas, as well as to accumulate a suite of regression tests. After a few hours of effective work, I'd send a pull request to my colleague.
</p>
<p>
Since we were only two people on the team, the responsibility was clear. It was the other person's job to review the pull request. It was also clear that the longer the reviewer dawdled, the less efficient the process would be. For that reason, we'd typically have <a href="/2021/06/21/agile-pull-requests">agile pull requests</a> with a good turnaround time.
</p>
<p>
While we were taking advantage of GitHub as a central coordination hub for pull requests, <a href="https://git-scm.com/">Git</a> itself is famously distributed. Thus, we wondered whether it'd be possible to make the CD process distributed as well.
</p>
<p>
Yes, apart from GitHub, what we did was already distributed.
</p>
<h3 id="77a1cae8e41349f8b8092aec8fa512d3">
A little more automation <a href="#77a1cae8e41349f8b8092aec8fa512d3">#</a>
</h3>
<p>
Since we were both doing TDD, we already had automated tests. Due to the simple setup of the system, we'd already automated more than 80% of our process. It wasn't much of a stretch to automate whatever else needed automation. Such as deployment.
</p>
<p>
We agreed on a few simple rules:
</p>
<ul>
<li>Every part of our process should be automated.</li>
<li>Reviewing a pull request included running all tests.</li>
</ul>
<p>
When people review pull requests, they often just go to GitHub and look around before issuing an LGTM.
</p>
<p>
But, you <em>do</em> realize that this is Git, right? You can pull down the proposed changes and <em>run them</em>.
</p>
<p>
What if you're already in the middle of something, working on the same code base? Stash your changes and pull down the code.
</p>
<p>
The consequence of this process was that every time a pull request was accepted, we already knew that it passed all automated tests on two physical machines. We actually didn't need a server to run the tests a third time.
</p>
<p>
<img src="/content/binary/distributed-cd.png" alt="Two laptops, a box indicating GitHub, and another box indicating a production system.">
</p>
<p>
After a merge, the final part of the development process mandated that the original author should deploy to production. We had a <a href="https://en.wikipedia.org/wiki/Bash_(Unix_shell)">Bash</a> script that did that.
</p>
<h3 id="cabd161b2cb34fe79caf3172bb339948">
Simplicity <a href="#cabd161b2cb34fe79caf3172bb339948">#</a>
</h3>
<p>
This process came with some built-in advantages. First of all, it was <em>simple</em>. There weren't a lot of moving parts, so there weren't many steps that could break.
</p>
<p>
Have you ever had the pleasure of troubleshooting a build? The code works on your machine, but not on the build server.
</p>
<p>
It sometimes turns out that there's a configuration mismatch with the compiler or test tools. Thus, the problem with the build server doesn't mean that you prevented a dangerous defect from being deployed to production. No, the code just didn't compile on the build server, but would actually have run fine on the production system.
</p>
<p>
It's much easier troubleshooting issues on your own machine than on some remote server.
</p>
<p>
I've also seen build servers that were set up to run tests, but along the way, something had failed and the tests didn't run. And no-one was looking at logs or warning emails from the build system because that system would already be sending hundreds of warnings a day.
</p>
<p>
By agreeing to manually(!) run the automated tests as part of the review process, we were sure that they were exercised.
</p>
<p>
Finally, by keeping the process simple, we could focus on what mattered: Delivering value to our customer. We didn't have to waste time learning how a proprietary build system worked.
</p>
<h3 id="2a427dee73f44fb4ab6753cf5246cc52">
Does it scale? <a href="#2a427dee73f44fb4ab6753cf5246cc52">#</a>
</h3>
<p>
I know what you're going to say: This may have worked because the overall requirements were so simple. This will never work in a 'real' development organization, with a 'real' code base.
</p>
<p>
I understand. I never claimed that it would.
</p>
<p>
The point of this story is to highlight what CI and CD are. They're ways of working where you <em>continuously</em> integrate your code with everyone else's code, and where you <em>continuously</em> deploy changes to production.
</p>
<p>
In reality, having a dedicated build system for that can be useful. These days, such systems tend to be services that integrate with GitHub or other sites, rather than an actual server that you have to care for. Even so, having such a system doesn't mean that your organization makes use of CI or CD.
</p>
<p>
(Oh, and for the mathematically inclined: In this context <em>continuous</em> doesn't mean actually continuous. It just means <em>arbitrarily often</em>.)
</p>
<h3 id="097bd9130bb144c79ee3e75b0652a99f">
Conclusion <a href="#097bd9130bb144c79ee3e75b0652a99f">#</a>
</h3>
<p>
CI and CD are processes that describe how we work with code, and how we work together.
</p>
<p>
Continuous integration means that you often integrate your code with everyone else's code. How often? More than once a day.
</p>
<p>
Continuous deployment means that you often deploy code changes to production. How often? Every time new code is integrated.
</p>
<p>
A build system can be convenient to help along such processes, but it's strictly speaking not required.
</p>
</div><hr>
Fundamentals
https://blog.ploeh.dk/2024/05/20/fundamentals
2024-05-20T07:04:00+00:00
Mark Seemann
<div id="post">
<p>
<em>How to stay current with technology progress.</em>
</p>
<p>
A long time ago, I landed my dream job. My new employer was a consulting company, and my role was to be the resident <a href="https://en.wikipedia.org/wiki/Microsoft_Azure">Azure</a> expert. Cloud computing was still in its infancy, and there was a good chance that I might be able to establish myself as a leading regional authority on the topic.
</p>
<p>
As part of the role, I was supposed to write articles and give presentations showing how to solve various problems with Azure. I dug in with fervour, writing sample code bases and even <a href="http://msdn.microsoft.com/en-us/magazine/gg983487.aspx">an MSDN Magazine article</a>. To my surprise, after half a year I realized that I was bored.
</p>
<p>
At that time I'd already spent more than a decade learning new technology, and I knew that I was good at it. For instance, I worked five years for Microsoft Consulting Services, and a dirty little secret of that kind of role is that, although you're sold as an expert in some new technology, you're often only a few weeks ahead of your customer. For example, I was once engaged as a <a href="https://en.wikipedia.org/wiki/Windows_Workflow_Foundation">Windows Workflow Foundation</a> expert at a time when it was still in beta. No-one had years of experience with that technology, but I was still expected to know much more about it than my customer.
</p>
<p>
I had lots of engagements like that, and they usually went well. I've always been good at cramming, and as a consultant you're also unencumbered by all the daily responsibilities and politics that often occupy the time and energy of regular employees. The point being that while I'm decent at learning new stuff, the role of being a consultant also facilitates that sort of activity.
</p>
<p>
After more than a decade of learning new frameworks, new software libraries, new programming languages, new tools, new online services, it turned out that I was ready for something else. After spending a few months learning Azure, I realized that I'd lost interest in that kind of learning. When investigating a new Azure SDK, I'd quickly come to the conclusion that, <em>oh, this is just another object-oriented library</em>. There are these objects, and you call this method to do that, etc. That's not to say that learning a specific technology is a trivial undertaking. The worse the design, the more difficult it is to learn.
</p>
<p>
Still, after years of learning new technologies, I'd started recognizing certain patterns. Perhaps, I thought, well-designed technologies are based on some fundamental ideas that may be worth learning instead.
</p>
<h3 id="ac37913a2b8248e6b51d4506c2da0481">
Staying current <a href="#ac37913a2b8248e6b51d4506c2da0481">#</a>
</h3>
<p>
A common lament among software developers is that the pace of technology is so overwhelming that they can't keep up. This is true. You can't keep up.
</p>
<p>
There will always be something that you don't know. In fact, most things you don't know. This isn't a condition isolated only to technology. The sum total of all human knowledge is so vast that you can't know it all. What you will learn, even after a lifetime of diligent study, will be a nanoscopic fraction of all human knowledge - even of everything related to software development. You can't stay current. Get used to it.
</p>
<p>
A more appropriate question is: <em>How do I keep my skill set relevant?</em>
</p>
<p>
Assuming that you wish to stay employable in some capacity, it's natural to be concerned with how your mad <a href="https://en.wikipedia.org/wiki/Adobe_Flash">Flash</a> skillz will land you the next gig.
</p>
<p>
Trying to keep abreast of all new technologies in your field is likely to lead to burnout. Rather, put yourself in a position so that you can quickly learn necessary skills, just in time.
</p>
<h3 id="c529c0131b284fe1bca42bec0663fc8e">
Study fundamentals, rather than specifics <a href="#c529c0131b284fe1bca42bec0663fc8e">#</a>
</h3>
<p>
Those many years ago, I realized that it'd be a better investment of my time to study fundamentals. Often, once you have some foundational knowledge, you can apply it in many circumstances. Your general knowledge will enable you to get quickly up to speed with specific technologies.
</p>
<p>
Success isn't guaranteed, but knowing fundamentals increases your chances.
</p>
<p>
This may still seem too abstract. Which fundamentals should you learn?
</p>
<p>
In the remainder of this article, I'll give you some examples. The following collection of general programmer knowledge spans software engineering, computer science, broad ideas, but also specific tools. I only intend this set of examples to serve as inspiration. The list isn't complete, nor does it constitute a minimum of what you should learn.
</p>
<p>
If you have other interests, you may put together your own research programme. What follows here are just some examples of fundamentals that I've found useful during my career.
</p>
<p>
A criterion, however, for constituting foundational knowledge is that you should be able to apply that knowledge in a wide variety of contexts. The fundamental should not be tied to a particular programming language, platform, or operating system.
</p>
<h3 id="4f474189809f4d53b447b4005cef1bfd">
Design patterns <a href="#4f474189809f4d53b447b4005cef1bfd">#</a>
</h3>
<p>
Perhaps the first foundational notion that I personally encountered was that of <em>design patterns</em>. As the Gang of Four (GoF) wrote in <a href="https://en.wikipedia.org/wiki/Design_Patterns">the book</a>, a design pattern is an abstract description of a solution that has been observed 'in the wild', more than once, independently evolved.
</p>
<p>
Please pay attention to the causality. A design pattern isn't prescriptive, but descriptive. It's an observation that a particular code organisation tends to solve a particular problem.
</p>
<p>
There are lots of misconceptions related to design patterns. One of them is that the 'library of patterns' is finite, and more or less constrained to the patterns included in the original book.
</p>
<p>
There are, however, many more patterns. To illustrate how much wider this area is, here's a list of some patterns books in my personal library:
</p>
<ul>
<li><a href="/ref/dp">Design Patterns</a></li>
<li><a href="/ref/plopd3">Pattern Languages of Program Design 3</a></li>
<li><a href="/ref/peaa">Patterns of Enterprise Application Architecture</a></li>
<li><a href="/ref/eip">Enterprise Integration Patterns</a></li>
<li><a href="/ref/xunit-patterns">xUnit Test Patterns</a></li>
<li><a href="/ref/service-design-patterns">Service Design Patterns</a></li>
<li><a href="/ref/implementation-patterns">Implementation Patterns</a></li>
<li><a href="/ref/rest-cookbook">RESTful Web Services Cookbook</a></li>
<li><a href="/ref/antipatterns">AntiPatterns</a></li>
</ul>
<p>
In addition to these, there are many more books in my library that are patterns-adjacent, including <a href="/dippp">one of my own</a>. The point is that software design patterns is a vast topic, and it pays to know at least the most important ones.
</p>
<p>
A design pattern fits the criterion that you can apply the knowledge independently of technology. The original GoF book has examples in <a href="https://en.wikipedia.org/wiki/C%2B%2B">C++</a> and <a href="https://en.wikipedia.org/wiki/Smalltalk">Smalltalk</a>, but I've found that they apply well to C#. Other people employ them in their <a href="https://www.java.com/">Java</a> code.
</p>
<p>
Knowing design patterns not only helps you design solutions. That knowledge also enables you to recognize patterns in existing libraries and frameworks. It's this fundamental knowledge that makes it easier to learn new technologies.
</p>
<p>
Often (although not always) successful software libraries and frameworks tend to follow known patterns, so if you're aware of these patterns, it becomes easier to learn such technologies. Again, be aware of the causality involved. I'm not claiming that successful libraries are explicitly designed according to published design patterns. Rather, some libraries become successful because they offer good solutions to certain problems. It's not surprising if such a good solution falls into a pattern that other people have already observed and recorded. It's like <a href="https://en.wikipedia.org/wiki/Parallel_evolution">parallel evolution</a>.
</p>
<p>
This was my experience when I started to learn the details of Azure. Many of those SDKs and APIs manifested various design patterns, and once I'd recognized a pattern it became much easier to learn the rest.
</p>
<p>
The idea of design patterns, particularly object-oriented design patterns, has its detractors, too. Let's visit that as the next set of fundamental ideas.
</p>
<h3 id="c44e7624ea3e4cef9485522146d17a6d">
Functional programming abstractions <a href="#c44e7624ea3e4cef9485522146d17a6d">#</a>
</h3>
<p>
As I'm writing this, yet another Twitter thread pokes fun at object-oriented design (OOD) patterns as being nothing but a published collection of workarounds for the shortcomings of object orientation. The people who most zealously pursue that agenda tend to be functional programmers.
</p>
<p>
Well, I certainly like functional programming (FP) better than OOD too, but rather than poking fun at OOD, I'm more interested in <a href="/2018/03/05/some-design-patterns-as-universal-abstractions">how design patterns relate to universal abstractions</a>. I also believe that FP has shortcomings of its own, but I'll have more to say about that in a future article.
</p>
<p>
Should you learn about <a href="/2017/10/06/monoids">monoids</a>, <a href="/2018/03/22/functors">functors</a>, <a href="/2022/03/28/monads">monads</a>, <a href="/2019/04/29/catamorphisms">catamorphisms</a>, and so on?
</p>
<p>
Yes you should, because these ideas also fit the criterion that the knowledge is technology-independent. I've used my knowledge of these topics in <a href="https://www.haskell.org/">Haskell</a> (hardly surprising) and <a href="https://fsharp.org/">F#</a>, but also in C# and <a href="https://www.python.org/">Python</a>. The various <a href="https://en.wikipedia.org/wiki/Language_Integrated_Query">LINQ</a> methods are really just well-known APIs associated with, you guessed it, functors, monads, monoids, and catamorphisms.
</p>
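<p>
To illustrate how these ideas carry over between languages, here's a small Python sketch of the same operations applied to lists. The mapping to LINQ method names in the comments is a loose correspondence, and <code>flat_map</code> and the sample data are defined here only for the example; neither is part of a standard API.
</p>

```python
from functools import reduce
from itertools import chain

# Made-up sample data: order lines grouped per order.
orders = [[3, 5], [], [10]]

# Functor: map a function over a structure (compare LINQ's Select).
doubled = list(map(lambda x: x * 2, [1, 2, 3]))


# Monad: map each element to a list, then flatten (compare LINQ's SelectMany).
def flat_map(f, xs):
    return list(chain.from_iterable(map(f, xs)))


flattened = flat_map(lambda order: order, orders)
repeated = flat_map(lambda x: [x, x], [1, 2])

# Monoid and catamorphism: fold a list with an associative operation and its
# identity element (compare LINQ's Aggregate). Addition with 0 is a monoid.
total = reduce(lambda acc, x: acc + x, flattened, 0)
empty_total = reduce(lambda acc, x: acc + x, [], 0)
```

<p>
Because addition has an identity element, the fold is well-defined even for an empty list, which is one of those properties you can rely on once you recognize the abstraction.
</p>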
<p>
Once you've learned these fundamental ideas, it becomes easier to learn new technologies. This has happened to me multiple times, for example in contexts as diverse as property-based testing and asynchronous message-passing architectures. Once I realize that an API gives rise to a monad, say, I know that certain functions must be available. I also know how I should best compose larger code blocks from smaller ones.
</p>
<p>
Must you know all of these concepts before learning, say, F#? No, not at all. Rather, a language like F# is a great vehicle for learning such fundamentals. There's a first time for learning anything, and you need to start somewhere. The point is that once you know these concepts, it becomes easier to learn the next thing.
</p>
<p>
If, for example, you already know what a monad is when learning F#, picking up the idea behind <a href="https://learn.microsoft.com/dotnet/fsharp/language-reference/computation-expressions">computation expressions</a> is easy once you realize that it's just a compiler-specific way to enable syntactic sugaring of monadic expressions. You can learn how computation expressions work without that knowledge, too; it's just harder.
</p>
<p>
This is a recurring theme with many of these examples. You can learn a particular technology without knowing the fundamentals, but you'll have to put in more time to do that.
</p>
<p>
On to the next example.
</p>
<h3 id="02c8fb3f6fe74fb9bffda719122c60a9">
SQL <a href="#02c8fb3f6fe74fb9bffda719122c60a9">#</a>
</h3>
<p>
Which <a href="https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping">object-relational mapper</a> (ORM) should you learn? <a href="https://hibernate.org/orm/">Hibernate</a>? <a href="https://learn.microsoft.com/ef/">Entity Framework</a>?
</p>
<p>
How about learning <a href="https://en.wikipedia.org/wiki/SQL">SQL</a>? I learned SQL in 1999, I believe, and it's served me well ever since. I <a href="/2023/09/18/do-orms-reduce-the-need-for-mapping">consider raw SQL to be more productive than using an ORM</a>. Once more, SQL is largely technology-independent. While each database typically has its own SQL dialect, the fundamentals are the same. I'm most well-versed in the <a href="https://en.wikipedia.org/wiki/Microsoft_SQL_Server">SQL Server</a> dialect, but I've also used my SQL knowledge to interact with <a href="https://www.oracle.com/database/">Oracle</a> and <a href="https://www.postgresql.org/">PostgreSQL</a>. Once you know one SQL dialect, you can quickly solve data problems in one of the other dialects.
</p>
<p>
It doesn't matter much whether you're interacting with a database from .NET, Haskell, Python, <a href="https://www.ruby-lang.org/">Ruby</a>, or another language. SQL is not only universal, the core of the language is stable. What I learned in 1999 is still useful today. Can you say the same about your current ORM?
</p>
<p>
Most programmers prefer learning the newest, most cutting-edge technology, but that's a risky gamble. Once upon a time <a href="https://en.wikipedia.org/wiki/Microsoft_Silverlight">Silverlight</a> was a cutting-edge technology, and more than one of my contemporaries went all-in on it.
</p>
<p>
Conversely, most programmers find old stuff boring. It turns out, though, that it may be worthwhile learning some old technologies like SQL. Be aware of the <a href="https://en.wikipedia.org/wiki/Lindy_effect">Lindy effect</a>. If it's been around for a long time, it's likely to still be around for a long time. This is true for the next example as well.
</p>
<h3 id="e4a7c033c0964420a0abbf83a0bbb773">
HTTP <a href="#e4a7c033c0964420a0abbf83a0bbb773">#</a>
</h3>
<p>
The <a href="https://en.wikipedia.org/wiki/HTTP">HTTP protocol</a> has been around since 1991. It's an effectively text-based protocol, and you can easily engage with a web server on a near-protocol level. This is true for other older protocols as well.
</p>
<p>
In my first IT job in the late 1990s, one of my tasks was to set up and maintain <a href="https://en.wikipedia.org/wiki/Microsoft_Exchange_Server">Exchange Servers</a>. It was also my responsibility to make sure that email could flow not only within the organization, but that we could exchange email with the rest of the internet. In order to test my mail servers, I would often just <a href="https://en.wikipedia.org/wiki/Telnet">telnet</a> into them on port 25 and type in the correct, <a href="https://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol">text-based instructions to send a test email</a>.
</p>
<p>
Granted, it's not that easy to telnet into a modern web server on port 80, but a ubiquitous tool like <a href="https://curl.se/">curl</a> accomplishes the same goal. I recently wrote how <a href="/2024/05/13/gratification">knowing curl is better</a> than knowing <a href="https://www.postman.com/">Postman</a>. While this wasn't meant as an attack on Postman specifically, neither was it meant as a facile claim that curl is the only tool useful for ad-hoc interaction with HTTP-based APIs. Sometimes you only realize an underlying truth when you write about a thing and then <a href="/2024/05/13/gratification#9efea1cadb8c4e388bfba1a2064dd59a">other people find fault with your argument</a>. The underlying truth, I think, is that it pays to understand HTTP and to be able to engage with an HTTP-based web service at that level of abstraction.
</p>
<p>
Preferably in an automatable way.
</p>
<h3 id="e3a250b707b243dabc6609134e864aee">
Shells and scripting <a href="#e3a250b707b243dabc6609134e864aee">#</a>
</h3>
<p>
The reason I favour curl over other tools to interact with HTTP is that I already spend quite a bit of time at the command line. I typically have a little handful of terminal windows open on my laptop. If I need to test an HTTP server, curl is already available.
</p>
<p>
Many years ago, an employer introduced me to <a href="https://git-scm.com/">Git</a>. Back then, there were no good graphical tools to interact with Git, so I had to learn to use it from the command line. I'm eternally grateful that it turned out that way. I still use Git from the command line.
</p>
<p>
When you install Git, by default you also install Git Bash. Since I was already using that shell to interact with Git, it began to dawn on me that it's a full-fledged shell, and that I could do all sorts of other things with it. It also struck me that learning <a href="https://www.gnu.org/software/bash/">Bash</a> would be a better investment of my time than learning <a href="https://learn.microsoft.com/powershell/">PowerShell</a>. At the time, there was no indication that PowerShell would ever be relevant outside of Windows, while Bash was already available on most systems. Even today, knowing Bash strikes me as more useful than knowing PowerShell.
</p>
<p>
It's not that I do much Bash-scripting, but I could. Since I'm a programmer, if I need to automate something, I naturally reach for something more robust than shell scripting. Still, it gives me confidence to know that, since I already know Bash, Git, curl, etc., I <em>could</em> automate some tasks if I needed to.
</p>
<p>
Many a reader will probably complain that the Git CLI has horrible <a href="/2024/05/13/gratification">developer experience</a>, but I will, again, postulate that it's not that bad. It helps if you understand some fundamentals.
</p>
<h3 id="a511cfd8d9bf4bbda433dbf70184284a">
Algorithms and data structures <a href="#a511cfd8d9bf4bbda433dbf70184284a">#</a>
</h3>
<p>
Git really isn't that difficult to understand once you realize that a Git repository is just a <a href="https://en.wikipedia.org/wiki/Directed_acyclic_graph">directed acyclic graph</a> (DAG), and that branches are just labels that point to nodes in the graph. There are basic data structures that it's just useful to know. DAGs, <a href="https://en.wikipedia.org/wiki/Tree_(graph_theory)">trees</a>, <a href="https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)">graphs</a> in general, <a href="https://en.wikipedia.org/wiki/Adjacency_list">adjacency lists</a> or <a href="https://en.wikipedia.org/wiki/Adjacency_matrix">adjacency matrices</a>.
</p>
<p>
Knowing that such data structures exist is, however, not that useful if you don't know what you can <em>do</em> with them. If you have a graph, you can find a <a href="https://en.wikipedia.org/wiki/Minimum_spanning_tree">minimum spanning tree</a> or a <a href="https://en.wikipedia.org/wiki/Shortest-path_tree">shortest-path tree</a>, which sometimes turn out to be useful. Adjacency lists or matrices give you ways to represent graphs in code, which is why they are useful.
</p>
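<p>
To make the Git example concrete, here's a minimal Haskell sketch of a repository as an adjacency list, with branches as labels that point to nodes. This is not how Git is actually implemented. All names (<code>Repo</code>, <code>Branches</code>, <code>history</code>, <code>branchHistory</code>, and the example data) are assumptions invented for illustration.
</p>
<p>
<pre>import Data.Maybe (fromMaybe)

type CommitId = String

-- Adjacency list: each commit is paired with its parent commits.
type Repo = [(CommitId, [CommitId])]

-- A branch is just a label that points to a commit.
type Branches = [(String, CommitId)]

parents :: Repo -> CommitId -> [CommitId]
parents repo c = fromMaybe [] (lookup c repo)

-- All commits reachable from a commit (itself included), depth-first.
history :: Repo -> CommitId -> [CommitId]
history repo = go []
  where
    go seen c
      | c `elem` seen = seen
      | otherwise = foldl go (seen ++ [c]) (parents repo c)

branchHistory :: Repo -> Branches -> String -> Maybe [CommitId]
branchHistory repo branches name = history repo <$> lookup name branches

-- Hypothetical example: "d" is a merge commit with two parents.
exampleRepo :: Repo
exampleRepo = [("a", []), ("b", ["a"]), ("c", ["a"]), ("d", ["b", "c"])]

exampleBranches :: Branches
exampleBranches = [("main", "d"), ("feature", "c")]</pre>
</p>
<p>
With this data, <code>branchHistory exampleRepo exampleBranches "feature"</code> evaluates to <code>Just ["c","a"]</code>, and <code>history exampleRepo "d"</code> to <code>["d","b","a","c"]</code>. Moving a branch amounts to changing which commit a label points to; the graph itself is untouched.
</p>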
<p>
Contrary to certain infamous interview practices, you don't need to know these algorithms by heart. It's usually enough to know that they exist. I can't remember <a href="https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm">Dijkstra's algorithm</a> off the top of my head, but if I encounter a problem where I need to find the shortest path, I can look it up.
</p>
<p>
Or, if presented with the problem of constructing current state from an Event Store, you may realize that it's just a left <a href="https://en.wikipedia.org/wiki/Fold_(higher-order_function)">fold</a> over a <a href="https://en.wikipedia.org/wiki/Linked_list">linked list</a>. (This isn't my own realization; I first heard it from <a href="https://gotocon.com/cph-2011/presentation/Behavior!">Greg Young in 2011</a>.)
</p>
<p>
Now we're back at one of the first examples, that of FP knowledge. A <a href="/2019/05/27/list-catamorphism">list fold is its catamorphism</a>. Again, these things are much easier to learn if you already know some fundamentals.
</p>
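<p>
As a hedged sketch of the Event Store observation, here's what 'current state is a left fold over events' may look like in Haskell. The <code>Event</code> and <code>Balance</code> types, as well as <code>apply</code> and <code>currentState</code>, are hypothetical and not taken from any real Event Store API; only <code>foldl'</code> is a standard library function.
</p>
<p>
<pre>import Data.List (foldl')

-- Hypothetical events for a bank account.
data Event = Deposited Int | Withdrew Int

newtype Balance = Balance Int deriving (Eq, Show)

-- Apply a single event to the state accumulated so far.
apply :: Balance -> Event -> Balance
apply (Balance b) (Deposited n) = Balance (b + n)
apply (Balance b) (Withdrew n) = Balance (b - n)

-- Current state is a left fold over the event list.
currentState :: [Event] -> Balance
currentState = foldl' apply (Balance 0)</pre>
</p>
<p>
For example, <code>currentState [Deposited 100, Withdrew 30, Deposited 5]</code> evaluates to <code>Balance 75</code>.
</p>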
<h3 id="f109425f27014cd5bd395a74e9575355">
What to learn <a href="#f109425f27014cd5bd395a74e9575355">#</a>
</h3>
<p>
These examples may seem overwhelming. Do you really need to know all of that before things become easier?
</p>
<p>
No, that's not the point. I didn't start out knowing all these things, and some of them, I'm still not very good at. The point is rather that if you're wondering how to invest your limited time so that you can remain up to date, consider pursuing general-purpose knowledge rather than learning a specific technology.
</p>
<p>
Of course, if your employer asks you to use a particular library or programming language, you need to study <em>that</em>, if you're not already good at it. If, on the other hand, you decide to better yourself, you can choose what to learn next.
</p>
<p>
Ultimately, if you're learning for your own sake, the most important criterion may be: Choose something that interests you. If no-one forces you to study, it's too easy to give up if you lose interest.
</p>
<p>
If, however, you have the choice between learning <a href="https://mjvl.github.io/Noun.js/">Noun.js</a> or design patterns, may I suggest the latter?
</p>
<h3 id="94c3f380b556403d82dd9f3cd0c1d1e9">
For life <a href="#94c3f380b556403d82dd9f3cd0c1d1e9">#</a>
</h3>
<p>
When are you done, you ask?
</p>
<p>
Never. There's more stuff than you can learn in a lifetime. I've met a lot of programmers who finally give up on the grind to keep up, and instead become managers.
</p>
<p>
As if there's nothing to learn when you're a manager. I'm fortunate that, before <a href="/2011/11/08/Independency">I went solo</a>, I mainly had good managers. I'm under no illusion that they automatically became good managers. All I've heard said about management is that there's a lot to learn in that field, too. Really, it'd be surprising if that wasn't the case.
</p>
<p>
I can understand, however, how it becomes tiring to just keep learning the next library, the next framework, the next tool. As I've already outlined, I hit that wall more than a decade ago.
</p>
<p>
On the other hand, there are so many wonderful fundamentals that you can learn. You can do self-study, or you can enrol in a more formal programme if you have the opportunity. I'm currently following a course on compiler design. It's not that I expect to pivot to writing compilers for the rest of my career, but rather,
</p>
<blockquote>
<ol type="a">
<li>"It is considered a topic that you should know in order to be "well-cultured" in computer science.</li>
<li>"A good craftsman should know his tools, and compilers are important tools for programmers and computer scientists.</li>
<li>"The techniques used for constructing a compiler are useful for other purposes as well.</li>
<li>"There is a good chance that a programmer or computer scientist will need to write a compiler or interpreter for a domain-specific language."</li>
</ol>
<footer><cite><a href="/ref/introduction-to-compiler-design">Introduction to Compiler Design</a></cite> (from the introduction), Torben Ægidius Mogensen</footer>
</blockquote>
<p>
That's good enough for me, and so far, I'm enjoying the course (although it's also hard work).
</p>
<p>
You may not find this particular topic interesting, but then hopefully you can find something else that you fancy. 3D rendering? Machine learning? Distributed systems architecture?
</p>
<h3 id="7519e9b6147d49379f545c69871c381a">
Conclusion <a href="#7519e9b6147d49379f545c69871c381a">#</a>
</h3>
<p>
Technology moves at a pace with which it's impossible to keep up. It's not just you who's falling behind. Everyone is. Even the best-paid <a href="https://en.wikipedia.org/wiki/Big_Tech">GAMMA</a> programmer knows next to nothing of all there is to know in the field. They may have superior skills in certain areas, but there will be so much other stuff that they don't know.
</p>
<p>
You may think of me as a <a href="https://x.com/hillelogram/status/1445435617047990273">thought leader</a> if you will. If nothing else, I tend to be a prolific writer. Perhaps you even think I'm a good programmer. I should hope so. Who fancies themselves bad at something?
</p>
<p>
You should, however, have seen me struggle with <a href="https://en.wikipedia.org/wiki/C_(programming_language)">C</a> programming during a course on computer systems programming. There's a thing I'm happy if I never have to revisit.
</p>
<p>
You can't know it all. You can't keep up. But you can focus on learning the fundamentals. That tends to make it easier to learn specific technologies that build on those foundations.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Gratification
https://blog.ploeh.dk/2024/05/13/gratification
2024-05-13T06:27:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Some thoughts on developer experience.</em>
</p>
<p>
Years ago, I was introduced to a concept called <em>developer ergonomics</em>. Despite the name, it's not about good chairs, standing desks, or multiple monitors. Rather, the concept was related to how easy it'd be for a developer to achieve a certain outcome. How easy is it to set up a new code base in a particular language? How much work is required to save a row in a database? How hard is it to read rows from a database and display the data on a web page? And so on.
</p>
<p>
These days, we tend to discuss <em>developer experience</em> rather than ergonomics, and that's probably a good thing. This term more immediately conveys what it's about.
</p>
<p>
I've recently had some discussions about developer experience (DevEx, DX) with one of my customers, and this has led me to reflect more explicitly on this topic than previously. Most of what I'm going to write here are opinions and beliefs that go back a long time, but apparently, it's only recently that these notions have congealed in my mind under the category name <em>developer experience</em>.
</p>
<p>
This article may look like your usual old-man-yells-at-cloud article, but I hope that I can avoid that. It's not the case that I yearn for some lost past where 'we' wrote <a href="https://en.wikipedia.org/wiki/Plankalk%C3%BCl">Plankalkül</a> in <a href="https://en.wikipedia.org/wiki/Edlin">Edlin</a>. That, in fact, sounds like a horrible developer experience.
</p>
<p>
The point, rather, is that most attractive things come with consequences. For anyone who has been reading this blog even once in a while, this should come as no surprise.
</p>
<h3 id="cbc9752f754e40cc94267689f5dd87bf">
Instant gratification <a href="#cbc9752f754e40cc94267689f5dd87bf">#</a>
</h3>
<p>
Fatty foods, cakes, and wine can be wonderful, but can be detrimental to your health if you overindulge. It can, however, be hard to resist a piece of chocolate, and even if we think that we shouldn't, we often fail to restrain ourselves. The temptation of instant gratification is simply too great.
</p>
<p>
There are other examples like this. The most obvious are the use of narcotics, lack of exercise, smoking, and dropping out of school. It may feel good in the moment, but can have long-term consequences.
</p>
<p>
Small children are notoriously bad at delaying gratification, and we often associate the ability to delay gratification with maturity. We all, however, give in from time to time. Food and wine are my weak spots, while I don't do drugs, and I didn't drop out of school.
</p>
<p>
It strikes me that we often talk about ideas related to developer experience in a way where we treat developers as children. To be fair, many developers also act like children. I don't know how many times I've <ins datetime="2024-06-17T08:26Z">heard</ins> something like, <em>"I don't want to write tests/go through a code review/refactor! I just want to ship working code now!"</em>
</p>
<p>
Fine, so do I.
</p>
<p>
Even if wine is bad for me, it makes life worth living. As the saying goes, even if you don't smoke, don't drink, exercise rigorously, eat healthily, don't do drugs, and don't engage in dangerous activities, you're not guaranteed to live until ninety, but you're guaranteed that it's going to <em>feel</em> that long.
</p>
<p>
Likewise, I'm aware that doing everything right can sometimes take so long that by the time we've deployed the software, it's too late. The point isn't to always or never do certain things, but rather to be aware of the consequences of our choices.
</p>
<h3 id="ac2969093f264da092186fa0cb7196e5">
Developer experience <a href="#ac2969093f264da092186fa0cb7196e5">#</a>
</h3>
<p>
I've no problem with aiming to make the experience of writing software as good as possible. Some developer-experience thought leaders talk about the importance of documentation, predictability, and timeliness. Neither do I mind that a development environment looks good, completes my words, or helps me refactor.
</p>
<p>
To return to the analogy of human vices, not everything that feels good is ultimately bad for you. While I do like wine and chocolate, I also love <a href="https://en.wikipedia.org/wiki/Sushi">sushi</a>, white <a href="https://en.wikipedia.org/wiki/Asparagus">asparagus</a>, <a href="https://en.wikipedia.org/wiki/Turbot">turbot</a>, <a href="https://en.wikipedia.org/wiki/Chanterelle">chanterelles</a>, <a href="https://en.wikipedia.org/wiki/Cyclopterus">lumpfish</a> roe <a href="https://en.wikipedia.org/wiki/Caviar">caviar</a>, <a href="https://en.wikipedia.org/wiki/Morchella">true morels</a>, <a href="https://en.wikipedia.org/wiki/Nephrops_norvegicus">Norway lobster</a>, and various other foods that tend to be categorized as healthy.
</p>
<p>
A good <a href="https://en.wikipedia.org/wiki/Integrated_development_environment">IDE</a> with refactoring support, statement completion, type information, test runner, etc. is certainly preferable to writing all code in <a href="https://en.wikipedia.org/wiki/Windows_Notepad">Notepad</a>.
</p>
<p>
That said, there's a certain kind of developer tooling and language features that strikes me as more akin to candy. These are typically tools and technologies that tend to demo well. Recent examples include <a href="https://www.openapis.org/">OpenAPI</a>, <a href="https://github.com/features/copilot">GitHub Copilot</a>, <a href="https://learn.microsoft.com/dotnet/csharp/fundamentals/program-structure/top-level-statements">C# top-level statements</a>, code generation, and <a href="https://www.postman.com/">Postman</a>. Not all of these are unequivocally bad, but they strike me as mostly aiming at immature developers.
</p>
<p>
The point of this article isn't to single out these particular products, standards, or language features, but on the other hand, in order to make a point, I do have to at least outline why I find them problematic. They're just examples, and I hope that by explaining what is on my mind, you can see the pattern and apply it elsewhere.
</p>
<h3 id="f7f676bf5a334b189b3c2baab18b1e6a">
OpenAPI <a href="#f7f676bf5a334b189b3c2baab18b1e6a">#</a>
</h3>
<p>
A standard like OpenAPI, for example, looks attractive because it automates or standardizes much work related to developing and maintaining <a href="https://en.wikipedia.org/wiki/REST">REST APIs</a>. Frameworks and tools that leverage that standard automatically create machine-readable <a href="/2024/04/15/services-share-schema-and-contract-not-class">schema and contract</a>, which can be used to generate client code. Furthermore, an OpenAPI-aware framework can also autogenerate an entire web-based graphical user interface, which developers can use for ad-hoc testing.
</p>
<p>
I've worked with clients who also published these OpenAPI user interfaces to their customers, so that it was easy to get started with the APIs. Easy onboarding.
</p>
<p>
Instant gratification.
</p>
<p>
What's the problem with this? There are clearly enough apparent benefits that I usually have a hard time talking my clients out of pursuing this strategy. What are the disadvantages? Essentially, OpenAPI locks you into <a href="https://martinfowler.com/articles/richardsonMaturityModel.html">level 2</a> APIs. No hypermedia controls, no <a href="/2015/06/22/rest-implies-content-negotiation">smooth conneg-based versioning</a>, no <a href="https://en.wikipedia.org/wiki/HATEOAS">HATEOAS</a>. In fact, most of what makes REST flexible is lost. What remains is an ad-hoc, informally-specified, bug-ridden, slow implementation of half of <a href="https://en.wikipedia.org/wiki/SOAP">SOAP</a>.
</p>
<p>
I've <a href="/2022/12/05/github-copilot-preliminary-experience-report">previously described my misgivings about Copilot</a>, and while I actually still use it, I don't want to repeat all of that here. Let's move on to another example.
</p>
<h3 id="f56e835825464650a86c557c7253f095">
Top-level statements <a href="#f56e835825464650a86c557c7253f095">#</a>
</h3>
<p>
Among many other language features, C# 9 got <em>top-level statements</em>. This means that you don't need to write a <code>Main</code> method in a static class. Rather, you can have a single C# code file where you can immediately start executing code.
</p>
<p>
It's not that I consider this language feature particularly harmful, but it also solves what seems to me a non-problem. It demos well, though. If I understand the motivation right, the feature exists because 'modern' developers are used to languages like <a href="https://www.python.org/">Python</a> where you can, indeed, just create a <code>.py</code> file and start adding code statements.
</p>
<p>
In an attempt to make C# more attractive to such an audience, it, too, got that kind of developer experience enabled.
</p>
<p>
You may argue that this is a bid to remove some of the ceremony from the language, but I'm not convinced that this moves that needle much. The <a href="/2019/12/16/zone-of-ceremony">level of ceremony that a language like C# has is much deeper than that</a>. That's not to target C# in particular. <a href="https://www.java.com/">Java</a> is similar, and don't even get me started on <a href="https://en.wikipedia.org/wiki/C_(programming_language)">C</a> or <a href="https://en.wikipedia.org/wiki/C%2B%2B">C++</a>! Did anyone say <em>header files?</em>
</p>
<p>
Do 'modern' developers choose Python over C# because they can't be arsed to write a <code>Main</code> method? If that's the <em>only</em> reason, it strikes me as incredibly immature. <em>I want instant gratification, and writing a <code>Main</code> method is just too much trouble!</em>
</p>
<p>
If developers do, indeed, choose Python or JavaScript over C# and Java, I hope and believe that it's for other reasons.
</p>
<p>
This particular C# feature doesn't bother me, but I find it symptomatic of a kind of 'innovation' where language designers target instant gratification.
</p>
<h3 id="b9ce02aa90074838bd7b8e2cec0189e2">
Postman <a href="#b9ce02aa90074838bd7b8e2cec0189e2">#</a>
</h3>
<p>
Let's consider one more example. You may think that I'm now attacking a company that, for all I know, makes a decent product. I don't really care about that, though. What I do care about is the developer mentality that makes a particular tool so ubiquitous.
</p>
<p>
I've met web service developers who would be unable to interact with the HTTP APIs that they are themselves developing if they didn't have Postman. Likewise, there are innumerable questions on <a href="https://stackoverflow.com/">Stack Overflow</a> where people ask questions about HTTP APIs and post screen shots of Postman sessions.
</p>
<p>
It's okay if you don't know how to interact with an HTTP API. After all, there's a first time for everything, and there was a time when I didn't know how to do this either. Apparently, however, it's easier to install an application with a graphical user interface than it is to use <a href="https://curl.se/">curl</a>.
</p>
<p>
Do yourself a favour and learn curl instead of using Postman. Curl is a command-line tool, which means that you can use it for both ad-hoc experimentation and automation. It takes five to ten minutes to learn the basics. It's also free.
</p>
<p>
It still seems to me that many people are of a mind that it's easier to use Postman than to learn curl. Ultimately, I'd wager that for any task you do with some regularity, it's more productive to learn the text-based tool than the point-and-click tool. In a situation like this, I'd suggest that delayed gratification beats instant gratification.
</p>
<h3 id="50ed56effb784c95a6f6de4967e883ef">
CV-driven development <a href="#50ed56effb784c95a6f6de4967e883ef">#</a>
</h3>
<p>
It is, perhaps, easy to get the wrong impression from the above examples. I'm not pointing fingers at just any 'cool' new technology. There are techniques, languages, frameworks, and so on, which people pick up because they're exciting for other reasons. Often, such technologies solve real problems in their niches, but are then applied for the sole reason that people want to get on the bandwagon. Examples include <a href="https://kubernetes.io/">Kubernetes</a>, mocks, <a href="/2012/11/06/WhentouseaDIContainer">DI Containers</a>, <a href="/2023/12/04/serialization-with-and-without-reflection">reflection</a>, <a href="https://en.wikipedia.org/wiki/Aspect-oriented_programming">AOP</a>, and <a href="https://en.wikipedia.org/wiki/Microservices">microservices</a>. All of these have legitimate applications, but we also hear about many examples where people use them just to use them.
</p>
<p>
That's a different problem from the one I'm discussing in this article. Usually, learning about such advanced techniques requires delaying gratification. There's nothing wrong with learning new skills, but part of that process is also gaining the understanding of when to apply the skill, and when not to. That's a different discussion.
</p>
<h3 id="10cc039f39ba4c2caab34f66f17f90b2">
Innovation is fine <a href="#10cc039f39ba4c2caab34f66f17f90b2">#</a>
</h3>
<p>
The point of this article isn't that every innovation is bad. Contrary to <a href="https://www.charlespetzold.com/">Charles Petzold</a>, I don't really believe that Visual Studio rots the mind, although I once did publish <a href="/2013/02/04/BewareofProductivityTools">an article</a> that navigated the same waters.
</p>
<p>
Despite my misgivings, I haven't uninstalled GitHub Copilot, and I do enjoy many of the features in both Visual Studio (VS) and Visual Studio Code (VS Code). I also welcome and use many new language features in various languages.
</p>
<p>
I can certainly appreciate how an IDE makes many things easier. Every time I have to begin a new <a href="https://www.haskell.org/">Haskell</a> code base, I long for the hand-holding offered by Visual Studio when creating a new C# project.
</p>
<p>
And although I don't use the debugger much, the built-in debuggers in VS and VS Code sure beat <a href="https://en.wikipedia.org/wiki/GNU_Debugger">GDB</a>. It even works in Python!
</p>
<p>
There's even tooling that <a href="https://developercommunity.visualstudio.com/t/Test-Explorer:-Better-support-for-TDD-wo/701822">I wish for</a>, but apparently never will get.
</p>
<h3 id="675450a5c0cf441fa433b928251de8a5">
Simple made easy <a href="#675450a5c0cf441fa433b928251de8a5">#</a>
</h3>
<p>
In <a href="https://www.infoq.com/presentations/Simple-Made-Easy/">Simple Made Easy</a> Rich Hickey follows his usual look-up-a-word-in-the-dictionary-and-build-a-talk-around-the-definition style to contrast <em>simple</em> with <em>easy</em>. I find his distinction useful. A tool or technique that's <em>close at hand</em> is <em>easy</em>. This certainly includes many of the above instant-gratification examples.
</p>
<p>
An <em>easy</em> technique is not, however, necessarily <em>simple</em>. It may or may not be. <a href="https://en.wikipedia.org/wiki/Rich_Hickey">Rich Hickey</a> defines <em>simple</em> as the opposite of <em>complex</em>. Something that is complex is assembled from parts, whereas a simple thing is, ideally, single and indivisible. In practice, truly simple ideas and tools may not be available, and instead we may have to settle for things that are less complex than their alternatives.
</p>
<p>
Once you start looking for things that make simple things easy, you see them in many places. A big category that I personally favour contains all the language features and tools that make functional programming (FP) easier. FP tends to be simpler than object-oriented or procedural programming, because it <a href="/2018/11/19/functional-architecture-a-definition">explicitly distinguishes between and separates</a> predictable code from unpredictable code. This does, however, in itself tend to make some programming tasks harder. How do you generate a random number? Look up the system time? Write a record to a database?
</p>
<p>
Several FP languages have special features that make even those difficult tasks easy. <a href="https://fsharp.org/">F#</a> has <a href="https://learn.microsoft.com/dotnet/fsharp/language-reference/computation-expressions">computation expressions</a> and <a href="https://www.haskell.org/">Haskell</a> has <a href="https://en.wikibooks.org/wiki/Haskell/do_notation">do notation</a>.
</p>
<p>
Let's say you want to call a function that consumes a random number generator. In Haskell (as in .NET) random number generators are actually deterministic, as long as you give them the same seed. Generating a random seed, on the other hand, is non-deterministic, so has to happen in <a href="/2020/06/08/the-io-container">IO</a>.
</p>
<p>
Without <code>do</code> notation, you could write the action like this:
</p>
<p>
<pre><span style="color:#2b91af;">rndSelect</span> <span style="color:blue;">::</span> <span style="color:blue;">Integral</span> i <span style="color:blue;">=></span> [a] <span style="color:blue;">-></span> i <span style="color:blue;">-></span> <span style="color:#2b91af;">IO</span> [a]
rndSelect xs count = (\rnd -> rndGenSelect rnd xs count) <$> newStdGen</pre>
</p>
<p>
(The type annotation is optional.) While terse, this is hardly readable, and the developer experience also leaves something to be desired. Fortunately, however, you can <a href="/2018/07/09/typing-and-testing-problem-23">rewrite this action</a> with <code>do</code> notation, like this:
</p>
<p>
<pre><span style="color:#2b91af;">rndSelect</span> :: <span style="color:blue;">Integral</span> i <span style="color:blue;">=></span> [a] <span style="color:blue;">-></span> i <span style="color:blue;">-></span> <span style="color:#2b91af;">IO</span> [a]
rndSelect xs count = <span style="color:blue;">do</span>
  rnd <- newStdGen
  <span style="color:blue;">return</span> $ rndGenSelect rnd xs count
</pre>
</p>
<p>
Now we can clearly see that the action first creates the <code>rnd</code> random number generator and then passes it to <code>rndGenSelect</code>. That's what happened before, but it was buried in a lambda expression and Haskell's right-to-left causality. Most people would find the first version (without <code>do</code> notation) less readable, and more difficult to write.
</p>
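<p>
To see what the syntactic sugar hides, here's a self-contained sketch that writes the same kind of action in three equivalent ways. It deliberately avoids the <code>random</code> package. Instead, <code>pickWith</code> is a hypothetical pure function that takes a seed, and <code>getCPUTime</code> (from <code>System.CPUTime</code>) stands in for the non-deterministic part. It's not a good source of randomness, and the sketch assumes a non-empty list.
</p>
<p>
<pre>import System.CPUTime (getCPUTime)

-- Hypothetical pure function: deterministic for a given seed.
-- Assumes a non-empty list and a non-negative seed.
pickWith :: Integer -> [a] -> a
pickWith seed xs = xs !! fromInteger (seed `mod` toInteger (length xs))

-- With do notation.
pickDo :: [a] -> IO a
pickDo xs = do
  seed <- getCPUTime
  return $ pickWith seed xs

-- The same action with an explicit monadic bind,
-- which is what the do block above desugars to.
pickBind :: [a] -> IO a
pickBind xs = getCPUTime >>= \seed -> return (pickWith seed xs)

-- Since the last step is pure, mapping over the action is enough.
pickFmap :: [a] -> IO a
pickFmap xs = (\seed -> pickWith seed xs) <$> getCPUTime</pre>
</p>
<p>
The pure part remains easy to test in isolation: <code>pickWith 7 "abc"</code> always evaluates to <code>'b'</code>. Only the thin <code>IO</code> wrappers are unpredictable.
</p>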
<p>
Related to <em>developer ergonomics</em>, though, <code>do</code> notation makes the simple code (i.e. code that separates predictable code from unpredictable code) easy (that is, <em>at hand</em>).
</p>
<p>
F# computation expressions offer the same kind of syntactic sugar, making it easy to write simple code.
</p>
<h3 id="86e9a9bd6cfc4408bedace8acd330f64">
Delay gratification <a href="#86e9a9bd6cfc4408bedace8acd330f64">#</a>
</h3>
<p>
While it's possible to set up a development context in such a way that it nudges you to work in a way that's ultimately good for you, temptation is everywhere.
</p>
<p>
Not only may new language features, IDE functionality, or frameworks entice you to do something that may be disadvantageous in the long run. There may also be actions you don't take because it just feels better to move on.
</p>
<p>
Do you take the time to write good commit messages? Not just a single-line heading, but <a href="https://github.com/GreanTech/AtomEventStore/commit/615cdee2c4d675d412e6669bcc0678655376c4d1">a proper message that explains your context and reasoning</a>?
</p>
<p>
Most people I've observed working with source control 'just want to move on', and can't be bothered to write a useful commit message.
</p>
<p>
I hear about the same mindset when it comes to code reviews, particularly pull request reviews. Everyone 'just wants to write code', and no-one wants to review other people's code. Yet, in a shared code base, you have to live with the code that other people write. Why not review it so that you have a chance to decide what that shared code base should look like?
</p>
<p>
Delay your own gratification a bit, and reap the rewards later.
</p>
<h3 id="630055ed606d43289d71232dd1ef1c25">
Conclusion <a href="#630055ed606d43289d71232dd1ef1c25">#</a>
</h3>
<p>
The only goal I have with this article is to make you think about the consequences of new and innovative tools and frameworks. Particularly if they are immediately compelling, they may be empty calories. Consider if there may be disadvantages to adopting a new way of doing things.
</p>
<p>
Some tools and technologies give you instant gratification, but may be unhealthy in the long run. This is, like most other things, context-dependent. <a href="/2023/01/16/in-the-long-run">In the long run</a> your company may no longer be around. Sometimes, it pays to deliberately do something that you know is bad, in order to reach a goal before your competition. That was the original <em>technical debt</em> metaphor.
</p>
<p>
Often, however, it pays to delay gratification. Learn curl instead of Postman. Learn to design proper REST APIs instead of relying on OpenAI. If you need to write ad-hoc scripts, <a href="/2024/02/05/statically-and-dynamically-typed-scripts">use a language suitable for that</a>.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="9efea1cadb8c4e388bfba1a2064dd59a">
<div class="comment-author"><a href="https://thomaslevesque.com">Thomas Levesque</a> <a href="#9efea1cadb8c4e388bfba1a2064dd59a">#</a></div>
<div class="comment-content">
<p>
Regarding Postman vs. curl, I have to disagree. Sure, curl is pretty easy to use. But while it's good for one-off tests, it sucks when you need to maintain a collection of requests that you can re-execute whenever you want.
In a testing session, you either need to re-type the whole command, or reuse a previous command from the shell's history. Or have a file with all your commands and copy-paste to the shell. Either way, it's not a good experience.
</p>
<p>
That being said, I'm not very fond of Postman either. It's too heavyweight for what it does, IMHO, and the import/export mechanism is terrible for sharing collections with the team. These days, I tend to use VSCode extensions
like <a href="https://github.com/AnWeber/vscode-httpyac">httpYac</a> or <a href="https://github.com/Huachao/vscode-restclient">REST Client</a>, or the equivalent that is now built into Visual Studio and Rider. It's much easier
to work with than Postman (it's just text), while still being interactive. And since it's just a text file, you can just add it to Git to share it with the team.
</p>
</div>
<div class="comment-date">2024-05-14 02:38 UTC</div>
</div>
<div class="comment" id="9efea1cadb8c4e388bfba1a2064dd59b">
<div class="comment-author"><a href="https://majiehong.com/">Jiehong</a> <a href="#9efea1cadb8c4e388bfba1a2064dd59b">#</a></div>
<div class="comment-content">
<p>
@Thomas Levesque: I agree with you, yet VSCode or Rider's extensions lock you into an editor quite quickly.
</p>
<p>
But you can have the best of both worlds: a cli tool first, with editor extensions.
Just like <a href="https://github.com/Orange-OpenSource/hurl">Hurl</a>.
</p>
<p>
Note that you can run a <a href="https://everything.curl.dev/cmdline/configfile.html#urls">curl command from a file with</a> <code>curl --config [curl_request.file]</code>,
but it makes chaining requests (like with login and secrets) rather cumbersome very quickly.
</p>
</div>
<div class="comment-date">2024-05-16 13:57 UTC</div>
</div>
<div class="comment" id="2a6dd3839e2e4bf9b06071221b330356">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#2a6dd3839e2e4bf9b06071221b330356">#</a></div>
<div class="comment-content">
<p>
Thank you, both, for writing. In the end, it's up to every team to settle on technical solutions that work for them, in that context. Likewise, it's up to each developer to identify methodology and tools that work for her or him, as long as it doesn't impact the rest of the team.
</p>
<p>
The reason I suggest curl over other alternatives is that not only is it free, it also tends to be ubiquitous. Most systems come with curl baked in - perhaps not a consumer installation of Windows, but if you have developer tools installed, it's highly likely that you have curl on your machine. It's <a href="/2024/05/20/fundamentals">a fundamental skill that may serve you well if you know it</a>.
</p>
<p>
In addition to that, since curl is a CLI you can always script it if you need a kind of semi-automation. What prevents you from maintaining a collection of script files? They could even take command-line arguments, if you'd like.
</p>
<p>
That said, personally, if I realize that I need to maintain a collection of requests that I can re-execute whenever I want, I'd prefer writing a 'real' program. On the other hand, I find a tool like curl useful for ad-hoc testing.
</p>
</div>
<div class="comment-date">2024-05-21 5:36 UTC</div>
</div>
<div class="comment" id="be53c7b6d29a43e0aa0fdac3fcce835d">
<div class="comment-author">Johannes Egger <a href="#be53c7b6d29a43e0aa0fdac3fcce835d">#</a></div>
<div class="comment-content">
<blockquote>
... maintain a collection of requests that you can re-execute whenever you want.
</blockquote>
<p>@Thomas Levesque: that sounds like a proper collection of automatically executable tests would be a better fit. But yeah, it's just easier to write those simple commands than to set up a test project - instant gratification 😉</p>
</div>
<div class="comment-date">2024-05-28 17:02 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
<p>
<a href="https://blog.ploeh.dk/2024/05/06/conservative-codomain-conjecture">Conservative codomain conjecture</a><br>
2024-05-06T06:35:00+00:00<br>
Mark Seemann
</p>
<div id="post">
<p>
<em>An API design heuristic.</em>
</p>
<p>
For a while now, I've been wondering whether, in the language of <a href="https://en.wikipedia.org/wiki/Robustness_principle">Postel's law</a>, one should favour being liberal in what one accepts over being conservative in what one sends. Yes, according to the design principle, a protocol or API should do both, but sometimes, you can't do that. Instead, you'll have to choose. I've recently reached the tentative conclusion that it may be a good idea to favour being conservative in what one sends.
</p>
<p>
Good API design explicitly considers <em>contracts</em>. What are the preconditions for invoking an operation? What are the postconditions? Are there any invariants? These questions are relevant far beyond object-oriented design. They are <a href="/2022/10/24/encapsulation-in-functional-programming">equally important in Functional Programming</a>, as well as <a href="/2024/04/15/services-share-schema-and-contract-not-class">in service-oriented design</a>.
</p>
<p>
If you have a type system at your disposal, you can often model pre- and postconditions as types. In practice, however, it frequently turns out that there's more than one way of doing that. You can model an additional precondition with an input type, but you can also model potential errors as a return type. Which option is best?
</p>
<p>
That's what this article is about, and my conjecture is that constraining the input type may be preferable, thus being conservative about what is returned.
</p>
<h3 id="7ef0610940fb4670b7cf12a21bdd725f">
An average example <a href="#7ef0610940fb4670b7cf12a21bdd725f">#</a>
</h3>
<p>
That's all quite abstract, so for the rest of this article, I'll discuss this kind of problem in the context of an example. We'll revisit the <a href="/2020/02/03/non-exceptional-averages">good old example of calculating an average value</a>. This example, however, is only a placeholder for any kind of API design problem. This article is only superficially about designing an API for calculating an <a href="https://en.wikipedia.org/wiki/Average">average</a>. More generally, this is about API design. I like the <em>average</em> example because it's easy to follow, and it does exhibit some characteristics that you can hopefully extrapolate from.
</p>
<p>
In short, what is the contract of the following method?
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">TimeSpan</span> <span style="color:#74531f;">Average</span>(<span style="color:blue;">this</span> <span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">TimeSpan</span>> <span style="font-weight:bold;color:#1f377f;">timeSpans</span>)
{
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sum</span> = <span style="color:#2b91af;">TimeSpan</span>.Zero;
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">count</span> = 0;
    <span style="font-weight:bold;color:#8f08c4;">foreach</span> (<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">ts</span> <span style="font-weight:bold;color:#8f08c4;">in</span> <span style="font-weight:bold;color:#1f377f;">timeSpans</span>)
    {
        <span style="font-weight:bold;color:#1f377f;">sum</span> <span style="font-weight:bold;color:#74531f;">+=</span> <span style="font-weight:bold;color:#1f377f;">ts</span>;
        <span style="font-weight:bold;color:#1f377f;">count</span>++;
    }
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">sum</span> <span style="font-weight:bold;color:#74531f;">/</span> <span style="font-weight:bold;color:#1f377f;">count</span>;
}</pre>
</p>
<p>
What are the preconditions? What are the postconditions? Are there any invariants?
</p>
<p>
Before I answer these questions, I'll offer equivalent code in two other languages. Here it is in <a href="https://fsharp.org/">F#</a>:
</p>
<p>
<pre>let average (timeSpans : TimeSpan seq) =
    timeSpans
    |> Seq.averageBy (_.Ticks >> double)
    |> int64
    |> TimeSpan.FromTicks</pre>
</p>
<p>
And in <a href="https://www.haskell.org/">Haskell</a>:
</p>
<p>
<pre><span style="color:#2b91af;">average</span> <span style="color:blue;">::</span> (<span style="color:blue;">Fractional</span> a, <span style="color:blue;">Foldable</span> t) <span style="color:blue;">=></span> t a <span style="color:blue;">-></span> a
average xs = <span style="color:blue;">sum</span> xs / <span style="color:blue;">fromIntegral</span> (<span style="color:blue;">length</span> xs)</pre>
</p>
<p>
These three examples have somewhat different implementations, but the same externally observable behaviour. What is the contract?
</p>
<p>
It seems straightforward: If you input a sequence of values, you get the average of all of those values. Are there any preconditions? Yes, the sequence can't be empty. Given an empty sequence, all three implementations throw an exception. (The Haskell version is a little more nuanced than that, but given an empty list of <a href="https://hackage.haskell.org/package/time/docs/Data-Time-Clock.html#t:NominalDiffTime">NominalDiffTime</a>, it does throw an exception.)
</p>
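<p>
To illustrate the nuance in the Haskell version, here's a small sketch that only uses the base library. Using <code>Double</code> and <code>Rational</code> as stand-ins is my assumption for the sake of a self-contained example; <code>NominalDiffTime</code> lives in the time package, so it isn't included. The names <code>emptyDoubleIsNaN</code> and <code>emptyRational</code> are hypothetical.
</p>
<p>
<pre>import Control.Exception (ArithException, evaluate, try)

average :: (Fractional a, Foldable t) => t a -> a
average xs = sum xs / fromIntegral (length xs)

-- With Double, an empty list doesn't throw; 0 / 0 evaluates to NaN.
emptyDoubleIsNaN :: Bool
emptyDoubleIsNaN = isNaN (average ([] :: [Double]))

-- With Rational, evaluating the result throws an ArithException,
-- which try captures as a Left value.
emptyRational :: IO (Either ArithException Rational)
emptyRational = try (evaluate (average ([] :: [Rational])))
</pre>
</p>
<p>
Whether an empty input fails loudly or quietly produces NaN thus depends on the <code>Fractional</code> instance, which is another reason to make the precondition explicit.
</p>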
<p>
Any other preconditions? At least one more: The sequence must be finite. All three functions allow infinite streams as input, but if given one, they will fail to return an average.
</p>
<p>
Are there any postconditions? I can only think of a statement that relates to the preconditions: <em>If</em> the preconditions are fulfilled, the functions will return the correct average value (within the precision allowed by floating-point calculations).
</p>
<p>
All of this, however, is just warming up. We've <a href="/2020/02/03/non-exceptional-averages">been over this ground before</a>.
</p>
<h3 id="7922b269c9924877abe993cb282440a8">
Modelling contracts <a href="#7922b269c9924877abe993cb282440a8">#</a>
</h3>
<p>
Keep in mind that this <em>average</em> function is just an example. Think of it as a stand-in for a procedure that's much more complicated. Think of the most complicated operation in your code base.
</p>
<p>
Not only do real code bases have many complicated operations. Each comes with its own contract, different from the other operations, and if the team isn't explicitly thinking in terms of contracts, these contracts may change over time, as the team adds new features and fixes bugs.
</p>
<p>
It's difficult work to keep track of all those contracts. As I argue in <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>, it helps if you can automate away some of that work. One way is having good test coverage. Another is to leverage a static type system, if you're fortunate enough to work in a language that has one. As I've <em>also</em> already covered, <a href="/2022/08/22/can-types-replace-validation">you can't replace all rules with types</a>, but it doesn't mean that using the type system is ineffectual. Quite the contrary. Every part of a contract that you can offload to the type system frees up your brain to think about something else - something more important, hopefully.
</p>
<p>
Sometimes there's no good way to model a precondition with a type, or <a href="https://buttondown.email/hillelwayne/archive/making-illegal-states-unrepresentable/">perhaps it's just too awkward</a>. At other times, there's really only a single way to address a concern. When it comes to the precondition that you can't pass an infinite sequence to the <em>average</em> function, <a href="/2020/02/03/non-exceptional-averages">change the type so that it takes some finite collection</a> instead. That's not what this article is about, though.
</p>
<p>
Assuming that you've already dealt with the infinite-sequence issue, how do you address the other precondition?
</p>
<h3 id="03c13848cbd54058a3dfed204bc85878">
Error-handling <a href="#03c13848cbd54058a3dfed204bc85878">#</a>
</h3>
<p>
A typical object-oriented move is to introduce a Guard Clause:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">TimeSpan</span> <span style="color:#74531f;">Average</span>(<span style="color:blue;">this</span> <span style="color:#2b91af;">IReadOnlyCollection</span><<span style="color:#2b91af;">TimeSpan</span>> <span style="font-weight:bold;color:#1f377f;">timeSpans</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">if</span> (!<span style="font-weight:bold;color:#1f377f;">timeSpans</span>.<span style="font-weight:bold;color:#74531f;">Any</span>())
        <span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> <span style="color:#2b91af;">ArgumentOutOfRangeException</span>(
            <span style="color:blue;">nameof</span>(<span style="font-weight:bold;color:#1f377f;">timeSpans</span>),
            <span style="color:#a31515;">"Can't calculate the average of an empty collection."</span>);
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sum</span> = <span style="color:#2b91af;">TimeSpan</span>.Zero;
    <span style="font-weight:bold;color:#8f08c4;">foreach</span> (<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">ts</span> <span style="font-weight:bold;color:#8f08c4;">in</span> <span style="font-weight:bold;color:#1f377f;">timeSpans</span>)
        <span style="font-weight:bold;color:#1f377f;">sum</span> <span style="font-weight:bold;color:#74531f;">+=</span> <span style="font-weight:bold;color:#1f377f;">ts</span>;
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">sum</span> <span style="font-weight:bold;color:#74531f;">/</span> <span style="font-weight:bold;color:#1f377f;">timeSpans</span>.Count;
}</pre>
</p>
<p>
You could do the same in F#:
</p>
<p>
<pre>let average (timeSpans : TimeSpan seq) =
    if Seq.isEmpty timeSpans then
        raise (
            ArgumentOutOfRangeException(
                nameof timeSpans,
                "Can't calculate the average of an empty collection."))
    timeSpans
    |> Seq.averageBy (_.Ticks >> double)
    |> int64
    |> TimeSpan.FromTicks</pre>
</p>
<p>
You <em>could</em> also replicate such behaviour in Haskell, but it'd be highly unidiomatic. Instead, I'd rather discuss one <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> solution in Haskell, and then back-port it.
</p>
<p>
While you can throw exceptions in Haskell, you typically handle <a href="/2024/01/29/error-categories-and-category-errors">predictable errors</a> with a <a href="https://en.wikipedia.org/wiki/Tagged_union">sum type</a>. Here's a version of the Haskell function equivalent to the above C# code:
</p>
<p>
<pre><span style="color:#2b91af;">average</span> <span style="color:blue;">::</span> (<span style="color:blue;">Foldable</span> t, <span style="color:blue;">Fractional</span> a) <span style="color:blue;">=></span> t a <span style="color:blue;">-></span> <span style="color:#2b91af;">Either</span> <span style="color:#2b91af;">String</span> a
average xs =
  <span style="color:blue;">if</span> <span style="color:blue;">null</span> xs
    <span style="color:blue;">then</span> Left <span style="color:#a31515;">"Can't calculate the average of an empty collection."</span>
    <span style="color:blue;">else</span> Right $ <span style="color:blue;">sum</span> xs / <span style="color:blue;">fromIntegral</span> (<span style="color:blue;">length</span> xs)
</pre>
</p>
<p>
For the readers that don't know the Haskell <a href="https://hackage.haskell.org/package/base">base</a> library by heart, <a href="https://hackage.haskell.org/package/base/docs/Data-List.html#v:null">null</a> is a predicate that checks whether or not a collection is empty. It has nothing to do with <a href="https://en.wikipedia.org/wiki/Null_pointer">null pointers</a>.
</p>
<p>
This variation returns an <a href="/2018/06/11/church-encoded-either">Either</a> value. In practice you shouldn't just return a <code>String</code> as the error value, but rather a strongly-typed value that other code can deal with in a robust manner.
</p>
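<p>
As a sketch of what such a strongly-typed error value could look like, consider the following variation. The <code>AverageError</code> type and the <code>describe</code> function are hypothetical names that I've made up for this illustration; a real API would likely have more error cases.
</p>
<p>
<pre>-- Hypothetical error type with a single case.
data AverageError = EmptyCollection deriving (Eq, Show)

average :: (Foldable t, Fractional a) => t a -> Either AverageError a
average xs =
  if null xs
    then Left EmptyCollection
    else Right $ sum xs / fromIntegral (length xs)

-- Callers can pattern-match on the error instead of parsing a message.
describe :: Either AverageError Double -> String
describe (Left EmptyCollection) = "no data"
describe (Right x) = "average: " ++ show x
</pre>
</p>
<p>
If another error case is later added to <code>AverageError</code>, the compiler can warn about functions like <code>describe</code> that no longer handle all cases.
</p>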
<p>
On the other hand, in this particular example, there's really only one error condition that the function is able to detect, so you often see a variation where instead of a single error message, such a function just doesn't return anything:
</p>
<p>
<pre><span style="color:#2b91af;">average</span> <span style="color:blue;">::</span> (<span style="color:blue;">Foldable</span> t, <span style="color:blue;">Fractional</span> a) <span style="color:blue;">=></span> t a <span style="color:blue;">-></span> <span style="color:#2b91af;">Maybe</span> a
average xs = <span style="color:blue;">if</span> <span style="color:blue;">null</span> xs <span style="color:blue;">then</span> Nothing <span style="color:blue;">else</span> Just $ <span style="color:blue;">sum</span> xs / <span style="color:blue;">fromIntegral</span> (<span style="color:blue;">length</span> xs)
</pre>
</p>
<p>
This iteration of the function returns a <a href="/2018/03/26/the-maybe-functor">Maybe</a> value, indicating that a return value may or may not be present.
</p>
<h3 id="ccc6a2a1804740a8942feee3b637db90">
Liberal domain <a href="#ccc6a2a1804740a8942feee3b637db90">#</a>
</h3>
<p>
We can back-port this design to F#, where I'd also consider it idiomatic:
</p>
<p>
<pre>let average (timeSpans : IReadOnlyCollection<TimeSpan>) =
    if timeSpans.Count = 0 then None else
        timeSpans
        |> Seq.averageBy (_.Ticks >> double)
        |> int64
        |> TimeSpan.FromTicks
        |> Some</pre>
</p>
<p>
This version returns a <code>TimeSpan option</code> rather than just a <code>TimeSpan</code>. While this may seem to put the burden of error-handling on the caller, nothing has really changed. The fundamental situation is the same. Now the function is just being more <a href="https://peps.python.org/pep-0020/">explicit</a> (more honest, you could say) about the pre- and postconditions. The type system also now insists that you deal with the possibility of error, rather than just hoping that the problem doesn't occur.
</p>
<p>
In C# you can <a href="/2024/01/29/error-categories-and-category-errors">expand the codomain by returning a nullable TimeSpan value</a>, but such an option may not always be available at the language level. Keep in mind that the <code>Average</code> method is just an example standing in for something that may be more complicated. If the original return type is a <a href="https://learn.microsoft.com/dotnet/csharp/language-reference/keywords/reference-types">reference type</a> rather than a <a href="https://learn.microsoft.com/dotnet/csharp/language-reference/builtin-types/value-types">value type</a>, only recent versions of C# allow statically-checked <a href="https://learn.microsoft.com/dotnet/csharp/nullable-references">nullable reference types</a>. What if you're working in an older version of C#, or another language that doesn't have that feature?
</p>
<p>
In that case, you may need to introduce an explicit <a href="/2018/03/26/the-maybe-functor">Maybe</a> class and return that:
</p>
<p>
<pre>public static Maybe<TimeSpan> Average(this IReadOnlyCollection<TimeSpan> timeSpans)
{
    if (timeSpans.Count == 0)
        return new Maybe<TimeSpan>();
    var sum = TimeSpan.Zero;
    foreach (var ts in timeSpans)
        sum += ts;
    return new Maybe<TimeSpan>(sum / timeSpans.Count);
}</pre>
</p>
<p>
Two things are going on here; one is obvious while the other is more subtle. Clearly, all of these alternatives change the static type of the function in order to make the pre- and postconditions more explicit. So far, they've all been loosening the <a href="https://en.wikipedia.org/wiki/Codomain">codomain</a> (the return <a href="/2021/11/15/types-as-sets">type</a>). This suggests a connection with <a href="https://en.wikipedia.org/wiki/Robustness_principle">Postel's law</a>: <em>be conservative in what you send, be liberal in what you accept</em>. These variations are all liberal in what they accept, but it seems that the API design pays the price by also having to widen the set of possible return values. In other words, such designs aren't conservative in what they send.
</p>
<p>
Do we have other options?
</p>
<h3 id="4fb2cc5775c44f80965cacbc37825f27">
Conservative codomain <a href="#4fb2cc5775c44f80965cacbc37825f27">#</a>
</h3>
<p>
Is it possible to instead design the API in such a way that it's conservative in what it returns? Ideally, we'd like it to guarantee that it returns a number. This is possible by making the preconditions even more explicit. I've also <a href="/2020/02/03/non-exceptional-averages">covered that alternative already</a>, so I'm just going to repeat the C# code here without further comments:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">TimeSpan</span> <span style="color:#74531f;">Average</span>(<span style="color:blue;">this</span> <span style="color:#2b91af;">NotEmptyCollection</span><<span style="color:#2b91af;">TimeSpan</span>> <span style="font-weight:bold;color:#1f377f;">timeSpans</span>)
{
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sum</span> = <span style="font-weight:bold;color:#1f377f;">timeSpans</span>.Head;
    <span style="font-weight:bold;color:#8f08c4;">foreach</span> (<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">ts</span> <span style="font-weight:bold;color:#8f08c4;">in</span> <span style="font-weight:bold;color:#1f377f;">timeSpans</span>.Tail)
        <span style="font-weight:bold;color:#1f377f;">sum</span> <span style="font-weight:bold;color:#74531f;">+=</span> <span style="font-weight:bold;color:#1f377f;">ts</span>;
    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">sum</span> <span style="font-weight:bold;color:#74531f;">/</span> <span style="font-weight:bold;color:#1f377f;">timeSpans</span>.Count;
}</pre>
</p>
<p>
This variation promotes another precondition to a type. The precondition that the input collection mustn't be empty can be explicitly modelled with a type. This enables us to be conservative about the codomain. The method now guarantees that it will return a value.
</p>
<p>
This idea is also easily ported to F#:
</p>
<p>
<pre>type NonEmpty<'a> = { Head : 'a; Tail : IReadOnlyCollection<'a> }
let average (timeSpans : NonEmpty<TimeSpan>) =
    [ timeSpans.Head ] @ List.ofSeq timeSpans.Tail
    |> List.averageBy (_.Ticks >> double)
    |> int64
    |> TimeSpan.FromTicks</pre>
</p>
<p>
The <code>average</code> function now takes a <code>NonEmpty</code> collection as input, and always returns a proper <code>TimeSpan</code> value.
</p>
<p>
Haskell already comes with a built-in <a href="https://hackage.haskell.org/package/base/docs/Data-List-NonEmpty.html">NonEmpty</a> collection type, and while it oddly doesn't come with an <code>average</code> function, it's easy enough to write:
</p>
<p>
<pre><span style="color:blue;">import</span> <span style="color:blue;">qualified</span> Data.List.NonEmpty <span style="color:blue;">as</span> NE
<span style="color:#2b91af;">average</span> <span style="color:blue;">::</span> <span style="color:blue;">Fractional</span> a <span style="color:blue;">=></span> <span style="color:blue;">NE</span>.<span style="color:blue;">NonEmpty</span> a <span style="color:blue;">-></span> a
average xs = <span style="color:blue;">sum</span> xs / <span style="color:blue;">fromIntegral</span> (NE.<span style="color:blue;">length</span> xs)
</pre>
</p>
<p>
You can find a recent example of using a variation of that function <a href="/2024/04/08/extracting-curve-coordinates-from-a-bitmap">here</a>.
</p>
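<p>
For readers unfamiliar with <code>NonEmpty</code>, here's a minimal usage sketch. The <code>sample</code> value is a hypothetical name used only for illustration.
</p>
<p>
<pre>import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE

average :: Fractional a => NonEmpty a -> a
average xs = sum xs / fromIntegral (NE.length xs)

-- The (:|) constructor requires a head, so an empty input can't be expressed.
sample :: Double
sample = average (1 :| [2, 3])
</pre>
</p>
<p>
Since every <code>NonEmpty</code> value has at least one element, the division can never be by zero.
</p>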
<h3 id="6f42a53e7c5f4ddb994e85c9d15ec37a">
Choosing between the two alternatives <a href="#6f42a53e7c5f4ddb994e85c9d15ec37a">#</a>
</h3>
<p>
While Postel's law recommends having liberal domains and conservative codomains, in the case of the <em>average</em> API, we can't have both. If we design the API with a liberal input type, the output type has to be liberal as well. If we design with a restrictive input type, the output can be guaranteed. In my experience, you'll often find yourself in such a conundrum. The <em>average</em> API examined in this article is just an example, while the problem occurs often.
</p>
<p>
Given such a choice, what should you choose? Is it even possible to give general guidance on this sort of problem?
</p>
<p>
For decades, I considered such a choice a toss-up. After all, these solutions seem to be equivalent. Perhaps even isomorphic?
</p>
<p>
When I recently began to explore this isomorphism more closely, it dawned on me that there's a small asymmetry in the isomorphism that favours the <em>conservative codomain</em> option.
</p>
<h3 id="976ac8645de44d51a5796be7481b1c12">
Isomorphism <a href="#976ac8645de44d51a5796be7481b1c12">#</a>
</h3>
<p>
An <a href="https://en.wikipedia.org/wiki/Isomorphism">isomorphism</a> is a two-way translation between two representations. You can go back and forth between the two alternatives without loss of information.
</p>
<p>
Is this possible with the two alternatives outlined above? For example, if you have the conservative version, can you create the liberal alternative? Yes, you can:
</p>
<p>
<pre><span style="color:#2b91af;">average'</span> <span style="color:blue;">::</span> <span style="color:blue;">Fractional</span> a <span style="color:blue;">=></span> [a] <span style="color:blue;">-></span> <span style="color:#2b91af;">Maybe</span> a
average' = <span style="color:blue;">fmap</span> average . NE.nonEmpty</pre>
</p>
<p>
Not surprisingly, this is trivial in Haskell. If you have the conservative version, you can just map it over a more liberal input.
</p>
<p>
In F# it looks like this:
</p>
<p>
<pre>module NonEmpty =
    let tryOfSeq xs =
        if Seq.isEmpty xs then None
        else Some { Head = Seq.head xs; Tail = Seq.tail xs |> List.ofSeq }
let average' (timeSpans : IReadOnlyCollection<TimeSpan>) =
    NonEmpty.tryOfSeq timeSpans |> Option.map average</pre>
</p>
<p>
In C# we can create a liberal overload that calls the conservative method:
</p>
<p>
<pre>public static TimeSpan? Average(this IReadOnlyCollection<TimeSpan> timeSpans)
{
    if (timeSpans.Count == 0)
        return null;
    var arr = timeSpans.ToArray();
    return new NotEmptyCollection<TimeSpan>(arr[0], arr[1..]).Average();
}</pre>
</p>
<p>
Here I just used a Guard Clause and explicit construction of the <code>NotEmptyCollection</code>. I could also have added a <code>NotEmptyCollection.TryCreate</code> method, like in the F# and Haskell examples, but I chose the above slightly more imperative style in order to demonstrate that my point isn't tightly coupled to the concept of <a href="/2018/03/22/functors">functors</a>, mapping, and other Functional Programming trappings.
</p>
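<p>
The same functor-free style is also possible in Haskell. The following is only a sketch of that alternative, using plain pattern matching instead of <code>fmap</code> and <code>NE.nonEmpty</code>:
</p>
<p>
<pre>import Data.List.NonEmpty (NonEmpty (..))

average :: Fractional a => NonEmpty a -> a
average xs = sum xs / fromIntegral (length xs)

-- Both list cases are handled, so the function is total.
average' :: Fractional a => [a] -> Maybe a
average' [] = Nothing
average' (x : xs) = Just (average (x :| xs))
</pre>
</p>
<p>
Every operation involved is total, and the compiler can check that both list cases are covered.
</p>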
<p>
These examples highlight how you can trivially make a conservative API look like a liberal API. Is it possible to go the other way? Can you make a liberal API look like a conservative API?
</p>
<p>
Yes and no.
</p>
<p>
Consider the liberal Haskell version of <code>average</code>, shown above; that's the one that returns <code>Maybe a</code>. Can you make a conservative function based on that?
</p>
<p>
<pre><span style="color:#2b91af;">average'</span> <span style="color:blue;">::</span> <span style="color:blue;">Fractional</span> a <span style="color:blue;">=></span> <span style="color:blue;">NE</span>.<span style="color:blue;">NonEmpty</span> a <span style="color:blue;">-></span> a
average' xs = fromJust $ average xs</pre>
</p>
<p>
Yes, this is possible, but only by resorting to the <a href="https://wiki.haskell.org/Partial_functions">partial function</a> <a href="https://hackage.haskell.org/package/base/docs/Data-Maybe.html#v:fromJust">fromJust</a>. I'll explain why that is a problem once we've covered examples in the two other languages, such as F#:
</p>
<p>
<pre>let average' (timeSpans : NonEmpty<TimeSpan>) =
    [ timeSpans.Head ] @ List.ofSeq timeSpans.Tail |> average |> Option.get</pre>
</p>
<p>
In this variation, <code>average</code> is the liberal version shown above; the one that returns a <code>TimeSpan option</code>. In order to make a conservative version, the <code>average'</code> function can call the liberal <code>average</code> function, but has to resort to the partial function <code>Option.get</code>.
</p>
<p>
The same issue repeats a third time in C#:
</p>
<p>
<pre>public static TimeSpan Average(this NotEmptyCollection<TimeSpan> timeSpans)
{
    return timeSpans.ToList().Average().Value;
}</pre>
</p>
<p>
This time, the partial function is the unsafe <a href="https://learn.microsoft.com/dotnet/api/system.nullable-1.value">Value</a> property, which throws an <code>InvalidOperationException</code> if there's no value.
</p>
<p>
This even violates Microsoft's own design guidelines:
</p>
<blockquote>
<p>
"AVOID throwing exceptions from property getters."
</p>
<footer><cite><a href="https://learn.microsoft.com/dotnet/standard/design-guidelines/property">Krzysztof Cwalina and Brad Abrams</a></cite></footer>
</blockquote>
<p>
I've cited Cwalina and Abrams as the authors, since this rule can be found in my 2006 edition of <a href="/ref/fdg">Framework Design Guidelines</a>. This isn't a new insight.
</p>
<p>
While the two alternatives are 'isomorphic enough' that we can translate both ways, the translations are asymmetric in the sense that one is safe, while the other has to resort to an inherently unsafe operation to make it work.
</p>
<h3 id="e10b4b0269b74efa9d89275644c88d8e">
Encapsulation <a href="#e10b4b0269b74efa9d89275644c88d8e">#</a>
</h3>
<p>
I've called the operations <code>fromJust</code>, <code>Option.get</code>, and <code>Value</code> <em>partial</em>, and only just now used the word <em>unsafe</em>. You may protest that none of the three examples is unsafe in practice, since we know that the input is never empty. Thus, we know that the liberal function will always return a value, and therefore it's safe to call a partial function, even though these operations are unsafe in the general case.
</p>
<p>
While that's true, consider how the burden shifts. When you want to promote a conservative variant to a liberal variant, you can rely on all the operations being total. On the other hand, if you want to make a liberal variant look conservative, the onus is on you. None of the three type systems on display here can perform that analysis for you.
</p>
<p>
This may not be so bad when the example is as simple as taking the average of a collection of numbers, but does it scale? What if the operation you're invoking is much more complicated? Can you still be sure that you safely invoke a partial function on the return value?
</p>
<p>
As <a href="/ctfiyh">Code That Fits in Your Head</a> argues, procedures quickly become so complicated that they no longer fit in your head. If you don't have well-described and patrolled contracts, you don't know what the postconditions are. You can't trust the return values from method calls, or even the state of the objects you passed as arguments. This tends to lead to <a href="/2013/07/08/defensive-coding">defensive coding</a>, where you write code that checks the state of everything all too often.
</p>
<p>
The remedy is, as always, good old <a href="/encapsulation-and-solid">encapsulation</a>. In this case, check the preconditions at the beginning, and capture the result of that check in an object or type that is guaranteed to be always valid. This goes beyond <a href="https://blog.janestreet.com/effective-ml-video/">making illegal states unrepresentable</a> because it also works with <a href="https://www.hillelwayne.com/post/constructive/">predicative</a> types. Once you're past the Guard Clauses, you don't have to check the preconditions <em>again</em>.
</p>
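<p>
As a minimal sketch of that idea in Python (used here only for illustration; the names <code>NonEmpty</code>, <code>try_create</code>, and <code>average</code> are hypothetical and not taken from the article's code bases), the emptiness check can be performed once in a factory function, after which the resulting object carries the guarantee. Python doesn't enforce the type hints, so this only imitates what a static type system would check:
</p>

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class NonEmpty:
    """Hypothetical wrapper: a head plus a (possibly empty) tail."""
    head: timedelta
    tail: tuple = ()

    @staticmethod
    def try_create(items: Sequence[timedelta]) -> Optional["NonEmpty"]:
        # Guard Clause: the emptiness check happens once, here.
        if len(items) == 0:
            return None
        return NonEmpty(items[0], tuple(items[1:]))


def average(durations: NonEmpty) -> timedelta:
    # There is always a head, so the divisor is at least 1.
    items = (durations.head,) + durations.tail
    return sum(items, timedelta()) / len(items)
```

<p>
Once a <code>NonEmpty</code> value exists, code such as <code>average</code> doesn't have to repeat the check.
</p>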
<p>
This kind of thinking illustrates why you need a multidimensional view on API design. As useful as Postel's law sometimes is, it doesn't address all problems. In fact, it turned out to be unhelpful in this context, while another perspective proves more fruitful. Encapsulation is the art and craft of designing APIs in such a way that they suggest or even compel correct interactions. The more I think of this, the more it strikes me that a <em>ranking</em> is implied: Preconditions are more important than postconditions, because if the preconditions are unfulfilled, you can't trust the postconditions, either.
</p>
<h3 id="35bb4abc87b8402da82c82c5baa71235">
Mapping <a href="#35bb4abc87b8402da82c82c5baa71235">#</a>
</h3>
<p>
What's going on here? One perspective is to view <a href="/2021/11/15/types-as-sets">types as sets</a>. In the <em>average</em> example, the function maps from one set to another:
</p>
<p>
<img src="/content/binary/mapping-from-collections-to-reals.png" alt="Mapping from the set of collections to the set of real numbers.">
</p>
<p>
Which sets are they? We can think of the <em>average</em> function as a mapping from the set of non-empty collections of numbers to the set of <a href="https://en.wikipedia.org/wiki/Real_number">real numbers</a>. In programming, we can't represent real numbers, so instead, the left set is going to be the set of all the non-empty collections the computer or the language can represent and hold in (virtual) memory, and the right-hand set is the set of all the possible numbers of whichever type you'd like (32-bit signed integers, <a href="https://en.wikipedia.org/wiki/Double-precision_floating-point_format">64-bit floating-point numbers</a>, 8-bit unsigned integers, etc.).
</p>
<p>
In reality, the left-hand set is much larger than the set to the right.
</p>
<p>
Drawing all those arrows quickly becomes awkward, so instead, we may <a href="/2021/11/22/functions-as-pipes">draw each mapping as a pipe</a>. Such a pipe also corresponds to a function. Here's an intermediate step in such a representation:
</p>
<p>
<img src="/content/binary/mapping-from-collections-to-reals-transparent-pipe.png" alt="Mapping from one set to the other, drawn inside a transparent pipe.">
</p>
<p>
One common element is, however, missing from the left set. Which one?
</p>
<h3 id="8920c8df3b9f4f978a5c560d5c9cdcb4">
Pipes <a href="#8920c8df3b9f4f978a5c560d5c9cdcb4">#</a>
</h3>
<p>
The above mapping corresponds to the conservative variation of the function. It's a total function that maps all values in the domain to a value in the codomain. It accomplishes this trick by explicitly constraining the domain to only those elements on which it's defined. Due to the preconditions, that excludes the empty collection, which is therefore absent from the left set.
</p>
<p>
What if we also want to allow the empty collection to be a valid input?
</p>
<p>
Unless we find ourselves in some special context where it makes sense to define a 'default average value', we can't map an empty collection to any meaningful number. Rather, we'll have to map it to some special value, such as <code>Nothing</code>, <code>None</code>, or <code>null</code>:
</p>
<p>
<img src="/content/binary/mapping-with-none-channel-transparent-pipe.png" alt="Mapping the empty collection to null in a pipe separate, but on top of, the proper function pipe.">
</p>
<p>
This extra pipe is free, because it's supplied by the <a href="/2018/03/26/the-maybe-functor">Maybe functor</a>'s mapping (<code>Select</code>, <code>map</code>, <code>fmap</code>).
</p>
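<p>
The 'free' extra pipe can be sketched in Python, with <code>None</code> standing in for the missing value. This is an illustration under stated assumptions, not code from the article's repositories: <code>map_optional</code>, <code>try_non_empty</code>, and <code>liberal_average</code> are hypothetical names, and since Python doesn't enforce type hints, the 'conservative' contract of <code>average</code> is a convention rather than a compiler guarantee.
</p>

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def map_optional(f: Callable[[T], U], value: Optional[T]) -> Optional[U]:
    # Stand-in for the Maybe functor's mapping: apply f only if a value is present.
    return None if value is None else f(value)


def try_non_empty(xs: Sequence[float]) -> Optional[tuple]:
    # Hypothetical validation step: None represents the empty collection.
    return tuple(xs) if len(xs) > 0 else None


def average(xs: tuple) -> float:
    # Conservative variant: by convention, callers pass a validated, non-empty tuple.
    return sum(xs) / len(xs)


def liberal_average(xs: Sequence[float]) -> Optional[float]:
    # Liberal variant composed from the conservative one; nothing is unwrapped unsafely.
    return map_optional(average, try_non_empty(xs))
```

<p>
Here <code>liberal_average</code> is built from the conservative <code>average</code> without calling anything like <code>fromJust</code>, <code>Option.get</code>, or <code>Value</code>.
</p>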
<p>
What happens if we need to go the other way? If the function is the liberal variant that also maps the empty collection to a special element that indicates a missing value?
</p>
<p>
<img src="/content/binary/mapping-from-all-collections-to-reals-transparent-pipe.png" alt="Mapping all collections, including the empty collection, to the set of real numbers.">
</p>
<p>
In this case, it's much harder to disentangle the mappings. If you imagine that a liquid flows through the pipes, we can try to be careful and avoid 'filling up' the pipe.
</p>
<p>
<img src="/content/binary/pipe-partially-filled-with-liquid.png" alt="Pipe partially filled with liquid.">
</p>
<p>
The liquid represents the data that we <em>do</em> want to transmit through the pipe. As this illustration suggests, we now have to be careful that nothing goes wrong. In order to catch just the right outputs on the right side, you need to know how high the liquid may go, and attach a 'flat-top' pipe to it:
</p>
<p>
<img src="/content/binary/pipe-composed-with-open-top-pipe.png" alt="Pipe composed with open-top pipe.">
</p>
<p>
As this illustration tries to get across, this kind of composition is awkward and error-prone. What's worse is that you need to know how high the liquid is going to get on the right side. This depends on what actually goes on inside the pipe, and what kind of input goes into the left-hand side.
</p>
<p>
This is a metaphor. The longer the pipe is, the more difficult it gets to keep track of that knowledge. The stubby little pipe in these illustrations may correspond to the <em>average</em> function, which is an operation that easily fits in our heads. It's not too hard to keep track of the preconditions, and how they map to postconditions.
</p>
<p>
Thus, turning such a small liberal function into a conservative function is possible, but already awkward. If the operation is complicated, you can no longer keep track of all the details of how the inputs relate to the outputs.
</p>
<h3 id="18b682600e5a4d1baf542b0cd1dcda7f">
Additive extensibility <a href="#18b682600e5a4d1baf542b0cd1dcda7f">#</a>
</h3>
<p>
This really shouldn't surprise us. Most programming languages come with all sorts of facilities that enable <em>extensibility</em>: The ability to <em>add</em> more functionality, more behaviour, more capabilities, to existing building blocks. Conversely, few languages come with <em>removability</em> facilities. You can't, commonly, declare that an object is an instance of a class, <em>except</em> one method, or that a function is just like another function, <em>except</em> that it doesn't accept a particular subset of input.
</p>
<p>
This explains why we can safely make a conservative function liberal, but why it's difficult to make a liberal function conservative. This is because making a conservative function liberal <em>adds</em> functionality, while making a liberal function conservative attempts to remove functionality.
</p>
<h3 id="6545430a4e1f47a38e121aee1a342b40">
Conjecture <a href="#6545430a4e1f47a38e121aee1a342b40">#</a>
</h3>
<p>
All this leads me to the following conjecture: When faced with a choice between two versions of an API, where one has a liberal domain, and the other a conservative codomain, choose the design with the conservative codomain.
</p>
<p>
If you need the liberal version, you can create it from the conservative operation. The converse need not be true.
</p>
<h3 id="312f8f4df0c44390a43f4f0f92d2c9d6">
Conclusion <a href="#312f8f4df0c44390a43f4f0f92d2c9d6">#</a>
</h3>
<p>
Postel's law encourages us to be liberal with what we accept, but conservative with what we return. This is a good design heuristic, but sometimes you're faced with mutually exclusive alternatives. If you're liberal with what you accept, you'll also need to be too loose with what you return, because there are input values that you can't handle. On the other hand, sometimes the only way to be conservative with the output is to also be restrictive when it comes to input.
</p>
<p>
Given two such alternatives, which one should you choose?
</p>
<p>
This article conjectures that you should choose the conservative alternative. This isn't a political statement, but simply a result of the conservative design being the smaller building block. From a small building block, you can compose something bigger, whereas from a bigger unit, you can't easily extract something smaller that's still robust and useful.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.Service compatibility is determined based on policyhttps://blog.ploeh.dk/2024/04/29/service-compatibility-is-determined-based-on-policy2024-04-29T11:12:00+00:00Mark Seemann
<div id="post">
<p>
<em>A reading of the fourth Don Box tenet, with some commentary.</em>
</p>
<p>
This article is part of a series titled <a href="/2024/03/04/the-four-tenets-of-soa-revisited">The four tenets of SOA revisited</a>. In each of these articles, I'll pull one of <a href="https://en.wikipedia.org/wiki/Don_Box">Don Box</a>'s <em>four tenets of service-oriented architecture</em> (SOA) out of the <a href="https://learn.microsoft.com/en-us/archive/msdn-magazine/2004/january/a-guide-to-developing-and-running-connected-systems-with-indigo">original MSDN Magazine article</a> and add some of my own commentary. If you're curious why I do that, I cover that in the introductory article.
</p>
<p>
In this article, I'll go over the fourth tenet, quoting from the MSDN Magazine article unless otherwise indicated.
</p>
<h3 id="57382e74449c40409a7d73d91bc5fd14">
Service compatibility is determined based on policy <a href="#57382e74449c40409a7d73d91bc5fd14">#</a>
</h3>
<p>
The fourth tenet is the forgotten one. I could rarely remember exactly what it included, but it does give me an opportunity to bring up a few points about compatibility. The article said:
</p>
<blockquote>
<p>
Object-oriented designs often confuse structural compatibility with semantic compatibility. Service-orientation deals with these two axes separately. Structural compatibility is based on contract and schema and can be validated (if not enforced) by machine-based techniques (such as packet-sniffing, validating firewalls). Semantic compatibility is based on explicit statements of capabilities and requirements in the form of policy.
</p>
<p>
Every service advertises its capabilities and requirements in the form of a machine-readable policy expression. Policy expressions indicate which conditions and guarantees (called assertions) must hold true to enable the normal operation of the service. Policy assertions are identified by a stable and globally unique name whose meaning is consistent in time and space no matter which service the assertion is applied to. Policy assertions may also have parameters that qualify the exact interpretation of the assertion. Individual policy assertions are opaque to the system at large, which enables implementations to apply simple propositional logic to determine service compatibility.
</p>
</blockquote>
<p>
As you can tell, this description is the shortest of the four. This is also the point where I begin to suspect that my reading of <a href="/2024/04/15/services-share-schema-and-contract-not-class">the third tenet</a> may deviate from what Don Box originally had in mind.
</p>
<p>
This tenet is also the most baffling to me. As I understand it, the motivation behind the four tenets was to describe assumptions about the kind of systems that people would develop with <a href="https://en.wikipedia.org/wiki/Windows_Communication_Foundation">Windows Communication Foundation</a> (WCF), or <a href="https://en.wikipedia.org/wiki/SOAP">SOAP</a> in general.
</p>
<p>
While I worked with WCF for a decade, the above description doesn't ring a bell. Reading it now, the description of <em>policy</em> sounds more like a system such as <a href="https://clojure.org/about/spec">clojure.spec</a>, although that's not something I know much about either. I don't recall WCF ever having a machine-readable policy subsystem, and if it had, I never encountered it.
</p>
<p>
It does seem, however, as though what I interpret as <em>contract</em>, Don Box called <em>policy</em>.
</p>
<p>
Despite my confusion, the word <em>compatibility</em> is worth discussing, regardless of whether that was what Don Box meant. A well-designed service is one where you've explicitly considered forwards and backwards compatibility.
</p>
<h3 id="77bf7878d5304ba08f686cbfbc6cb941">
Versioning <a href="#77bf7878d5304ba08f686cbfbc6cb941">#</a>
</h3>
<p>
Planning for forwards and backwards compatibility does <em>not</em> imply that you're expected to be able to predict the future. It's fine if you have so much experience developing and maintaining online systems that you may have enough foresight to plan for certain likely changes that you may have to make in the future, but that's not what I have in mind.
</p>
<p>
Rather, what you <em>should</em> do is to have a system that enables you to detect breaking changes before you deploy them. Furthermore, you should have a strategy for how to deal with the perceived necessity to introduce breaking changes.
</p>
<p>
The most effective strategy that I know of is to employ explicit versioning, particularly <em>message versioning</em>. You <em>can</em> version an entire service as one indivisible block, but I often find it more useful to version at the message level. If you're designing a <a href="https://en.wikipedia.org/wiki/REST">REST</a> API, for example, you can <a href="/2015/06/22/rest-implies-content-negotiation">take advantage of Content Negotiation</a>.
</p>
<p>
If you like, you can use <a href="https://semver.org/">Semantic Versioning</a> as a versioning scheme, but for services, the thing that mostly matters is the major version. Thus, you may simply label your messages with the version numbers <em>1</em>, <em>2</em>, etc.
</p>
<p>
If you already have a published service without explicit message version information, then you can still retrofit versioning afterwards. <a href="/2023/12/04/serialization-with-and-without-reflection">Imagine that your existing data looks like this</a>:
</p>
<p>
<pre>{
<span style="color:#2e75b6;">"singleTable"</span>: {
<span style="color:#2e75b6;">"capacity"</span>: 16,
<span style="color:#2e75b6;">"minimalReservation"</span>: 10
}
}</pre>
</p>
<p>
This <a href="https://json.org/">JSON</a> document has no explicit version information, but you can interpret that as implying that the document has the 'default' version, which is always <em>1</em>:
</p>
<p>
<pre>{
<span style="color:#2e75b6;">"singleTable"</span>: {
<span style="color:#2e75b6;">"version"</span>: 1,
<span style="color:#2e75b6;">"capacity"</span>: 16,
<span style="color:#2e75b6;">"minimalReservation"</span>: 10
}
}</pre>
</p>
<p>
If you later realize that you need to make a breaking change, you can do that by increasing the (major) version:
</p>
<p>
<pre>{
<span style="color:#2e75b6;">"singleTable"</span>: {
<span style="color:#2e75b6;">"version"</span>: 2,
<span style="color:#2e75b6;">"id"</span>: 12,
<span style="color:#2e75b6;">"capacity"</span>: 16,
<span style="color:#2e75b6;">"minimalReservation"</span>: 10
}
}</pre>
</p>
<p>
Recipients can now look for the <code>version</code> property to learn how to interpret the rest of the message, and failing to find it, infer that this is version <em>1</em>.
</p>
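<p>
On the receiving side, that rule could be sketched like this in Python. The function name <code>read_version</code> is hypothetical, and the sketch assumes that the version always sits directly under the <code>singleTable</code> property, as in the examples above:
</p>

```python
import json


def read_version(document: str) -> int:
    # Hypothetical helper: look for the version property and,
    # failing to find it, infer that this is version 1.
    table = json.loads(document)["singleTable"]
    return table.get("version", 1)
```

<p>
A recipient can then dispatch on the returned number to pick the parser that matches the message version.
</p>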
<p>
As Don Box wrote, in a service-oriented system, you can't just update all systems in a single coordinated release. Therefore, you must never break compatibility. Versioning enables you to move forward in a way that does break with the past, but without breaking existing clients.
</p>
<p>
Ultimately, you <a href="/2020/06/01/retiring-old-service-versions">may attempt to retire old service versions</a>, but be ready to keep them around for a long time.
</p>
<p>
For more of my thoughts about backwards compatibility, see <a href="/2021/12/13/backwards-compatibility-as-a-profunctor">Backwards compatibility as a profunctor</a>.
</p>
<h3 id="ad9cec4f54c243d08fc71d38ff13ac17">
Conclusion <a href="#ad9cec4f54c243d08fc71d38ff13ac17">#</a>
</h3>
<p>
The fourth tenet is the most nebulous, and I wonder if it was ever implemented. If it was, I'm not aware of it. Even so, compatibility is an important component of service design, so I took the opportunity to write about that. In most cases, it pays to think explicitly about message versioning.
</p>
<p>
I have the impression that Don Box had something in mind more akin to what I call <em>contract</em>. Whether you call it one thing or another, it stands to reason that you often need to attach extra rules to simple types. The <em>schema</em> may define an input value as a number, but the service does require that this particular number is a natural number. Or that a string is really a <a href="https://en.wikipedia.org/wiki/ISO_8601">proper encoding</a> of a date. Perhaps you call that <em>policy</em>. I call it <em>contract</em>. In any case, clearly communicating such expectations is important for systems to be compatible.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.Fitting a polynomial to a set of pointshttps://blog.ploeh.dk/2024/04/22/fitting-a-polynomial-to-a-set-of-points2024-04-22T05:35:00+00:00Mark Seemann
<div id="post">
<p>
<em>The story of a fiasco.</em>
</p>
<p>
This is the second in a small series of articles titled <a href="/2024/04/01/trying-to-fit-the-hype-cycle">Trying to fit the hype cycle</a>. In the introduction, I've described the exercise I had in mind: Determining a formula, or at least a <a href="https://en.wikipedia.org/wiki/Piecewise">piecewise</a> <a href="https://en.wikipedia.org/wiki/Function_(mathematics)">function</a>, for the <a href="https://en.wikipedia.org/wiki/Gartner_hype_cycle">Gartner hype cycle</a>. This, to be clear, is an entirely frivolous exercise with little practical application.
</p>
<p>
In the previous article, I <a href="/2024/04/08/extracting-curve-coordinates-from-a-bitmap">extracted a set of <em>(x, y)</em> coordinates from a bitmap</a>. In this article, I'll showcase my failed attempt at fitting the data to a <a href="https://en.wikipedia.org/wiki/Polynomial">polynomial</a>.
</p>
<h3 id="36f71204d90b44a8b39a7d8103f46cca">
Failure <a href="#36f71204d90b44a8b39a7d8103f46cca">#</a>
</h3>
<p>
I've already revealed that I failed to accomplish what I set out to do. Why should you read on, then?
</p>
<p>
You don't have to, and I can't predict the many reasons my readers have for occasionally swinging by. Therefore, I can't tell you why <em>you</em> should keep reading, but I <em>can</em> tell you why I'm writing this article.
</p>
<p>
This blog is partly a collection of articles that I write because readers ask me interesting questions, and partly my personal research-and-development log. In that mode, I write about things that I've learned, and I write in order to learn. One can learn from failure as well as from success.
</p>
<p>
I'm not <em>that</em> connected to 'the' research community (if such a thing exists), but I'm getting the sense that there's a general tendency in academia that researchers rarely publish their negative results. This could be a problem, because this means that the rest of us never learn about the <em>thousands of ways that don't work</em>.
</p>
<p>
Additionally, in 'the' programming community, we also tend to boast about our victories and hide our failures. More than one podcast (sorry about the <a href="https://en.wikipedia.org/wiki/Weasel_word">weasel words</a>, but I don't remember which ones) has discussed how this gives young programmers the wrong impression of what programming is like. It is, indeed, a process of much trial and error, but usually, we only publish our polished, final result.
</p>
<p>
Well, I did manage to produce code to fit a polynomial to the Gartner hype cycle, but I never managed to get a <em>good</em> fit.
</p>
<h3 id="34ad323fc07f48709fb86c4045bd5892">
The big picture <a href="#34ad323fc07f48709fb86c4045bd5892">#</a>
</h3>
<p>
I realize that I have a habit of burying the lede when I write technical articles. I don't know if I've picked up that tendency from <a href="https://fsharp.org/">F#</a>, which does demand that you define a value or function before you can use it. This, by the way, <a href="/2015/04/15/c-will-eventually-get-all-f-features-right">is a good feature</a>.
</p>
<p>
Here, I'll try to do it the other way around, and start with the big picture:
</p>
<p>
<pre>data = numpy.loadtxt(<span style="color:#a31515;">'coords.txt'</span>, delimiter=<span style="color:#a31515;">','</span>)
x = data[:, 0]
t = data[:, 1]
w = fit_polynomial(x, t, 9)
plot_fit(x, t, w)</pre>
</p>
<p>
This, by the way, is a <a href="https://www.python.org/">Python</a> script, and it opens with these imports:
</p>
<p>
<pre><span style="color:blue;">import</span> numpy
<span style="color:blue;">import</span> matplotlib.pyplot <span style="color:blue;">as</span> plt</pre>
</p>
<p>
The first line of code reads the <a href="https://en.wikipedia.org/wiki/Comma-separated_values">CSV</a> file into the <code>data</code> variable. The first column in that file contains all the <em>x</em> values, and the second column the <em>y</em> values. <a href="/ref/rogers-girolami">The book</a> that I've been following uses <em>t</em> for the data, rather than <em>y</em>. (Now that I think about it, I believe that this may only be because it works from an example in which the data to be fitted are <a href="https://en.wikipedia.org/wiki/100_metres">100 m dash</a> times, denoted <em>t</em>.)
</p>
<p>
Once the script has extracted the data, it calls the <code>fit_polynomial</code> function to produce a set of weights <code>w</code>. The constant <code>9</code> is the degree of polynomial to fit, although I think that I've made an off-by-one error so that the result is only an eighth-degree polynomial.
</p>
<p>
Finally, the code plots the original data together with the polynomial:
</p>
<p>
<img src="/content/binary/hype-8th-degree-poly.png" alt="Gartner hype cycle and an eighth-degree fitted polynomial.">
</p>
<p>
The green dots are the <em>(x, y)</em> coordinates that I extracted in the previous article, while the red curve is the fitted eighth-degree polynomial. Even though we're definitely in the realm of over-fitting, it doesn't reproduce the Gartner hype cycle.
</p>
<p>
I've even arrived at the value <code>9</code> after some trial and error. After all, I wasn't trying to do any real science here, so over-fitting is definitely allowed. Even so, <code>9</code> seems to be the best fit I can achieve. With lower values, like <code>8</code>, below, the curve deviates too much:
</p>
<p>
<img src="/content/binary/hype-7th-degree-poly.png" alt="Gartner hype cycle and a seventh-degree fitted polynomial.">
</p>
<p>
The value <code>10</code> looks much like <code>9</code>, but above that (<code>11</code>), the curve completely disconnects from the data, it seems:
</p>
<p>
<img src="/content/binary/hype-10th-degree-poly.png" alt="Gartner hype cycle and a tenth-degree fitted polynomial.">
</p>
<p>
I'm not sure why it does this, to be honest. I would have thought that the more degrees you added, the more (over-)fitted the curve would be. Apparently, this is not so, or perhaps I made a mistake in my code.
</p>
<h3 id="183834d3c95544d9a185b5ba84bba9a1">
Calculating the weights <a href="#183834d3c95544d9a185b5ba84bba9a1">#</a>
</h3>
<p>
The <code>fit_polynomial</code> function calculates the polynomial coefficients using a <a href="https://en.wikipedia.org/wiki/Linear_algebra">linear algebra</a> formula that I've found in at least two text books. Numpy makes it easy to invert, transpose, and multiply matrices, so the formula itself is just a one-liner. Here it is in the entire context of the function, though:
</p>
<p>
<pre><span style="color:blue;">def</span> <span style="color:#2b91af;">fit_polynomial</span>(x, t, degree):
<span style="color:#a31515;">"""
Fits a polynomial to the given data.
Parameters
----------
x : Array of shape [n_samples]
t : Array of shape [n_samples]
degree : degree of the polynomial
Returns
-------
w : Array of shape [degree + 1]
"""</span>
<span style="color:green;"># This expansion creates a matrix, so we name that with an upper-case letter</span>
<span style="color:green;"># rather than a lower-case letter, which is used for vectors.</span>
X = expand(x.reshape((<span style="color:blue;">len</span>(x), 1)), degree)
<span style="color:blue;">return</span> numpy.linalg.inv(X.T @ X) @ X.T @ t</pre>
</p>
<p>
This may look daunting, but is really just two lines of code. The rest is <a href="https://en.wikipedia.org/wiki/Docstring">docstring</a> and a comment.
</p>
<p>
The above-mentioned formula is the last line of code. The one before that expands the input data <code>x</code> from a simple one-dimensional array to a matrix of those values squared, cubed, etc. That's how you use the <a href="https://en.wikipedia.org/wiki/Least_squares">least squares</a> method if you want to fit it to a polynomial of arbitrary degree.
</p>
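<p>
To make the formula concrete, here's a small sketch on made-up data that lie exactly on <em>t = 1 + x + x<sup>2</sup></em>. It computes the weights with the same formula, and also with <code>numpy.linalg.lstsq</code>, which solves the same least-squares problem without explicitly inverting <code>X.T @ X</code> and is generally regarded as the numerically more robust choice. The data and variable names are assumptions for illustration only; the sketch doesn't establish whether the alternative would change the fit discussed in this article.
</p>

```python
import numpy

# Made-up sample data: t = 1 + x + x^2 at five points.
x = numpy.array([0.0, 1.0, 2.0, 3.0, 4.0])
t = numpy.array([1.0, 3.0, 7.0, 13.0, 21.0])

# Design matrix with columns x^0, x^1, x^2.
X = numpy.column_stack([x**i for i in range(3)])

# The formula used in fit_polynomial.
w_normal = numpy.linalg.inv(X.T @ X) @ X.T @ t

# The same least-squares problem, solved without forming the inverse.
w_lstsq, *_ = numpy.linalg.lstsq(X, t, rcond=None)
```

<p>
On this small, well-conditioned example both approaches should recover weights close to <code>[1, 1, 1]</code>, ordered from the constant term upwards.
</p>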
<h3 id="782c5cbd64de43878eea4a3ddfcdf755">
Expansion <a href="#782c5cbd64de43878eea4a3ddfcdf755">#</a>
</h3>
<p>
The <code>expand</code> function looks like this:
</p>
<p>
<pre><span style="color:blue;">def</span> <span style="color:#2b91af;">expand</span>(x, degree):
<span style="color:#a31515;">"""
Expands the given array to polynomial elements of the given degree.
Parameters
----------
x : Array of shape [n_samples, 1]
degree : degree of the polynomial
Returns
-------
Xp : Array of shape [n_samples, degree + 1]
"""</span>
Xp = numpy.ones((<span style="color:blue;">len</span>(x), 1))
<span style="color:blue;">for</span> i <span style="color:blue;">in</span> <span style="color:blue;">range</span>(1, degree + 1):
Xp = numpy.hstack((Xp, numpy.power(x, i)))
<span style="color:blue;">return</span> Xp</pre>
</p>
<p>
The function begins by creating a column vector of ones, here illustrated with only three rows:
</p>
<p>
<pre>>>> Xp = numpy.ones((3, 1))
>>> Xp
array([[1.],
[1.],
[1.]])</pre>
</p>
<p>
It then proceeds to loop over as many degrees as you've asked it to, each time adding a column to the <code>Xp</code> matrix. Here's an example of doing that up to a power of three, on example input <code>[1,2,3]</code>:
</p>
<p>
<pre>>>> x = numpy.array([1,2,3]).reshape((3, 1))
>>> x
array([[1],
[2],
[3]])
>>> Xp = numpy.hstack((Xp, numpy.power(x, 1)))
>>> Xp
array([[1., 1.],
[1., 2.],
[1., 3.]])
>>> Xp = numpy.hstack((Xp, numpy.power(x, 2)))
>>> Xp
array([[1., 1., 1.],
[1., 2., 4.],
[1., 3., 9.]])
>>> Xp = numpy.hstack((Xp, numpy.power(x, 3)))
>>> Xp
array([[ 1., 1., 1., 1.],
[ 1., 2., 4., 8.],
[ 1., 3., 9., 27.]])</pre>
</p>
<p>
Once it's done looping, the <code>expand</code> function returns the resulting <code>Xp</code> matrix.
</p>
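<p>
For comparison, NumPy's built-in <code>numpy.vander</code> function can produce the same kind of matrix. The following sketch assumes the same input shape as <code>expand</code>; the name <code>expand_vander</code> is hypothetical, and the conversion to floating point is only there to match the output of <code>expand</code>:
</p>

```python
import numpy


def expand_vander(x, degree):
    # x : Array of shape [n_samples, 1], as for expand.
    # numpy.vander expects a one-dimensional array, hence ravel().
    # increasing=True orders the columns as x^0, x^1, ..., x^degree.
    return numpy.vander(x.ravel(), degree + 1, increasing=True).astype(float)
```

<p>
Called with the example input <code>[1,2,3]</code> and a degree of <code>3</code>, it should return the same four-column matrix as shown above.
</p>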
<h3 id="cfb27c6067d2486c95836dc61484b2a0">
Plotting <a href="#cfb27c6067d2486c95836dc61484b2a0">#</a>
</h3>
<p>
Finally, here's the <code>plot_fit</code> procedure:
</p>
<p>
<pre><span style="color:blue;">def</span> <span style="color:#2b91af;">plot_fit</span>(x, t, w):
<span style="color:#a31515;">"""
Plots the polynomial with the given weights and the data.
Parameters
----------
x : Array of shape [n_samples]
t : Array of shape [n_samples]
w : Array of shape [degree + 1]
"""</span>
xs = numpy.linspace(x[0], x[0]+<span style="color:blue;">len</span>(x), 100)
ys = numpy.polyval(w[::-1], xs)
plt.plot(xs, ys, <span style="color:#a31515;">'r'</span>)
plt.scatter(x, t, s=10, c=<span style="color:#a31515;">'g'</span>)
plt.show()</pre>
</p>
<p>
This is fairly standard pyplot code, so I don't have much to say about it.
</p>
<h3 id="3730027db8614b01960cf5379d8add78">
Conclusion <a href="#3730027db8614b01960cf5379d8add78">#</a>
</h3>
<p>
When I started this exercise, I'd hoped that I could get close to the Gartner hype cycle by over-fitting the model to some ridiculous polynomial degree. This turned out not to be the case, for reasons that I don't fully understand. As I increase the degree, the curve begins to deviate from the data.
</p>
<p>
I can't say that I'm a data scientist or a statistician of any skill, so it's possible that my understanding is still too shallow. Perhaps I'll return to this article later and marvel at the ineptitude on display here.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="4bef47fad250438a94c2f1de28dc330d">
<div class="comment-author"><a href="https://www.mit.edu/~amu/">Aaron M. Ucko</a> <a href="#4bef47fad250438a94c2f1de28dc330d">#</a></div>
<div class="comment-content">
<p>
I suspect that increasing the degree wound up backfiring by effectively putting too much weight on the right side, whose flatness clashed with the increasingly steep powers you were trying to mix in.
A vertically offset damped sinusoid might make a better starting point for modeling, though identifying its parameters wouldn't be quite as straightforward.
One additional wrinkle there is that you want to level fully off after the valley; you could perhaps make that happen by plugging a scaled arctangent or something along those lines into the sinusoid.
</p>
<p>
Incidentally, a neighboring post in my feed reader was about a new release of an open-source data analysis and curve fitting program (QSoas) that might help if you don't want to take such a DIY approach.
</p>
</div>
<div class="comment-date">2024-05-16 02:37 UTC</div>
</div>
<div class="comment" id="831d9f6360da4cbaa2ab5a08315b532a">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#831d9f6360da4cbaa2ab5a08315b532a">#</a></div>
<div class="comment-content">
<p>
Aaron, thank you for writing. In retrospect, it becomes increasingly clear to me why this doesn't work. This highlights, I think, why it's a good idea to sometimes do stupid exercises like this one. You learn something from it, even when you fail.
</p>
</div>
<div class="comment-date">2024-05-22 6:15 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.Services share schema and contract, not classhttps://blog.ploeh.dk/2024/04/15/services-share-schema-and-contract-not-class2024-04-15T07:25:00+00:00Mark Seemann
<div id="post">
<p>
<em>A reading of the third Don Box tenet, with some commentary.</em>
</p>
<p>
This article is part of a series titled <a href="/2024/03/04/the-four-tenets-of-soa-revisited">The four tenets of SOA revisited</a>. In each of these articles, I'll pull one of <a href="https://en.wikipedia.org/wiki/Don_Box">Don Box</a>'s <em>four tenets of service-oriented architecture</em> (SOA) out of the <a href="https://learn.microsoft.com/en-us/archive/msdn-magazine/2004/january/a-guide-to-developing-and-running-connected-systems-with-indigo">original MSDN Magazine article</a> and add some of my own commentary. If you're curious why I do that, I cover that in the introductory article.
</p>
<p>
In this article, I'll go over the third tenet, quoting from the MSDN Magazine article unless otherwise indicated.
</p>
<h3 id="3a56e1083c454dec90a28f8c7ff44d5f">
Services share schema and contract, not class <a href="#3a56e1083c454dec90a28f8c7ff44d5f">#</a>
</h3>
<p>
Compared to <a href="/2024/03/25/services-are-autonomous">the second tenet</a>, the following description may at first seem more dated. Here's what the article said:
</p>
<blockquote>
<p>
Object-oriented programming encourages developers to create new abstractions in the form of classes. Most modern development environments not only make it trivial to define new classes, modern IDEs do a better job guiding you through the development process as the number of classes increases (as features like IntelliSense® provide a more specific list of options for a given scenario).
</p>
<p>
Classes are convenient abstractions as they share both structure and behavior in a single named unit. Service-oriented development has no such construct. Rather, services interact based solely on schemas (for structures) and contracts (for behaviors). Every service advertises a contract that describes the structure of messages it can send and/or receive as well as some degree of ordering constraints over those messages. This strict separation between structure and behavior vastly simplifies deployment, as distributed object concepts such as marshal-by-value require a common execution and security environment which is in direct conflict with the goals of autonomous computing.
</p>
<p>
Services do not deal in types or classes per se; rather, only with machine readable and verifiable descriptions of the legal "ins and outs" the service supports. The emphasis on machine verifiability and validation is important given the inherently distributed nature of how a service-oriented application is developed and deployed. Unlike a traditional class library, a service must be exceedingly careful about validating the input data that arrives in each message. Basing the architecture on machine-validatible schema and contract gives both developers and infrastructure the hints they need to protect the integrity of an individual service as well as the overall application as a whole.
</p>
<p>
Because the contract and schema for a given service are visible over broad ranges of both space and time, service-orientation requires that contracts and schema remain stable over time. In the general case, it is impossible to propagate changes in schema and/or contract to all parties who have ever encountered a service. For that reason, the contract and schema used in service-oriented designs tend to have more flexibility than traditional object-oriented interfaces. It is common for services to use features such as XML element wildcards (like xsd:any) and optional SOAP header blocks to evolve a service in ways that do not break already deployed code.
</p>
</blockquote>
<p>
With its explicit discussion of <a href="https://en.wikipedia.org/wiki/XML">XML</a>, <a href="https://en.wikipedia.org/wiki/SOAP">SOAP</a>, and <a href="https://en.wikipedia.org/wiki/XML_schema">XSD</a>, this description may seem more stuck in 2004 than the first two tenets.
</p>
<p>
I'll cover the most obvious consequence first.
</p>
<h3 id="7ddbc0f966b74c499d0414de8741e454">
At the boundaries... <a href="#7ddbc0f966b74c499d0414de8741e454">#</a>
</h3>
<p>
In the MSDN article, the four tenets guide the design of <a href="https://en.wikipedia.org/wiki/Windows_Communication_Foundation">Windows Communication Foundation</a> (WCF) - a technology that in 2004 was under development, but still not completed. While SOAP already existed as a platform-independent protocol, WCF was a .NET endeavour. Most developers using the Microsoft platform at the time were used to some sort of binary protocol, such as <a href="https://en.wikipedia.org/wiki/Distributed_Component_Object_Model">DCOM</a> or <a href="https://en.wikipedia.org/wiki/.NET_Remoting">.NET Remoting</a>. Thus, it makes sense that Don Box was deliberately explicit that this was <em>not</em> how SOA (or WCF) was supposed to work.
</p>
<p>
In fact, since SOAP is platform-independent, you could write a web service in one language (say, <a href="https://www.java.com/">Java</a>) and consume it with a different language (e.g. <a href="https://en.wikipedia.org/wiki/C%2B%2B">C++</a>). WCF was Microsoft's SOAP technology for .NET.
</p>
<p>
If you squint enough that you don't see the explicit references to XML or SOAP, however, the description still applies. Today, you may exchange data with <a href="https://www.json.org">JSON</a> over <a href="https://en.wikipedia.org/wiki/REST">REST</a>, <a href="https://en.wikipedia.org/wiki/Protocol_Buffers">Protocol Buffers</a> via <a href="https://en.wikipedia.org/wiki/GRPC">gRPC</a>, or something else, but it's still common to have a communications protocol that is independent of specific service implementations. A service may be written in <a href="https://www.python.org/">Python</a>, <a href="https://www.haskell.org/">Haskell</a>, <a href="https://en.wikipedia.org/wiki/C_(programming_language)">C</a>, or any other language that supports the wire format. As this little list suggests, the implementation language doesn't even have to be object-oriented.
</p>
<p>
In fact,
</p>
<ul>
<li><a href="/2011/05/31/AttheBoundaries,ApplicationsareNotObject-Oriented">At the Boundaries, Applications are Not Object-Oriented</a></li>
<li><a href="/2022/05/02/at-the-boundaries-applications-arent-functional">At the boundaries, applications aren't functional</a></li>
<li><a href="/2023/10/16/at-the-boundaries-static-types-are-illusory">At the boundaries, static types are illusory</a></li>
</ul>
<p>
A formal <a href="https://en.wikipedia.org/wiki/Interface_description_language">interface definition language</a> (IDL) may enable you to automate serialization and deserialization, but such languages are usually constrained to defining the shape of data and operations. Don Box talks about validation, and <a href="/2022/08/22/can-types-replace-validation">types don't replace validation</a> - particularly if you allow <code>xsd:any</code>. That particular remark about wildcards is quite at odds with the notion that a formal schema definition is necessary, or even desirable.
</p>
<p>
And indeed, today we often see JSON-based REST APIs that are more loosely defined. Even so, the absence of a machine-readable IDL doesn't entail the absence of a schema. As <a href="https://lexi-lambda.github.io/">Alexis King</a> wrote related to the static-versus-dynamic-types debate, <a href="https://lexi-lambda.github.io/blog/2020/01/19/no-dynamic-type-systems-are-not-inherently-more-open/">dynamic type systems are not inherently more open</a>. A similar argument can be made about schema. Regardless of whether or not a formal specification exists, a service always has a de-facto schema.
</p>
<p>
To be honest, though, when I try to interpret what this and the next tenet seem to imply, an IDL may have been all that Don Box had in mind. By <em>schema</em> he may only have meant XSD, and by <em>contract</em>, he may only have meant SOAP. More broadly speaking, this notion of <em>contract</em> may entail nothing more than a list of named operations, and references to schemas that indicate what input each operation takes, and what output it returns.
</p>
<p>
What I have in mind with the rest of this article may be quite an embellishment on that notion. In fact, my usual interpretation of the word <em>contract</em> may be more aligned with what Don Box calls <em>policy</em>. Thus, if you want a very literal reading of the four tenets, what comes next may fit better with the fourth tenet, that service compatibility is determined based on policy.
</p>
<p>
Regardless of whether you think that the following discussion belongs here, or in the next article, I'll assert that it's paramount to designing and developing useful and maintainable web services.
</p>
<h3 id="99146c84ab1d4d439879970bc17ca728">
Encapsulation <a href="#99146c84ab1d4d439879970bc17ca728">#</a>
</h3>
<p>
If we, once more, ignore the particulars related to SOAP and XML, we may rephrase the notion of schema and contract as follows. Schema describes the shape of data: Is it a number, a string, a tuple, or a combination of these? Is there only one, or several? Is the data composed from smaller such definitions? Does the composition describe the combination of several such definitions, or does it describe mutually exclusive alternatives?
</p>
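<p>
As a sketch of what such a description can look like, here's how the questions above might be answered with Haskell data types. The names (<code>Address</code>, <code>Contact</code>, <code>Customer</code>, and <code>sample</code>) are hypothetical and only serve as illustration. A real service would more likely publish its schema in a language-neutral format.
</p>
<p>
<pre>-- Combination of several definitions (a product type).
data Address = Address
  { street :: String
  , city :: String
  } deriving (Eq, Show)

-- Mutually exclusive alternatives (a sum type).
data Contact
  = Email String
  | Phone String
  | Postal Address
  deriving (Eq, Show)

-- Composed from smaller definitions. The list answers "one, or several?".
data Customer = Customer
  { customerName :: String
  , contacts :: [Contact]
  } deriving (Eq, Show)

-- A hypothetical value that complies with the schema.
sample :: Customer
sample =
  Customer "Ann" [Email "ann@example.com", Postal (Address "1 Main St" "Springfield")]</pre>
</p>
<p>
Notice that these types only describe shape. They say nothing about which operations accept a <code>Customer</code>, or when.
</p>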
<p>
Compliant data may be encoded as objects or data structures in memory, or serialized to JSON, XML, <a href="https://en.wikipedia.org/wiki/Comma-separated_values">CSV</a>, byte streams, etc. We may choose to call a particular agglomeration of data a <em>message</em>, which we may pass from one system to another. The <a href="/2024/03/11/boundaries-are-explicit">first tenet</a> already used this metaphor.
</p>
<p>
You can't, however, just pass arbitrary valid messages from one system to another. Certain operations allow certain data, and may promise to return other kinds of messages. In addition to the schema, we also need to describe a <em>contract</em>.
</p>
<p>
What's a contract? If you consult <a href="/ref/oosc">Object-Oriented Software Construction</a>, a contract stipulates invariants, pre- and postconditions for various operations.
</p>
<p>
Preconditions state what must be true before an operation can take place. This often puts the responsibility on the caller to ensure that the system is in an appropriate state, and that the message that it intends to pass to the other system is valid according to that state.
</p>
<p>
Postconditions, on the other hand, detail what the caller can expect in return. This includes guarantees about response messages, but may also describe the posterior state of the system.
</p>
<p>
Invariants, finally, outline what is always true about the system.
</p>
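<p>
To make the three terms concrete, consider a minimal sketch of a single operation. Everything here is an assumption made for illustration: the <code>Reservation</code> message, the <code>Rejection</code> cases, and the <code>accept</code> function are hypothetical, and the remaining capacity is assumed to be non-negative to begin with.
</p>
<p>
<pre>-- Hypothetical message: a request to reserve a number of seats.
data Reservation = Reservation
  { guestName :: String
  , seats :: Int
  } deriving (Eq, Show)

-- Hypothetical ways in which the precondition can be violated.
data Rejection
  = NoSeatsRequested
  | NotEnoughCapacity Int
  deriving (Eq, Show)

-- Precondition: 0 < seats <= remaining capacity.
-- Postcondition: a Right value is the capacity left after accepting.
-- Invariant: given a non-negative input, the remaining capacity never
-- becomes negative.
accept :: Int -> Reservation -> Either Rejection Int
accept remaining r
  | seats r <= 0 = Left NoSeatsRequested
  | seats r > remaining = Left (NotEnoughCapacity remaining)
  | otherwise = Right (remaining - seats r)</pre>
</p>
<p>
<pre>ghci> accept 10 (Reservation "Ann" 4)
Right 6
ghci> accept 3 (Reservation "Bo" 4)
Left (NotEnoughCapacity 3)</pre>
</p>
<p>
The type signature alone doesn't express the contract. The guards, and the documentation that goes with them, do.
</p>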
<p>
Although such a description of a contract originates from a book about object-oriented design, it's <a href="/2022/10/24/encapsulation-in-functional-programming">useful in other areas, too, such as functional programming</a>. It strikes me that it applies equally well in the context of service-orientation.
</p>
<p>
The combination of contract and well-described message structure is, in other words, <a href="/encapsulation-and-solid">encapsulation</a>. There's nothing wrong with that: It works. If you actually apply it as a design principle, that is.
</p>
<h3 id="7d42dff045a24a4c89a894f8ed5d5166">
Conclusion <a href="#7d42dff045a24a4c89a894f8ed5d5166">#</a>
</h3>
<p>
The third SOA tenet emphasizes that only data travels over service boundaries. In order to communicate effectively, services must agree on the shape of data, and which operations are legal when. While they exchange data, however, they don't share address space, or even internal representation.
</p>
<p>
One service may be written in <a href="https://fsharp.org/">F#</a> and the client in <a href="https://clojure.org/">Clojure</a>. Even so, it's important that they have a shared understanding of what is possible, and what is not. The more explicit you, as a service owner, can be, the better.
</p>
<p>
<strong>Next:</strong> <a href="/2024/04/29/service-compatibility-is-determined-based-on-policy">Service compatibility is determined based on policy</a>.
</p>
</div><hr>
Extracting curve coordinates from a bitmap
https://blog.ploeh.dk/2024/04/08/extracting-curve-coordinates-from-a-bitmap
2024-04-08T05:32:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Another example of using Haskell as an ad-hoc scripting language.</em>
</p>
<p>
This article is part of a short series titled <a href="/2024/04/01/trying-to-fit-the-hype-cycle">Trying to fit the hype cycle</a>. In the first article, I outlined what it is that I'm trying to do. In this article, I'll describe how I extract a set of <em>x</em> and <em>y</em> coordinates from this bitmap:
</p>
<p>
<img src="/content/binary/hype-cycle-cleaned.png" alt="Gartner hype cycle.">
</p>
<p>
(Actually, this is a scaled-down version of the image. The file I work with is a bit larger.)
</p>
<p>
As I already mentioned in the previous article, these days there are online tools for just about everything. Most likely, there's also an online tool that will take a bitmap like that and return a set of <em>(x, y)</em> coordinates.
</p>
<p>
Since I'm doing this for the programming exercise, I'm not interested in that. Rather, I'd like to write a little <a href="https://www.haskell.org/">Haskell</a> script to do it for me.
</p>
<h3 id="2ed7ee24ae244f3688dc8a362e149c17">
Module and imports <a href="#2ed7ee24ae244f3688dc8a362e149c17">#</a>
</h3>
<p>
Yes, I wrote Haskell <em>script</em>. As I've described before, with good type inference, <a href="/2024/02/05/statically-and-dynamically-typed-scripts">a statically typed language can be as good for scripting as a dynamic one</a>. Just as might be the case with, say, a <a href="https://www.python.org/">Python</a> script, you'll be iterating, trying things out until finally the script settles into its final form. What I present here is the result of my exercise. You should imagine that I made lots of mistakes along the way, tried things that didn't work, commented out code and print statements, imported modules I eventually didn't need, etc. Just like I imagine you'd also do with a script in a dynamically typed language. At least, that's how I write Python, when I'm forced to do that.
</p>
<p>
In other words, the following is far from the result of perfect foresight, but rather the equilibrium into which the script settled.
</p>
<p>
I named the module <code>HypeCoords</code>, because the purpose of it is to extract the <em>(x, y)</em> coordinates from the above <a href="https://en.wikipedia.org/wiki/Gartner_hype_cycle">Gartner hype cycle</a> image. These are the imports it turned out that I ultimately needed:
</p>
<p>
<pre><span style="color:blue;">module</span> HypeCoords <span style="color:blue;">where</span>
<span style="color:blue;">import</span> <span style="color:blue;">qualified</span> Data.List.NonEmpty <span style="color:blue;">as</span> NE
<span style="color:blue;">import</span> Data.List.NonEmpty (<span style="color:blue;">NonEmpty</span>((:|)))
<span style="color:blue;">import</span> Codec.Picture
<span style="color:blue;">import</span> Codec.Picture.Types</pre>
</p>
<p>
The <code>Codec.Picture</code> modules come from the <a href="https://hackage.haskell.org/package/JuicyPixels">JuicyPixels</a> package. This is what enables me to read a <code>.png</code> file and extract the pixels.
</p>
<h3 id="e0f66bef266249ea8a9546c0edf0b15c">
Black and white <a href="#e0f66bef266249ea8a9546c0edf0b15c">#</a>
</h3>
<p>
If you look at the above bitmap, you may notice that it has some vertical lines in a lighter grey than the curve itself. My first task, then, is to get rid of those. The easiest way to do that is to convert the image to a black-and-white bitmap, with no grey scale.
</p>
<p>
Since this is a one-off exercise, I could easily do that with a bitmap editor, but on the other hand, I thought that this was a good first task to give myself. After all, I didn't know the JuicyPixels library <em>at all</em>, so this was an opportunity to start with a task just a notch simpler than the one that was my actual goal.
</p>
<p>
I thought that the easiest way to convert to a black-and-white image would be to turn all pixels white if they are lighter than some threshold, and black otherwise.
</p>
<p>
A <a href="https://en.wikipedia.org/wiki/PNG">PNG</a> file has more information than I need, so I first converted the image to an 8-bit <a href="https://en.wikipedia.org/wiki/RGB_color_model">RGB</a> bitmap. Even though the above image looks as though it's entirely grey scale, each pixel is actually composed of three colours. In order to compare a pixel with a threshold, I needed a single measure of how light or dark it is.
</p>
<p>
That turned out to be about as simple as it sounds: Just take the average of the three colours. Later, I'd need a function to compute the average for another reason, so I made it a reusable function:
</p>
<p>
<pre><span style="color:#2b91af;">average</span> <span style="color:blue;">::</span> <span style="color:blue;">Integral</span> a <span style="color:blue;">=></span> <span style="color:blue;">NE</span>.<span style="color:blue;">NonEmpty</span> a <span style="color:blue;">-></span> a
average nel = <span style="color:blue;">sum</span> nel `div` <span style="color:blue;">fromIntegral</span> (NE.<span style="color:blue;">length</span> nel)</pre>
</p>
<p>
It's a bit odd that the Haskell <a href="https://hackage.haskell.org/package/base">base</a> library doesn't come with such a function (at least to my knowledge), but anyway, this one is specialized to do integer division. Notice that this function computes only <a href="/2020/02/03/non-exceptional-averages">non-exceptional averages</a>, since it requires the input to be a <a href="https://hackage.haskell.org/package/base/docs/Data-List-NonEmpty.html">NonEmpty</a> list. No division-by-zero errors here, please!
</p>
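<p>
As a small illustration of the integer division, here's the same <code>average</code> function applied to two sample lists. The values <code>truncated</code> and <code>single</code> are hypothetical names, chosen only for this sketch.
</p>
<p>
<pre>import qualified Data.List.NonEmpty as NE
import Data.List.NonEmpty (NonEmpty((:|)))

-- Same definition as above.
average :: Integral a => NE.NonEmpty a -> a
average nel = sum nel `div` fromIntegral (NE.length nel)

-- The sum is 301, and 301 `div` 3 discards the remainder.
truncated :: Integer
truncated = average (99 :| [100, 102])

-- A singleton list is its own average.
single :: Integer
single = average (7 :| [])</pre>
</p>
<p>
<pre>ghci> truncated
100
ghci> single
7</pre>
</p>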
<p>
Once I'd computed a pixel average and compared it to a threshold value, I wanted to replace it with either black or white. In order to make the code more readable I defined two named constants:
</p>
<p>
<pre><span style="color:#2b91af;">black</span> <span style="color:blue;">::</span> <span style="color:blue;">PixelRGB8</span>
black = PixelRGB8 <span style="color:blue;">minBound</span> <span style="color:blue;">minBound</span> <span style="color:blue;">minBound</span>
<span style="color:#2b91af;">white</span> <span style="color:blue;">::</span> <span style="color:blue;">PixelRGB8</span>
white = PixelRGB8 <span style="color:blue;">maxBound</span> <span style="color:blue;">maxBound</span> <span style="color:blue;">maxBound</span></pre>
</p>
<p>
With that in place, converting to black-and-white is only a few more lines of code:
</p>
<p>
<pre><span style="color:#2b91af;">toBW</span> <span style="color:blue;">::</span> <span style="color:blue;">PixelRGB8</span> <span style="color:blue;">-></span> <span style="color:blue;">PixelRGB8</span>
toBW (PixelRGB8 r g b) =
  <span style="color:blue;">let</span> threshold = 192 :: Integer
      lum = average (<span style="color:blue;">fromIntegral</span> r :| [<span style="color:blue;">fromIntegral</span> g, <span style="color:blue;">fromIntegral</span> b])
  <span style="color:blue;">in</span> <span style="color:blue;">if</span> lum <= threshold <span style="color:blue;">then</span> black <span style="color:blue;">else</span> white</pre>
</p>
<p>
I arrived at the threshold of <code>192</code> after a bit of trial-and-error. That's dark enough that the light vertical lines fall to the <code>white</code> side, while the real curve becomes <code>black</code>.
</p>
<p>
What remained was to glue the parts together to save the black-and-white file:
</p>
<p>
<pre><span style="color:#2b91af;">main</span> <span style="color:blue;">::</span> <span style="color:#2b91af;">IO</span> ()
main = <span style="color:blue;">do</span>
  readResult <- readImage <span style="color:#a31515;">"hype-cycle-cleaned.png"</span>
  <span style="color:blue;">case</span> readResult <span style="color:blue;">of</span>
    Left msg -> <span style="color:blue;">putStrLn</span> msg
    Right img -> <span style="color:blue;">do</span>
      <span style="color:blue;">let</span> bwImg = pixelMap toBW $ convertRGB8 img
      writePng <span style="color:#a31515;">"hype-cycle-bw.png"</span> bwImg</pre>
</p>
<p>
The <a href="https://hackage.haskell.org/package/JuicyPixels/docs/Codec-Picture.html#v:convertRGB8">convertRGB8</a> function comes from JuicyPixels.
</p>
<p>
The <code>hype-cycle-bw.png</code> picture unsurprisingly looks like this:
</p>
<p>
<img src="/content/binary/hype-cycle-bw.png" alt="Black-and-white Gartner hype cycle.">
</p>
<p>
Ultimately, I didn't need the black-and-white bitmap <em>file</em>. I just wrote the script to create the file in order to be able to get some insights into what I was doing. Trust me, I made a lot of stupid mistakes along the way, and among other issues had some <a href="https://stackoverflow.com/q/77952762/126014">'fun' with integer overflows</a>.
</p>
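<p>
To illustrate the kind of overflow that 8-bit channel values invite, consider the following sketch. It isn't necessarily the exact problem from the linked question, and the names <code>channels</code>, <code>wrappedSum</code>, and <code>wideSum</code> are hypothetical. The only assumption carried over from JuicyPixels is that each colour channel is a <code>Word8</code>.
</p>
<p>
<pre>import Data.Word (Word8)

-- Three channel values of a light grey pixel, as 8-bit words.
channels :: [Word8]
channels = [200, 200, 200]

-- Summing in Word8 wraps around modulo 256: 600 `mod` 256 is 88.
wrappedSum :: Word8
wrappedSum = sum channels

-- Converting each channel to Integer first keeps the full sum.
wideSum :: Integer
wideSum = sum (fromIntegral <$> channels)</pre>
</p>
<p>
<pre>ghci> wrappedSum
88
ghci> wideSum
600</pre>
</p>
<p>
This is why it matters that <code>toBW</code> applies <code>fromIntegral</code> to each channel before it calls <code>average</code>.
</p>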
<h3 id="2bd5b7d3dbd44e4a93594030bf5faca5">
Extracting image coordinates <a href="#2bd5b7d3dbd44e4a93594030bf5faca5">#</a>
</h3>
<p>
Now I had a general feel for how to work with the JuicyPixels library. It still required quite a bit of spelunking through the documentation before I found a useful API to extract all the pixels from a bitmap:
</p>
<p>
<pre><span style="color:#2b91af;">pixelCoordinates</span> <span style="color:blue;">::</span> <span style="color:blue;">Pixel</span> a <span style="color:blue;">=></span> <span style="color:blue;">Image</span> a <span style="color:blue;">-></span> [((<span style="color:#2b91af;">Int</span>, <span style="color:#2b91af;">Int</span>), a)]
pixelCoordinates = pixelFold (\acc x y px -> ((x,y),px):acc) <span style="color:blue;">[]</span></pre>
</p>
<p>
While this is, after all, just a one-liner, I'm surprised that something like this doesn't come in the box. It returns a list of tuples, where the first element contains the pixel coordinates (another tuple), and the second element the pixel information (e.g. the RGB value).
</p>
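<p>
To get a feel for the shape of that return value, you can try the function on a tiny generated image. This is only a sketch, assuming that the JuicyPixels package is available. The names <code>tiny</code>, <code>pixelCount</code>, and <code>blackCoords</code> are hypothetical, and the coordinates are sorted so that the result doesn't depend on the order in which <code>pixelFold</code> visits the pixels.
</p>
<p>
<pre>import Codec.Picture
import Codec.Picture.Types
import Data.List (sort)

-- Same definition as above.
pixelCoordinates :: Pixel a => Image a -> [((Int, Int), a)]
pixelCoordinates = pixelFold (\acc x y px -> ((x,y),px):acc) []

-- Hypothetical 2x2 image: black on the diagonal, white elsewhere.
tiny :: Image PixelRGB8
tiny =
  generateImage
    (\x y -> if x == y then PixelRGB8 0 0 0 else PixelRGB8 255 255 255)
    2
    2

-- One list element per pixel.
pixelCount :: Int
pixelCount = length (pixelCoordinates tiny)

-- Coordinates of the black pixels only.
blackCoords :: [(Int, Int)]
blackCoords = sort [c | (c, px) <- pixelCoordinates tiny, px == PixelRGB8 0 0 0]</pre>
</p>
<p>
<pre>ghci> pixelCount
4
ghci> blackCoords
[(0,0),(1,1)]</pre>
</p>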
<h3 id="2b2a30265e1b4577b845bb8d235a97eb">
One y value per x value <a href="#2b2a30265e1b4577b845bb8d235a97eb">#</a>
</h3>
<p>
There were a few more issues to be addressed. The black curve in the black-and-white bitmap is thicker than a single pixel. This means that for each <em>x</em> value, there will be several black pixels. In order to do linear regression, however, we need a single <em>y</em> value per <em>x</em> value.
</p>
<p>
One easy way to address that concern is to calculate the average <em>y</em> value for each <em>x</em> value. This may not always be the best choice, but as far as we can see in the above black-and-white image, it doesn't look as though there's any noise left in the picture. This means that we don't have to worry about outliers pulling the average value away from the curve. In other words, finding the average <em>y</em> value is an easy way to get what we need.
</p>
<p>
<pre><span style="color:#2b91af;">averageY</span> <span style="color:blue;">::</span> <span style="color:blue;">Integral</span> b <span style="color:blue;">=></span> <span style="color:blue;">NonEmpty</span> (a, b) <span style="color:blue;">-></span> (a, b)
averageY nel = (<span style="color:blue;">fst</span> $ NE.<span style="color:blue;">head</span> nel, average $ <span style="color:blue;">snd</span> <$> nel)</pre>
</p>
<p>
The <code>averageY</code> function converts a <code>NonEmpty</code> list of tuples to a single tuple. <em>Watch out!</em> The input tuples are not the 'outer' tuples that <code>pixelCoordinates</code> returns, but rather a list of actual pixel coordinates. Each tuple is a set of coordinates, but since the function never manipulates the <em>x</em> coordinate, the type of the first element is just unconstrained <code>a</code>. It can literally be anything, but will, in practice, be an integer.
</p>
<p>
The assumption is that the input is a small list of coordinates that all share the same <em>x</em> coordinate, such as <code>(42, 99) :| [(42, 100), (42, 102)]</code>. The function simply returns a single tuple that it creates on the fly. For the first element of the return tuple, it picks the <code>head</code> tuple from the input (<code>(42, 99)</code> in the example), and then that tuple's <code>fst</code> element (<code>42</code>). For the second element, the function averages all the <code>snd</code> elements (<code>99</code>, <code>100</code>, and <code>102</code>) to get <code>100</code> (integer division, you may recall):
</p>
<p>
<pre>ghci> averageY ((42, 99) :| [(42, 100), (42, 102)])
(42,100)</pre>
</p>
<p>
What remains is to glue together the building blocks.
</p>
<h3 id="305d72ac4dd94c41aa23f6581f7aa716">
Extracting curve coordinates <a href="#305d72ac4dd94c41aa23f6581f7aa716">#</a>
</h3>
<p>
A few more steps were required, but these I just composed <em>in situ</em>. I found no need to define them as individual functions.
</p>
<p>
The final composition looks like this:
</p>
<p>
<pre><span style="color:#2b91af;">main</span> <span style="color:blue;">::</span> <span style="color:#2b91af;">IO</span> ()
main = <span style="color:blue;">do</span>
  readResult <- readImage <span style="color:#a31515;">"hype-cycle-cleaned.png"</span>
  <span style="color:blue;">case</span> readResult <span style="color:blue;">of</span>
    Left msg -> <span style="color:blue;">putStrLn</span> msg
    Right img -> <span style="color:blue;">do</span>
      <span style="color:blue;">let</span> bwImg = pixelMap toBW $ convertRGB8 img
      <span style="color:blue;">let</span> blackPixels =
            <span style="color:blue;">fst</span> <$> <span style="color:blue;">filter</span> ((black ==) . <span style="color:blue;">snd</span>) (pixelCoordinates bwImg)
      <span style="color:blue;">let</span> h = imageHeight bwImg
      <span style="color:blue;">let</span> lineCoords = <span style="color:blue;">fmap</span> (h -) . averageY <$> NE.groupAllWith <span style="color:blue;">fst</span> blackPixels
      <span style="color:blue;">writeFile</span> <span style="color:#a31515;">"coords.txt"</span> $
        <span style="color:blue;">unlines</span> $ (\(x,y) -> <span style="color:blue;">show</span> x ++ <span style="color:#a31515;">","</span> ++ <span style="color:blue;">show</span> y) <$> lineCoords</pre>
</p>
<p>
The first lines of code, until and including <code>let bwImg</code>, are identical to what you've already seen.
</p>
<p>
We're only interested in the black pixels, so the <code>main</code> action uses the standard <code>filter</code> function to keep only those that are equal to the <code>black</code> constant value. Once the white pixels are gone, we no longer need the pixel information. The expression that defines the <code>blackPixels</code> value finally (remember, you read Haskell code from right to left) throws away the pixel information by only retaining the <code>fst</code> element. That's the tuple that contains the coordinates. You may want to refer back to the type signature of <code>pixelCoordinates</code> to see what I mean.
</p>
<p>
The <code>blackPixels</code> value has the type <code>[(Int, Int)]</code>.
</p>
<p>
Two more things need to happen. One is to group the pixels together per <em>x</em> value so that we can use <code>averageY</code>. The other is that we want the coordinates as normal Cartesian coordinates, and right now, they're in screen coordinates.
</p>
<p>
When working with bitmaps, it's quite common that pixels are measured out from the top left corner, instead of from the bottom left corner. It's not difficult to flip the coordinates, but we need to know the height of the image:
</p>
<p>
<pre><span style="color:blue;">let</span> h = imageHeight bwImg</pre>
</p>
<p>
The <a href="https://hackage.haskell.org/package/JuicyPixels/docs/Codec-Picture.html#v:imageHeight">imageHeight</a> function is another JuicyPixels function.
</p>
<p>
Because I sometimes get carried away, I write the code in a 'nice' compact style that could be more readable. I accomplished both of the above remaining tasks with a single line of code:
</p>
<p>
<pre><span style="color:blue;">let</span> lineCoords = <span style="color:blue;">fmap</span> (h -) . averageY <$> NE.groupAllWith <span style="color:blue;">fst</span> blackPixels</pre>
</p>
<p>
This first groups the coordinates according to <em>x</em> value, so that all coordinates that share an <em>x</em> value are collected in a single <code>NonEmpty</code> list. This means that we can map all of those groups over <code>averageY</code>. Finally, the expression flips from screen coordinates to Cartesian coordinates by subtracting the <em>y</em> coordinate from the height <code>h</code>.
</p>
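<p>
The following sketch runs the same expression on four made-up screen coordinates, so that each step can be checked by hand. The sample values for <code>blackPixels</code> and <code>h</code> are hypothetical, as is the intermediate <code>grouped</code> value.
</p>
<p>
<pre>import qualified Data.List.NonEmpty as NE
import Data.List.NonEmpty (NonEmpty)

-- Same definitions as above.
average :: Integral a => NE.NonEmpty a -> a
average nel = sum nel `div` fromIntegral (NE.length nel)

averageY :: Integral b => NonEmpty (a, b) -> (a, b)
averageY nel = (fst $ NE.head nel, average $ snd <$> nel)

-- Hypothetical screen coordinates, in no particular order.
blackPixels :: [(Int, Int)]
blackPixels = [(2, 5), (1, 3), (2, 7), (1, 4)]

-- Hypothetical image height.
h :: Int
h = 10

-- Two groups: one for x = 1 and one for x = 2.
grouped :: [NonEmpty (Int, Int)]
grouped = NE.groupAllWith fst blackPixels

-- Average each group, then flip y. fmap on a pair maps the second element.
lineCoords :: [(Int, Int)]
lineCoords = fmap (h -) . averageY <$> grouped</pre>
</p>
<p>
<pre>ghci> lineCoords
[(1,7),(2,4)]</pre>
</p>
<p>
For <em>x</em> = 1, the <em>y</em> values 3 and 4 average to 3 (integer division), which flips to 7. For <em>x</em> = 2, the values 5 and 7 average to 6, which flips to 4.
</p>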
<p>
The final <code>writeFile</code> expression writes the coordinates to a text file as <a href="https://en.wikipedia.org/wiki/Comma-separated_values">comma-separated values</a>. The first ten lines of that file look like this:
</p>
<p>
<pre>9,13
10,13
11,13
12,14
13,15
14,15
15,16
16,17
17,17
18,18
...</pre>
</p>
<p>
Do these points plot the Gartner hype cycle?
</p>
<h3 id="eed82185c8cd43dda147bce839454ca9">
Sanity checking by plotting the coordinates <a href="#eed82185c8cd43dda147bce839454ca9">#</a>
</h3>
<p>
To check whether the coordinates look useful, we could plot them. If I wanted to spend a few more hours, I could probably figure out how to do that with JuicyPixels as well, but on the other hand, I already know how to do that with Python:
</p>
<p>
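<pre><span style="color:blue;">import</span> numpy
<span style="color:blue;">import</span> matplotlib.pyplot <span style="color:blue;">as</span> plt

data = numpy.loadtxt(<span style="color:#a31515;">'coords.txt'</span>, delimiter=<span style="color:#a31515;">','</span>)
x = data[:, 0]
t = data[:, 1]
plt.scatter(x, t, s=10, c=<span style="color:#a31515;">'g'</span>)
plt.show()</pre>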
</p>
<p>
That produces this plot:
</p>
<p>
<img src="/content/binary/hype-cycle-pyplot.png" alt="Coordinates plotted with Python.">
</p>
<p>
LGTM.
</p>
<h3 id="9836c90de9ac487f9295acfb667090b7">
Conclusion <a href="#9836c90de9ac487f9295acfb667090b7">#</a>
</h3>
<p>
In this article, you've seen how a single Haskell script can extract curve coordinates from a bitmap. The file is 41 lines all in all, including module declaration and white space. This article shows every single line in that file, apart from some blank lines.
</p>
<p>
I loaded the file into GHCi and ran the <code>main</code> action in order to produce the CSV file.
</p>
<p>
I did spend a few hours looking around in the JuicyPixels documentation before I'd identified the functions that I needed. All in all, I spent some hours on this exercise. I didn't keep track of time, but I guess that I spent more than three, but probably fewer than six, hours on it.
</p>
<p>
This was the successful part of the overall exercise. Now onto the fiasco.
</p>
<p>
<strong>Next:</strong> <a href="/2024/04/22/fitting-a-polynomial-to-a-set-of-points">Fitting a polynomial to a set of points</a>.
</p>
</div><hr>
Trying to fit the hype cycle
https://blog.ploeh.dk/2024/04/01/trying-to-fit-the-hype-cycle
2024-04-01T07:14:00+00:00
Mark Seemann
<div id="post">
<p>
<em>An amateur tries his hand at linear modelling.</em>
</p>
<p>
About a year ago, I was contemplating a conference talk I was going to give. Although I later abandoned the idea for other reasons, for a few days I was thinking about using the <a href="https://en.wikipedia.org/wiki/Gartner_hype_cycle">Gartner hype cycle</a> for an animation. What I had in mind would require me to draw the curve in a way that would enable me to zoom in and out. Vector graphics would be much more useful for that job than a bitmap.
</p>
<p>
<img src="/content/binary/hype-cycle-cleaned.png" alt="Gartner hype cycle.">
</p>
<p>
Along the way, I considered if there was a <a href="https://en.wikipedia.org/wiki/Function_(mathematics)">function</a> that would enable me to draw it on the fly. A few web searches revealed the <a href="https://stats.stackexchange.com/">Cross Validated</a> question <a href="https://stats.stackexchange.com/q/268293/397132">Is there a linear/mixture function that can fit the Gartner hype curve?</a> So I wasn't the first person to have that idea, but at the time I found it, the question was effectively dismissed without a proper answer. Off topic, dontcha know?
</p>
<p>
A web search also seems to indicate the existence of a few research papers where people have undertaken this task, but there's not a lot about it. True, the Gartner hype cycle isn't a real function, but it sounds like a relevant exercise in statistics, if one's into that kind of thing.
</p>
<p>
Eventually, for my presentation, I went with another way to illustrate what I wanted to say, so for half a year, I didn't think more about it.
</p>
<h3 id="f3bfad5e6e80409e9703c80b1c98099b">
Linear regression? <a href="#f3bfad5e6e80409e9703c80b1c98099b">#</a>
</h3>
<p>
Recently, however, I was following a course in mathematical analysis of data, and among other things, I learned how to fit a line to data. Not just a straight line, but any degree of <a href="https://en.wikipedia.org/wiki/Polynomial">polynomial</a>. So I thought that perhaps it'd be an interesting exercise to see if I could fit the hype cycle to some high-degree polynomial - even though I do realize that the hype cycle isn't a real function, and neither does it look like a straight polynomial function.
</p>
<p>
In order to fit a polynomial to the curve, I needed some data, so my first task was to convert an image to a series of data points.
</p>
<p>
I'm sure that there are online tools and apps that offer to do that for me, but the whole point of this was that I wanted to learn how to tackle problems like these. It's like <a href="/2020/01/13/on-doing-katas">doing katas</a>. The journey is the goal.
</p>
<p>
This turned out to be an exercise consisting of two phases so distinct that I wrote them in two different languages.
</p>
<ul>
<li><a href="/2024/04/08/extracting-curve-coordinates-from-a-bitmap">Extracting curve coordinates from a bitmap</a></li>
<li><a href="/2024/04/22/fitting-a-polynomial-to-a-set-of-points">Fitting a polynomial to a set of points</a></li>
</ul>
<p>
As the articles will reveal, the first part went quite well, while the other was, essentially, a fiasco.
</p>
<h3 id="fc418f36d6c74aa2a056b48489be7162">
Conclusion <a href="#fc418f36d6c74aa2a056b48489be7162">#</a>
</h3>
<p>
There's not much point in finding a formula for the Gartner hype cycle, but the goal of this exercise was, for me, to tinker with some new techniques to see if I could learn from doing the exercise. And I <em>did</em> learn something.
</p>
<p>
In the next articles in this series, I'll go over some of the details.
</p>
<p>
<strong>Next:</strong> <a href="/2024/04/08/extracting-curve-coordinates-from-a-bitmap">Extracting curve coordinates from a bitmap</a>.
</p>
</div><hr>
Services are autonomoushttps://blog.ploeh.dk/2024/03/25/services-are-autonomous2024-03-25T08:31:00+00:00Mark Seemann
<div id="post">
<p>
<em>A reading of the second Don Box tenet, with some commentary.</em>
</p>
<p>
This article is part of a series titled <a href="/2024/03/04/the-four-tenets-of-soa-revisited">The four tenets of SOA revisited</a>. In each of these articles, I'll pull one of <a href="https://en.wikipedia.org/wiki/Don_Box">Don Box</a>'s <em>four tenets of service-oriented architecture</em> (SOA) out of the <a href="https://learn.microsoft.com/en-us/archive/msdn-magazine/2004/january/a-guide-to-developing-and-running-connected-systems-with-indigo">original MSDN Magazine article</a> and add some of my own commentary. If you're curious why I do that, I cover that in the introductory article.
</p>
<p>
In this article, I'll go over the second tenet. The quotes are from the MSDN Magazine article unless otherwise indicated.
</p>
<h3 id="5021be8510304665ba3a8b9d9287a531">
Services are autonomous <a href="#5021be8510304665ba3a8b9d9287a531">#</a>
</h3>
<p>
Compared with <a href="/2024/03/11/boundaries-are-explicit">the first tenet</a>, you'll see that Don Box had more to say about this one. I, conversely, have less to add. First, here's what the article said:
</p>
<blockquote>
<p>
Service-orientation mirrors the real world in that it does not assume the presence of an omniscient or omnipotent oracle that has awareness and control over all parts of a running system. This notion of service autonomy appears in several facets of development, the most obvious place being the area of deployment and versioning.
</p>
<p>
Object-oriented programs tend to be deployed as a unit. Despite the Herculean efforts made in the 1990s to enable classes to be independently deployed, the discipline required to enable object-oriented interaction with a component proved to be impractical for most development organizations. When coupled with the complexities of versioning object-oriented interfaces, many organizations have become extremely conservative in how they roll out object-oriented code. The popularity of the XCOPY deployment and private assemblies capabilities of the .NET Framework is indicative of this trend.
</p>
<p>
Service-oriented development departs from object-orientation by assuming that atomic deployment of an application is the exception, not the rule. While individual services are almost always deployed atomically, the aggregate deployment state of the overall system/application rarely stands still. It is common for an individual service to be deployed long before any consuming applications are even developed, let alone deployed into the wild. Amazon.com is one example of this build-it-and-they-will-come philosophy. There was no way the developers at Amazon could have known the multitude of ways their service would be used to build interesting and novel applications.
</p>
<p>
It is common for the topology of a service-oriented application to evolve over time, sometimes without direct intervention from an administrator or developer. The degree to which new services may be introduced into a service-oriented system depends on both the complexity of the service interaction and the ubiquity of services that interact in a common way. Service-orientation encourages a model that increases ubiquity by reducing the complexity of service interactions. As service-specific assumptions leak into the public facade of a service, fewer services can reasonably mimic that facade and stand in as a reasonable substitute.
</p>
<p>
The notion of autonomous services also impacts the way failures are handled. Objects are deployed to run in the same execution context as the consuming application. Service-oriented designs assume that this situation is the exception, not the rule. For that reason, services expect that the consuming application can fail without notice and often without any notification. To maintain system integrity, service-oriented designs use a variety of techniques to deal with partial failure modes. Techniques such as transactions, durable queues, and redundant deployment and failover are quite common in a service-oriented system.
</p>
<p>
Because many services are deployed to function over public networks (such as the Internet), service-oriented development assumes not only that incoming message data may be malformed but also that it may have been transmitted for malicious purposes. Service-oriented architectures protect themselves by placing the burden of proof on all message senders by requiring applications to prove that all required rights and privileges have been granted. Consistent with the notion of service autonomy, service-oriented architectures invariably rely on administratively managed trust relationships in order to avoid per-service authentication mechanisms common in classic Web applications.
</p>
</blockquote>
<p>
Again, I'd like to highlight how general these ideas are. Once lifted out of the context of <a href="https://en.wikipedia.org/wiki/Windows_Communication_Foundation">Windows Communication Foundation</a>, all of this applies more broadly.
</p>
<p>
Perhaps a few details now seem dated, but in general I find that this description holds up well.
</p>
<h3 id="f921c1135edd46d688729181489a9c73">
Wildlife <a href="#f921c1135edd46d688729181489a9c73">#</a>
</h3>
<p>
It's striking that someone in 2004 observed that big, complex, coordinated releases are impractical. Even so, it doesn't seem as though adopting a network-based technology and architecture in itself solves that problem. <a href="/2012/12/18/ZookeepersmustbecomeRangers">I wrote about that in 2012</a>, and I've seen <a href="https://youtu.be/jdliXz70NtM?si=NRSHFqaVHMvWnOPF">Adam Ralph make a similar observation</a>. Many organizations inadvertently create distributed monoliths. I think that this often stems from a failure to heed the tenet that services are autonomous.
</p>
<p>
I've experienced the following more than once. A team of developers rely on a service. As they take on a new feature, they realize that the way things are currently modelled prevents them from moving forward. Typical examples include mismatched cardinalities. For example, a customer record has a single active address, but the new feature requires that customers may have multiple active addresses. It could be that a customer has a permanent address, but also a summerhouse.
</p>
<p>
It is, however, the other service that defines how customer addresses are modelled, so the development team contacts the service team to discuss a breaking change. The service team agrees to the breaking change, but this means that the service and the relying client team now have to coordinate when they deploy the new versions of their software. The service is no longer autonomous.
</p>
<p>
I've already discussed this kind of problem in <a href="/2023/11/27/synchronizing-concurrent-teams">a previous article</a>, and as Don Box also implied, this discussion is related to the question of versioning, which we'll get back to when covering the fourth tenet.
</p>
<h3 id="11028dabd5a540cf9160c06c3e1b283c">
Transactions <a href="#11028dabd5a540cf9160c06c3e1b283c">#</a>
</h3>
<p>
It may be worthwhile to comment on this sentence:
</p>
<blockquote>
<p>
Techniques such as transactions, durable queues, and redundant deployment and failover are quite common in a service-oriented system.
</p>
</blockquote>
<p>
Indeed, but particularly regarding database transactions, a service may use them <em>internally</em> (typically leveraging a database engine like <a href="https://en.wikipedia.org/wiki/Microsoft_SQL_Server">SQL Server</a>, <a href="https://en.wikipedia.org/wiki/Oracle_Database">Oracle</a>, <a href="https://en.wikipedia.org/wiki/PostgreSQL">PostgreSQL</a>, etc.), but not across services. Around the time Don Box wrote the original MSDN Magazine article, an extension to SOAP colloquially known as <em>WS-Death Star</em> was in the works, and it included <a href="https://en.wikipedia.org/wiki/WS-Transaction">WS-Transaction</a>.
</p>
<p>
I don't know whether Don Box had something like this in mind when he wrote the word <em>transaction</em>, but in my experience, you don't want to go there. If you need to, you can make use of database transactions to keep your own service <a href="https://en.wikipedia.org/wiki/ACID">ACID</a>-consistent, but don't presume that this is possible with multiple autonomous services.
</p>
<p>
As always, even if a catchphrase such as <em>services are autonomous</em> sounds good, it's always illuminating to understand that there are trade-offs involved - and what they are. Here, a major trade-off is that you need to think about error-handling in a different way. If you don't already know how to address such concerns, look up <em>lock-free transactions</em> and <a href="https://en.wikipedia.org/wiki/Eventual_consistency">eventual consistency</a>. As Don Box also mentioned, durable queues are often part of such a solution, as is <a href="https://en.wikipedia.org/wiki/Idempotence">idempotence</a>.
</p>
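<p>
As a minimal sketch of how idempotence and a durable queue fit together, consider a message handler that remembers which message IDs it has already processed. The <code>AddressChanged</code> message, the <code>CustomerService</code> class, and all field names are hypothetical, made up for illustration. A real service would also keep the set of handled IDs in durable storage, which this in-memory sketch doesn't attempt.
</p>

```python
from dataclasses import dataclass, field


@dataclass
class AddressChanged:
    """Hypothetical message; the names are illustrative only."""
    message_id: str
    customer_id: int
    address: str


@dataclass
class CustomerService:
    """Keeps its own state and remembers which messages it has handled."""
    addresses: dict = field(default_factory=dict)
    handled: set = field(default_factory=set)

    def handle(self, msg: AddressChanged) -> bool:
        # A queue may deliver the same message more than once.
        # Skipping known IDs makes redelivery harmless.
        if msg.message_id in self.handled:
            return False
        self.addresses[msg.customer_id] = msg.address
        self.handled.add(msg.message_id)
        return True


service = CustomerService()
msg = AddressChanged("m-1", 42, "Summerhouse Lane 3")
first = service.handle(msg)
second = service.handle(msg)  # redelivery of the same message
```

<p>
The second call changes nothing, so a sender that is unsure whether a message arrived can simply send it again.
</p>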
<h3 id="7dc237c5f67c42c8b2c439140fc7a05b">
Validation <a href="#7dc237c5f67c42c8b2c439140fc7a05b">#</a>
</h3>
<p>
It follows from this discussion that an autonomous service should, ideally, exist independently of the software ecosystem that surrounds it. While an individual service can't impose its will on its surroundings, it can, and should, behave in a consistent and correct manner.
</p>
<p>
This does include deliberate consistency for the service itself. An autonomous service may make use of ACID or eventual consistency as the service owner deems appropriate.
</p>
<p>
It should also treat all input as suspect, until proven otherwise. Input validation is an important part of service design. It is my belief that <a href="/2020/12/14/validation-a-solved-problem">validation is a solved problem</a>, but that doesn't mean that you don't have to put in the work. You should consider correctness, versioning, as well as <a href="https://en.wikipedia.org/wiki/Robustness_principle">Postel's law</a>.
</p>
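<p>
One way to picture treating input as suspect, while still honouring Postel's law, is a parser that is lenient about form but strict about substance. This is only a sketch under assumptions: the <code>Reservation</code> type, its fields, and the specific rules are hypothetical and not taken from any particular service.
</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    """Hypothetical validated input; field names are assumptions."""
    email: str
    quantity: int


def parse_reservation(raw: dict) -> Reservation:
    """Return a Reservation, or raise ValueError if raw can't be trusted."""
    # Liberal in what is accepted: surrounding whitespace and unknown
    # extra fields are tolerated, and numeric strings are converted.
    email = str(raw.get("email", "")).strip()
    if "@" not in email:
        raise ValueError("email is missing or malformed")
    try:
        quantity = int(raw.get("quantity"))
    except (TypeError, ValueError):
        raise ValueError("quantity must be an integer") from None
    if quantity < 1:
        raise ValueError("quantity must be positive")
    return Reservation(email, quantity)


ok = parse_reservation({"email": " a@example.com ", "quantity": "2", "x": 1})

try:
    parse_reservation({"email": "not-an-address", "quantity": 2})
    rejected = False
except ValueError:
    rejected = True
```

<p>
Once a <code>Reservation</code> value exists, the rest of the service can rely on it without checking again.
</p>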
<h3 id="2482dbc1c20248fdb61a7347abce49ef">
Security <a href="#2482dbc1c20248fdb61a7347abce49ef">#</a>
</h3>
<p>
A similar observation relates to security. Some services (particularly read-only services) may allow for anonymous access, but if a service needs to authenticate or authorize requests, consider how this is done in an autonomous manner. Looking up account information in a centralized database isn't the autonomous way. If a service does that, it now relies on the account database, and is no longer autonomous.
</p>
<p>
Instead, rely on <a href="https://en.wikipedia.org/wiki/Claims-based_identity">claims-based identity</a>. In my experience, <a href="https://en.wikipedia.org/wiki/OAuth">OAuth</a> with <a href="https://en.wikipedia.org/wiki/JSON_Web_Token">JWT</a> is usually fine.
</p>
<p>
If your service needs to know something about the user that only an external source can tell it, don't look it up in an external system. Instead, demand that it's included in the JWT as a claim. Do you need to validate the age of the user? Require a <em>date-of-birth</em> or <em>age</em> claim. Do you need to know if the request is made on behalf of a system administrator? Demand a list of <em>role</em> claims.
</p>
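<p>
To illustrate, here's a sketch of decisions made from claims alone. It assumes that the token's signature has already been verified and its payload decoded into a dictionary; that step is deliberately left out. The claim names <code>birthdate</code> and <code>roles</code>, the <code>admin</code> role, and the age limit are assumptions chosen for the example.
</p>

```python
from datetime import date


def is_adult(claims: dict, today: date, minimum_age: int = 18) -> bool:
    """Decide from the claims alone; no lookup in an external system."""
    # A missing 'birthdate' claim raises KeyError: the claim is demanded.
    born = date.fromisoformat(claims["birthdate"])
    # Subtract one if this year's birthday hasn't occurred yet.
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    age = today.year - born.year - (0 if had_birthday else 1)
    return age >= minimum_age


def is_admin(claims: dict) -> bool:
    """True if the (assumed) list of role claims includes 'admin'."""
    return "admin" in claims.get("roles", [])


claims = {"sub": "1234", "birthdate": "2000-06-15", "roles": ["user", "admin"]}
adult = is_adult(claims, today=date(2024, 3, 25))
admin = is_admin(claims)
```

<p>
Neither function talks to an account database, so the service can still make these decisions when every other system is unavailable.
</p>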
<h3 id="75412f1e737a45dfaaf11c54e28013fa">
Conclusion <a href="#75412f1e737a45dfaaf11c54e28013fa">#</a>
</h3>
<p>
The second of Don Box's four tenets of SOA states that services should be autonomous. At first glance, you may think that all this means is that a service shouldn't share its database with another service. That is, however, a minimum bar. You need to consider how a service exists in an environment that it doesn't control. Again, the <a href="/2012/12/18/RangersandZookeepers">wildlife metaphor</a> seems apt. Particularly if your service is exposed to the internet, it lives in a hostile environment.
</p>
<p>
Not only should you consider all input belligerent, you must also take into account that friendly systems may disappear or change. Your service exists by itself, supported by itself, relying on itself. If you need to coordinate work with other service owners, that's a strong hint that your service isn't, after all, autonomous.
</p>
<p>
<strong>Next:</strong> <a href="/2024/04/15/services-share-schema-and-contract-not-class">Services share schema and contract, not class</a>.
</p>
</div>
<hr>
Extracting data from a small CSV file with Pythonhttps://blog.ploeh.dk/2024/03/18/extracting-data-from-a-small-csv-file-with-python2024-03-18T08:36:00+00:00Mark Seemann
<div id="post">
<p>
<em>My inept adventures with a dynamically typed language.</em>
</p>
<p>
This article is the third in <a href="/2024/02/05/statically-and-dynamically-typed-scripts">a small series about ad-hoc programming in two languages</a>. In <a href="/2024/02/19/extracting-data-from-a-small-csv-file-with-haskell">the previous article</a> you saw how I originally solved a small data extraction and analysis problem with <a href="https://www.haskell.org/">Haskell</a>, even though it was strongly implied that <a href="https://www.python.org/">Python</a> was the language for the job.
</p>
<p>
Months after having solved the problem I'd learned a bit more Python, so I decided to return to it and do it again in Python as an exercise. In this article, I'll briefly describe what I did.
</p>
<h3 id="590b0c98bf064ac0b8893ae41d398daa">
Reading CSV data <a href="#590b0c98bf064ac0b8893ae41d398daa">#</a>
</h3>
<p>
When writing Python, I feel the way I suppose a script kiddie might feel. I cobble together code based on various examples I've seen somewhere else, without a full or deep understanding of what I'm doing. There's more than a hint of <a href="/ref/pragmatic-programmer">programming by coincidence</a>, I'm afraid. One thing I've picked up along the way is that I can use <a href="https://pandas.pydata.org/">pandas</a> to read a <a href="https://en.wikipedia.org/wiki/Comma-separated_values">CSV file</a>:
</p>
<p>
<pre>data = pd.read_csv(<span style="color:#a31515;">'survey_data.csv'</span>, header=<span style="color:blue;">None</span>)
grades = data.iloc[:, 2]
experiences = data.iloc[:, 3]</pre>
</p>
<p>
In order for this to work, I needed to import <code>pandas</code>. Ultimately, my imports looked like this:
</p>
<p>
<pre><span style="color:blue;">import</span> pandas <span style="color:blue;">as</span> pd
<span style="color:blue;">from</span> collections <span style="color:blue;">import</span> Counter
<span style="color:blue;">from</span> itertools <span style="color:blue;">import</span> combinations, combinations_with_replacement
<span style="color:blue;">import</span> matplotlib.pyplot <span style="color:blue;">as</span> plt</pre>
</p>
<p>
In other Python code that I've written, I've been a heavy user of <a href="https://numpy.org/">NumPy</a>, and while I several times added it to my imports, I never needed it for this task. That was a bit surprising, but I've only done Python programming for a year, and I still don't have a good feel for the ecosystem.
</p>
<p>
The above code snippet also demonstrates how easy it is to slice a <em>dataframe</em> into columns: <code>grades</code> contains all the values in the (zero-indexed) second column, and <code>experiences</code> likewise the third column.
</p>
<h3 id="2a5c679e37394960acf5cf283abd41d5">
Sum of grades <a href="#2a5c679e37394960acf5cf283abd41d5">#</a>
</h3>
<p>
All the trouble that I had with binomial choice without replacement in my Haskell code is taken care of by <code>combinations</code>, which happily handles duplicate values:
</p>
<p>
<pre>>>> list(combinations('foo', 2))
[('f', 'o'), ('f', 'o'), ('o', 'o')]</pre>
</p>
<p>
Notice that <code>combinations</code> doesn't list <code>('o', 'f')</code>, since (apparently) it doesn't consider ordering important. That's more in line with the <a href="https://en.wikipedia.org/wiki/Binomial_coefficient">binomial coefficient</a>, whereas <a href="/2024/02/19/extracting-data-from-a-small-csv-file-with-haskell">my Haskell code</a> considers a tuple like <code>('f', 'o')</code> to be distinct from <code>('o', 'f')</code>. This is completely consistent with how Haskell works, but means that all the counts I arrived at with Haskell are double what they are in this article. Ultimately, <em>6/1406</em> is equal to <em>3/703</em>, so the probabilities are the same. I'll try to call out this factor-of-two difference whenever it occurs.
</p>
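<p>
The factor of two can be seen directly by comparing <code>combinations</code> with <code>permutations</code> from the same module. Here, <code>permutations</code> only serves as a stand-in for the ordered pairs; it isn't a translation of the Haskell code.
</p>

```python
from fractions import Fraction
from itertools import combinations, permutations

unordered = list(combinations('foo', 2))
ordered = list(permutations('foo', 2))

# Each unordered pair shows up twice among the ordered pairs.
print(len(unordered), len(ordered))

# The factor of two cancels out in ratios.
print(Fraction(6, 1406) == Fraction(3, 703))
```

<p>
Since both numerator and denominator double, any probability computed from ordered pairs equals the one computed from unordered pairs.
</p>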
<p>
A <code>Counter</code> object counts the number of occurrences of each value, so reading, picking combinations without replacement and adding them together is just two lines of code, and one more to print them:
</p>
<p>
<pre>sumOfGrades = Counter(<span style="color:blue;">map</span>(<span style="color:blue;">sum</span>, combinations(grades, 2)))
sumOfGrades = <span style="color:blue;">sorted</span>(sumOfGrades.items(), key=<span style="color:blue;">lambda</span> item: item[0])
<span style="color:blue;">print</span>(<span style="color:blue;">f</span><span style="color:#a31515;">'Sums of grades: </span>{sumOfGrades}<span style="color:#a31515;">'</span>)</pre>
</p>
<p>
The output is:
</p>
<p>
<pre>Sums of grades: [(0, 3), (2, 51), (4, 157), (6, 119), (7, 24), (8, 21), (9, 136), (10, 3),
(11, 56), (12, 23), (14, 69), (16, 14), (17, 8), (19, 16), (22, 2), (24, 1)]</pre>
</p>
<p>
(Formatting courtesy of yours truly.)
</p>
<p>
As already mentioned, these values are off by a factor of two compared to the previous Haskell code, but since I'll ultimately be dealing in ratios, it doesn't matter. What this output indicates is that the sum <em>0</em> occurs three times, the sum <em>2</em> appears <em>51</em> times, and so on.
</p>
<p>
This is where I, in my Haskell code, dropped down to a few ephemeral <a href="https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop">REPL</a>-based queries that enabled me to collect enough information to paste into Excel in order to produce a figure. In Python, however, I have <a href="https://matplotlib.org/">Matplotlib</a>, which means that I can create the desired plots entirely in code. It does require that I write a bit more code, though.
</p>
<p>
First, I need to calculate the range of the <a href="https://www.probabilitycourse.com/chapter3/3_1_3_pmf.php">Probability Mass Function</a> (PMF), since there are values that are possible, but not represented in the above data set. To calculate all possible values in the PMF's range, I use <code>combinations_with_replacement</code> against the <a href="https://en.wikipedia.org/wiki/Academic_grading_in_Denmark">Danish grading scale</a>.
</p>
<p>
<pre>grade_scale = [-3, 0, 2, 4, 7, 10, 12]
sumOfGradesRange = <span style="color:#2b91af;">set</span>(<span style="color:blue;">map</span>(<span style="color:blue;">sum</span>, combinations_with_replacement(grade_scale, 2)))
sumOfGradesRange = <span style="color:blue;">sorted</span>(sumOfGradesRange)
<span style="color:blue;">print</span>(<span style="color:blue;">f</span><span style="color:#a31515;">'Range of sums of grades: </span>{sumOfGradesRange}<span style="color:#a31515;">'</span>)</pre>
</p>
<p>
The output is this:
</p>
<p>
<pre>Range of sums of grades: [-6, -3, -1, 0, 1, 2, 4, 6, 7, 8, 9, 10, 11, 12, 14, 16, 17, 19, 20, 22, 24]</pre>
</p>
<p>
Next, I create a dictionary of all possible sums of grades, initializing all entries to zero, but then updating that dictionary with the observed values, where they are present:
</p>
<p>
<pre>probs = <span style="color:#2b91af;">dict</span>.fromkeys(sumOfGradesRange, 0)
probs.update(<span style="color:#2b91af;">dict</span>(sumOfGrades))</pre>
</p>
<p>
Finally, I recompute the dictionary entries to probabilities.
</p>
<p>
<pre>total = <span style="color:blue;">sum</span>(x[1] <span style="color:blue;">for</span> x <span style="color:blue;">in</span> sumOfGrades)
<span style="color:blue;">for</span> k, v <span style="color:blue;">in</span> probs.items():
    probs[k] = v / total</pre>
</p>
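<p>
The three steps (count, zero-fill, normalize) are easier to follow on a tiny data set. The scale and the sample below are made up for the purpose and have nothing to do with the survey data. The sketch also builds a new dictionary in the last step, which is a stylistic choice rather than a correction.
</p>

```python
from collections import Counter
from itertools import combinations, combinations_with_replacement

scale = [0, 2, 4]        # made-up miniature grading scale
observed = [2, 2, 4]     # made-up sample, not the survey data

counts = Counter(map(sum, combinations(observed, 2)))
possible = sorted(set(map(sum, combinations_with_replacement(scale, 2))))

# Start every possible sum at zero, then overwrite with observed counts.
pmf = dict.fromkeys(possible, 0)
pmf.update(counts)

# Turn counts into probabilities.
total = sum(counts.values())
pmf = {k: v / total for k, v in pmf.items()}
```

<p>
Sums that are possible but never observed, such as <em>0</em> and <em>8</em> here, stay in the dictionary with probability zero, which is what makes them show up as gaps in the bar chart.
</p>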
<p>
Now I have all the data needed to plot the desired bar chart:
</p>
<p>
<pre>plt.bar(probs.keys(), probs.values())
plt.xlabel(<span style="color:#a31515;">'Sum'</span>)
plt.ylabel(<span style="color:#a31515;">'Probability'</span>)
plt.show()</pre>
</p>
<p>
The result looks like this:
</p>
<p>
<img src="/content/binary/sum-pmf-plot.png" alt="Bar chart of the sum-of-grades PMF.">
</p>
<p>
While I'm already on line 34 in my Python file, with one more question to answer, I've written proper code in order to produce data that I only wrote ephemeral queries for in Haskell.
</p>
<h3 id="8831d23c67bd48e9b22db86ca3c21bd4">
Difference of experiences <a href="#8831d23c67bd48e9b22db86ca3c21bd4">#</a>
</h3>
<p>
The next question is almost a repetition of the first one, and I've addressed it by copying and pasting. After all, it's only <em>duplication</em>, not <em>triplication</em>, so I can always invoke the <a href="https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming)">Rule of Three</a>. Furthermore, this is a one-off script that I don't expect to have to maintain in the future, so copy-and-paste, here we go:
</p>
<p>
<pre>diffOfExperiances = \
Counter(<span style="color:blue;">map</span>(<span style="color:blue;">lambda</span> x: <span style="color:blue;">abs</span>(x[0] - x[1]), combinations(experiences, 2)))
diffOfExperiances = <span style="color:blue;">sorted</span>(diffOfExperiances.items(), key=<span style="color:blue;">lambda</span> item: item[0])
<span style="color:blue;">print</span>(<span style="color:blue;">f</span><span style="color:#a31515;">'Differences of experiences: </span>{diffOfExperiances}<span style="color:#a31515;">'</span>)
experience_scale = <span style="color:#2b91af;">list</span>(<span style="color:blue;">range</span>(1, 8))
diffOfExperiancesRange = <span style="color:#2b91af;">set</span>(\
<span style="color:blue;">map</span>(<span style="color:blue;">lambda</span> x: <span style="color:blue;">abs</span>(x[0] - x[1]),\
combinations_with_replacement(experience_scale, 2)))
diffOfExperiancesRange = <span style="color:blue;">sorted</span>(diffOfExperiancesRange)
probs = <span style="color:#2b91af;">dict</span>.fromkeys(diffOfExperiancesRange, 0)
probs.update(<span style="color:#2b91af;">dict</span>(diffOfExperiances))
total = <span style="color:blue;">sum</span>(x[1] <span style="color:blue;">for</span> x <span style="color:blue;">in</span> diffOfExperiances)
<span style="color:blue;">for</span> k, v <span style="color:blue;">in</span> probs.items():
    probs[k] = v / total
<span style="color:green;"># Plot the histogram of differences of experiences</span>
plt.bar(probs.keys(), probs.values())
plt.xlabel(<span style="color:#a31515;">'Difference'</span>)
plt.ylabel(<span style="color:#a31515;">'Probability'</span>)
plt.show()</pre>
</p>
<p>
The bar chart has the same style as before, but obviously displays different data. See the bar chart in the <a href="/2024/02/19/extracting-data-from-a-small-csv-file-with-haskell">previous article</a> for the Excel-based rendition of that data.
</p>
<h3 id="8d7d707edeba43c59d07b5753a4bdb2d">
Conclusion <a href="#8d7d707edeba43c59d07b5753a4bdb2d">#</a>
</h3>
<p>
The Python code runs to 61 lines of code, compared with the 34 lines of Haskell code. The Python code, however, is much more complete than the Haskell code, since it also contains the code that computes the range of each PMF, as well as code that produces the figures.
</p>
<p>
Like the Haskell code, it took me a couple of hours to produce this, so I can't say that I feel much more productive in Python than in Haskell. On the other hand, I also acknowledge that I have less experience writing Python code. If I had to do a lot of ad-hoc data crunching like this, I can see how Python is useful.
</p>
</div><hr>
Boundaries are explicithttps://blog.ploeh.dk/2024/03/11/boundaries-are-explicit2024-03-11T08:03:00+00:00Mark Seemann
<div id="post">
<p>
<em>A reading of the first Don Box tenet, with some commentary.</em>
</p>
<p>
This article is part of a series titled <a href="/2024/03/04/the-four-tenets-of-soa-revisited">The four tenets of SOA revisited</a>. In each of these articles, I'll pull one of <a href="https://en.wikipedia.org/wiki/Don_Box">Don Box</a>'s <em>four tenets of service-oriented architecture</em> (SOA) out of the <a href="https://learn.microsoft.com/en-us/archive/msdn-magazine/2004/january/a-guide-to-developing-and-running-connected-systems-with-indigo">original MSDN Magazine article</a> and add some of my own commentary. If you're curious why I do that, I cover that in the introductory article.
</p>
<p>
In this article, I'll go over the first tenet, quoting from the MSDN Magazine article unless otherwise indicated.
</p>
<h3 id="3d25f37d4da8482fa846b8660823b8cd">
Boundaries are explicit <a href="#3d25f37d4da8482fa846b8660823b8cd">#</a>
</h3>
<p>
This tenet was the one I struggled with the most. It took me a long time to come to grips with how to apply it, but I'll get back to that in a moment. First, here's what the article said:
</p>
<blockquote>
<p>
A service-oriented application often consists of services that are spread over large geographical distances, multiple trust authorities, and distinct execution environments. The cost of traversing these various boundaries is nontrivial in terms of complexity and performance. Service-oriented designs acknowledge these costs by putting a premium on boundary crossings. Because each cross-boundary communication is potentially costly, service-orientation is based on a model of explicit message passing rather than implicit method invocation. Compared to distributed objects, the service-oriented model views cross-service method invocation as a private implementation technique, not as a primitive construct—the fact that a given interaction may be implemented as a method call is a private implementation detail that is not visible outside the service boundary.
</p>
<p>
Though service-orientation does not impose the RPC-style notion of a network-wide call stack, it can support a strong notion of causality. It is common for messages to explicitly indicate which chain(s) of messages a particular message belongs to. This indication is useful for message correlation and for implementing several common concurrency models.
</p>
<p>
The notion that boundaries are explicit applies not only to inter-service communication but also to inter-developer communication. Even in scenarios in which all services are deployed in a single location, it is commonplace for the developers of each service to be spread across geographical, cultural, and/or organizational boundaries. Each of these boundaries increases the cost of communication between developers. Service orientation adapts to this model by reducing the number and complexity of abstractions that must be shared across service boundaries. By keeping the surface area of a service as small as possible, the interaction and communication between development organizations is reduced. One theme that is consistent in service-oriented designs is that simplicity and generality aren't a luxury but rather a critical survival skill.
</p>
</blockquote>
<p>
Notice that there's nothing here about <a href="https://en.wikipedia.org/wiki/Windows_Communication_Foundation">Windows Communication Foundation</a> (WCF), or any other specific technology. This is common to all four tenets, and one of the reasons that I think they deserve to be lifted out of their original context and put on display as the general ideas that they are.
</p>
<p>
I'm getting the vibe that the above description was written under the impression of the disenchantment with distributed objects that was setting in around that time. The year before, <a href="https://martinfowler.com/">Martin Fowler</a> had formulated his
</p>
<blockquote>
<p>
"<strong>First Law of Distributed Object Design:</strong> Don't distribute your objects!"
</p>
<footer><cite>Martin Fowler, <a href="/ref/peaa">Patterns of Enterprise Application Architecture</a>, (his emphasis)</cite></footer>
</blockquote>
<p>
The way that I read the tenet then, and the way I <em>still</em> read it today, is that in contrast to distributed objects, you should treat any service invocation as a noticeable operation, <em>"putting a premium on boundary crossings"</em>, somehow distinct from normal code.
</p>
<p>
Perhaps I read too much into that, because WCF immediately proceeded to convert any <a href="https://en.wikipedia.org/wiki/SOAP">SOAP</a> service into a lot of auto-generated C# code that would then enable you to invoke operations on a remote service using (you guessed it) a method invocation.
</p>
<p>
Here's a code snippet from the <a href="https://learn.microsoft.com/dotnet/framework/wcf/how-to-use-a-wcf-client">WCF documentation</a>:
</p>
<p>
<pre><span style="color:blue;">double</span> <span style="font-weight:bold;color:#1f377f;">value1</span> = 100.00D;
<span style="color:blue;">double</span> <span style="font-weight:bold;color:#1f377f;">value2</span> = 15.99D;
<span style="color:blue;">double</span> <span style="font-weight:bold;color:#1f377f;">result</span> = client.Add(value1, value2);</pre>
</p>
<p>
What happens here is that <code>client.Add</code> creates and sends a SOAP message to a service, receives the response, unpacks it, and returns it as a <code>double</code> value. Even so, it looks just like any other method call. There's no <em>"premium on boundary crossings"</em> here.
</p>
<p>
So much for the principle that boundaries are explicit. They're not, and it bothered me twenty years ago, as it bothers me today.
</p>
<p>
I'll remind you what the problem is. When the boundary is <em>not</em> explicit, you may inadvertently write client code that makes network calls, and you may not be aware of it. This could noticeably slow down the application, particularly if you do it in a loop.
</p>
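<p>
To illustrate, here's a minimal sketch of how that might play out. The <code>ICalculator</code> interface, the <code>Totals</code> class, and the <code>CountingCalculator</code> stand-in are hypothetical names of mine; they aren't taken from WCF or its documentation. Nothing in <code>Sum</code> reveals whether <code>client</code> is a local object or a generated proxy that makes one network round trip per iteration.
</p>
<p>
<pre>using System.Collections.Generic;

// Hypothetical service contract, modelled on the Add example above.
public interface ICalculator
{
    double Add(double x, double y);
}

public static class Totals
{
    // If client is a remote proxy, every iteration crosses the network,
    // but this code looks the same as if it were a local computation.
    public static double Sum(ICalculator client, IEnumerable<double> values)
    {
        var total = 0.0;
        foreach (var value in values)
            total = client.Add(total, value);
        return total;
    }
}

// Local stand-in that counts calls. Each call would be a round trip
// if the implementation were remote.
public sealed class CountingCalculator : ICalculator
{
    public int Calls { get; private set; }

    public double Add(double x, double y)
    {
        Calls++;
        return x + y;
    }
}</pre>
</p>
<p>
Summing three values makes three calls to <code>Add</code>. With a remote implementation, that would be three network requests that the calling code gives you no reason to suspect.
</p>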
<h3 id="55bd772540a047a3b8db0d1aee373e87">
How do you make boundaries explicit? <a href="#55bd772540a047a3b8db0d1aee373e87">#</a>
</h3>
<p>
This problem isn't isolated to WCF or SOAP. <a href="https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing">Network calls are slow and unreliable</a>. Perhaps you're connecting to a system on the other side of the Earth. Perhaps the system is unavailable. This is true regardless of protocol.
</p>
<p>
From the software architect's perspective, the tenet that boundaries are explicit is a really good idea. The clearer it is where in a code base network operations take place, the easier it is to troubleshoot and maintain that code. This could make it easier to spot <em>n + 1</em> problems, as well as give you opportunities to <a href="/2020/03/23/repeatable-execution">add logging</a>, <a href="https://martinfowler.com/bliki/CircuitBreaker.html">Circuit Breakers</a>, etc.
</p>
<p>
How do you make boundaries explicit? Clearly, WCF failed to do so, despite the design goal.
</p>
<h3 id="3c69e9213db946dc8d389c9b4bf19de2">
Only Commands <a href="#3c69e9213db946dc8d389c9b4bf19de2">#</a>
</h3>
<p>
After having struggled with this question for years, I had an idea. This idea, however, doesn't really work, but I'll briefly cover it here. After all, if I can have that idea, other people may get it as well. It could save you some time if I explain why I believe that it doesn't address the problem.
</p>
<p>
The idea is to mandate that all network operations are <a href="https://en.wikipedia.org/wiki/Command%E2%80%93query_separation">Commands</a>. In a C-like language, that would indicate a <code>void</code> method.
</p>
<p>
While it turns out that it ultimately doesn't work, this isn't just some arbitrary rule that I've invented. After all, if a method doesn't return anything, the boundary does, in a sense, become explicit. You can't just 'keep dotting', <a href="https://martinfowler.com/bliki/FluentInterface.html">fluent-interface</a> style.
</p>
<p>
<pre>channel.UpdateProduct(pc);</pre>
</p>
<p>
This gives you the opportunity to treat network operations as fire-and-forget operations. While you could still run such Commands in a tight loop, you could at least add them to a queue and move on. Such a queue could be an in-process data structure, or a persistent queue. Your network card also holds a small queue of network packets.
</p>
<p>
This is essentially an asynchronous messaging architecture. It seems to correlate with Don Box's talk about messages.
</p>
<p>
Although this may seem as though it addresses some concerns about making boundaries explicit, an obvious question arises: How do you perform queries in this model?
</p>
<p>
You <em>could</em> keep such an architecture clean. You might, for example, implement a <a href="https://martinfowler.com/bliki/CQRS.html">CQRS</a> architecture where Commands create Events to which your application may subscribe. Such events could be handled by <em>event handlers</em> (other <code>void</code> methods) to update in-memory data as it changes.
</p>
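<p>
As a rough sketch of what such an event handler might look like, consider the following. The <code>ProductPriceChanged</code> event and the <code>PriceCache</code> class are names I've made up for the occasion, and I'm assuming that some messaging infrastructure (not shown) calls <code>Handle</code> when an event arrives.
</p>
<p>
<pre>using System.Collections.Generic;

// Hypothetical event, published as a consequence of a Command.
public sealed record ProductPriceChanged(int ProductId, decimal NewPrice);

public sealed class PriceCache
{
    private readonly Dictionary<int, decimal> prices =
        new Dictionary<int, decimal>();

    // Event handler: a void method that updates in-memory data.
    public void Handle(ProductPriceChanged e)
    {
        prices[e.ProductId] = e.NewPrice;
    }

    // Query against local memory only; no network call is involved.
    public decimal? TryGetPrice(int productId)
    {
        return prices.TryGetValue(productId, out var price)
            ? price
            : (decimal?)null;
    }
}</pre>
</p>
<p>
In this style, queries only ever read data that has already arrived, so the only operations that cross the boundary are Commands and incoming events.
</p>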
<p>
Even so, there are practical limitations with such a model. What's likely to happen, instead, is the following.
</p>
<h3 id="04a7c349122e45e38341eb0b50b877c0">
Request-Reply <a href="#04a7c349122e45e38341eb0b50b877c0">#</a>
</h3>
<p>
It's hardly unlikely that you may need to perform some kind of Query against a remote system. If you can only communicate with services using <code>void</code> methods, such a scenario seems impossible.
</p>
<p>
It's not. There's even a pattern for that. <a href="/ref/eip">Enterprise Integration Patterns</a> call it <a href="https://www.enterpriseintegrationpatterns.com/patterns/messaging/RequestReply.html">Request-Reply</a>. You create a Query message and give it a correlation ID, send it, and wait for the reply message to arrive at your own <em>message handler</em>. Client code might look like this:
</p>
<p>
<pre>var correlationId = Guid.NewGuid();
var query = new FavouriteSongsQuery(UserId: 123, correlationId);
channel.Send(query);
IReadOnlyCollection<Song> songs = [];
while (true)
{
    var response = subscriber.GetNextResponse(correlationId);
    if (response is null)
        Thread.Sleep(100);
    else
    {
        // Only leave the loop once a reply has arrived.
        songs = response;
        break;
    }
}</pre>
</p>
<p>
While this works, it's awkward to use, so it doesn't take long before someone decides to wrap it in a helpful helper method:
</p>
<p>
<pre>public IReadOnlyCollection<Song> GetFavouriteSongs(int userId)
{
    var correlationId = Guid.NewGuid();
    var query = new FavouriteSongsQuery(userId, correlationId);
    channel.Send(query);
    IReadOnlyCollection<Song> songs = [];
    while (true)
    {
        var response = subscriber.GetNextResponse(correlationId);
        if (response is null)
            Thread.Sleep(100);
        else
        {
            // Only leave the loop once a reply has arrived.
            songs = response;
            break;
        }
    }
    return songs;
}</pre>
</p>
<p>
This now enables you to write client code like this:
</p>
<p>
<pre>var songService = new SongService();
var songs = songService.GetFavouriteSongs(123);</pre>
</p>
<p>
We're back where we started. Boundaries are no longer explicit. Equivalent to how <a href="/2020/11/23/good-names-are-skin-deep">good names are only skin-deep</a>, this attempt to make boundaries explicit can't resist programmers' natural tendency to make things easier for themselves.
</p>
<p>
If only there was some way to make an abstraction <a href="https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/">contagious</a>...
</p>
<h3 id="fffc782323e746ef9a7b662132e17257">
Contagion <a href="#fffc782323e746ef9a7b662132e17257">#</a>
</h3>
<p>
Ideally, we'd like to make boundaries explicit in such a way that they can't be hidden away. After all,
</p>
<blockquote>
<p>
"Abstraction is <em>the elimination of the irrelevant and the amplification of the essential.</em>"
</p>
<footer><cite>Robert C. Martin, <a href="/ref/doocautbm">Designing Object-Oriented C++ Applications Using The Booch Method</a>, chapter 00 (sic), (his emphasis)</cite></footer>
</blockquote>
<p>
The existence of a boundary is essential, so while we might want to eliminate various other irrelevant details, this is a property that we should retain and surface in APIs. Even better, it'd be best if we could do it in such a way that it can't easily be swept under the rug, as shown above.
</p>
<p>
In <a href="https://www.haskell.org/">Haskell</a>, this is true for all input/output - not only network requests, but also file access, and other non-deterministic actions. Haskell marks such actions with a 'wrapper' type called <code>IO</code>; for an explanation with C# examples, see <a href="/2020/06/08/the-io-container">The IO Container</a>.
</p>
<p>
In a more <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> way, we can use <a href="/2020/07/27/task-asynchronous-programming-as-an-io-surrogate">task asynchronous programming as an IO surrogate</a>. People often complain that <code>async</code> code is contagious. By that they mean that once a piece of code is asynchronous, the caller must also be asynchronous. This effect is transitive, and while this is often lamented as a problem, this is exactly what we need. Amplify the essential. Make boundaries explicit.
</p>
<p>
This doesn't mean that your entire code base has to be asynchronous. Only your network (and similar, non-deterministic) code should be asynchronous. Write your Domain Model and application code as pure functions, and <a href="/2019/02/11/asynchronous-injection">compose them with the asynchronous code using standard combinators</a>.
</p>
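<p>
Here's a minimal sketch of what that could look like. The <code>Song</code> record, the <code>ISongService</code> interface, and the <code>Recommendations</code> class are assumptions of mine, invented for illustration. The point is only that the <code>Task</code> return type marks the boundary, and that the pure function needs no such marker.
</p>
<p>
<pre>using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical data type.
public sealed record Song(int Id, string Title);

// Hypothetical boundary. The Task return type advertises the I/O.
public interface ISongService
{
    Task<IReadOnlyCollection<Song>> GetFavouriteSongs(int userId);
}

public static class Recommendations
{
    // Pure function: no Task in the signature, so no boundary crossing.
    public static IReadOnlyCollection<string> TopTitles(
        IEnumerable<Song> songs,
        int count)
    {
        return songs
            .Select(s => s.Title)
            .OrderBy(t => t)
            .Take(count)
            .ToList();
    }

    // Awaiting the service forces this method to return a Task as well.
    // That's the contagion at work.
    public static async Task<IReadOnlyCollection<string>> GetTopTitles(
        ISongService service,
        int userId)
    {
        var songs = await service.GetFavouriteSongs(userId);
        return TopTitles(songs, 3);
    }
}</pre>
</p>
<p>
You could still block on the <code>Task</code> to hide the boundary again, but that takes deliberate effort, whereas the helper method shown earlier hid it by default.
</p>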
<h3 id="16ff65dbc4784ad1939257635d08039c">
Conclusion <a href="#16ff65dbc4784ad1939257635d08039c">#</a>
</h3>
<p>
The first of Don Box's four tenets of SOA is that boundaries should be explicit. WCF failed to deliver on that ideal, and it took me more than a decade to figure out how to square that circle.
</p>
<p>
Many languages now come with support for asynchronous programming, often utilizing some kind of generic <code>Task</code> or <code>Async</code> <a href="/2022/03/28/monads">monad</a>. Since such types are usually contagious, you can use them to make boundaries explicit.
</p>
<p>
<strong>Next:</strong> <a href="/2024/03/25/services-are-autonomous">Services are autonomous</a>.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
The four tenets of SOA revisited
https://blog.ploeh.dk/2024/03/04/the-four-tenets-of-soa-revisited
2024-03-04T06:39:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Twenty years after.</em>
</p>
<p>
In the <a href="https://learn.microsoft.com/en-us/archive/msdn-magazine/2004/january/msdn-magazine-january-2004">January 2004 issue of MSDN Magazine</a> you can find an article by <a href="https://en.wikipedia.org/wiki/Don_Box">Don Box</a> titled <a href="https://learn.microsoft.com/en-us/archive/msdn-magazine/2004/january/a-guide-to-developing-and-running-connected-systems-with-indigo">A Guide to Developing and Running Connected Systems with Indigo</a>. Buried within the (now dated) discussion of the technology code-named <em>Indigo</em> (later <a href="https://en.wikipedia.org/wiki/Windows_Communication_Foundation">Windows Communication Foundation</a>) you can find a general discussion of <em>four tenets of service-oriented architecture</em> (SOA).
</p>
<p>
I remember that they resonated strongly with me back then, or that they at least prompted me to think explicitly about how to design software services. Some of these ideas have stayed with me ever since, while another has nagged at me for decades before I found a way to reconcile it with other principles of software design.
</p>
<p>
Now that twenty years have passed since the MSDN article was published, I find that this is as good a time as ever to revisit it.
</p>
<h3 id="96e92c4bccef4d5789bbb5d860e3ce3f">
Legacy <a href="#96e92c4bccef4d5789bbb5d860e3ce3f">#</a>
</h3>
<p>
Why should we care about an old article about <a href="https://en.wikipedia.org/wiki/SOAP">SOAP</a> and SOA? Does anyone even use such things today, apart from legacy systems?
</p>
<p>
After all, we've moved on from SOAP to <a href="https://en.wikipedia.org/wiki/REST">REST</a>, <a href="https://en.wikipedia.org/wiki/GRPC">gRPC</a>, or <a href="https://en.wikipedia.org/wiki/GraphQL">GraphQL</a>, and from SOA to <a href="https://en.wikipedia.org/wiki/Microservices">microservices</a> - that is, if we're not already swinging back towards monoliths.
</p>
<p>
Even so, I find much of what Don Box wrote twenty years ago surprisingly prescient. If you're interested in distributed software design involving some kind of remote API design, the four tenets of service-orientation apply beyond their original context. Some of the ideas, at least.
</p>
<p>
As is often the case in our field, various resources discuss the tenets without much regard to proper citation. Thus, I can't be sure that the MSDN article is where they first appeared, but I haven't found any earlier source.
</p>
<p>
My motivation for writing these articles is partly to rescue the four tenets from obscurity, and partly to add some of my own commentary.
</p>
<p>
Much of the original article is about Indigo, and I'm going to skip that. On the other hand, I'm going to quote rather extensively from the article, in order to lift the more universal ideas out of their original context.
</p>
<p>
I'll do that in a series of articles, each covering one of the tenets.
</p>
<ul>
<li><a href="/2024/03/11/boundaries-are-explicit">Boundaries are explicit</a></li>
<li><a href="/2024/03/25/services-are-autonomous">Services are autonomous</a></li>
<li><a href="/2024/04/15/services-share-schema-and-contract-not-class">Services share schema and contract, not class</a></li>
<li><a href="/2024/04/29/service-compatibility-is-determined-based-on-policy">Service compatibility is determined based on policy</a></li>
</ul>
<p>
Not all of the tenets have stood the test of time equally well, so I may not add an equal amount of commentary to all four.
</p>
<h3 id="ad6f66b0ac954647bebf4d288939d2ab">
Conclusion <a href="#ad6f66b0ac954647bebf4d288939d2ab">#</a>
</h3>
<p>
Ever since I first encountered the four tenets of SOA, they've stayed with me in one form or other. When helping teams to design services, even what we may today consider 'modern services', I've drawn on some of those ideas. There are insights of a general nature that are worth considering even today.
</p>
<p>
<strong>Next:</strong> <a href="/2024/03/11/boundaries-are-explicit">Boundaries are explicit</a>.
</p>
</div><hr>
Testing exceptions
https://blog.ploeh.dk/2024/02/26/testing-exceptions
2024-02-26T06:47:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Some thoughts on testing past the happy path.</em>
</p>
<p>
Test-driven development is a great development technique that enables you to get rapid feedback on design and implementation ideas. It enables you to rapidly move towards a working solution.
</p>
<p>
The emphasis on the <em>happy path</em>, however, can make you forget about all the things that may go wrong. Sooner or later, though, you may realize that the implementation can fail for a number of reasons, and, wanting to make things more robust, you may want to also subject your error-handling code to automated testing.
</p>
<p>
This doesn't have to be difficult, but can raise some interesting questions. In this article, I'll try to address a few.
</p>
<h3 id="ead73eb4bc4b45eba0bde0cf61269814">
Throwing exceptions with a dynamic mock <a href="#ead73eb4bc4b45eba0bde0cf61269814">#</a>
</h3>
<p>
In <a href="/2023/08/14/replacing-mock-and-stub-with-a-fake#0afe67b375254fe193a3fd10234a1ce9">a question to another article</a> AmirB asks how to use a <a href="http://xunitpatterns.com/Fake%20Object.html">Fake Object</a> to test exceptions. Specifically, since <a href="/2023/11/13/fakes-are-test-doubles-with-contracts">a Fake is a Test Double with a coherent contract</a> it'll be inappropriate to let it throw exceptions that relate to different implementations.
</p>
<p>
Egads, that was quite abstract, so let's consider a concrete example.
</p>
<p>
<a href="/2023/08/14/replacing-mock-and-stub-with-a-fake">The original article</a> that AmirB asked about used this interface as an example:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IUserRepository</span>
{
    User <span style="font-weight:bold;color:#74531f;">Read</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">userId</span>);
    <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Create</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">userId</span>);
}</pre>
</p>
<p>
Granted, this interface is a little odd, but it should be good enough for the present purpose. As AmirB wrote:
</p>
<blockquote>
<p>
"In scenarios where dynamic mocks (like Moq) are employed, we can mock a method to throw an exception, allowing us to test the expected behavior of the System Under Test (SUT)."
</p>
<footer><cite><a href="/2023/08/14/replacing-mock-and-stub-with-a-fake#0afe67b375254fe193a3fd10234a1ce9">AmirB</a></cite></footer>
</blockquote>
<p>
Specifically, this might look like this, using <a href="https://github.com/devlooped/moq">Moq</a>:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">CreateThrows</span>()
{
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">td</span> = <span style="color:blue;">new</span> Mock<IUserRepository>();
    td.Setup(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Read(1234)).Returns(<span style="color:blue;">new</span> User { Id = 0 });
    td.Setup(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Create(It.IsAny<<span style="color:blue;">int</span>>())).Throws(MakeSqlException());
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> SomeController(td.Object);
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = sut.GetUser(1234);
    Assert.NotNull(actual);
}</pre>
</p>
<p>
It's just an example, but the point is that since you can make a dynamic mock do anything that you can define in code, you can also use it to simulate database exceptions. This test pretends that the <code>IUserRepository</code> throws a <a href="https://learn.microsoft.com/dotnet/api/system.data.sqlclient.sqlexception">SqlException</a> from the <code>Create</code> method.
</p>
<p>
Perhaps the <code>GetUser</code> implementation now looks like this:
</p>
<p>
<pre><span style="color:blue;">public</span> User <span style="font-weight:bold;color:#74531f;">GetUser</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">userId</span>)
{
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">u</span> = <span style="color:blue;">this</span>.userRepository.Read(userId);
    <span style="font-weight:bold;color:#8f08c4;">if</span> (u.Id == 0)
        <span style="font-weight:bold;color:#8f08c4;">try</span>
        {
            <span style="color:blue;">this</span>.userRepository.Create(userId);
        }
        <span style="font-weight:bold;color:#8f08c4;">catch</span> (SqlException)
        {
        }
    <span style="font-weight:bold;color:#8f08c4;">return</span> u;
}</pre>
</p>
<p>
If you find the example contrived, I don't blame you. The <code>IUserRepository</code> interface, the <code>User</code> class, and the <code>GetUser</code> method that orchestrates them are all degenerate in various ways. I originally created this little code example to discuss <a href="/2013/10/23/mocks-for-commands-stubs-for-queries">data flow verification</a>, and I'm now stretching it beyond any reason. I hope that you can look past that. The point I'm making here is more general, and doesn't hinge on particulars.
</p>
<h3 id="ed58e2e387234a7ebd3c97a384841d9f">
Fake <a href="#ed58e2e387234a7ebd3c97a384841d9f">#</a>
</h3>
<p>
<a href="/2023/08/14/replacing-mock-and-stub-with-a-fake">The article</a> also suggests a <code>FakeUserRepository</code> that is small enough that I can repeat it here.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">FakeUserRepository</span> : Collection<User>, IUserRepository
{
    <span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Create</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">userId</span>)
    {
        Add(<span style="color:blue;">new</span> User { Id = userId });
    }
    <span style="color:blue;">public</span> User <span style="font-weight:bold;color:#74531f;">Read</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">userId</span>)
    {
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">user</span> = <span style="color:blue;">this</span>.SingleOrDefault(<span style="font-weight:bold;color:#1f377f;">u</span> => u.Id == userId);
        <span style="font-weight:bold;color:#8f08c4;">if</span> (user == <span style="color:blue;">null</span>)
            <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> User { Id = 0 };
        <span style="font-weight:bold;color:#8f08c4;">return</span> user;
    }
}</pre>
</p>
<p>
The question is how to use something like this when you want to test exceptions? It's possible that this little class may produce <a href="/2024/01/29/error-categories-and-category-errors">errors that I've failed to predict</a>, but it certainly doesn't throw any <code>SqlExceptions</code>!
</p>
<p>
Should we inflate <code>FakeUserRepository</code> by somehow also giving it the capability to throw particular exceptions?
</p>
<h3 id="c116b036864348b7938dfb6805e0c2dd">
Throwing exceptions from Test Doubles <a href="#c116b036864348b7938dfb6805e0c2dd">#</a>
</h3>
<p>
I understand why AmirB asks that question, because it doesn't seem right. As a start, it would go against the <a href="https://en.wikipedia.org/wiki/Single_responsibility_principle">Single Responsibility Principle</a>. The <code>FakeUserRepository</code> would then have more than one reason to change: You'd have to change it if the <code>IUserRepository</code> interface changes, but you'd also have to change it if you wanted to simulate a different error situation.
</p>
<p>
Good coding practices apply to test code as well. Test code is code that you have to read and maintain, so all the good practices that keep production code in good shape also apply to test code. This may include <a href="https://en.wikipedia.org/wiki/SOLID">the SOLID principles</a>, unless you're of the mind that <a href="https://dannorth.net/cupid-for-joyful-coding/">SOLID ought to be a thing of the past</a>.
</p>
<p>
If you really <em>must</em> throw exceptions from a <a href="https://martinfowler.com/bliki/TestDouble.html">Test Double</a>, perhaps a dynamic mock object as shown above is the best option. No-one says that if you use a Fake Object for most of your tests you can't use a dynamic mock library for truly one-off test cases. Or perhaps use a one-off Test Double that throws the desired exception.
</p>
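<p>
A hand-written, one-off Test Double of that kind could be as simple as the following sketch. The <code>ThrowingUserRepository</code> name is my invention, and I've repeated the interface together with an assumed, minimal <code>User</code> class so that the example stands on its own. Since <code>SqlException</code> has no public constructor, the sketch takes whatever exception the test supplies.
</p>
<p>
<pre>using System;

// Assumed minimal shape of User; the real class isn't shown here.
public class User
{
    public int Id { get; set; }
}

// Repeated from above so that the sketch is self-contained.
public interface IUserRepository
{
    User Read(int userId);
    void Create(int userId);
}

// Hypothetical one-off Test Double. Read reports a missing user,
// and Create always throws the exception supplied by the test.
public sealed class ThrowingUserRepository : IUserRepository
{
    private readonly Exception exception;

    public ThrowingUserRepository(Exception exception)
    {
        this.exception = exception;
    }

    public User Read(int userId)
    {
        return new User { Id = 0 };
    }

    public void Create(int userId)
    {
        throw exception;
    }
}</pre>
</p>
<p>
This keeps <code>FakeUserRepository</code> free of error-simulation logic, at the cost of one more small class in the test code.
</p>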
<p>
I would, however, consider it a code smell if this happens too often. Not a test smell, but a code smell.
</p>
<h3 id="302a9e4462744a55974b8fdab6f70054">
Is the exception part of the contract? <a href="#302a9e4462744a55974b8fdab6f70054">#</a>
</h3>
<p>
You may ask yourself whether a particular exception type is part of an object's contract. As I always do, when I use the word <em>contract</em>, I refer to a set of invariants, pre-, and postconditions, taking a cue from <a href="/ref/oosc">Object-Oriented Software Construction</a>. See also my video course <a href="/encapsulation-and-solid">Encapsulation and SOLID</a> for more details.
</p>
<p>
You can <em>imply</em> many things about a contract when you have a static type system at your disposal, but there are always rules that you can't express that way. Parts of a contract are implicitly understood, or communicated in other ways. Code comments, <a href="https://en.wikipedia.org/wiki/Docstring">docstrings</a>, or similar, are good options.
</p>
<p>
What may you infer from the <code>IUserRepository</code> interface? What should you <em>not</em> infer?
</p>
<p>
I'd expect the <code>Read</code> method to return a <code>User</code> object. This code example hails <a href="/2013/10/23/mocks-for-commands-stubs-for-queries">from 2013</a>, before C# had <a href="https://learn.microsoft.com/dotnet/csharp/nullable-references">nullable reference types</a>. Around that time I'd begun using <a href="/2018/03/26/the-maybe-functor">Maybe</a> to signal that the return value might be missing. This is a <em>convention</em>, so the reader needs to be aware of it in order to correctly infer that part of the contract. Since the <code>Read</code> method does <em>not</em> return <code>Maybe<User></code> I might infer that a non-null <code>User</code> object is guaranteed; that's a post-condition.
</p>
<p>
These days, I'd also use <a href="/2020/07/27/task-asynchronous-programming-as-an-io-surrogate">asynchronous APIs to hint that I/O is involved</a>, but again, the example is so old and simplified that this isn't the case here. Still, regardless of how this is communicated to the reader, if an interface (or base class) is intended for I/O, we may expect it to fail at times. In most languages, such errors manifest as exceptions.
</p>
<p>
At least two questions arise from such deliberations:
</p>
<ul>
<li>Which exception types may the methods throw?</li>
<li>Can you even handle such exceptions?</li>
</ul>
<p>
Should <code>SqlException</code> even be part of the contract? Isn't that an implementation detail?
</p>
<p>
The <code>FakeUserRepository</code> class neither uses SQL Server nor throws <code>SqlExceptions</code>. You may imagine other implementations that use a document database, or even just another relational database than SQL Server (Oracle, MySQL, PostgreSQL, etc.). Those wouldn't throw <code>SqlExceptions</code>, but perhaps other exception types.
</p>
<p>
According to the <a href="https://en.wikipedia.org/wiki/Dependency_inversion_principle">Dependency Inversion Principle</a>,
</p>
<blockquote>
<p>
"Abstractions should not depend upon details. Details should depend upon abstractions."
</p>
<footer><cite>Robert C. Martin, <a href="/ref/appp">Agile Principles, Patterns, and Practices in C#</a></cite></footer>
</blockquote>
<p>
If we make <code>SqlException</code> part of the contract, an implementation detail becomes part of the contract. Not only that: With an implementation like the above <code>GetUser</code> method, which catches <code>SqlException</code>, we've also violated the <a href="https://en.wikipedia.org/wiki/Liskov_substitution_principle">Liskov Substitution Principle</a>. If you injected another implementation, one that throws a different type of exception, the code would no longer work as intended.
</p>
<p>
Loosely coupled code shouldn't look like that.
</p>
<p>
Many specific exceptions are of <a href="/2024/01/29/error-categories-and-category-errors">a kind that you can't handle anyway</a>. On the other hand, if you do decide to handle particular error scenarios, make it part of the contract, or, as Michael Feathers puts it, <a href="https://youtu.be/AnZ0uTOerUI?si=1gJXYFoVlNTSbjEt">extend the domain</a>.
</p>
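<p>
As a hedged illustration of what making an error scenario part of the contract might mean, imagine that an attempt to create a user that already exists is a case the client should handle. Instead of leaking a technology-specific exception, the interface could say so explicitly. The <code>CreateResult</code> enum, the <code>IUserCreator</code> interface, and the <code>FakeUserCreator</code> class are all hypothetical; this is one possible design, not a prescription.
</p>
<p>
<pre>using System.Collections.Generic;

// The outcome is now part of the contract, independent of any database.
public enum CreateResult
{
    Created,
    AlreadyExists
}

// Hypothetical variation of the Create method shown above.
public interface IUserCreator
{
    CreateResult Create(int userId);
}

// A Fake can honour this contract without simulating any exceptions.
public sealed class FakeUserCreator : IUserCreator
{
    private readonly HashSet<int> ids = new HashSet<int>();

    public CreateResult Create(int userId)
    {
        // HashSet.Add returns false if the ID is already present.
        return ids.Add(userId)
            ? CreateResult.Created
            : CreateResult.AlreadyExists;
    }
}</pre>
</p>
<p>
With a contract like this, a SQL Server implementation would translate its own errors into <code>AlreadyExists</code>, and client code would never need to know about <code>SqlException</code>.
</p>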
<h3 id="ed86f41415724219a0afbf9d669ec1b7">
Integration testing <a href="#ed86f41415724219a0afbf9d669ec1b7">#</a>
</h3>
<p>
How should we unit test specific exceptions? <a href="https://en.wikipedia.org/wiki/Mu_(negative)">Mu</a>, we shouldn't.
</p>
<blockquote>
<p>
"Personally, I avoid using try-catch blocks in repositories or controllers and prefer handling exceptions in middleware (e.g., ErrorHandler). In such cases, I write separate unit tests for the middleware. Could this be a more fitting approach?"
</p>
<footer><cite><a href="/2023/08/14/replacing-mock-and-stub-with-a-fake#0afe67b375254fe193a3fd10234a1ce9">AmirB</a></cite></footer>
</blockquote>
<p>
That is, I think, an excellent approach to those exceptions that you've decided to not handle explicitly. Such middleware would typically log or otherwise notify operators that a problem has arisen. You could also write some general-purpose middleware that performs retries or implements the <a href="https://martinfowler.com/bliki/CircuitBreaker.html">Circuit Breaker</a> pattern, but reusable libraries that do that already exist. Consider using one.
</p>
<p>
Still, you may have decided to implement a particular feature by leveraging a capability of a particular piece of technology, and the code you intend to write is complicated enough, or important enough, that you'd like to have good test coverage. How do you do that?
</p>
<p>
I'd suggest an integration test.
</p>
<p>
I don't have a good example lying around that involves throwing specific exceptions, but something similar may be of service. The example code base that accompanies my book <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a> pretends to be an online restaurant reservation system. Two concurrent clients may compete for the last table on a particular date; a typical race condition.
</p>
<p>
There's more than one way to address such a concern. As implied in <a href="/2024/01/29/error-categories-and-category-errors">a previous article</a>, you may decide to rearchitect the entire application to be able to handle such edge cases in a robust manner. For the purposes of the book's example code base, however, I considered a <em>lock-free architecture</em> out of scope. Instead, I had in mind dealing with that issue by taking advantage of .NET and SQL Server's support for lightweight transactions via a <a href="https://learn.microsoft.com/dotnet/api/system.transactions.transactionscope">TransactionScope</a>. While this is a handy solution, it's utterly dependent on the technology stack. It's a good example of an implementation detail that I'd rather not expose to a unit test.
</p>
<p>
Instead, I wrote a <a href="/2021/01/25/self-hosted-integration-tests-in-aspnet">self-hosted integration test</a> that runs against a real SQL Server instance (automatically deployed and configured on demand). It tests <em>behaviour</em> rather than implementation details:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task <span style="font-weight:bold;color:#74531f;">NoOverbookingRace</span>()
{
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">start</span> = DateTimeOffset.UtcNow;
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">timeOut</span> = TimeSpan.FromSeconds(30);
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">i</span> = 0;
    <span style="font-weight:bold;color:#8f08c4;">while</span> (DateTimeOffset.UtcNow - start < timeOut)
        <span style="color:blue;">await</span> PostTwoConcurrentLiminalReservations(start.DateTime.AddDays(++i));
}
<span style="color:blue;">private</span> <span style="color:blue;">static</span> <span style="color:blue;">async</span> Task <span style="font-weight:bold;color:#74531f;">PostTwoConcurrentLiminalReservations</span>(DateTime <span style="font-weight:bold;color:#1f377f;">date</span>)
{
    date = date.Date.AddHours(18.5);
    <span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">service</span> = <span style="color:blue;">new</span> RestaurantService();
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">task1</span> = service.PostReservation(<span style="color:blue;">new</span> ReservationDtoBuilder()
        .WithDate(date)
        .WithQuantity(10)
        .Build());
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">task2</span> = service.PostReservation(<span style="color:blue;">new</span> ReservationDtoBuilder()
        .WithDate(date)
        .WithQuantity(10)
        .Build());
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = <span style="color:blue;">await</span> Task.WhenAll(task1, task2);
    Assert.Single(actual, <span style="font-weight:bold;color:#1f377f;">msg</span> => msg.IsSuccessStatusCode);
    Assert.Single(
        actual,
        <span style="font-weight:bold;color:#1f377f;">msg</span> => msg.StatusCode == HttpStatusCode.InternalServerError);
}</pre>
</p>
<p>
This test attempts to make two concurrent reservations for ten people. This is also the maximum capacity of the restaurant: It's impossible to seat twenty people. We'd like for one of the requests to win that race, while the server should reject the loser.
</p>
<p>
This test is only concerned with the behaviour that clients can observe, and since this code base contains hundreds of other tests that inspect HTTP response messages, this one only looks at the status codes.
</p>
<p>
The implementation handles the potential overbooking scenario like this:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">async</span> Task<ActionResult> <span style="font-weight:bold;color:#74531f;">TryCreate</span>(Restaurant <span style="font-weight:bold;color:#1f377f;">restaurant</span>, Reservation <span style="font-weight:bold;color:#1f377f;">reservation</span>)
{
    <span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">scope</span> = <span style="color:blue;">new</span> TransactionScope(TransactionScopeAsyncFlowOption.Enabled);
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">reservations</span> = <span style="color:blue;">await</span> Repository.ReadReservations(restaurant.Id, reservation.At);
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">now</span> = Clock.GetCurrentDateTime();
    <span style="font-weight:bold;color:#8f08c4;">if</span> (!restaurant.MaitreD.WillAccept(now, reservations, reservation))
        <span style="font-weight:bold;color:#8f08c4;">return</span> NoTables500InternalServerError();
    <span style="color:blue;">await</span> Repository.Create(restaurant.Id, reservation);
    scope.Complete();
    <span style="font-weight:bold;color:#8f08c4;">return</span> Reservation201Created(restaurant.Id, reservation);
}</pre>
</p>
<p>
Notice the <code>TransactionScope</code>.
</p>
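<p>
To see why it matters that reading, deciding, and writing happen as one atomic step, here's a minimal Haskell sketch of the same idea. It isn't the code base's implementation: <code>tryCreate</code>, <code>raceTwo</code>, and the <code>MVar</code>-backed store are assumptions made for illustration, with the <code>MVar</code> standing in for the database transaction.
</p>
<p>
<pre>import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
  (MVar, modifyMVar, newEmptyMVar, newMVar, putMVar, takeMVar)

-- Assumed capacity, matching the test scenario above.
capacity :: Int
capacity = 10

-- Read the existing quantities, decide, and write, all while holding the MVar.
tryCreate :: MVar [Int] -> Int -> IO Bool
tryCreate store quantity =
  modifyMVar store $ \existing ->
    if sum existing + quantity <= capacity
      then pure (quantity : existing, True)
      else pure (existing, False)

-- Two concurrent attempts to reserve ten seats each.
raceTwo :: IO [Bool]
raceTwo = do
  store <- newMVar []
  results <- mapM (const newEmptyMVar) [1 :: Int, 2]
  mapM_ (\r -> forkIO (tryCreate store 10 >>= putMVar r)) results
  mapM takeMVar results</pre>
</p>
<p>
Whichever thread gets to the store first wins; the other one sees the updated list and is rejected, so <code>raceTwo</code> always produces exactly one <code>True</code>.
</p>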
<p>
I'm under the illusion that I could radically change this implementation detail without breaking the above test. Granted, unlike <a href="/2023/09/04/decomposing-ctfiyhs-sample-code-base">another experiment</a>, this hypothesis isn't one I've put to the test.
</p>
<h3 id="9659f21863e74c288d5c2d36534eaa37">
Conclusion <a href="#9659f21863e74c288d5c2d36534eaa37">#</a>
</h3>
<p>
How does one automatically test error branches? Most unit testing frameworks come with APIs that make it easy to verify that specific exceptions were thrown, so that's not the hard part. If a particular exception is part of the System Under Test's contract, just test it like that.
</p>
<p>
On the other hand, when it comes to objects composed with other objects, implementation details may easily leak through in the shape of specific exception types. I'd think twice before writing a test that verifies whether a piece of client code (such as the above <code>SomeController</code>) handles a particular exception type (such as <code>SqlException</code>).
</p>
<p>
If such a test is difficult to write because all you have is a Fake Object (e.g. <code>FakeUserRepository</code>), that's only good. The rapid feedback furnished by test-driven development strikes again. Listen to your tests.
</p>
<p>
You should probably not write that test at all, because there seems to be an issue with the planned structure of the code. Address <em>that</em> problem instead.
</p>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.Extracting data from a small CSV file with Haskellhttps://blog.ploeh.dk/2024/02/19/extracting-data-from-a-small-csv-file-with-haskell2024-02-19T12:57:00+00:00Mark Seemann
<div id="post">
<p>
<em>Statically typed languages are also good for ad-hoc scripting.</em>
</p>
<p>
This article is part of a <a href="/2024/02/05/statically-and-dynamically-typed-scripts">short series of articles</a> that compares ad-hoc scripting in <a href="https://www.haskell.org/">Haskell</a> with solving the same problem in <a href="https://www.python.org/">Python</a>. The <a href="/2024/02/05/statically-and-dynamically-typed-scripts">introductory article</a> describes the problem to be solved, so here I'll jump straight into the Haskell code. In the next article I'll give a similar walkthrough of my Python script.
</p>
<h3 id="0a705367eb2f4080ac168eb1bbe9b2ec">
Getting started <a href="#0a705367eb2f4080ac168eb1bbe9b2ec">#</a>
</h3>
<p>
When working with Haskell for more than a few true one-off expressions that I can type into GHCi (the Haskell <a href="https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop">REPL</a>), I usually create a module file. Since I'd been asked to crunch some data, and I wasn't feeling very imaginative that day, I just named the module (and the file) <code>Crunch</code>. After some iterative exploration of the problem, I also arrived at a set of imports:
</p>
<p>
<pre><span style="color:blue;">module</span> Crunch <span style="color:blue;">where</span>
<span style="color:blue;">import</span> Data.List (<span style="color:#2b91af;">sort</span>)
<span style="color:blue;">import</span> <span style="color:blue;">qualified</span> Data.List.NonEmpty <span style="color:blue;">as</span> NE
<span style="color:blue;">import</span> Data.List.Split
<span style="color:blue;">import</span> Control.Applicative
<span style="color:blue;">import</span> Control.Monad
<span style="color:blue;">import</span> Data.Foldable</pre>
</p>
<p>
As we go along, you'll see where some of these fit in.
</p>
<p>
Reading the actual data file, however, can be done with just the Haskell <code>Prelude</code>:
</p>
<p>
<pre>inputLines = <span style="color:blue;">words</span> <$> <span style="color:blue;">readFile</span> <span style="color:#a31515;">"survey_data.csv"</span></pre>
</p>
<p>
Already now, it's possible to load the module in GHCi and start examining the data:
</p>
<p>
<pre>ghci> :l Crunch.hs
[1 of 1] Compiling Crunch ( Crunch.hs, interpreted )
Ok, one module loaded.
ghci> length <$> inputLines
38</pre>
</p>
<p>
Looks good, but reading a text file is hardly the difficult part. The first obstacle, surprisingly, is to split comma-separated values into individual parts. For some reason that I've never understood, the Haskell base library doesn't even include something as basic as <a href="https://learn.microsoft.com/dotnet/api/system.string.split">String.Split</a> from .NET. I could probably hack together a function that does that, but on the other hand, it's available in the <a href="https://hackage.haskell.org/package/split/docs/Data-List-Split.html">split</a> package; that explains the <code>Data.List.Split</code> import. It's just a bit of a bother that one has to pull in another package only to do that.
</p>
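<p>
For the record, a hand-rolled splitter doesn't have to be long. The following is only a sketch that uses nothing but the <code>Prelude</code>; the name <code>splitOnChar</code> is made up for the occasion, and it only handles single-character separators, unlike <code>splitOn</code> from the split package.
</p>
<p>
<pre>-- Hypothetical base-only stand-in for splitOn with a one-character separator.
splitOnChar :: Char -> String -> [String]
splitOnChar sep s =
  case break (== sep) s of
    (chunk, [])       -> [chunk]                       -- no separator left
    (chunk, _ : rest) -> chunk : splitOnChar sep rest  -- drop separator, recurse</pre>
</p>
<p>
With this definition, <code>splitOnChar ',' "a,b,,c"</code> evaluates to <code>["a","b","","c"]</code>. It knows nothing about quoted fields, so it's no substitute for a real CSV parser.
</p>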
<h3 id="a70030690c1645a2b0923ad354fa665b">
Grades <a href="#a70030690c1645a2b0923ad354fa665b">#</a>
</h3>
<p>
Extracting all the grades is now relatively easy. This function extracts and parses a grade from a single line:
</p>
<p>
<pre><span style="color:#2b91af;">grade</span> <span style="color:blue;">::</span> <span style="color:blue;">Read</span> a <span style="color:blue;">=></span> <span style="color:#2b91af;">String</span> <span style="color:blue;">-></span> a
grade line = <span style="color:blue;">read</span> $ splitOn <span style="color:#a31515;">","</span> line !! 2</pre>
</p>
<p>
It splits the line on commas, picks the third element (zero-indexed, of course, so element <code>2</code>), and finally parses it.
</p>
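<p>
Both <code>!!</code> and <code>read</code> are partial functions, which is fine for a one-off script over a file you've already inspected. If the input were less trustworthy, a total variant could look like the following sketch. The names <code>fields</code> and <code>gradeMaybe</code> are hypothetical, and so is the assumption that the grade is an <code>Int</code> in the third column.
</p>
<p>
<pre>import Text.Read (readMaybe)

-- Hypothetical comma splitter, so that the sketch depends only on base.
fields :: String -> [String]
fields s =
  case break (== ',') s of
    (chunk, [])       -> [chunk]
    (chunk, _ : rest) -> chunk : fields rest

-- Nothing if the third field is missing or doesn't parse as an Int.
gradeMaybe :: String -> Maybe Int
gradeMaybe line =
  case drop 2 (fields line) of
    field : _ -> readMaybe field
    []        -> Nothing</pre>
</p>
<p>
For a made-up line such as <code>"x,y,7,3"</code> this returns <code>Just 7</code>, while <code>"x,y"</code> and <code>"x,y,seven"</code> both produce <code>Nothing</code>.
</p>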
<p>
One may experiment with it in GHCi to get an impression that it works:
</p>
<p>
<pre>ghci> fmap grade <$> inputLines :: IO [Int]
[2,2,12,10,4,12,2,7,2,2,2,7,2,7,2,4,2,7,4,7,0,4,0,7,2,2,2,2,2,2,4,4,2,7,4,0,7,2]</pre>
</p>
<p>
This lists all 38 expected grades found in the data file.
</p>
<p>
In the <a href="/2024/02/05/statically-and-dynamically-typed-scripts">introduction article</a> I spent some time explaining how languages with strong type inference don't need type declarations. This makes iterative development easier, because you can fiddle with an expression until it does what you'd like it to do. When you change an expression, often the inferred type changes as well, but there's no programmer overhead involved with that. The compiler figures that out for you.
</p>
<p>
Even so, the above <code>grade</code> function does have a type annotation. How does that gel with what I just wrote?
</p>
<p>
It doesn't, on the surface, but when I was fiddling with the code, there was no type annotation. The Haskell compiler is perfectly happy to infer the type of an expression like
</p>
<p>
<pre>grade line = <span style="color:blue;">read</span> $ splitOn <span style="color:#a31515;">","</span> line !! 2</pre>
</p>
<p>
The human reader, however, is not so clever (I'm not, at least), so once a particular expression settles, and I'm fairly sure that it's not going to change further, I sometimes add the type annotation to aid myself.
</p>
<p>
When writing this, I was debating the didactics of showing the function <em>with</em> the type annotation, against showing it without it. Eventually I decided to include it, because it's more understandable that way. That decision, however, prompted this explanation.
</p>
<h3 id="754ea5fced264c439f8784705176851b">
Binomial choice <a href="#754ea5fced264c439f8784705176851b">#</a>
</h3>
<p>
The next thing I needed to do was to list all pairs from the data file. Usually, <a href="/2022/01/17/enumerate-wordle-combinations-with-an-applicative-functor">when I run into a problem related to combinations, I reach for applicative composition</a>. For example, to list all possible combinations of the first three primes, I might do this:
</p>
<p>
<pre>ghci> liftA2 (,) [2,3,5] [2,3,5]
[(2,2),(2,3),(2,5),(3,2),(3,3),(3,5),(5,2),(5,3),(5,5)]</pre>
</p>
<p>
You may now protest that this is sampling with replacement, whereas the task is to pick two <em>different</em> rows from the data file. Usually, when I run into that requirement, I just remove the ones that pick the same value twice:
</p>
<p>
<pre>ghci> filter (uncurry (/=)) $ liftA2 (,) [2,3,5] [2,3,5]
[(2,3),(2,5),(3,2),(3,5),(5,2),(5,3)]</pre>
</p>
<p>
That works great as long as the values are unique, but what if that's not the case?
</p>
<p>
<pre>ghci> liftA2 (,) "foo" "foo"
[('f','f'),('f','o'),('f','o'),('o','f'),('o','o'),('o','o'),('o','f'),('o','o'),('o','o')]
ghci> filter (uncurry (/=)) $ liftA2 (,) "foo" "foo"
[('f','o'),('f','o'),('o','f'),('o','f')]</pre>
</p>
<p>
This removes too many values! We don't want the combinations where the first <code>o</code> is paired with itself, or when the second <code>o</code> is paired with itself, but we <em>do</em> want the combination where the first <code>o</code> is paired with the second, and vice versa.
</p>
<p>
This is relevant because the data set turns out to contain identical rows. Thus, I needed something that would deal with that issue.
</p>
<p>
Now, bear with me, because it's quite possible that what I did do isn't the simplest solution to the problem. On the other hand, I'm reporting what I did, and how I used Haskell to solve a one-off problem. If you have a simpler solution, please <a href="https://github.com/ploeh/ploeh.github.com?tab=readme-ov-file#comments">leave a comment</a>.
</p>
<p>
You often reach for the tool that you already know, so I used a variation of the above. Instead of combining values, I decided to combine row indices instead. This meant that I needed a function that would produce the indices for a particular list:
</p>
<p>
<pre><span style="color:#2b91af;">indices</span> <span style="color:blue;">::</span> <span style="color:blue;">Foldable</span> t <span style="color:blue;">=></span> t a <span style="color:blue;">-></span> [<span style="color:#2b91af;">Int</span>]
indices f = [0 .. <span style="color:blue;">length</span> f - 1]</pre>
</p>
<p>
Again, the type annotation came later. This just produces sequential numbers, starting from zero:
</p>
<p>
<pre>ghci> indices <$> inputLines
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,
21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37]</pre>
</p>
<p>
Such a function hovers just around the <a href="https://wiki.haskell.org/Fairbairn_threshold">Fairbairn threshold</a>; some experienced Haskellers would probably just inline it.
</p>
<p>
Since row numbers (indices) are unique, the above approach to binomial choice works, so I also added a function for that:
</p>
<p>
<pre><span style="color:#2b91af;">choices</span> <span style="color:blue;">::</span> <span style="color:blue;">Eq</span> a <span style="color:blue;">=></span> [a] <span style="color:blue;">-></span> [(a, a)]
choices = <span style="color:blue;">filter</span> (<span style="color:blue;">uncurry</span> <span style="color:#2b91af;">(/=)</span>) . join (liftA2 <span style="color:#2b91af;">(,)</span>)</pre>
</p>
<p>
Combined with <code>indices</code> I can now enumerate all combinations of two rows in the data set:
</p>
<p>
<pre>ghci> choices . indices <$> inputLines
[(0,1),(0,2),(0,3),(0,4),(0,5),(0,6),(0,7),(0,8),(0,9),...</pre>
</p>
<p>
I'm only showing the first nine results here, because in reality, there are <em>1406</em> such pairs.
</p>
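<p>
To check that going via indices really does solve the problem with duplicate values, one can map the index pairs back to the elements. This sketch repeats <code>indices</code> and <code>choices</code> so that it stands alone; <code>pairsByIndex</code> is a hypothetical helper that isn't part of the <code>Crunch</code> module.
</p>
<p>
<pre>import Control.Applicative (liftA2)
import Control.Monad (join)

-- Same definitions as in the article.
indices :: Foldable t => t a -> [Int]
indices f = [0 .. length f - 1]

choices :: Eq a => [a] -> [(a, a)]
choices = filter (uncurry (/=)) . join (liftA2 (,))

-- Hypothetical helper: pair up elements by position rather than by value.
pairsByIndex :: [a] -> [(a, a)]
pairsByIndex xs = [(xs !! i, xs !! j) | (i, j) <- choices (indices xs)]</pre>
</p>
<p>
Now <code>pairsByIndex "foo"</code> evaluates to <code>[('f','o'),('f','o'),('o','f'),('o','o'),('o','f'),('o','o')]</code>: six pairs, including the two where one <code>o</code> is paired with the other. For 38 rows the same reasoning gives 38 × 37 = 1406 pairs.
</p>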
<p>
Perhaps you think that all of this seems quite elaborate, but so far it's only four lines of code. The reason it looks like more is because I've gone to some lengths to explain what the code does.
</p>
<h3 id="bf2d1d52fb3c4a29b6987840b2d46530">
Sum of grades <a href="#bf2d1d52fb3c4a29b6987840b2d46530">#</a>
</h3>
<p>
The above combinations are pairs of <em>indices</em>, not values. What I need is to use each index to look up the row, from the row get the grade, and then sum the two grades. The first parts of that I can accomplish with the <code>grade</code> function, but I need to do it for every row, and for both elements of each pair.
</p>
<p>
While tuples are <code>Functor</code> instances, they only map over the second element, and that's not what I need:
</p>
<p>
<pre>ghci> rows = ["foo", "bar", "baz"]
ghci> fmap (rows!!) <$> [(0,1),(0,2)]
[(0,"bar"),(0,"baz")]</pre>
</p>
<p>
While this is just a simple example that maps over the two pairs <code>(0,1)</code> and <code>(0,2)</code>, it illustrates the problem: It only finds the row for each tuple's second element, but I need it for both.
</p>
<p>
On the other hand, a type like <code>(a, a)</code> gives rise to a <a href="/2018/03/22/functors">functor</a>, and while a wrapper type like that is not readily available in the <em>base</em> library, defining one is a one-liner:
</p>
<p>
<pre><span style="color:blue;">newtype</span> Pair a = Pair { unPair :: (a, a) } <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Eq</span>, <span style="color:#2b91af;">Show</span>, <span style="color:#2b91af;">Functor</span>)</pre>
</p>
<p>
This enables me to map over pairs in one go:
</p>
<p>
<pre>ghci> unPair <$> fmap (rows!!) <$> Pair <$> [(0,1),(0,2)]
[("foo","bar"),("foo","baz")]</pre>
</p>
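<p>
Deriving <code>Functor</code> relies on the <code>DeriveFunctor</code> language extension, which may need to be switched on explicitly depending on compiler version and settings. If you'd rather not depend on it, the instance is easy to write by hand. The following sketch shows what the derived instance amounts to:
</p>
<p>
<pre>newtype Pair a = Pair { unPair :: (a, a) } deriving (Eq, Show)

-- Apply the function to both components of the pair.
instance Functor Pair where
  fmap f (Pair (x, y)) = Pair (f x, f y)</pre>
</p>
<p>
With either definition, <code>unPair (length <$> Pair ("foo", "quux"))</code> evaluates to <code>(3,4)</code>.
</p>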
<p>
This makes things a little easier. What remains is to use the <code>grade</code> function to look up the grade value for each row, then add the two numbers together, and finally count how many occurrences there are of each:
</p>
<p>
<pre>sumGrades ls =
liftA2 <span style="color:#2b91af;">(,)</span> NE.<span style="color:blue;">head</span> <span style="color:blue;">length</span> <$> NE.group
(sort (<span style="color:blue;">uncurry</span> <span style="color:#2b91af;">(+)</span> . unPair . <span style="color:blue;">fmap</span> (grade . (ls !!)) . Pair <$>
choices (indices ls)))</pre>
</p>
<p>
You'll notice that this function doesn't have a type annotation, but we can ask GHCi if we're curious:
</p>
<p>
<pre>ghci> :t sumGrades
sumGrades :: (Ord a, Num a, Read a) => [String] -> [(a, Int)]</pre>
</p>
<p>
This enabled me to get a count of each sum of grades:
</p>
<p>
<pre>ghci> sumGrades <$> inputLines
[(0,6),(2,102),(4,314),(6,238),(7,48),(8,42),(9,272),(10,6),
(11,112),(12,46),(14,138),(16,28),(17,16),(19,32),(22,4),(24,2)]</pre>
</p>
<p>
The way to read this is that the sum <em>0</em> occurs six times, <em>2</em> appears <em>102</em> times, etc.
</p>
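<p>
The counting step may look cryptic if you haven't seen functions used as an <code>Applicative</code> before. Here, <code>liftA2 (,) NE.head length</code> is equivalent to <code>\g -> (NE.head g, length g)</code>: it pairs the representative of each group with the size of that group. Isolated as a sketch, under the hypothetical name <code>histogram</code>, it looks like this:
</p>
<p>
<pre>import Control.Applicative (liftA2)
import Data.List (sort)
import qualified Data.List.NonEmpty as NE

-- Hypothetical name for the counting step of sumGrades:
-- sort, group equal neighbours, then report (value, group size).
histogram :: Ord a => [a] -> [(a, Int)]
histogram xs = liftA2 (,) NE.head length <$> NE.group (sort xs)</pre>
</p>
<p>
For example, <code>histogram [4,9,4,9,9,9]</code> evaluates to <code>[(4,2),(9,4)]</code>. Sorting first is essential, since <code>NE.group</code> only groups adjacent equal elements.
</p>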
<p>
There's one remaining task to accomplish before we can produce a PMF of the sum of grades: We need to enumerate the range, because, as it turns out, there are sums that are possible, but that don't appear in the data set. Can you spot which ones?
</p>
<p>
Using tools already covered, it's easy to enumerate all possible sums:
</p>
<p>
<pre>ghci> import Data.List
ghci> sort $ nub $ (uncurry (+)) <$> join (liftA2 (,)) [-3,0,2,4,7,10,12]
[-6,-3,-1,0,1,2,4,6,7,8,9,10,11,12,14,16,17,19,20,22,24]</pre>
</p>
<p>
The sums <em>-6</em>, <em>-3</em>, <em>-1</em>, and more, are possible, but don't appear in the data set. Thus, in the PMF for two randomly picked grades, the probability that the sum is <em>-6</em> is <em>0</em>. On the other hand, the probability that the sum is <em>0</em> is <em>6/1406 ~ 0.004267</em>, and so on.
</p>
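<p>
Turning the counts into a PMF, with zeroes for the sums that never occur, can also be done in a few lines. This is only a sketch: <code>pmf</code> is a hypothetical helper, and the two lists are copied from the GHCi output above.
</p>
<p>
<pre>import Data.Maybe (fromMaybe)
import Data.Ratio ((%))

-- Counts copied from the sumGrades output above.
sumCounts :: [(Int, Int)]
sumCounts =
  [(0,6),(2,102),(4,314),(6,238),(7,48),(8,42),(9,272),(10,6),
   (11,112),(12,46),(14,138),(16,28),(17,16),(19,32),(22,4),(24,2)]

-- All possible sums, copied from the enumeration above.
possibleSums :: [Int]
possibleSums = [-6,-3,-1,0,1,2,4,6,7,8,9,10,11,12,14,16,17,19,20,22,24]

-- Hypothetical helper: probability of each outcome, zero if never observed.
-- Assumes that counts isn't empty; otherwise the total is zero.
pmf :: [Int] -> [(Int, Int)] -> [(Int, Rational)]
pmf outcomes counts =
  [ (o, fromIntegral (fromMaybe 0 (lookup o counts)) % total) | o <- outcomes ]
  where
    total = fromIntegral (sum (snd <$> counts))</pre>
</p>
<p>
Using <code>Rational</code> keeps the probabilities exact, so that <code>sum (snd <$> pmf possibleSums sumCounts)</code> is exactly <code>1</code>, and the entry for <em>0</em> is <code>6 % 1406</code> in reduced form.
</p>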
<h3 id="ddcac27fb3ff468cb9312f0fcc333865">
Difference of experience levels <a href="#ddcac27fb3ff468cb9312f0fcc333865">#</a>
</h3>
<p>
The other question posed in the assignment was to produce the PMF for the absolute difference between two randomly selected students' experience levels.
</p>
<p>
Answering that question follows the same mould as above. First, extract experience level from each data row, instead of the grade:
</p>
<p>
<pre><span style="color:#2b91af;">experience</span> <span style="color:blue;">::</span> <span style="color:blue;">Read</span> a <span style="color:blue;">=></span> <span style="color:#2b91af;">String</span> <span style="color:blue;">-></span> a
experience line = <span style="color:blue;">read</span> $ splitOn <span style="color:#a31515;">","</span> line !! 3</pre>
</p>
<p>
Since I was doing an ad-hoc script, I just copied the <code>grade</code> function and changed the index from <code>2</code> to <code>3</code>. Enumerating the experience differences was also a close copy of <code>sumGrades</code>:
</p>
<p>
<pre>diffExp ls =
liftA2 <span style="color:#2b91af;">(,)</span> NE.<span style="color:blue;">head</span> <span style="color:blue;">length</span> <$> NE.group
(sort (<span style="color:blue;">abs</span> . <span style="color:blue;">uncurry</span> <span style="color:#2b91af;">(-)</span> . unPair . <span style="color:blue;">fmap</span> (experience . (ls !!)) . Pair <$>
choices (indices ls)))</pre>
</p>
<p>
Running it in the REPL produces some other numbers, to be interpreted the same way as above:
</p>
<p>
<pre>ghci> diffExp <$> inputLines
[(0,246),(1,472),(2,352),(3,224),(4,82),(5,24),(6,6)]</pre>
</p>
<p>
This means that the difference <em>0</em> occurs <em>246</em> times, <em>1</em> appears <em>472</em> times, and so on. From those numbers, it's fairly simple to set up the PMF.
</p>
<h3 id="948660097f7748f2844ebfd91371b2a2">
Figures <a href="#948660097f7748f2844ebfd91371b2a2">#</a>
</h3>
<p>
Another part of the assignment was to produce plots of both PMFs. I don't know how to produce figures with Haskell, and since the final results are just a handful of numbers each, I just copied them into a text editor to align them, and then pasted them into Excel to produce the figures there.
</p>
<p>
Here's the PMF for the differences:
</p>
<p>
<img src="/content/binary/difference-pmf-plot.png" alt="Bar chart of the differences PMF.">
</p>
<p>
I originally created the figure with Danish labels. I'm sure that you can guess what <em>differens</em> means, and <em>sandsynlighed</em> means <em>probability</em>.
</p>
<h3 id="9b225242b63e4616b25954ad9141e273">
Conclusion <a href="#9b225242b63e4616b25954ad9141e273">#</a>
</h3>
<p>
In this article you've seen the artefacts of an ad-hoc script to extract and analyze a small data set. While I've spent quite a few words to explain what's going on, the entire <code>Crunch</code> module is only 34 lines of code. Add to that a few ephemeral queries done directly in GHCi, but never saved to a file. It's been some months since I wrote the code, but as far as I recall, it took me a few hours all in all.
</p>
<p>
If you do stuff like this every day, you probably find that appalling, but data crunching isn't really my main thing.
</p>
<p>
Is it quicker to do it in Python? Not for me, it turns out. It also took me a couple of hours to repeat the exercise in Python.
</p>
<p>
<strong>Next:</strong> <a href="/2024/03/18/extracting-data-from-a-small-csv-file-with-python">Extracting data from a small CSV file with Python</a>.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.Range as a functorhttps://blog.ploeh.dk/2024/02/12/range-as-a-functor2024-02-12T06:59:00+00:00Mark Seemann
<div id="post">
<p>
<em>With examples in C#, F#, and Haskell.</em>
</p>
<p>
This article is an instalment in <a href="/2024/01/01/variations-of-the-range-kata">a short series of articles on the Range kata</a>. In the previous three articles you've seen <a href="https://codingdojo.org/kata/Range/">the Range kata</a> implemented <a href="/2024/01/08/a-range-kata-implementation-in-haskell">in Haskell</a>, <a href="/2024/01/15/a-range-kata-implementation-in-f">in F#</a>, and <a href="/2024/01/22/a-range-kata-implementation-in-c">in C#</a>.
</p>
<p>
The reason I engaged with this kata was that I find that it provides a credible example of how a pair of <a href="/2018/03/22/functors">functors</a> itself forms a functor. In this article, you'll see how that works out in three languages. If you don't care about one or two of those languages, just skip that section.
</p>
<h3 id="f8b28f239ca6444c8f32c768e725fad7">
Haskell perspective <a href="#f8b28f239ca6444c8f32c768e725fad7">#</a>
</h3>
<p>
If you've done any <a href="https://www.haskell.org/">Haskell</a> programming, you may be thinking that I have in mind the default <code>Functor</code> instances for tuples. As part of the <a href="https://hackage.haskell.org/package/base">base</a> library, tuples (pairs, triples, quadruples, etc.) are already <code>Functor</code> instances. Specifically for pairs, we have this instance:
</p>
<p>
<pre>instance Functor ((,) a)</pre>
</p>
<p>
Those are not the functor instances I have in mind. To a degree, I find these default <code>Functor</code> instances unfortunate, or at least arbitrary. Let's briefly explore the above instance to see why that is.
</p>
<p>
Haskell is a notoriously terse language, but if we expand the above instance to (invalid) pseudocode, it says something like this:
</p>
<p>
<pre>instance Functor ((a,b) b)</pre>
</p>
<p>
What I'm trying to get across here is that the <code>a</code> type argument is fixed, and only the second type argument <code>b</code> can be mapped. Thus, you can map a <code>(Bool, String)</code> pair to a <code>(Bool, Int)</code> pair:
</p>
<p>
<pre>ghci> fmap length (True, "foo")
(True,3)</pre>
</p>
<p>
but the first element (<code>Bool</code>, in this example) is fixed, and you can't map that. To be clear, the first element can be any type, but once you've fixed it, you can't change it (within the constraints of the <code>Functor</code> API, mind):
</p>
<p>
<pre>ghci> fmap (replicate 3) (42, 'f')
(42,"fff")
ghci> fmap ($ 3) ("bar", (* 2))
("bar",6)</pre>
</p>
<p>
The reason I find these default instances arbitrary is that this isn't the only possible <code>Functor</code> instance. Pairs, in particular, are also <a href="https://hackage.haskell.org/package/base/docs/Data-Bifunctor.html">Bifunctor</a> instances, so you can easily map over the first element, instead of the second:
</p>
<p>
<pre>ghci> first show (42, 'f')
("42",'f')</pre>
</p>
<p>
Similarly, one can easily imagine a <code>Functor</code> instance for triples (three-tuples) that maps the middle element. The default instance, however, maps the third (i.e. last) element only.
</p>
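<p>
Such an alternative instance isn't hard to write, although it requires a wrapper, since the type parameter being mapped has to come last. A sketch, with the hypothetical name <code>Middle</code>:
</p>
<p>
<pre>-- Hypothetical wrapper; the mapped type parameter b is listed last,
-- even though it describes the middle element of the triple.
newtype Middle a c b = Middle { unMiddle :: (a, b, c) } deriving (Eq, Show)

instance Functor (Middle a c) where
  fmap f (Middle (x, y, z)) = Middle (x, f y, z)</pre>
</p>
<p>
With that in place, <code>unMiddle (length <$> Middle (True, "foo", 'x'))</code> evaluates to <code>(True,3,'x')</code>.
</p>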
<p>
There are some hand-wavy rationalizations out there that argue that in Haskell, application and reduction is usually done from the right, so therefore it's most appropriate to map over the rightmost element of tuples. I admit that it at least argues from a position of consistency, and it does make it easier to remember, but from a didactic perspective I still find it a bit unfortunate. It suggests that a tuple functor only maps the last element.
</p>
<p>
What I had in mind for <em>ranges</em> however, wasn't to map only the first or the last element. Neither did I wish to treat ranges as <a href="/2018/12/24/bifunctors">bifunctors</a>. What I really wanted was the ability to project an entire range.
</p>
<p>
In my Haskell Range implementation, I'd simply treated ranges as tuples of <code>Endpoint</code> values, and although I didn't show that in the article, I ultimately declared <code>Endpoint</code> as a <code>Functor</code> instance:
</p>
<p>
<pre><span style="color:blue;">data</span> Endpoint a = Open a | Closed a <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Eq</span>, <span style="color:#2b91af;">Show</span>, <span style="color:#2b91af;">Functor</span>)</pre>
</p>
<p>
This enables you to map a single <code>Endpoint</code> value:
</p>
<p>
<pre>ghci> fmap length $ Closed "foo"
Closed 3</pre>
</p>
<p>
That's just a single value, but the Range kata API operates with pairs of <code>Endpoint</code> values. For example, the <code>contains</code> function has this type:
</p>
<p>
<pre><span style="color:#2b91af;">contains</span> <span style="color:blue;">::</span> (<span style="color:blue;">Foldable</span> t, <span style="color:blue;">Ord</span> a) <span style="color:blue;">=></span> (<span style="color:blue;">Endpoint</span> a, <span style="color:blue;">Endpoint</span> a) <span style="color:blue;">-></span> t a <span style="color:blue;">-></span> <span style="color:#2b91af;">Bool</span></pre>
</p>
<p>
Notice the <code>(Endpoint a, Endpoint a)</code> input type.
</p>
<p>
Is it possible to treat such a pair as a functor? Yes, indeed, just import <a href="https://hackage.haskell.org/package/base/docs/Data-Functor-Product.html">Data.Functor.Product</a>, which enables you to package two functor values in a single wrapper:
</p>
<p>
<pre>ghci> import Data.Functor.Product
ghci> Pair (Closed "foo") (Open "corge")
Pair (Closed "foo") (Open "corge")</pre>
</p>
<p>
Now, granted, the <code>Pair</code> data constructor doesn't wrap a <em>tuple</em>, but that's easily fixed:
</p>
<p>
<pre>ghci> uncurry Pair (Closed "foo", Open "corge")
Pair (Closed "foo") (Open "corge")</pre>
</p>
<p>
The resulting <code>Pair</code> value is a <code>Functor</code> instance, which means that you can project it:
</p>
<p>
<pre>ghci> fmap length $ uncurry Pair (Closed "foo", Open "corge")
Pair (Closed 3) (Open 5)</pre>
</p>
<p>
Now, granted, I find the <code>Data.Functor.Product</code> API a bit lacking in convenience. For instance, there's no <code>getPair</code> function to retrieve the underlying values; you'd have to use pattern matching for that.
</p>
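<p>
If you need such a function, it's a one-liner to add. The following is a sketch; <code>getPair</code> is a hypothetical helper, not something exported by <code>Data.Functor.Product</code>:
</p>
<p>
<pre>import Data.Functor.Product (Product (Pair))

-- Hypothetical helper; unwraps the two functor values via pattern matching.
getPair :: Product f g a -> (f a, g a)
getPair (Pair x y) = (x, y)</pre>
</p>
<p>
It works for any two functors, not only <code>Endpoint</code>. For example, <code>getPair (length <$> Pair (Just "foo") ["corge"])</code> evaluates to <code>(Just 3,[5])</code>.
</p>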
<p>
In any case, my motivation for covering this ground wasn't to argue that <code>Data.Functor.Product</code> is all we need. The point was rather to observe that when you have two functors, you can combine them, and the combination is also a functor.
</p>
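<p>
To make that observation concrete, here's a sketch of how such a combination can be defined from scratch. The type is named <code>Both</code> to avoid confusion with the library's <code>Product</code>; both that name and <code>runBoth</code> are made up for this example.
</p>
<p>
<pre>-- Hypothetical re-implementation of the idea behind Data.Functor.Product.
data Both f g a = Both (f a) (g a)

-- If f and g are functors, then so is the combination:
-- just map each part with its own fmap.
instance (Functor f, Functor g) => Functor (Both f g) where
  fmap h (Both x y) = Both (fmap h x) (fmap h y)

-- Hypothetical helper to get the two parts back out.
runBoth :: Both f g a -> (f a, g a)
runBoth (Both x y) = (x, y)</pre>
</p>
<p>
The instance only relies on <code>f</code> and <code>g</code> being functors, so <code>runBoth (length <$> Both (Just "foo") ["corge", "ab"])</code> evaluates to <code>(Just 3,[5,2])</code>.
</p>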
<p>
This is one of the many reasons I get so much value out of Haskell. Its abstraction level is so high that it substantiates relationships that may also exist in other code bases, written in other programming languages. Even if a language like <a href="https://fsharp.org/">F#</a> or C# can't formally express some of those abstractions, you can still make use of them as 'design patterns' (for lack of a better term).
</p>
<h3 id="ba94968ed2bc4780b995639212f8371b">
F# functor <a href="#ba94968ed2bc4780b995639212f8371b">#</a>
</h3>
<p>
What we've learned from Haskell is that if we have two functors we can combine them into one. Specifically, I made <code>Endpoint</code> a <code>Functor</code> instance, and from that followed automatically that a <code>Pair</code> of those was also a <code>Functor</code> instance.
</p>
<p>
I can do the same in F#, starting with <code>Endpoint</code>. In F# I've unsurprisingly defined the type like this:
</p>
<p>
<pre><span style="color:blue;">type</span> Endpoint<'a> = Open <span style="color:blue;">of</span> 'a | Closed <span style="color:blue;">of</span> 'a</pre>
</p>
<p>
That's just a standard <a href="https://en.wikipedia.org/wiki/Tagged_union">discriminated union</a>. In order to make it a functor, you'll have to add a <code>map</code> function:
</p>
<p>
<pre><span style="color:blue;">module</span> Endpoint =
<span style="color:blue;">let</span> map f = <span style="color:blue;">function</span>
| Open x <span style="color:blue;">-></span> Open (f x)
| Closed x <span style="color:blue;">-></span> Closed (f x)</pre>
</p>
<p>
The function alone, however, isn't enough to give rise to a functor. We must also convince ourselves that the <code>map</code> function obeys the functor laws. One way to do that is to write tests. While tests aren't <em>proofs</em>, we may still be sufficiently reassured by the tests that that's good enough for us. While I could, I'm not going to <em>prove</em> that <code>Endpoint.map</code> satisfies the functor laws. I will, later, do just that with the pair, but I'll leave this one as an exercise for the interested reader.
</p>
<p>
Since I was already using <a href="https://hedgehog.qa/">Hedgehog</a> for property-based testing in my F# code, it was obvious to write properties for the functor laws as well.
</p>
<p>
<pre>[<Fact>]
<span style="color:blue;">let</span> ``First functor law`` () = Property.check <| property {
<span style="color:blue;">let</span> genInt32 = Gen.int32 (Range.linearBounded ())
<span style="color:blue;">let!</span> expected = Gen.choice [Gen.map Open genInt32; Gen.map Closed genInt32]
<span style="color:blue;">let</span> actual = Endpoint.map id expected
expected =! actual }</pre>
</p>
<p>
This property exercises the first functor law for integer endpoints. Recall that this law states that if you map a value with the <a href="https://en.wikipedia.org/wiki/Identity_function">identity function</a>, nothing really happens.
</p>
<p>
The second functor law is more interesting.
</p>
<p>
<pre>[<Fact>]
<span style="color:blue;">let</span> ``Second functor law`` () = Property.check <| property {
<span style="color:blue;">let</span> genInt32 = Gen.int32 (Range.linearBounded ())
<span style="color:blue;">let!</span> endpoint = Gen.choice [Gen.map Open genInt32; Gen.map Closed genInt32]
<span style="color:blue;">let!</span> f = Gen.item [id; ((+) 1); ((*) 2)]
<span style="color:blue;">let!</span> g = Gen.item [id; ((+) 1); ((*) 2)]
<span style="color:blue;">let</span> actual = Endpoint.map (f << g) endpoint
Endpoint.map f (Endpoint.map g endpoint) =! actual }</pre>
</p>
<p>
This property again exercises the property for integer endpoints. Not only does the property pick a random integer and vary whether the <code>Endpoint</code> is <code>Open</code> or <code>Closed</code>, it also picks two random functions from a small list of functions: The identity function (again), a function that increments by one, and a function that doubles the input. These two functions, <code>f</code> and <code>g</code>, might then be the same, but might also be different from each other. Thus, the composition <code>f << g</code> <em>might</em> be <code>id << id</code> or <code>((+) 1) << ((+) 1)</code>, but might just as well be <code>((+) 1) << ((*) 2)</code>, or one of the other possible combinations.
</p>
<p>
The law states that the result should be the same regardless of whether you first compose the functions and then map them, or map them one after the other.
</p>
<p>
Which is the case.
</p>
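<p>
For comparison, the same two checks can be sketched in Haskell. To keep the example free of dependencies, it checks the laws over a fixed handful of sample values instead of generated ones, so it's weaker than the Hedgehog properties. The names <code>samples</code>, <code>functions</code>, <code>firstLawHolds</code>, and <code>secondLawHolds</code> are invented for the example.
</p>
<p>
<pre>data Endpoint a = Open a | Closed a deriving (Eq, Show)

instance Functor Endpoint where
  fmap f (Open x)   = Open (f x)
  fmap f (Closed x) = Closed (f x)

-- A few open and closed endpoints to check against.
samples :: [Endpoint Int]
samples = [c x | c <- [Open, Closed], x <- [-2, 0, 1, 42]]

-- The same three functions as in the F# properties.
functions :: [Int -> Int]
functions = [id, (+ 1), (* 2)]

-- First law: mapping with id changes nothing.
firstLawHolds :: Bool
firstLawHolds = all (\e -> fmap id e == e) samples

-- Second law: mapping a composition equals composing the maps.
secondLawHolds :: Bool
secondLawHolds =
  and [ fmap (f . g) e == fmap f (fmap g e)
      | f <- functions, g <- functions, e <- samples ]</pre>
</p>
<p>
Both <code>firstLawHolds</code> and <code>secondLawHolds</code> evaluate to <code>True</code>. As with the F# tests, that's reassurance rather than proof.
</p>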
<p>
A <code>Range</code> is defined like this:
</p>
<p>
<pre><span style="color:blue;">type</span> Range<'a> = { LowerBound : Endpoint<'a>; UpperBound : Endpoint<'a> }</pre>
</p>
<p>
This record type also gives rise to a functor:
</p>
<p>
<pre><span style="color:blue;">module</span> Range =
<span style="color:blue;">let</span> map f { LowerBound = lowerBound; UpperBound = upperBound } =
{ LowerBound = Endpoint.map f lowerBound
UpperBound = Endpoint.map f upperBound }</pre>
</p>
<p>
This <code>map</code> function uses the projection <code>f</code> on both the <code>lowerBound</code> and the <code>upperBound</code>. It, too, obeys the functor laws:
</p>
<p>
<pre>[<Fact>]
<span style="color:blue;">let</span> ``First functor law`` () = Property.check <| property {
<span style="color:blue;">let</span> genInt64 = Gen.int64 (Range.linearBounded ())
<span style="color:blue;">let</span> genEndpoint = Gen.choice [Gen.map Open genInt64; Gen.map Closed genInt64]
<span style="color:blue;">let!</span> expected = Gen.tuple genEndpoint |> Gen.map Range.ofEndpoints
<span style="color:blue;">let</span> actual = expected |> Ploeh.Katas.Range.map id
expected =! actual }
[<Fact>]
<span style="color:blue;">let</span> ``Second functor law`` () = Property.check <| property {
<span style="color:blue;">let</span> genInt16 = Gen.int16 (Range.linearBounded ())
<span style="color:blue;">let</span> genEndpoint = Gen.choice [Gen.map Open genInt16; Gen.map Closed genInt16]
<span style="color:blue;">let!</span> range = Gen.tuple genEndpoint |> Gen.map Range.ofEndpoints
<span style="color:blue;">let!</span> f = Gen.item [id; ((+) 1s); ((*) 2s)]
<span style="color:blue;">let!</span> g = Gen.item [id; ((+) 1s); ((*) 2s)]
<span style="color:blue;">let</span> actual = range |> Ploeh.Katas.Range.map (f << g)
Ploeh.Katas.Range.map f (Ploeh.Katas.Range.map g range) =! actual }</pre>
</p>
<p>
These two Hedgehog properties are cast in the same mould as the <code>Endpoint</code> properties, only they create 64-bit and 16-bit ranges for variation's sake.
</p>
<h3 id="c91ec56a7b22445b85ac4253f81c5c74">
C# functor <a href="#c91ec56a7b22445b85ac4253f81c5c74">#</a>
</h3>
<p>
As I wrote about the Haskell result, it teaches us which abstractions are possible, even if we can't formalise them to the same degree in, say, C# as we can in Haskell. It should come as no surprise, then, that we can also make <code><span style="color:#2b91af;">Range</span><<span style="color:#2b91af;">T</span>></code> a functor in C#.
</p>
<p>
In C# we <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatically</a> do that by giving a class a <code>Select</code> method. Again, we'll have to begin with <code>Endpoint</code>:
</p>
<p>
<pre><span style="color:blue;">public</span> Endpoint<TResult> <span style="font-weight:bold;color:#74531f;">Select</span><<span style="color:#2b91af;">TResult</span>>(Func<T, TResult> <span style="font-weight:bold;color:#1f377f;">selector</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> Match(
whenClosed: <span style="font-weight:bold;color:#1f377f;">x</span> => Endpoint.Closed(selector(x)),
whenOpen: <span style="font-weight:bold;color:#1f377f;">x</span> => Endpoint.Open(selector(x)));
}</pre>
</p>
<p>
Does that <code>Select</code> method obey the functor laws? Yes, as we can demonstrate (not prove) with a few properties:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">FirstFunctorLaw</span>()
{
Gen.OneOf(
Gen.Int.Select(Endpoint.Open),
Gen.Int.Select(Endpoint.Closed))
.Sample(<span style="font-weight:bold;color:#1f377f;">expected</span> =>
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = expected.Select(<span style="font-weight:bold;color:#1f377f;">x</span> => x);
Assert.Equal(expected, actual);
});
}
[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">SecondFunctorLaw</span>()
{
(<span style="color:blue;">from</span> endpoint <span style="color:blue;">in</span> Gen.OneOf(
Gen.Int.Select(Endpoint.Open),
Gen.Int.Select(Endpoint.Closed))
<span style="color:blue;">from</span> f <span style="color:blue;">in</span> Gen.OneOfConst<Func<<span style="color:blue;">int</span>, <span style="color:blue;">int</span>>>(<span style="font-weight:bold;color:#1f377f;">x</span> => x, <span style="font-weight:bold;color:#1f377f;">x</span> => x + 1, <span style="font-weight:bold;color:#1f377f;">x</span> => x * 2)
<span style="color:blue;">from</span> g <span style="color:blue;">in</span> Gen.OneOfConst<Func<<span style="color:blue;">int</span>, <span style="color:blue;">int</span>>>(<span style="font-weight:bold;color:#1f377f;">x</span> => x, <span style="font-weight:bold;color:#1f377f;">x</span> => x + 1, <span style="font-weight:bold;color:#1f377f;">x</span> => x * 2)
<span style="color:blue;">select</span> (endpoint, f, g))
.Sample(<span style="font-weight:bold;color:#1f377f;">t</span> =>
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = t.endpoint.Select(<span style="font-weight:bold;color:#1f377f;">x</span> => t.g(t.f(x)));
Assert.Equal(
t.endpoint.Select(t.f).Select(t.g),
actual);
});
}</pre>
</p>
<p>
These two tests follow the scheme laid out by the above F# properties, and they both pass.
</p>
<p>
The <code>Range</code> class gets the same treatment. First, a <code>Select</code> method:
</p>
<p>
<pre><span style="color:blue;">public</span> Range<TResult> <span style="font-weight:bold;color:#74531f;">Select</span><<span style="color:#2b91af;">TResult</span>>(Func<T, TResult> <span style="font-weight:bold;color:#1f377f;">selector</span>)
<span style="color:blue;">where</span> TResult : IComparable<TResult>
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> Range<TResult>(min.Select(selector), max.Select(selector));
}</pre>
</p>
<p>
Again, two properties that exercise the functor laws demonstrate that this <code>Select</code> method is lawful:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">FirstFunctorLaw</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">genEndpoint</span> = Gen.OneOf(
Gen.Int.Select(Endpoint.Closed),
Gen.Int.Select(Endpoint.Open));
genEndpoint.SelectMany(<span style="font-weight:bold;color:#1f377f;">min</span> => genEndpoint
.Select(<span style="font-weight:bold;color:#1f377f;">max</span> => <span style="color:blue;">new</span> Range<<span style="color:blue;">int</span>>(min, max)))
.Sample(<span style="font-weight:bold;color:#1f377f;">sut</span> =>
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = sut.Select(<span style="font-weight:bold;color:#1f377f;">x</span> => x);
Assert.Equal(sut, actual);
});
}
[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">SecondFunctorLaw</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">genEndpoint</span> = Gen.OneOf(
Gen.Int.Select(Endpoint.Closed),
Gen.Int.Select(Endpoint.Open));
(<span style="color:blue;">from</span> min <span style="color:blue;">in</span> genEndpoint
<span style="color:blue;">from</span> max <span style="color:blue;">in</span> genEndpoint
<span style="color:blue;">from</span> f <span style="color:blue;">in</span> Gen.OneOfConst<Func<<span style="color:blue;">int</span>, <span style="color:blue;">int</span>>>(<span style="font-weight:bold;color:#1f377f;">x</span> => x, <span style="font-weight:bold;color:#1f377f;">x</span> => x + 1, <span style="font-weight:bold;color:#1f377f;">x</span> => x * 2)
<span style="color:blue;">from</span> g <span style="color:blue;">in</span> Gen.OneOfConst<Func<<span style="color:blue;">int</span>, <span style="color:blue;">int</span>>>(<span style="font-weight:bold;color:#1f377f;">x</span> => x, <span style="font-weight:bold;color:#1f377f;">x</span> => x + 1, <span style="font-weight:bold;color:#1f377f;">x</span> => x * 2)
<span style="color:blue;">select</span> (sut : <span style="color:blue;">new</span> Range<<span style="color:blue;">int</span>>(min, max), f, g))
.Sample(<span style="font-weight:bold;color:#1f377f;">t</span> =>
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = t.sut.Select(<span style="font-weight:bold;color:#1f377f;">x</span> => t.g(t.f(x)));
Assert.Equal(
t.sut.Select(t.f).Select(t.g),
actual);
});
}</pre>
</p>
<p>
These tests also pass.
</p>
<h3 id="222d65b253b145679994d1b9336069c7">
Laws <a href="#222d65b253b145679994d1b9336069c7">#</a>
</h3>
<p>
Exercising a pair of properties can give us a good warm feeling that the data structures and functions defined above are proper functors. Sometimes, tests are all we have, but in this case we can do better. We can prove that the functor laws always hold.
</p>
<p>
The various above incarnations of a <code>Range</code> type are all <a href="https://en.wikipedia.org/wiki/Product_type">product types</a>, and the canonical form of a product type is a tuple (see e.g. <a href="https://thinkingwithtypes.com/">Thinking with Types</a> for a clear explanation of why that is). That's the reason I stuck with a tuple in my Haskell code.
</p>
<p>
Consider the <code>fmap</code> implementation of <code>Pair</code>:
</p>
<p>
<pre>fmap f (Pair x y) = Pair (fmap f x) (fmap f y)</pre>
</p>
<p>
We can use equational reasoning, and as always I'll use <a href="https://bartoszmilewski.com/2015/01/20/functors/">the notation that Bartosz Milewski uses</a>. It's only natural to begin with the first functor law, using <code>F</code> and <code>G</code> as placeholders for two arbitrary <code>Functor</code> data constructors.
</p>
<p>
<pre> fmap id (Pair (F x) (G y))
= { definition of fmap }
Pair (fmap id (F x)) (fmap id (G y))
= { first functor law }
Pair (F x) (G y)
= { definition of id }
id (Pair (F x) (G y))</pre>
</p>
<p>
Keep in mind that in this notation, the equal signs are true equalities, going both ways. Thus, you can read this proof from the top to the bottom, or from the bottom to the top. The equality holds both ways, as should be the case for a true equality.
</p>
<p>
We can proceed in the same vein to prove the second functor law, being careful to distinguish between <code>Functor</code> instances (<code>F</code> and <code>G</code>) and functions (<code>f</code> and <code>g</code>):
</p>
<p>
<pre> fmap (g . f) (Pair (F x) (G y))
= { definition of fmap }
Pair (fmap (g . f) (F x)) (fmap (g . f) (G y))
= { second functor law }
Pair ((fmap g . fmap f) (F x)) ((fmap g . fmap f) (G y))
= { definition of composition }
Pair (fmap g (fmap f (F x))) (fmap g (fmap f (G y)))
= { definition of fmap }
fmap g (Pair (fmap f (F x)) (fmap f (G y)))
= { definition of fmap }
fmap g (fmap f (Pair (F x) (G y)))
= { definition of composition }
(fmap g . fmap f) (Pair (F x) (G y))</pre>
</p>
<p>
Notice that both proofs make use of the functor laws. This may seem self-referential, but is rather recursive. When the proofs refer to the functor laws, they refer to the functors <code>F</code> and <code>G</code>, which are both assumed to be lawful.
</p>
<p>
This is how we know that the product of two lawful functors is itself a functor.
</p>
<h3 id="54ac7f2fadef46c4a295333a2037656e">
Negations <a href="#54ac7f2fadef46c4a295333a2037656e">#</a>
</h3>
<p>
During all of this, you may have thought: <em>What happens if we project a range with a negation?</em>
</p>
<p>
As a simple example, let's consider the range from <em>-1</em> to <em>2:</em>
</p>
<p>
<pre>ghci> uncurry Pair (Closed (-1), Closed 2)
Pair (Closed (-1)) (Closed 2)</pre>
</p>
<p>
We may draw this range on the number line like this:
</p>
<p>
<img src="/content/binary/single-range-on-number-line.png" alt="The range from -1 to 2 drawn on the number line.">
</p>
<p>
What happens if we map that range by multiplying with <em>-1?</em>
</p>
<p>
<pre>ghci> fmap negate $ uncurry Pair (Closed (-1), Closed 2)
Pair (Closed 1) (Closed (-2))</pre>
</p>
<p>
We get a range from <em>1</em> to <em>-2!</em>
</p>
<p>
<em>Aha!</em> you say, <em>clearly that's wrong!</em> We've just found a counterexample. After all, <em>range</em> isn't a functor.
</p>
<p>
Not so. The functor laws say nothing about the interpretation of projections (but I'll get back to that in a moment). Rather, they say something about composition, so let's consider an example that reaches a similar, seemingly wrong result:
</p>
<p>
<pre>ghci> fmap ((+1) . negate) $ uncurry Pair (Closed (-1), Closed 2)
Pair (Closed 2) (Closed (-1))</pre>
</p>
<p>
This is a range from <em>2</em> to <em>-1</em>, so just as problematic as before.
</p>
<p>
The second functor law states that the outcome should be the same if we map piecewise:
</p>
<p>
<pre>ghci> (fmap (+ 1) . fmap negate) $ uncurry Pair (Closed (-1), Closed 2)
Pair (Closed 2) (Closed (-1))</pre>
</p>
<p>
Still a range from <em>2</em> to <em>-1</em>. The second functor law holds.
</p>
<p>
<em>But,</em> you protest, <em>that doesn't make any sense!</em>
</p>
<p>
I disagree. It could make sense in at least three different ways.
</p>
<p>
What does a range from <em>2</em> to <em>-1</em> mean? I can think of three interpretations:
</p>
<ul>
<li>It's the empty set</li>
<li>It's the range from <em>-1</em> to <em>2</em></li>
<li>It's the set of numbers that are either less than or equal to <em>-1</em> or greater than or equal to <em>2</em></li>
</ul>
<p>
We may illustrate those three interpretations, together with the original range, like this:
</p>
<p>
<img src="/content/binary/three-ranges-on-number-lines.png" alt="Four number lines, each with a range interpretation drawn in.">
</p>
<p>
According to the first interpretation, we consider the range as the Boolean <em>and</em> of two predicates. In this interpretation the initial range is really the Boolean expression <em>-1 ≤ x ∧ x ≤ 2</em>. The projected range then becomes the expression <em>2 ≤ x ∧ x ≤ -1</em>, which is not possible. This is how I've chosen to implement the <code>contains</code> function:
</p>
<p>
<pre>ghci> Pair x y = fmap ((+1) . negate) $ uncurry Pair (Closed (-1), Closed 2)
ghci> contains (x, y) [0]
False
ghci> contains (x, y) [-3]
False
ghci> contains (x, y) [4]
False</pre>
</p>
<p>
In this interpretation, the result is the empty set. The range isn't impossible; it's just empty. That's the second number line from the top in the above illustration.
</p>
<p>
This isn't, however, the only interpretation. Instead, we may choose to <a href="https://en.wikipedia.org/wiki/Robustness_principle">be liberal in what we accept</a> and interpret the range from <em>2</em> to <em>-1</em> as a 'programmer mistake': <em>What you asked me to do is formally wrong, but I think that I understand that you meant the range from </em>-1<em> to </em>2.
</p>
<p>
That's the third number line in the above illustration.
</p>
<p>
The third interpretation is that when the first element of the range is greater than the second, the range represents the <a href="https://en.wikipedia.org/wiki/Complement_(set_theory)">complement</a> of the range. That's the fourth number line in the above illustration.
</p>
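<p>
The three interpretations can be sketched as three different membership functions. This is only an illustration, not code from the repository: the names <code>value</code>, <code>aboveLower</code>, <code>belowUpper</code>, <code>asConjunction</code>, <code>asSwapped</code>, and <code>asComplement</code> are hypothetical, and the <code>Endpoint</code> type is restated here so that the example is self-contained.
</p>
<p>
<pre>import Data.Functor.Product (Product (Pair))

-- Restated for self-containment; the Functor instance is written out by hand.
data Endpoint a = Open a | Closed a deriving (Eq, Show)

instance Functor Endpoint where
  fmap f (Open x) = Open (f x)
  fmap f (Closed x) = Closed (f x)

type Range = Product Endpoint Endpoint

-- Hypothetical helpers, introduced only for this sketch.
value :: Endpoint a -> a
value (Open x) = x
value (Closed x) = x

aboveLower :: Ord a => Endpoint a -> a -> Bool
aboveLower (Open l) x = l < x
aboveLower (Closed l) x = l <= x

belowUpper :: Ord a => Endpoint a -> a -> Bool
belowUpper (Open u) x = x < u
belowUpper (Closed u) x = x <= u

-- Interpretation 1: Boolean 'and' of two predicates; empty if the endpoints are reversed.
asConjunction :: Ord a => Range a -> a -> Bool
asConjunction (Pair l u) x = aboveLower l x && belowUpper u x

-- Interpretation 2: be liberal and swap reversed endpoints before testing.
asSwapped :: Ord a => Range a -> a -> Bool
asSwapped (Pair l u) x
  | value l <= value u = asConjunction (Pair l u) x
  | otherwise = asConjunction (Pair u l) x

-- Interpretation 3: reversed endpoints denote the complement.
asComplement :: Ord a => Range a -> a -> Bool
asComplement (Pair l u) x
  | value l <= value u = asConjunction (Pair l u) x
  | otherwise = belowUpper u x || aboveLower l x

main :: IO ()
main = do
  -- The projected range from the GHCi session above: from 2 to -1.
  let projected = fmap ((+ 1) . negate) (Pair (Closed (-1)) (Closed 2)) :: Range Integer
  print (map (asConjunction projected) [-3, 0, 4]) -- [False,False,False]
  print (map (asSwapped projected) [-3, 0, 4])     -- [False,True,False]
  print (map (asComplement projected) [-3, 0, 4])  -- [True,False,True]</pre>
</p>
<p>
All three functions operate on the same mapped value. The functor is the same in each case; only the interpretation of the resulting pair of endpoints differs.
</p>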
<p>
The reason I spent some time on this is that it's easy to confuse the functor laws with other properties that you may associate with a data structure. This may lead you to falsely conclude that a functor isn't a functor, because you feel that it violates some other invariant.
</p>
<p>
If this happens, consider instead whether you could possibly expand the interpretation of the data structure in question.
</p>
<h3 id="859ece7acfdb415da1ba52578189e9ca">
Conclusion <a href="#859ece7acfdb415da1ba52578189e9ca">#</a>
</h3>
<p>
You can model a <em>range</em> as a functor, which enables you to project ranges, either moving them around on an imaginary number line, or changing the type of the range. This might for example enable you to map a date range to an integer range, or vice versa.
</p>
<p>
A functor enables mapping or projection, and some maps may produce results that you find odd or counter-intuitive. In this article you saw an example of that in the shape of a negated range where the first element (the 'minimum', in one interpretation) becomes greater than the second element (the 'maximum'). You may take that as an indication that the functor isn't, after all, a functor.
</p>
<p>
This isn't the case. A data structure and its <em>map</em> function is a functor if the mapping obeys the functor laws, which is the case for the range structures you've seen here.
</p>
</div>
<hr>
Statically and dynamically typed scripts
https://blog.ploeh.dk/2024/02/05/statically-and-dynamically-typed-scripts
2024-02-05T07:53:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Extracting and analysing data in Haskell and Python.</em>
</p>
<p>
I was recently following a course in mathematical analysis and probability for computer scientists. One assignment asked to analyze a small <a href="https://en.wikipedia.org/wiki/Comma-separated_values">CSV file</a> with data collected in a student survey. The course contained a mix of pure maths and practical application, and the official programming language to be used was <a href="https://www.python.org/">Python</a>. It was understood that one was to do the work in Python, but it wasn't an explicit requirement, and I was so tired that I didn't have the energy for it.
</p>
<p>
I can get by in Python, but it's not a language I'm actually comfortable with. For small experiments, ad-hoc scripting, etc. I reach for <a href="https://www.haskell.org/">Haskell</a>, so that's what I did.
</p>
<p>
This was a few months ago, and I've since followed another course that required more intense use of Python. With a few more months of Python programming under my belt, I decided to revisit that old problem and do it in Python with the explicit purpose of comparing and contrasting the two.
</p>
<h3 id="ae9c59e5fd0744f98841c6f864b20e33">
Static or dynamic types for scripting <a href="#ae9c59e5fd0744f98841c6f864b20e33">#</a>
</h3>
<p>
I'd like to make one point with these articles, and that is that dynamically typed languages aren't inherently better suited for scripting than statically typed languages. From this, it does not, however, follow that statically typed languages are better, either. Rather, I increasingly believe that whether you find one or the other more productive is a question of personality, past experiences, programming background, etc. I've been over this ground before. <a href="/2021/08/09/am-i-stuck-in-a-local-maximum">Many of my heroes seem to favour dynamically typed languages</a>, while I keep returning to statically typed languages.
</p>
<p>
For more than a decade I've preferred <a href="https://fsharp.org/">F#</a> or Haskell for ad-hoc scripting. Note that while these languages are statically typed, they are <a href="/2019/12/16/zone-of-ceremony">low on ceremony</a>. Types are <em>inferred</em> rather than declared. This means that for scripts, you can experiment with small code blocks, iteratively move closer to what you need, just as you would with a language like Python. Change a line of code, and the inferred type changes with it; there are no type declarations that you also need to fix.
</p>
<p>
When I talk about writing scripts in statically typed languages, I have such languages in mind. I wouldn't write a script in C#, <a href="https://en.wikipedia.org/wiki/C_(programming_language)">C</a>, or <a href="https://www.java.com/">Java</a>.
</p>
<blockquote>
<p>
"Let me stop you right there: I don't think there is a real dynamic typing versus static typing debate.
</p>
<p>
"What such debates normally are is language X vs language Y debates (where X happens to be dynamic and Y happens to be static)."
</p>
<footer><cite><a href="https://twitter.com/KevlinHenney/status/1425513161252278280">Kevlin Henney</a></cite></footer>
</blockquote>
<p>
The present articles compare Haskell and Python, so be careful that you don't extrapolate and draw any conclusions about, say, <a href="https://en.wikipedia.org/wiki/C%2B%2B">C++</a> versus <a href="https://www.erlang.org/">Erlang</a>.
</p>
<p>
When writing an ad-hoc script to extract data from a file, it's important to be able to experiment and iterate. Load the file, inspect the data, figure out how to extract subsets of it (particular columns, for example), calculate totals, averages, etc. A <a href="https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop">REPL</a> is indispensable in such situations. The Haskell REPL (called <em><a href="https://en.wikipedia.org/wiki/Glasgow_Haskell_Compiler">Glasgow Haskell Compiler</a> interactive</em>, or just <em>GHCi</em>) is the best one I've encountered.
</p>
<p>
I imagine that a Python expert would start by reading the data to slice and dice it various ways. We may label this a <em>data-first</em> approach, but be careful not to read too much into this, as I don't really know what I'm talking about. That's not how my mind works. Instead, I tend to take a <em>types-first</em> approach. I'll look at the data and start with the types.
</p>
<h3 id="6454710fbb5644ae979af4aa247dce96">
The assignment <a href="#6454710fbb5644ae979af4aa247dce96">#</a>
</h3>
<p>
The actual task is the following. At the beginning of the course, the professors asked students to fill out a survey. Among the questions asked was which grade the student expected to receive, and how much experience with programming he or she already had.
</p>
<p>
Grades are given according to the <a href="https://en.wikipedia.org/wiki/Academic_grading_in_Denmark">Danish academic scale</a>: -3, 00, 02, 4, 7, 10, and 12. Experience level is given on a simple numeric scale from 1 to 7, with 1 indicating no experience and 7 indicating expert-level experience.
</p>
<p>
Here's a small sample of the data:
</p>
<p>
<pre>No,3,2,6,6
No,4,2,3,7
No,1,12,6,2
No,4,10,4,3
No,3,4,4,6</pre>
</p>
<p>
The expected grade is in the third column (i.e. <em>2, 2, 12, 10, 4</em>) and the experience level is in the fourth column (<em>6, 3, 6, 4, 4</em>). The other columns are answers to different survey questions. The full data set contains 38 rows.
</p>
<p>
The assignment poses the following questions: Two rows from the survey data are randomly selected. What is the <a href="https://www.probabilitycourse.com/chapter3/3_1_3_pmf.php">probability mass function</a> (PMF) of the sum of their expected grades, and what is the PMF of the absolute difference between their programming experience levels?
</p>
<p>
In both cases I was also asked to plot the PMFs.
</p>
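<p>
To make the questions concrete, here's a minimal sketch of how such PMFs could be computed. It's not the script from the detailed walkthroughs, and it rests on assumptions: that 'two rows randomly selected' means an unordered pair of distinct rows, each pair being equally likely, and that the five sample rows above stand in for the full data set. The names <code>pairs</code> and <code>pmf</code> are hypothetical, and <code>Data.Map.Strict</code> comes from the <em>containers</em> package that ships with GHC.
</p>
<p>
<pre>import Data.List (tails)
import qualified Data.Map.Strict as Map
import Data.Ratio ((%))

-- Third and fourth columns of the five sample rows shown above.
sampleGrades :: [Integer]
sampleGrades = [2, 2, 12, 10, 4]

sampleExperience :: [Integer]
sampleExperience = [6, 3, 6, 4, 4]

-- All unordered pairs of distinct rows (assumption: selection without replacement).
pairs :: [a] -> [(a, a)]
pairs xs = [(x, y) | (x : ys) <- tails xs, y <- ys]

-- Relative frequency of each outcome, as an exact fraction.
pmf :: Ord a => [a] -> Map.Map a Rational
pmf xs = Map.map (% total) counts
  where
    counts = Map.fromListWith (+) [(x, 1) | x <- xs]
    total = fromIntegral (length xs)

main :: IO ()
main = do
  -- PMF of the sum of expected grades.
  print (Map.toList (pmf (map (uncurry (+)) (pairs sampleGrades))))
  -- PMF of the absolute difference between experience levels.
  print (Map.toList (pmf (map (\(x, y) -> abs (x - y)) (pairs sampleExperience))))</pre>
</p>
<p>
With only five rows there are ten pairs. Three of them (<em>2 + 12</em> twice, and <em>10 + 4</em>) sum to <em>14</em>, so under the stated assumptions that outcome has probability <em>3/10</em>.
</p>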
<h3 id="331b2ed3198f4a59872eb0e9d2f4ebd9">
Comparisons <a href="#331b2ed3198f4a59872eb0e9d2f4ebd9">#</a>
</h3>
<p>
As outlined above, I originally wrote a Haskell script to answer the questions, and only months later returned to the problem to give it a go in Python. When reading my detailed walkthroughs, keep in mind that I have 8-9 years of Haskell experience, and that I tend to 'think in Haskell', while I have only about a year of experience with Python. I don't consider myself proficient with Python, so the competition is rigged from the outset.
</p>
<ul>
<li><a href="/2024/02/19/extracting-data-from-a-small-csv-file-with-haskell">Extracting data from a small CSV file with Haskell</a></li>
<li><a href="/2024/03/18/extracting-data-from-a-small-csv-file-with-python">Extracting data from a small CSV file with Python</a></li>
</ul>
<p>
For this small task, I don't think that there's a clear winner. I still like my Haskell code the best, but I'm sure someone better at Python could write a much cleaner script. I also have to admit that <a href="https://matplotlib.org/">Matplotlib</a> makes it a breeze to produce nice-looking plots with Python, whereas I don't even know where to start with that with Haskell.
</p>
<p>
Recently I've done some more advanced data analysis with Python, such as random forest classification, principal component analysis, KNN-classification, etc. While I understand that I'm only scratching the surface of data science and machine learning, it's obvious that there's a rich Python ecosystem for that kind of work.
</p>
<h3 id="dcf63d011f02487eb051e3a75cbe59f7">
Conclusion <a href="#dcf63d011f02487eb051e3a75cbe59f7">#</a>
</h3>
<p>
This lays the foundations for comparing a small Haskell script with an equivalent Python script. There's no scientific method to the comparison; it's just me doing the same exercise twice, a bit like I'd <a href="/2020/01/13/on-doing-katas">do katas</a> with multiple variations in order to learn.
</p>
<p>
While I still like Haskell better than Python, that's only a personal preference. I'm deliberately not declaring a winner.
</p>
<p>
One point I'd like to make, however, is that there's nothing inherently better about a dynamically typed language when it comes to ad-hoc scripting. Languages with strong type inference work well, too.
</p>
<p>
<strong>Next:</strong> <a href="/2024/02/19/extracting-data-from-a-small-csv-file-with-haskell">Extracting data from a small CSV file with Haskell</a>.
</p>
</div><hr>
Error categories and category errors
https://blog.ploeh.dk/2024/01/29/error-categories-and-category-errors
2024-01-29T16:05:00+00:00
Mark Seemann
<div id="post">
<p>
<em>How I currently think about errors in programming.</em>
</p>
<p>
A reader <a href="/2023/08/14/replacing-mock-and-stub-with-a-fake#0afe67b375254fe193a3fd10234a1ce9">recently asked a question</a> that caused me to reflect on the way I think about errors in software. While my approach to error handling has remained largely the same for years, I don't think I've described it in an organized way. I'll try to present those thoughts here.
</p>
<p>
This article is, for lack of a better term, a <em>think piece</em>. I don't pretend that it represents any fundamental truth, or that this is the only way to tackle problems. Rather, I write this article for two reasons.
</p>
<ul>
<li>Writing things down often helps clarify your thoughts. While I already feel that my thinking on the topic of error handling is fairly clear, I've written enough articles that I know that by writing this one, I'll learn something new.</li>
<li>Publishing this article enables the exchange of ideas. By sharing my thoughts, I enable readers to point out errors in my thinking, or to improve on my work. Again, I may learn something. Perhaps others will, too.</li>
</ul>
<p>
Although I don't claim that the following is universal, I've found it useful for years.
</p>
<h3 id="43e58fcae3184b0597def9a4ec5629d7">
Error categories <a href="#43e58fcae3184b0597def9a4ec5629d7">#</a>
</h3>
<p>
Almost all software is at risk of failing for a myriad of reasons: User input, malformed data, network partitions, cosmic rays, race conditions, bugs, etc. Even so, we may categorize errors like this:
</p>
<ul>
<li>Predictable errors we can handle</li>
<li>Predictable errors we can't handle</li>
<li>Errors we've failed to predict</li>
</ul>
<p>
This distinction is hardly original. I believe I've picked it up from Michael Feathers, but although I've searched, I can't find the source, so perhaps I'm remembering it wrong.
</p>
<p>
You may find these three error categories underwhelming, but I find it useful to first consider what may be done about an error. Plenty of error situations are predictable. For example, all input should be considered suspect. This includes user input, but also data you receive from other systems. This kind of potential error you can typically solve with input validation, which I believe is <a href="/2020/12/14/validation-a-solved-problem">a solved problem</a>. Another predictable kind of error is unavailable services. Many systems store data in databases. You can easily predict that the database <em>will</em>, sooner or later, be unreachable. Potential causes include network partitions, a misconfigured connection string, logs running full, a crashed server, denial-of-service attacks, etc.
</p>
<p>
With some experience with software development, it's not that hard to produce a list of things that could go wrong. The next step is to decide what to do about it.
</p>
<p>
There are scenarios that are so likely to happen, and where the solution is so well-known, that they fall into the category of predictable errors that you can handle. User input belongs here. You examine the input and inform the user if it's invalid.
</p>
<p>
Even with input, however, other scenarios may lead you down different paths. What if, instead of a system with a user interface, you're developing a batch job that receives a big data file every night? How do you deal with invalid input in that scenario? Do you reject the entire data set, or do you filter it so that you only handle the valid input? Do you raise a notification to asynchronously inform the sender that input was malformed?
</p>
<p>
Notice how categorization is context-dependent. It would be a (category?) error to interpret the above model as fixed and universal. Rather, it's an analysis framework that helps identify how to categorize various fault scenarios in a particular application context.
</p>
<p>
Another example may be in order. If your system depends on a database, a predictable error is that the database will be unavailable. Can you handle that situation?
</p>
<p>
A common reaction is that there's really not a lot one can do about that. You may retry the operation, log the problem, or notify an on-call engineer, but ultimately the system <em>depends</em> on the database. If the database is unreachable, the system can't work. You can't handle that problem, so this falls in the category of predictable errors that you can't handle.
</p>
<p>
Or does it?
</p>
<h3 id="b703b2c68f3b4656a1cf1f4042974ab7">
Trade-offs of error handling <a href="#b703b2c68f3b4656a1cf1f4042974ab7">#</a>
</h3>
<p>
The example of an unreachable database is useful to explore in order to demonstrate that error handling isn't writ in stone, but rather an architectural design decision. Consider <a href="/2014/08/11/cqs-versus-server-generated-ids">a common API design</a> like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IRepository</span><T>
{
<span style="color:blue;">int</span> Create(T item);
<span style="color:green;">// other members</span>
}</pre>
</p>
<p>
What happens if client code calls <code>Create</code> but the database is unreachable? This is C# code, but the problem generalizes. With most implementations, the <code>Create</code> method will throw an exception.
</p>
<p>
Can you handle that situation? You may retry a couple of times, but if you have a user waiting for a response, you can't retry for too long. Once time is up, you'll have to accept that the operation failed. In a language like C#, the most robust implementation is to <em>not</em> handle the specific exception, but instead let it bubble up to a global exception handler that usually can't do much more than show the user a generic error message and log the exception.
</p>
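<p>
The problem generalizes beyond C#, so here's a minimal Haskell sketch of 'retry a couple of times, then let the error bubble up'. The <code>retrying</code> helper and the <code>flakyCreate</code> action are hypothetical names invented for the illustration, and the flaky action only simulates an unreachable database by failing on its first two calls.
</p>
<p>
<pre>import Control.Exception (IOException, catch)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical helper: run an action at most 'attempts' times in total.
-- The final attempt isn't wrapped, so its exception propagates to the caller.
retrying :: Int -> IO a -> IO a
retrying attempts action
  | attempts <= 1 = action
  | otherwise = action `catch` handler
  where
    handler e = const (retrying (attempts - 1) action) (e :: IOException)

-- Simulated Create operation: fails on the first two calls, then succeeds.
flakyCreate :: IORef Int -> IO Int
flakyCreate calls = do
  modifyIORef' calls (+ 1)
  n <- readIORef calls
  if n < 3
    then ioError (userError "database unreachable")
    else pure 42

main :: IO ()
main = do
  calls <- newIORef 0
  newId <- retrying 3 (flakyCreate calls)
  print newId -- 42</pre>
</p>
<p>
With <code>retrying 2</code> instead, the second failure would escape <code>main</code>, which is the equivalent of leaving the problem to a global exception handler.
</p>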
<p>
This isn't your only option, though. You may find yourself in a context where this kind of attitude towards errors is unacceptable. If you're working with <a href="https://twitter.com/ploeh/status/530320252790669313">BLOBAs</a> it's probably fine, but if you're working with medical life-support systems, or deep-space probes, or in other high-value contexts, the overall error-tolerance may be lower. Then what do you do?
</p>
<p>
You may try to address the concern with IT operations: Configure failover systems for the database, install two network cards in every machine, and so on. This may (also) be a way to address the problem, but isn't your only option. You may also consider changing the software architecture.
</p>
<p>
One option may be to switch to an asynchronous message-based system where messages are transmitted via durable queues. Granted, durable queues may fail as well (everything may fail), but when done right, they tend to be more robust. Even a machine that has lost all network connectivity may queue messages on its local disk until the network returns. Yes, the disk may run full, etc. but it's <em>less</em> likely to happen than a network partition or an unreachable database.
</p>
<p>
Notice that an unreachable database now goes into the category of errors that you've predicted, and that you can handle. On the other hand, failing to send an asynchronous message is now a new kind of error in your system: One that you can predict, but can't handle.
</p>
<p>
Making this change, however, impacts your software architecture. You can no longer have an interface method like the above <code>Create</code> method, because you can't rely on it returning an <code>int</code> in reasonable time. During error scenarios, messages may sit in queues for hours, if not days, so you can't block on such code.
</p>
<p>
As <a href="/2014/08/11/cqs-versus-server-generated-ids">I've explained elsewhere</a> you can instead model a <code>Create</code> method like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IRepository</span><T>
{
<span style="color:blue;">void</span> Create(<span style="color:#2b91af;">Guid</span> id, T item);
<span style="color:green;">// other members</span>
}</pre>
</p>
<p>
Not only does this follow the <a href="https://en.wikipedia.org/wiki/Command%E2%80%93query_separation">Command Query Separation</a> principle, it also makes it easier for you to adopt an asynchronous message-based architecture. Done consistently, however, this requires that you approach application design in a way different from a design where you assume that the database is reachable.
</p>
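<p>
To illustrate why this shape fits a queue-based design, here's a minimal sketch. Only the <code>IRepository&lt;T&gt;</code> interface comes from the code above; the <code>InMemoryRepository</code> class and the <code>Register</code> method are hypothetical names that I've made up for the example, and a real implementation would put a message on a durable queue rather than write to a dictionary. The point is that the caller picks the ID, so it never has to wait for the database to hand one back:
</p>
<p>
<pre>using System;
using System.Collections.Generic;

public interface IRepository<T>
{
    void Create(Guid id, T item);
}

// Hypothetical stand-in for an implementation that would enqueue a message.
public sealed class InMemoryRepository<T> : IRepository<T>
{
    private readonly Dictionary<Guid, T> items = new Dictionary<Guid, T>();

    public void Create(Guid id, T item)
    {
        items.Add(id, item);
    }

    public bool Contains(Guid id)
    {
        return items.ContainsKey(id);
    }
}

public static class OutboxDemo
{
    // Hypothetical caller: it generates the ID itself, so the ID is known
    // immediately, even if the write completes much later.
    public static Guid Register(IRepository<string> repository, string name)
    {
        var id = Guid.NewGuid();
        repository.Create(id, name);
        return id;
    }

    public static void Main()
    {
        var repository = new InMemoryRepository<string>();
        var id = Register(repository, "Example item");
        Console.WriteLine(repository.Contains(id));
    }
}</pre>
</p>
<p>
Since <code>Create</code> returns <code>void</code>, nothing in the caller depends on when, or even where, the write actually happens.
</p>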
<p>
It may even impact a user interface, because it'd be a good idea to design user experience in such a way that it helps the user have a congruent mental model of how the system works. This may include making the concept of an <em>outbox</em> explicit in the user interface, as it may help users realize that writes happen asynchronously. Most users understand that email works that way, so it's not inconceivable that they may be able to adopt a similar mental model of other applications.
</p>
<p>
The point is that this is an <em>option</em> that you may consider as an architect. Should you always design systems that way? I wouldn't. There's much extra complexity that you have to deal with in order to make asynchronous messaging work: UX, out-of-order messages, dead-letter queues, message versioning, etc. Getting to <a href="https://en.wikipedia.org/wiki/High_availability">five nines</a> is expensive, and often not warranted.
</p>
<p>
The point is rather that what goes in the <em>predictable errors we can't handle</em> category isn't fixed, but context-dependent. Perhaps we should rather name the category <em>predictable errors we've decided not to handle</em>.
</p>
<h3 id="529bcc700301441aa3337bdb1911f74d">
Bugs <a href="#529bcc700301441aa3337bdb1911f74d">#</a>
</h3>
<p>
How about the third category of errors, those we've failed to predict? We also call these <em>bugs</em> or <em>defects</em>. By definition, we only learn about them when they manifest. As soon as they become apparent, however, they fall into one of the other categories. If an error occurs once, it may occur again. It is now up to you to decide what to do about it.
</p>
<p>
I usually consider <a href="/2023/01/23/agilean">errors as stop-the-line-issues</a>, so I'd be inclined to immediately address them. On the other hand, if you don't do that, you've implicitly decided to put them in the category of predictable errors that you've decided not to handle.
</p>
<p>
We don't intentionally write bugs, but there will always be some of those around. On the other hand, various practices help reduce them: Test-driven development, code reviews, property-based testing, but also up-front design.
</p>
<h3 id="4f0b84e839d54f16af37d0295140dc17">
Error-free code <a href="#4f0b84e839d54f16af37d0295140dc17">#</a>
</h3>
<p>
Do consider explicitly how code may fail.
</p>
<p>
Despite the title of this section, there's no such thing as error-free code. Still, you can explicitly think about edge cases. For example, how might the following function fail?
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:#2b91af;">TimeSpan</span> <span style="color:#74531f;">Average</span>(<span style="color:blue;">this</span> <span style="color:#2b91af;">IEnumerable</span><<span style="color:#2b91af;">TimeSpan</span>> <span style="font-weight:bold;color:#1f377f;">timeSpans</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sum</span> = <span style="color:#2b91af;">TimeSpan</span>.Zero;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">count</span> = 0;
<span style="font-weight:bold;color:#8f08c4;">foreach</span> (<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">ts</span> <span style="font-weight:bold;color:#8f08c4;">in</span> <span style="font-weight:bold;color:#1f377f;">timeSpans</span>)
{
<span style="font-weight:bold;color:#1f377f;">sum</span> <span style="font-weight:bold;color:#74531f;">+=</span> <span style="font-weight:bold;color:#1f377f;">ts</span>;
<span style="font-weight:bold;color:#1f377f;">count</span>++;
}
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="font-weight:bold;color:#1f377f;">sum</span> <span style="font-weight:bold;color:#74531f;">/</span> <span style="font-weight:bold;color:#1f377f;">count</span>;
}</pre>
</p>
<p>
In at least two ways: The input collection may be empty or infinite. I've <a href="/2020/02/03/non-exceptional-averages">already suggested a few ways to address those problems</a>. Some of them are similar to what Michael Feathers calls <a href="https://youtu.be/AnZ0uTOerUI?si=1gJXYFoVlNTSbjEt">unconditional code</a>, in that we may change the <a href="https://en.wikipedia.org/wiki/Domain_of_a_function">domain</a>. Another option, that I didn't cover in the linked article, is to expand the <a href="https://en.wikipedia.org/wiki/Codomain">codomain</a>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> TimeSpan? <span style="font-weight:bold;color:#74531f;">Average</span>(<span style="color:blue;">this</span> IReadOnlyCollection<TimeSpan> <span style="font-weight:bold;color:#1f377f;">timeSpans</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (!timeSpans.Any())
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sum</span> = TimeSpan.Zero;
<span style="font-weight:bold;color:#8f08c4;">foreach</span> (var <span style="font-weight:bold;color:#1f377f;">ts</span> <span style="font-weight:bold;color:#8f08c4;">in</span> timeSpans)
sum += ts;
<span style="font-weight:bold;color:#8f08c4;">return</span> sum / timeSpans.Count;
}</pre>
</p>
<p>
Now, instead of diminishing the domain, we expand the codomain by allowing the return value to be null. (Interestingly, this is the inverse of <a href="/2021/12/06/the-liskov-substitution-principle-as-a-profunctor">my profunctor description of the Liskov Substitution Principle</a>. I don't yet know what to make of that. See: Just by writing things down, I learn something I hadn't realized before.)
</p>
<p>
This is beneficial in a statically typed language, because such a change makes hidden knowledge explicit. It makes it so explicit that a type checker can point out when we make mistakes. <a href="https://blog.janestreet.com/effective-ml-video/">Make illegal states unrepresentable</a>. <a href="https://en.wikipedia.org/wiki/Poka-yoke">Poka-yoke</a>. A potential run-time exception is now a compile-time error, and it's firmly in the category of errors that we've predicted and decided to handle.
</p>
<p>
In the above example, we could use the built-in .NET <a href="https://learn.microsoft.com/dotnet/api/system.nullable-1">Nullable<T></a> (with the <code>?</code> syntactic-sugar alias). In other cases, you may resort to returning a <a href="/2018/03/26/the-maybe-functor">Maybe</a> (AKA <em>option</em>).
</p>
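<p>
As a minimal sketch of what the nullable return type means for calling code, consider the following stand-alone program. The <code>Average</code> method has the same shape as the one above, while the <code>AverageDemo</code> class, its <code>Describe</code> method, and the sample values are hypothetical and only there for illustration. Because the result is a <code>TimeSpan?</code>, it can't be assigned directly to a <code>TimeSpan</code>; the caller has to decide what to do about the empty case:
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Linq;

public static class TimeSpanExtensions
{
    // Same shape as the Average method shown above.
    public static TimeSpan? Average(this IReadOnlyCollection<TimeSpan> timeSpans)
    {
        if (!timeSpans.Any())
            return null;

        var sum = TimeSpan.Zero;
        foreach (var ts in timeSpans)
            sum += ts;
        return sum / timeSpans.Count;
    }
}

public static class AverageDemo
{
    // Hypothetical caller that handles both the empty and the non-empty case.
    public static string Describe(IReadOnlyCollection<TimeSpan> timeSpans)
    {
        var average = timeSpans.Average();
        return average.HasValue
            ? $"Average: {average.Value}"
            : "No data";
    }

    public static void Main()
    {
        Console.WriteLine(Describe(Array.Empty<TimeSpan>()));
        Console.WriteLine(Describe(
            new[] { TimeSpan.FromMinutes(1), TimeSpan.FromMinutes(3) }));
    }
}</pre>
</p>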
<h3 id="0855ccbcd46f4c29980a219993293279">
Modelling errors <a href="#0855ccbcd46f4c29980a219993293279">#</a>
</h3>
<p>
Explicitly expanding the codomain of functions to signal potential errors is beneficial if you expect the caller to be able to handle the problem. If callers can't handle an error, forcing them to deal with it is just going to make things more difficult. I've never done any professional <a href="https://www.java.com/">Java</a> programming, but I've heard plenty of Java developers complain about checked exceptions. As far as I can tell, the problem in Java isn't so much with the language feature per se, but rather with the exception types that APIs force you to handle.
</p>
<p>
As an example, imagine that every time you call a database API, the compiler forces you to handle an <a href="https://learn.microsoft.com/dotnet/api/system.io.ioexception">IOException</a>. Unless you explicitly architect around it (as outlined above), this is likely to be one of the errors you can predict, but decide not to handle. But if the compiler forces you to handle it, then what do you do? You probably find some workaround that involves re-throwing the exception, or, as I understand that some Java developers do, declare that their own APIs may throw <em>any</em> exception, and by that means just pass the buck. Not helpful.
</p>
<p>
As far as I can tell, (checked) exceptions are equivalent to the <a href="/2018/06/11/church-encoded-either">Either</a> container, also known as <em>Result</em>. We may imagine that instead of throwing exceptions, a function may return an Either value: <em>Right</em> for a right result (explicit mnemonic, there!), and <em>Left</em> for an error.
</p>
<p>
It might be tempting to model all error-producing operations as Either-returning, but <a href="https://eiriktsarpalis.wordpress.com/2017/02/19/youre-better-off-using-exceptions/">you're often better off using exceptions</a>. Throw exceptions in those situations that you expect most clients can't recover from. Return <em>left</em> (or <em>error</em>) cases in those situations that you expect that a typical client would want to handle.
</p>
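<p>
Here's a minimal sketch of the <em>left</em> case in practice. The <code>Either</code> class below is a small stand-in that I've written for this example, not a library type, and <code>Quantity.TryParse</code> and <code>QuantityDemo</code> are hypothetical names. Invalid user input is something a typical client would want to handle, so the function returns it as a <em>left</em> value instead of throwing:
</p>
<p>
<pre>using System;

// Minimal Either stand-in for illustration only.
public sealed class Either<L, R>
{
    private readonly bool isRight;
    private readonly L left;
    private readonly R right;

    private Either(bool isRight, L left, R right)
    {
        this.isRight = isRight;
        this.left = left;
        this.right = right;
    }

    public static Either<L, R> Left(L value)
    {
        return new Either<L, R>(false, value, default(R));
    }

    public static Either<L, R> Right(R value)
    {
        return new Either<L, R>(true, default(L), value);
    }

    // The caller must supply a function for each case.
    public T Match<T>(Func<L, T> onLeft, Func<R, T> onRight)
    {
        return isRight ? onRight(right) : onLeft(left);
    }
}

public static class Quantity
{
    // Hypothetical parser: bad input is an expected outcome, not an exception.
    public static Either<string, int> TryParse(string candidate)
    {
        if (int.TryParse(candidate, out var i) && 0 < i)
            return Either<string, int>.Right(i);
        return Either<string, int>.Left(
            $"'{candidate}' is not a positive integer.");
    }
}

public static class QuantityDemo
{
    public static string Describe(string input)
    {
        return Quantity.TryParse(input).Match(
            onLeft: error => $"Rejected: {error}",
            onRight: q => $"Quantity: {q}");
    }

    public static void Main()
    {
        foreach (var input in new[] { "3", "foo" })
            Console.WriteLine(Describe(input));
    }
}</pre>
</p>
<p>
A failure such as a lost database connection, on the other hand, would still surface as an exception in a design like this, since most callers couldn't do anything sensible with it.
</p>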
<p>
Again, it's context-specific, so if you're developing a reusable library, there's a balance to strike in API design (or overloads to supply).
</p>
<h3 id="4ecf10fdb9a842e88aeecf1024cd466a">
Most errors are just branches <a href="#4ecf10fdb9a842e88aeecf1024cd466a">#</a>
</h3>
<p>
In many languages, errors are somehow special. Most modern languages include a facility to model errors as exceptions, and special syntax to throw or catch them. (The odd man out may be C, with its reliance on error codes as return values, but that is incredibly awkward for other reasons. You may also reasonably argue that C is hardly a modern language.)
</p>
<p>
Even Haskell has exceptions, even though it also has deep language support for <code>Maybe</code> and <code>Either</code>. Fortunately, Haskell APIs <em>tend</em> to only throw exceptions in those cases where average clients are unlikely to handle them: Timeouts, I/O failures, and so on.
</p>
<p>
It's unfortunate that languages treat errors as something exceptional, because this nudges us to make a proper category error: That errors are somehow special, and that we can't use normal coding constructs or API design practices to model them.
</p>
<p>
But you can. That's what <a href="https://youtu.be/AnZ0uTOerUI?si=1gJXYFoVlNTSbjEt">Michael Feathers' presentation is about</a>, and that's what you can do by <a href="https://blog.janestreet.com/effective-ml-video/">making illegal states unrepresentable</a>, or by returning Maybe or Either values.
</p>
<p>
Most errors are just branches in your code; where it diverges from the happy path in order to do something else.
</p>
<h3 id="a0e442d5e5c842108555236745f23155">
Conclusion <a href="#a0e442d5e5c842108555236745f23155">#</a>
</h3>
<p>
This article presents a framework for thinking about software errors. There are those you can predict may happen, and you choose to handle; those you predict may happen, but you choose to ignore; and those that you have not yet predicted: bugs.
</p>
<p>
A little up-front thinking will often help you predict some errors, but I'm not advocating that you foresee all errors. Some errors are programmer errors, and we make those errors because we're human, exactly because we're failing to predict the behaviour of a particular state of the code. Once you discover a bug, however, you have a choice: Do you address it or ignore it?
</p>
<p>
There are error conditions that you may deliberately choose to ignore. This doesn't necessarily make you an irresponsible programmer, but may rather be the result of a deliberate feasibility study. For example, every network operation may fail. How important is it that your application can keep running without the network? Is it worthwhile to make the code so robust that it can handle that situation? Or can you rather live with a few hours of downtime per quarter? If the latter, it may be best to let a human deal with network partitions when they occur.
</p>
<p>
The three error categories I suggest here are context-dependent. You decide which problems to deal with, and which ones to ignore, but apart from that, error-handling doesn't have to be difficult.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.A Range kata implementation in C#https://blog.ploeh.dk/2024/01/22/a-range-kata-implementation-in-c2024-01-22T07:05:00+00:00Mark Seemann
<div id="post">
<p>
<em>A port of the corresponding F# code.</em>
</p>
<p>
This article is an instalment in <a href="/2024/01/01/variations-of-the-range-kata">a short series of articles on the Range kata</a>. In the <a href="/2024/01/15/a-range-kata-implementation-in-f">previous article</a> I made a pass at <a href="https://codingdojo.org/kata/Range/">the kata</a> in <a href="https://fsharp.org/">F#</a>, using property-based testing with <a href="https://hedgehog.qa/">Hedgehog</a> to generate test data.
</p>
<p>
In the conclusion I mused about the properties I was able to come up with. Is it possible to describe open, closed, and mixed ranges in a way that's less coupled to the implementation? To be honest, I still don't have an answer to that question. Instead, in this article, I describe a straight port of the F# code to C#. There's value in that, too, for people who wonder <a href="/2015/04/15/c-will-eventually-get-all-f-features-right">how to reap the benefits of F# in C#</a>.
</p>
<p>
The code is <a href="https://github.com/ploeh/RangeCSharp">available on GitHub</a>.
</p>
<h3 id="2b05848a3b494ec99cc0e50da22bdd15">
First property <a href="#2b05848a3b494ec99cc0e50da22bdd15">#</a>
</h3>
<p>
Both F# and C# are .NET languages. They run on the same substrate, and are interoperable. While Hedgehog is written in F#, it's possible to consume F# libraries from C#, and vice versa. I've done this multiple times with <a href="https://fscheck.github.io/FsCheck/">FsCheck</a>, but I admit to never having tried it with Hedgehog.
</p>
<p>
If you want to try property-based testing in C#, a third alternative is available: <a href="https://github.com/AnthonyLloyd/CsCheck">CsCheck</a>. It's written in C# and is more <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> in that context. While I sometimes <a href="/2021/02/15/when-properties-are-easier-than-examples">still use FsCheck from C#</a>, I often choose CsCheck for didactic reasons.
</p>
<p>
The first property I wrote was a direct port of the idea of the first property I wrote in F#:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">ClosedRangeContainsList</span>()
{
(<span style="color:blue;">from</span> xs <span style="color:blue;">in</span> Gen.Short.Enumerable.Nonempty
<span style="color:blue;">let</span> min = xs.Min()
<span style="color:blue;">let</span> max = xs.Max()
<span style="color:blue;">select</span> (xs, min, max))
.Sample(<span style="font-weight:bold;color:#1f377f;">t</span> =>
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> Range<<span style="color:blue;">short</span>>(
<span style="color:blue;">new</span> ClosedEndpoint<<span style="color:blue;">short</span>>(t.min),
<span style="color:blue;">new</span> ClosedEndpoint<<span style="color:blue;">short</span>>(t.max));
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = sut.Contains(t.xs);
Assert.True(actual, <span style="color:#a31515;">$"</span><span style="color:#a31515;">Expected </span>{t.xs}<span style="color:#a31515;"> to be contained in </span>{sut}<span style="color:#a31515;">.</span><span style="color:#a31515;">"</span>);
});
}</pre>
</p>
<p>
This test (or property, if you will) uses a technique that I often use with property-based testing. I'm still searching for a catchy name for this, but here we may call it something like <em>reverse test-case assembly</em>. My <em>goal</em> is to test a predicate, and this particular property should verify that for a given <a href="https://en.wikipedia.org/wiki/Equivalence_class">Equivalence Class</a>, the predicate is always true.
</p>
<p>
While we may think of an Equivalence Class as a set from which we pick test cases, I don't actually have a full enumeration of such a set. I can't have that, since that set is infinitely big. Instead of randomly picking values from a set that I can't fully populate, I instead carefully pick test case values in such a way that they would all belong to the same <a href="https://en.wikipedia.org/wiki/Partition_of_a_set">set partition</a> (Equivalence Class).
</p>
<p>
The <a href="/2022/06/13/some-thoughts-on-naming-tests">test name suggests the test case</a>: I'd like to verify that given I have a closed range, when I ask it whether a list <em>within</em> that range is contained, then the answer is <em>true</em>. How do I pick such a test case?
</p>
<p>
I do it in reverse. You can say that the sampling is the dual of the test. I start with a list (<code>xs</code>) and only then do I create a range that contains it. Since the first test case is for a closed range, the <code>min</code> and <code>max</code> values are sufficient to define such a range.
</p>
<p>
How do I pass that property?
</p>
<p>
Degenerately, as is often the case with TDD beginnings:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">Contains</span>(IEnumerable<T> <span style="font-weight:bold;color:#1f377f;">candidates</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">true</span>;
}</pre>
</p>
<p>
Even though the <code>ClosedRangeContainsList</code> property effectively executes a hundred test cases, the <a href="/2019/10/07/devils-advocate">Devil's Advocate</a> can easily ignore that and instead return hard-coded <code>true</code>.
</p>
<h3 id="61c1050202934baa99baeaa4dfed40e1">
Endpoint sum type <a href="#61c1050202934baa99baeaa4dfed40e1">#</a>
</h3>
<p>
I'm not going to bore you with the remaining properties. The repository is available on GitHub if you're interested in those details.
</p>
<p>
If you've programmed in F# for some time, you typically miss <a href="https://en.wikipedia.org/wiki/Algebraic_data_type">algebraic data types</a> when forced to return to C#. A language like C# does have <a href="https://en.wikipedia.org/wiki/Product_type">product types</a>, but lacks native <a href="https://en.wikipedia.org/wiki/Tagged_union">sum types</a>. Even so, not all is lost. I've previously demonstrated that <a href="/2018/06/25/visitor-as-a-sum-type">you can employ the Visitor pattern to encode a sum type</a>. Another option is to use <a href="/2018/05/22/church-encoding">Church encoding</a>, which I've decided to do here.
</p>
<p>
When choosing between Church encoding and the <a href="https://en.wikipedia.org/wiki/Visitor_pattern">Visitor</a> pattern, Visitor is more object-oriented (after all, it's an original <a href="/ref/dp">GoF</a> design pattern), but Church encoding has fewer moving parts. Since I was just doing an exercise, I went for the simpler implementation.
</p>
<p>
An <code>Endpoint</code> object should allow one of two cases: <code>Open</code> or <code>Closed</code>. To avoid <a href="https://wiki.c2.com/?PrimitiveObsession">primitive obsession</a> I gave the class a <code>private</code> constructor:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Endpoint</span><<span style="color:#2b91af;">T</span>>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> T value;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">bool</span> isClosed;
<span style="color:blue;">private</span> <span style="color:#2b91af;">Endpoint</span>(T <span style="font-weight:bold;color:#1f377f;">value</span>, <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#1f377f;">isClosed</span>)
{
<span style="color:blue;">this</span>.value = value;
<span style="color:blue;">this</span>.isClosed = isClosed;
}</pre>
</p>
<p>
Since the constructor is <code>private</code> you need another way to create <code>Endpoint</code> objects. Two factory methods provide that affordance:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> Endpoint<T> <span style="font-weight:bold;color:#74531f;">Closed</span><<span style="color:#2b91af;">T</span>>(T <span style="font-weight:bold;color:#1f377f;">value</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> Endpoint<T>.Closed(value);
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> Endpoint<T> <span style="font-weight:bold;color:#74531f;">Open</span><<span style="color:#2b91af;">T</span>>(T <span style="font-weight:bold;color:#1f377f;">value</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> Endpoint<T>.Open(value);
}</pre>
</p>
<p>
The heart of the Church encoding is the <code>Match</code> method:
</p>
<p>
<pre><span style="color:blue;">public</span> TResult <span style="font-weight:bold;color:#74531f;">Match</span><<span style="color:#2b91af;">TResult</span>>(
Func<T, TResult> <span style="font-weight:bold;color:#1f377f;">whenClosed</span>,
Func<T, TResult> <span style="font-weight:bold;color:#1f377f;">whenOpen</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (isClosed)
<span style="font-weight:bold;color:#8f08c4;">return</span> whenClosed(value);
<span style="font-weight:bold;color:#8f08c4;">else</span>
<span style="font-weight:bold;color:#8f08c4;">return</span> whenOpen(value);
}</pre>
</p>
<p>
Such an API is an example of <a href="https://en.wikipedia.org/wiki/Poka-yoke">poka-yoke</a> because it obliges you to deal with both cases. The compiler will keep you honest: <em>Did you remember to deal with both the open and the closed case?</em> When calling the <code>Match</code> method, you must supply both arguments, or your code doesn't compile. <a href="https://blog.janestreet.com/effective-ml-video/">Make illegal states unrepresentable</a>.
</p>
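<p>
To show what calling <code>Match</code> looks like, here's a minimal, stand-alone sketch. The bodies of the static <code>Closed</code> and <code>Open</code> methods on <code>Endpoint&lt;T&gt;</code> are my assumption of how they might be implemented, since only the wrappers that delegate to them are shown above. The <code>EndpointDemo</code> class and its <code>RenderLowerBound</code> method are hypothetical, too:
</p>
<p>
<pre>using System;

public sealed class Endpoint<T>
{
    private readonly T value;
    private readonly bool isClosed;

    private Endpoint(T value, bool isClosed)
    {
        this.value = value;
        this.isClosed = isClosed;
    }

    // Assumed implementations of the factory methods.
    public static Endpoint<T> Closed(T value)
    {
        return new Endpoint<T>(value, true);
    }

    public static Endpoint<T> Open(T value)
    {
        return new Endpoint<T>(value, false);
    }

    public TResult Match<TResult>(
        Func<T, TResult> whenClosed,
        Func<T, TResult> whenOpen)
    {
        if (isClosed)
            return whenClosed(value);
        else
            return whenOpen(value);
    }
}

public static class EndpointDemo
{
    // Hypothetical helper: renders an endpoint as the lower bound of an
    // interval, using a bracket when closed and a parenthesis when open.
    public static string RenderLowerBound(Endpoint<int> endpoint)
    {
        return endpoint.Match(
            whenClosed: x => $"[{x}",
            whenOpen: x => $"({x}");
    }

    public static void Main()
    {
        Console.WriteLine(RenderLowerBound(Endpoint<int>.Closed(1)));
        Console.WriteLine(RenderLowerBound(Endpoint<int>.Open(1)));
    }
}</pre>
</p>
<p>
If you leave out either <code>whenClosed</code> or <code>whenOpen</code>, the call to <code>Match</code> doesn't compile.
</p>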
<h3 id="484d1f121cc44050b76f23d92bca429c">
Containment <a href="#484d1f121cc44050b76f23d92bca429c">#</a>
</h3>
<p>
With the <code>Endpoint</code> class in place, you can implement a <code>Range</code> class.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Range</span><<span style="color:#2b91af;">T</span>> <span style="color:blue;">where</span> T : IComparable<T></pre>
</p>
<p>
It made sense to me to constrain the <code>T</code> type argument to <code>IComparable<T></code>, although it's possible that I could have deferred that constraint to the actual <code>Contains</code> method, like I did with <a href="/2024/01/08/a-range-kata-implementation-in-haskell">my Haskell implementation</a>.
</p>
<p>
A <code>Range</code> holds two <code>Endpoint</code> values:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">Range</span>(Endpoint<T> <span style="font-weight:bold;color:#1f377f;">min</span>, Endpoint<T> <span style="font-weight:bold;color:#1f377f;">max</span>)
{
<span style="color:blue;">this</span>.min = min;
<span style="color:blue;">this</span>.max = max;
}</pre>
</p>
<p>
The <code>Contains</code> method makes use of the built-in <a href="https://learn.microsoft.com/dotnet/api/system.linq.enumerable.all">All</a> method, using a <code>private</code> helper function as the predicate:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">IsInRange</span>(T <span style="font-weight:bold;color:#1f377f;">candidate</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> min.Match(
whenClosed: <span style="font-weight:bold;color:#1f377f;">l</span> => max.Match(
whenClosed: <span style="font-weight:bold;color:#1f377f;">h</span> => l.CompareTo(candidate) <= 0 && candidate.CompareTo(h) <= 0,
whenOpen: <span style="font-weight:bold;color:#1f377f;">h</span> => l.CompareTo(candidate) <= 0 && candidate.CompareTo(h) < 0),
whenOpen: <span style="font-weight:bold;color:#1f377f;">l</span> => max.Match(
whenClosed: <span style="font-weight:bold;color:#1f377f;">h</span> => l.CompareTo(candidate) < 0 && candidate.CompareTo(h) <= 0,
whenOpen: <span style="font-weight:bold;color:#1f377f;">h</span> => l.CompareTo(candidate) < 0 && candidate.CompareTo(h) < 0));
}</pre>
</p>
<p>
This implementation performs a nested <code>Match</code> to arrive at the appropriate answer. The code isn't as elegant or readable as its F# counterpart, but it comes with comparable compile-time safety. You can't forget a combination, because if you do, your code isn't going to compile.
</p>
<p>
Still, you can't deny that C# involves more <a href="/2019/12/16/zone-of-ceremony">ceremony</a>.
</p>
<h3 id="54fd635d237d4f57891bda8ef5d13623">
Conclusion <a href="#54fd635d237d4f57891bda8ef5d13623">#</a>
</h3>
<p>
Once you know how, it's not that difficult to port a functional design from F# or <a href="https://www.haskell.org/">Haskell</a> to a language like C#. The resulting code tends to be more complicated, but to a large degree, it's possible to retain the type safety.
</p>
<p>
In this article you saw a sketch of how to make that transition, using the Range kata as an example. The resulting C# API is perfectly serviceable, as the test code demonstrates.
</p>
<p>
Now that we have covered the fundamentals of the Range kata we have learned enough about it to go beyond the exercise and examine some more abstract properties.
</p>
<p>
<strong>Next:</strong> <a href="/2024/02/12/range-as-a-functor">Range as a functor</a>.
</p>
</div><hr>
A Range kata implementation in F#https://blog.ploeh.dk/2024/01/15/a-range-kata-implementation-in-f2024-01-15T07:20:00+00:00Mark Seemann
<div id="post">
<p>
<em>This time with some property-based testing.</em>
</p>
<p>
This article is an instalment in <a href="/2024/01/01/variations-of-the-range-kata">a short series of articles on the Range kata</a>. In the <a href="/2024/01/08/a-range-kata-implementation-in-haskell">previous article</a> I described my first attempt at the kata, and also complained that I had to think of test cases myself. When I find it tedious coming up with new test cases, I usually start to wonder if it'd be easier to use property-based testing.
</p>
<p>
Thus, when I decided to revisit <a href="https://codingdojo.org/kata/Range/">the kata</a>, the variation that I was most interested in pursuing was to explore whether it would make sense to use property-based testing instead of a set of existing examples.
</p>
<p>
Since I also wanted to do the second attempt in <a href="https://fsharp.org/">F#</a>, I had a choice between <a href="https://fscheck.github.io/FsCheck/">FsCheck</a> and <a href="https://hedgehog.qa/">Hedgehog</a>. Each has its strengths and weaknesses, but since I already know FsCheck so well, I decided to go with Hedgehog.
</p>
<p>
I also soon discovered that I had no interest in developing the full suite of capabilities implied by the kata. Instead, I decided to focus on just the data structure itself, as well as the <code>contains</code> function. As in the previous article, this function can also be used to cover the kata's <em>ContainsRange</em> feature.
</p>
<h3 id="81447639b1a54c8c9437b81f3856018f">
Getting started <a href="#81447639b1a54c8c9437b81f3856018f">#</a>
</h3>
<p>
There's no rule that you can't combine property-based testing with test-driven development (TDD). On the contrary, that's how I often do it. In this exercise, I first wrote this test:
</p>
<p>
<pre>[<Fact>]
<span style="color:blue;">let</span> ``Closed range contains list`` () = Property.check <| property {
<span style="color:blue;">let!</span> xs = Gen.int16 (Range.linearBounded ()) |> Gen.list (Range.linear 1 99)
<span style="color:blue;">let</span> min = List.min xs
<span style="color:blue;">let</span> max = List.max xs
<span style="color:blue;">let</span> actual = (Closed min, Closed max) |> Range.contains xs
Assert.True (actual, sprintf <span style="color:#a31515;">"Range [%i, %i] expected to contain list."</span> min max) }</pre>
</p>
<p>
We have to be careful when reading and understanding this code: There are two <code>Range</code> modules in action here!
</p>
<p>
Hedgehog comes with a <code>Range</code> module that you must use to define how it samples values from <a href="https://en.wikipedia.org/wiki/Domain_of_a_function">domains</a>. Examples of that here are <code>Range.linearBounded</code> and <code>Range.linear</code>.
</p>
<p>
On the other hand, I've defined <em>my</em> <code>contains</code> function in a <code>Range</code> module, too. As long as there's no ambiguity, the F# compiler doesn't have a problem with that. Since there's no <code>contains</code> function in the Hedgehog <code>Range</code> module, the F# compiler isn't confused.
</p>
<p>
We humans, on the other hand, might be confused, and had this been a code base that I had to maintain for years, I might seriously consider whether I should rename my own <code>Range</code> module to something else, like <code>Interval</code>, perhaps.
</p>
<p>
In any case, the first test (or property, if you will) uses a technique that I often use with property-based testing. I'm still searching for a catchy name for this, but here we may call it something like <em>reverse test-case assembly</em>. My <em>goal</em> is to test a predicate, and this particular property should verify that for a given <a href="https://en.wikipedia.org/wiki/Equivalence_class">Equivalence Class</a>, the predicate is always true.
</p>
<p>
While we may think of an Equivalence Class as a set from which we pick test cases, I don't actually have a full enumeration of such a set. I can't have that, since that set is infinitely big. Instead of randomly picking values from a set that I can't fully populate, I instead carefully pick test case values in such a way that they would all belong to the same <a href="https://en.wikipedia.org/wiki/Partition_of_a_set">set partition</a> (Equivalence Class).
</p>
<p>
The <a href="/2022/06/13/some-thoughts-on-naming-tests">test name suggests the test case</a>: I'd like to verify that given I have a closed range, when I ask it whether a list <em>within</em> that range is contained, then the answer is <em>true</em>. How do I pick such a test case?
</p>
<p>
I do it in reverse. You can say that the sampling is the dual of the test. I start with a list (<code>xs</code>) and only then do I create a range that contains it. Since the first test case is for a closed range, the <code>min</code> and <code>max</code> values are sufficient to define such a range.
</p>
<p>
How do I pass that property?
</p>
<p>
Degenerately, as is often the case with TDD beginnings:
</p>
<p>
<pre><span style="color:blue;">module</span> Range =
<span style="color:blue;">let</span> contains _ _ = <span style="color:blue;">true</span></pre>
</p>
<p>
Even though the <code>Closed range contains list</code> property effectively executes a hundred test cases, the <a href="/2019/10/07/devils-advocate">Devil's Advocate</a> can easily ignore that and instead return hard-coded <code>true</code>.
</p>
<p>
More properties are required to flesh out the behaviour of the function.
</p>
<h3 id="50564e684f074ae1956b106333320c9b">
Open range <a href="#50564e684f074ae1956b106333320c9b">#</a>
</h3>
<p>
While I do keep the <a href="https://blog.cleancoder.com/uncle-bob/2013/05/27/TheTransformationPriorityPremise.html">transformation priority premise</a> in mind when picking the next test (or, here, <em>property</em>), I'm rarely particularly analytic about it. Since the first property tests that a closed range barely contains a list of values from its minimum to its maximum, it seemed like a promising next step to consider the case where the range consisted of open endpoints. That was the second test I wrote, then:
</p>
<p>
<pre>[<Fact>]
<span style="color:blue;">let</span> ``Open range doesn't contain endpoints`` () = Property.check <| property {
    <span style="color:blue;">let!</span> min = Gen.int32 (Range.linearBounded ())
    <span style="color:blue;">let!</span> max = Gen.int32 (Range.linearBounded ())
    <span style="color:blue;">let</span> actual = (Open min, Open max) |> Range.contains [min; max]
    Assert.False (actual, sprintf <span style="color:#a31515;">"Range (%i, %i) expected not to contain list."</span> min max) }</pre>
</p>
<p>
This property simply states that if you query the <code>contains</code> predicate about a list that only contains the endpoints of an open range, then the answer is <code>false</code> because the endpoints are <code>Open</code>.
</p>
<p>
One implementation that passes both tests is this one:
</p>
<p>
<pre><span style="color:blue;">module</span> Range =
    <span style="color:blue;">let</span> contains _ endpoints =
        <span style="color:blue;">match</span> endpoints <span style="color:blue;">with</span>
        | Open _, Open _ <span style="color:blue;">-></span> <span style="color:blue;">false</span>
        | _ <span style="color:blue;">-></span> <span style="color:blue;">true</span></pre>
</p>
<p>
This implementation is obviously still incorrect, but we have reason to believe that we're moving closer to something that will eventually work.
</p>
<h3 id="f01d9a05b57d470fac8f48bcfc85df4d">
Tick-tock <a href="#f01d9a05b57d470fac8f48bcfc85df4d">#</a>
</h3>
<p>
In the spirit of the transformation priority premise, I've often found that when test-driving a predicate, I seem to fall into a tick-tock pattern where I alternate between tests for a <code>true</code> return value, followed by a test for a <code>false</code> return value, or the other way around. This was also the case here. The previous test was for a <code>false</code> value, so the third test requires <code>true</code> to be returned:
</p>
<p>
<pre>[<Fact>]
<span style="color:blue;">let</span> ``Open range contains list`` () = Property.check <| property {
    <span style="color:blue;">let!</span> xs = Gen.int64 (Range.linearBounded ()) |> Gen.list (Range.linear 1 99)
    <span style="color:blue;">let</span> min = List.min xs - 1L
    <span style="color:blue;">let</span> max = List.max xs + 1L
    <span style="color:blue;">let</span> actual = (Open min, Open max) |> Range.contains xs
    Assert.True (actual, sprintf <span style="color:#a31515;">"Range (%i, %i) expected to contain list."</span> min max) }</pre>
</p>
<p>
This then led to this implementation of the <code>contains</code> function:
</p>
<p>
<pre><span style="color:blue;">module</span> Range =
    <span style="color:blue;">let</span> contains ys endpoints =
        <span style="color:blue;">match</span> endpoints <span style="color:blue;">with</span>
        | Open x, Open z <span style="color:blue;">-></span>
            ys |> List.forall (<span style="color:blue;">fun</span> y <span style="color:blue;">-></span> x < y && y < z)
        | _ <span style="color:blue;">-></span> <span style="color:blue;">true</span></pre>
</p>
<p>
Following up on the above <code>true</code>-demanding test, I added one that tested a <code>false</code> scenario:
</p>
<p>
<pre>[<Fact>]
<span style="color:blue;">let</span> ``Open-closed range doesn't contain endpoints`` () = Property.check <| property {
    <span style="color:blue;">let!</span> min = Gen.int16 (Range.linearBounded ())
    <span style="color:blue;">let!</span> max = Gen.int16 (Range.linearBounded ())
    <span style="color:blue;">let</span> actual = (Open min, Closed max) |> Range.contains [min; max]
    Assert.False (actual, sprintf <span style="color:#a31515;">"Range (%i, %i] expected not to contain list."</span> min max) }</pre>
</p>
<p>
This again led to this implementation:
</p>
<p>
<pre><span style="color:blue;">module</span> Range =
    <span style="color:blue;">let</span> contains ys endpoints =
        <span style="color:blue;">match</span> endpoints <span style="color:blue;">with</span>
        | Open x, Open z <span style="color:blue;">-></span>
            ys |> List.forall (<span style="color:blue;">fun</span> y <span style="color:blue;">-></span> x < y && y < z)
        | Open x, Closed z <span style="color:blue;">-></span> <span style="color:blue;">false</span>
        | _ <span style="color:blue;">-></span> <span style="color:blue;">true</span></pre>
</p>
<p>
I had to add four more tests before I felt confident that I had the right implementation. I'm not going to show them all here, but you can look at the <a href="https://github.com/ploeh/RangeFSharp">repository on GitHub</a> if you're interested in the interim steps.
</p>
<h3 id="0cd66550fe5a43b0a7e8a6d3a2b0ea32">
Types and functionality <a href="#0cd66550fe5a43b0a7e8a6d3a2b0ea32">#</a>
</h3>
<p>
So far I had treated a range as a pair (two-tuple), just as I had done with the code in <a href="/2024/01/08/a-range-kata-implementation-in-haskell">my first attempt</a>. I did, however, have a few other things planned for this code base, so I introduced a set of explicit types:
</p>
<p>
<pre><span style="color:blue;">type</span> Endpoint<'a> = Open <span style="color:blue;">of</span> 'a | Closed <span style="color:blue;">of</span> 'a
<span style="color:blue;">type</span> Range<'a> = { LowerBound : Endpoint<'a>; UpperBound : Endpoint<'a> }</pre>
</p>
<p>
The <code>Range</code> record type is isomorphic to a pair of <code>Endpoint</code> values, so it's not strictly required, but does make things <a href="https://peps.python.org/pep-0020/">more explicit</a>.
</p>
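<p>
The isomorphism claim can be spelled out as a pair of functions that undo each other. The following is a sketch in Haskell rather than F#, and only an illustration: <code>ofEndpoints</code> mirrors the F# function of the same name, while <code>toEndpoints</code> is a hypothetical inverse that isn't part of the code base.
</p>

```haskell
data Endpoint a = Open a | Closed a deriving (Eq, Show)

-- Record counterpart of the F# Range type.
data Range a = Range
  { lowerBound :: Endpoint a
  , upperBound :: Endpoint a
  } deriving (Eq, Show)

-- From a pair to the record.
ofEndpoints :: (Endpoint a, Endpoint a) -> Range a
ofEndpoints (l, u) = Range l u

-- From the record back to a pair (hypothetical helper).
toEndpoints :: Range a -> (Endpoint a, Endpoint a)
toEndpoints r = (lowerBound r, upperBound r)
```

<p>
Since <code>toEndpoints . ofEndpoints</code> and <code>ofEndpoints . toEndpoints</code> both behave like the identity function, no information is gained or lost by choosing one representation over the other.
</p>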
<p>
To support the new type, I added an <code>ofEndpoints</code> function, and finalized the implementation of <code>contains</code>:
</p>
<p>
<pre><span style="color:blue;">module</span> Range =
    <span style="color:blue;">let</span> ofEndpoints (lowerBound, upperBound) =
        { LowerBound = lowerBound; UpperBound = upperBound }

    <span style="color:blue;">let</span> contains ys r =
        <span style="color:blue;">match</span> r.LowerBound, r.UpperBound <span style="color:blue;">with</span>
        | Open x, Open z <span style="color:blue;">-></span> ys |> List.forall (<span style="color:blue;">fun</span> y <span style="color:blue;">-></span> x < y && y < z)
        | Open x, Closed z <span style="color:blue;">-></span> ys |> List.forall (<span style="color:blue;">fun</span> y <span style="color:blue;">-></span> x < y && y <= z)
        | Closed x, Open z <span style="color:blue;">-></span> ys |> List.forall (<span style="color:blue;">fun</span> y <span style="color:blue;">-></span> x <= y && y < z)
        | Closed x, Closed z <span style="color:blue;">-></span> ys |> List.forall (<span style="color:blue;">fun</span> y <span style="color:blue;">-></span> x <= y && y <= z)</pre>
</p>
<p>
As is so often the case in F#, pattern matching makes such functions a pleasure to implement.
</p>
<h3 id="c00252811495433987c37f7bcfc751a5">
Conclusion <a href="#c00252811495433987c37f7bcfc751a5">#</a>
</h3>
<p>
I was curious whether using property-based testing would make the development process of the Range kata simpler. While each property was simple, I still had to write eight of them before I felt I'd fully described the problem. This doesn't seem like much of an improvement over the example-driven approach I took the first time around. It seems to be a comparable amount of code, and on one hand a property is more abstract than an example, but on the other hand it usually also covers more ground. I feel more confident that this implementation works, because I know that it's being exercised more rigorously.
</p>
<p>
When I find myself writing a property per branch, so to speak, I always feel that I missed a better way to describe the problem. As an example, for years <a href="https://youtu.be/2oN9caQflJ8?si=em1VvFqYFA_AjDlk">I would demonstrate</a> how to test <a href="https://codingdojo.org/kata/FizzBuzz/">the FizzBuzz kata</a> with property-based testing by dividing the problem into Equivalence Classes and then writing a property for each partition. Just as I've done here. This is usually possible, but smells of being too coupled to the implementation.
</p>
<p>
Sometimes, if you think about the problem long enough, you may be able to produce an alternative set of properties that describe the problem in a way that's entirely decoupled from the implementation. After years, <a href="/2021/06/28/property-based-testing-is-not-the-same-as-partition-testing">I finally managed to do that with the FizzBuzz kata</a>.
</p>
<p>
I didn't succeed doing that with the Range kata this time around, but maybe later.
</p>
<p>
<strong>Next:</strong> <a href="/2024/01/22/a-range-kata-implementation-in-c">A Range kata implementation in C#</a>.
</p>
</div><hr>
A Range kata implementation in Haskellhttps://blog.ploeh.dk/2024/01/08/a-range-kata-implementation-in-haskell2024-01-08T07:06:00+00:00Mark Seemann
<div id="post">
<p>
<em>A first crack at the exercise.</em>
</p>
<p>
This article is an instalment in <a href="/2024/01/01/variations-of-the-range-kata">a short series of articles on the Range kata</a>. Here I describe my first attempt at the exercise. As I usually advise people <a href="/2020/01/13/on-doing-katas">on doing katas</a>, the first time you try your hand at a kata, use the language with which you're most comfortable. To be honest, I may be most habituated to C#, having programmed in it since 2002, but on the other hand, I currently 'think in <a href="https://www.haskell.org/">Haskell</a>', and am often frustrated with C#'s lack of structural equality, higher-order abstractions, and support for functional expressions.
</p>
<p>
Thus, I usually start with Haskell even though I always find myself struggling with the ecosystem. If you do, too, the source code is <a href="https://github.com/ploeh/RangeHaskell">available on GitHub</a>.
</p>
<p>
I took my own advice by setting out with the explicit intent to follow <a href="https://codingdojo.org/kata/Range/">the Range kata description</a> as closely as possible. This kata doesn't beat about the bush, but instead just dumps a set of test cases on you. It wasn't clear whether this was the most useful set of tests, or whether the order in which they're presented is the one most conducive to a good experience of test-driven development, but there was only one way to find out.
</p>
<p>
I quickly learned, however, that the suggested test cases were insufficient to describe the behaviour in enough detail.
</p>
<h3 id="287256d6f0fe412585cce5f16fcf5363">
Containment <a href="#287256d6f0fe412585cce5f16fcf5363">#</a>
</h3>
<p>
I started by adding the first two test cases as <a href="/2018/05/07/inlined-hunit-test-lists">inlined HUnit test lists</a>:
</p>
<p>
<pre><span style="color:#a31515;">"integer range contains"</span> ~: <span style="color:blue;">do</span>
  (r, candidate, expected) <-
    [
      ((Closed 2, Open 6), [2,4], True),
      ((Closed 2, Open 6), [-1,1,6,10], False)
    ]
  <span style="color:blue;">let</span> actual = r `contains` candidate
  <span style="color:blue;">return</span> $ expected ~=? actual</pre>
</p>
<p>
I wasn't particularly keen on going full <a href="/2019/10/07/devils-advocate">Devil's Advocate</a> on the exercise. I could, on the other hand, trivially pass both tests with this obviously degenerate implementation:
</p>
<p>
<pre><span style="color:blue;">import</span> Data.List
<span style="color:blue;">data</span> Endpoint a = Open a | Closed a <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Eq</span>, <span style="color:#2b91af;">Show</span>)
contains _ candidate = [2] `isPrefixOf` candidate</pre>
</p>
<p>
Reluctantly, I had to invent some additional test cases:
</p>
<p>
<pre><span style="color:#a31515;">"integer range contains"</span> ~: <span style="color:blue;">do</span>
  (r, candidate, expected) <-
    [
      ((Closed 2 , Open 6), [2,4], True),
      ((Closed 2 , Open 6), [-1,1,6,10], False),
      ((Closed (-1), Closed 10), [-1,1,6,10], True),
      ((Closed (-1), Open 10), [-1,1,6,10], False),
      ((Closed (-1), Open 10), [-1,1,6,9], True),
      (( Open 2, Closed 6), [3,5,6], True),
      (( Open 2, Open 6), [2,5], False),
      (( Open 2, Open 6), <span style="color:blue;">[]</span>, True),
      ((Closed 2, Closed 6), [3,7,4], False)
    ]
  <span style="color:blue;">let</span> actual = r `contains` candidate
  <span style="color:blue;">return</span> $ expected ~=? actual</pre>
</p>
<p>
This was when I began to wonder whether it would have been easier to use property-based testing. That would entail, however, a departure from the kata's suggested test cases, so I decided to stick to the plan and then perhaps return to property-based testing when repeating the exercise.
</p>
<p>
Ultimately I implemented the <code>contains</code> function this way:
</p>
<p>
<pre><span style="color:#2b91af;">contains</span> <span style="color:blue;">::</span> (<span style="color:blue;">Foldable</span> t, <span style="color:blue;">Ord</span> a) <span style="color:blue;">=></span> (<span style="color:blue;">Endpoint</span> a, <span style="color:blue;">Endpoint</span> a) <span style="color:blue;">-></span> t a <span style="color:blue;">-></span> <span style="color:#2b91af;">Bool</span>
contains (lowerBound, upperBound) =
  <span style="color:blue;">let</span> isHighEnough = <span style="color:blue;">case</span> lowerBound <span style="color:blue;">of</span>
        Closed x -> (x <=)
        Open x -> (x <)
      isLowEnough = <span style="color:blue;">case</span> upperBound <span style="color:blue;">of</span>
        Closed y -> (<= y)
        Open y -> (< y)
      isContained x = isHighEnough x && isLowEnough x
  <span style="color:blue;">in</span> <span style="color:blue;">all</span> isContained</pre>
</p>
<p>
In some ways it seems a bit verbose to me, but I couldn't easily think of a simpler implementation.
</p>
<p>
One of the features I find so fascinating about Haskell is how <em>general</em> it enables me to be. While the tests use integers for concision, the <code>contains</code> function works with any <code>Ord</code> instance; not only <code>Integer</code>, but also <code>Double</code>, <code>Word</code>, <code>Day</code>, <code>TimeOfDay</code>, or some new type I can't even predict.
</p>
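<p>
As a small illustration of that generality, here's a sketch that reuses the same <code>contains</code> function with characters, floating-point numbers, and a <code>Maybe</code> container instead of a list. The example values and the names <code>lowercase</code>, <code>unitInterval</code>, and <code>justInside</code> are made up for the occasion, and I've stuck to types from <code>base</code> so that the code is self-contained.
</p>

```haskell
data Endpoint a = Open a | Closed a deriving (Eq, Show)

contains :: (Foldable t, Ord a) => (Endpoint a, Endpoint a) -> t a -> Bool
contains (lowerBound, upperBound) =
  let isHighEnough = case lowerBound of
        Closed x -> (x <=)
        Open x -> (x <)
      isLowEnough = case upperBound of
        Closed y -> (<= y)
        Open y -> (< y)
      isContained x = isHighEnough x && isLowEnough x
  in all isContained

-- Char is an Ord instance, and a String is a list of Char.
lowercase :: Bool
lowercase = (Closed 'a', Closed 'z') `contains` "haskell"

-- Double works too. Here 1.0 falls outside the half-open interval.
unitInterval :: Bool
unitInterval = (Closed 0.0, Open 1.0) `contains` [0.0, 0.5, 1.0 :: Double]

-- Any Foldable container will do, including Maybe.
justInside :: Bool
justInside = (Open 1, Closed 10) `contains` Just (10 :: Int)
```

<p>
Here <code>lowercase</code> and <code>justInside</code> evaluate to <code>True</code>, while <code>unitInterval</code> is <code>False</code> because the upper endpoint is open.
</p>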
<h3 id="67c8d29aeb5d4c2ca18b4a6664cf6af8">
All points <a href="#67c8d29aeb5d4c2ca18b4a6664cf6af8">#</a>
</h3>
<p>
The next function suggested by the kata is a function to enumerate all points in a range. There's only a single test case, so again I added some more:
</p>
<p>
<pre><span style="color:#a31515;">"getAllPoints"</span> ~: <span style="color:blue;">do</span>
  (r, expected) <-
    [
      ((Closed 2, Open 6), [2..5]),
      ((Closed 4, Open 8), [4..7]),
      ((Closed 2, Closed 6), [2..6]),
      ((Closed 4, Closed 8), [4..8]),
      (( Open 2, Closed 6), [3..6]),
      (( Open 4, Closed 8), [5..8]),
      (( Open 2, Open 6), [3..5]),
      (( Open 4, Open 8), [5..7])
    ]
  <span style="color:blue;">let</span> actual = allPoints r
  <span style="color:blue;">return</span> $ expected ~=? actual</pre>
</p>
<p>
Ultimately, after I'd implemented the <em>next</em> feature, I refactored the <code>allPoints</code> function to make use of it, and it became a simple one-liner:
</p>
<p>
<pre><span style="color:#2b91af;">allPoints</span> <span style="color:blue;">::</span> (<span style="color:blue;">Enum</span> a, <span style="color:blue;">Num</span> a) <span style="color:blue;">=></span> (<span style="color:blue;">Endpoint</span> a, <span style="color:blue;">Endpoint</span> a) <span style="color:blue;">-></span> [a]
allPoints = <span style="color:blue;">uncurry</span> <span style="color:blue;">enumFromTo</span> . endpoints</pre>
</p>
<p>
The <code>allPoints</code> function also enabled me to express the kata's <em>ContainsRange</em> test cases without introducing a new API:
</p>
<p>
<pre><span style="color:#a31515;">"ContainsRange"</span> ~: <span style="color:blue;">do</span>
  (r, candidate, expected) <-
    [
      ((Closed 2, Open 5), allPoints (Closed 7, Open 10), False),
      ((Closed 2, Open 5), allPoints (Closed 3, Open 10), False),
      ((Closed 3, Open 5), allPoints (Closed 2, Open 10), False),
      ((Closed 2, Open 10), allPoints (Closed 3, Closed 5), True),
      ((Closed 3, Closed 5), allPoints (Closed 3, Open 5), True)
    ]
  <span style="color:blue;">let</span> actual = r `contains` candidate
  <span style="color:blue;">return</span> $ expected ~=? actual</pre>
</p>
<p>
As I've already mentioned, the above implementation of <code>allPoints</code> is based on the next feature, <code>endpoints</code>.
</p>
<h3 id="a16cc4c45e614bb9a726c14ef19afc8f">
Endpoints <a href="#a16cc4c45e614bb9a726c14ef19afc8f">#</a>
</h3>
<p>
The kata also suggests a function to return the two endpoints of a range, as well as some test cases to describe it. Once more, I had to add more test cases to adequately describe the desired functionality:
</p>
<p>
<pre><span style="color:#a31515;">"endPoints"</span> ~: <span style="color:blue;">do</span>
  (r, expected) <-
    [
      ((Closed 2, Open 6), (2, 5)),
      ((Closed 1, Open 7), (1, 6)),
      ((Closed 2, Closed 6), (2, 6)),
      ((Closed 1, Closed 7), (1, 7)),
      (( Open 2, Open 6), (3, 5)),
      (( Open 1, Open 7), (2, 6)),
      (( Open 2, Closed 6), (3, 6)),
      (( Open 1, Closed 7), (2, 7))
    ]
  <span style="color:blue;">let</span> actual = endpoints r
  <span style="color:blue;">return</span> $ expected ~=? actual</pre>
</p>
<p>
The implementation is fairly trivial:
</p>
<p>
<pre><span style="color:#2b91af;">endpoints</span> <span style="color:blue;">::</span> (<span style="color:blue;">Num</span> a1, <span style="color:blue;">Num</span> a2) <span style="color:blue;">=></span> (<span style="color:blue;">Endpoint</span> a2, <span style="color:blue;">Endpoint</span> a1) <span style="color:blue;">-></span> (a2, a1)
endpoints (Closed x, Closed y) = (x , y)
endpoints (Closed x, Open y) = (x , y-1)
endpoints ( Open x, Closed y) = (x+1, y)
endpoints ( Open x, Open y) = (x+1, y-1)</pre>
</p>
<p>
One attractive quality of <a href="https://en.wikipedia.org/wiki/Algebraic_data_type">algebraic data types</a> is that the 'algebra' of the type(s) tell you how many cases you need to pattern-match against. Since I'm treating a range as a pair of <code>Endpoint</code> values, and since each <code>Endpoint</code> can be one of two cases (<code>Open</code> or <code>Closed</code>), there's exactly 2 * 2 = 4 possible combinations (since a tuple is a <a href="https://en.wikipedia.org/wiki/Product_type">product type</a>).
</p>
<p>
That fits with the number of pattern-matches required to implement the function.
</p>
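<p>
If you'd rather see the arithmetic than take it on faith, the four combinations can be enumerated. This is only a sketch: <code>endpointCases</code> and <code>rangeShapes</code> are hypothetical helpers that aren't part of the kata code.
</p>

```haskell
data Endpoint a = Open a | Closed a deriving (Eq, Show)

-- The two ways to turn a value into an endpoint (a sum type with two cases).
endpointCases :: [a -> Endpoint a]
endpointCases = [Open, Closed]

-- Every (lower, upper) combination for the given bounds: the Cartesian
-- product of the two cases with themselves, hence 2 * 2 = 4 results.
rangeShapes :: a -> a -> [(Endpoint a, Endpoint a)]
rangeShapes lo hi = [(l lo, u hi) | l <- endpointCases, u <- endpointCases]
```

<p>
For example, <code>rangeShapes 2 6</code> produces a list of four pairs, one for each clause in <code>endpoints</code>.
</p>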
<h3 id="c2b611a0c9ba494b9ccc30c5cd3ec4e8">
Overlapping ranges <a href="#c2b611a0c9ba494b9ccc30c5cd3ec4e8">#</a>
</h3>
<p>
The final interesting feature is a predicate to determine whether one range overlaps another. As has become a refrain by now, I didn't find the suggested test cases sufficient to describe the desired behaviour, so I had to add a few more:
</p>
<p>
<pre><span style="color:#a31515;">"overlapsRange"</span> ~: <span style="color:blue;">do</span>
  (r, candidate, expected) <-
    [
      ((Closed 2, Open 5), (Closed 7, Open 10), False),
      ((Closed 2, Open 10), (Closed 3, Open 5), True),
      ((Closed 3, Open 5), (Closed 3, Open 5), True),
      ((Closed 2, Open 5), (Closed 3, Open 10), True),
      ((Closed 3, Open 5), (Closed 2, Open 10), True),
      ((Closed 3, Open 5), (Closed 1, Open 3), False),
      ((Closed 3, Open 5), (Closed 5, Open 7), False)
    ]
  <span style="color:blue;">let</span> actual = r `overlaps` candidate
  <span style="color:blue;">return</span> $ expected ~=? actual</pre>
</p>
<p>
I'm not entirely happy with the implementation:
</p>
<p>
<pre><span style="color:#2b91af;">overlaps</span> <span style="color:blue;">::</span> (<span style="color:blue;">Ord</span> a1, <span style="color:blue;">Ord</span> a2) <span style="color:blue;">=></span>
  (Endpoint a1, Endpoint a2) -> (Endpoint a2, Endpoint a1) -> Bool
overlaps (l1, h1) (l2, h2) =
  <span style="color:blue;">let</span> less (Closed x) (Closed y) = x <= y
      less (Closed x) (Open y) = x < y
      less (Open x) (Closed y) = x < y
      less (Open x) (Open y) = x < y
  <span style="color:blue;">in</span> l1 `less` h2 && l2 `less` h1</pre>
</p>
<p>
Not that the code presented here is problematic in isolation, but if you compare it to the above <code>contains</code> function, there seems to be some repetition going on. It's not <em>quite</em> the same, but the code looks similar enough that it bothers me. I feel that some kind of abstraction is sitting there, right before my nose, mocking me because I can't see it. Still, the code isn't completely duplicated, and even if it was, I can always invoke the <a href="https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming)">rule of three</a> and let it remain as it is.
</p>
<p>
Which is ultimately what I did.
</p>
<h3 id="c09e7c817ecc48c097d6660a5438f5e0">
Equality <a href="#c09e7c817ecc48c097d6660a5438f5e0">#</a>
</h3>
<p>
The kata also suggests some test cases to verify that it's possible to compare two ranges for equality. Dutifully I added those test cases to the code base, even though I knew that they'd automatically pass.
</p>
<p>
<pre><span style="color:#a31515;">"Equals"</span> ~: <span style="color:blue;">do</span>
  (x, y, expected) <-
    [
      ((Closed 3, Open 5), (Closed 3, Open 5), True),
      ((Closed 2, Open 10), (Closed 3, Open 5), False),
      ((Closed 2, Open 5), (Closed 3, Open 10), False),
      ((Closed 3, Open 5), (Closed 2, Open 10), False)
    ]
  <span style="color:blue;">let</span> actual = x == y
  <span style="color:blue;">return</span> $ expected ~=? actual</pre>
</p>
<p>
In the beginning of this article, I called attention to C#'s regrettable lack of structural equality. Here's an example of what I mean. In Haskell, these tests automatically pass because <code>Endpoint</code> is an <code>Eq</code> instance (by declaration), and all pairs of <code>Eq</code> instances are themselves <code>Eq</code> instances. Simple, elegant, powerful.
</p>
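<p>
The following sketch shows how far that derived equality reaches. The <code>Range</code> type alias and the three example values are assumptions made for the illustration. The point is that anything built on <code>Eq</code>, such as <code>nub</code> from <code>Data.List</code>, works on ranges without extra code.
</p>

```haskell
import Data.List (nub)

data Endpoint a = Open a | Closed a deriving (Eq, Show)

-- Hypothetical alias for readability; the kata code uses plain pairs.
type Range a = (Endpoint a, Endpoint a)

-- Same constructors and same values compare equal.
sameRange :: Bool
sameRange = (Closed 3, Open 5) == ((Closed 3, Open 5) :: Range Int)

-- Same values but a different kind of endpoint compare unequal.
differentKind :: Bool
differentKind = (Closed 3, Open 5) /= ((Closed 3, Closed 5) :: Range Int)

-- nub only requires Eq, so duplicates can be removed from a list of ranges.
distinctRanges :: [Range Int]
distinctRanges = nub [(Closed 3, Open 5), (Closed 2, Open 10), (Closed 3, Open 5)]
```

<p>
Both Boolean values are <code>True</code>, and <code>distinctRanges</code> retains only the two distinct ranges.
</p>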
<h3 id="094e6dd4c07e40739d6fca4945dc7018">
Conclusion <a href="#094e6dd4c07e40739d6fca4945dc7018">#</a>
</h3>
<p>
As a first pass at the (admittedly uncomplicated) Range kata, I tried to follow the 'plan' implied by the kata description's test cases. I quickly became frustrated with their lack of completeness. They were adequate in indicating to a human (me) what the desired behaviour should be, but insufficient to satisfactorily describe the desired behaviour.
</p>
<p>
I could, of course, have stuck with only those test cases, and instead of employing the Devil's Advocate technique (which I actively tried to avoid) made an honest effort to implement the functionality.
</p>
<p>
The thing is, however, that <a href="/2023/03/20/on-trust-in-software-development">I don't trust myself</a>. At its essence, the Range kata is all about edge cases, which are where most bugs tend to lurk. Thus, these are exactly the cases that should be covered by tests.
</p>
<p>
Having made enough 'dumb' programming mistakes during my career, I didn't trust myself to be able to write correct implementations without more test coverage than originally suggested. That's the reason I added more tests.
</p>
<p>
On the other hand, I more than once speculated whether property-based testing would make this work easier. I decided to pursue that idea during my second pass at the kata.
</p>
<p>
<strong>Next:</strong> <a href="/2024/01/15/a-range-kata-implementation-in-f">A Range kata implementation in F#</a>.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="f768e9d0ec73603ff40542ae07a1a9bd">
<div class="comment-author"><a href="https://github.com/mormegil-cz">Petr Kadlec</a> <a href="#f768e9d0ec73603ff40542ae07a1a9bd">#</a></div>
<div class="comment-content">
<p>
I’d have another test case for the Equality function: <code>((Open 2, Open 6), (Closed 3, Closed 5), True)</code>. While it is nice Haskell provides (automatic) structural equality, I don’t think we want to say that the (2, 6) range (on integers!) is something else than the [3, 5] range.
</p>
<p>
But yes, this opens a can of worms: While (2, 6) = [3, 5] on integers, (2.0, 6.0) is obviously different than [3.0, 5.0] (on reals/Doubles/…). I have no idea: In Haskell, could you write an implementation of a function which would behave differently depending on whether the type argument belongs to a typeclass or not?
</p>
</div>
<div class="comment-date">2024-01-09 13:38 UTC</div>
</div>
<div class="comment" id="9d0f60b0a2654424b10d264cfd8b6c96">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#9d0f60b0a2654424b10d264cfd8b6c96">#</a></div>
<div class="comment-content">
<p>
Petr, thank you for writing. I don't think I'd add that (or similar) test cases, but it's a judgment call, and it's partly language-specific. What you're suggesting is to consider things that are <em>equivalent</em> equal. I agree that for integers this would be the case, but it wouldn't be for rational numbers, or floating points (or real numbers, if we had those in programming).
</p>
<p>
In Haskell it wouldn't really be idiomatic, because equality is defined by the <code>Eq</code> type class, and most types just go with the default implementation. What you suggest requires writing an explicit <code>Eq</code> instance for <code>Endpoint</code>. It'd be possible, but then you'd have to deal explicitly with the various integer representations separately from other representations that use floating points.
</p>
<p>
The distinction between <em>equivalence</em> and <em>equality</em> is largely artificial or a convenient hand wave. To explain what I mean, consider mathematical expressions. Obviously, <em>3 + 1</em> is equal to <em>2 + 2</em> when evaluated, but they're different <em>expressions</em>. Thus, on an expression level, we don't consider those two expressions equal. I think of the integer ranges <em>(2, 6)</em> and <em>[3, 5]</em> the same way. They evaluate to the same, but they aren't equal.
</p>
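<p>
To illustrate the distinction, here's a sketch that keeps the derived <code>Eq</code> instance and adds a separate, hypothetical <code>equivalent</code> function for integer ranges. It assumes the <code>endpoints</code> and <code>allPoints</code> functions from the article. The name <code>equivalent</code> and the idea of comparing enumerated points are only one possible interpretation of the suggestion.
</p>

```haskell
data Endpoint a = Open a | Closed a deriving (Eq, Show)

endpoints :: Num a => (Endpoint a, Endpoint a) -> (a, a)
endpoints (Closed x, Closed y) = (x, y)
endpoints (Closed x, Open y) = (x, y - 1)
endpoints (Open x, Closed y) = (x + 1, y)
endpoints (Open x, Open y) = (x + 1, y - 1)

allPoints :: (Enum a, Num a) => (Endpoint a, Endpoint a) -> [a]
allPoints = uncurry enumFromTo . endpoints

-- Hypothetical: two integer ranges are equivalent when they enumerate to
-- the same points. Deliberately restricted to Integer.
equivalent
  :: (Endpoint Integer, Endpoint Integer)
  -> (Endpoint Integer, Endpoint Integer)
  -> Bool
equivalent r1 r2 = allPoints r1 == allPoints r2
```

<p>
With these definitions, <code>(Open 2, Open 6) == (Closed 3, Closed 5)</code> is <code>False</code>, whereas <code>equivalent (Open 2, Open 6) (Closed 3, Closed 5)</code> is <code>True</code>. A side effect of this particular definition is that all empty ranges are equivalent to each other.
</p>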
<p>
I don't think that this is a strong argument, mind. In other programming languages, I might arrive at a different decision. It also matters what client code needs to <em>do</em> with the API. In any case, the decision to not consider <em>equivalence</em> the same as <em>equality</em> is congruent with how Haskell works.
</p>
<p>
The existence of floating points and rational numbers, however, opens another can of worms that I happily glossed over, since I had a completely different goal with the kata than producing a reusable library.
</p>
<p>
Haskell actually supports rational numbers with the <code>%</code> operator:
</p>
<p>
<pre>ghci> 1%2
1 % 2</pre>
</p>
<p>
This value represents ½, to be explicit.
</p>
<p>
Unfortunately, according to the specification (or, at least, <a href="https://hackage.haskell.org/package/base/docs/GHC-Enum.html#v:succ">the documentation</a>) of the <code>Enum</code> type class, the two 'movement' operations <code>succ</code> and <code>pred</code> jump by increments of <em>1</em>:
</p>
<p>
<pre>ghci> succ $ 1%2
3 % 2
ghci> succ $ succ $ 1%2
5 % 2</pre>
</p>
<p>
The same is the case with floating points:
</p>
<p>
<pre>ghci> succ 1.5
2.5
ghci> succ $ succ 1.5
3.5</pre>
</p>
<p>
This is unfortunate when it comes to floating points, since it would be possible to enumerate all floating points in a range. (For example, if a <a href="https://en.wikipedia.org/wiki/Single-precision_floating-point_format">single-precision floating point</a> occupies 32 bits, there's a finite number of them, and you can enumerate them.)
</p>
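<p>
As a rough sketch of what stepping through floating points one at a time could look like, the following relies on the IEEE 754 layout, where adjacent positive values have adjacent bit patterns. It uses <code>castFloatToWord32</code> and <code>castWord32ToFloat</code> from <code>GHC.Float</code>. The name <code>nextUp</code> is made up, and the function assumes a positive, finite input; zero, negative numbers, and NaN would need separate handling.
</p>

```haskell
import GHC.Float (castFloatToWord32, castWord32ToFloat)

-- Next representable single-precision value above the input.
-- Assumption: the input is positive and finite.
nextUp :: Float -> Float
nextUp = castWord32ToFloat . (+ 1) . castFloatToWord32
```

<p>
For instance, <code>nextUp 1</code> is the smallest <code>Float</code> greater than <em>1</em>, which is a much smaller step than the one <code>succ</code> takes.
</p>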
<p>
As <a href="https://twitter.com/sonatsuer/status/1744326173524394372">Sonat Süer points out</a>, this means that the <code>allPoints</code> function is fundamentally broken for floating points and rational numbers (and possibly other types as well).
</p>
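<p>
A short sketch makes the problem visible. It repeats the <code>endpoints</code> and <code>allPoints</code> functions from the article; the example values <code>doubles</code> and <code>rationals</code> are made up for the illustration.
</p>

```haskell
import Data.Ratio ((%))

data Endpoint a = Open a | Closed a deriving (Eq, Show)

endpoints :: Num a => (Endpoint a, Endpoint a) -> (a, a)
endpoints (Closed x, Closed y) = (x, y)
endpoints (Closed x, Open y) = (x, y - 1)
endpoints (Open x, Closed y) = (x + 1, y)
endpoints (Open x, Open y) = (x + 1, y - 1)

allPoints :: (Enum a, Num a) => (Endpoint a, Endpoint a) -> [a]
allPoints = uncurry enumFromTo . endpoints

-- The closed range [0.5, 2.0] over Double.
doubles :: [Double]
doubles = allPoints (Closed 0.5, Closed 2.0)

-- The closed range [1/2, 2] over Rational.
rationals :: [Rational]
rationals = allPoints (Closed (1 % 2), Closed 2)
```

<p>
Here <code>doubles</code> is <code>[0.5,1.5,2.5]</code> and <code>rationals</code> is <code>[1 % 2,3 % 2,5 % 2]</code>. Not only do both lists skip almost every value in the range, they also include a final element that lies <em>outside</em> it, because <code>enumFromTo</code> for fractional types keeps going until it passes the upper limit by more than ½.
</p>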
<p>
One way around that in Haskell would be to introduce a <em>new</em> type class for the purpose of truly enumerating ranges, and either implement it correctly for floating points, or explicitly avoid making <code>Float</code> and <code>Double</code> instances of that new type class. This, on the other hand, would have the downside that all of a sudden, the <code>allPoints</code> function wouldn't support any custom type of which I, as the implementer, am unaware.
</p>
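<p>
Such a type class might look like the following sketch. The class name <code>Discrete</code>, its <code>next</code> and <code>prev</code> methods, and the choice of instances are all hypothetical; none of it exists in the kata code base. The sketch also ignores what should happen when an open endpoint sits at the extreme of a bounded type such as <code>Int</code>.
</p>

```haskell
-- Hypothetical class for types whose values can be visited one at a time.
class Ord a => Discrete a where
  next :: a -> a
  prev :: a -> a

instance Discrete Integer where
  next = (+ 1)
  prev = subtract 1

instance Discrete Int where
  next = (+ 1)
  prev = subtract 1

-- Deliberately no instances for Float, Double, or Rational.

data Endpoint a = Open a | Closed a deriving (Eq, Show)

allPoints :: Discrete a => (Endpoint a, Endpoint a) -> [a]
allPoints (lowerBound, upperBound) = go lo
  where
    -- First and last point actually inside the range.
    lo = case lowerBound of
      Closed x -> x
      Open x -> next x
    hi = case upperBound of
      Closed y -> y
      Open y -> prev y
    -- Walk from lo to hi without stepping past hi.
    go x
      | x > hi = []
      | x == hi = [x]
      | otherwise = x : go (next x)
```

<p>
With this design, an expression like <code>allPoints (Closed 0.5, Closed 2.0)</code> no longer compiles, which is the point. The price is that client code must supply a <code>Discrete</code> instance for each custom type it wants to enumerate.
</p>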
<p>
If this was a library that I'd actually have to ship as a reusable API, I think I'd start by <em>not</em> including the <code>allPoints</code> function, and then see if anyone asks for it. If or when that happens, I'd begin a process to chart why people need it, and what could be done to serve those needs in a useful and mathematically consistent manner.
</p>
</div>
<div class="comment-date">2024-01-13 19:51 UTC</div>
</div>
</div>
<hr>
Variations of the Range katahttps://blog.ploeh.dk/2024/01/01/variations-of-the-range-kata2024-01-01T17:00:00+00:00Mark Seemann
<div id="post">
<p>
<em>In the languages I usually employ.</em>
</p>
<p>
The <a href="https://codingdojo.org/kata/Range/">Range kata</a> is succinct, bordering on the spartan in both description and requirements. To be honest, it's hardly the most inspiring kata available, and yet it may help showcase a few interesting points about software design in general. It's what it demonstrates about <a href="/2018/03/22/functors">functors</a> that makes it marginally interesting.
</p>
<p>
In this short article series I first cover a few incarnations of the kata in my usual programming languages, and then conclude by looking at <em>range</em> as a functor.
</p>
<p>
The article series contains the following articles:
</p>
<ul>
<li><a href="/2024/01/08/a-range-kata-implementation-in-haskell">A Range kata implementation in Haskell</a></li>
<li><a href="/2024/01/15/a-range-kata-implementation-in-f">A Range kata implementation in F#</a></li>
<li><a href="/2024/01/22/a-range-kata-implementation-in-c">A Range kata implementation in C#</a></li>
<li><a href="/2024/02/12/range-as-a-functor">Range as a functor</a></li>
</ul>
<p>
I didn't take the same approaches through all three exercises. An important point about <a href="/2020/01/13/on-doing-katas">doing katas</a> is to learn something, and when you've done the kata once, you've already gained some knowledge that can't easily be unlearned. Thus, on the second, or third time through, it's only natural to apply that knowledge, but then try different tactics to solve the problem in a different way. That's what I did here, starting with <a href="https://www.haskell.org/">Haskell</a>, proceeding with <a href="https://fsharp.org/">F#</a>, and concluding with C#.
</p>
<p>
<strong>Next:</strong> <a href="/2024/01/08/a-range-kata-implementation-in-haskell">A Range kata implementation in Haskell</a>.
</p>
</div>
<hr>
Serializing restaurant tables in C#https://blog.ploeh.dk/2023/12/25/serializing-restaurant-tables-in-c2023-12-25T11:42:00+00:00Mark Seemann
<div id="post">
<p>
<em>Using System.Text.Json, with and without Reflection.</em>
</p>
<p>
This article is part of a short series of articles about <a href="/2023/12/04/serialization-with-and-without-reflection">serialization with and without Reflection</a>. In this instalment I'll explore some options for serializing <a href="https://en.wikipedia.org/wiki/JSON">JSON</a> with C# using the API built into .NET: <a href="https://learn.microsoft.com/dotnet/api/system.text.json">System.Text.Json</a>. I'm not going to use <a href="https://www.newtonsoft.com/json">Json.NET</a> in this article, but I've <a href="/2022/01/03/to-id-or-not-to-id">done similar things with that library</a> in the past, so what's here is, at least, somewhat generalizable.
</p>
<p>
Since the API is the same, the only difference from <a href="/2023/12/18/serializing-restaurant-tables-in-f">the previous article</a> is the language syntax.
</p>
<h3 id="e949466d51f647bfbee9016d551d9b78">
Natural numbers <a href="#e949466d51f647bfbee9016d551d9b78">#</a>
</h3>
<p>
Before we start investigating how to serialize to and from JSON, we must have something to serialize. As described in the <a href="/2023/12/04/serialization-with-and-without-reflection">introductory article</a> we'd like to parse and write restaurant table configurations like this:
</p>
<p>
<pre>{
<span style="color:#2e75b6;">"singleTable"</span>: {
<span style="color:#2e75b6;">"capacity"</span>: 16,
<span style="color:#2e75b6;">"minimalReservation"</span>: 10
}
}</pre>
</p>
<p>
On the other hand, I'd like to represent the Domain Model in a way that <a href="/encapsulation-and-solid">encapsulates the rules</a> governing the model, <a href="https://blog.janestreet.com/effective-ml-video/">making illegal states unrepresentable</a>. Even though that's a catchphrase associated with functional programming, it applies equally well to a statically typed object-oriented language like C#.
</p>
<p>
As the first step, we observe that the numbers involved are all <a href="https://en.wikipedia.org/wiki/Natural_number">natural numbers</a>. In C# it's rarer to define <a href="https://www.hillelwayne.com/post/constructive/">predicative data types</a> than in a language like <a href="https://fsharp.org/">F#</a>, but people should do it more.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">readonly</span> <span style="color:blue;">struct</span> <span style="color:#2b91af;">NaturalNumber</span> : IEquatable<NaturalNumber>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">int</span> value;
<span style="color:blue;">public</span> <span style="color:#2b91af;">NaturalNumber</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">value</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (value < 1)
<span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> ArgumentOutOfRangeException(
nameof(value),
<span style="color:#a31515;">"Value must be a natural number greater than zero."</span>);
<span style="color:blue;">this</span>.value = value;
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> NaturalNumber? <span style="font-weight:bold;color:#74531f;">TryCreate</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">candidate</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (candidate < 1)
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> NaturalNumber(candidate);
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">bool</span> <span style="color:blue;">operator</span> <(NaturalNumber <span style="font-weight:bold;color:#1f377f;">left</span>, NaturalNumber <span style="font-weight:bold;color:#1f377f;">right</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> left.value < right.value;
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">bool</span> <span style="color:blue;">operator</span> >(NaturalNumber <span style="font-weight:bold;color:#1f377f;">left</span>, NaturalNumber <span style="font-weight:bold;color:#1f377f;">right</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> left.value > right.value;
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">bool</span> <span style="color:blue;">operator</span> <=(NaturalNumber <span style="font-weight:bold;color:#1f377f;">left</span>, NaturalNumber <span style="font-weight:bold;color:#1f377f;">right</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> left.value <= right.value;
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">bool</span> <span style="color:blue;">operator</span> >=(NaturalNumber <span style="font-weight:bold;color:#1f377f;">left</span>, NaturalNumber <span style="font-weight:bold;color:#1f377f;">right</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> left.value >= right.value;
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">bool</span> <span style="color:blue;">operator</span> ==(NaturalNumber <span style="font-weight:bold;color:#1f377f;">left</span>, NaturalNumber <span style="font-weight:bold;color:#1f377f;">right</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> left.value == right.value;
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">bool</span> <span style="color:blue;">operator</span> !=(NaturalNumber <span style="font-weight:bold;color:#1f377f;">left</span>, NaturalNumber <span style="font-weight:bold;color:#1f377f;">right</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> left.value != right.value;
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">explicit</span> <span style="color:blue;">operator</span> <span style="color:blue;">int</span>(NaturalNumber <span style="font-weight:bold;color:#1f377f;">number</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> number.value;
}
<span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">Equals</span>(<span style="color:blue;">object</span>? <span style="font-weight:bold;color:#1f377f;">obj</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> obj <span style="color:blue;">is</span> NaturalNumber <span style="font-weight:bold;color:#1f377f;">number</span> && Equals(number);
}
<span style="color:blue;">public</span> <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">Equals</span>(NaturalNumber <span style="font-weight:bold;color:#1f377f;">other</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> value == other.value;
}
<span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">int</span> <span style="font-weight:bold;color:#74531f;">GetHashCode</span>()
{
<span style="font-weight:bold;color:#8f08c4;">return</span> HashCode.Combine(value);
}
}</pre>
</p>
<p>
When comparing all that boilerplate code to the <a href="/2023/12/18/serializing-restaurant-tables-in-f">three lines required to achieve the same result in F#</a>, it seems, at first glance, understandable that C# developers rarely reach for that option. Still, <a href="/2018/09/17/typing-is-not-a-programming-bottleneck">typing is not a programming bottleneck</a>, and most of that code was generated by a combination of Visual Studio and <a href="https://github.com/features/copilot">GitHub Copilot</a>.
</p>
<p>
The <code>TryCreate</code> method may not be <em>strictly</em> necessary, but I consider it good practice to give client code a way to perform a fault-prone operation in a safe manner, without having to resort to a <code>try/catch</code> construct.
</p>
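<p>
To illustrate the difference for client code, here's a minimal sketch. The <code>NaturalNumber</code> shown is a trimmed-down copy of the type above, included only so that the snippet compiles on its own, and the <code>Seating</code> class with its <code>Describe</code> method is a hypothetical caller invented for this example:
</p>
<p>
<pre>using System;

// Trimmed-down copy of NaturalNumber, just enough to compile on its own
public readonly struct NaturalNumber
{
    private readonly int value;

    public NaturalNumber(int value)
    {
        if (value < 1)
            throw new ArgumentOutOfRangeException(nameof(value));
        this.value = value;
    }

    public static NaturalNumber? TryCreate(int candidate)
    {
        if (candidate < 1)
            return null;
        return new NaturalNumber(candidate);
    }

    public static explicit operator int(NaturalNumber number)
    {
        return number.value;
    }
}

public static class Seating
{
    // Hypothetical client code: branches on null, so no try/catch is required
    public static string Describe(int candidate)
    {
        var n = NaturalNumber.TryCreate(candidate);
        if (n is null)
            return "Not a natural number";
        return $"Seats {(int)n.Value}";
    }
}</pre>
</p>
<p>
With this sketch, <code>Seating.Describe(4)</code> returns <code>"Seats 4"</code>, while <code>Seating.Describe(0)</code> returns <code>"Not a natural number"</code> without any exception being thrown.
</p>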
<p>
That's it for natural numbers. 72 lines of code. Compare that to <a href="/2023/12/18/serializing-restaurant-tables-in-f">the F# implementation</a>, which required three lines of code. Syntax does matter.
</p>
<h3 id="542d19e6713f46d79cfc013fc577980a">
Domain Model <a href="#542d19e6713f46d79cfc013fc577980a">#</a>
</h3>
<p>
Modelling a restaurant table follows in the same vein. One invariant I would like to enforce is that for a 'single' table, the minimal reservation should be a <code>NaturalNumber</code> less than or equal to the table's capacity. It doesn't make sense to configure a table for four with a minimum reservation of six.
</p>
<p>
In the same spirit as above, then, define this type:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">readonly</span> <span style="color:blue;">struct</span> <span style="color:#2b91af;">Table</span>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> NaturalNumber capacity;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> NaturalNumber? minimalReservation;
<span style="color:blue;">private</span> <span style="color:#2b91af;">Table</span>(NaturalNumber <span style="font-weight:bold;color:#1f377f;">capacity</span>, NaturalNumber? <span style="font-weight:bold;color:#1f377f;">minimalReservation</span>)
{
<span style="color:blue;">this</span>.capacity = capacity;
<span style="color:blue;">this</span>.minimalReservation = minimalReservation;
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> Table? <span style="font-weight:bold;color:#74531f;">TryCreateSingle</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">capacity</span>, <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">minimalReservation</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">cap</span> = NaturalNumber.TryCreate(capacity);
<span style="font-weight:bold;color:#8f08c4;">if</span> (cap <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">min</span> = NaturalNumber.TryCreate(minimalReservation);
<span style="font-weight:bold;color:#8f08c4;">if</span> (min <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
<span style="font-weight:bold;color:#8f08c4;">if</span> (cap < min)
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> Table(cap.Value, min.Value);
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> Table? <span style="font-weight:bold;color:#74531f;">TryCreateCommunal</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">capacity</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">cap</span> = NaturalNumber.TryCreate(capacity);
<span style="font-weight:bold;color:#8f08c4;">if</span> (cap <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> Table(cap.Value, <span style="color:blue;">null</span>);
}
<span style="color:blue;">public</span> T <span style="font-weight:bold;color:#74531f;">Accept</span><<span style="color:#2b91af;">T</span>>(ITableVisitor<T> <span style="font-weight:bold;color:#1f377f;">visitor</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (minimalReservation <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="font-weight:bold;color:#8f08c4;">return</span> visitor.VisitCommunal(capacity);
<span style="font-weight:bold;color:#8f08c4;">else</span>
<span style="font-weight:bold;color:#8f08c4;">return</span> visitor.VisitSingle(capacity, minimalReservation.Value);
}
}</pre>
</p>
<p>
Here I've <a href="/2018/06/25/visitor-as-a-sum-type">Visitor-encoded</a> the <a href="https://en.wikipedia.org/wiki/Tagged_union">sum type</a> that <code>Table</code> is. It can either be a 'single' table or a communal table.
</p>
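<p>
The <code>ITableVisitor<T></code> interface isn't shown in this article. Judging from how <code>Accept</code> calls it, a minimal sketch could look like the following. The parameter name <code>minimalReservation</code> and the <code>TableDescriber</code> example are assumptions made for illustration, and the one-line <code>NaturalNumber</code> is only a stand-in for the full type shown earlier, so that the snippet compiles on its own:
</p>
<p>
<pre>// Stand-in only; the real NaturalNumber is the struct defined earlier
public readonly record struct NaturalNumber(int Value);

// Assumed shape of the Visitor interface, inferred from the Accept method
public interface ITableVisitor<T>
{
    T VisitCommunal(NaturalNumber capacity);
    T VisitSingle(NaturalNumber capacity, NaturalNumber minimalReservation);
}

// Hypothetical visitor that renders each case of the sum type as text
public sealed class TableDescriber : ITableVisitor<string>
{
    public string VisitCommunal(NaturalNumber capacity)
    {
        return $"Communal table for {capacity.Value}";
    }

    public string VisitSingle(
        NaturalNumber capacity,
        NaturalNumber minimalReservation)
    {
        return $"Single table for {minimalReservation.Value} to {capacity.Value}";
    }
}</pre>
</p>
<p>
Each method corresponds to one case of the sum type, so a new operation on <code>Table</code> is a new class that implements the interface, rather than a change to <code>Table</code> itself.
</p>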
<p>
Notice that <code>TryCreateSingle</code> checks the invariant that the <code>capacity</code> must be greater than or equal to the <code>minimalReservation</code>.
</p>
<p>
The point of this little exercise, so far, is that it <em>encapsulates</em> the contract implied by the Domain Model. It does this by using the static type system to its advantage.
</p>
<h3 id="13ac203cee494ec18420959fbad03003">
JSON serialization by hand <a href="#13ac203cee494ec18420959fbad03003">#</a>
</h3>
<p>
At the boundaries of applications, however, <a href="/2023/10/16/at-the-boundaries-static-types-are-illusory">there are no static types</a>. Is the static type system still useful in that situation?
</p>
<p>
For a long time, the most popular .NET library for JSON serialization was <a href="https://www.newtonsoft.com/json">Json.NET</a>, but these days I find the built-in API offered in the <a href="https://learn.microsoft.com/dotnet/api/system.text.json">System.Text.Json</a> namespace adequate. This is also the case here.
</p>
<p>
The original rationale for this article series was to demonstrate how serialization can be done without Reflection, so I'll start there and return to Reflection later.
</p>
<p>
In this article series, I consider the JSON format fixed. A single table should be rendered as shown above, and a communal table should be rendered like this:
</p>
<p>
<pre>{ <span style="color:#2e75b6;">"communalTable"</span>: { <span style="color:#2e75b6;">"capacity"</span>: 42 } }</pre>
</p>
<p>
Often in the real world you'll have to conform to a particular protocol format, or, even if that's not the case, being able to control the shape of the wire format is important to deal with backwards compatibility.
</p>
<p>
As I outlined in the <a href="/2023/12/04/serialization-with-and-without-reflection">introduction article</a> you can usually find a more weakly typed API to get the job done. For serializing <code>Table</code> to JSON it looks like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">string</span> <span style="font-weight:bold;color:#74531f;">Serialize</span>(<span style="color:blue;">this</span> Table <span style="font-weight:bold;color:#1f377f;">table</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> table.Accept(<span style="color:blue;">new</span> TableVisitor());
}
<span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">TableVisitor</span> : ITableVisitor<<span style="color:blue;">string</span>>
{
<span style="color:blue;">public</span> <span style="color:blue;">string</span> <span style="font-weight:bold;color:#74531f;">VisitCommunal</span>(NaturalNumber <span style="font-weight:bold;color:#1f377f;">capacity</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">j</span> = <span style="color:blue;">new</span> JsonObject
{
[<span style="color:#a31515;">"communalTable"</span>] = <span style="color:blue;">new</span> JsonObject
{
[<span style="color:#a31515;">"capacity"</span>] = (<span style="color:blue;">int</span>)capacity
}
};
<span style="font-weight:bold;color:#8f08c4;">return</span> j.ToJsonString();
}
<span style="color:blue;">public</span> <span style="color:blue;">string</span> <span style="font-weight:bold;color:#74531f;">VisitSingle</span>(NaturalNumber <span style="font-weight:bold;color:#1f377f;">capacity</span>, NaturalNumber <span style="font-weight:bold;color:#1f377f;">value</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">j</span> = <span style="color:blue;">new</span> JsonObject
{
[<span style="color:#a31515;">"singleTable"</span>] = <span style="color:blue;">new</span> JsonObject
{
[<span style="color:#a31515;">"capacity"</span>] = (<span style="color:blue;">int</span>)capacity,
[<span style="color:#a31515;">"minimalReservation"</span>] = (<span style="color:blue;">int</span>)value
}
};
<span style="font-weight:bold;color:#8f08c4;">return</span> j.ToJsonString();
}
}</pre>
</p>
<p>
In order to separate concerns, I've defined this functionality in a new static class that references the Domain Model. The <code>Serialize</code> extension method uses a <code>private</code> Visitor to write two different <a href="https://learn.microsoft.com/dotnet/api/system.text.json.nodes.jsonobject">JsonObject</a> objects, using the JSON API's underlying Document Object Model (DOM).
</p>
<h3 id="53c250b548c64292b2d704be12c91aa5">
JSON deserialization by hand <a href="#53c250b548c64292b2d704be12c91aa5">#</a>
</h3>
<p>
You can also go the other way, and when it looks more complicated, it's because it is. When serializing an encapsulated value, not a lot can go wrong because the value is already valid. When deserializing a JSON string, on the other hand, all sorts of things can go wrong: It might not even be a valid string, or the string may not be valid JSON, or the JSON may not be a valid <code>Table</code> representation, or the values may be illegal, etc.
</p>
<p>
Since there are several values that explicitly must be integers, it makes sense to define a helper method to try to parse an integer:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">static</span> <span style="color:blue;">int</span>? <span style="font-weight:bold;color:#74531f;">TryInt</span>(<span style="color:blue;">this</span> JsonNode? <span style="font-weight:bold;color:#1f377f;">node</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (node <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
<span style="font-weight:bold;color:#8f08c4;">if</span> (node.GetValueKind() != JsonValueKind.Number)
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
<span style="font-weight:bold;color:#8f08c4;">try</span>
{
<span style="font-weight:bold;color:#8f08c4;">return</span> (<span style="color:blue;">int</span>)node;
}
<span style="font-weight:bold;color:#8f08c4;">catch</span> (FormatException)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
}
}</pre>
</p>
<p>
I'm surprised that there's no built-in way to do that, but if there is, I couldn't find it.
</p>
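<p>
One candidate that may be worth trying is the <code>TryGetValue<T></code> method on <code>JsonValue</code>. The following is a sketch rather than a tested replacement for <code>TryInt</code>; the class name <code>JsonNodeExtensions</code> and the method name <code>TryGetInt</code> are made up for this example:
</p>
<p>
<pre>using System.Text.Json.Nodes;

public static class JsonNodeExtensions
{
    // Hypothetical alternative to TryInt: returns null unless the node
    // is a JSON value that can be read as an int.
    public static int? TryGetInt(this JsonNode? node)
    {
        if (node is JsonValue value && value.TryGetValue<int>(out var i))
            return i;
        return null;
    }
}</pre>
</p>
<p>
Objects and arrays aren't <code>JsonValue</code> instances, so they fall through to <code>null</code>, as does a missing node.
</p>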
<p>
With a helper method like that you can now implement the <code>Deserialize</code> method:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> Table? <span style="font-weight:bold;color:#74531f;">Deserialize</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">json</span>)
{
<span style="font-weight:bold;color:#8f08c4;">try</span>
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">node</span> = JsonNode.Parse(json);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">cnode</span> = node?[<span style="color:#a31515;">"communalTable"</span>];
<span style="font-weight:bold;color:#8f08c4;">if</span> (cnode <span style="color:blue;">is</span> { })
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">capacity</span> = cnode[<span style="color:#a31515;">"capacity"</span>].TryInt();
<span style="font-weight:bold;color:#8f08c4;">if</span> (capacity <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
<span style="font-weight:bold;color:#8f08c4;">return</span> Table.TryCreateCommunal(capacity.Value);
}
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">snode</span> = node?[<span style="color:#a31515;">"singleTable"</span>];
<span style="font-weight:bold;color:#8f08c4;">if</span> (snode <span style="color:blue;">is</span> { })
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">capacity</span> = snode[<span style="color:#a31515;">"capacity"</span>].TryInt();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">minimalReservation</span> = snode[<span style="color:#a31515;">"minimalReservation"</span>].TryInt();
<span style="font-weight:bold;color:#8f08c4;">if</span> (capacity <span style="color:blue;">is</span> <span style="color:blue;">null</span> || minimalReservation <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
<span style="font-weight:bold;color:#8f08c4;">return</span> Table.TryCreateSingle(
capacity.Value,
minimalReservation.Value);
}
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
}
<span style="font-weight:bold;color:#8f08c4;">catch</span> (JsonException)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
}
}</pre>
</p>
<p>
Since both serialization and deserialization are based on string values, you should write automated tests that verify that the code works, and in fact, I did. Here are a few examples:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">DeserializeSingleTableFor4</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">json</span> = <span style="color:#a31515;">"""{"singleTable":{"capacity":4,"minimalReservation":3}}"""</span>;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = TableJson.Deserialize(json);
Assert.Equal(Table.TryCreateSingle(4, 3), actual);
}
[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">DeserializeNonTable</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">json</span> = <span style="color:#a31515;">"""{"foo":42}"""</span>;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = TableJson.Deserialize(json);
Assert.Null(actual);
}</pre>
</p>
<p>
Apart from using directives and namespace declaration this hand-written JSON capability requires 87 lines of code, although, to be fair, <code>TryInt</code> is a general-purpose method that ought to be part of the <code>System.Text.Json</code> API. Can we do better with static types and Reflection?
</p>
<h3 id="fc6832fe72874427b32a0ee062d4fbf6">
JSON serialisation based on types <a href="#fc6832fe72874427b32a0ee062d4fbf6">#</a>
</h3>
<p>
The static <a href="https://learn.microsoft.com/dotnet/api/system.text.json.jsonserializer">JsonSerializer</a> class comes with <code>Serialize<T></code> and <code>Deserialize<T></code> methods that use Reflection to convert a statically typed object to and from JSON. You can define a type (a <a href="https://en.wikipedia.org/wiki/Data_transfer_object">Data Transfer Object</a> (DTO) if you will) and let Reflection do the hard work.
</p>
<p>
In <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a> I explain how you're usually better off separating the role of serialization from the role of Domain Model. One way to do that is to define a DTO for serialization, and let the Domain Model remain exclusively a model of the rules of the application. The above <code>Table</code> type plays the latter role, so we need new DTO types:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">TableDto</span>
{
<span style="color:blue;">public</span> CommunalTableDto? CommunalTable { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
<span style="color:blue;">public</span> SingleTableDto? SingleTable { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
}
<span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">CommunalTableDto</span>
{
<span style="color:blue;">public</span> <span style="color:blue;">int</span> Capacity { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
}
<span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">SingleTableDto</span>
{
<span style="color:blue;">public</span> <span style="color:blue;">int</span> Capacity { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">int</span> MinimalReservation { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
}</pre>
</p>
<p>
One way to model a <a href="https://en.wikipedia.org/wiki/Tagged_union">sum type</a> with a DTO is to declare both cases as nullable properties. While it does allow illegal states to be representable (i.e. both kinds of tables defined at the same time, or neither of them present) this is only par for the course at the application boundary.
</p>
<p>
While you can serialize values of that type, by default the generated JSON doesn't have the right format. Instead, a serialized communal table looks like this:
</p>
<p>
<pre>{
<span style="color:#2e75b6;">"CommunalTable"</span>: { <span style="color:#2e75b6;">"Capacity"</span>: 42 },
<span style="color:#2e75b6;">"SingleTable"</span>: <span style="color:blue;">null</span>
}</pre>
</p>
<p>
There are two problems with the generated JSON document:
</p>
<ul>
<li>The casing is wrong</li>
<li>The null value shouldn't be there</li>
</ul>
<p>
Neither of those is too hard to address, but it does make the API a bit more awkward to use, as this test demonstrates:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">SerializeCommunalTableViaReflection</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">dto</span> = <span style="color:blue;">new</span> TableDto
{
CommunalTable = <span style="color:blue;">new</span> CommunalTableDto { Capacity = 42 }
};
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = JsonSerializer.Serialize(
dto,
<span style="color:blue;">new</span> JsonSerializerOptions
{
PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull
});
Assert.Equal(<span style="color:#a31515;">"""{"communalTable":{"capacity":42}}"""</span>, actual);
}</pre>
</p>
<p>
You can, of course, define this particular serialization behaviour as a reusable method, so it's not a problem that you can't address. I just wanted to include this, since it's part of the overall work that you have to do in order to make this work.
</p>
<h3 id="213e3959eb49407ab3cdf59a4d2aed06">
JSON deserialisation based on types <a href="#213e3959eb49407ab3cdf59a4d2aed06">#</a>
</h3>
<p>
To allow parsing of JSON into the above DTO the Reflection-based <code>Deserialize</code> method pretty much works out of the box, although again, it needs to be configured. Here's a passing test that demonstrates how that works:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">DeserializeSingleTableViaReflection</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">json</span> = <span style="color:#a31515;">"""{"singleTable":{"capacity":4,"minimalReservation":2}}"""</span>;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = JsonSerializer.Deserialize<TableDto>(
json,
<span style="color:blue;">new</span> JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.CamelCase });
Assert.Null(actual?.CommunalTable);
Assert.Equal(4, actual?.SingleTable?.Capacity);
Assert.Equal(2, actual?.SingleTable?.MinimalReservation);
}</pre>
</p>
<p>
The only difference is in casing, so you'd expect the <code>Deserialize</code> method to be a <a href="https://martinfowler.com/bliki/TolerantReader.html">Tolerant Reader</a>, but no. It's very particular about that, so the <code>JsonNamingPolicy.CamelCase</code> configuration is necessary. Perhaps the API designers found that <a href="https://peps.python.org/pep-0020/">explicit is better than implicit</a>.
</p>
<p>
In any case, you could package that in a reusable <code>Deserialize</code> function that has all the options that are appropriate in a particular code context, so not a big deal. That takes care of actually writing and parsing JSON, but that's only half the battle. This only gives you a way to parse and serialize the DTO. What you ultimately want is to persist or rehydrate <code>Table</code> data.
</p>
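<p>
Both the reusable serialization method and the reusable <code>Deserialize</code> function mentioned above could be gathered in one place. A minimal sketch, assuming a hypothetical static class named <code>TableDtoJson</code>; the DTO types are repeated only so that the snippet compiles on its own, and returning <code>null</code> on malformed JSON is a design choice made here, not something the code base prescribes:
</p>
<p>
<pre>using System.Text.Json;
using System.Text.Json.Serialization;

// DTO types repeated from above so that the sketch is self-contained
public sealed class TableDto
{
    public CommunalTableDto? CommunalTable { get; set; }
    public SingleTableDto? SingleTable { get; set; }
}

public sealed class CommunalTableDto
{
    public int Capacity { get; set; }
}

public sealed class SingleTableDto
{
    public int Capacity { get; set; }
    public int MinimalReservation { get; set; }
}

// Hypothetical helper that keeps the configuration in one place
public static class TableDtoJson
{
    // Shared options: camelCase property names, and no null properties
    private static readonly JsonSerializerOptions options =
        new JsonSerializerOptions
        {
            PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
            DefaultIgnoreCondition = JsonIgnoreCondition.WhenWritingNull
        };

    public static string Serialize(TableDto dto)
    {
        return JsonSerializer.Serialize(dto, options);
    }

    // Assumption: malformed JSON is reported as null instead of an exception
    public static TableDto? Deserialize(string json)
    {
        try
        {
            return JsonSerializer.Deserialize<TableDto>(json, options);
        }
        catch (JsonException)
        {
            return null;
        }
    }
}</pre>
</p>
<p>
Test cases can then call <code>TableDtoJson.Serialize</code> and <code>TableDtoJson.Deserialize</code> without repeating the <code>JsonSerializerOptions</code> each time.
</p>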
<h3 id="ef08c05a3ae84141b2e8b20238af83df">
Converting DTO to Domain Model, and vice versa <a href="#ef08c05a3ae84141b2e8b20238af83df">#</a>
</h3>
<p>
As usual, converting a nice, encapsulated value to a more relaxed format is safe and trivial:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> TableDto <span style="font-weight:bold;color:#74531f;">ToDto</span>(<span style="color:blue;">this</span> Table <span style="font-weight:bold;color:#1f377f;">table</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> table.Accept(<span style="color:blue;">new</span> TableDtoVisitor());
}
<span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">TableDtoVisitor</span> : ITableVisitor<TableDto>
{
<span style="color:blue;">public</span> TableDto <span style="font-weight:bold;color:#74531f;">VisitCommunal</span>(NaturalNumber <span style="font-weight:bold;color:#1f377f;">capacity</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> TableDto
{
CommunalTable = <span style="color:blue;">new</span> CommunalTableDto
{
Capacity = (<span style="color:blue;">int</span>)capacity
}
};
}
<span style="color:blue;">public</span> TableDto <span style="font-weight:bold;color:#74531f;">VisitSingle</span>(
NaturalNumber <span style="font-weight:bold;color:#1f377f;">capacity</span>,
NaturalNumber <span style="font-weight:bold;color:#1f377f;">value</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> TableDto
{
SingleTable = <span style="color:blue;">new</span> SingleTableDto
{
Capacity = (<span style="color:blue;">int</span>)capacity,
MinimalReservation = (<span style="color:blue;">int</span>)value
}
};
}
}</pre>
</p>
<p>
Going the other way is <a href="https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/">fundamentally a parsing exercise</a>:
</p>
<p>
<pre><span style="color:blue;">public</span> Table? <span style="font-weight:bold;color:#74531f;">TryParse</span>()
{
    <span style="font-weight:bold;color:#8f08c4;">if</span> (CommunalTable <span style="color:blue;">is</span> { })
        <span style="font-weight:bold;color:#8f08c4;">return</span> Table.TryCreateCommunal(CommunalTable.Capacity);
    <span style="font-weight:bold;color:#8f08c4;">if</span> (SingleTable <span style="color:blue;">is</span> { })
        <span style="font-weight:bold;color:#8f08c4;">return</span> Table.TryCreateSingle(
            SingleTable.Capacity,
            SingleTable.MinimalReservation);

    <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
}</pre>
</p>
<p>
Here, like in <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a>, I've made that conversion an instance method on <code>TableDto</code>.
</p>
<p>
Such an operation may fail, so the result is a nullable <code>Table</code> object.
</p>
<p>
Let's take stock of the type-based alternative. It requires 58 lines of code, distributed over three DTO types and the two conversions <code>ToDto</code> and <code>TryParse</code>, but here I haven't counted configuration of <code>Serialize</code> and <code>Deserialize</code>, since I left that to each test case that I wrote. Since all of this code generally stays within 80 characters in line width, that would realistically add another 10 lines of code, for a total around 68 lines.
</p>
<p>
This is smaller than the DOM-based code, but not by much.
</p>
<h3 id="bfac4d5d5ca940a2a61f964c4336adcf">
Conclusion <a href="#bfac4d5d5ca940a2a61f964c4336adcf">#</a>
</h3>
<p>
In this article I've explored two alternatives for converting a well-encapsulated Domain Model to and from JSON. One option is to directly manipulate the DOM. Another option is to take a more declarative approach and define <em>types</em> that model the shape of the JSON data, and then leverage type-based automation (here, Reflection) to automatically parse and write the JSON.
</p>
<p>
I've deliberately chosen a Domain Model with some constraints, in order to demonstrate how persisting a non-trivial data model might work. With that setup, writing 'loosely coupled' code directly against the DOM requires 87 lines of code, while taking advantage of type-based automation requires 68 lines of code. Again, Reflection seems 'easier' if you count lines of code, but the difference is marginal.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="467ce74e29064c60bfa9559140710e51">
<div class="comment-author"><a href="https://blog.oakular.xyz">Callum Warrilow</a> <a href="#467ce74e29064c60bfa9559140710e51">#</a></div>
<div class="comment-content">
<p>
Great piece as ever Mark. Always enjoy reading about alternatives to methods that have become unquestioned convention.
</p>
<p>
I generally try to avoid reflection, especially within business code, and mainly use it for application bootstrapping, such as to discover services for dependency injection by convention. I also don't like attributes muddying model definitions, even on DTOs, so I would happily take an alternative to <code>System.Text.Json</code>. It is however increasingly integrated into other System libraries in ways that make it almost too useful to pass up. For example, the <code><a href="https://learn.microsoft.com/en-us/dotnet/api/system.net.http.httpcontent?view=net-8.0">System.Net.Http.HttpContent</a></code> class has the <code><a href="https://learn.microsoft.com/en-us/dotnet/api/system.net.http.json.httpcontentjsonextensions.readfromjsonasync?view=net-8.0">ReadFromJsonAsync</a></code> extension method, which makes it trivial to deserialize a response body. Analogous methods exist for <code><a href="https://learn.microsoft.com/en-us/dotnet/api/system.binarydata?view=dotnet-plat-ext-8.0">BinaryData</a></code>. I'm not normally a sucker for convenience, but it is difficult to turn down strong integration like this.
</p>
</div>
<div class="comment-date">2024-01-05 21:13 UTC</div>
</div>
<div class="comment" id="b9a5340eec9c45f49a438a37c7499520">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#b9a5340eec9c45f49a438a37c7499520">#</a></div>
<div class="comment-content">
<p>
Callum, thank you for writing. You are correct that the people who design and develop .NET put a lot of effort into making things convenient. Some of that convenience, however, comes with a price. You have to buy into a certain way of doing things, and that certain way can sometimes be at odds with other good software practices, such as the <a href="https://en.wikipedia.org/wiki/Dependency_inversion_principle">Dependency Inversion Principle</a> or test-driven development.
</p>
<p>
My goal with this (and other) article(s) isn't, however, to say that you mustn't take advantage of convenient integrations, but rather to highlight that alternatives exist.
</p>
<p>
The many 'convenient' ways that a framework gives you to solve various problems come with the risk that you may paint yourself into a corner, if you aren't careful. You've invested heavily in the framework's way of doing things, but there's just this small edge case that you can't get right. So you write a bit of custom code, after having figured out the correct extensibility point to hook into. Until the framework changes 'how things are done' in the next iteration.
</p>
<p>
This is what I call <a href="/2023/10/02/dependency-whac-a-mole">Framework Whac-A-Mole</a> - a syndrome that I'm becoming increasingly wary of the more experience I gain. Of the examples linked to in that article, <a href="/2022/08/15/aspnet-validation-revisited">ASP.NET validation revisited</a> may be the most relevant to this discussion.
</p>
<p>
As a final note, I'd be remiss if I entered into a discussion about programmer convenience without drawing on <a href="https://en.wikipedia.org/wiki/Rich_Hickey">Rich Hickey</a>'s excellent presentation <a href="https://www.infoq.com/presentations/Simple-Made-Easy/">Simple Made Easy</a>, where he goes to great lengths to distinguish between what is <em>easy</em> (i.e. close at hand) and what is <em>simple</em> (i.e. not complex). The sweet spot, of course, is the intersection, where things are both simple and easy.
</p>
<p>
Most 'convenient' framework features do not, in my opinion, check that box.
</p>
</div>
<div class="comment-date">2024-01-10 13:37 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Serializing restaurant tables in F#
https://blog.ploeh.dk/2023/12/18/serializing-restaurant-tables-in-f
2023-12-18T13:59:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Using System.Text.Json, with and without Reflection.</em>
</p>
<p>
This article is part of a short series of articles about <a href="/2023/12/04/serialization-with-and-without-reflection">serialization with and without Reflection</a>. In this instalment I'll explore some options for serializing <a href="https://en.wikipedia.org/wiki/JSON">JSON</a> with <a href="https://fsharp.org/">F#</a> using the API built into .NET: <a href="https://learn.microsoft.com/dotnet/api/system.text.json">System.Text.Json</a>. I'm not going to use <a href="https://www.newtonsoft.com/json">Json.NET</a> in this article, but I've <a href="/2022/01/03/to-id-or-not-to-id">done similar things with that library</a> in the past, so what's here is, at least, somewhat generalizable.
</p>
<h3 id="64e7a2f8c5634026ae4ffd1497dd58f9">
Natural numbers <a href="#64e7a2f8c5634026ae4ffd1497dd58f9">#</a>
</h3>
<p>
Before we start investigating how to serialize to and from JSON, we must have something to serialize. As described in the <a href="/2023/12/04/serialization-with-and-without-reflection">introductory article</a> we'd like to parse and write restaurant table configurations like this:
</p>
<p>
<pre>{
  <span style="color:#2e75b6;">"singleTable"</span>: {
    <span style="color:#2e75b6;">"capacity"</span>: 16,
    <span style="color:#2e75b6;">"minimalReservation"</span>: 10
  }
}</pre>
</p>
<p>
On the other hand, I'd like to represent the Domain Model in a way that <a href="/2022/10/24/encapsulation-in-functional-programming">encapsulates the rules</a> governing the model, <a href="https://blog.janestreet.com/effective-ml-video/">making illegal states unrepresentable</a>.
</p>
<p>
As the first step, we observe that the numbers involved are all <a href="https://en.wikipedia.org/wiki/Natural_number">natural numbers</a>. In F# it's both <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> and easy to define a <a href="https://www.hillelwayne.com/post/constructive/">predicative data type</a>:
</p>
<p>
<pre><span style="color:blue;">type</span> NaturalNumber = <span style="color:blue;">private</span> NaturalNumber <span style="color:blue;">of</span> int</pre>
</p>
<p>
Since it's defined with a <code>private</code> constructor we need to also supply a way to create valid values of the type:
</p>
<p>
<pre><span style="color:blue;">module</span> NaturalNumber =
    <span style="color:blue;">let</span> tryCreate n = <span style="color:blue;">if</span> n < 1 <span style="color:blue;">then</span> None <span style="color:blue;">else</span> Some (NaturalNumber n)</pre>
</p>
<p>
In this, as well as the other articles in this series, I've chosen to model the potential for errors with <code>Option</code> values. I could also have chosen to use <code>Result</code> if I wanted to communicate information along the 'error channel', but sticking with <code>Option</code> makes the code a bit simpler. Not so much in F# or <a href="https://www.haskell.org/">Haskell</a>, but once we reach C#, <a href="/2022/07/25/an-applicative-reservation-validation-example-in-c">applicative validation</a> becomes complicated.
</p>
<p>
There's no loss of generality in this decision, since both <code>Option</code> and <code>Result</code> are <a href="/2018/10/01/applicative-functors">applicative functors</a>.
</p>
<p>
<pre>> NaturalNumber.tryCreate -1;;
val it: NaturalNumber option = None
> <span style="color:blue;">let</span> x = NaturalNumber.tryCreate 42;;
val x: NaturalNumber option = Some NaturalNumber 42</pre>
</p>
<p>
The <code>tryCreate</code> function enables client developers to create <code>NaturalNumber</code> values, and due to F#'s default equality and comparison implementation, you can even compare them:
</p>
<p>
<pre>> <span style="color:blue;">let</span> y = NaturalNumber.tryCreate 2112;;
val y: NaturalNumber option = Some NaturalNumber 2112
> x < y;;
val it: bool = true</pre>
</p>
<p>
That's it for natural numbers. Three lines of code. Compare that to <a href="/2023/12/11/serializing-restaurant-tables-in-haskell">the Haskell implementation</a>, which required eight lines of code. This is mostly due to F#'s <code>private</code> keyword, which Haskell doesn't have.
</p>
<h3 id="8957ab6a606a4279a654e040d0788051">
Domain Model <a href="#8957ab6a606a4279a654e040d0788051">#</a>
</h3>
<p>
Modelling a restaurant table follows in the same vein. One invariant I would like to enforce is that for a 'single' table, the minimal reservation should be a <code>NaturalNumber</code> less than or equal to the table's capacity. It doesn't make sense to configure a table for four with a minimum reservation of six.
</p>
<p>
In the same spirit as above, then, define this type:
</p>
<p>
<pre><span style="color:blue;">type</span> Table =
    <span style="color:blue;">private</span>
    | SingleTable <span style="color:blue;">of</span> NaturalNumber * NaturalNumber
    | CommunalTable <span style="color:blue;">of</span> NaturalNumber</pre>
</p>
<p>
Once more the <code>private</code> keyword makes it impossible for client code to create instances directly, so we need a pair of functions to create values:
</p>
<p>
<pre><span style="color:blue;">module</span> Table =
    <span style="color:blue;">let</span> trySingle capacity minimalReservation = option {
        <span style="color:blue;">let!</span> cap = NaturalNumber.tryCreate capacity
        <span style="color:blue;">let!</span> min = NaturalNumber.tryCreate minimalReservation
        <span style="color:blue;">if</span> cap < min <span style="color:blue;">then</span> <span style="color:blue;">return!</span> None
        <span style="color:blue;">else</span> <span style="color:blue;">return</span> SingleTable (cap, min) }

    <span style="color:blue;">let</span> tryCommunal = NaturalNumber.tryCreate >> Option.map CommunalTable</pre>
</p>
<p>
Notice that <code>trySingle</code> checks the invariant that the <code>capacity</code> must be greater than or equal to the <code>minimalReservation</code>.
</p>
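<p>
The <code>option</code> computation expression used in <code>trySingle</code> isn't part of FSharp.Core, so it has to come from a library or a small helper. This article doesn't show which one is in play, so what follows is only a minimal sketch, assuming a hand-rolled builder (the name <code>OptionBuilder</code> is hypothetical) that supports just the <code>let!</code>, <code>return</code>, and <code>return!</code> forms that appear above:
</p>
<p>
<pre>// Hypothetical helper; an assumption, not code from the article's repository.
type OptionBuilder () =
    // let! x = m: continue with the wrapped value, or short-circuit on None.
    member _.Bind (m : 'a option, f : 'a -> 'b option) : 'b option =
        Option.bind f m
    // return x: wrap a plain value.
    member _.Return (x : 'a) : 'a option = Some x
    // return! m: pass an existing option value through unchanged.
    member _.ReturnFrom (m : 'a option) : 'a option = m

let option = OptionBuilder ()</pre>
</p>
<p>
With such a builder in scope, any <code>let!</code>-bound <code>None</code> makes the entire expression <code>None</code>, which is exactly the behaviour <code>trySingle</code> relies on.
</p>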
<p>
Again, notice how much easier it is to define a predicative type in F#, compared to Haskell.
</p>
<p>
This isn't a competition between languages, and while F# certainly scores a couple of points here, Haskell has other advantages.
</p>
<p>
The point of this little exercise, so far, is that it <em>encapsulates</em> the contract implied by the Domain Model. It does this by using the static type system to its advantage.
</p>
<h3 id="560a74686ec64d36858a893c7c63cbb4">
JSON serialization by hand <a href="#560a74686ec64d36858a893c7c63cbb4">#</a>
</h3>
<p>
At the boundaries of applications, however, <a href="/2023/10/16/at-the-boundaries-static-types-are-illusory">there are no static types</a>. Is the static type system still useful in that situation?
</p>
<p>
For a long time, the most popular .NET library for JSON serialization was <a href="https://www.newtonsoft.com/json">Json.NET</a>, but these days I find the built-in API offered in the <a href="https://learn.microsoft.com/dotnet/api/system.text.json">System.Text.Json</a> namespace adequate. This is also the case here.
</p>
<p>
The original rationale for this article series was to demonstrate how serialization can be done without Reflection, so I'll start there and return to Reflection later.
</p>
<p>
In this article series, I consider the JSON format fixed. A single table should be rendered as shown above, and a communal table should be rendered like this:
</p>
<p>
<pre>{ <span style="color:#2e75b6;">"communalTable"</span>: { <span style="color:#2e75b6;">"capacity"</span>: 42 } }</pre>
</p>
<p>
Often in the real world you'll have to conform to a particular protocol format, or, even if that's not the case, being able to control the shape of the wire format is important to deal with backwards compatibility.
</p>
<p>
As I outlined in the <a href="/2023/12/04/serialization-with-and-without-reflection">introduction article</a> you can usually find a more weakly typed API to get the job done. For serializing <code>Table</code> to JSON it looks like this:
</p>
<p>
<pre><span style="color:blue;">let</span> serializeTable = <span style="color:blue;">function</span>
    | SingleTable (NaturalNumber capacity, NaturalNumber minimalReservation) <span style="color:blue;">-></span>
        <span style="color:blue;">let</span> j = JsonObject ()
        j[<span style="color:#a31515;">"singleTable"</span>] <span style="color:blue;"><-</span> JsonObject ()
        j[<span style="color:#a31515;">"singleTable"</span>][<span style="color:#a31515;">"capacity"</span>] <span style="color:blue;"><-</span> capacity
        j[<span style="color:#a31515;">"singleTable"</span>][<span style="color:#a31515;">"minimalReservation"</span>] <span style="color:blue;"><-</span> minimalReservation
        j.ToJsonString ()
    | CommunalTable (NaturalNumber capacity) <span style="color:blue;">-></span>
        <span style="color:blue;">let</span> j = JsonObject ()
        j[<span style="color:#a31515;">"communalTable"</span>] <span style="color:blue;"><-</span> JsonObject ()
        j[<span style="color:#a31515;">"communalTable"</span>][<span style="color:#a31515;">"capacity"</span>] <span style="color:blue;"><-</span> capacity
        j.ToJsonString ()</pre>
</p>
<p>
In order to separate concerns, I've defined this functionality in a new module that references the module that defines the Domain Model. The <code>serializeTable</code> function pattern-matches on <code>SingleTable</code> and <code>CommunalTable</code> to write two different <a href="https://learn.microsoft.com/dotnet/api/system.text.json.nodes.jsonobject">JsonObject</a> objects, using the JSON API's underlying Document Object Model (DOM).
</p>
<h3 id="7fb5251937ac4f86b016f7c782db7680">
JSON deserialization by hand <a href="#7fb5251937ac4f86b016f7c782db7680">#</a>
</h3>
<p>
You can also go the other way, and when it looks more complicated, it's because it is. When serializing an encapsulated value, not a lot can go wrong because the value is already valid. When deserializing a JSON string, on the other hand, all sorts of things can go wrong: It might not even be a valid string, or the string may not be valid JSON, or the JSON may not be a valid <code>Table</code> representation, or the values may be illegal, etc.
</p>
<p>
Here I found it appropriate to first define a small API of parsing functions, mostly in order to make the object-oriented API more composable. First, I need some code that looks at the root JSON object to determine which kind of table it is (if it's a table at all). I found it appropriate to do that as a pair of <a href="https://learn.microsoft.com/dotnet/fsharp/language-reference/active-patterns">active patterns</a>:
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:blue;">private</span> (|Single|_|) (node : JsonNode) =
    <span style="color:blue;">match</span> node[<span style="color:#a31515;">"singleTable"</span>] <span style="color:blue;">with</span>
    | <span style="color:blue;">null</span> <span style="color:blue;">-></span> None
    | tn <span style="color:blue;">-></span> Some tn

<span style="color:blue;">let</span> <span style="color:blue;">private</span> (|Communal|_|) (node : JsonNode) =
    <span style="color:blue;">match</span> node[<span style="color:#a31515;">"communalTable"</span>] <span style="color:blue;">with</span>
    | <span style="color:blue;">null</span> <span style="color:blue;">-></span> None
    | tn <span style="color:blue;">-></span> Some tn</pre>
</p>
<p>
It turned out that I also needed a function to even check if a string is a valid JSON document:
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:blue;">private</span> tryParseJson (candidate : string) =
    <span style="color:blue;">try</span> JsonNode.Parse candidate |> Some
    <span style="color:blue;">with</span> | :? System.Text.Json.JsonException <span style="color:blue;">-></span> None</pre>
</p>
<p>
If there's a way to do that without a <code>try/with</code> expression, I couldn't find it. Likewise, trying to parse an integer turns out to be surprisingly complicated:
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:blue;">private</span> tryParseInt (node : JsonNode) =
    <span style="color:blue;">match</span> node <span style="color:blue;">with</span>
    | <span style="color:blue;">null</span> <span style="color:blue;">-></span> None
    | _ <span style="color:blue;">-></span>
        <span style="color:blue;">if</span> node.GetValueKind () = JsonValueKind.Number
        <span style="color:blue;">then</span>
            <span style="color:blue;">try</span> node |> int |> Some
            <span style="color:blue;">with</span> | :? FormatException <span style="color:blue;">-></span> None <span style="color:green;">// Thrown on decimal numbers</span>
        <span style="color:blue;">else</span> None</pre>
</p>
<p>
Both <code>tryParseJson</code> and <code>tryParseInt</code> are, however, general-purpose functions, so if you have a lot of JSON you need to parse, you can put them in a reusable library.
</p>
<p>
With those building blocks you can now define a function to parse a <code>Table</code>:
</p>
<p>
<pre><span style="color:blue;">let</span> tryDeserializeTable (candidate : string) =
    <span style="color:blue;">match</span> tryParseJson candidate <span style="color:blue;">with</span>
    | Some (Single node) <span style="color:blue;">-></span> option {
        <span style="color:blue;">let!</span> capacity = node[<span style="color:#a31515;">"capacity"</span>] |> tryParseInt
        <span style="color:blue;">let!</span> minimalReservation = node[<span style="color:#a31515;">"minimalReservation"</span>] |> tryParseInt
        <span style="color:blue;">return!</span> Table.trySingle capacity minimalReservation }
    | Some (Communal node) <span style="color:blue;">-></span> option {
        <span style="color:blue;">let!</span> capacity = node[<span style="color:#a31515;">"capacity"</span>] |> tryParseInt
        <span style="color:blue;">return!</span> Table.tryCommunal capacity }
    | _ <span style="color:blue;">-></span> None</pre>
</p>
<p>
Since both serialisation and deserialization are based on string values, you should write automated tests that verify that the code works, and in fact, I did. Here are a few examples:
</p>
<p>
<pre>[<Fact>]
<span style="color:blue;">let</span> ``Deserialize single table for 4`` () =
    <span style="color:blue;">let</span> json = <span style="color:#a31515;">"""{"singleTable":{"capacity":4,"minimalReservation":3}}"""</span>
    <span style="color:blue;">let</span> actual = tryDeserializeTable json
    Table.trySingle 4 3 =! actual

[<Fact>]
<span style="color:blue;">let</span> ``Deserialize non-table`` () =
    <span style="color:blue;">let</span> json = <span style="color:#a31515;">"""{"foo":42}"""</span>
    <span style="color:blue;">let</span> actual = tryDeserializeTable json
    None =! actual</pre>
</p>
<p>
Apart from module declaration and imports etc. this hand-written JSON capability requires 46 lines of code, although, to be fair, some of that code (<code>tryParseJson</code> and <code>tryParseInt</code>) are general-purpose functions that belong in a reusable library. Can we do better with static types and Reflection?
</p>
<h3 id="08491ec39df4485e83cd0b5cf80cdb7e">
JSON serialisation based on types <a href="#08491ec39df4485e83cd0b5cf80cdb7e">#</a>
</h3>
<p>
The static <a href="https://learn.microsoft.com/dotnet/api/system.text.json.jsonserializer">JsonSerializer</a> class comes with <code>Serialize<T></code> and <code>Deserialize<T></code> methods that use Reflection to convert a statically typed object to and from JSON. You can define a type (a <a href="https://en.wikipedia.org/wiki/Data_transfer_object">Data Transfer Object</a> (DTO) if you will) and let Reflection do the hard work.
</p>
<p>
In <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a> I explain how you're usually better off separating the role of serialization from the role of Domain Model. One way to do that is to define a DTO for serialisation, and let the Domain Model exclusively model the rules of the application. The above <code>Table</code> type plays the latter role, so we need new DTO types:
</p>
<p>
<pre><span style="color:blue;">type</span> CommunalTableDto = { Capacity : int }
<span style="color:blue;">type</span> SingleTableDto = { Capacity : int; MinimalReservation : int }
<span style="color:blue;">type</span> TableDto = {
    CommunalTable : CommunalTableDto option
    SingleTable : SingleTableDto option }</pre>
</p>
<p>
One way to model a <a href="https://en.wikipedia.org/wiki/Tagged_union">sum type</a> with a DTO is to declare both cases as <code>option</code> fields. While it does allow illegal states to be representable (i.e. both kinds of tables defined at the same time, or neither of them present) this is only par for the course at the application boundary.
</p>
<p>
While you can serialize values of that type, by default the generated JSON doesn't have the right format:
</p>
<p>
<pre>> val dto: TableDto = { CommunalTable = Some { Capacity = 42 }
SingleTable = None }
> JsonSerializer.Serialize dto;;
val it: string = "{"CommunalTable":{"Capacity":42},"SingleTable":null}"</pre>
</p>
<p>
There are two problems with the generated JSON document:
</p>
<ul>
<li>The casing is wrong</li>
<li>The null value shouldn't be there</li>
</ul>
<p>
Neither of those is too hard to address, but doing so does make the API a bit more awkward to use, as this test demonstrates:
</p>
<p>
<pre>[<Fact>]
<span style="color:blue;">let</span> ``Serialize communal table via reflection`` () =
    <span style="color:blue;">let</span> dto = { CommunalTable = Some { Capacity = 42 }; SingleTable = None }
    <span style="color:blue;">let</span> actual =
        JsonSerializer.Serialize (
            dto,
            JsonSerializerOptions (
                PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
                IgnoreNullValues = <span style="color:blue;">true</span> ))
    <span style="color:#a31515;">"""{"communalTable":{"capacity":42}}"""</span> =! actual</pre>
</p>
<p>
You can, of course, define this particular serialization behaviour as a reusable function, so it's not a problem that you can't address. I just wanted to include this, since it's part of the overall work that you have to do in order to make this work.
</p>
<h3 id="a215bb56b64446afbe2ca6861f724126">
JSON deserialisation based on types <a href="#a215bb56b64446afbe2ca6861f724126">#</a>
</h3>
<p>
To allow parsing of JSON into the above DTO the Reflection-based <code>Deserialize</code> method pretty much works out of the box, although again, it needs to be configured. Here's a passing test that demonstrates how that works:
</p>
<p>
<pre>[<Fact>]
<span style="color:blue;">let</span> ``Deserialize single table via reflection`` () =
    <span style="color:blue;">let</span> json = <span style="color:#a31515;">"""{"singleTable":{"capacity":4,"minimalReservation":2}}"""</span>
    <span style="color:blue;">let</span> actual =
        JsonSerializer.Deserialize<TableDto> (
            json,
            JsonSerializerOptions ( PropertyNamingPolicy = JsonNamingPolicy.CamelCase ))
    {
        CommunalTable = None
        SingleTable = Some { Capacity = 4; MinimalReservation = 2 }
    } =! actual</pre>
</p>
<p>
There's only a difference in casing, so you'd expect the <code>Deserialize</code> method to be a <a href="https://martinfowler.com/bliki/TolerantReader.html">Tolerant Reader</a>, but no. It's very particular about that, so the <code>JsonNamingPolicy.CamelCase</code> configuration is necessary. Perhaps the API designers found that <a href="https://peps.python.org/pep-0020/">explicit is better than implicit</a>.
</p>
<p>
In any case, you could package that in a reusable <code>Deserialize</code> function that has all the options that are appropriate in a particular code context, so not a big deal. That takes care of actually writing and parsing JSON, but that's only half the battle. This only gives you a way to parse and serialize the DTO. What you ultimately want is to persist or dehydrate <code>Table</code> data.
</p>
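<p>
As a rough sketch of the reusable functions alluded to above, the options from the two tests could be collected in one place. The names <code>writeOptions</code>, <code>readOptions</code>, <code>serializeTableDto</code>, and <code>deserializeTableDto</code> are invented for this illustration; the option values are the ones the tests in this article use, and the DTO types are repeated only so that the snippet stands alone:
</p>
<p>
<pre>open System.Text.Json

type CommunalTableDto = { Capacity : int }
type SingleTableDto = { Capacity : int; MinimalReservation : int }
type TableDto = {
    CommunalTable : CommunalTableDto option
    SingleTable : SingleTableDto option }

// Hypothetical names. Same settings as the serialization test above:
// camelCase property names, and no properties written for null (None) values.
let writeOptions =
    JsonSerializerOptions (
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        IgnoreNullValues = true )

// Same settings as the deserialization test above.
let readOptions =
    JsonSerializerOptions ( PropertyNamingPolicy = JsonNamingPolicy.CamelCase )

let serializeTableDto (dto : TableDto) =
    JsonSerializer.Serialize (dto, writeOptions)

let deserializeTableDto (json : string) =
    JsonSerializer.Deserialize<TableDto> (json, readOptions)</pre>
</p>
<p>
The explicit <code>TableDto</code> annotations keep the sketch close to the calls already shown in the tests; a generic version is possible, but isn't attempted here.
</p>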
<h3 id="8faed9f5f0b149d68ec5e1a457046e59">
Converting DTO to Domain Model, and vice versa <a href="#8faed9f5f0b149d68ec5e1a457046e59">#</a>
</h3>
<p>
As usual, converting a nice, encapsulated value to a more relaxed format is safe and trivial:
</p>
<p>
<pre><span style="color:blue;">let</span> toTableDto = <span style="color:blue;">function</span>
    | SingleTable (NaturalNumber capacity, NaturalNumber minimalReservation) <span style="color:blue;">-></span>
        {
            CommunalTable = None
            SingleTable =
                Some
                    {
                        Capacity = capacity
                        MinimalReservation = minimalReservation
                    }
        }
    | CommunalTable (NaturalNumber capacity) <span style="color:blue;">-></span>
        { CommunalTable = Some { Capacity = capacity }; SingleTable = None }</pre>
</p>
<p>
Going the other way is <a href="https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/">fundamentally a parsing exercise</a>:
</p>
<p>
<pre><span style="color:blue;">let</span> tryParseTableDto candidate =
    <span style="color:blue;">match</span> candidate.CommunalTable, candidate.SingleTable <span style="color:blue;">with</span>
    | Some { Capacity = capacity }, None <span style="color:blue;">-></span> Table.tryCommunal capacity
    | None, Some { Capacity = capacity; MinimalReservation = minimalReservation } <span style="color:blue;">-></span>
        Table.trySingle capacity minimalReservation
    | _ <span style="color:blue;">-></span> None</pre>
</p>
<p>
Such an operation may fail, so the result is a <code>Table option</code>. It could also have been a <code>Result<Table, 'something></code>, if you wanted to return information about errors when things go wrong. It makes the code marginally more complex, but doesn't change the overall thrust of this exploration.
</p>
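<p>
To give an impression of what 'marginally more complex' might look like, here's a sketch of a <code>Result</code>-based variation. The <code>TableError</code> type and its cases are invented for this illustration and aren't part of the article's code base. The Domain Model and DTO types are repeated, with <code>Result</code> in place of <code>Option</code>, only so that the snippet stands alone:
</p>
<p>
<pre>// Hypothetical error type; an assumption made for this sketch.
type TableError =
    | NonPositiveNumber of int
    | ReservationExceedsCapacity
    | NoSingleTableKind

type NaturalNumber = private NaturalNumber of int

module NaturalNumber =
    let tryCreate n =
        if n < 1 then Error (NonPositiveNumber n) else Ok (NaturalNumber n)

type Table =
    private
    | SingleTable of NaturalNumber * NaturalNumber
    | CommunalTable of NaturalNumber

module Table =
    // Same invariant as before, but each failure now carries a reason.
    let trySingle capacity minimalReservation =
        NaturalNumber.tryCreate capacity
        |> Result.bind (fun cap ->
            NaturalNumber.tryCreate minimalReservation
            |> Result.bind (fun minimum ->
                if cap < minimum
                then Error ReservationExceedsCapacity
                else Ok (SingleTable (cap, minimum))))

    let tryCommunal = NaturalNumber.tryCreate >> Result.map CommunalTable

type CommunalTableDto = { Capacity : int }
type SingleTableDto = { Capacity : int; MinimalReservation : int }
type TableDto = {
    CommunalTable : CommunalTableDto option
    SingleTable : SingleTableDto option }

let tryParseTableDto (candidate : TableDto) =
    match candidate.CommunalTable, candidate.SingleTable with
    | Some c, None -> Table.tryCommunal c.Capacity
    | None, Some s -> Table.trySingle s.Capacity s.MinimalReservation
    // Both kinds present, or neither: there's no single table to parse.
    | _ -> Error NoSingleTableKind</pre>
</p>
<p>
The parser itself hardly changes; most of the extra code is the error type and the explicit <code>Result.bind</code> calls in <code>trySingle</code>.
</p>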
<p>
Ironically, while <code>tryParseTableDto</code> is actually more complex than <code>toTableDto</code> it looks smaller, or at least denser.
</p>
<p>
Let's take stock of the type-based alternative. It requires 26 lines of code, distributed over three DTO types and the two conversions <code>tryParseTableDto</code> and <code>toTableDto</code>, but here I haven't counted configuration of <code>Serialize</code> and <code>Deserialize</code>, since I left that to each test case that I wrote. Since all of this code generally stays within 80 characters in line width, that would realistically add another 10 lines of code, for a total around 36 lines.
</p>
<p>
This is smaller than the DOM-based code, although of the same order of magnitude.
</p>
<h3 id="2292c83546d441159ece864f3636cd41">
Conclusion <a href="#2292c83546d441159ece864f3636cd41">#</a>
</h3>
<p>
In this article I've explored two alternatives for converting a well-encapsulated Domain Model to and from JSON. One option is to directly manipulate the DOM. Another option is to take a more declarative approach and define <em>types</em> that model the shape of the JSON data, and then leverage type-based automation (here, Reflection) to automatically parse and write the JSON.
</p>
<p>
I've deliberately chosen a Domain Model with some constraints, in order to demonstrate how persisting a non-trivial data model might work. With that setup, writing 'loosely coupled' code directly against the DOM requires 46 lines of code, while taking advantage of type-based automation requires 36 lines of code. Contrary to <a href="/2023/12/11/serializing-restaurant-tables-in-haskell">the Haskell example</a>, Reflection does seem to edge out a win this round.
</p>
<p>
<strong>Next:</strong> <a href="/2023/12/25/serializing-restaurant-tables-in-c">Serializing restaurant tables in C#</a>.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Serializing restaurant tables in Haskell
https://blog.ploeh.dk/2023/12/11/serializing-restaurant-tables-in-haskell
2023-12-11T07:35:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Using Aeson, with and without generics.</em>
</p>
<p>
This article is part of a short series of articles about <a href="/2023/12/04/serialization-with-and-without-reflection">serialization with and without Reflection</a>. In this instalment I'll explore some options for serializing <a href="https://en.wikipedia.org/wiki/JSON">JSON</a> using <a href="https://hackage.haskell.org/package/aeson">Aeson</a>.
</p>
<p>
The source code is <a href="https://github.com/ploeh/HaskellJSONSerialization">available on GitHub</a>.
</p>
<h3 id="013a78e039024c81a204ca10f1a7af69">
Natural numbers <a href="#013a78e039024c81a204ca10f1a7af69">#</a>
</h3>
<p>
Before we start investigating how to serialize to and from JSON, we must have something to serialize. As described in the <a href="/2023/12/04/serialization-with-and-without-reflection">introductory article</a> we'd like to parse and write restaurant table configurations like this:
</p>
<p>
<pre>{
  <span style="color:#2e75b6;">"singleTable"</span>: {
    <span style="color:#2e75b6;">"capacity"</span>: 16,
    <span style="color:#2e75b6;">"minimalReservation"</span>: 10
  }
}</pre>
</p>
<p>
On the other hand, I'd like to represent the Domain Model in a way that <a href="/2022/10/24/encapsulation-in-functional-programming">encapsulates the rules</a> governing the model, <a href="https://blog.janestreet.com/effective-ml-video/">making illegal states unrepresentable</a>.
</p>
<p>
As the first step, we observe that the numbers involved are all <a href="https://en.wikipedia.org/wiki/Natural_number">natural numbers</a>. While <a href="/2022/01/24/type-level-di-container-prototype">I'm aware</a> that <a href="https://www.haskell.org/">Haskell</a> has a built-in <a href="https://hackage.haskell.org/package/base/docs/GHC-TypeLits.html#t:Nat">Nat</a> type, I choose not to use it here, for a couple of reasons. One is that <code>Nat</code> is intended for type-level programming, and while this <em>might</em> be useful here, I don't want to pull in more exotic language features than are required. Another reason is that, in this domain, I want to model natural numbers as excluding zero (and I honestly don't remember if <code>Nat</code> allows zero, but I <em>think</em> that it does..?).
</p>
<p>
Another option is to use <a href="/2019/05/13/peano-catamorphism">Peano numbers</a>, but again, for didactic reasons, I'll stick with something a bit more idiomatic.
</p>
<p>
You can easily introduce a wrapper over, say, <code>Integer</code>, to model natural numbers:
</p>
<p>
<pre><span style="color:blue;">newtype</span> Natural = Natural Integer <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Eq</span>, <span style="color:#2b91af;">Ord</span>, <span style="color:#2b91af;">Show</span>)</pre>
</p>
<p>
This, however, doesn't prevent you from writing <code>Natural (-1)</code>, so we need to make this a <a href="https://www.hillelwayne.com/post/constructive/">predicative data type</a>. The first step is to only export the type, but <em>not</em> its data constructor:
</p>
<p>
<pre><span style="color:blue;">module</span> Restaurants (
  <span style="color:blue;">Natural</span>,
  <span style="color:green;">-- More exports here...</span>
  ) <span style="color:blue;">where</span></pre>
</p>
<p>
But this makes it impossible for client code to create values of the type, so we need to supply a <a href="https://wiki.haskell.org/Smart_constructors">smart constructor</a>:
</p>
<p>
<pre><span style="color:#2b91af;">tryNatural</span> <span style="color:blue;">::</span> <span style="color:#2b91af;">Integer</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">Maybe</span> <span style="color:blue;">Natural</span>
tryNatural n
  | n < 1 = Nothing
  | <span style="color:blue;">otherwise</span> = Just (Natural n)</pre>
</p>
<p>
In this, as well as the other articles in this series, I've chosen to model the potential for errors with <code>Maybe</code> values. I could also have chosen to use <code>Either</code> if I wanted to communicate information along the 'error channel', but sticking with <code>Maybe</code> makes the code a bit simpler. Not so much in Haskell or <a href="https://fsharp.org/">F#</a>, but once we reach C#, <a href="/2022/07/25/an-applicative-reservation-validation-example-in-c">applicative validation</a> becomes complicated.
</p>
<p>
There's no loss of generality in this decision, since both <code>Maybe</code> and <code>Either</code> are <code>Applicative</code> instances.
</p>
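<p>
As a minimal sketch of the <code>Either</code> alternative, assuming a plain <code>String</code> on the error channel: the names <code>tryNaturalE</code> and <code>pairE</code>, as well as the error text, are hypothetical and not part of the article's code base. The <code>Natural</code> type is repeated only so that the sketch compiles on its own.
</p>

```haskell
-- Stand-in for the article's Natural type, repeated so the sketch compiles alone.
newtype Natural = Natural Integer deriving (Eq, Ord, Show)

-- Hypothetical Either-based variant of tryNatural; the error text is an assumption.
tryNaturalE :: Integer -> Either String Natural
tryNaturalE n
  | n < 1 = Left ("Expected a natural number, but got " ++ show n ++ ".")
  | otherwise = Right (Natural n)

-- Applicative composition has the same shape as it would have with Maybe.
pairE :: Integer -> Integer -> Either String (Natural, Natural)
pairE x y = (,) <$> tryNaturalE x <*> tryNaturalE y
```

<p>
Note that <code>Either</code> stops at the first <code>Left</code> value, so <code>pairE</code> reports only one error even if both inputs are invalid.
</p>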
<p>
With the <code>tryNatural</code> function you can now (attempt to) create <code>Natural</code> values:
</p>
<p>
<pre>ghci> tryNatural (-1)
Nothing
ghci> x = tryNatural 42
ghci> x
Just (Natural 42)</pre>
</p>
<p>
This enables client developers to create <code>Natural</code> values, and due to the type's <code>Ord</code> instance, you can even compare them:
</p>
<p>
<pre>ghci> y = tryNatural 2112
ghci> x < y
True</pre>
</p>
<p>
Even so, there will be cases when you need to extract the underlying <code>Integer</code> from a <code>Natural</code> value. You could supply a normal function for that purpose, but in order to make some of the following code a little more elegant, I chose to do it with pattern synonyms:
</p>
<p>
<pre>{-# COMPLETE N #-}
pattern N :: Integer -> Natural
pattern N i <- Natural i</pre>
</p>
<p>
That needs to be exported as well.
</p>
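<p>
The following sketch shows how client code might use the pattern to get at the wrapped <code>Integer</code>. It assumes the <code>PatternSynonyms</code> language extension is enabled, and the helper <code>naturalToInteger</code> is a hypothetical name introduced only for illustration. In a real module, the export list would include an entry such as <code>pattern N</code>.
</p>

```haskell
{-# LANGUAGE PatternSynonyms #-}

newtype Natural = Natural Integer deriving (Eq, Ord, Show)

tryNatural :: Integer -> Maybe Natural
tryNatural n
  | n < 1 = Nothing
  | otherwise = Just (Natural n)

-- Unidirectional pattern: usable for matching, but not for construction,
-- so the smart constructor remains the only way to create a Natural.
{-# COMPLETE N #-}
pattern N :: Integer -> Natural
pattern N i <- Natural i

-- Hypothetical helper: reads the wrapped Integer via the pattern synonym.
naturalToInteger :: Natural -> Integer
naturalToInteger (N i) = i
```

<p>
Because the pattern is unidirectional, an expression like <code>N (-1)</code> doesn't compile, which keeps the invariant intact.
</p>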
<p>
So, eight lines of code to declare a predicative type that models a natural number. Incidentally, this'll be 2-3 lines of code in F#.
</p>
<h3 id="1e5a116c8a6f4a928cbea3f88eed42ad">
Domain Model <a href="#1e5a116c8a6f4a928cbea3f88eed42ad">#</a>
</h3>
<p>
Modelling a restaurant table follows in the same vein. One invariant I would like to enforce is that for a 'single' table, the minimal reservation should be a <code>Natural</code> number less than or equal to the table's capacity. It doesn't make sense to configure a table for four with a minimum reservation of six.
</p>
<p>
In the same spirit as above, then, define this type:
</p>
<p>
<pre><span style="color:blue;">data</span> SingleTable = SingleTable
  { singleCapacity :: Natural
  , minimalReservation :: Natural
  } <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Eq</span>, <span style="color:#2b91af;">Ord</span>, <span style="color:#2b91af;">Show</span>)</pre>
</p>
<p>
Again, only export the type, but not its data constructor. In order to extract values, then, supply another pattern synonym:
</p>
<p>
<pre>{-# COMPLETE SingleT #-}
pattern SingleT :: Natural -> Natural -> SingleTable
pattern SingleT c m <- SingleTable c m</pre>
</p>
<p>
Finally, define a <code>Table</code> type and two smart constructors:
</p>
<p>
<pre><span style="color:blue;">data</span> Table = Single SingleTable | Communal Natural <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Eq</span>, <span style="color:#2b91af;">Show</span>)
<span style="color:#2b91af;">trySingleTable</span> <span style="color:blue;">::</span> <span style="color:#2b91af;">Integer</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">Integer</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">Maybe</span> <span style="color:blue;">Table</span>
trySingleTable capacity minimal = <span style="color:blue;">do</span>
  c <- tryNatural capacity
  m <- tryNatural minimal
  <span style="color:blue;">if</span> c < m <span style="color:blue;">then</span> Nothing <span style="color:blue;">else</span> Just (Single (SingleTable c m))
<span style="color:#2b91af;">tryCommunalTable</span> <span style="color:blue;">::</span> <span style="color:#2b91af;">Integer</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">Maybe</span> <span style="color:blue;">Table</span>
tryCommunalTable = <span style="color:blue;">fmap</span> Communal . tryNatural</pre>
</p>
<p>
Notice that <code>trySingleTable</code> checks the invariant that the <code>capacity</code> must be greater than or equal to the minimal reservation.
</p>
<p>
The point of this little exercise, so far, is that it <em>encapsulates</em> the contract implied by the Domain Model. It does this by using the static type system to its advantage.
</p>
<h3 id="a0f3606b82374c349dc0bf116db14493">
JSON serialization by hand <a href="#a0f3606b82374c349dc0bf116db14493">#</a>
</h3>
<p>
At the boundaries of applications, however, <a href="/2023/10/16/at-the-boundaries-static-types-are-illusory">there are no static types</a>. Is the static type system still useful in that situation?
</p>
<p>
For Haskell, the most common JSON library is Aeson, and I admit that I'm no expert. Thus, it's possible that there's an easier way to serialize to and deserialize from JSON. If so, please leave a comment explaining the alternative.
</p>
<p>
The original rationale for this article series was to demonstrate how serialization can be done without Reflection, or, in the case of Haskell, <a href="https://hackage.haskell.org/package/base/docs/GHC-Generics.html">Generics</a> (not to be confused with .NET generics, which in Haskell usually is called <em>parametric polymorphism</em>). We'll return to Generics later in this article.
</p>
<p>
In this article series, I consider the JSON format fixed. A single table should be rendered as shown above, and a communal table should be rendered like this:
</p>
<p>
<pre>{ <span style="color:#2e75b6;">"communalTable"</span>: { <span style="color:#2e75b6;">"capacity"</span>: 42 } }</pre>
</p>
<p>
Often in the real world you'll have to conform to a particular protocol format, or, even if that's not the case, being able to control the shape of the wire format is important to deal with backwards compatibility.
</p>
<p>
As I outlined in the <a href="/2023/12/04/serialization-with-and-without-reflection">introduction article</a> you can usually find a more weakly typed API to get the job done. For serializing <code>Table</code> to JSON it looks like this:
</p>
<p>
<pre><span style="color:blue;">newtype</span> JSONTable = JSONTable Table <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Eq</span>, <span style="color:#2b91af;">Show</span>)
<span style="color:blue;">instance</span> <span style="color:blue;">ToJSON</span> <span style="color:blue;">JSONTable</span> <span style="color:blue;">where</span>
  toJSON (JSONTable (Single (SingleT (N c) (N m)))) =
    object [<span style="color:#a31515;">"singleTable"</span> .= object [
      <span style="color:#a31515;">"capacity"</span> .= c,
      <span style="color:#a31515;">"minimalReservation"</span> .= m]]
  toJSON (JSONTable (Communal (N c))) =
    object [<span style="color:#a31515;">"communalTable"</span> .= object [<span style="color:#a31515;">"capacity"</span> .= c]]</pre>
</p>
<p>
In order to separate concerns, I've defined this functionality in a new module that references the module that defines the Domain Model. Thus, to avoid orphan instances, I've defined a <code>JSONTable</code> <code>newtype</code> wrapper that I then make a <code>ToJSON</code> instance.
</p>
<p>
The <code>toJSON</code> function pattern-matches on <code>Single</code> and <code>Communal</code> to write two different <a href="https://hackage.haskell.org/package/aeson/docs/Data-Aeson.html#t:Value">Values</a>, using Aeson's underlying Document Object Model (DOM).
</p>
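<p>
To see what the DOM-based approach produces, here is a small, self-contained sketch that builds the communal-table <code>Value</code> directly and encodes it. It deliberately bypasses the Domain Model, and the names <code>communalValue</code> and <code>rendered</code> are hypothetical. It assumes the <code>aeson</code> and <code>bytestring</code> packages and the <code>OverloadedStrings</code> extension.
</p>

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (Value, encode, object, (.=))
import Data.ByteString.Lazy (ByteString)

-- Hypothetical helper: builds the nested JSON object for a communal table.
communalValue :: Integer -> Value
communalValue c = object ["communalTable" .= object ["capacity" .= c]]

-- Encoding the Value yields {"communalTable":{"capacity":42}}
rendered :: ByteString
rendered = encode (communalValue 42)
```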
<h3 id="9c3c05517701474d8ac88c59e6fd11e7">
JSON deserialization by hand <a href="#9c3c05517701474d8ac88c59e6fd11e7">#</a>
</h3>
<p>
You can also go the other way, and when it looks more complicated, it's because it is. When serializing an encapsulated value, not a lot can go wrong because the value is already valid. When deserializing a JSON string, on the other hand, all sorts of things can go wrong: It might not even be a valid string, or the string may not be valid JSON, or the JSON may not be a valid <code>Table</code> representation, or the values may be illegal, etc.
</p>
<p>
It's no surprise, then, that the <code>FromJSON</code> instance is bigger:
</p>
<p>
<pre><span style="color:blue;">instance</span> <span style="color:blue;">FromJSON</span> <span style="color:blue;">JSONTable</span> <span style="color:blue;">where</span>
  parseJSON (Object v) = <span style="color:blue;">do</span>
    single <- v .:? <span style="color:#a31515;">"singleTable"</span>
    communal <- v .:? <span style="color:#a31515;">"communalTable"</span>
    <span style="color:blue;">case</span> (single, communal) <span style="color:blue;">of</span>
      (Just s, Nothing) -> <span style="color:blue;">do</span>
        capacity <- s .: <span style="color:#a31515;">"capacity"</span>
        minimal <- s .: <span style="color:#a31515;">"minimalReservation"</span>
        <span style="color:blue;">case</span> trySingleTable capacity minimal <span style="color:blue;">of</span>
          Nothing -> <span style="color:blue;">fail</span> <span style="color:#a31515;">"Expected natural numbers."</span>
          Just t -> <span style="color:blue;">return</span> $ JSONTable t
      (Nothing, Just c) -> <span style="color:blue;">do</span>
        capacity <- c .: <span style="color:#a31515;">"capacity"</span>
        <span style="color:blue;">case</span> tryCommunalTable capacity <span style="color:blue;">of</span>
          Nothing -> <span style="color:blue;">fail</span> <span style="color:#a31515;">"Expected a natural number."</span>
          Just t -> <span style="color:blue;">return</span> $ JSONTable t
      _ -> <span style="color:blue;">fail</span> <span style="color:#a31515;">"Expected exactly one of singleTable or communalTable."</span>
  parseJSON _ = <span style="color:blue;">fail</span> <span style="color:#a31515;">"Expected an object."</span></pre>
</p>
<p>
I could probably have done this more succinctly if I'd spent even more time on it than I already did, but it gets the job done and demonstrates the point. Instead of relying on run-time Reflection, the <code>FromJSON</code> instance is, unsurprisingly, a parser, composed from Aeson's specialised parser combinator API.
</p>
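<p>
To illustrate how such a parser behaves, here is a cut-down sketch that handles only the communal case and can be exercised with <code>parseMaybe</code>. The names <code>parseCommunalCapacity</code>, <code>validExample</code>, and <code>invalidExample</code> are hypothetical, and the inline range check stands in for the article's <code>tryCommunalTable</code>. It assumes the <code>aeson</code> package and the <code>OverloadedStrings</code> extension.
</p>

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (Value, object, withObject, (.:), (.=))
import Data.Aeson.Types (Parser, parseMaybe)

-- Hypothetical, cut-down parser: reads communalTable.capacity and rejects
-- anything below 1. withObject fails if the input isn't a JSON object.
parseCommunalCapacity :: Value -> Parser Integer
parseCommunalCapacity = withObject "table" $ \v -> do
  c <- v .: "communalTable"
  capacity <- c .: "capacity"
  if capacity < (1 :: Integer)
    then fail "Expected a natural number."
    else return capacity

-- A well-formed document with a legal capacity parses to Just 42.
validExample :: Maybe Integer
validExample = parseMaybe parseCommunalCapacity $
  object ["communalTable" .= object ["capacity" .= (42 :: Integer)]]

-- A well-formed document with an illegal capacity parses to Nothing.
invalidExample :: Maybe Integer
invalidExample = parseMaybe parseCommunalCapacity $
  object ["communalTable" .= object ["capacity" .= (0 :: Integer)]]
```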
<p>
Since both serialisation and deserialization are based on string values, you should write automated tests that verify that the code works.
</p>
<p>
Apart from module declaration and imports etc. this hand-written JSON capability requires 27 lines of code. Can we do better with static types and Generics?
</p>
<h3 id="ddac03fba5134f0da9e613c29888ce83">
JSON serialisation based on types <a href="#ddac03fba5134f0da9e613c29888ce83">#</a>
</h3>
<p>
The intent with the Aeson library is that you define a type (a <a href="https://en.wikipedia.org/wiki/Data_transfer_object">Data Transfer Object</a> (DTO) if you will), and then let 'compiler magic' do the rest. In Haskell, it's not run-time Reflection, but a compilation technology called Generics. As I understand it, it automatically 'writes' the serialization and parsing code and turns it into machine code as part of normal compilation.
</p>
<p>
You're supposed to first turn on the
</p>
<p>
<pre>{-# <span style="color:gray;">LANGUAGE</span> DeriveGeneric #-}</pre>
</p>
<p>
language pragma and then tell the compiler to automatically derive <code>Generic</code> for the DTO in question. You'll see an example of that shortly.
</p>
<p>
It's a fairly flexible system that you can tweak in various ways, but if it's possible to do it directly with the above <code>Table</code> type, please leave a comment explaining how. I tried, but couldn't make it work. To be clear, I <em>could</em> make it serializable, but not to the above JSON format. <ins datetime="2023-12-11T11:06Z">After enough <a href="/2023/10/02/dependency-whac-a-mole">Aeson Whac-A-Mole</a> I decided to change tactics.</ins>
</p>
<p>
In <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a> I explain how you're usually better off separating the role of serialization from the role of Domain Model. The way to do that is exactly by defining a DTO for serialisation, and letting the Domain Model remain exclusively a model of the rules of the application. The above <code>Table</code> type plays the latter role, so we need new DTO types.
</p>
<p>
We may start with the building blocks:
</p>
<p>
<pre><span style="color:blue;">newtype</span> CommunalDTO = CommunalDTO
  { communalCapacity :: Integer
  } <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Eq</span>, <span style="color:#2b91af;">Show</span>, <span style="color:#2b91af;">Generic</span>)</pre>
</p>
<p>
Notice how it declaratively derives <code>Generic</code>, which works because of the <code>DeriveGeneric</code> language pragma.
</p>
<p>
From here, in principle, all that you need is just a single declaration to make it serializable:
</p>
<p>
<pre><span style="color:blue;">instance</span> <span style="color:blue;">ToJSON</span> <span style="color:blue;">CommunalDTO</span></pre>
</p>
<p>
While it does serialize to JSON, it doesn't have the right format:
</p>
<p>
<pre>{ <span style="color:#2e75b6;">"communalCapacity"</span>: 42 }</pre>
</p>
<p>
The property name should be <code>capacity</code>, not <code>communalCapacity</code>. Why did I call the record field <code>communalCapacity</code> instead of <code>capacity</code>? Can't I just fix my <code>CommunalDTO</code> record?
</p>
<p>
Unfortunately, I can't just do that, because I also need a <code>capacity</code> JSON property for the single-table case, and Haskell isn't happy about duplicated field names in the same module. (This language feature truly is one of the weak points of Haskell.)
</p>
<p>
Instead, I can tweak the Aeson rules by supplying an <code>Options</code> value to the instance definition:
</p>
<p>
<pre><span style="color:#2b91af;">communalJSONOptions</span> <span style="color:blue;">::</span> <span style="color:blue;">Options</span>
communalJSONOptions =
  defaultOptions {
    fieldLabelModifier = \s -> <span style="color:blue;">case</span> s <span style="color:blue;">of</span>
      <span style="color:#a31515;">"communalCapacity"</span> -> <span style="color:#a31515;">"capacity"</span>
      _ -> s }
<span style="color:blue;">instance</span> <span style="color:blue;">ToJSON</span> <span style="color:blue;">CommunalDTO</span> <span style="color:blue;">where</span>
  toJSON = genericToJSON communalJSONOptions
  toEncoding = genericToEncoding communalJSONOptions</pre>
</p>
<p>
This instructs the compiler to modify how it generates the serialization code, and the generated JSON fragment is now correct.
</p>
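<p>
As an aside, the explicit <code>case</code> expression isn't the only way to write a label modifier. The following sketch shows a prefix-stripping function that could, in principle, be supplied as <code>fieldLabelModifier</code>. This is an alternative that I'm assuming for illustration, not what the article's code base does, and <code>dropPrefix</code> is a hypothetical name.
</p>

```haskell
import Data.Char (toLower)
import Data.List (stripPrefix)

-- Hypothetical helper: drops a known prefix from a field name and lower-cases
-- the following letter. Labels without the prefix are returned unchanged.
dropPrefix :: String -> String -> String
dropPrefix prefix label =
  case stripPrefix prefix label of
    Just (c : cs) -> toLower c : cs
    _ -> label
```

<p>
With this function, <code>dropPrefix "communal"</code> maps <code>communalCapacity</code> to <code>capacity</code>, while <code>dropPrefix "single"</code> leaves <code>minimalReservation</code> alone. The trade-off is that the mapping becomes implicit in a naming convention instead of being spelled out.
</p>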
<p>
We can do the same with the single-table case:
</p>
<p>
<pre><span style="color:blue;">data</span> SingleDTO = SingleDTO
  { singleCapacity :: Integer
  , minimalReservation :: Integer
  } <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Eq</span>, <span style="color:#2b91af;">Show</span>, <span style="color:#2b91af;">Generic</span>)
<span style="color:#2b91af;">singleJSONOptions</span> <span style="color:blue;">::</span> <span style="color:blue;">Options</span>
singleJSONOptions =
  defaultOptions {
    fieldLabelModifier = \s -> <span style="color:blue;">case</span> s <span style="color:blue;">of</span>
      <span style="color:#a31515;">"singleCapacity"</span> -> <span style="color:#a31515;">"capacity"</span>
      <span style="color:#a31515;">"minimalReservation"</span> -> <span style="color:#a31515;">"minimalReservation"</span>
      _ -> s }
<span style="color:blue;">instance</span> <span style="color:blue;">ToJSON</span> <span style="color:blue;">SingleDTO</span> <span style="color:blue;">where</span>
  toJSON = genericToJSON singleJSONOptions
  toEncoding = genericToEncoding singleJSONOptions</pre>
</p>
<p>
This takes care of that case, but we still need a container type that will hold either one or the other:
</p>
<p>
<pre><span style="color:blue;">data</span> TableDTO = TableDTO
  { singleTable :: Maybe SingleDTO
  , communalTable :: Maybe CommunalDTO
  } <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Eq</span>, <span style="color:#2b91af;">Show</span>, <span style="color:#2b91af;">Generic</span>)
<span style="color:#2b91af;">tableJSONOptions</span> <span style="color:blue;">::</span> <span style="color:blue;">Options</span>
tableJSONOptions =
  defaultOptions { omitNothingFields = True }
<span style="color:blue;">instance</span> <span style="color:blue;">ToJSON</span> <span style="color:blue;">TableDTO</span> <span style="color:blue;">where</span>
  toJSON = genericToJSON tableJSONOptions
  toEncoding = genericToEncoding tableJSONOptions</pre>
</p>
<p>
One way to model a <a href="https://en.wikipedia.org/wiki/Tagged_union">sum type</a> with a DTO is to declare both cases as <code>Maybe</code> fields. While it does allow illegal states to be representable (i.e. both kinds of tables defined at the same time, or none of them present) this is only par for the course at the application boundary.
</p>
<p>
That's quite a bit of infrastructure to stand up, but at least most of it can be reused for parsing.
</p>
<h3 id="0a507081076c4afe9e2197f013f6f107">
JSON deserialisation based on types <a href="#0a507081076c4afe9e2197f013f6f107">#</a>
</h3>
<p>
To allow parsing of JSON into the above DTOs we can make them all <code>FromJSON</code> instances, e.g.:
</p>
<p>
<pre><span style="color:blue;">instance</span> <span style="color:blue;">FromJSON</span> <span style="color:blue;">CommunalDTO</span> <span style="color:blue;">where</span>
  parseJSON = genericParseJSON communalJSONOptions</pre>
</p>
<p>
Notice that you can reuse the same <code>communalJSONOptions</code> used for the <code>ToJSON</code> instance. Repeat that exercise for the two other record types.
</p>
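<p>
Sharing the <code>Options</code> value between the two instances is what keeps serialization and parsing symmetric. The following self-contained sketch repeats the <code>CommunalDTO</code> definitions and adds a round-trip check. The function <code>roundTrips</code> is a hypothetical name, and the sketch assumes the <code>aeson</code> package.
</p>

```haskell
{-# LANGUAGE DeriveGeneric #-}

import Data.Aeson
import GHC.Generics (Generic)

newtype CommunalDTO = CommunalDTO
  { communalCapacity :: Integer
  } deriving (Eq, Show, Generic)

-- The same Options value drives both instances below.
communalJSONOptions :: Options
communalJSONOptions =
  defaultOptions {
    fieldLabelModifier = \s -> case s of
      "communalCapacity" -> "capacity"
      _ -> s }

instance ToJSON CommunalDTO where
  toJSON = genericToJSON communalJSONOptions
  toEncoding = genericToEncoding communalJSONOptions

instance FromJSON CommunalDTO where
  parseJSON = genericParseJSON communalJSONOptions

-- Hypothetical check: encoding and then decoding returns the original DTO.
roundTrips :: CommunalDTO -> Bool
roundTrips dto = decode (encode dto) == Just dto
```

<p>
A check like this is a natural candidate for a property-based test, since it should hold for any <code>CommunalDTO</code> value.
</p>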
<p>
That's only half the battle, though, since this only gives you a way to parse and serialize the DTO. What you ultimately want is to persist or rehydrate <code>Table</code> data.
</p>
<h3 id="7e234848ee2d48b8a1da42c0e05bb088">
Converting DTO to Domain Model, and vice versa <a href="#7e234848ee2d48b8a1da42c0e05bb088">#</a>
</h3>
<p>
As usual, converting a nice, encapsulated value to a more relaxed format is safe and trivial:
</p>
<p>
<pre><span style="color:#2b91af;">toTableDTO</span> <span style="color:blue;">::</span> <span style="color:blue;">Table</span> <span style="color:blue;">-></span> <span style="color:blue;">TableDTO</span>
toTableDTO (Single (SingleT (N c) (N m))) = TableDTO (Just (SingleDTO c m)) Nothing
toTableDTO (Communal (N c)) = TableDTO Nothing (Just (CommunalDTO c))</pre>
</p>
<p>
Going the other way is fundamentally a parsing exercise:
</p>
<p>
<pre><span style="color:#2b91af;">tryParseTable</span> <span style="color:blue;">::</span> <span style="color:blue;">TableDTO</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">Maybe</span> <span style="color:blue;">Table</span>
tryParseTable (TableDTO (Just (SingleDTO c m)) Nothing) = trySingleTable c m
tryParseTable (TableDTO Nothing (Just (CommunalDTO c))) = tryCommunalTable c
tryParseTable _ = Nothing</pre>
</p>
<p>
Such an operation may fail, so the result is a <code>Maybe Table</code>. It could also have been an <code>Either something Table</code>, if you wanted to return information about errors when things go wrong. It makes the code marginally more complex, but doesn't change the overall thrust of this exploration.
</p>
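<p>
As a rough sketch of what the <code>Either</code>-based variant might look like: the types below are simplified stand-ins so that the example compiles on its own, and <code>TableError</code>, <code>natural</code>, and <code>tryParseTableE</code> are hypothetical names that don't appear in the article's code base.
</p>

```haskell
-- Simplified stand-ins for the article's types, so the sketch compiles alone.
newtype Natural = Natural Integer deriving (Eq, Show)
data Table = Single Natural Natural | Communal Natural deriving (Eq, Show)
data SingleDTO = SingleDTO Integer Integer deriving (Eq, Show)
newtype CommunalDTO = CommunalDTO Integer deriving (Eq, Show)
data TableDTO = TableDTO (Maybe SingleDTO) (Maybe CommunalDTO)
  deriving (Eq, Show)

-- Hypothetical error type; the article itself only uses Maybe.
data TableError
  = NotNatural Integer
  | MinimumExceedsCapacity
  | NotExactlyOneTable
  deriving (Eq, Show)

natural :: Integer -> Either TableError Natural
natural n
  | n < 1 = Left (NotNatural n)
  | otherwise = Right (Natural n)

-- Same shape as tryParseTable, but each failure carries a reason.
tryParseTableE :: TableDTO -> Either TableError Table
tryParseTableE (TableDTO (Just (SingleDTO c m)) Nothing) = do
  c' <- natural c
  m' <- natural m
  if c < m then Left MinimumExceedsCapacity else Right (Single c' m')
tryParseTableE (TableDTO Nothing (Just (CommunalDTO c))) =
  Communal <$> natural c
tryParseTableE _ = Left NotExactlyOneTable
```

<p>
The pattern matches are unchanged from the <code>Maybe</code> version; only the failure branches differ.
</p>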
<p>
Let's take stock of the type-based alternative. It requires 62 lines of code, distributed over three DTO types, their <code>Options</code>, their <code>ToJSON</code> and <code>FromJSON</code> instances, and finally the two conversions <code>tryParseTable</code> and <code>toTableDTO</code>.
</p>
<h3 id="473295252dbf46bd93fc31b1d2f505e5">
Conclusion <a href="#473295252dbf46bd93fc31b1d2f505e5">#</a>
</h3>
<p>
In this article I've explored two alternatives for converting a well-encapsulated Domain Model to and from JSON. One option is to directly manipulate the DOM. Another option is to take a more declarative approach and define <em>types</em> that model the shape of the JSON data, and then leverage type-based automation (here, Generics) to automatically produce the code that parses and writes the JSON.
</p>
<p>
I've deliberately chosen a Domain Model with some constraints, in order to demonstrate how persisting a non-trivial data model might work. With that setup, writing 'loosely coupled' code directly against the DOM requires 27 lines of code, while 'taking advantage' of type-based automation requires 62 lines of code.
</p>
<p>
To be fair, the dice don't always land that way. You can't infer a general rule from a single example, and it's possible that I could have done something clever with Aeson to reduce the code. Even so, I think that there's a conclusion to be drawn, and it's this:
</p>
<p>
Type-based automation (Generics, or run-time Reflection) may seem simple at first glance. Just declare a type and let some automation library do the rest. It may happen, however, that you need to tweak the defaults so much that it would be easier to skip the type-based approach and instead directly manipulate the DOM.
</p>
<p>
I love static type systems, but I'm also watchful of their limitations. There's likely to be an inflection point where, on the one side, a type-based declarative API is best, while on the other side of that point, a more 'lightweight' approach is better.
</p>
<p>
The position of such an inflection point will vary from context to context. Just be aware of the possibility, and explore alternatives if things begin to feel awkward.
</p>
<p>
<strong>Next:</strong> <a href="/2023/12/18/serializing-restaurant-tables-in-f">Serializing restaurant tables in F#</a>.
</p>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Serialization with and without Reflection
https://blog.ploeh.dk/2023/12/04/serialization-with-and-without-reflection
2023-12-04T20:53:00+00:00
Mark Seemann
<div id="post">
<p>
<em>An investigation of alternatives.</em>
</p>
<p>
I recently wrote a tweet that caused more responses than usual:
</p>
<blockquote>
<p>
"A decade ago, I used .NET Reflection so often that I know most the the API by heart.
</p>
<p>
"Since then, I've learned better ways to solve my problems. I can't remember when was the last time I used .NET Reflection. I never need it.
</p>
<p>
"Do you?"
</p>
<footer><cite><a href="https://twitter.com/ploeh/status/1727280699051495857">me</a></cite></footer>
</blockquote>
<p>
Most people who read my tweets are programmers, and some are, perhaps, not entirely neurotypical, but I intended the last paragraph to be a <a href="https://en.wikipedia.org/wiki/Rhetorical_question">rhetorical question</a>. My aim, really, was to point out that if I tell you it's possible to do without Reflection, one or two readers might keep that in mind and at least explore options the next time the urge to use Reflection arises.
</p>
<p>
A common response was that Reflection is useful for (de)serialization of data. These days, the most common case is going to and from <a href="https://en.wikipedia.org/wiki/JSON">JSON</a>, but the problem is similar if the format is <a href="https://en.wikipedia.org/wiki/XML">XML</a>, <a href="https://en.wikipedia.org/wiki/Comma-separated_values">CSV</a>, or another format. In a sense, even <a href="/2023/10/16/at-the-boundaries-static-types-are-illusory">reading from and writing to a database is a kind of serialization</a>.
</p>
<p>
In this little series of articles, I'm going to explore some alternatives to Reflection. I'll use the same example throughout, and I'll stick to JSON, but you can easily extrapolate to other serialization formats.
</p>
<h3 id="4b1340f8fad4452f950c3cfdde275365">
Table layouts <a href="#4b1340f8fad4452f950c3cfdde275365">#</a>
</h3>
<p>
As always, I find the example domain of online restaurant reservation systems to be so rich as to furnish a useful example. Imagine a multi-tenant service that enables restaurants to take and manage reservations.
</p>
<p>
When a new reservation request arrives, the system has to make a decision on whether to accept or reject the request. The layout, or configuration, of <a href="/2020/01/27/the-maitre-d-kata">tables plays a role in that decision</a>.
</p>
<p>
Such a multi-tenant system may have an API for configuring the restaurant; essentially, entering data into the system about the size and policies regarding tables in a particular restaurant.
</p>
<p>
Most restaurants have 'normal' tables where, if you reserve a table for three, you'll have the entire table for a duration. Some restaurants also have one or more communal tables, typically bar seating where you may get a view of the kitchen. Quite a few high-end restaurants have tables like these, because it enables them to cater to single diners without reserving an entire table that could instead have served two paying customers.
</p>
<p>
<img src="/content/binary/ernst.jpg" alt="Bar seating at Ernst, Berlin.">
</p>
<p>
In Copenhagen, on the other hand, it's also not uncommon to have a special room for larger parties. I think this has something to do with the general age of the buildings in the city. Most establishments are situated in older buildings, with all the trappings, including load-bearing walls, cellars, etc. As part of a restaurant's location, there may be a big cellar room, second-story room, or other room that's not practical for the daily operation of the place, but which works for parties of, say, 15-30 people. Such 'private dining' rooms can be used for private occasions or company outings.
</p>
<p>
A <a href="https://en.wikipedia.org/wiki/Ma%C3%AEtre_d%27h%C3%B4tel">maître d'hôtel</a> may wish to configure the system with a variety of tables, including communal tables, and private dining tables as described above.
</p>
<p>
One way to model such requirements is to distinguish between two kinds of tables: communal tables and 'single' tables, where single tables come with an additional property that models the minimal reservation required to reserve that table. A JSON representation might look like this:
</p>
<p>
<pre>{
  <span style="color:#2e75b6;">"singleTable"</span>: {
    <span style="color:#2e75b6;">"capacity"</span>: 16,
    <span style="color:#2e75b6;">"minimalReservation"</span>: 10
  }
}</pre>
</p>
<p>
This may represent a private dining table that seats up to sixteen people, and where the maître d'hôtel has decided to only accept reservations for at least ten guests.
</p>
<p>
A <code>singleTable</code> can also be used to model 'normal' tables without special limits. If the restaurant has a table for four, but is ready to accept a reservation for one person, you can configure a table for four, with a minimum reservation of one.
</p>
<p>
Communal tables are different, though:
</p>
<p>
<pre>{ <span style="color:#2e75b6;">"communalTable"</span>: { <span style="color:#2e75b6;">"capacity"</span>: 10 } }</pre>
</p>
<p>
Why not just model that as ten single tables that each seat one?
</p>
<p>
You don't want to do that because you want to make sure that parties can eat together. Some restaurants have more than one communal table. Imagine that you only have two communal tables of ten seats each. What happens if you model this as twenty single-person tables?
</p>
<p>
If you do that, you may accept reservations for parties of six, six, and six, because <em>6 + 6 + 6 = 18 < 20</em>. When those three groups arrive, however, you discover that you have to split one of the parties! The party getting separated may not like that at all, and you are, after all, in the hospitality business.
</p>
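<p>
The arithmetic can be made concrete with a small sketch. The function names <code>fitsByHeadcount</code> and <code>fitsUnsplit</code> are hypothetical, and the backtracking search is only meant to illustrate the argument, not to suggest how a reservation system should be implemented.
</p>

```haskell
-- Naive check: compare the total number of guests with the total seat count.
-- This is what modelling communal tables as single seats amounts to.
fitsByHeadcount :: [Int] -> [Int] -> Bool
fitsByHeadcount tables parties = sum parties <= sum tables

-- Stricter check: every party must sit together at one table.
-- Tries each table with enough free seats, backtracking on failure.
fitsUnsplit :: [Int] -> [Int] -> Bool
fitsUnsplit _ [] = True
fitsUnsplit tables (p : ps) =
  or [ fitsUnsplit (before ++ (free - p) : after) ps
     | (before, free : after) <- splits tables
     , free >= p ]
  where
    -- Every way of singling out one table from the list.
    splits xs = [ splitAt i xs | i <- [0 .. length xs - 1] ]
```

<p>
With two communal tables of ten seats, <code>fitsByHeadcount [10, 10] [6, 6, 6]</code> is <code>True</code>, but <code>fitsUnsplit [10, 10] [6, 6, 6]</code> is <code>False</code>: after seating two parties of six, each table has only four seats left.
</p>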
<h3 id="368ebde4f05e434b938327469aba1640">
Exploration <a href="#368ebde4f05e434b938327469aba1640">#</a>
</h3>
<p>
In each article in this short series, I'll explore serialization with and without Reflection in a few languages. I'll start with <a href="https://www.haskell.org/">Haskell</a>, since that language doesn't have run-time Reflection. It does have a related facility called <em>generics</em>, not to be confused with .NET or Java generics, which in Haskell are called <em>parametric polymorphism</em>. It's confusing, I know.
</p>
<p>
Haskell generics look a bit like .NET Reflection, and there's some overlap, but it's not quite the same. The main difference is that Haskell generic programming all 'resolves' at compile time, so there's no run-time Reflection in Haskell.
</p>
<p>
If you don't care about Haskell, you can skip that article.
</p>
<ul>
<li><a href="/2023/12/11/serializing-restaurant-tables-in-haskell">Serializing restaurant tables in Haskell</a></li>
<li><a href="/2023/12/18/serializing-restaurant-tables-in-f">Serializing restaurant tables in F#</a></li>
<li><a href="/2023/12/25/serializing-restaurant-tables-in-c">Serializing restaurant tables in C#</a></li>
</ul>
<p>
As you can see, the next article repeats the exercise in <a href="https://fsharp.org/">F#</a>, and if you also don't care about that language, you can skip that article as well.
</p>
<p>
The C# article, on the other hand, should be readable to not only C# programmers, but also developers who work in sufficiently equivalent languages.
</p>
<h3 id="f367d5adccbe4d3792acc8f3828add85">
Descriptive, not prescriptive <a href="#f367d5adccbe4d3792acc8f3828add85">#</a>
</h3>
<p>
The purpose of this article series is only to showcase alternatives. Based on the reactions my tweet elicited I take it that some people can't imagine how serialisation might look without Reflection.
</p>
<p>
It is <em>not</em> my intent that you should eschew the Reflection-based APIs available in your languages. In .NET, for example, a framework like ASP.NET MVC expects you to model JSON or XML as <a href="https://en.wikipedia.org/wiki/Data_transfer_object">Data Transfer Objects</a>. This gives you an <a href="/2023/10/16/at-the-boundaries-static-types-are-illusory">illusion of static types at the boundary</a>.
</p>
<p>
Even a Haskell web library like <a href="https://www.servant.dev/">Servant</a> expects you to model web APIs with static types.
</p>
<p>
When working with such a framework, it doesn't always pay to fight against its paradigm. When I work with ASP.NET, I define DTOs just like everyone else. On the other hand, if communicating with a backend system, I <a href="/2022/01/03/to-id-or-not-to-id">sometimes choose to skip static types and instead work directly with a JSON Document Object Model</a> (DOM).
</p>
<p>
I occasionally find that it better fits my use case, but not in the majority of cases.
</p>
<h3 id="9af8390d4b1b4c50ae9148ea2eb841ad">
Conclusion <a href="#9af8390d4b1b4c50ae9148ea2eb841ad">#</a>
</h3>
<p>
While some sort of Reflection or metadata-driven mechanism is often used to implement serialisation, it often turns out that such convenient language capabilities are programmed on top of an ordinary object model. Even isolated to .NET, I think I'm on my third JSON library, and most (all?) turned out to have an underlying DOM that you can manipulate.
</p>
<p>
In this article I've set the stage for exploring how serialisation can work, with or (mostly) without Reflection.
</p>
<p>
If you're interested in the philosophy of science and <a href="https://en.wikipedia.org/wiki/Epistemology">epistemology</a>, you may have noticed a recurring discussion in academia: A wider society benefits not only from learning what works, but also from learning what doesn't work. It would be useful if researchers published their failures along with their successes, yet few do (for fairly obvious reasons).
</p>
<p>
Well, I depend neither on research grants nor salary, so I'm free to publish negative results, such as they are.
</p>
<p>
Not that I want to go so far as to categorize what I present in the present articles as useless, but they're probably best applied in special circumstances. On the other hand, I don't know <em>your context</em>, and perhaps you're doing something I can't even imagine, and what I present here is just what you need.
</p>
<p>
<strong>Next:</strong> <a href="/2023/12/11/serializing-restaurant-tables-in-haskell">Serializing restaurant tables in Haskell</a>.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="db4a9a94452a1237bf71989561dfd947">
<div class="comment-author">gdifolco <a href="#db4a9a94452a1237bf71989561dfd947">#</a></div>
<div class="comment-content">
<q>
<i>
I'll start with <a href="https://www.haskell.org/">Haskell</a>, since that language doesn't have run-time Reflection.
</i>
</q>
<p>
Haskell (the language) does not provide primitives to access data representation <i>per se</i>. During compilation, GHC (the compiler) erases a lot of information (more or less, depending on the profiling flags) in order to provide the run-time system (RTS) with a minimal "bytecode".
</p>
<p>
That being said, GHC provides three ways to deal structurally with values:
<ul>
<li><a href="https://wiki.haskell.org/Template_Haskell">TemplateHaskell</a>: gives the ability to rewrite the AST at compile-time</li>
<li><a href="https://hackage.haskell.org/package/base-4.19.0.0/docs/GHC-Generics.html#t:Generic">Generics</a>: gives the ability to have a type-level representation of a type structure</li>
<li><a href="https://hackage.haskell.org/package/base-4.19.0.0/docs/Type-Reflection.html#t:Typeable">Typeable</a>: gives the ability to have a value-level representation of a type structure</li>
</ul>
</p>
<p>
<i>Template Haskell</i> is low-level, as it involves dealing with the AST. It is also harder to debug because, in order for it to be evaluated, the main compilation phase is stopped, then the <i>Template Haskell</i> code is run, and finally the main compilation phase continues. It also causes compilation cache issues.
</p>
<p>
<i>Generics</i> take a type's structure and generate a representation at the type level. The main idea is to be able to go back and forth between the type and its representation, so you can define some behavior over a structure. The good thing is that, since the representation is known at compile time, many optimizations can be done. On complex types it tends to slow down compilation and produce larger-than-usual binaries. It is generally the way libraries are implemented.
</p>
<p>
<i>Typeable</i> is a purely value-level type representation: you get only one type whatever the type structure is. It is generally used when you have "dynamic" types, and it provides safe ways to do coercion.
</p>
<p>
Haskell tends to push as many things as possible to compile time, which may explain this tendency.
</p>
</div>
<div class="comment-date">2023-12-05 21:47 UTC</div>
</div>
<div class="comment" id="9cf7ca25c1364eb39c489d9883d90cea">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#9cf7ca25c1364eb39c489d9883d90cea">#</a></div>
<div class="comment-content">
<p>
Thank you for writing. I was already aware of Template Haskell and Generics, but Typeable is new to me. I don't consider the first two equivalent to Reflection, since they resolve at compile time. They're more akin to automated code generation, I think. Reflection, as I'm used to it from .NET, is a run-time feature where you can inspect and interact with types and values as the code is executing.
</p>
<p>
I admit that I haven't had the time to more than browse the documentation of Typeable, and it's so abstract that it's not clear to me what it does, how it works, or whether it's actually comparable to Reflection. The first question that comes to my mind regards the type class <code>Typeable</code> itself. It has no instances. Is it one of those special type classes (like <a href="https://hackage.haskell.org/package/base/docs/Data-Coerce.html#t:Coercible">Coercible</a>) that one doesn't have to explicitly implement?
</p>
</div>
<div class="comment-date">2023-12-08 17:11 UTC</div>
</div>
<div class="comment" id="d3ad8ff86d11b285e69adc3ebbc74165">
<div class="comment-author">gdifolco <a href="#d3ad8ff86d11b285e69adc3ebbc74165">#</a></div>
<div class="comment-content">
<blockquote>
I don't consider the first two equivalent to Reflection, since they resolve at compile time. They're more akin to automated code generation, I think. Reflection, as I'm used to it from .NET, is a run-time feature where you can inspect and interact with types and values as the code is executing.
</blockquote>
<p>
I don't know the .NET ecosystem well, but I guess you can get at run-time the information we have at compile-time with TemplateHaskell and Generics. I think you are right, then.
</p>
<blockquote>
I admit that I haven't had the time to more than browse the documentation of Typeable, and it's so abstract that it's not clear to me what it does, how it works, or whether it's actually comparable to Reflection. The first question that comes to my mind regards the type class <code>Typeable</code> itself. It has no instances. Is it one of those special type classes (like <a href="https://hackage.haskell.org/package/base/docs/Data-Coerce.html#t:Coercible">Coercible</a>) that one doesn't have to explicitly implement?
</blockquote>
<p>
You can derive <code>Typeable</code> like any other type class:
</p>
<pre>
data MyType = MyType
  { myString :: String,
    myInt :: Int
  }
  deriving stock (Eq, Show, Typeable)
</pre>
<p>
It's pretty "low-level", <code>typeRep</code> gives you a <code>TypeRep a</code> (<code>a</code> being the type represented) which is <a href="https://hackage.haskell.org/package/base-4.19.0.0/docs/Type-Reflection.html#v:App">a representation of the type with primitive elements</a> (<a href="https://www.youtube.com/watch?v=uR_VzYxvbxg">More details here</a>).
</p>
<p>
Then, you'll be able to either pattern match on it, or <a href="https://hackage.haskell.org/package/base-4.19.0.0/docs/Data-Typeable.html#v:cast">cast it</a> (which is not like casting in Java, for example, because you are just proving to the compiler that two types are equivalent).
</p>
</div>
<div class="comment-date">2023-12-11 17:11 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.Synchronizing concurrent teamshttps://blog.ploeh.dk/2023/11/27/synchronizing-concurrent-teams2023-11-27T08:43:00+00:00Mark Seemann
<div id="post">
<p>
<em>Or, rather: Try not to.</em>
</p>
<p>
A few months ago I visited a customer and as the day was winding down we got to talk more informally. One of the architects mentioned, in an almost off-hand manner, "we've embarked on a <a href="https://en.wikipedia.org/wiki/Scaled_agile_framework">SAFe</a> journey..."
</p>
<p>
"Yes..?" I responded, hoping that my inflection would sound enough like a question that he'd elaborate.
</p>
<p>
Unfortunately, I'm apparently sometimes too subtle when dealing with people face-to-face, so I never got to hear just how that 'SAFe journey' was going. Instead, the conversation shifted to the adjacent topic of how to coordinate independent teams.
</p>
<p>
I told them that, in my opinion, the best way to coordinate independent teams is to <em>not</em> coordinate them. I don't remember exactly how I proceeded from there, but I probably said something along the lines that I consider coordination meetings between teams to be an 'architecture smell'. That the need to talk to other teams was a symptom that teams were too tightly coupled.
</p>
<p>
I don't remember if I said exactly that, but it would have been in character.
</p>
<p>
The architect responded: "I don't like silos."
</p>
<p>
How do you respond to that?
</p>
<h3 id="c5fdd5e722994d8fb75bb3f32f1cf86b">
Autonomous teams <a href="#c5fdd5e722994d8fb75bb3f32f1cf86b">#</a>
</h3>
<p>
I couldn't very well respond that <em>silos are great</em>. First, it doesn't sound very convincing. Second, it'd be an argument suitable only in a kindergarten. <em>Are not! -Are too! -Not! -Too!</em> etc.
</p>
<p>
After feeling momentarily <a href="https://en.wikipedia.org/wiki/Check_(chess)">checked</a>, for once I managed to think on my feet, so I replied, "I don't suggest that your teams should be isolated from each other. I do encourage people to talk to each other, but I don't think that teams should <em>coordinate</em> much. Rather, think of each team as an organism on the savannah. They interact, and what they do impacts others, but in the end they're autonomous life forms. I believe an architect's job is like a ranger's. You can't control the plants or animals, but you can nurture the ecosystem, herding it in a beneficial direction."
</p>
<p>
<img src="/content/binary/samburu.jpg" alt="Gazelles and warthogs in Samburu National Reserve, Kenya.">
</p>
<p>
That ranger metaphor is an old pet peeve of mine, originating from what I consider one of my most under-appreciated articles: <a href="/2012/12/18/ZookeepersmustbecomeRangers">Zookeepers must become Rangers</a>. It's closely related to the more popular metaphor of software architecture as gardening, but I like the wildlife variation because it emphasizes an even more hands-off approach. It removes the illusion that you can control a fundamentally unpredictable process, but replaces it with the hopeful emphasis on stewardship.
</p>
<p>
How do ecosystems thrive? A software architect (or ranger) should nurture resilience in each subsystem, just like evolution has promoted plants' and animals' ability to survive a variety of unforeseen circumstances: Flood, drought, fire, predators, lack of prey, disease, etc.
</p>
<p>
You want teams to work independently. This doesn't mean that they work in isolation, but rather that they are free to act according to their abilities and understanding of the situation. An architect can help them understand the wider ecosystem and predict tomorrow's weather, so to speak, but the team should remain autonomous.
</p>
<h3 id="d0637b2597964cfb9bf6ffd3f0559fd0">
Concurrent work <a href="#d0637b2597964cfb9bf6ffd3f0559fd0">#</a>
</h3>
<p>
I'm assuming that an organisation has multiple teams because they're supposed to work concurrently. While team A is off doing one thing, team B is doing something else. You can attempt to herd them in the same general direction, but beware of tight coordination.
</p>
<p>
What's the problem with coordination? Isn't it a kind of collaboration? Don't we consider that beneficial?
</p>
<p>
I'm not arguing that teams should be antagonistic. Like all metaphors, we should be careful not to take the savannah metaphor too far. I'm not imagining that one team consists of lions, apex predators, killing and devouring other teams.
</p>
<p>
Rather, the reason I'm wary of coordination is because it seems synonymous with <em>synchronisation</em>.
</p>
<p>
In <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a> I've already discussed how good practices for Continuous Integration are similar to earlier lessons about <a href="https://en.wikipedia.org/wiki/Optimistic_concurrency_control">optimistic concurrency</a>. It recently struck me that we can draw a similar parallel between concurrent team work and parallel computing.
</p>
<p>
For decades we've known that the less synchronization, the faster parallel code is. Synchronization is costly.
</p>
<p>
In team work, coordination is like thread synchronization. Instead of doing work, you stop in order to coordinate. This implies that one thread or team has to wait for the other to catch up.
</p>
<p>
<img src="/content/binary/sync-wait.png" alt="Two horizontal bars presenting two processes, A and B. A is shorter than B, indicating that it finishes first.">
</p>
<p>
Unless work is perfectly evenly divided, team A may finish before team B. In order to coordinate, team A must sit idle for a while, waiting for B to catch up. (In development organizations, <em>idleness</em> is rarely allowed, so in practice, team A embarks on some other work, with <a href="/2012/12/18/ZookeepersmustbecomeRangers">consequences that I've already outlined</a>.)
</p>
<p>
If you have more than two teams, this phenomenon only becomes worse. You'll have more idle time. This reminds me of <a href="https://en.wikipedia.org/wiki/Amdahl%27s_law">Amdahl's law</a>, which briefly put expresses that there's a limit to how much of a speed improvement you can get from concurrent work. The limit is related to the percentage of the work that can <em>not</em> be parallelized. The greater the need to synchronize work, the lower the ceiling. Conversely, the more you can let concurrent processes run without coordination, the more you gain from parallelization.
</p>
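<p>
To put some numbers on that ceiling, here's a small sketch of the usual formulation of Amdahl's law, where <em>p</em> is the fraction of the work that can be parallelized and <em>n</em> is the number of workers: speedup = 1 / ((1 - p) + p / n). The <code>Amdahl</code> class and the choice of 90 % are only illustrative assumptions.
</p>

```csharp
using System;

// With 90 % parallelizable work, the ceiling is 1 / (1 - 0.9) = 10,
// no matter how many workers are added.
foreach (var n in new[] { 2, 4, 16, 1024 })
    Console.WriteLine($"{n} workers: {Amdahl.Speedup(0.9, n):F2}x");

public static class Amdahl
{
    // Theoretical speedup when a fraction p (between 0 and 1) of the work
    // can be spread over n workers, while the rest stays serial.
    public static double Speedup(double p, int n)
    {
        if (p < 0 || p > 1) throw new ArgumentOutOfRangeException(nameof(p));
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));
        return 1 / ((1 - p) + p / n);
    }
}
```

<p>
The serial fraction (1 - p) plays the role of synchronization: as it grows, adding workers buys less and less.
</p>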
<p>
It seems to me that there's a direct counterpart in team organization. The more teams need to coordinate, the less is gained from having multiple teams.
</p>
<p>
But really, <a href="/ref/mythical-man-month">Fred Brooks could have told you so in 1975</a>.
</p>
<h3 id="11f48be968f94f67b42d0934a4d08501">
Versioning <a href="#11f48be968f94f67b42d0934a4d08501">#</a>
</h3>
<p>
A small development team may organize work informally. Work may be divided along 'natural' lines, each developer taking on tasks best suited to his or her abilities. If working in a code base with shared ownership, one developer doesn't <em>have</em> to wait on the work done by another developer. Instead, a programmer may complete the required work individually, or working together with a colleague. Coordination happens, but is both informal and frequent.
</p>
<p>
As development organizations grow, teams are formed. Separate teams are supposed to work independently, but may in practice often depend on each other. Team A may need team B to make a change before they can proceed with their own work. The (felt) need to coordinate team activities arises.
</p>
<p>
In my experience, this happens for a number of reasons. One is that teams may be divided along wrong lines; this is a socio-technical problem. Another, more technical, reason is that <a href="/2012/12/18/RangersandZookeepers">zookeepers</a> rarely think explicitly about versioning or avoiding breaking changes. Imagine that team A needs team B to develop a new capability. This new capability <em>implies</em> a breaking change, so the teams will now need to coordinate.
</p>
<p>
Instead, team B should develop the new feature in such a way that it doesn't break existing clients. If all else fails, the new feature must exist side-by-side with the old way of doing things. With <a href="https://en.wikipedia.org/wiki/Continuous_deployment">Continuous Deployment</a> the new feature becomes available when it's ready. Team A still has to <em>wait</em> for the feature to become available, but no <em>synchronization</em> is required.
</p>
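<p>
As a hedged sketch of what 'side-by-side' can mean at the code level, imagine that team B owns a hypothetical <code>PriceService</code> and that team A needs prices in other currencies. Everything here (the class, the SKUs, the exchange rates) is invented for illustration. The point is only that the old member keeps its signature and behaviour while the new one is added next to it.
</p>

```csharp
using System;
using System.Collections.Generic;

var service = new PriceService();
Console.WriteLine(service.GetPrice("tea"));         // existing client call, untouched
Console.WriteLine(service.GetPrice("tea", "DKK"));  // new capability for team A

// Hypothetical service owned by team B.
public sealed class PriceService
{
    private readonly Dictionary<string, decimal> eurPrices =
        new Dictionary<string, decimal> { ["tea"] = 2m, ["coffee"] = 3m };

    // Made-up exchange rates, only here to make the example run.
    private readonly Dictionary<string, decimal> ratesFromEur =
        new Dictionary<string, decimal> { ["EUR"] = 1m, ["DKK"] = 7.5m };

    // Old member: same signature and same result as before the change.
    public decimal GetPrice(string sku) => GetPrice(sku, "EUR");

    // New member, added alongside the old one instead of replacing it.
    public decimal GetPrice(string sku, string currency) =>
        eurPrices[sku] * ratesFromEur[currency];
}
```

<p>
Existing clients never notice the deployment, and team A starts calling the new overload whenever it's convenient for them.
</p>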
<h3 id="def545737be9487da7c4b01dcc3eb106">
Conclusion <a href="#def545737be9487da7c4b01dcc3eb106">#</a>
</h3>
<p>
Yet another lesson about thread-safety and concurrent transactions seems to apply to people and processes. Parallel processes should be autonomous, with as little synchronization as possible. The more you coordinate development teams, the more you limit the speed of overall work. This seems to suggest that something akin to Amdahl's law also applies to development organizations.
</p>
<p>
Instead of coordinating teams, encourage them to exist as autonomous entities, but set things up so that <em>not breaking compatibility</em> is a major goal for each team.
</p>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.Trimming a Fake Objecthttps://blog.ploeh.dk/2023/11/20/trimming-a-fake-object2023-11-20T06:44:00+00:00Mark Seemann
<div id="post">
<p>
<em>A refactoring example.</em>
</p>
<p>
When I introduce the <a href="http://xunitpatterns.com/Fake%20Object.html">Fake Object</a> testing pattern to people, a common concern is the maintenance burden of it. The point of the pattern is that you write some 'working' code only for test purposes. At a glance, it seems as though it'd be more work than using a dynamic mock library like <a href="https://www.devlooped.com/moq/">Moq</a> or <a href="https://site.mockito.org/">Mockito</a>.
</p>
<p>
This article isn't really about that, but the benefit of a Fake Object is that it has a <em>lower</em> maintenance footprint because it gives you a single class to maintain when you change interfaces or base classes. Dynamic mock objects, on the contrary, lead to <a href="https://en.wikipedia.org/wiki/Shotgun_surgery">Shotgun surgery</a> because every time you change an interface or base class, you have to revisit multiple tests.
</p>
<p>
In a <a href="/2023/11/13/fakes-are-test-doubles-with-contracts">recent article</a> I presented a Fake Object that may have looked bigger than most people would find comfortable for test code. In this article I discuss how to trim it via a set of refactorings.
</p>
<h3 id="22e6648934bb4ae59aa1a181940172ae">
Original Fake read registry <a href="#22e6648934bb4ae59aa1a181940172ae">#</a>
</h3>
<p>
The article presented this <code>FakeReadRegistry</code>, repeated here for your convenience:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">FakeReadRegistry</span> : IReadRegistry
{
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IReadOnlyCollection<Room> rooms;
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IDictionary<DateOnly, IReadOnlyCollection<Room>> views;

    <span style="color:blue;">public</span> <span style="color:#2b91af;">FakeReadRegistry</span>(<span style="color:blue;">params</span> Room[] <span style="font-weight:bold;color:#1f377f;">rooms</span>)
    {
        <span style="color:blue;">this</span>.rooms = rooms;
        views = <span style="color:blue;">new</span> Dictionary<DateOnly, IReadOnlyCollection<Room>>();
    }

    <span style="color:blue;">public</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetFreeRooms</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> EnumerateDates(arrival, departure)
            .Select(GetView)
            .Aggregate(rooms.AsEnumerable(), Enumerable.Intersect)
            .ToList();
    }

    <span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">RoomBooked</span>(Booking <span style="font-weight:bold;color:#1f377f;">booking</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">foreach</span> (var <span style="font-weight:bold;color:#1f377f;">d</span> <span style="font-weight:bold;color:#8f08c4;">in</span> EnumerateDates(booking.Arrival, booking.Departure))
        {
            <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">view</span> = GetView(d);
            <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">newView</span> = QueryService.Reserve(booking, view);
            views[d] = newView;
        }
    }

    <span style="color:blue;">private</span> <span style="color:blue;">static</span> IEnumerable<DateOnly> <span style="color:#74531f;">EnumerateDates</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>)
    {
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">d</span> = arrival;
        <span style="font-weight:bold;color:#8f08c4;">while</span> (d < departure)
        {
            <span style="font-weight:bold;color:#8f08c4;">yield</span> <span style="font-weight:bold;color:#8f08c4;">return</span> d;
            d = d.AddDays(1);
        }
    }

    <span style="color:blue;">private</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">if</span> (views.TryGetValue(date, <span style="color:blue;">out</span> var <span style="font-weight:bold;color:#1f377f;">view</span>))
            <span style="font-weight:bold;color:#8f08c4;">return</span> view;
        <span style="font-weight:bold;color:#8f08c4;">else</span>
            <span style="font-weight:bold;color:#8f08c4;">return</span> rooms;
    }
}</pre>
</p>
<p>
This is 47 lines of code, spread over five members (including the constructor). Three of the methods have a <a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">cyclomatic complexity</a> (CC) of <em>2</em>, which is the maximum for this class. The remaining two have a CC of <em>1</em>.
</p>
<p>
While you <em>can</em> play some <a href="/2023/11/14/cc-golf">CC golf</a> with those CC-2 methods, that tends to pull the code in a direction of being less <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a>. For that reason, I chose to present the code as above. Perhaps more importantly, it doesn't save that many lines of code.
</p>
<p>
Had this been a piece of production code, no-one would bat an eye at size or complexity, but this is test code. To add spite to injury, those 47 lines of code implement this two-method interface:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IReadRegistry</span>
{
    IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetFreeRooms</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>);
    <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">RoomBooked</span>(Booking <span style="font-weight:bold;color:#1f377f;">booking</span>);
}</pre>
</p>
<p>
Can we improve the situation?
</p>
<h3 id="49f78ebabf3d4875b469e935151c064a">
Root cause analysis <a href="#49f78ebabf3d4875b469e935151c064a">#</a>
</h3>
<p>
Before you rush to 'improve' code, it pays to understand why it looks the way it looks.
</p>
<p>
Code is a wonderfully malleable medium, so you should regard nothing as set in stone. On the other hand, there's often a reason it looks like it does. It <em>may</em> be that the previous programmers were incompetent ogres for hire, but often there's a better explanation.
</p>
<p>
I've outlined my thinking process in <a href="/2023/11/13/fakes-are-test-doubles-with-contracts">the previous article</a>, and I'm not going to repeat it all here. To summarise, though, I've applied the <a href="https://en.wikipedia.org/wiki/Dependency_inversion_principle">Dependency Inversion Principle</a>.
</p>
<blockquote>
<p>
"clients [...] own the abstract interfaces"
</p>
<footer><cite>Robert C. Martin, <a href="/ref/appp">APPP</a>, chapter 11</cite></footer>
</blockquote>
<p>
In other words, I let the needs of the clients guide the design of the <code>IReadRegistry</code> interface, and then the implementation (<code>FakeReadRegistry</code>) had to conform.
</p>
<p>
But that's not the whole truth.
</p>
<p>
I was doing a programming exercise - the <a href="https://codingdojo.org/kata/CQRS_Booking/">CQRS booking</a> kata - and I was following the instructions given in the description. They quite explicitly outline the two dependencies and their methods.
</p>
<p>
When trying a new exercise, it's a good idea to follow instructions closely, so that's what I did. Once you get a sense of a kata, though, there's no law saying that you have to stick to the original rules. After all, the purpose of an exercise is to train, and in programming, <a href="/2020/01/13/on-doing-katas">trying new things is training</a>.
</p>
<h3 id="740cb249aff74666af7af4784cc166b8">
Test code that wants to be production code <a href="#740cb249aff74666af7af4784cc166b8">#</a>
</h3>
<p>
A major benefit of test-driven development (TDD) is that it provides feedback. It pays to be tuned in to that channel. The above <code>FakeReadRegistry</code> seems to be trying to tell us something.
</p>
<p>
Consider the <code>GetFreeRooms</code> method. I'll repeat the single-expression body here for your convenience:
</p>
<p>
<pre><span style="font-weight:bold;color:#8f08c4;">return</span> EnumerateDates(arrival, departure)
    .Select(GetView)
    .Aggregate(rooms.AsEnumerable(), Enumerable.Intersect)
    .ToList();</pre>
</p>
<p>
Why is that the implementation? Why does it need to first enumerate the dates in the requested interval? Why does it need to call <code>GetView</code> for each date?
</p>
<p>
Why don't I just do the following and be done with it?
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">FakeStorage</span> : Collection<Booking>, IWriteRegistry, IReadRegistry
{
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IReadOnlyCollection<Room> rooms;

    <span style="color:blue;">public</span> <span style="color:#2b91af;">FakeStorage</span>(<span style="color:blue;">params</span> Room[] <span style="font-weight:bold;color:#1f377f;">rooms</span>)
    {
        <span style="color:blue;">this</span>.rooms = rooms;
    }

    <span style="color:blue;">public</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetFreeRooms</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>)
    {
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">booked</span> = <span style="color:blue;">this</span>.Where(<span style="font-weight:bold;color:#1f377f;">b</span> => b.Overlaps(arrival, departure)).ToList();
        <span style="font-weight:bold;color:#8f08c4;">return</span> rooms
            .Where(<span style="font-weight:bold;color:#1f377f;">r</span> => !booked.Any(<span style="font-weight:bold;color:#1f377f;">b</span> => b.RoomName == r.Name))
            .ToList();
    }

    <span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Save</span>(Booking <span style="font-weight:bold;color:#1f377f;">booking</span>)
    {
        Add(booking);
    }
}</pre>
</p>
<p>
To be honest, that's what I did <em>first</em>.
</p>
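<p>
The <code>Overlaps</code> method that <code>FakeStorage</code> calls isn't shown in this article. One plausible way to write it, assuming half-open date intervals like the ones <code>EnumerateDates</code> produces, is sketched below. The <code>Booking</code> record is a hypothetical, simplified stand-in, not the actual type from the code base.
</p>

```csharp
using System;

var booking = new Booking("Ocean", new DateOnly(2023, 11, 20), new DateOnly(2023, 11, 23));

// Overlapping stay: arrives before the existing guest departs.
Console.WriteLine(booking.Overlaps(new DateOnly(2023, 11, 22), new DateOnly(2023, 11, 25)));
// Back-to-back stay: arrives on the existing guest's departure day.
Console.WriteLine(booking.Overlaps(new DateOnly(2023, 11, 23), new DateOnly(2023, 11, 25)));

// Hypothetical stand-in for the article's Booking type.
public sealed record Booking(string RoomName, DateOnly Arrival, DateOnly Departure)
{
    // Treats both stays as half-open intervals [arrival, departure), so a
    // guest leaving on the day another arrives doesn't count as an overlap.
    public bool Overlaps(DateOnly arrival, DateOnly departure) =>
        Arrival < departure && arrival < Departure;
}
```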
<p>
While there are two interfaces, there's only one Fake Object implementing both. That's often an easy way to address the <a href="https://en.wikipedia.org/wiki/Interface_segregation_principle">Interface Segregation Principle</a> and still keep the Fake Object simple.
</p>
<p>
This is much simpler than <code>FakeReadRegistry</code>, so why didn't I just keep that?
</p>
<p>
I didn't feel it was an honest attempt at CQRS. In CQRS you typically write the data changes to one system, and then you have another logical process that propagates the information about the data modification to the <em>read</em> subsystem. There's none of that here. Instead of being based on one or more 'materialised views', the query is just that: A query.
</p>
<p>
That was what I attempted to address with <code>FakeReadRegistry</code>, and I think it's a much more faithful CQRS implementation. It's also more complex, as CQRS tends to be.
</p>
<p>
In both cases, however, it seems that there's some production logic trapped in the test code. Shouldn't <code>EnumerateDates</code> be production code? And how about the general 'algorithm' of <code>RoomBooked</code>:
</p>
<ul>
<li>Enumerate the relevant dates</li>
<li>Get the 'materialised' view for each date</li>
<li>Calculate the new view for that date</li>
<li>Update the collection of views for that date</li>
</ul>
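<p>
The steps in the list can be sketched as a stand-alone function. The <code>ViewUpdater</code> class and the simplified <code>Room</code> and <code>Booking</code> records are hypothetical, and <code>Reserve</code> is only a guess at what <code>QueryService.Reserve</code> does: remove the booked room from the view.
</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var rooms = new[] { new Room("Ocean"), new Room("Forest") };
var views = new Dictionary<DateOnly, IReadOnlyCollection<Room>>();
var booking = new Booking("Ocean", new DateOnly(2023, 11, 20), new DateOnly(2023, 11, 22));

ViewUpdater.RoomBooked(views, rooms, booking);

foreach (var (date, view) in views)
    Console.WriteLine($"{date:yyyy-MM-dd}: {string.Join(", ", view.Select(r => r.Name))}");

// Hypothetical, simplified stand-ins for the article's domain types.
public sealed record Room(string Name);
public sealed record Booking(string RoomName, DateOnly Arrival, DateOnly Departure);

public static class ViewUpdater
{
    public static void RoomBooked(
        IDictionary<DateOnly, IReadOnlyCollection<Room>> views,
        IReadOnlyCollection<Room> allRooms,
        Booking booking)
    {
        // 1. Enumerate the relevant dates (the departure day is excluded).
        for (var d = booking.Arrival; d < booking.Departure; d = d.AddDays(1))
        {
            // 2. Get the 'materialised' view for the date, or all rooms if none exists.
            var view = views.TryGetValue(d, out var existing) ? existing : allRooms;
            // 3. Calculate the new view for that date.
            var newView = Reserve(booking, view);
            // 4. Update the collection of views for that date.
            views[d] = newView;
        }
    }

    // Assumed behaviour of QueryService.Reserve: drop the booked room.
    private static IReadOnlyCollection<Room> Reserve(
        Booking booking, IReadOnlyCollection<Room> view) =>
        view.Where(r => r.Name != booking.RoomName).ToList();
}
```

<p>
Written like this, only steps 2 and 4 touch storage. The loop and the calculation don't care where the views live.
</p>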
<p>
That seems like just enough code to warrant moving it to the production code.
</p>
<p>
A word of caution before we proceed. When deciding to pull some of that test code into the production code, I'm making a decision about architecture.
</p>
<p>
Until now, I'd been following the Dependency Inversion Principle closely. The interfaces exist because the client code needs them. Those interfaces could be implemented in various ways: You could use a relational database, a document database, files, blobs, etc.
</p>
<p>
Once I decide to pull the above algorithm into the production code, I'm choosing a particular persistent data structure. This now locks the data storage system into a design where there's a persistent view per date, and another database of bookings.
</p>
<p>
Now that I'd learned some more about the exercise, I felt confident making that decision.
</p>
<h3 id="d1515fe093394daf8cf9e4a9ec687770">
Template Method <a href="#d1515fe093394daf8cf9e4a9ec687770">#</a>
</h3>
<p>
The first move I made was to create a superclass so that I could employ the <a href="https://en.wikipedia.org/wiki/Template_method_pattern">Template Method</a> pattern:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">abstract</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ReadRegistry</span> : IReadRegistry
{
    <span style="color:blue;">public</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetFreeRooms</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">return</span> EnumerateDates(arrival, departure)
            .Select(GetView)
            .Aggregate(Rooms.AsEnumerable(), Enumerable.Intersect)
            .ToList();
    }

    <span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">RoomBooked</span>(Booking <span style="font-weight:bold;color:#1f377f;">booking</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">foreach</span> (var <span style="font-weight:bold;color:#1f377f;">d</span> <span style="font-weight:bold;color:#8f08c4;">in</span> EnumerateDates(booking.Arrival, booking.Departure))
        {
            <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">view</span> = GetView(d);
            <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">newView</span> = QueryService.Reserve(booking, view);
            UpdateView(d, newView);
        }
    }

    <span style="color:blue;">protected</span> <span style="color:blue;">abstract</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">UpdateView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>, IReadOnlyCollection<Room> <span style="font-weight:bold;color:#1f377f;">view</span>);
    <span style="color:blue;">protected</span> <span style="color:blue;">abstract</span> IReadOnlyCollection<Room> Rooms { <span style="color:blue;">get</span>; }
    <span style="color:blue;">protected</span> <span style="color:blue;">abstract</span> <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">TryGetView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>, <span style="color:blue;">out</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#1f377f;">view</span>);

    <span style="color:blue;">private</span> <span style="color:blue;">static</span> IEnumerable<DateOnly> <span style="color:#74531f;">EnumerateDates</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>)
    {
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">d</span> = arrival;
        <span style="font-weight:bold;color:#8f08c4;">while</span> (d < departure)
        {
            <span style="font-weight:bold;color:#8f08c4;">yield</span> <span style="font-weight:bold;color:#8f08c4;">return</span> d;
            d = d.AddDays(1);
        }
    }

    <span style="color:blue;">private</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">if</span> (TryGetView(date, <span style="color:blue;">out</span> var <span style="font-weight:bold;color:#1f377f;">view</span>))
            <span style="font-weight:bold;color:#8f08c4;">return</span> view;
        <span style="font-weight:bold;color:#8f08c4;">else</span>
            <span style="font-weight:bold;color:#8f08c4;">return</span> Rooms;
    }
}</pre>
</p>
<p>
This looks similar to <code>FakeReadRegistry</code>, so how is this an improvement?
</p>
<p>
The new <code>ReadRegistry</code> class is production code. It can, and should, be tested. (Due to the history of how we got here, <a href="/2023/11/13/fakes-are-test-doubles-with-contracts">it's already covered by tests</a>, so I'm not going to repeat that effort here.)
</p>
<p>
True to the <a href="https://en.wikipedia.org/wiki/Template_method_pattern">Template Method</a> pattern, three <code>abstract</code> members await a child class' implementation. These are the <code>UpdateView</code> and <code>TryGetView</code> methods, as well as the <code>Rooms</code> read-only property (glorified getter method).
</p>
<p>
Imagine that in the production code, these are implemented based on file/document/blob storage - one per date. <code>TryGetView</code> would attempt to read the document from storage, <code>UpdateView</code> would create or modify the document, while <code>Rooms</code> returns a default set of rooms.
</p>
<p>
A Test Double, however, can still use an in-memory dictionary:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">FakeReadRegistry</span> : ReadRegistry
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IReadOnlyCollection<Room> rooms;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IDictionary<DateOnly, IReadOnlyCollection<Room>> views;
<span style="color:blue;">protected</span> <span style="color:blue;">override</span> IReadOnlyCollection<Room> Rooms => rooms;
<span style="color:blue;">public</span> <span style="color:#2b91af;">FakeReadRegistry</span>(<span style="color:blue;">params</span> Room[] <span style="font-weight:bold;color:#1f377f;">rooms</span>)
{
<span style="color:blue;">this</span>.rooms = rooms;
views = <span style="color:blue;">new</span> Dictionary<DateOnly, IReadOnlyCollection<Room>>();
}
<span style="color:blue;">protected</span> <span style="color:blue;">override</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">UpdateView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>, IReadOnlyCollection<Room> <span style="font-weight:bold;color:#1f377f;">view</span>)
{
views[date] = view;
}
<span style="color:blue;">protected</span> <span style="color:blue;">override</span> <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">TryGetView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>, <span style="color:blue;">out</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#1f377f;">view</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> views.TryGetValue(date, <span style="color:blue;">out</span> view);
}
}</pre>
</p>
<p>
Each <code>override</code> is a one-liner with cyclomatic complexity <em>1</em>.
</p>
<h3 id="91153011891b4b7791acfe0edc65f997">
First round of clean-up <a href="#91153011891b4b7791acfe0edc65f997">#</a>
</h3>
<p>
An abstract class is already a polymorphic object, so we no longer need the <code>IReadRegistry</code> interface. Delete that, and update all code accordingly. Particularly, the <code>QueryService</code> now depends on <code>ReadRegistry</code> rather than <code>IReadRegistry</code>:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">readonly</span> ReadRegistry readRegistry;
<span style="color:blue;">public</span> <span style="color:#2b91af;">QueryService</span>(ReadRegistry <span style="font-weight:bold;color:#1f377f;">readRegistry</span>)
{
<span style="color:blue;">this</span>.readRegistry = readRegistry;
}</pre>
</p>
<p>
Now move the <code>Reserve</code> function from <code>QueryService</code> to <code>ReadRegistry</code>. Once this is done, the <code>QueryService</code> looks like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">QueryService</span>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> ReadRegistry readRegistry;
<span style="color:blue;">public</span> <span style="color:#2b91af;">QueryService</span>(ReadRegistry <span style="font-weight:bold;color:#1f377f;">readRegistry</span>)
{
<span style="color:blue;">this</span>.readRegistry = readRegistry;
}
<span style="color:blue;">public</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetFreeRooms</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> readRegistry.GetFreeRooms(arrival, departure);
}
}</pre>
</p>
<p>
That class only passes method calls along, so it clearly no longer serves any purpose. Delete it.
</p>
<p>
This is not uncommon in CQRS. One might even argue that if CQRS is done right, there's almost no code on the query side, since all the data view update happens as events propagate.
</p>
<h3 id="640e19ed6f904d71b925672028aeee45">
From abstract class to Dependency Injection <a href="#640e19ed6f904d71b925672028aeee45">#</a>
</h3>
<p>
While the current state of the code is based on an abstract base class, the overall architecture of the system doesn't hinge on inheritance. From <a href="/2018/02/19/abstract-class-isomorphism">Abstract class isomorphism</a> we know that it's possible to refactor an abstract class to Constructor Injection. Let's do that.
</p>
<p>
First add an <code>IViewStorage</code> interface that mirrors the three <code>abstract</code> members defined by <code>ReadRegistry</code>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IViewStorage</span>
{
IReadOnlyCollection<Room> Rooms { <span style="color:blue;">get</span>; }
<span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">UpdateView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>, IReadOnlyCollection<Room> <span style="font-weight:bold;color:#1f377f;">view</span>);
<span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">TryGetView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>, <span style="color:blue;">out</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#1f377f;">view</span>);
}</pre>
</p>
<p>
Then implement it with a Fake Object:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">FakeViewStorage</span> : IViewStorage
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IDictionary<DateOnly, IReadOnlyCollection<Room>> views;
<span style="color:blue;">public</span> IReadOnlyCollection<Room> Rooms { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> <span style="color:#2b91af;">FakeViewStorage</span>(<span style="color:blue;">params</span> Room[] <span style="font-weight:bold;color:#1f377f;">rooms</span>)
{
Rooms = rooms;
views = <span style="color:blue;">new</span> Dictionary<DateOnly, IReadOnlyCollection<Room>>();
}
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">UpdateView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>, IReadOnlyCollection<Room> <span style="font-weight:bold;color:#1f377f;">view</span>)
{
views[date] = view;
}
<span style="color:blue;">public</span> <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">TryGetView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>, <span style="color:blue;">out</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#1f377f;">view</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> views.TryGetValue(date, <span style="color:blue;">out</span> view);
}
}</pre>
</p>
<p>
Notice the similarity to <code>FakeReadRegistry</code>, which we'll get rid of shortly.
</p>
<p>
Now inject <code>IViewStorage</code> into <code>ReadRegistry</code>, and make <code>ReadRegistry</code> a regular (<code>sealed</code>) class:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ReadRegistry</span>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IViewStorage viewStorage;
<span style="color:blue;">public</span> <span style="color:#2b91af;">ReadRegistry</span>(IViewStorage <span style="font-weight:bold;color:#1f377f;">viewStorage</span>)
{
<span style="color:blue;">this</span>.viewStorage = viewStorage;
}
<span style="color:blue;">public</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetFreeRooms</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> EnumerateDates(arrival, departure)
.Select(GetView)
.Aggregate(viewStorage.Rooms.AsEnumerable(), Enumerable.Intersect)
.ToList();
}
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">RoomBooked</span>(Booking <span style="font-weight:bold;color:#1f377f;">booking</span>)
{
<span style="font-weight:bold;color:#8f08c4;">foreach</span> (var <span style="font-weight:bold;color:#1f377f;">d</span> <span style="font-weight:bold;color:#8f08c4;">in</span> EnumerateDates(booking.Arrival, booking.Departure))
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">view</span> = GetView(d);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">newView</span> = Reserve(booking, view);
viewStorage.UpdateView(d, newView);
}
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> IReadOnlyCollection<Room> <span style="color:#74531f;">Reserve</span>(
Booking <span style="font-weight:bold;color:#1f377f;">booking</span>,
IReadOnlyCollection<Room> <span style="font-weight:bold;color:#1f377f;">existingView</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> existingView
.Where(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Name != booking.RoomName)
.ToList();
}
<span style="color:blue;">private</span> <span style="color:blue;">static</span> IEnumerable<DateOnly> <span style="color:#74531f;">EnumerateDates</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">d</span> = arrival;
<span style="font-weight:bold;color:#8f08c4;">while</span> (d < departure)
{
<span style="font-weight:bold;color:#8f08c4;">yield</span> <span style="font-weight:bold;color:#8f08c4;">return</span> d;
d = d.AddDays(1);
}
}
<span style="color:blue;">private</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (viewStorage.TryGetView(date, <span style="color:blue;">out</span> var <span style="font-weight:bold;color:#1f377f;">view</span>))
<span style="font-weight:bold;color:#8f08c4;">return</span> view;
<span style="font-weight:bold;color:#8f08c4;">else</span>
<span style="font-weight:bold;color:#8f08c4;">return</span> viewStorage.Rooms;
}
}</pre>
</p>
<p>
You can now delete the <code>FakeReadRegistry</code> Test Double, since <code>FakeViewStorage</code> has now taken its place.
</p>
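<p>
The production implementation imagined earlier, with one document per date, could now be written against <code>IViewStorage</code>. The following is only a sketch under stated assumptions: the <code>Room</code> record, the <code>FileViewStorage</code> name, the JSON format, and the file-naming scheme are all hypothetical and not taken from the actual code base.
</p>

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text.Json;

// Hypothetical stand-in for the article's Room type; the real one may differ.
public sealed record Room(string Name);

// Same shape as the IViewStorage interface shown in the article.
public interface IViewStorage
{
    IReadOnlyCollection<Room> Rooms { get; }
    void UpdateView(DateOnly date, IReadOnlyCollection<Room> view);
    bool TryGetView(DateOnly date, out IReadOnlyCollection<Room> view);
}

// Assumption: each date's view is stored as a JSON file named after the date.
public sealed class FileViewStorage : IViewStorage
{
    private readonly string directory;

    public IReadOnlyCollection<Room> Rooms { get; }

    public FileViewStorage(string directory, params Room[] rooms)
    {
        this.directory = directory;
        Rooms = rooms;
    }

    public void UpdateView(DateOnly date, IReadOnlyCollection<Room> view)
    {
        // Create or overwrite the document for this date.
        Directory.CreateDirectory(directory);
        File.WriteAllText(PathFor(date), JsonSerializer.Serialize(view));
    }

    public bool TryGetView(DateOnly date, out IReadOnlyCollection<Room> view)
    {
        var path = PathFor(date);
        if (!File.Exists(path))
        {
            view = Array.Empty<Room>();
            return false;
        }

        // Deserialize returns null for the JSON literal 'null'; treat that as an empty view.
        view = JsonSerializer.Deserialize<List<Room>>(File.ReadAllText(path)) ?? new List<Room>();
        return true;
    }

    private string PathFor(DateOnly date) =>
        Path.Combine(directory, date.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture) + ".json");
}
```

<p>
Such a class would need its own integration tests, but <code>ReadRegistry</code> itself stays unaware of whether it talks to files or to an in-memory Fake.
</p>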
<p>
Finally, we may consider if we can make <code>FakeViewStorage</code> even slimmer. While I usually favour composition over inheritance, I've found that deriving Fake Objects from collection base classes is often an efficient way to get a lot of mileage out of a few lines of code. <code>FakeReadRegistry</code>, however, had to inherit from <code>ReadRegistry</code>, so it couldn't derive from any other class.
</p>
<p>
<code>FakeViewStorage</code> isn't constrained in that way, so it's free to inherit from <code>Dictionary<DateOnly, IReadOnlyCollection<Room>></code>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">FakeViewStorage</span> : Dictionary<DateOnly, IReadOnlyCollection<Room>>, IViewStorage
{
<span style="color:blue;">public</span> IReadOnlyCollection<Room> Rooms { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> <span style="color:#2b91af;">FakeViewStorage</span>(<span style="color:blue;">params</span> Room[] <span style="font-weight:bold;color:#1f377f;">rooms</span>)
{
Rooms = rooms;
}
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">UpdateView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>, IReadOnlyCollection<Room> <span style="font-weight:bold;color:#1f377f;">view</span>)
{
<span style="color:blue;">this</span>[date] = view;
}
<span style="color:blue;">public</span> <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">TryGetView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>, <span style="color:blue;">out</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#1f377f;">view</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> TryGetValue(date, <span style="color:blue;">out</span> view);
}
}</pre>
</p>
<p>
This last move isn't strictly necessary, but I found it worth at least mentioning.
</p>
<p>
I hope you'll agree that this is a Fake Object that looks maintainable.
</p>
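<p>
To illustrate the mileage that the <code>Dictionary</code> base class offers, here's a small sketch of how a test might seed and inspect the Fake through the inherited collection API. The <code>Room</code> record and the <code>FakeViewStorageDemo</code> helper are assumptions made for the sake of a self-contained example; the <code>IViewStorage</code> and <code>FakeViewStorage</code> declarations repeat the ones above.
</p>

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the article's Room type.
public sealed record Room(string Name);

public interface IViewStorage
{
    IReadOnlyCollection<Room> Rooms { get; }
    void UpdateView(DateOnly date, IReadOnlyCollection<Room> view);
    bool TryGetView(DateOnly date, out IReadOnlyCollection<Room> view);
}

public sealed class FakeViewStorage : Dictionary<DateOnly, IReadOnlyCollection<Room>>, IViewStorage
{
    public IReadOnlyCollection<Room> Rooms { get; }

    public FakeViewStorage(params Room[] rooms)
    {
        Rooms = rooms;
    }

    public void UpdateView(DateOnly date, IReadOnlyCollection<Room> view)
    {
        this[date] = view;
    }

    public bool TryGetView(DateOnly date, out IReadOnlyCollection<Room> view)
    {
        return TryGetValue(date, out view);
    }
}

// Hypothetical helper, only here to show the inherited API in use.
public static class FakeViewStorageDemo
{
    public static FakeViewStorage CreateSeeded()
    {
        var a = new Room("A");
        var b = new Room("B");
        // The collection initialiser calls the inherited Dictionary.Add method,
        // so a test can arrange existing views without going through UpdateView.
        return new FakeViewStorage(a, b)
        {
            { new DateOnly(2023, 11, 13), new[] { b } }
        };
    }
}
```

<p>
Members such as <code>Count</code> and <code>ContainsKey</code> come for free as well, which may be convenient in assertions.
</p>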
<h3 id="cd86ac335816431aa39a6538fd9ce95c">
Conclusion <a href="#cd86ac335816431aa39a6538fd9ce95c">#</a>
</h3>
<p>
Test-driven development is a feedback mechanism. If something is difficult to test, it tells you something about your System Under Test (SUT). If your test code looks bloated, that tells you something too. Perhaps part of the test code really belongs in the production code.
</p>
<p>
In this article, we started with a Fake Object that looked like it contained too much production code. Via a series of refactorings I moved the relevant parts to the production code, leaving me with a more idiomatic and conforming implementation.
</p>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
CC golf https://blog.ploeh.dk/2023/11/14/cc-golf 2023-11-14T14:44:00+00:00 Mark Seemann
<div id="post">
<p>
<em>Noun. Game in which the goal is to minimise cyclomatic complexity.</em>
</p>
<p>
<a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">Cyclomatic complexity</a> (CC) is a rare code metric since it <a href="/2019/12/09/put-cyclomatic-complexity-to-good-use">can actually be useful</a>. In general, it's a good idea to minimise it as much as possible.
</p>
<p>
In short, CC measures looping and branching in code, and this is often where bugs lurk. While it's only a rough measure, I nonetheless find the metric useful as a general guideline. Lower is better.
</p>
<h3 id="0a1e55fd6ebf422aac1f6441e4e34f99">
Golf <a href="#0a1e55fd6ebf422aac1f6441e4e34f99">#</a>
</h3>
<p>
I'd like to propose the term "CC golf" for the activity of minimising cyclomatic complexity in an area of code. The name derives from <a href="https://en.wikipedia.org/wiki/Code_golf">code golf</a>, in which you have to implement some behaviour (typically an algorithm) in fewest possible characters.
</p>
<p>
Such games can be useful because they enable you to explore different ways to express yourself in code. It's always a good <a href="/2020/01/13/on-doing-katas">kata constraint</a>. The <a href="/2011/05/16/TennisKatawithimmutabletypesandacyclomaticcomplexityof1">first time I tried that was in 2011</a>, and when looking back on that code today, I'm not that impressed. Still, it taught me a valuable lesson about the <a href="https://en.wikipedia.org/wiki/Visitor_pattern">Visitor pattern</a> that I never forgot, and that later enabled me to <a href="/2018/06/25/visitor-as-a-sum-type">connect some important dots</a>.
</p>
<p>
But don't limit CC golf to katas and the like. Try it in your production code too. Most production code I've seen could benefit from some CC golf, and if you <a href="https://stackoverflow.blog/2022/12/19/use-git-tactically/">use Git tactically</a> you can always stash the changes if they're no good.
</p>
<h3 id="f7e54cb8b9954fbd9ed8022ad09f5d7f">
Idiomatic tension <a href="#f7e54cb8b9954fbd9ed8022ad09f5d7f">#</a>
</h3>
<p>
Alternative expressions with lower cyclomatic complexity may not always be idiomatic. Let's look at a few examples. In my <a href="/2023/11/13/fakes-are-test-doubles-with-contracts">previous article</a>, I listed some test code where some helper methods had a CC of <em>2</em>. Here's one of them:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">static</span> IEnumerable<DateOnly> <span style="color:#74531f;">EnumerateDates</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">d</span> = arrival;
<span style="font-weight:bold;color:#8f08c4;">while</span> (d < departure)
{
<span style="font-weight:bold;color:#8f08c4;">yield</span> <span style="font-weight:bold;color:#8f08c4;">return</span> d;
d = d.AddDays(1);
}
}</pre>
</p>
<p>
Can you express this functionality with a CC of <em>1?</em> In <a href="https://www.haskell.org/">Haskell</a> it's essentially built in as <code>(. pred) . enumFromTo</code>, and in <a href="https://fsharp.org/">F#</a> it's also idiomatic, although more verbose:
</p>
<p>
<pre><span style="color:blue;">let</span> enumerateDates (arrival : DateOnly) departure =
    Seq.initInfinite id |> Seq.map arrival.AddDays |> Seq.takeWhile (<span style="color:blue;">fun</span> d <span style="color:blue;">-></span> d < departure)</pre>
</p>
<p>
Can we do the same in C#?
</p>
<p>
If there's a general API in .NET that corresponds to the F#-specific <code>Seq.initInfinite</code> I haven't found it, but we can do something like this:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">static</span> IEnumerable<DateOnly> <span style="color:#74531f;">EnumerateDates</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>)
{
<span style="color:blue;">const</span> <span style="color:blue;">int</span> infinity = <span style="color:blue;">int</span>.MaxValue; <span style="color:green;">// As close as int gets, at least</span>
<span style="font-weight:bold;color:#8f08c4;">return</span> Enumerable.Range(0, infinity).Select(arrival.AddDays).TakeWhile(<span style="font-weight:bold;color:#1f377f;">d</span> => d < departure);
}</pre>
</p>
<p>
In C# infinite sequences are generally unusual, but <em>if</em> you were to create one, a combination of <code>while true</code> and <code>yield return</code> would be the most idiomatic. The problem with that, though, is that such a construct has a cyclomatic complexity of <em>2</em>.
</p>
<p>
The above suggestion gets around that problem by pretending that <code>int.MaxValue</code> is infinity. Practically, at least, a 32-bit signed integer can't get larger than that anyway. I haven't tried to let F#'s <a href="https://fsharp.github.io/fsharp-core-docs/reference/fsharp-collections-seqmodule.html#initInfinite">Seq.initInfinite</a> run out, but by its type it seems <code>int</code>-bound as well, so in practice it, too, probably isn't infinite. (Or, if it is, the index that it supplies will have to overflow and wrap around to a negative value.)
</p>
<p>
Is this alternative C# code better than the first? You be the judge of that. It has a lower cyclomatic complexity, but is less idiomatic. This isn't uncommon. In languages with a procedural background, there's often tension between lower cyclomatic complexity and how 'things are usually done'.
</p>
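<p>
For this particular function there may be yet another option, one that avoids pretending that anything is infinite: compute the number of nights up front. The following is a sketch, not part of the original code base; the <code>Dates</code> class name is made up, and it relies on <code>DateOnly.DayNumber</code>, which counts days since 1 January of year 1.
</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container class; the name is an assumption.
public static class Dates
{
    // Yields arrival, arrival + 1 day, and so on, up to but excluding departure.
    public static IEnumerable<DateOnly> EnumerateDates(DateOnly arrival, DateOnly departure)
    {
        // The difference in day numbers is the number of nights.
        // Math.Max guards against a negative count, which Enumerable.Range rejects.
        var nights = Math.Max(0, departure.DayNumber - arrival.DayNumber);
        return Enumerable.Range(0, nights).Select(arrival.AddDays);
    }
}
```

<p>
This method also has a cyclomatic complexity of <em>1</em>, although one could argue that <code>Math.Max</code> merely hides a decision. Whether it reads better than the <code>while</code> loop is, again, a judgement call.
</p>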
<h3 id="d24ea9b2a10f482693071d7dbe1c6604">
Checking for null <a href="#d24ea9b2a10f482693071d7dbe1c6604">#</a>
</h3>
<p>
Is there a way to reduce the cyclomatic complexity of the <code>GetView</code> helper method?
</p>
<p>
<pre><span style="color:blue;">private</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (views.TryGetValue(date, <span style="color:blue;">out</span> var <span style="font-weight:bold;color:#1f377f;">view</span>))
<span style="font-weight:bold;color:#8f08c4;">return</span> view;
<span style="font-weight:bold;color:#8f08c4;">else</span>
<span style="font-weight:bold;color:#8f08c4;">return</span> rooms;
}</pre>
</p>
<p>
This is an example of the built-in API being in the way. In F#, you naturally write the same behaviour with a CC of <em>1:</em>
</p>
<p>
<pre><span style="color:blue;">let</span> getView (date : DateOnly) =
    views |> Map.tryFind date |> Option.defaultValue rooms |> Set.ofSeq</pre>
</p>
<p>
That <code>TryGet</code> idiom stands in the way of further CC reduction, it seems. It <em>is</em> possible to reach a CC of <em>1</em>, but it's neither pretty nor idiomatic:
</p>
<p>
<pre><span style="color:blue;">private</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>)
{
views.TryGetValue(date, <span style="color:blue;">out</span> var <span style="font-weight:bold;color:#1f377f;">view</span>);
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span>[] { view, rooms }.Where(<span style="font-weight:bold;color:#1f377f;">x</span> => x <span style="color:blue;">is</span> { }).First()!;
}</pre>
</p>
<p>
Perhaps there's a better way, but if so, it escapes me. Here, I use my knowledge that <code>view</code> is going to remain <code>null</code> if <code>TryGetValue</code> doesn't find the dictionary entry. Thus, I can put it in front of an array where I put the fallback value <code>rooms</code> as the second element. Then I filter the array by only keeping the elements that are <em>not</em> <code>null</code> (that's what the <code>x is { }</code> pun means; I usually read it as <em>x is something</em>). Finally, I return the first of these elements.
</p>
<p>
I know that <code>rooms</code> is never <code>null</code>, but apparently the compiler can't tell. Thus, I have to suppress its anxiety with the <code>!</code> operator, telling it that this <em>will</em> result in a non-null value.
</p>
<p>
I would never use such a code construct in a professional C# code base.
</p>
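<p>
One candidate that may be worth considering is the <code>GetValueOrDefault</code> extension method that modern .NET defines for read-only dictionaries. The sketch below is an assumption-laden illustration rather than code from the article: the <code>Room</code> record and the <code>ViewLookup</code> class are hypothetical, and the field has to be declared as <code>Dictionary</code> (or <code>IReadOnlyDictionary</code>) rather than <code>IDictionary</code> for the extension method to apply.
</p>

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the article's Room type.
public sealed record Room(string Name);

// Hypothetical class that only exists to host GetView in a self-contained way.
public sealed class ViewLookup
{
    private readonly IReadOnlyCollection<Room> rooms;
    // Dictionary implements IReadOnlyDictionary, which is what GetValueOrDefault extends.
    private readonly Dictionary<DateOnly, IReadOnlyCollection<Room>> views = new();

    public ViewLookup(params Room[] rooms)
    {
        this.rooms = rooms;
    }

    public void SetView(DateOnly date, IReadOnlyCollection<Room> view)
    {
        views[date] = view;
    }

    public IReadOnlyCollection<Room> GetView(DateOnly date)
    {
        // Returns the stored view, or the fallback when the key is absent.
        return views.GetValueOrDefault(date, rooms);
    }
}
```

<p>
The decision hasn't disappeared; it has moved into the base class library. That's typically how CC golf works: the branching still exists, but it lives in a reusable, already-tested function.
</p>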
<h3 id="b1c8693ac29a43a1812e4b9ba9f86e6e">
Side effects <a href="#b1c8693ac29a43a1812e4b9ba9f86e6e">#</a>
</h3>
<p>
The third helper method suggests another kind of problem that you may run into:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">RoomBooked</span>(Booking <span style="font-weight:bold;color:#1f377f;">booking</span>)
{
<span style="font-weight:bold;color:#8f08c4;">foreach</span> (var <span style="font-weight:bold;color:#1f377f;">d</span> <span style="font-weight:bold;color:#8f08c4;">in</span> EnumerateDates(booking.Arrival, booking.Departure))
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">view</span> = GetView(d);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">newView</span> = QueryService.Reserve(booking, view);
views[d] = newView;
}
}</pre>
</p>
<p>
Here the higher-than-one CC stems from the need to loop through dates in order to produce a side effect for each. Even in F# I do that:
</p>
<p>
<pre><span style="color:blue;">member</span> this.RoomBooked booking =
    <span style="color:blue;">for</span> d <span style="color:blue;">in</span> enumerateDates booking.Arrival booking.Departure <span style="color:blue;">do</span>
        <span style="color:blue;">let</span> newView = getView d |> QueryService.reserve booking |> Seq.toList
        views <span style="color:blue;"><-</span> Map.add d newView views</pre>
</p>
<p>
This also has a cyclomatic complexity of <em>2</em>. You could do something like this:
</p>
<p>
<pre><span style="color:blue;">member</span> this.RoomBooked booking =
    enumerateDates booking.Arrival booking.Departure
    |> Seq.iter (<span style="color:blue;">fun</span> d <span style="color:blue;">-></span>
        <span style="color:blue;">let</span> newView = getView d |> QueryService.reserve booking |> Seq.toList <span style="color:blue;">in</span>
        views <span style="color:blue;"><-</span> Map.add d newView views)</pre>
</p>
<p>
but while that nominally has a CC of <em>1</em>, it has the same level of indentation as the previous attempt. This seems to indicate, at least, that it doesn't <em>really</em> address any complexity issue.
</p>
<p>
You could also try something like this:
</p>
<p>
<pre><span style="color:blue;">member</span> this.RoomBooked booking =
    enumerateDates booking.Arrival booking.Departure
    |> Seq.map (<span style="color:blue;">fun</span> d <span style="color:blue;">-></span> d, getView d |> QueryService.reserve booking |> Seq.toList)
    |> Seq.iter (<span style="color:blue;">fun</span> (d, newView) <span style="color:blue;">-></span> views <span style="color:blue;"><-</span> Map.add d newView views)</pre>
</p>
<p>
which, again, may be nominally better, but forced me to wrap the <code>map</code> output in a tuple so that both <code>d</code> and <code>newView</code> are available to <code>Seq.iter</code>. I tend to regard that as a code smell.
</p>
<p>
This latter version is, however, fairly easily translated to C#:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">RoomBooked</span>(Booking <span style="font-weight:bold;color:#1f377f;">booking</span>)
{
EnumerateDates(booking.Arrival, booking.Departure)
.Select(<span style="font-weight:bold;color:#1f377f;">d</span> => (d, view: QueryService.Reserve(booking, GetView(d))))
.ToList()
.ForEach(<span style="font-weight:bold;color:#1f377f;">x</span> => views[x.d] = x.view);
}</pre>
</p>
<p>
The standard .NET API doesn't have something equivalent to <code>Seq.iter</code> (although you could trivially write such an action), but <a href="https://stackoverflow.com/a/1509450/126014">you can convert any sequence to a <code>List<T></code> and use its <code>ForEach</code> method</a>.
</p>
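<p>
A hand-written counterpart to <code>Seq.iter</code> could look like the following sketch. The <code>Iter</code> name and the <code>EnumerableExtensions</code> class are made up for this example; nothing like it ships with the base class library.
</p>

```csharp
using System;
using System.Collections.Generic;

public static class EnumerableExtensions
{
    // Hypothetical counterpart to F#'s Seq.iter:
    // runs the action once for each element, in order.
    public static void Iter<T>(this IEnumerable<T> source, Action<T> action)
    {
        foreach (var item in source)
            action(item);
    }
}
```

<p>
With it, the <code>ToList().ForEach(...)</code> detour becomes a single <code>.Iter(...)</code> call, and no intermediate list is allocated. The loop, with its CC of <em>2</em>, still exists, but only in this one helper.
</p>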
<p>
In practice, though, I tend to <a href="https://ericlippert.com/2009/05/18/foreach-vs-foreach/">agree with Eric Lippert</a>. There's already an idiomatic way to iterate over each item in a collection, and <a href="https://peps.python.org/pep-0020/">being explicit</a> is generally helpful to the reader.
</p>
<h3 id="5f060ce8557043dfb0f374ef254cc922">
Church encoding <a href="#5f060ce8557043dfb0f374ef254cc922">#</a>
</h3>
<p>
There's a general solution to most of CC golf: Whenever you need to make a decision and branch between two or more pathways, you can model that with a <a href="https://en.wikipedia.org/wiki/Tagged_union">sum type</a>. In C# you can mechanically model that with <a href="/2018/05/22/church-encoding">Church encoding</a> or <a href="/2018/06/25/visitor-as-a-sum-type">the Visitor pattern</a>. If you haven't tried that, I recommend it for the exercise, but once you've done it enough times, you realise that it requires little creativity.
</p>
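<p>
As a minimal illustration of the mechanics, here's a sketch of a Church-encoded two-case sum type. All names (<code>IPlayer</code>, <code>PlayerOne</code>, <code>PlayerTwo</code>, <code>PlayerExtensions</code>) are assumptions chosen for the example and may not match any published code.
</p>

```csharp
// A sum type with two cases, encoded as an interface with a Match method.
// Each parameter of Match corresponds to one case.
public interface IPlayer
{
    T Match<T>(T playerOne, T playerTwo);
}

public sealed class PlayerOne : IPlayer
{
    // This case always selects the first alternative.
    public T Match<T>(T playerOne, T playerTwo) => playerOne;
}

public sealed class PlayerTwo : IPlayer
{
    // This case always selects the second alternative.
    public T Match<T>(T playerOne, T playerTwo) => playerTwo;
}

public static class PlayerExtensions
{
    // No if or switch: the object itself selects the pathway, so CC stays at 1.
    public static IPlayer Other(this IPlayer player) =>
        player.Match<IPlayer>(playerOne: new PlayerTwo(), playerTwo: new PlayerOne());

    public static string Describe(this IPlayer player) =>
        player.Match(playerOne: "Player one", playerTwo: "Player two");
}
```

<p>
The decision is made by polymorphic dispatch rather than by a language keyword, which is why the metric doesn't register it.
</p>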
<p>
As an example, in 2021 I <a href="/2021/08/03/the-tennis-kata-revisited">revisited the Tennis kata</a> with the explicit purpose of translating <a href="/2016/02/10/types-properties-software">my usual F# approach to the exercise</a> to C# using Church encoding and the Visitor pattern.
</p>
<p>
Once you've got a sense for how Church encoding enables you to simulate pattern matching in C#, there are few surprises. You may also rightfully question what is gained from such an exercise:
</p>
<p>
<pre><span style="color:blue;">public</span> IScore <span style="font-weight:bold;color:#74531f;">VisitPoints</span>(IPoint <span style="font-weight:bold;color:#1f377f;">playerOnePoint</span>, IPoint <span style="font-weight:bold;color:#1f377f;">playerTwoPoint</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> playerWhoWinsBall.Match(
playerOne: playerOnePoint.Match<IScore>(
love: <span style="color:blue;">new</span> Points(<span style="color:blue;">new</span> Fifteen(), playerTwoPoint),
fifteen: <span style="color:blue;">new</span> Points(<span style="color:blue;">new</span> Thirty(), playerTwoPoint),
thirty: <span style="color:blue;">new</span> Forty(playerWhoWinsBall, playerTwoPoint)),
playerTwo: playerTwoPoint.Match<IScore>(
love: <span style="color:blue;">new</span> Points(playerOnePoint, <span style="color:blue;">new</span> Fifteen()),
fifteen: <span style="color:blue;">new</span> Points(playerOnePoint, <span style="color:blue;">new</span> Thirty()),
thirty: <span style="color:blue;">new</span> Forty(playerWhoWinsBall, playerOnePoint)));
}</pre>
</p>
<p>
Believe it or not, but that method has a CC of <em>1</em> despite the double indentation strongly suggesting that there's some branching going on. To a degree, this also highlights the limitations of the cyclomatic complexity metric. Conversely, <a href="/2021/03/29/table-driven-tennis-scoring">stupidly simple code may have a high CC rating</a>.
</p>
<p>
Most of the examples in this article border on the pathological, and I don't recommend that you write code <em>like</em> that. I recommend that you do the exercise. In less pathological scenarios, there are real benefits to be reaped.
</p>
<h3 id="931c0946572041449fcce50da5f5219b">
Idioms <a href="#931c0946572041449fcce50da5f5219b">#</a>
</h3>
<p>
In 2015 I published an article titled <a href="/2015/08/03/idiomatic-or-idiosyncratic">Idiomatic or idiosyncratic?</a> In it, I tried to explore the idea that the notion of idiomatic code can sometimes hold you back. I revisited that idea in 2021 in an article called <a href="/2021/05/17/against-consistency">Against consistency</a>. The point in both cases is that just because something looks unfamiliar, it doesn't mean that it's bad.
</p>
<p>
Coding idioms somehow arose. If you believe that there's a portion of natural selection involved in the development of coding idioms, you may assume by default that idioms represent good ways of doing things.
</p>
<p>
To a degree I believe this to be true. Many idioms represent the best way of doing things at the time they settled into the shape in which we now know them. Languages and contexts change, however. Just look at <a href="/2019/07/15/tester-doer-isomorphisms">the many approaches to data lookups</a> there have been over the years. For many years now, C# has settled into the so-called <em>TryParse</em> idiom to solve that problem. In my opinion this represents a local maximum.
</p>
<p>
Languages that provide <a href="/2018/06/04/church-encoded-maybe">Maybe</a> (AKA <code>option</code>) and <a href="/2018/06/11/church-encoded-either">Either</a> (AKA <code>Result</code>) types offer a superior alternative. These types naturally compose into <em>CC 1</em> pipelines, whereas <em>TryParse</em> requires you to stop what you're doing in order to check a return value. How very <a href="https://en.wikipedia.org/wiki/C_(programming_language)">C</a>-like.
</p>
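<p>
To make the contrast concrete, here's a minimal sketch. The <code>Maybe</code> class below is a hypothetical, stripped-down type written only for this illustration (it's not from a library), and <code>Parsing</code>, <code>TryParseInt</code>, <code>DoubleWithTryParse</code>, and <code>DoubleWithMaybe</code> are made-up names. The single branch lives inside <code>Match</code>, so the pipeline that uses it has no branching of its own.
</p>

```csharp
using System;

// Hypothetical minimal Maybe type, for illustration only.
public sealed class Maybe<T>
{
    private readonly bool hasValue;
    private readonly T value;

    private Maybe(bool hasValue, T value)
    {
        this.hasValue = hasValue;
        this.value = value;
    }

    public static Maybe<T> Just(T value) => new Maybe<T>(true, value);
    public static Maybe<T> Nothing() => new Maybe<T>(false, default!);

    // The only branch lives here, once, instead of at every call site.
    public TResult Match<TResult>(TResult nothing, Func<T, TResult> just) =>
        hasValue ? just(value) : nothing;

    public Maybe<TResult> Select<TResult>(Func<T, TResult> selector) =>
        Match(Maybe<TResult>.Nothing(), x => Maybe<TResult>.Just(selector(x)));

    public Maybe<TResult> SelectMany<TResult>(Func<T, Maybe<TResult>> selector) =>
        Match(Maybe<TResult>.Nothing(), selector);
}

public static class Parsing
{
    // Adapter from the TryParse idiom to Maybe. This is where the check ends up.
    public static Maybe<int> TryParseInt(string candidate) =>
        int.TryParse(candidate, out var i) ? Maybe<int>.Just(i) : Maybe<int>.Nothing();

    // Idiomatic TryParse: the caller must stop and branch on the return value.
    public static string DoubleWithTryParse(string candidate)
    {
        if (int.TryParse(candidate, out var i))
            return (i * 2).ToString();
        else
            return "not a number";
    }

    // Maybe-based pipeline: no branching in this method.
    public static string DoubleWithMaybe(string candidate) =>
        TryParseInt(candidate)
            .Select(i => i * 2)
            .Select(i => i.ToString())
            .Match("not a number", s => s);
}
```

<p>
Both <code>DoubleWithTryParse</code> and <code>DoubleWithMaybe</code> are intended to return the same results for the same input. The difference is only in where the decision is made.
</p>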
<p>
All that said, I still think you should write idiomatic code by default, but don't be a slave to what's considered idiomatic, just as you shouldn't be a slave to consistency. If there's a better way of doing things, choose the better way.
</p>
<h3 id="8bf3cb23fa5a4f3aa532a40b01dbefb1">
Conclusion <a href="#8bf3cb23fa5a4f3aa532a40b01dbefb1">#</a>
</h3>
<p>
While cyclomatic complexity is a rough measure, it's one of the few useful programming metrics I know of. It should be as low as possible.
</p>
<p>
Most professional code I encounter implements decisions almost exclusively with language primitives: <code>if</code>, <code>for</code>, <code>switch</code>, <code>while</code>, etc. Once, an organisation hired me to give a one-day <em>anti-if</em> workshop. There are other ways to make decisions in code. Most of those alternatives reduce cyclomatic complexity.
</p>
<p>
That's not really a goal by itself, but reducing cyclomatic complexity tends to produce the beneficial side effect of structuring the code in a more sustainable way. It becomes easier to understand and change.
</p>
<p>
As the cliché goes: <em>Choose the right tool for the job.</em> You can't, however, do that if you have nothing to choose from. If you only know of one way to do a thing, you have no choice.
</p>
<p>
Play a little CC golf with your code from time to time. It may improve the code, or it may not. If it doesn't, just <a href="https://git-scm.com/docs/git-stash">stash</a> those changes. Either way, you've probably <em>learned</em> something.
</p>
</div><hr>
Fakes are Test Doubles with contracts
https://blog.ploeh.dk/2023/11/13/fakes-are-test-doubles-with-contracts
2023-11-13T17:11:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Contracts of Fake Objects can be described by properties.</em>
</p>
<p>
The first time I tried my hand at the <a href="https://codingdojo.org/kata/CQRS_Booking/">CQRS Booking kata</a>, I abandoned it after 45 minutes because I found that I had little to learn from it. After all, I've already done umpteen variations of (restaurant) booking code examples, in several programming languages. The code example that accompanies my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a> is only the largest and most complete of those.
</p>
<p>
I also wrote <a href="https://learn.microsoft.com/en-us/archive/msdn-magazine/2011/april/azure-development-cqrs-on-microsoft-azure">an MSDN Magazine article</a> in 2011 about <a href="https://en.wikipedia.org/wiki/Command_Query_Responsibility_Segregation">CQRS</a>, so I think I have that angle covered as well.
</p>
<p>
Still, while at first glance the kata seemed to have little to offer me, I've found myself coming back to it a few times. It does enable me to focus on something other than the 'production code'. In fact, it turns out that even if (or perhaps particularly <em>when</em>) you use test-driven development (TDD), there's precious little production code. Let's get that out of the way first.
</p>
<h3 id="b1192f76c2ef4f31b6bddcbc944664c7">
Production code <a href="#b1192f76c2ef4f31b6bddcbc944664c7">#</a>
</h3>
<p>
The few times I've now done the kata, there's been almost no 'production code'. The implied <code>CommandService</code> has two lines of effective code:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">CommandService</span>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IWriteRegistry writeRegistry;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IReadRegistry readRegistry;
<span style="color:blue;">public</span> <span style="color:#2b91af;">CommandService</span>(IWriteRegistry <span style="font-weight:bold;color:#1f377f;">writeRegistry</span>, IReadRegistry <span style="font-weight:bold;color:#1f377f;">readRegistry</span>)
{
<span style="color:blue;">this</span>.writeRegistry = writeRegistry;
<span style="color:blue;">this</span>.readRegistry = readRegistry;
}
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">BookARoom</span>(Booking <span style="font-weight:bold;color:#1f377f;">booking</span>)
{
writeRegistry.Save(booking);
readRegistry.RoomBooked(booking);
}
}</pre>
</p>
<p>
The <code>QueryService</code> class isn't much more exciting:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">QueryService</span>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IReadRegistry readRegistry;
<span style="color:blue;">public</span> <span style="color:#2b91af;">QueryService</span>(IReadRegistry <span style="font-weight:bold;color:#1f377f;">readRegistry</span>)
{
<span style="color:blue;">this</span>.readRegistry = readRegistry;
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> IReadOnlyCollection<Room> <span style="color:#74531f;">Reserve</span>(
Booking <span style="font-weight:bold;color:#1f377f;">booking</span>,
IReadOnlyCollection<Room> <span style="font-weight:bold;color:#1f377f;">existingView</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> existingView.Where(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Name != booking.RoomName).ToList();
}
<span style="color:blue;">public</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetFreeRooms</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> readRegistry.GetFreeRooms(arrival, departure);
}
}</pre>
</p>
<p>
The kata only suggests the <code>GetFreeRooms</code> method, which is only a single line. The only reason the <code>Reserve</code> function also exists is to pull a bit of testable logic back from the below <a href="http://xunitpatterns.com/Fake%20Object.html">Fake Object</a>. I'll return to that shortly.
</p>
<p>
I've also done the exercise in <a href="https://fsharp.org/">F#</a>, essentially porting the C# implementation, which only highlights how simple it all is:
</p>
<p>
<pre><span style="color:blue;">module</span> CommandService =
<span style="color:blue;">let</span> bookARoom (writeRegistry : IWriteRegistry) (readRegistry : IReadRegistry) booking =
writeRegistry.Save booking
readRegistry.RoomBooked booking
<span style="color:blue;">module</span> QueryService =
<span style="color:blue;">let</span> reserve booking existingView =
existingView |> Seq.filter (<span style="color:blue;">fun</span> r <span style="color:blue;">-></span> r.Name <> booking.RoomName)
<span style="color:blue;">let</span> getFreeRooms (readRegistry : IReadRegistry) arrival departure =
readRegistry.GetFreeRooms arrival departure</pre>
</p>
<p>
That's <em>both</em> the Command side and the Query side!
</p>
<p>
This represents my honest interpretation of the kata. Really, there's nothing to it.
</p>
<p>
The reason I still find the exercise interesting is that it explores other aspects of TDD than most katas. The most common katas require you to write a little algorithm: <a href="https://codingdojo.org/kata/Bowling/">Bowling</a>, <a href="https://codingdojo.org/kata/WordWrap/">Word wrap</a>, <a href="https://codingdojo.org/kata/RomanNumerals/">Roman Numerals</a>, <a href="https://codingdojo.org/kata/Diamond/">Diamond</a>, <a href="https://codingdojo.org/kata/Tennis/">Tennis</a>, etc.
</p>
<p>
The CQRS Booking kata suggests no interesting algorithm, but rather teaches some important lessons about software architecture, separation of concerns, and, if you approach it with TDD, real-world test automation. In contrast to all those algorithmic exercises, this one strongly suggests the use of <a href="http://xunitpatterns.com/Test%20Double.html">Test Doubles</a>.
</p>
<h3 id="6d8d7717cfef428e91418b319c4fe971">
Fakes <a href="#6d8d7717cfef428e91418b319c4fe971">#</a>
</h3>
<p>
You could attempt the kata with a dynamic 'mocking' library such as <a href="https://devlooped.com/moq">Moq</a> or <a href="https://site.mockito.org/">Mockito</a>, but I haven't tried. Since <a href="/2022/10/17/stubs-and-mocks-break-encapsulation">Stubs and Mocks break encapsulation</a> I favour Fake Objects instead.
</p>
<p>
Creating a Fake write registry is trivial:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">FakeWriteRegistry</span> : Collection<Booking>, IWriteRegistry
{
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Save</span>(Booking <span style="font-weight:bold;color:#1f377f;">booking</span>)
{
Add(booking);
}
}</pre>
</p>
<p>
Its counterpart, the Fake read registry, turns out to be much more involved:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">FakeReadRegistry</span> : IReadRegistry
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IReadOnlyCollection<Room> rooms;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IDictionary<DateOnly, IReadOnlyCollection<Room>> views;
<span style="color:blue;">public</span> <span style="color:#2b91af;">FakeReadRegistry</span>(<span style="color:blue;">params</span> Room[] <span style="font-weight:bold;color:#1f377f;">rooms</span>)
{
<span style="color:blue;">this</span>.rooms = rooms;
views = <span style="color:blue;">new</span> Dictionary<DateOnly, IReadOnlyCollection<Room>>();
}
<span style="color:blue;">public</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetFreeRooms</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> EnumerateDates(arrival, departure)
.Select(GetView)
.Aggregate(rooms.AsEnumerable(), Enumerable.Intersect)
.ToList();
}
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">RoomBooked</span>(Booking <span style="font-weight:bold;color:#1f377f;">booking</span>)
{
<span style="font-weight:bold;color:#8f08c4;">foreach</span> (var <span style="font-weight:bold;color:#1f377f;">d</span> <span style="font-weight:bold;color:#8f08c4;">in</span> EnumerateDates(booking.Arrival, booking.Departure))
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">view</span> = GetView(d);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">newView</span> = QueryService.Reserve(booking, view);
views[d] = newView;
}
}
<span style="color:blue;">private</span> <span style="color:blue;">static</span> IEnumerable<DateOnly> <span style="color:#74531f;">EnumerateDates</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">d</span> = arrival;
<span style="font-weight:bold;color:#8f08c4;">while</span> (d < departure)
{
<span style="font-weight:bold;color:#8f08c4;">yield</span> <span style="font-weight:bold;color:#8f08c4;">return</span> d;
d = d.AddDays(1);
}
}
<span style="color:blue;">private</span> IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetView</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">date</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (views.TryGetValue(date, <span style="color:blue;">out</span> var <span style="font-weight:bold;color:#1f377f;">view</span>))
<span style="font-weight:bold;color:#8f08c4;">return</span> view;
<span style="font-weight:bold;color:#8f08c4;">else</span>
<span style="font-weight:bold;color:#8f08c4;">return</span> rooms;
}
}</pre>
</p>
<p>
I think I can predict the most common reaction: <em>That's much more code than the System Under Test!</em> Indeed. For this particular exercise, this may indicate that a 'dynamic mock' library may have been a better choice. I do, however, also think that it's an artefact of the kata description's lack of requirements.
</p>
<p>
As is evident from the restaurant sample code that accompanies <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a>, once you add <a href="/2020/01/27/the-maitre-d-kata">realistic business rules</a> the production code grows, and the ratio of test code to production code becomes better balanced.
</p>
<p>
The size of the <code>FakeReadRegistry</code> class also stems from the way the .NET base class library API is designed. The <code>GetView</code> helper method demonstrates that it requires four lines of code to look up an entry in a dictionary but return a default value if the entry isn't found. That's a one-liner in F#:
</p>
<p>
<pre><span style="color:blue;">let</span> getView (date : DateOnly) = views |> Map.tryFind date |> Option.defaultValue rooms |> Set.ofSeq</pre>
</p>
<p>
I'll show the entire F# Fake later, but you could also play some <a href="/2023/11/14/cc-golf">CC golf</a> with the C# code. That's a bit beside the point, though.
</p>
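<p>
As one possible round of such golf, here's a sketch of the lookup-with-default as a single expression in C#. It relies on the <code>GetValueOrDefault</code> extension method, which is defined for read-only dictionaries, so this sketch assumes that the field is declared as a concrete <code>Dictionary</code> rather than as <code>IDictionary</code>. The <code>ViewLookup</code> class and its <code>SetView</code> method are hypothetical, and the shape of <code>Room</code> is an assumption.
</p>

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of Room, for illustration only.
public sealed record Room(string Name);

// Hypothetical class that isolates the lookup logic of the Fake.
public sealed class ViewLookup
{
    private readonly IReadOnlyCollection<Room> rooms;
    private readonly Dictionary<DateOnly, IReadOnlyCollection<Room>> views;

    public ViewLookup(params Room[] rooms)
    {
        this.rooms = rooms;
        views = new Dictionary<DateOnly, IReadOnlyCollection<Room>>();
    }

    public void SetView(DateOnly date, IReadOnlyCollection<Room> view)
    {
        views[date] = view;
    }

    // Falls back to the default rooms when no view exists for the date.
    public IReadOnlyCollection<Room> GetView(DateOnly date)
    {
        return views.GetValueOrDefault(date, rooms);
    }
}
```

<p>
The branching hasn't disappeared; it has moved into the base class library, where it's someone else's responsibility to test it.
</p>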
<h3 id="1f5b34534edd4b72947c8d6b4c8921bf">
Command service design <a href="#1f5b34534edd4b72947c8d6b4c8921bf">#</a>
</h3>
<p>
Why does <code>FakeReadRegistry</code> look like it does? It's a combination of the kata description and my prior experience with CQRS. When adopting an asynchronous message-based architecture, I would usually not implement the write side exactly like that. Notice how the <code>CommandService</code> class' <code>BookARoom</code> method seems to repeat itself:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">BookARoom</span>(Booking <span style="font-weight:bold;color:#1f377f;">booking</span>)
{
writeRegistry.Save(booking);
readRegistry.RoomBooked(booking);
}</pre>
</p>
<p>
While semantically it seems to be making two different statements, structurally they're identical. If you rename the methods, you could wrap both method calls in a single <a href="https://en.wikipedia.org/wiki/Composite_pattern">Composite</a>. In a more typical CQRS architecture, you'd post a Command on a bus:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">BookARoom</span>(Booking <span style="font-weight:bold;color:#1f377f;">booking</span>)
{
bus.BookRoom(booking);
}</pre>
</p>
<p>
This makes that particular <code>BookARoom</code> method, and perhaps the entire <code>CommandService</code> class, look redundant. Why do we need it?
</p>
<p>
As presented here, we don't, but in a real application, the Command service would likely perform some pre- and post-processing. For example, if this were a web application, the Command service might instead be a Controller concerned with validating and translating HTTP- or Web-based input to a Domain Object before posting to the bus.
</p>
<p>
A realistic code base would also be asynchronous, which, on .NET, would imply the use of the <code>async</code> and <code>await</code> keywords, etc.
</p>
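<p>
The Composite idea mentioned in this section could be sketched as follows. This assumes that <code>Save</code> and <code>RoomBooked</code> have been renamed to a single method on a shared interface. The names <code>IBookingHandler</code>, <code>Handle</code>, <code>CompositeBookingHandler</code>, and <code>RecordingHandler</code> are hypothetical, as is the shape of <code>Booking</code>; none of them are part of the kata or the code shown above.
</p>

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of Booking, for illustration only.
public sealed record Booking(Guid ClientId, string RoomName, DateOnly Arrival, DateOnly Departure);

// Hypothetical unified interface: Save and RoomBooked renamed to one method.
public interface IBookingHandler
{
    void Handle(Booking booking);
}

// Composite: forwards each booking to all contained handlers, in order.
public sealed class CompositeBookingHandler : IBookingHandler
{
    private readonly IReadOnlyCollection<IBookingHandler> handlers;

    public CompositeBookingHandler(params IBookingHandler[] handlers)
    {
        this.handlers = handlers;
    }

    public void Handle(Booking booking)
    {
        foreach (var h in handlers)
            h.Handle(booking);
    }
}

// Trivial in-memory handler, present only to exercise the Composite.
public sealed class RecordingHandler : IBookingHandler
{
    public List<Booking> Handled { get; } = new List<Booking>();

    public void Handle(Booking booking)
    {
        Handled.Add(booking);
    }
}

// With a Composite, the Command service needs only a single dependency.
public sealed class CommandService
{
    private readonly IBookingHandler handler;

    public CommandService(IBookingHandler handler)
    {
        this.handler = handler;
    }

    public void BookARoom(Booking booking)
    {
        handler.Handle(booking);
    }
}
```

<p>
Composing a write-side handler and a read-side handler into one <code>CompositeBookingHandler</code> would keep <code>BookARoom</code> at a single line, much like the bus-based variation.
</p>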
<h3 id="4dd49e22fe4c479381285eb1b886457e">
Read registry design <a href="#4dd49e22fe4c479381285eb1b886457e">#</a>
</h3>
<p>
A central point of CQRS is that you can optimise the read side for the specific tasks that it needs to perform. Instead of performing a dynamic query every time a client requests a view, you can update and persist a view. Imagine having a JSON or HTML file that the system can serve upon request.
</p>
<p>
Part of handling a Command or Event is that the system's background processes update the persistent views once per event.
</p>
<p>
For the particular hotel booking system, I imagine that the read registry has a set of files, blobs, documents, or denormalised database rows. When it receives notification of a booking, it'll need to remove that room from the dates of the booking.
</p>
<p>
While a booking may stretch over several days, I found it simplest to think of the storage system as subdivided into single dates, instead of ranges. Indeed, the <code>GetFreeRooms</code> method is a ranged query, so if you really wanted to denormalise the views, you could create a persistent view per range. This would, however, require that you precalculate and persist a view for October 2 to October 4, and another one for October 2 to October 5, and so on. The combinatorial explosion suggests that this isn't a good idea, so instead I imagine keeping a persistent view per date, and then performing a bit of on-the-fly calculation per query.
</p>
<p>
That's what <code>FakeReadRegistry</code> does. It also falls back to a default collection of <code>rooms</code> for all the dates that are yet untouched by a booking. This is, again, because I imagine that I might implement a real system like that.
</p>
<p>
You may still protest that the <code>FakeReadRegistry</code> duplicates production code. True, perhaps, but if this really is a concern, you could <a href="/2023/11/20/trimming-a-fake-object">refactor it to the Template Method pattern</a>.
</p>
<p>
Still, it's not really that complicated; it only looks that way because C# and the Dictionary API are too heavy on <a href="/2019/12/16/zone-of-ceremony">ceremony</a>. The Fake looks much simpler in F#:
</p>
<p>
<pre><span style="color:blue;">type</span> FakeReadRegistry (rooms : IReadOnlyCollection<Room>) =
<span style="color:blue;">let</span> <span style="color:blue;">mutable</span> views = Map.empty
<span style="color:blue;">let</span> enumerateDates (arrival : DateOnly) departure =
Seq.initInfinite id
|> Seq.map arrival.AddDays
|> Seq.takeWhile (<span style="color:blue;">fun</span> d <span style="color:blue;">-></span> d < departure)
<span style="color:blue;">let</span> getView (date : DateOnly) =
views |> Map.tryFind date |> Option.defaultValue rooms |> Set.ofSeq
<span style="color:blue;">interface</span> IReadRegistry <span style="color:blue;">with</span>
<span style="color:blue;">member</span> this.GetFreeRooms arrival departure =
enumerateDates arrival departure
|> Seq.map getView
|> Seq.fold Set.intersect (Set.ofSeq rooms)
|> Set.toList :> _
<span style="color:blue;">member</span> this.RoomBooked booking =
<span style="color:blue;">for</span> d <span style="color:blue;">in</span> enumerateDates booking.Arrival booking.Departure <span style="color:blue;">do</span>
<span style="color:blue;">let</span> newView = getView d |> QueryService.reserve booking |> Seq.toList
views <span style="color:blue;"><-</span> Map.add d newView views
</pre>
</p>
<p>
This isn't just more dense than the corresponding C# code, as F# tends to be, it also has a lower <a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">cyclomatic complexity</a>. Both the <code>EnumerateDates</code> and <code>GetView</code> C# methods have a cyclomatic complexity of <em>2</em>, while their F# counterparts rate only <em>1</em>.
</p>
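<p>
For comparison, the date enumeration can also be written without a loop in C#. The following is only a sketch, mirroring the F# <code>enumerateDates</code> function; the static class name <code>Dates</code> is hypothetical. It assumes that an empty sequence is the desired result when <code>departure</code> isn't after <code>arrival</code>, which is what the <code>while</code> loop produces.
</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Dates
{
    // Yields arrival, arrival + 1 day, and so on, up to but excluding departure.
    public static IEnumerable<DateOnly> EnumerateDates(DateOnly arrival, DateOnly departure)
    {
        // Math.Max keeps the count non-negative. Enumerable.Range throws on a
        // negative count, whereas the loop-based version yields nothing.
        var nights = Math.Max(0, departure.DayNumber - arrival.DayNumber);
        return Enumerable.Range(0, nights).Select(i => arrival.AddDays(i));
    }
}
```

<p>
There's no <code>while</code>, <code>if</code>, or other branching keyword in that method, so it rates a cyclomatic complexity of <em>1</em>.
</p>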
<p>
For production code, cyclomatic complexity of <em>2</em> is fine if the code is covered by automatic tests. In test code, however, we should be wary of any branching or looping, since there are (typically) no tests of the test code.
</p>
<p>
While I <em>am</em> going to show some tests of that code in what follows, I do that for a different reason.
</p>
<h3 id="da4d51ff041c4cf6b4007d53a67f2d76">
Contract <a href="#da4d51ff041c4cf6b4007d53a67f2d76">#</a>
</h3>
<p>
When explaining Fake Objects to people, I've begun to use a particular phrase:
</p>
<blockquote>
<p>
A Fake Object is a polymorphic implementation of a dependency that fulfils the contract, but lacks some of the <em>ilities</em>.
</p>
</blockquote>
<p>
It's funny how you can arrive at something that strikes you as profound, only to discover that it was part of the definition all along:
</p>
<blockquote>
<p>
"We acquire or build a very lightweight implementation of the same functionality as provided by a component on which the SUT [System Under Test] depends and instruct the SUT to use it instead of the real DOC [Depended-On Component]. This implementation need not have any of the "-ilities" that the real DOC needs to have"
</p>
<footer><cite>Gerard Meszaros, <a href="/ref/xunit-patterns">xUnit Test Patterns</a></cite></footer>
</blockquote>
<p>
A common example is a Fake Repository object that pretends to be a database, often by leveraging a built-in collection API. The above <code>FakeWriteRegistry</code> is as simple an example as you could have. A slightly more compelling example is <a href="/2023/08/14/replacing-mock-and-stub-with-a-fake">the FakeUserRepository shown in another article</a>. Such an 'in-memory database' fulfils the implied contract, because if you 'save' something in the 'database' you can later retrieve it again with a query. As long as the object remains in memory.
</p>
<p>
The <em>ilities</em> that such a Fake database lacks are
</p>
<ul>
<li>data persistence</li>
<li>thread safety</li>
<li>transaction support</li>
</ul>
<p>
and perhaps others. Such qualities are clearly required in a real production environment, but are in the way in an automated testing context. The implied contract, however, is satisfied: What you save you can later retrieve.
</p>
<p>
Now consider the <code>IReadRegistry</code> interface:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IReadRegistry</span>
{
IReadOnlyCollection<Room> <span style="font-weight:bold;color:#74531f;">GetFreeRooms</span>(DateOnly <span style="font-weight:bold;color:#1f377f;">arrival</span>, DateOnly <span style="font-weight:bold;color:#1f377f;">departure</span>);
<span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">RoomBooked</span>(Booking <span style="font-weight:bold;color:#1f377f;">booking</span>);
}</pre>
</p>
<p>
Which contract does it imply, given what you know about the <em>CQRS Booking</em> kata?
</p>
<p>
I would suggest the following:
</p>
<ul>
<li><em>Precondition:</em> <code>arrival</code> should be less than (or equal to?) <code>departure</code>.</li>
<li><em>Postcondition:</em> <code>GetFreeRooms</code> should always return a result. Null isn't a valid return value.</li>
<li><em>Invariant:</em> After calling <code>RoomBooked</code>, <code>GetFreeRooms</code> should exclude that room when queried on overlapping dates.</li>
</ul>
<p>
There may be other parts of the contract than this, but I find the third one most interesting. This is exactly what you would expect from a real system: If you reserve a room, you'd be surprised to see <code>GetFreeRooms</code> indicating that this room is free if queried about dates that overlap the reservation.
</p>
<p>
This is the sort of implied interaction that <a href="/2022/10/17/stubs-and-mocks-break-encapsulation">Stubs and Mocks break</a>, but that <code>FakeReadRegistry</code> guarantees.
</p>
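<p>
The precondition in the above list could, for instance, be made explicit with a <a href="https://en.wikipedia.org/wiki/Decorator_pattern">Decorator</a>. The following is only a sketch: <code>PreconditionReadRegistry</code> and <code>NoRoomsRegistry</code> are hypothetical classes, the shapes of <code>Room</code> and <code>Booking</code> are assumptions, and the sketch arbitrarily assumes that <code>arrival</code> may equal <code>departure</code>, a question that the contract leaves open.
</p>

```csharp
using System;
using System.Collections.Generic;

// Assumed shapes of Room and Booking, for illustration only.
public sealed record Room(string Name);
public sealed record Booking(Guid ClientId, string RoomName, DateOnly Arrival, DateOnly Departure);

public interface IReadRegistry
{
    IReadOnlyCollection<Room> GetFreeRooms(DateOnly arrival, DateOnly departure);
    void RoomBooked(Booking booking);
}

// Hypothetical Decorator that enforces the precondition before delegating.
public sealed class PreconditionReadRegistry : IReadRegistry
{
    private readonly IReadRegistry inner;

    public PreconditionReadRegistry(IReadRegistry inner)
    {
        this.inner = inner;
    }

    public IReadOnlyCollection<Room> GetFreeRooms(DateOnly arrival, DateOnly departure)
    {
        // Assumption: arrival == departure is allowed.
        if (departure < arrival)
            throw new ArgumentOutOfRangeException(
                nameof(departure),
                "Departure must not be before arrival.");
        return inner.GetFreeRooms(arrival, departure);
    }

    public void RoomBooked(Booking booking)
    {
        inner.RoomBooked(booking);
    }
}

// Trivial implementation, present only so that the sketch can be exercised.
public sealed class NoRoomsRegistry : IReadRegistry
{
    public IReadOnlyCollection<Room> GetFreeRooms(DateOnly arrival, DateOnly departure)
    {
        return Array.Empty<Room>();
    }

    public void RoomBooked(Booking booking)
    {
    }
}
```

<p>
Such a Decorator could wrap a Fake as well as a real implementation, so that both are held to the same precondition.
</p>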
<h3 id="6ab8206598ab4bb990d93e5472d36054">
Properties <a href="#6ab8206598ab4bb990d93e5472d36054">#</a>
</h3>
<p>
There's a close relationship between contracts and properties. Once you can list preconditions, invariants, and postconditions for an object, there's a good chance that you can write code that exercises those qualities. Indeed, why not use property-based testing to do so?
</p>
<p>
I don't wish to imply that you should (normally) write tests of your test code. The following rather serves as a concretisation of the notion that a Fake Object is a Test Double that implements the 'proper' behaviour. In the following, I'll subject the <code>FakeReadRegistry</code> class to that exercise. To do that, I'll use <a href="https://github.com/AnthonyLloyd/CsCheck">CsCheck</a> 2.14.1 with <a href="https://xunit.net/">xUnit.net</a> 2.5.3.
</p>
<p>
Before tackling the above invariant, there's a simpler invariant specific to the <code>FakeReadRegistry</code> class. A <code>FakeReadRegistry</code> object takes a collection of <code>rooms</code> via its constructor, so for this particular implementation, we may wish to establish the reasonable invariant that <code>GetFreeRooms</code> doesn't 'invent' rooms on its own:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">static</span> Gen<Room> GenRoom =>
<span style="color:blue;">from</span> name <span style="color:blue;">in</span> Gen.String
<span style="color:blue;">select</span> <span style="color:blue;">new</span> Room(name);
[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">GetFreeRooms</span>()
{
(<span style="color:blue;">from</span> rooms <span style="color:blue;">in</span> GenRoom.ArrayUnique
<span style="color:blue;">from</span> arrival <span style="color:blue;">in</span> Gen.Date.Select(DateOnly.FromDateTime)
<span style="color:blue;">from</span> i <span style="color:blue;">in</span> Gen.Int[1, 1_000]
<span style="color:blue;">let</span> departure = arrival.AddDays(i)
<span style="color:blue;">select</span> (rooms, arrival, departure))
.Sample((<span style="font-weight:bold;color:#1f377f;">rooms</span>, <span style="font-weight:bold;color:#1f377f;">arrival</span>, <span style="font-weight:bold;color:#1f377f;">departure</span>) =>
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> FakeReadRegistry(rooms);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = sut.GetFreeRooms(arrival, departure);
Assert.Subset(<span style="color:blue;">new</span> HashSet<Room>(rooms), <span style="color:blue;">new</span> HashSet<Room>(actual));
});
}</pre>
</p>
<p>
This property asserts that the <code>actual</code> value returned from <code>GetFreeRooms</code> is a subset of the <code>rooms</code> used to initialise the <code>sut</code>. Recall that the subset relation is <a href="https://en.wikipedia.org/wiki/Reflexive_relation">reflexive</a>; i.e. a set is a subset of itself.
</p>
<p>
The same property written in F# with <a href="https://hedgehog.qa/">Hedgehog</a> 0.13.0 and <a href="https://github.com/SwensenSoftware/unquote">Unquote</a> 6.1.0 may look like this:
</p>
<p>
<pre><span style="color:blue;">module</span> Gen =
<span style="color:blue;">let</span> room =
Gen.alphaNum
|> Gen.array (Range.linear 1 10)
|> Gen.map (<span style="color:blue;">fun</span> chars <span style="color:blue;">-></span> { Name = String chars })
<span style="color:blue;">let</span> dateOnly =
<span style="color:blue;">let</span> min = DateOnly(2000, 1, 1).DayNumber
<span style="color:blue;">let</span> max = DateOnly(2100, 1, 1).DayNumber
Range.linear min max |> Gen.int32 |> Gen.map DateOnly.FromDayNumber
[<Fact>]
<span style="color:blue;">let</span> GetFreeRooms () = Property.check <| property {
<span style="color:blue;">let!</span> rooms = Gen.room |> Gen.list (Range.linear 0 100)
<span style="color:blue;">let!</span> arrival = Gen.dateOnly
<span style="color:blue;">let!</span> i = Gen.int32 (Range.linear 1 1_000)
<span style="color:blue;">let</span> departure = arrival.AddDays i
<span style="color:blue;">let</span> sut = FakeReadRegistry rooms :> IReadRegistry
<span style="color:blue;">let</span> actual = sut.GetFreeRooms arrival departure
test <@ Set.isSubset (Set.ofSeq actual) (Set.ofSeq rooms) @> }</pre>
</p>
<p>
Simpler syntax, same idea.
</p>
<p>
Likewise, we can express the contract that describes the relationship between <code>RoomBooked</code> and <code>GetFreeRooms</code> like this:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">RoomBooked</span>()
{
(<span style="color:blue;">from</span> rooms <span style="color:blue;">in</span> GenRoom.ArrayUnique.Nonempty
<span style="color:blue;">from</span> arrival <span style="color:blue;">in</span> Gen.Date.Select(DateOnly.FromDateTime)
<span style="color:blue;">from</span> i <span style="color:blue;">in</span> Gen.Int[1, 1_000]
<span style="color:blue;">let</span> departure = arrival.AddDays(i)
<span style="color:blue;">from</span> room <span style="color:blue;">in</span> Gen.OneOfConst(rooms)
<span style="color:blue;">from</span> id <span style="color:blue;">in</span> Gen.Guid
<span style="color:blue;">let</span> booking = <span style="color:blue;">new</span> Booking(id, room.Name, arrival, departure)
<span style="color:blue;">select</span> (rooms, booking))
.Sample((<span style="font-weight:bold;color:#1f377f;">rooms</span>, <span style="font-weight:bold;color:#1f377f;">booking</span>) =>
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> FakeReadRegistry(rooms);
sut.RoomBooked(booking);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = sut.GetFreeRooms(booking.Arrival, booking.Departure);
Assert.DoesNotContain(booking.RoomName, actual.Select(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Name));
});
}</pre>
</p>
<p>
or, in F#:
</p>
<p>
<pre>[<Fact>]
<span style="color:blue;">let</span> RoomBooked () = Property.check <| property {
<span style="color:blue;">let!</span> rooms = Gen.room |> Gen.list (Range.linear 1 100)
<span style="color:blue;">let!</span> arrival = Gen.dateOnly
<span style="color:blue;">let!</span> i = Gen.int32 (Range.linear 1 1_000)
<span style="color:blue;">let</span> departure = arrival.AddDays i
<span style="color:blue;">let!</span> room = Gen.item rooms
<span style="color:blue;">let!</span> id = Gen.guid
<span style="color:blue;">let</span> booking = {
ClientId = id
RoomName = room.Name
Arrival = arrival
Departure = departure }
<span style="color:blue;">let</span> sut = FakeReadRegistry rooms :> IReadRegistry
sut.RoomBooked booking
<span style="color:blue;">let</span> actual = sut.GetFreeRooms arrival departure
test <@ not (Seq.contains room actual) @> }</pre>
</p>
<p>
In both cases, the property books a room and then proceeds to query <code>GetFreeRooms</code> to see which rooms are free. Since the query is exactly in the range from <code>booking.Arrival</code> to <code>booking.Departure</code>, we expect <em>not</em> to see the name of the booked room among the free rooms.
</p>
<p>
(As I'm writing this, I think that there may be a subtle bug in the F# property. Can you spot it?)
</p>
<h3 id="fa249347697b49699b7ea62336746651">
Conclusion <a href="#fa249347697b49699b7ea62336746651">#</a>
</h3>
<p>
A Fake Object isn't like other Test Doubles. While <a href="/2022/10/17/stubs-and-mocks-break-encapsulation">Stubs and Mocks break encapsulation</a>, a Fake Object not only stays encapsulated, but it also fulfils the contract implied by a polymorphic API (interface or base class).
</p>
<p>
Or, put another way: When is a Fake Object the right Test Double? When you can describe the contract of the dependency.
</p>
<p>
But if you <em>can't</em> describe the contract of a dependency, you should seriously consider if the design is right.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
A C# port of validation with partial round trip
https://blog.ploeh.dk/2023/10/30/a-c-port-of-validation-with-partial-round-trip
2023-10-30T11:52:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A raw port of the previous F# demo code.</em>
</p>
<p>
This article is part of <a href="/2020/12/14/validation-a-solved-problem">a short article series</a> on <a href="/2018/11/05/applicative-validation">applicative validation</a> with a twist. The twist is that validation, when it fails, should return not only a list of error messages; it should also retain that part of the input that <em>was</em> valid.
</p>
<p>
In the <a href="/2020/12/28/an-f-demo-of-validation-with-partial-data-round-trip">previous article</a> I showed <a href="https://fsharp.org/">F#</a> demo code, and since <a href="https://forums.fsharp.org/t/thoughts-on-input-validation-pattern-from-a-noob/1541">the original forum question</a> that prompted the article series was about F# code, for a long time, I left it there.
</p>
<p>
Recently, however, I've found myself writing about validation in a broader context:
</p>
<ul>
<li><a href="/2022/07/25/an-applicative-reservation-validation-example-in-c">An applicative reservation validation example in C#</a></li>
<li><a href="/2022/08/15/aspnet-validation-revisited">ASP.NET validation revisited</a></li>
<li><a href="/2022/08/22/can-types-replace-validation">Can types replace validation?</a></li>
<li><a href="/2023/06/26/validation-and-business-rules">Validation and business rules</a></li>
<li><a href="/2023/07/03/validating-or-verifying-emails">Validating or verifying emails</a></li>
</ul>
<p>
Perhaps I should consider adding a <em>validation</em> tag to the blog...
</p>
<p>
In that light I thought that it might be illustrative to continue <a href="/2020/12/14/validation-a-solved-problem">this article series</a> with a port to C#.
</p>
<p>
Here, I use techniques already described on this site to perform the translation. Follow the links for details.
</p>
<p>
The translation given here is direct, so it produces some fairly non-idiomatic C# code.
</p>
<h3 id="5cee653b6148484fb782d92fea2ca415">
Building blocks <a href="#5cee653b6148484fb782d92fea2ca415">#</a>
</h3>
<p>
The original problem is succinctly stated, and I follow it as closely as possible. This includes potential errors that may be present in the original post.
</p>
<p>
The task is to translate some input to a Domain Model with <a href="/2022/10/24/encapsulation-in-functional-programming">good encapsulation</a>. The input type looks like this, translated to a <a href="https://learn.microsoft.com/dotnet/csharp/language-reference/builtin-types/record">C# record</a>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">record</span> <span style="color:#2b91af;">Input</span>(<span style="color:blue;">string</span>? <span style="font-weight:bold;color:#1f377f;">Name</span>, DateTime? <span style="font-weight:bold;color:#1f377f;">DoB</span>, <span style="color:blue;">string</span>? <span style="font-weight:bold;color:#1f377f;">Address</span>)</pre>
</p>
<p>
Notice that every input may be null. This indicates poor encapsulation, but is symptomatic of most input. <a href="/2023/10/16/at-the-boundaries-static-types-are-illusory">At the boundaries, static types are illusory</a>. Perhaps it would have been more idiomatic to model such input as a <a href="https://en.wikipedia.org/wiki/Data_transfer_object">Data Transfer Object</a>, but it makes little difference to what comes next.
</p>
<p>
I consider <a href="/2020/12/14/validation-a-solved-problem">validation a solved problem</a>, because it's possible to model the process as an <a href="/2018/10/01/applicative-functors">applicative functor</a>. Really, <a href="https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/">validation is a parsing problem</a>.
</p>
<p>
Since my main intent with this article is to demonstrate a technique, I will allow myself a few shortcuts. Like I did <a href="/2023/08/28/a-first-crack-at-the-args-kata">when I first encountered the Args kata</a>, I start by copying the <code>Validated</code> code from <a href="/2022/07/25/an-applicative-reservation-validation-example-in-c">An applicative reservation validation example in C#</a>; you can go there if you're interested in it. I'm not going to repeat it here.
</p>
<p>
The target type looks similar to the above <code>Input</code> record, but doesn't allow null values:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">record</span> <span style="color:#2b91af;">ValidInput</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">Name</span>, DateTime <span style="font-weight:bold;color:#1f377f;">DoB</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">Address</span>);</pre>
</p>
<p>
This could also have been a 'proper' class. The following code doesn't depend on that.
</p>
<h3 id="7af5ab9c8fca4dc193fc5854c2806ff4">
Validating names <a href="#7af5ab9c8fca4dc193fc5854c2806ff4">#</a>
</h3>
<p>
Since I'm now working in an ostensibly object-oriented language, I can make the various validation functions methods on the <code>Input</code> record. Since I'm treating validation as a parsing problem, I'm going to name those methods with the <code>TryParse</code> prefix:
</p>
<p>
<pre><span style="color:blue;">private</span> Validated<(Func<Input, Input>, IReadOnlyCollection<<span style="color:blue;">string</span>>), <span style="color:blue;">string</span>>
<span style="font-weight:bold;color:#74531f;">TryParseName</span>()
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (Name <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="font-weight:bold;color:#8f08c4;">return</span> Validated.Fail<(Func<Input, Input>, IReadOnlyCollection<<span style="color:blue;">string</span>>), <span style="color:blue;">string</span>>(
(<span style="font-weight:bold;color:#1f377f;">x</span> => x, <span style="color:blue;">new</span>[] { <span style="color:#a31515;">"name is required"</span> }));
<span style="font-weight:bold;color:#8f08c4;">if</span> (Name.Length <= 3)
<span style="font-weight:bold;color:#8f08c4;">return</span> Validated.Fail<(Func<Input, Input>, IReadOnlyCollection<<span style="color:blue;">string</span>>), <span style="color:blue;">string</span>>(
(<span style="font-weight:bold;color:#1f377f;">i</span> => i <span style="color:blue;">with</span> { Name = <span style="color:blue;">null</span> }, <span style="color:blue;">new</span>[] { <span style="color:#a31515;">"no bob and toms allowed"</span> }));
<span style="font-weight:bold;color:#8f08c4;">return</span> Validated.Succeed<(Func<Input, Input>, IReadOnlyCollection<<span style="color:blue;">string</span>>), <span style="color:blue;">string</span>>(Name);
}</pre>
</p>
<p>
As the two previous articles have explained, the result of trying to parse input is a type isomorphic to <a href="/2019/01/14/an-either-functor">Either</a>, but here called <code><span style="color:#2b91af;">Validated</span><<span style="color:#2b91af;">F</span>, <span style="color:#2b91af;">S</span>></code>. (The reason for this distinction is that we <em>don't</em> want the <a href="/2022/05/09/an-either-monad">monadic behaviour of Either</a>, because monads short-circuit.)
</p>
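<p>
To illustrate why short-circuiting matters, here is a minimal sketch of an applicative container. It is <em>not</em> the <code>Validated</code> implementation from the linked article. The member names <code>Succeed</code>, <code>Fail</code>, <code>Match</code>, and <code>Apply</code> mirror how the type is used in this article, but the implementation, the curried <code>describe</code> function, and the <code>"age is required"</code> message are assumptions made for illustration only.
</p>

```csharp
using System;
using System.Linq;

// Two failing inputs: an applicative Apply reports both errors,
// whereas a monadic bind would stop at the first one.
Func<string[], string[], string[]> combine = (x, y) => x.Concat(y).ToArray();
Func<string, Func<int, string>> describe = name => age => $"{name} ({age})";

var result = Validated.Succeed<string[], Func<string, Func<int, string>>>(describe)
    .Apply(Validated.Fail<string[], string>(new[] { "name is required" }), combine)
    .Apply(Validated.Fail<string[], int>(new[] { "age is required" }), combine);

Console.WriteLine(result.Match(
    onFailure: errors => string.Join("; ", errors),
    onSuccess: s => s));

// Hypothetical, simplified stand-in for Validated<F, S>.
public sealed class Validated<F, S>
{
    private readonly bool isSuccess;
    private readonly F failure;
    private readonly S success;

    internal Validated(bool isSuccess, F failure, S success)
    {
        this.isSuccess = isSuccess;
        this.failure = failure;
        this.success = success;
    }

    // Church-encoded case analysis: exactly one of the two functions runs.
    public T Match<T>(Func<F, T> onFailure, Func<S, T> onSuccess) =>
        isSuccess ? onSuccess(success) : onFailure(failure);
}

public static class Validated
{
    public static Validated<F, S> Succeed<F, S>(S value) =>
        new Validated<F, S>(true, default!, value);

    public static Validated<F, S> Fail<F, S>(F error) =>
        new Validated<F, S>(false, error, default!);

    // Inspects BOTH sides; when both have failed, the caller-supplied
    // combine function merges the two failure values.
    public static Validated<F, T> Apply<F, S, T>(
        this Validated<F, Func<S, T>> selector,
        Validated<F, S> source,
        Func<F, F, F> combine)
    {
        return selector.Match(
            onFailure: e1 => source.Match(
                onFailure: e2 => Fail<F, T>(combine(e1, e2)),
                onSuccess: _ => Fail<F, T>(e1)),
            onSuccess: f => source.Match(
                onFailure: e2 => Fail<F, T>(e2),
                onSuccess: x => Succeed<F, T>(f(x))));
    }
}
```

<p>
In this sketch both error messages survive, because <code>Apply</code> never skips the second argument just because the first one failed.
</p>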
<p>
When parsing succeeds, the <code>TryParseName</code> method returns the <code>Name</code> wrapped in a <code>Success</code> case.
</p>
<p>
Parsing the name may fail in two different ways. If the name is missing, the method returns the input and the error message <em>"name is required"</em>. If the name is present, but too short, <code>TryParseName</code> returns another error message, and also resets <code>Name</code> to <code>null</code>.
</p>
<p>
Compare the C# code with <a href="/2020/12/28/an-f-demo-of-validation-with-partial-data-round-trip">the corresponding F#</a> or <a href="/2020/12/21/a-haskell-proof-of-concept-of-validation-with-partial-data-round-trip">Haskell code</a> and notice how much more verbose the C# has to be.
</p>
<p>
While it's possible to translate many functional programming concepts to a language like C#, syntax does matter, because it affects readability.
</p>
<h3 id="2113a955061341ab9e2dba711aaf8457">
Validating date of birth <a href="#2113a955061341ab9e2dba711aaf8457">#</a>
</h3>
<p>
From here, the port is direct, if awkward. Here's how to validate the date-of-birth field:
</p>
<p>
<pre><span style="color:blue;">private</span> Validated<(Func<Input, Input>, IReadOnlyCollection<<span style="color:blue;">string</span>>), DateTime>
<span style="font-weight:bold;color:#74531f;">TryParseDoB</span>(DateTime <span style="font-weight:bold;color:#1f377f;">now</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (!DoB.HasValue)
<span style="font-weight:bold;color:#8f08c4;">return</span> Validated.Fail<(Func<Input, Input>, IReadOnlyCollection<<span style="color:blue;">string</span>>), DateTime>(
(<span style="font-weight:bold;color:#1f377f;">x</span> => x, <span style="color:blue;">new</span>[] { <span style="color:#a31515;">"dob is required"</span> }));
<span style="font-weight:bold;color:#8f08c4;">if</span> (DoB.Value <= now.AddYears(-12))
<span style="font-weight:bold;color:#8f08c4;">return</span> Validated.Fail<(Func<Input, Input>, IReadOnlyCollection<<span style="color:blue;">string</span>>), DateTime>(
(<span style="font-weight:bold;color:#1f377f;">i</span> => i <span style="color:blue;">with</span> { DoB = <span style="color:blue;">null</span> }, <span style="color:blue;">new</span>[] { <span style="color:#a31515;">"get off my lawn"</span> }));
<span style="font-weight:bold;color:#8f08c4;">return</span> Validated.Succeed<(Func<Input, Input>, IReadOnlyCollection<<span style="color:blue;">string</span>>), DateTime>(
DoB.Value);
}</pre>
</p>
<p>
I suspect that the age check should really have been a greater-than relation, but I'm only reproducing the original code.
</p>
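<p>
To make the boundary concrete, the following sketch isolates the comparison from <code>TryParseDoB</code>. The fixed <code>now</code> value and the <code>Rejected</code> helper are hypothetical names introduced here; only the comparison itself is taken from the method above.
</p>

```csharp
using System;

// A fixed clock keeps the example deterministic (assumed value).
var now = new DateTime(2023, 10, 30);

// Same predicate as in TryParseDoB: true means "get off my lawn".
bool Rejected(DateTime dob) => dob <= now.AddYears(-12);

Console.WriteLine(Rejected(now.AddYears(-8)));  // False: an eight-year-old passes
Console.WriteLine(Rejected(now.AddYears(-40))); // True: a forty-year-old is rejected
Console.WriteLine(Rejected(now.AddYears(-12))); // True: exactly twelve is rejected too
```

<p>
As ported, the check therefore rejects anyone who is twelve or older, which is consistent with the test cases later in this article.
</p>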
<h3 id="e1fc6b98e4fb4dad81ee5e354032acb8">
Validating addresses <a href="#e1fc6b98e4fb4dad81ee5e354032acb8">#</a>
</h3>
<p>
The final building block is to parse the input address:
</p>
<p>
<pre><span style="color:blue;">private</span> Validated<(Func<Input, Input>, IReadOnlyCollection<<span style="color:blue;">string</span>>), <span style="color:blue;">string</span>>
<span style="font-weight:bold;color:#74531f;">TryParseAddress</span>()
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (Address <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="font-weight:bold;color:#8f08c4;">return</span> Validated.Fail<(Func<Input, Input>, IReadOnlyCollection<<span style="color:blue;">string</span>>), <span style="color:blue;">string</span>>(
(<span style="font-weight:bold;color:#1f377f;">x</span> => x, <span style="color:blue;">new</span>[] { <span style="color:#a31515;">"add1 is required"</span> }));
<span style="font-weight:bold;color:#8f08c4;">return</span> Validated.Succeed<(Func<Input, Input>, IReadOnlyCollection<<span style="color:blue;">string</span>>), <span style="color:blue;">string</span>>(
Address);
}</pre>
</p>
<p>
The <code>TryParseAddress</code> method only checks whether or not the <code>Address</code> field is present.
</p>
<h3 id="b11153a62fa945568e880cf771a7cb19">
Composition <a href="#b11153a62fa945568e880cf771a7cb19">#</a>
</h3>
<p>
The above methods are <code>private</code> because the entire problem is simple enough that I can test the composition as a whole. Had I wanted to, however, I could easily have made them <code>public</code> and tested them individually.
</p>
<p>
You can now use applicative composition to produce a single validation method:
</p>
<p>
<pre><span style="color:blue;">public</span> Validated<(Input, IReadOnlyCollection<<span style="color:blue;">string</span>>), ValidInput>
<span style="font-weight:bold;color:#74531f;">TryParse</span>(DateTime <span style="font-weight:bold;color:#1f377f;">now</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">name</span> = TryParseName();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">dob</span> = TryParseDoB(now);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">address</span> = TryParseAddress();
Func<<span style="color:blue;">string</span>, DateTime, <span style="color:blue;">string</span>, ValidInput> <span style="font-weight:bold;color:#1f377f;">createValid</span> =
(<span style="font-weight:bold;color:#1f377f;">n</span>, <span style="font-weight:bold;color:#1f377f;">d</span>, <span style="font-weight:bold;color:#1f377f;">a</span>) => <span style="color:blue;">new</span> ValidInput(n, d, a);
<span style="color:blue;">static</span> (Func<Input, Input>, IReadOnlyCollection<<span style="color:blue;">string</span>>) <span style="color:#74531f;">combineErrors</span>(
(Func<Input, Input> f, IReadOnlyCollection<<span style="color:blue;">string</span>> es) <span style="font-weight:bold;color:#1f377f;">x</span>,
(Func<Input, Input> g, IReadOnlyCollection<<span style="color:blue;">string</span>> es) <span style="font-weight:bold;color:#1f377f;">y</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> (<span style="font-weight:bold;color:#1f377f;">z</span> => y.g(x.f(z)), y.es.Concat(x.es).ToArray());
}
<span style="font-weight:bold;color:#8f08c4;">return</span> createValid
.Apply(name, combineErrors)
.Apply(dob, combineErrors)
.Apply(address, combineErrors)
.SelectFailure(<span style="font-weight:bold;color:#1f377f;">x</span> => (x.Item1(<span style="color:blue;">this</span>), x.Item2));
}</pre>
</p>
<p>
This is where the <code>Validated</code> API is still awkward. You need to explicitly define a function to compose error cases. In this case, <code>combineErrors</code> composes the <a href="/2017/11/13/endomorphism-monoid">endomorphisms</a> and concatenates the collections.
</p>
<p>
The final step 'runs' the endomorphism. <code>x.Item1</code> is the endomorphism, and <code>this</code> is the <code>Input</code> value being validated. Again, this isn't readable in C#, but it's where the endomorphism removes the invalid values from the input.
</p>
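<p>
The endomorphism part can be looked at in isolation. The following is a minimal sketch, assuming a stand-alone copy of the <code>Input</code> record; the names <code>clearName</code>, <code>clearDoB</code>, <code>identity</code>, and <code>composed</code> are hypothetical and only serve the illustration.
</p>

```csharp
#nullable enable
using System;

// Each failing parser contributes a function that removes its invalid field.
Func<Input, Input> clearName = i => i with { Name = null };
Func<Input, Input> clearDoB = i => i with { DoB = null };

// A missing field has nothing to remove, so it contributes the identity.
Func<Input, Input> identity = x => x;

// Composition in the same order as combineErrors: first x.f, then y.g.
Func<Input, Input> composed = z => identity(clearDoB(clearName(z)));

var input = new Input("Tom", new DateTime(1980, 1, 1), "x");
var actual = composed(input);

// Records compare by value, so this prints True.
Console.WriteLine(actual == new Input(null, null, "x"));

public sealed record Input(string? Name, DateTime? DoB, string? Address);
```

<p>
Nothing happens until the composed function is finally called with the original input, which is what the <code>SelectFailure</code> step does with <code>this</code>.
</p>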
<h3 id="8aa59e20c1924002ae0d4e951df71619">
Tests <a href="#8aa59e20c1924002ae0d4e951df71619">#</a>
</h3>
<p>
Since <a href="/2018/11/05/applicative-validation">applicative validation</a> is a functional technique, it's <a href="/2015/05/07/functional-design-is-intrinsically-testable">intrinsically testable</a>.
</p>
<p>
Testing a successful validation is as easy as this:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">ValidationSucceeds</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">now</span> = DateTime.Now;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">eightYearsAgo</span> = now.AddYears(-8);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">input</span> = <span style="color:blue;">new</span> Input(<span style="color:#a31515;">"Alice"</span>, eightYearsAgo, <span style="color:#a31515;">"x"</span>);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = input.TryParse(now);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">expected</span> = Validated.Succeed<(Input, IReadOnlyCollection<<span style="color:blue;">string</span>>), ValidInput>(
<span style="color:blue;">new</span> ValidInput(<span style="color:#a31515;">"Alice"</span>, eightYearsAgo, <span style="color:#a31515;">"x"</span>));
Assert.Equal(expected, actual);
}</pre>
</p>
<p>
As is often the case, the error conditions are more numerous, or more interesting, if you will, than the success case, so this requires a parametrised test:
</p>
<p>
<pre>[Theory, ClassData(<span style="color:blue;">typeof</span>(ValidationFailureTestCases))]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">ValidationFails</span>(
Input <span style="font-weight:bold;color:#1f377f;">input</span>,
Input <span style="font-weight:bold;color:#1f377f;">expected</span>,
IReadOnlyCollection<<span style="color:blue;">string</span>> <span style="font-weight:bold;color:#1f377f;">expectedMessages</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">now</span> = DateTime.Now;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = input.TryParse(now);
var (<span style="font-weight:bold;color:#1f377f;">inp</span>, <span style="font-weight:bold;color:#1f377f;">msgs</span>) = Assert.Single(actual.Match(
onFailure: <span style="font-weight:bold;color:#1f377f;">x</span> => <span style="color:blue;">new</span>[] { x },
onSuccess: <span style="font-weight:bold;color:#1f377f;">_</span> => Array.Empty<(Input, IReadOnlyCollection<<span style="color:blue;">string</span>>)>()));
Assert.Equal(expected, inp);
Assert.Equal(expectedMessages, msgs);
}</pre>
</p>
<p>
I also had to take <code>actual</code> apart in order to inspect its individual elements. When working with a pure and immutable data structure, I consider that a test smell. Rather, one should be able to use <a href="/2021/05/03/structural-equality-for-better-tests">structural equality for better tests</a>. Unfortunately, .NET collections don't have structural equality, so the test has to pull the message collection out of <code>actual</code> in order to verify it.
</p>
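<p>
A small sketch may make the equality problem concrete. The <code>Result</code> record below is a hypothetical type, not part of the article's code base; it only exists to show how an array field affects record equality.
</p>

```csharp
using System;
using System.Linq;

var a = new[] { "dob is required" };
var b = new[] { "dob is required" };

// Arrays use reference equality, even when the elements are the same.
Console.WriteLine(a.Equals(b));        // False
Console.WriteLine(a.SequenceEqual(b)); // True

// A record compares its fields with their default equality,
// so an array field makes otherwise identical records unequal.
var x = new Result("Alice", a);
var y = new Result("Alice", b);
Console.WriteLine(x == y);             // False

public sealed record Result(string Name, string[] Messages);
```

<p>
This is why the test compares the <code>Input</code> value and the message collection in two separate assertions.
</p>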
<p>
Again, in F# or <a href="https://www.haskell.org/">Haskell</a> you don't have that problem, and the tests are much more succinct and robust.
</p>
<p>
The test cases are implemented by this nested <code>ValidationFailureTestCases</code> class:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ValidationFailureTestCases</span> :
TheoryData<Input, Input, IReadOnlyCollection<<span style="color:blue;">string</span>>>
{
<span style="color:blue;">public</span> <span style="color:#2b91af;">ValidationFailureTestCases</span>()
{
Add(<span style="color:blue;">new</span> Input(<span style="color:blue;">null</span>, <span style="color:blue;">null</span>, <span style="color:blue;">null</span>),
<span style="color:blue;">new</span> Input(<span style="color:blue;">null</span>, <span style="color:blue;">null</span>, <span style="color:blue;">null</span>),
<span style="color:blue;">new</span>[] { <span style="color:#a31515;">"add1 is required"</span>, <span style="color:#a31515;">"dob is required"</span>, <span style="color:#a31515;">"name is required"</span> });
Add(<span style="color:blue;">new</span> Input(<span style="color:#a31515;">"Bob"</span>, <span style="color:blue;">null</span>, <span style="color:blue;">null</span>),
<span style="color:blue;">new</span> Input(<span style="color:blue;">null</span>, <span style="color:blue;">null</span>, <span style="color:blue;">null</span>),
<span style="color:blue;">new</span>[] { <span style="color:#a31515;">"add1 is required"</span>, <span style="color:#a31515;">"dob is required"</span>, <span style="color:#a31515;">"no bob and toms allowed"</span> });
Add(<span style="color:blue;">new</span> Input(<span style="color:#a31515;">"Alice"</span>, <span style="color:blue;">null</span>, <span style="color:blue;">null</span>),
<span style="color:blue;">new</span> Input(<span style="color:#a31515;">"Alice"</span>, <span style="color:blue;">null</span>, <span style="color:blue;">null</span>),
<span style="color:blue;">new</span>[] { <span style="color:#a31515;">"add1 is required"</span>, <span style="color:#a31515;">"dob is required"</span> });
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">eightYearsAgo</span> = DateTime.Now.AddYears(-8);
Add(<span style="color:blue;">new</span> Input(<span style="color:#a31515;">"Alice"</span>, eightYearsAgo, <span style="color:blue;">null</span>),
<span style="color:blue;">new</span> Input(<span style="color:#a31515;">"Alice"</span>, eightYearsAgo, <span style="color:blue;">null</span>),
<span style="color:blue;">new</span>[] { <span style="color:#a31515;">"add1 is required"</span> });
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">fortyYearsAgo</span> = DateTime.Now.AddYears(-40);
Add(<span style="color:blue;">new</span> Input(<span style="color:#a31515;">"Alice"</span>, fortyYearsAgo, <span style="color:#a31515;">"x"</span>),
<span style="color:blue;">new</span> Input(<span style="color:#a31515;">"Alice"</span>, <span style="color:blue;">null</span>, <span style="color:#a31515;">"x"</span>),
<span style="color:blue;">new</span>[] { <span style="color:#a31515;">"get off my lawn"</span> });
Add(<span style="color:blue;">new</span> Input(<span style="color:#a31515;">"Tom"</span>, fortyYearsAgo, <span style="color:#a31515;">"x"</span>),
<span style="color:blue;">new</span> Input(<span style="color:blue;">null</span>, <span style="color:blue;">null</span>, <span style="color:#a31515;">"x"</span>),
<span style="color:blue;">new</span>[] { <span style="color:#a31515;">"get off my lawn"</span>, <span style="color:#a31515;">"no bob and toms allowed"</span> });
Add(<span style="color:blue;">new</span> Input(<span style="color:#a31515;">"Tom"</span>, eightYearsAgo, <span style="color:#a31515;">"x"</span>),
<span style="color:blue;">new</span> Input(<span style="color:blue;">null</span>, eightYearsAgo, <span style="color:#a31515;">"x"</span>),
<span style="color:blue;">new</span>[] { <span style="color:#a31515;">"no bob and toms allowed"</span> });
}
}</pre>
</p>
<p>
All eight tests pass.
</p>
<h3 id="96596b52720f4a2688c216701f48d559">
Conclusion <a href="#96596b52720f4a2688c216701f48d559">#</a>
</h3>
<p>
Once you know <a href="/2018/05/22/church-encoding">how to model sum types (discriminated unions) in C#</a>, translating something like applicative validation isn't difficult per se. It's a fairly automatic process.
</p>
<p>
The code is hardly <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> C#, and the type annotations are particularly annoying. Things work as expected though, and it isn't difficult to imagine how one could refactor some of this code to a more idiomatic form.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Domain Model first
https://blog.ploeh.dk/2023/10/23/domain-model-first
2023-10-23T06:09:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Persistence concerns second.</em>
</p>
<p>
A few weeks ago, I published an article with the title <a href="/2023/09/18/do-orms-reduce-the-need-for-mapping">Do ORMs reduce the need for mapping?</a> Not surprisingly, this elicited more than one reaction. In this article, I'll respond to a particular kind of reaction.
</p>
<p>
First, however, I'd like to reiterate the message of the previous article, which is almost revealed by the title: <em>Do <a href="https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping">object-relational mappers</a> (ORMs) reduce the need for mapping?</em> To which the article answers a tentative <em>no</em>.
</p>
<p>
Do pay attention to the question. It doesn't ask whether ORMs are bad in general, or in all cases. It mainly analyses whether the use of ORMs reduces the need to write code that maps between different representations of data: From database to objects, from objects to <a href="https://en.wikipedia.org/wiki/Data_transfer_object">Data Transfer Objects</a> (DTOs), etc.
</p>
<p>
Granted, the article looks at a wider context, which I think is only a responsible thing to do. This could lead some readers to extrapolate from the article's specific focus to draw a wider conclusion.
</p>
<h3 id="951d538881fd4464a081ba3cd09162b0">
Encapsulation-first <a href="#951d538881fd4464a081ba3cd09162b0">#</a>
</h3>
<p>
Most of the systems I work with aren't <a href="https://en.wikipedia.org/wiki/Create,_read,_update_and_delete">CRUD</a> systems, but rather systems where correctness is important. As an example, one of my clients does security-heavy digital infrastructure. Earlier in my career, I helped write web shops when these kinds of systems were new. Let me tell you: System owners were quite concerned that prices were correct, and that orders were taken and handled without error.
</p>
<p>
In my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a> I've tried to capture the essence of those kinds of system with the accompanying sample code, which pretends to be an online restaurant reservation system. While this may sound like a trivial CRUD system, <a href="/2020/01/27/the-maitre-d-kata">the business logic isn't entirely straightforward</a>.
</p>
<p>
The point I was making in <a href="/2023/09/18/do-orms-reduce-the-need-for-mapping">the previous article</a> is that I consider <a href="/encapsulation-and-solid">encapsulation</a> to be more important than 'easy' persistence. I don't mind writing a bit of mapping code, since <a href="/2018/09/17/typing-is-not-a-programming-bottleneck">typing isn't a programming bottleneck</a> anyway.
</p>
<p>
When prioritising encapsulation you should be able to make use of any design pattern, run-time assertion, as well as static type systems (if you're working in such a language) to guard correctness. You should be able to compose objects, define <a href="https://en.wikipedia.org/wiki/Value_object">Value Objects</a>, <a href="/2015/01/19/from-primitive-obsession-to-domain-modelling">wrap single values to avoid primitive obsession</a>, make constructors private, leverage polymorphism and effectively use any trick your language, <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiom</a>, and platform has on offer. If you want to use <a href="/2018/05/22/church-encoding">Church encoding</a> or the <a href="/2018/06/25/visitor-as-a-sum-type">Visitor pattern to represent a sum type</a>, you should be able to do that.
</p>
<p>
When writing these kinds of systems, I start with the Domain Model without any thought of how to persist or retrieve data.
</p>
<p>
In my experience, once the Domain Model starts to congeal, the persistence question tends to answer itself. There's usually one or two obvious ways to store and read data.
</p>
<p>
Usually, a relational database isn't the most obvious choice.
</p>
<h3 id="1b562dd9077e4b27b912d782bdca14fb">
Persistence ignorance <a href="#1b562dd9077e4b27b912d782bdca14fb">#</a>
</h3>
<p>
Write the best API you can to solve the problem, and then figure out how to store data. This is the allegedly elusive ideal of <em>persistence ignorance</em>, which turns out to be easier than rumour has it, once you cast a wider net than relational databases.
</p>
<p>
It seems to me, though, that more than one person who has commented on my previous article has a hard time considering alternatives. And granted, I've consulted with clients who knew how to operate a particular database system, but nothing else, and who didn't want to consider adopting another technology. I do understand that such constraints are real, too. Thus, if you need to compromise for reasons such as these, you aren't doing anything wrong. You may still, however, try to get the best out of the situation.
</p>
<p>
One client of mine, for example, didn't want to operate anything other than <a href="https://en.wikipedia.org/wiki/Microsoft_SQL_Server">SQL Server</a>, which they already knew. For an asynchronous message-based system, then, we chose <a href="https://particular.net/nservicebus">NServiceBus</a> and configured it to use SQL Server as a persistent queue.
</p>
<p>
Several comments still seem to assume that persistence must look a particular way.
</p>
<blockquote>
<p>
"So having a Order, OrderLine, Person, Address and City, all the rows needed to be loaded in advance, mapped to objects and references set to create the object graph to be able to, say, display shipping costs based on person's address."
</p>
<footer><cite><a href="/2023/09/18/do-orms-reduce-the-need-for-mapping#75ca5755d2a4445ba4836fc3f6922a5c">Vlad</a></cite></footer>
</blockquote>
<p>
I don't wish to single out Vlad, but this is the first comment, and it captures the essence of the other comments well. I imagine that what he has in mind is something like this:
</p>
<p>
<img src="/content/binary/orders-db-diagram.png" alt="Database diagram with five tables: Orders, OrderLines, Persons, Addresses, and Cities.">
</p>
<p>
I've probably simplified things a bit too much. In a more realistic model, each person may have a collection of addresses, instead of just one. If so, it only strengthens Vlad's point, because that would imply even more tables to read.
</p>
<p>
The unstated assumption, however, is that a fully <a href="https://en.wikipedia.org/wiki/Database_normalization">normalised</a> relational data model is the correct way to store such data.
</p>
<p>
It's not. As I already mentioned, I spent the first four years of my programming career developing web shops. Orders were an integral part of that work.
</p>
<p>
An order is a <em>document</em>. You don't want the customer's address to be updatable after the fact. With a normalised relational model, if you change the customer's address row in the future, it's going to look as though the order went to that address instead of the address it actually went to.
</p>
<p>
This also explains why the order lines should <em>not</em> point to the actual product entries in the product catalogue. Trust me, I almost shipped such a system once, when I was young and inexperienced.
</p>
<p>
You should, at the very least, denormalise the database model. To a degree, this has already happened here, since the implied order has order lines that, I hope, are copies of the relevant product data, rather than links to the product catalogue.
</p>
<p>
Such insights, however, suggest that other storage mechanisms may be more appropriate.
</p>
<p>
Putting that aside for a moment, though, how would a persistence-ignorant Domain Model look?
</p>
<p>
I'd probably start with something like this:
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">order</span> = <span style="color:blue;">new</span> Order(
<span style="color:blue;">new</span> Person(<span style="color:#a31515;">"Olive"</span>, <span style="color:#a31515;">"Hoyle"</span>,
<span style="color:blue;">new</span> Address(<span style="color:#a31515;">"Green Street 15"</span>, <span style="color:blue;">new</span> City(<span style="color:#a31515;">"Oakville"</span>), <span style="color:#a31515;">"90125"</span>)),
<span style="color:blue;">new</span> OrderLine(123, 1),
<span style="color:blue;">new</span> OrderLine(456, 3),
<span style="color:blue;">new</span> OrderLine(789, 2));</pre>
</p>
<p>
(As <a href="/ref/90125">the ZIP code</a> implies, I'm more of a <a href="https://en.wikipedia.org/wiki/Yes_(band)">Yes</a> fan, but still can't help but relish writing <code>new Order</code> in code.)
</p>
<p>
With code like this, many a <a href="/ref/ddd">DDD</a>'er would start talking about Aggregate Roots, but that is, frankly, a concept that never made much sense to me. Rather, the above <code>order</code> is a <a href="https://en.wikipedia.org/wiki/Tree_(graph_theory)">tree</a> composed of immutable data structures.
</p>
<p>
It trivially serializes to e.g. JSON:
</p>
<p>
<pre>{
<span style="color:#2e75b6;">"customer"</span>: {
<span style="color:#2e75b6;">"firstName"</span>: <span style="color:#a31515;">"Olive"</span>,
<span style="color:#2e75b6;">"lastName"</span>: <span style="color:#a31515;">"Hoyle"</span>,
<span style="color:#2e75b6;">"address"</span>: {
<span style="color:#2e75b6;">"street"</span>: <span style="color:#a31515;">"Green Street 15"</span>,
<span style="color:#2e75b6;">"city"</span>: { <span style="color:#2e75b6;">"name"</span>: <span style="color:#a31515;">"Oakville"</span> },
<span style="color:#2e75b6;">"zipCode"</span>: <span style="color:#a31515;">"90125"</span>
}
},
<span style="color:#2e75b6;">"orderLines"</span>: [
{
<span style="color:#2e75b6;">"sku"</span>: 123,
<span style="color:#2e75b6;">"quantity"</span>: 1
},
{
<span style="color:#2e75b6;">"sku"</span>: 456,
<span style="color:#2e75b6;">"quantity"</span>: 3
},
{
<span style="color:#2e75b6;">"sku"</span>: 789,
<span style="color:#2e75b6;">"quantity"</span>: 2
}
]
}</pre>
</p>
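<p>
As a minimal sketch, the types behind the above <code>order</code> value could be declared along the following lines. Only the constructor calls appear in the article, so the property names (chosen to match the JSON), the use of C# records, and the defensive array copies are all assumptions:
</p>
<p>
<pre>// Hypothetical declarations; the article only shows how the objects are created.
public sealed record City(string Name);
public sealed record Address(string Street, City City, string ZipCode);
public sealed record Person(string FirstName, string LastName, Address Address);
public sealed record OrderLine(int Sku, int Quantity);

public sealed class Order
{
    private readonly OrderLine[] lines;

    public Order(Person customer, params OrderLine[] orderLines)
    {
        Customer = customer;
        // Defensive copy: later changes to the caller's array don't leak in.
        lines = (OrderLine[])orderLines.Clone();
    }

    public Person Customer { get; }

    // Hands out a copy, so that the order itself stays immutable.
    public OrderLine[] OrderLines => (OrderLine[])lines.Clone();
}</pre>
</p>
<p>
Under those assumptions, <code>JsonSerializer.Serialize</code> from System.Text.Json, configured with <code>JsonNamingPolicy.CamelCase</code>, should produce a document of the same shape as the JSON above, without any mapping layer in between.
</p>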
<p>
All of this strongly suggests that this kind of data would be <em>much easier</em> to store and retrieve with a document database instead of a relational database.
</p>
<p>
While that's just one example, it strikes me as a common theme when discussing persistence. For most online transaction processing systems, relational databases aren't necessarily the best fit.
</p>
<h3 id="3643959a545940f88001fb82297a286e">
The cart before the horse <a href="#3643959a545940f88001fb82297a286e">#</a>
</h3>
<p>
<a href="/2023/09/18/do-orms-reduce-the-need-for-mapping#359a7bb0d2c14b8eb2dcb2ac6de4897d">Another comment</a> also starts with the premise that a data model is fundamentally relational. This one purports to model the relationship between sheikhs, their wives, and supercars. While I understand that the example is supposed to be tongue-in-cheek, the comment launches straight into problems with how to read and persist such data without relying on an ORM.
</p>
<p>
Again, I don't intend to point fingers at anyone, but on the other hand, I can't suggest alternatives when a problem is presented like that.
</p>
<p>
The whole point of developing a Domain Model <em>first</em> is to find a good way to represent the business problem in a way that encourages correctness and ease of use.
</p>
<p>
If you present me with a relational model without describing the business goals you're trying to achieve, I don't have much to work with.
</p>
<p>
It may be that your business problem is truly relational, in which case an ORM probably is a good solution. I wrote as much in the previous article.
</p>
<p>
In many cases, however, it looks to me as though programmers start with a relational model, only to proceed to complain that it's difficult to work with in object-oriented (or functional) code.
</p>
<p>
If you, on the other hand, start with the business problem and figure out how to model it in code, the best way to store the data may suggest itself. Document databases are often a good fit, as are event stores. I've never had need for a graph database, but perhaps that would be a better fit for the <em>sheikh</em> domain suggested by <em>qfilip</em>.
</p>
<h3 id="8c32485e1ffd42f4ace9b83c98ae3184">
Reporting <a href="#8c32485e1ffd42f4ace9b83c98ae3184">#</a>
</h3>
<p>
While I no longer feel that relational databases are particularly well-suited for online transaction processing, they are really good at one thing: Ad-hoc querying. Because it's such a rich and mature type of technology, and because <a href="https://en.wikipedia.org/wiki/SQL">SQL</a> is a powerful language, you can slice and dice data in multiple ways.
</p>
<p>
This makes relational databases useful for reporting and other kinds of data extraction tasks.
</p>
<p>
You may have business stakeholders who insist on a relational database for that particular reason. It may even be a good reason.
</p>
<p>
If, however, the sole purpose of having a relational database is to support reporting, you may consider setting it up as a secondary system. Keep your online transactional data in another system, but regularly synchronize it to a relational database. If the only purpose of the relational database is to support reporting, you can treat it as a read-only system. This makes synchronization manageable. In general, you should avoid two-way synchronization if at all possible, but one-way synchronization is usually less of a problem.
</p>
<p>
Isn't that going to be more work, or more expensive?
</p>
<p>
That question, again, has no single answer. Of course setting up and maintaining two systems is more work at the outset. On the other hand, there's a perpetual cost to be paid if you come up with the wrong architecture. If development is slow, and you have many bugs in production, or similar problems, the cause could be that you've chosen the wrong architecture and you're now fighting a losing battle.
</p>
<p>
On the other hand, if you relegate relational databases exclusively to a reporting role, chances are that there's a lot of off-the-shelf software that can support your business users. Perhaps you can even hire a paratechnical power user to take care of that part of the system, freeing you to focus on the 'actual' system.
</p>
<p>
All of this is only meant as inspiration. If you don't want to, or can't, do it that way, then this article doesn't help you.
</p>
<h3 id="1b0ce932168349f8abb9887f9ed219c8">
Conclusion <a href="#1b0ce932168349f8abb9887f9ed219c8">#</a>
</h3>
<p>
When discussing databases, and particularly ORMs, some people approach the topic with the unspoken assumption that a relational database is the only option for storing data. Many programmers are so skilled in relational data design that they naturally use those skills when thinking new problems over.
</p>
<p>
Sometimes problems are just relational in nature, and that's fine. More often than not, however, that's not the case.
</p>
<p>
Try to model a business problem without concern for storage and see where that leads you. Test-driven development is often a great technique for such a task. Then, once you have a good API, consider how to store the data. The Domain Model that you develop in that way may naturally suggest a good way to store and retrieve the data.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="db4a9a94452a4cc7bf71989561dfd947">
<div class="comment-author"><a href="#db4a9a94452a4cc7bf71989561dfd947">qfilip</a></div>
<div class="comment-content">
<q>
<i>
Again, I don't intend to point fingers at anyone, but on the other hand, I can't suggest alternatives when a problem is presented like that.
</i>
</q>
<p>
Heh, that's fair criticism, not finger pointing. I wanted to give a better example here, but I gave up halfway through writing it. You raised some good points. I'll have to rethink my approach on domain modeling further, before asking any meaningful questions.
</p>
<p>
Years of working with EF-Core in a specific way got me... indoctrinated. Not all things are bad ofcourse, but I have missed the bigger picture in some areas, as far as I can tell.
</p>
<p>
Thanks for dedicating so many articles to the subject.
</p>
</div>
<div class="comment-date">2023-10-23 18:05 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
At the boundaries, static types are illusory
https://blog.ploeh.dk/2023/10/16/at-the-boundaries-static-types-are-illusory
2023-10-16T08:07:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Static types are useful, but have limitations.</em>
</p>
<p>
Regular readers of this blog may have noticed that I like static type systems. Not the kind of static types offered by <a href="https://en.wikipedia.org/wiki/C_(programming_language)">C</a>, which strikes me as mostly being able to distinguish between way too many types of integers and pointers. <a href="/2020/01/20/algebraic-data-types-arent-numbers-on-steroids">A good type system is more than just numbers on steroids</a>. A type system like C#'s is <a href="/2019/12/16/zone-of-ceremony">workable, but verbose</a>. The kind of type system I find most useful is when it has <a href="https://en.wikipedia.org/wiki/Algebraic_data_type">algebraic data types</a> and good type inference. The examples that I know best are the type systems of <a href="https://fsharp.org/">F#</a> and <a href="https://www.haskell.org/">Haskell</a>.
</p>
<p>
As great as static type systems can be, they have limitations. <a href="https://www.hillelwayne.com/post/constructive/">Hillel Wayne has already outlined one kind of distinction</a>, but here I'd like to focus on another constraint.
</p>
<h3 id="ab0d595d35304a9ea9302197b4f796d3">
Application boundaries <a href="#ab0d595d35304a9ea9302197b4f796d3">#</a>
</h3>
<p>
Any piece of software interacts with the 'rest of the world'; effectively everything outside its own process. Sometimes (but increasingly rarely) such interaction is exclusively by way of some user interface, but more and more, an application interacts with other software in some way.
</p>
<p>
<img src="/content/binary/application-boundary.png" alt="An application depicted as an opaque disk with a circle emphasising its boundary. Also included are arrows in and out, with some common communication artefacts: Messages, HTTP traffic, and a database.">
</p>
<p>
Here I've drawn the application as an opaque disc in order to emphasise that what happens inside the process isn't pertinent to the following discussion. The diagram also includes some common kinds of traffic. Many applications rely on some kind of database or send messages (email, SMS, Commands, Events, etc.). We can think of such traffic as the interactions that the application initiates, but many systems also receive and react to incoming data: HTTP traffic or messages that arrive on a queue, and so on.
</p>
<p>
When I talk about application <em>boundaries</em>, I have in mind what goes on in that interface layer.
</p>
<p>
An application can talk to the outside world in multiple ways: It may read or write a file, access shared memory, call operating-system APIs, send or receive network packets, etc. Usually you get to program against higher-level abstractions, but ultimately the application is dealing with various binary protocols.
</p>
<h3 id="4991578e222e408bb08e261dce6454f1">
Protocols <a href="#4991578e222e408bb08e261dce6454f1">#</a>
</h3>
<p>
The bottom line is that at a sufficiently low level of abstraction, what goes in and out of your application has no static type stronger than an array of bytes.
</p>
<p>
You may counter-argue that higher-level APIs deal with that to present the input and output as static types. When you interact with a text file, you'll typically deal with a list of strings: One for each line in the file. Or you may manipulate <a href="https://en.wikipedia.org/wiki/JSON">JSON</a>, <a href="https://en.wikipedia.org/wiki/XML">XML</a>, <a href="https://en.wikipedia.org/wiki/Protocol_Buffers">Protocol Buffers</a>, or another wire format using a serializer/deserializer API. Sometimes, as is often the case with <a href="https://en.wikipedia.org/wiki/Comma-separated_values">CSV</a>, you may need to write a very simple parser yourself. Or perhaps <a href="/2023/08/28/a-first-crack-at-the-args-kata">something slightly more involved</a>.
</p>
<p>
To demonstrate what I mean, there's no shortage of APIs like <a href="https://learn.microsoft.com/dotnet/api/system.text.json.jsonserializer.deserialize">JsonSerializer.Deserialize</a>, which enables you to write <a href="/2022/05/02/at-the-boundaries-applications-arent-functional">code like this</a>:
</p>
<p>
<pre><span style="color:blue;">let</span> n = JsonSerializer.Deserialize<Name> (json, opts)</pre>
</p>
<p>
and you may say: <em><code>n</code> is statically typed, and its type is <code>Name</code>! Hooray!</em> But you do realise that that's only half a truth, don't you?
</p>
<p>
An interaction at the application boundary is expected to follow some kind of <em>protocol</em>. This is even true if you're reading a text file. In these modern times, you may expect a text file to contain <a href="https://unicode.org/">Unicode</a>, but have you ever received a file from a legacy system and had to deal with its <a href="https://en.wikipedia.org/wiki/EBCDIC">EBCDIC</a> encoding? Or an <a href="https://en.wikipedia.org/wiki/ASCII">ASCII</a> file with a <a href="https://en.wikipedia.org/wiki/Code_page">code page</a> different from the one you expect? Or even just a file written on a Unix system, if you're on Windows, or vice versa?
</p>
<p>
In order to correctly interpret or transmit such data, you need to follow a <em>protocol</em>.
</p>
<p>
Such a protocol can be low-level, as the character-encoding examples I just listed, but it may also be much more high-level. You may, for example, consider an HTTP request like this:
</p>
<p>
<pre>POST /restaurants/90125/reservations?sig=aco7VV%2Bh5sA3RBtrN8zI8Y9kLKGC60Gm3SioZGosXVE%3D HTTP/1.1
Content-Type: application/json
{
<span style="color:#2e75b6;">"at"</span>: <span style="color:#a31515;">"2021-12-08 20:30"</span>,
<span style="color:#2e75b6;">"email"</span>: <span style="color:#a31515;">"snomob@example.com"</span>,
<span style="color:#2e75b6;">"name"</span>: <span style="color:#a31515;">"Snow Moe Beal"</span>,
<span style="color:#2e75b6;">"quantity"</span>: 1
}</pre>
</p>
<p>
Such an interaction implies a protocol. Part of such a protocol is that the HTTP request's body is a valid JSON document, that it has an <code>at</code> property, that that property encodes a valid date and time, that <code>quantity</code> is a natural number, that <code>email</code> <a href="/2023/07/03/validating-or-verifying-emails">is present</a>, and so on.
</p>
<p>
You can model the expected input as a <a href="https://en.wikipedia.org/wiki/Data_transfer_object">Data Transfer Object</a> (DTO):
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ReservationDto</span>
{
<span style="color:blue;">public</span> <span style="color:blue;">string</span>? At { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">string</span>? Email { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">string</span>? Name { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">int</span> Quantity { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
}</pre>
</p>
<p>
and even set up your 'protocol handlers' (here, an ASP.NET Core <a href="https://learn.microsoft.com/aspnet/core/mvc/controllers/actions">action method</a>) to use such a DTO:
</p>
<p>
<pre><span style="color:blue;">public</span> Task<ActionResult> <span style="font-weight:bold;color:#74531f;">Post</span>(ReservationDto <span style="font-weight:bold;color:#1f377f;">dto</span>)</pre>
</p>
<p>
While this may look statically typed, it assumes a particular protocol. What happens when the bytes on the wire don't follow the protocol?
</p>
<p>
Well, we've already been <a href="/2022/08/15/aspnet-validation-revisited">around that block</a> <a href="/2022/07/25/an-applicative-reservation-validation-example-in-c">more than once</a>.
</p>
<p>
The point is that there's always an implied protocol at the application boundary, and you can choose to model it more or less explicitly.
</p>
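<p>
To make part of that implied protocol explicit, the rules listed above (a parsable date and time in <code>at</code>, a present <code>email</code>, a natural-number <code>quantity</code>) could be checked in one place. The following is only a sketch under stated assumptions: <code>ReservationParser</code>, <code>ParsedReservation</code>, and <code>TryParse</code> are hypothetical names, <code>ReservationDto</code> is the DTO shown above, and this isn't the code from the linked articles.
</p>
<p>
<pre>using System;
using System.Globalization;

// Hypothetical result type: the data after the protocol has been checked.
public sealed record ParsedReservation(DateTime At, string Email, string Name, int Quantity);

public static class ReservationParser
{
    // Returns null when the DTO doesn't follow the protocol.
    public static ParsedReservation? TryParse(ReservationDto dto)
    {
        // 'at' must encode a valid date and time. A null value fails here, too.
        if (!DateTime.TryParse(dto.At, CultureInfo.InvariantCulture, DateTimeStyles.None, out var at))
            return null;
        // 'email' must be present.
        if (string.IsNullOrWhiteSpace(dto.Email))
            return null;
        // 'quantity' must be a natural number.
        if (dto.Quantity < 1)
            return null;
        // 'name' is treated as optional in this sketch (an assumption).
        return new ParsedReservation(at, dto.Email, dto.Name ?? "", dto.Quantity);
    }
}</pre>
</p>
<p>
A nullable return value is the bluntest possible design: it tells the caller <em>that</em> the input broke the protocol, but not <em>how</em>. A richer result type could carry error messages as well.
</p>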
<h3 id="41f3b4ad7a4b4429bba3f619c2af55d1">
Types as short-hands for protocols <a href="#41f3b4ad7a4b4429bba3f619c2af55d1">#</a>
</h3>
<p>
In the above example, I've relied on <em>some</em> static typing to deal with the problem. After all, I did define a DTO to model the expected shape of input. I could have chosen other alternatives: Perhaps I could have used a JSON parser to explicitly <a href="https://learn.microsoft.com/dotnet/standard/serialization/system-text-json/use-dom">use the JSON DOM</a>, or, at an even lower level, <a href="https://learn.microsoft.com/dotnet/standard/serialization/system-text-json/use-utf8jsonreader">used Utf8JsonReader</a>. Ultimately, I could have decided to write my own JSON parser.
</p>
<p>
I'd rarely (or never?) choose to implement a JSON parser from scratch, so that's not what I'm advocating. Rather, my point is that you can leverage existing APIs to deal with input and output, and some of those APIs offer a convincing illusion that what happens at the boundary is statically typed.
</p>
<p>
This illusion is partly API-specific, and partly language-specific. In .NET, for example, <code>JsonSerializer.Deserialize</code> <em>looks</em> like it'll always deserialize <em>any</em> JSON string into the desired model. Obviously, that's a lie, because the function will throw an exception if the operation is impossible (i.e. when the input is malformed). In .NET (and many other languages or platforms), you can't tell from an API's type what the failure modes might be. In contrast, aeson's <a href="https://hackage.haskell.org/package/aeson/docs/Data-Aeson.html#v:fromJSON">fromJSON</a> function returns a type that explicitly indicates that deserialization may fail. Even in Haskell, however, this is mostly an <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> convention, because Haskell also 'supports' exceptions.
</p>
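<p>
In C#, one way to get a little closer to that kind of honesty is to wrap the call so that the failure mode shows up in the signature. This is a minimal sketch rather than a recommendation: <code>BoundaryJson</code> and <code>TryDeserializeReservation</code> are hypothetical names, and <code>ReservationDto</code> is the DTO shown earlier.
</p>
<p>
<pre>using System.Text.Json;

public static class BoundaryJson
{
    // Web defaults match property names case-insensitively, as ASP.NET Core does.
    private static readonly JsonSerializerOptions options =
        new JsonSerializerOptions(JsonSerializerDefaults.Web);

    // The bool return value and the nullable out parameter make the failure
    // mode visible to callers, instead of hiding it in an exception.
    public static bool TryDeserializeReservation(string json, out ReservationDto? dto)
    {
        try
        {
            dto = (ReservationDto?)JsonSerializer.Deserialize(json, typeof(ReservationDto), options);
            return dto is not null; // the JSON literal null deserializes to null
        }
        catch (JsonException)
        {
            // Malformed JSON, or JSON that doesn't fit the DTO.
            dto = null;
            return false;
        }
    }
}</pre>
</p>
<p>
Even so, this is still only a convention. Nothing prevents other code from calling <code>JsonSerializer.Deserialize</code> directly.
</p>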
<p>
At the boundary, a static type can be a useful shorthand for a protocol. You declare a static type (e.g. a DTO) and rely on built-in machinery to handle malformed input. You give up some fine-grained control in exchange for a more declarative model.
</p>
<p>
I often choose to do that because I find such a trade-off beneficial, but I'm under no illusion that my static types fully model what goes 'on the wire'.
</p>
<h3 id="792212e79feb46889e71a6c08dedb88e">
Reversed roles <a href="#792212e79feb46889e71a6c08dedb88e">#</a>
</h3>
<p>
So far, I've mostly discussed input validation. <a href="/2022/08/22/can-types-replace-validation">Can types replace validation?</a> No, but they can make most common validation scenarios easier. What happens when you return data?
</p>
<p>
You may decide to return a statically typed value. A serializer can faithfully convert such a value to a proper wire format (JSON, XML, or similar). The recipient may not care about that type. After all, you may return a Haskell value, but the system receiving the data is written in <a href="https://www.python.org/">Python</a>. Or you return a C# object, but the recipient is <a href="https://en.wikipedia.org/wiki/JavaScript">JavaScript</a>.
</p>
<p>
Should we conclude, then, that there's no reason to model return data with static types? Not at all, because by modelling output with static types, you are being <a href="https://en.wikipedia.org/wiki/Robustness_principle">conservative with what you send</a>. Since static types are typically more rigid than 'just code', there may be corner cases that a type can't easily express. While this may pose a problem when it comes to input, it's only a benefit when it comes to output. This means that you're <a href="/2021/11/29/postels-law-as-a-profunctor">narrowing the output funnel</a> and thus making your system easier to work with.
</p>
<p>
<img src="/content/binary/liberal-conservative-at-boundary.png" alt="Funnels labelled 'liberal' and 'conservative' to the left of a line indicating an application boundary.">
</p>
<p>
Now consider another role-reversal: When your application <em>initiates</em> an interaction, it starts by producing output and receives input as a result. This includes any database interaction. When you create, update, or delete a row in a database, you <em>send</em> data, and receive a response.
</p>
<p>
Should you not consider <a href="https://en.wikipedia.org/wiki/Robustness_principle">Postel's law</a> in that case?
</p>
<p>
<img src="/content/binary/conservative-liberal-at-boundary.png" alt="Funnels labelled 'conservative' and 'liberal' to the right of a line indicating an application boundary.">
</p>
<p>
Most people don't, particularly if they rely on <a href="https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping">object-relational mappers</a> (ORMs). After all, if you have a static type (class) that models a database row, what's the harm using that when updating the database?
</p>
<p>
Probably none. After all, based on what I've just written, using a static type is a good way to be conservative with what you send. Here's an example using <a href="https://en.wikipedia.org/wiki/Entity_Framework">Entity Framework</a>:
</p>
<p>
<pre><span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">db</span> = <span style="color:blue;">new</span> RestaurantsContext(ConnectionString);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">dbReservation</span> = <span style="color:blue;">new</span> Reservation
{
PublicId = reservation.Id,
RestaurantId = restaurantId,
At = reservation.At,
Name = reservation.Name.ToString(),
Email = reservation.Email.ToString(),
Quantity = reservation.Quantity
};
<span style="color:blue;">await</span> db.Reservations.AddAsync(dbReservation);
<span style="color:blue;">await</span> db.SaveChangesAsync();</pre>
</p>
<p>
Here we send a statically typed <code>Reservation</code> 'Entity' to the database, and since we use a static type, we're being conservative with what we send. That's only good.
</p>
<p>
What happens when we query a database? Here's a typical example:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> Task<Restaurants.Reservation?> <span style="font-weight:bold;color:#74531f;">ReadReservation</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">restaurantId</span>, Guid <span style="font-weight:bold;color:#1f377f;">id</span>)
{
<span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">db</span> = <span style="color:blue;">new</span> RestaurantsContext(ConnectionString);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">r</span> = <span style="color:blue;">await</span> db.Reservations.FirstOrDefaultAsync(<span style="font-weight:bold;color:#1f377f;">x</span> => x.PublicId == id);
<span style="font-weight:bold;color:#8f08c4;">if</span> (r <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> Restaurants.Reservation(
r.PublicId,
r.At,
<span style="color:blue;">new</span> Email(r.Email),
<span style="color:blue;">new</span> Name(r.Name),
r.Quantity);
}</pre>
</p>
<p>
Here I read a database row <code>r</code> and unquestioningly translate it to my domain model. Should I do that? What if the database schema has diverged from my application code?
</p>
<p>
I suspect that much grief and trouble with relational databases, and particularly with ORMs, stem from the illusion that an ORM 'Entity' is a statically-typed view of the database schema. Typically, you can either use an ORM like Entity Framework in a code-first or a database-first fashion, but regardless of what you choose, you have two competing 'truths' about the database: The database schema and the Entity Classes.
</p>
<p>
You need to be disciplined to keep those two views in synch, and I'm not asserting that it's impossible. I'm only suggesting that it may pay to explicitly acknowledge that static types may not represent any truth about what's actually on the other side of the application boundary.
</p>
<h3 id="1ab46f8e48a74b94ad9aa92cce2d915f">
Types are an illusion <a href="#1ab46f8e48a74b94ad9aa92cce2d915f">#</a>
</h3>
<p>
Given that I usually find myself firmly in the static-types-are-great camp, it may seem odd that I now spend an entire article trashing them. Perhaps it looks as though I've had a revelation and made an about-face, but that's not the case. Rather, I'm fond of making the implicit explicit. This often helps improve understanding, because it helps delineate conceptual boundaries.
</p>
<p>
This, too, is the case here. <a href="https://en.wikipedia.org/wiki/All_models_are_wrong">All models are wrong, but some models are useful</a>. So are static types, I believe.
</p>
<p>
A static type system is a useful tool that enables you to model how your application should behave. The types don't really exist at run time. Even though .NET code (just to point out an example) compiles to <a href="https://en.wikipedia.org/wiki/Common_Intermediate_Language">a binary representation that includes type information</a>, once it runs, it <a href="https://en.wikipedia.org/wiki/Just-in-time_compilation">JITs</a> to machine code. In the end, it's just registers and memory addresses, or, if you want to be even more nihilistic, electrons moving around on a circuit board.
</p>
<p>
Even at a higher level of abstraction, you may say: <em>But at least, a static type system can help you encapsulate rules and assumptions.</em> In a language like C#, for example, consider a <a href="https://www.hillelwayne.com/post/constructive/">predicative type</a> like <a href="/2022/08/22/can-types-replace-validation">this NaturalNumber</a> struct:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">struct</span> <span style="color:#2b91af;">NaturalNumber</span> : IEquatable<NaturalNumber>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">int</span> i;
<span style="color:blue;">public</span> <span style="color:#2b91af;">NaturalNumber</span>(<span style="color:blue;">int</span> candidate)
{
<span style="color:blue;">if</span> (candidate < 1)
<span style="color:blue;">throw</span> <span style="color:blue;">new</span> ArgumentOutOfRangeException(
nameof(candidate),
<span style="color:#a31515;">$"The value must be a positive (non-zero) number, but was: </span>{candidate}<span style="color:#a31515;">."</span>);
<span style="color:blue;">this</span>.i = candidate;
}
<span style="color:green;">// Various other members follow...</span></pre>
</p>
<p>
Such a type effectively protects the invariant that a <a href="https://en.wikipedia.org/wiki/Natural_number">natural number</a> is always a positive integer. Yes, that works well until someone does this:
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">n</span> = (NaturalNumber)FormatterServices.GetUninitializedObject(<span style="color:blue;">typeof</span>(NaturalNumber));</pre>
</p>
<p>
This <code>n</code> value has the internal value <code>0</code>. Yes, <a href="https://learn.microsoft.com/dotnet/api/system.runtime.serialization.formatterservices.getuninitializedobject">FormatterServices.GetUninitializedObject</a> bypasses the constructor. This thing is evil, but it exists, and at least in the current discussion serves to illustrate the point that types are illusions.
</p>
<p>
This isn't just a flaw in C#. Other languages have similar backdoors. One of the most famously statically-typed languages, Haskell, comes with <a href="https://hackage.haskell.org/package/base/docs/System-IO-Unsafe.html#v:unsafePerformIO">unsafePerformIO</a>, which enables you to pretend that nothing untoward is going on even if you've written some impure code.
</p>
<p>
You may (and should) institute policies to not use such backdoors in your normal code bases. You don't need them.
</p>
<h3 id="de28c90a44e14b299f6eb30c09b08821">
Types are useful models <a href="#de28c90a44e14b299f6eb30c09b08821">#</a>
</h3>
<p>
All this may seem like an argument that types are useless. That would, however, be to draw the wrong conclusion. Types don't exist at run time to the same degree that Python objects or JavaScript functions don't exist at run time. Any language (except <a href="https://en.wikipedia.org/wiki/Assembly_language">assembler</a>) is an abstraction: A way to model computer instructions so that programming becomes easier (one would hope, <a href="/2023/09/11/a-first-stab-at-the-brainfuck-kata">but then...</a>). This is true even for C, as low-level and detail-oriented as it may seem.
</p>
<p>
If you grant that high-level programming languages (i.e. any language that is <em>not</em> machine code or assembler) are useful, you must also grant that you can't rule out the usefulness of types. Notice that this argument is one of logic, rather than of preference. The only claim I make here is that programming is based on useful illusions. That the abstractions are illusions doesn't prevent them from being useful.
</p>
<p>
In statically typed languages, we effectively need to pretend that the type system is good enough, strong enough, generally trustworthy enough that it's safe to ignore the underlying reality. We work with, if you will, a provisional truth that serves as a user interface to the computer.
</p>
<p>
Even though a computer program eventually executes on a processor where types don't exist, a good compiler can still check that our models look sensible. We say that it <em>type-checks</em>. I find that indispensable when modelling the internal behaviour of a program. Even in a large code base, a compiler can type-check whether all the various components look like they may compose correctly. That a program compiles is no guarantee that it works correctly, but if it doesn't type-check, it's strong evidence that the code's model is <em>internally</em> inconsistent.
</p>
<p>
In other words, that a statically-typed program type-checks is a necessary, but not a sufficient condition for it to work.
</p>
<p>
This holds as long as we're considering program internals. Some language platforms allow us to take this notion further, because we can link software components together and still type-check them. The .NET platform is a good example of this, since the IL code retains type information. This means that the C#, F#, or <a href="https://en.wikipedia.org/wiki/Visual_Basic_(.NET)">Visual Basic .NET</a> compiler can type-check your code against the APIs exposed by external libraries.
</p>
<p>
On the other hand, you can't extend that line of reasoning to the boundary of an application. What happens at the boundary is ultimately untyped.
</p>
<p>
Are types useless at the boundary, then? Not at all. <a href="https://lexi-lambda.github.io/blog/2020/01/19/no-dynamic-type-systems-are-not-inherently-more-open/">Alexis King has already dealt with this topic better than I could</a>, but the point is that types remain an effective way to capture the result of <a href="https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/">parsing input</a>. You can view receiving, handling, parsing, or validating input as implementing a protocol, as I've already discussed above. Such protocols are application-specific or domain-specific rather than general-purpose protocols, but they are still protocols.
</p>
<p>
When I decide to write <a href="/2022/07/25/an-applicative-reservation-validation-example-in-c">input validation for my restaurant sample code base as a set of composable parsers</a>, I'm implementing a protocol. My starting point isn't raw bits, but rather a loose static type: A DTO. In other cases, I may decide to use a different level of abstraction.
</p>
<p>
One of the (many) reasons I have for <a href="/2023/09/18/do-orms-reduce-the-need-for-mapping">finding ORMs unhelpful</a> is exactly because they insist on an illusion past its usefulness. Rather, I prefer implementing the protocol that talks to my database with a lower-level API, such as ADO.NET:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">static</span> Reservation <span style="color:#74531f;">ReadReservationRow</span>(SqlDataReader <span style="font-weight:bold;color:#1f377f;">rdr</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> Reservation(
(Guid)rdr[<span style="color:#a31515;">"PublicId"</span>],
(DateTime)rdr[<span style="color:#a31515;">"At"</span>],
<span style="color:blue;">new</span> Email((<span style="color:blue;">string</span>)rdr[<span style="color:#a31515;">"Email"</span>]),
<span style="color:blue;">new</span> Name((<span style="color:blue;">string</span>)rdr[<span style="color:#a31515;">"Name"</span>]),
<span style="color:blue;">new</span> NaturalNumber((<span style="color:blue;">int</span>)rdr[<span style="color:#a31515;">"Quantity"</span>]));
}</pre>
</p>
<p>
This actually isn't a particularly good protocol implementation, because it fails to take Postel's law into account. Really, this code should be a <a href="https://martinfowler.com/bliki/TolerantReader.html">Tolerant Reader</a>. In practice, not that much input contravariance is possible, but perhaps, at least, this code ought to gracefully handle a missing <code>Name</code> field.
</p>
<p>
The point of this particular example isn't that it's perfect, because it's not, but rather that it's possible to drop down to a lower level of abstraction, and sometimes, this may be a more honest representation of reality.
</p>
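<p>
To illustrate what a more tolerant reader might look like, here's a minimal Haskell sketch. It isn't a port of the above C# method. It assumes, purely for illustration, that a row has already been read into a list of column-name and text pairs; the <code>Row</code> alias, the simplified <code>Reservation</code> type, and the <code>readReservationRow</code> function are all hypothetical. Required columns produce an error value when missing or malformed, while a missing <code>Name</code> falls back to the empty string.
</p>
<p>
<pre>import Data.Maybe (fromMaybe)
import Text.Read (readMaybe)

-- Hypothetical, simplified row representation: column name and raw text.
type Row = [(String, String)]

-- Hypothetical, simplified stand-in for the Reservation class.
data Reservation = Reservation
  { reservationEmail :: String
  , reservationName :: String
  , reservationQuantity :: Int
  } deriving (Eq, Show)

-- Look up a required column, reporting which one is absent.
required :: String -> Row -> Either String String
required column row =
  maybe (Left ("Missing column: " ++ column)) Right (lookup column row)

-- Tolerant reader: strict about what it needs, lenient about the rest.
readReservationRow :: Row -> Either String Reservation
readReservationRow row = do
  email <- required "Email" row
  rawQuantity <- required "Quantity" row
  quantity <-
    maybe (Left ("Not an integer: " ++ rawQuantity)) Right (readMaybe rawQuantity)
  -- A missing Name isn't an error; default to the empty string.
  let name = fromMaybe "" (lookup "Name" row)
  return (Reservation email name quantity)</pre>
</p>
<p>
The design choice here is that the result type, <code>Either String Reservation</code>, makes it explicit that reading may fail, instead of relying on a cast that throws at run time.
</p>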
<h3 id="ce2a1d57f63e4f39a28e801fd23164cf">
Conclusion <a href="#ce2a1d57f63e4f39a28e801fd23164cf">#</a>
</h3>
<p>
It may be helpful to acknowledge that static types don't really exist. Even so, internally in a code base, a static type system can be a powerful tool. A good type system enables a compiler to check whether various parts of your code look internally consistent. Are you calling a procedure with the correct arguments? Have you implemented all methods defined by an interface? Have you handled all cases defined by a <a href="https://en.wikipedia.org/wiki/Tagged_union">sum type</a>? Have you correctly initialized an object?
</p>
<p>
As useful as type systems are for this kind of work, you should also be aware of their limitations. A compiler can check whether a code base's internal model makes sense, but it can't verify what happens at run time.
</p>
<p>
As long as one part of your code base sends data to another part of your code base, your type system can still perform a helpful sanity check, but for data that enters (or leaves) your application at run time, bets are off. You may attempt to model what input <em>should</em> look like, and it may even be useful to do that, but it's important to acknowledge that reality may not look like your model.
</p>
<p>
You can write statically-typed, composable parsers. Some of them are quite elegant, but the good ones explicitly model that parsing of input is error-prone. When input is well-formed, the result may be a nicely <a href="/2022/10/24/encapsulation-in-functional-programming">encapsulated</a>, statically-typed value, but when it's malformed, the result is one or more error values.
</p>
<p>
Perhaps the most important message is that databases, other web services, file systems, etc. involve input and output, too. Even if <em>you</em> write code that initiates a database query, or a web service request, should you implicitly trust the data that comes back?
</p>
<p>
This question of trust doesn't have to imply security concerns. Rather, systems evolve and errors happen. Every time you interact with an external system, there's a risk that it has become misaligned with yours. Static types can't protect you against that.
</p>
</div><hr>
What's a sandwich?https://blog.ploeh.dk/2023/10/09/whats-a-sandwich2023-10-09T20:20:00+00:00Mark Seemann
<div id="post">
<p>
<em>Ultimately, it's more about programming than food.</em>
</p>
<p>
The <a href="https://en.wikipedia.org/wiki/Sandwich">Sandwich</a> was named after <a href="https://en.wikipedia.org/wiki/John_Montagu,_4th_Earl_of_Sandwich">John Montagu, 4th Earl of Sandwich</a> because of his fondness for this kind of food. As the popular story has it, he found it practical because it enabled him to eat without greasing the cards he often played.
</p>
<p>
A few years ago, a corner of the internet erupted in good-natured discussion about exactly what constitutes a sandwich. For instance, is the Danish <a href="https://en.wikipedia.org/wiki/Sm%C3%B8rrebr%C3%B8d">smørrebrød</a> a sandwich? It comes in two incarnations: <em>Højtbelagt</em>, the luxury version, which is only consumable with knife and fork, and the more modest, everyday <em>håndmad</em> (literally <em>hand food</em>), which, while open-faced, can usually be consumed without cutlery.
</p>
<p>
<img src="/content/binary/bjoernekaelderen-hoejtbelagt.jpg" alt="A picture of elaborate Danish smørrebrød.">
</p>
<p>
If we consider the 4th Earl of Sandwich's motivation as a yardstick, then the depicted <em>højtbelagte smørrebrød</em> is hardly a sandwich, while I believe a case can be made that a <em>håndmad</em> is:
</p>
<p>
<img src="/content/binary/haandmadder.jpg" alt="Two håndmadder and half of a sliced apple.">
</p>
<p>
Obviously, you need a different grip on a <em>håndmad</em> than on a sandwich. The bread (<em>rugbrød</em>) is much denser than wheat bread, and structurally more rigid. You eat it with your thumb and index finger on each side, and remaining fingers supporting it from below. The bottom line is this: A single piece of bread with something on top can also solve the original problem.
</p>
<p>
What if we go in the other direction? How about a combo consisting of bread, meat, bread, meat, and bread? I believe that I've seen burgers like that. Can you eat that with one hand? I think that this depends more on how greasy and overfilled it is, than on the structure.
</p>
<p>
What if you had five layers of meat and six layers of bread? This is unlikely to work with traditional Western leavened bread which, being a foam, will lose structural integrity when cut too thin. Imagining other kinds of bread, though, and thin slices of meat (or other 'content'), I don't see why it couldn't work.
</p>
<h3 id="00d495b0703a45a98f36607e99799c62">
FP sandwiches <a href="#00d495b0703a45a98f36607e99799c62">#</a>
</h3>
<p>
As regular readers may have picked up over the years, I do like food, but this is, after all, a programming blog.
</p>
<p>
A few years ago I presented a functional-programming design pattern named <a href="/2020/03/02/impureim-sandwich">Impureim sandwich</a>. It argues that it's often beneficial to structure a code base according to the <a href="https://www.destroyallsoftware.com/screencasts/catalog/functional-core-imperative-shell">functional core, imperative shell</a> architecture.
</p>
<p>
The idea, in a nutshell, is that at every entry point (<code>Main</code> method, message handler, Controller action, etcetera) you first perform all impure actions necessary to collect input data for a <a href="https://en.wikipedia.org/wiki/Pure_function">pure function</a>, then you call that pure function (which may be composed from many smaller functions), and finally you perform one or more impure actions based on the function's return value. That's the <a href="/2020/03/02/impureim-sandwich">impure-pure-impure sandwich</a>.
</p>
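<p>
As a rough sketch of that shape, the three phases can be expressed in Haskell as a small higher-order function. The names <code>sandwich</code>, <code>wordCount</code>, and <code>countWordsFromStdin</code> are invented for this illustration; they aren't part of any code base discussed here.
</p>
<p>
<pre>-- Impure-pure-impure: gather input, decide purely, act on the decision.
sandwich :: IO a -> (a -> b) -> (b -> IO c) -> IO c
sandwich gather decide act = do
  input <- gather               -- first impure slice
  let decision = decide input   -- pure filling
  act decision                  -- second impure slice

-- Hypothetical pure core: count the words in a text.
wordCount :: String -> Int
wordCount = length . words

-- Hypothetical entry point: read standard input, count words, print the count.
countWordsFromStdin :: IO ()
countWordsFromStdin = sandwich getContents wordCount print</pre>
</p>
<p>
The type of <code>decide</code> is <code>a -> b</code>, with no <code>IO</code> in sight, so in this sketch the compiler enforces that the middle of the sandwich stays pure.
</p>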
<p>
My experience with this pattern is that it's surprisingly often possible to apply it. Not always, but more often than you think.
</p>
<p>
Sometimes, however, it demands a looser interpretation of the word <em>sandwich</em>.
</p>
<p>
Even the examples from <a href="/2020/03/02/impureim-sandwich">the article</a> aren't standard sandwiches, once you dissect them. Consider, first, the <a href="https://www.haskell.org/">Haskell</a> example, here recoloured:
</p>
<p>
<pre><span style="color:#600277;">tryAcceptComposition</span> :: <span style="color:blue;">Reservation</span> <span style="color:blue;">-></span> IO (Maybe Int)
tryAcceptComposition reservation <span style="color:#666666;">=</span> runMaybeT <span style="color:#666666;">$</span>
<span style="background-color: lightsalmon;"> liftIO (<span style="color:#dd0000;">DB</span><span style="color:#666666;">.</span>readReservations connectionString</span><span style="background-color: palegreen;"> <span style="color:#666666;">$</span> date reservation</span><span style="background-color: lightsalmon;">)</span>
<span style="background-color: palegreen;"> <span style="color:#666666;">>>=</span> <span style="color:#dd0000;">MaybeT</span> <span style="color:#666666;">.</span> return <span style="color:#666666;">.</span> flip (tryAccept <span style="color:#09885a;">10</span>) reservation</span>
<span style="background-color: lightsalmon;"> <span style="color:#666666;">>>=</span> liftIO <span style="color:#666666;">.</span> <span style="color:#dd0000;">DB</span><span style="color:#666666;">.</span>createReservation connectionString</span></pre>
</p>
<p>
The <code>date</code> function is a pure accessor that retrieves the date and time of the <code>reservation</code>. In C#, it's typically a read-only property:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> <span style="color:#2b91af;">Task</span><<span style="color:#2b91af;">IActionResult</span>> Post(<span style="color:#2b91af;">Reservation</span> reservation)
{
<span style="background-color: lightsalmon;"> <span style="color:blue;">return</span> <span style="color:blue;">await</span> Repository.ReadReservations(</span><span style="background-color: palegreen;">reservation.Date</span><span style="background-color: lightsalmon;">)</span>
<span style="background-color: palegreen;"> .Select(rs => maîtreD.TryAccept(rs, reservation))</span>
<span style="background-color: lightsalmon;"> .SelectMany(m => m.Traverse(Repository.Create))
.Match(InternalServerError(<span style="color:#a31515;">"Table unavailable"</span>), Ok);</span>
}</pre>
</p>
<p>
Perhaps you don't think of a C# property as a function. After all, it's just an idiomatic grouping of language keywords:
</p>
<p>
<pre><span style="color:blue;">public</span> DateTimeOffset Date { <span style="color:blue;">get</span>; }</pre>
</p>
<p>
Besides, a function takes input and returns output. What's the input in this case?
</p>
<p>
Keep in mind that a C# read-only property like this is only syntactic sugar for a getter method. In Java it would have been a method called <code>getDate()</code>. From <a href="/2018/01/22/function-isomorphisms">Function isomorphisms</a> we know that an instance method is isomorphic to a function that takes the object as input:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> DateTimeOffset GetDate(Reservation reservation)</pre>
</p>
<p>
In other words, the <code>Date</code> property is an operation that takes the object itself as input and returns <code>DateTimeOffset</code> as output. The operation has no side effects, and will always return the same output for the same input. In other words, it's a pure function, and that's the reason I've now coloured it green in the above code examples.
</p>
<p>
The layering indicated by the examples may, however, be deceiving. The green colour of <code>reservation.Date</code> is adjacent to the green colour of the <code>Select</code> expression below it. You might interpret this as though the pure middle part of the sandwich partially expands to the upper impure phase.
</p>
<p>
That's not the case. The <code>reservation.Date</code> expression executes <em>before</em> <code>Repository.ReadReservations</code>, and only then does the pure <code>Select</code> expression execute. Perhaps this, then, is a more honest depiction of the sandwich:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> Task<IActionResult> Post(Reservation reservation)
{
<span style="background-color: palegreen;"> <span style="color:blue;">var</span> date = reservation.Date;</span>
<span style="background-color: lightsalmon;"> <span style="color:blue;">return</span> <span style="color:blue;">await</span> Repository.ReadReservations(date)</span>
<span style="background-color: palegreen;"> .Select(rs => maîtreD.TryAccept(rs, reservation))</span>
<span style="background-color: lightsalmon;"> .SelectMany(m => m.Traverse(Repository.Create))
.Match(InternalServerError(<span style="color:#a31515;">"Table unavailable"</span>), Ok);</span>
}</pre>
</p>
<p>
The corresponding 'sandwich diagram' looks like this:
</p>
<p>
<img src="/content/binary/pure-impure-pure-impure-box.png" alt="A box with green, red, green, and red horizontal tiers.">
</p>
<p>
If you want to interpret the word <em>sandwich</em> narrowly, this is no longer a sandwich, since there's 'content' on top. That's the reason I started this article discussing Danish <em>smørrebrød</em>, also sometimes called <em>open-faced sandwiches</em>. Granted, I've never seen a <em>håndmad</em> with two slices of bread with meat both between and on top. On the other hand, I don't think that having a smidgen of 'content' on top is a showstopper.
</p>
<h3 id="c3a4d1243ee540af95571141c0dd500e">
Initial and eventual purity <a href="#c3a4d1243ee540af95571141c0dd500e">#</a>
</h3>
<p>
Why is this important? Whether or not <code>reservation.Date</code> is a little light of purity in the otherwise impure first slice of the sandwich actually doesn't concern me that much. After all, my concern is mostly cognitive load, and there's hardly much gained by extracting the <code>reservation.Date</code> expression to a separate line, as I did above.
</p>
<p>
The reason this interests me is that in many cases, the first step you may take is to validate input, and <a href="/2023/06/26/validation-and-business-rules">validation is a composed set of pure functions</a>. While pure, and <a href="/2020/12/14/validation-a-solved-problem">a solved problem</a>, validation may be a sufficiently significant step that it warrants explicit acknowledgement. It's not just a property getter, but complex enough that bugs could hide there.
</p>
<p>
Even if you follow the <em>functional core, imperative shell</em> architecture, you'll often find that the first step is pure validation.
</p>
<p>
Likewise, once you've performed impure actions in the second impure phase, you can easily have a final thin pure translation slice. In fact, the above C# example contains an example of just that:
</p>
<p>
<pre><span style="color:blue;">public</span> IActionResult Ok(<span style="color:blue;">int</span> value)
{
<span style="color:blue;">return</span> <span style="color:blue;">new</span> OkActionResult(value);
}
<span style="color:blue;">public</span> IActionResult InternalServerError(<span style="color:blue;">string</span> msg)
{
<span style="color:blue;">return</span> <span style="color:blue;">new</span> InternalServerErrorActionResult(msg);
}</pre>
</p>
<p>
These are two tiny pure functions used as the final translation in the sandwich:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> Task<IActionResult> Post(Reservation reservation)
{
<span style="background-color: palegreen;"> <span style="color:blue;">var</span> date = reservation.Date;</span>
<span style="background-color: lightsalmon;"> <span style="color:blue;">return</span> <span style="color:blue;">await</span> Repository.ReadReservations(date)</span>
<span style="background-color: palegreen;"> .Select(rs => maîtreD.TryAccept(rs, reservation))</span>
<span style="background-color: lightsalmon;"> .SelectMany(m => m.Traverse(Repository.Create))
        .Match(</span><span style="background-color: palegreen;">InternalServerError(<span style="color:#a31515;">"Table unavailable"</span>), Ok</span><span style="background-color: lightsalmon;">);</span>
}</pre>
</p>
<p>
On the other hand, I didn't want to paint the <code>Match</code> operation green, since it's essentially a continuation of a <a href="https://learn.microsoft.com/dotnet/api/system.threading.tasks.task-1">Task</a>, and if we consider <a href="/2020/07/27/task-asynchronous-programming-as-an-io-surrogate">task asynchronous programming as an IO surrogate</a>, we should, at least, regard it with scepticism. While it might be pure, it probably isn't.
</p>
<p>
Still, we may be left with an inverted 'sandwich' that looks like this:
</p>
<p>
<img src="/content/binary/pure-impure-pure-impure-pure-box.png" alt="A box with green, red, green, red, and green horizontal tiers.">
</p>
<p>
Can we still claim that this is a sandwich?
</p>
<h3 id="4d14e6795066473e95d1e5cdbcef6c2d">
At the metaphor's limits <a href="#4d14e6795066473e95d1e5cdbcef6c2d">#</a>
</h3>
<p>
This latest development seems to strain the sandwich metaphor. Can we maintain it, or does it fall apart?
</p>
<p>
What seems clear to me, at least, is that this ought to be the limit of how much we can stretch the allegory. If we add more tiers we get a <a href="https://en.wikipedia.org/wiki/Dagwood_sandwich">Dagwood sandwich</a>, which is clearly a gimmick of little practicality.
</p>
<p>
But again, I'm appealing to a dubious metaphor, so instead, let's analyse what's going on.
</p>
<p>
In practice, it seems that you can rarely avoid the initial (pure) validation step. Why not? Couldn't you move validation to the functional core and do the impure steps without validation?
</p>
<p>
The short answer is <em>no</em>, because <a href="https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/">validation done right is actually parsing</a>. At the entry point, you don't even know if the input makes sense.
</p>
<p>
A more realistic example is warranted, so I now turn to the example code base from my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>. One blog post shows <a href="/2022/07/25/an-applicative-reservation-validation-example-in-c">how to implement applicative validation for posting a reservation</a>.
</p>
<p>
A typical HTTP <code>POST</code> may include a JSON document like this:
</p>
<p>
<pre>{
<span style="color:#2e75b6;">"id"</span>: <span style="color:#a31515;">"bf4e84130dac451b9c94049da8ea8c17"</span>,
<span style="color:#2e75b6;">"at"</span>: <span style="color:#a31515;">"2024-11-07T20:30"</span>,
<span style="color:#2e75b6;">"email"</span>: <span style="color:#a31515;">"snomob@example.com"</span>,
<span style="color:#2e75b6;">"name"</span>: <span style="color:#a31515;">"Snow Moe Beal"</span>,
<span style="color:#2e75b6;">"quantity"</span>: 1
}</pre>
</p>
<p>
In order to handle even such a simple request, the system has to perform a set of impure actions. One of them is to query its data store for existing reservations. After all, the restaurant may not have any remaining tables for that day.
</p>
<p>
Which day, you ask? I'm glad you asked. The data access API comes with this method:
</p>
<p>
<pre>Task<IReadOnlyCollection<Reservation>> ReadReservations(
<span style="color:blue;">int</span> restaurantId, DateTime min, DateTime max);</pre>
</p>
<p>
You can supply <code>min</code> and <code>max</code> values to indicate the range of dates you need. How do you determine that range? You need the desired date of the reservation. In the above example it's 20:30 on November 7 2024. We're in luck: the data is there, and understandable.
</p>
<p>
Notice, however, that due to limitations of wire formats such as JSON, the date is a string. The value might be anything. If it's sufficiently malformed, you can't even perform the impure action of querying the database, because you don't know what to query it about.
</p>
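<p>
To show that this step can be pure, here's a minimal Haskell sketch of it. It assumes the <code>time</code> package that ships with GHC, and the function names <code>parseAt</code>, <code>dayRange</code>, and <code>queryRange</code> are hypothetical. The choice of range, from midnight on the day of the reservation to midnight the following day, is also only an assumption for the sake of the example, not a description of the actual code base.
</p>
<p>
<pre>import Data.Time (LocalTime (..), addDays, defaultTimeLocale, midnight, parseTimeM)

-- Pure parsing: Nothing unless the text looks like 2024-11-07T20:30.
parseAt :: String -> Maybe LocalTime
parseAt = parseTimeM True defaultTimeLocale "%Y-%m-%dT%H:%M"

-- Pure: from midnight on the reservation's day to midnight the next day.
dayRange :: LocalTime -> (LocalTime, LocalTime)
dayRange at = (LocalTime day midnight, LocalTime (addDays 1 day) midnight)
  where
    day = localDay at

-- Parse first; only a successful parse yields something to query for.
queryRange :: String -> Either String (LocalTime, LocalTime)
queryRange raw =
  maybe (Left ("Invalid date: " ++ raw)) (Right . dayRange) (parseAt raw)</pre>
</p>
<p>
Only when <code>queryRange</code> returns a <code>Right</code> value does the entry point have <code>min</code> and <code>max</code> values to pass to the impure database query.
</p>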
<p>
To keep the sandwich metaphor untarnished, you might decide to push the parsing responsibility to an impure action, but why make something impure that has a well-known pure solution?
</p>
<p>
A similar argument applies when performing a final, pure translation step in the other direction.
</p>
<p>
So it seems that we're stuck with implementations that don't quite fit the ideal of the sandwich metaphor. Is that enough to abandon the metaphor, or should we keep it?
</p>
<p>
The layers in layered application architecture aren't really layers, and neither are vertical slices really slices. <a href="https://en.wikipedia.org/wiki/All_models_are_wrong">All models are wrong, but some are useful</a>. This is the case here, I believe. You should still keep the <a href="/2020/03/02/impureim-sandwich">Impureim sandwich</a> in mind when structuring code: Keep impure actions at the application boundary - in the 'Controllers', if you will; have only two phases of impurity - the initial and the ultimate; and maximise use of pure functions for everything else. Keep most of the pure execution between the two impure phases, but realistically, you're going to need a pure validation phase in front, and a slim translation layer at the end.
</p>
<h3 id="5b191dfc434149bab1d9d6bea029a4d4">
Conclusion <a href="#5b191dfc434149bab1d9d6bea029a4d4">#</a>
</h3>
<p>
Despite the prevalence of food imagery, this article about functional programming architecture has avoided any mention of <a href="https://byorgey.wordpress.com/2009/01/12/abstraction-intuition-and-the-monad-tutorial-fallacy/">burritos</a>. Instead, it examines the tension between an ideal, the <a href="/2020/03/02/impureim-sandwich">Impureim sandwich</a>, and real-world implementation details. When you have to deal with concerns such as input validation or translation to egress data, it's practical to add one or two more thin slices of purity.
</p>
<p>
In <a href="/2018/11/19/functional-architecture-a-definition">functional architecture</a> you want to maximise the proportion of pure functions. Adding more pure code is hardly a problem.
</p>
<p>
The opposite is not the case. We shouldn't be cavalier about adding more impure slices to the sandwich. Thus, the adjusted definition of the Impureim sandwich seems to be that it may have at most two impure phases, but from one to three pure slices.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="aa4031dbe9a7467ba087c2731596f420">
<div class="comment-author">qfilip <a href="#aa4031dbe9a7467ba087c2731596f420">#</a></div>
<div class="comment-content">
<p>
Hello again...
</p>
<p>
In one of your excellent talks (<a href="https://youtu.be/F9bznonKc64?feature=shared&t=3392">here</a>), you ended up refactoring the maitreD kata using the <code>traverse</code> function. Since this step is crucial for the "sandwich" to work, any post detailing its implementation would be nice.
</p>
<p>
Thanks
</p>
</div>
<div class="comment-date">2023-11-16 10:56 UTC</div>
</div>
<div class="comment" id="7ea7f0f5f3a24a939be3a1cb5b23e2f5">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#7ea7f0f5f3a24a939be3a1cb5b23e2f5">#</a></div>
<div class="comment-content">
<p>
qfilip, thank you for writing. That particular talk fortunately comes with a set of companion articles:
</p>
<ul>
<li><a href="/2019/02/04/how-to-get-the-value-out-of-the-monad">How to get the value out of the monad</a></li>
<li><a href="/2019/02/11/asynchronous-injection">Asynchronous Injection</a></li>
</ul>
<p>
The latter of the two comes with a link to <a href="https://github.com/ploeh/asynchronous-injection">a GitHub repository with all the sample code</a>, including the <code>Traverse</code> implementation.
</p>
<p>
That said, a more formal description of traversals has long been on my to-do list, as you can infer from <a href="/2022/07/11/functor-relationships">this (currently inactive) table of contents</a>.
</p>
</div>
<div class="comment-date">2023-11-16 11:18 UTC</div>
</div>
</div><hr>
Dependency Whac-A-Molehttps://blog.ploeh.dk/2023/10/02/dependency-whac-a-mole2023-10-02T07:52:00+00:00Mark Seemann
<div id="post">
<p>
<em>AKA Framework Whac-A-Mole, Library Whac-A-Mole.</em>
</p>
<p>
I have now used the name <a href="https://en.wikipedia.org/wiki/Whac-A-Mole">Whac-A-Mole</a> three times to describe a particular kind of relationship that may evolve with some dependencies. According to the <a href="https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming)">rule of three</a>, I can now extract the explanation to a separate article. This is that article.
</p>
<h3 id="f9a98473c3ed40eda1f6288eec631795">
Architecture smell <a href="#f9a98473c3ed40eda1f6288eec631795">#</a>
</h3>
<p>
<em>Dependency Whac-A-Mole</em> describes the situation when you're spending too much time investigating, learning, troubleshooting, and overall satisfying the needs of a dependency (i.e. library or framework) instead of delivering value to users.
</p>
<p>
Examples include Dependency Injection containers, <a href="https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping">object-relational mappers</a>, validation frameworks, dynamic mock libraries, and perhaps the Gherkin language.
</p>
<p>
From the above list it does <em>not</em> follow that those examples are universally bad. I can think of situations where some of them make sense. I might even use them myself.
</p>
<p>
Rather, the Dependency Whac-A-Mole architecture smell occurs when a given dependency causes more trouble than the benefit it was supposed to provide.
</p>
<h3 id="9ae83d04788d4d4c9582ba02aa11b19b">
Causes <a href="#9ae83d04788d4d4c9582ba02aa11b19b">#</a>
</h3>
<p>
We rarely set out to do the wrong thing, but we often make mistakes in good faith. You may decide to take a dependency on a library or framework because
</p>
<ul>
<li>it worked well for you in a previous context</li>
<li>it looks as though it'll address a major problem you had in a previous context</li>
<li>you've heard good things about it</li>
<li>you saw a convincing demo</li>
<li>you heard about it in a podcast, conference talk, YouTube video, etc.</li>
<li>a FAANG company uses it</li>
<li>it's the latest tech</li>
<li>you want it on your CV</li>
</ul>
<p>
There could be other motivations as well, and granted, some of those I listed aren't really <em>good</em> reasons. Even so, I don't think anyone chooses a dependency with ill intent.
</p>
<p>
And what might work in one context may turn out to not work in another. You can't always predict such consequences, so I imply no judgement on those who choose the 'wrong' dependency. I've done it, too.
</p>
<p>
It is, however, important to be aware that this risk is always there. You picked a library with the best of intentions, but it turns out to slow you down. If so, acknowledge the mistake and kill your darlings.
</p>
<h3 id="02aa21e2bdc645f1b769c5a8412323f9">
Background <a href="#02aa21e2bdc645f1b769c5a8412323f9">#</a>
</h3>
<p>
Whenever you use a library or framework, you need to learn how to use it effectively. You have to learn its concepts, abstractions, APIs, pitfalls, etc. Not only that, but you need to stay abreast of changes and improvements.
</p>
<p>
Microsoft, for example, is usually good at maintaining backwards compatibility, but even so, things don't stand still. They evolve libraries and frameworks the same way I would do it: Don't introduce breaking changes, but do introduce new, better APIs going forward. This is essentially the <a href="https://martinfowler.com/bliki/StranglerFigApplication.html">Strangler pattern</a> that I also write about in <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>.
</p>
<p>
While it's a good way to evolve a library or framework, the point remains: Even if you trust a supplier to prioritise backwards compatibility, it doesn't mean that you can stop learning. You have to stay up to date with all your dependencies. If you don't, sooner or later, the way that you use something like, say, <a href="https://en.wikipedia.org/wiki/Entity_Framework">Entity Framework</a> is 'the old way', and it's not really supported any longer.
</p>
<p>
In order to be able to move forward, you'll have to rewrite those parts of your code that depend on that old way of doing things.
</p>
<p>
Each dependency comes with benefits and costs. As long as the benefits outweigh the costs, it makes sense to keep it around. If, on the other hand, you spend more time dealing with it than it would take you to do the work yourself, consider getting rid of it.
</p>
<h3 id="439ea4466014446a9ddfc2e264c86fba">
Symptoms <a href="#439ea4466014446a9ddfc2e264c86fba">#</a>
</h3>
<p>
Perhaps the infamous <em>left-pad</em> incident is too easy an example, but it does highlight the essence of this tension. Do you really need a third-party package to pad a string, or could you have done it yourself?
</p>
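<p>
For a sense of scale, here's a sketch of a left-pad function in Haskell. It's not the implementation from the package involved in the incident, just an illustration of how little code the problem may require. The name <code>leftPad</code> and its parameter order are my own choices.
</p>
<p>
<pre>-- Pad a string on the left until it is at least the given width.
-- Strings already at or beyond the width are returned unchanged,
-- since replicate yields an empty list for non-positive counts.
leftPad :: Int -> Char -> String -> String
leftPad width padding s = replicate (width - length s) padding ++ s</pre>
</p>
<p>
With this definition, <code>leftPad 5 '0' "42"</code> evaluates to <code>"00042"</code>.
</p>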
<p>
You can spend much time figuring out how to fit a general-purpose library or framework to your particular needs. How do you make your object-relational mapper (ORM) fit a special database schema? How do you annotate a class so that it produces validation messages according to the requirements in your jurisdiction? How do you configure an automatic mapping library so that it correctly projects data? How do you tell a Dependency Injection (DI) Container how to compose a <a href="https://en.wikipedia.org/wiki/Chain-of-responsibility_pattern">Chain of Responsibility</a> where some objects also take strings or integers in their constructors?
</p>
<p>
Do such libraries or frameworks save time, or could you have written the corresponding code quicker? To be clear, I'm not talking about writing your own ORM, your own DI Container, your own auto-mapper. Rather, instead of using a DI Container, <a href="/2014/06/10/pure-di">Pure DI</a> is likely easier. As an alternative to an ORM, what's the cost of just writing <a href="https://en.wikipedia.org/wiki/SQL">SQL</a>? Instead of an <a href="https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule">ad-hoc, informally-specified, bug-ridden</a> validation framework, have you considered <a href="/2018/11/05/applicative-validation">applicative validation</a>?
</p>
<p>
Things become really insidious if your chosen library never really solves all problems. Every time you figure out how to use it for one exotic corner case, your 'solution' causes a new problem to arise.
</p>
<p>
A symptom of <em>Dependency Whac-A-Mole</em> is when you have to advertise for people skilled in a particular technology.
</p>
<p>
Again, it's not necessarily a problem. If you're getting tremendous value out of, say, Entity Framework, it makes sense to list expertise as a job requirement. If, on the other hand, you have to list a litany of libraries and frameworks as necessary skills, it might pay to stop and reconsider. You can call it your 'tech stack' all you will, but is it really an inadvertent case of <a href="https://en.wikipedia.org/wiki/Vendor_lock-in">vendor lock-in</a>?
</p>
<h3 id="381db0b94f094be2be2b95841e248669">
Anecdotal evidence <a href="#381db0b94f094be2be2b95841e248669">#</a>
</h3>
<p>
I've used the term <em>Whac-A-Mole</em> a couple of times to describe the kind of situation where you feel that you're fighting a technology more than it's helping you. It seems to resonate with people other than me.
</p>
<p>
Here are the original articles where I used the term:
</p>
<ul>
<li><a href="/2022/08/15/aspnet-validation-revisited">ASP.NET validation revisited</a></li>
<li><a href="/2022/08/22/can-types-replace-validation">Can types replace validation?</a></li>
<li><a href="/2023/09/18/do-orms-reduce-the-need-for-mapping">Do ORMs reduce the need for mapping?</a></li>
</ul>
<p>
These are only the articles where I explicitly use the term. I do, however, think that the phenomenon is more common. I'm particularly sensitive to it when it comes to Dependency Injection, where I generally believe that DI Containers make the technique harder than it has to be. Composing object graphs is easily done with code.
</p>
<h3 id="2ef657b607cd49408ced7110e28e2321">
Conclusion <a href="#2ef657b607cd49408ced7110e28e2321">#</a>
</h3>
<p>
Sometimes a framework or library makes it more difficult to get things done. You spend much time kowtowing to its needs, researching how to do things 'the xyz way', learning its intricate extensibility points, keeping up to date with its evolving API, and engaging with its community to lobby for new features.
</p>
<p>
Still, you feel that it makes you compromise. You might have liked to organise your code in a different way, but unfortunately you can't, because it doesn't fit the way the dependency works. As you solve issues with it, new ones appear.
</p>
<p>
These are symptoms of <em>Dependency Whac-A-Mole</em>, an architecture smell that indicates that you're using the wrong tool for the job. If so, get rid of the dependency in favour of something better. Often, the better alternative is just plain vanilla code.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="9235995516070545f7cc3ee83d37023d">
<div class="comment-author"><a href="https://github.com/thomaslevesque">Thomas Levesque</a> <a href="#9235995516070545f7cc3ee83d37023d">#</a></div>
<div class="comment-content">
<p>
The most obvious example of this for me is definitely AutoMapper. I used to think it was great and saved so much time, but more often than not,
the mapping configuration ended up being more complex (and fragile) than just mapping the properties manually.
</p>
</div>
<div class="comment-date">2023-10-02 13:27 UTC</div>
</div>
<div class="comment" id="93b32bb03ee14d298b0d9b7cf65ddcae">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#93b32bb03ee14d298b0d9b7cf65ddcae">#</a></div>
<div class="comment-content">
<p>
I could imagine. AutoMapper is not, however, a library I've used enough to evaluate.
</p>
</div>
<div class="comment-date">2023-10-02 13:58 UTC</div>
</div>
<div class="comment" id="3e81ff9e535743148d8898e84ff69595">
<div class="comment-author"><a href="https://blog.oakular.xyz">Callum Warrilow</a> <a href="#3e81ff9e535743148d8898e84ff69595">#</a></div>
<div class="comment-content">
<p>
The moment I lost any faith in AutoMapper was after trying to debug a mapping that was silently failing on a single property.
Three of us were looking at it for a good amount of time before one of us noticed a single character typo on the destination property.
As the names did not match, no mapping occurred. It is unfortunately a black box, and obfuscated a problem that a manual mapping would have handled gracefully.
<hr />
Mark, it is interesting that you mention Gherkin as potentially one of these moles. It is something I've been evaluating in the hopes of making our tests more business focused,
but considering it again now, you can achieve a lot of what Gherkin offers with well defined namespaces, classes and methods in your test assemblies, something like:
<ul>
<li>Namespace: GivenSomePrecondition</li>
<li>TestClass: WhenCarryingOutAnAction</li>
<li>TestMethod: ThenTheExpectedPostConditionResults</li>
</ul>
To get away from playing Whac-a-Mole, it would seem to require changing the question being asked, from <i>what product do I need to solve this problem?</i>, to <i>what tools and patterns do I have around me to solve this problem?</i>.
</p>
</div>
<div class="comment-date">2023-10-11 15:54 UTC</div>
</div>
<div class="comment" id="eef76159a60b4ee482238b1cd990ab94">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#eef76159a60b4ee482238b1cd990ab94">#</a></div>
<div class="comment-content">
<p>
Callum, I was expecting someone to comment on including Gherkin on the list.
</p>
<p>
I don't consider all my examples as universally problematic. Rather, they often pop up in contexts where people seem to be struggling with a concept or a piece of technology with no apparent benefit.
</p>
<p>
I'm sure that when <a href="https://dannorth.net/">Dan North</a> came up with the idea of BDD and Gherkin, he actually <em>used</em> it. When used in the way it was originally intended, I can see it providing value.
</p>
<p>
Apart from Dan himself, however, I'm not aware that I've ever met anyone who has used BDD and Gherkin in that way. On the contrary, I've had more than one discussion that went like this:
</p>
<p>
<em>Interlocutor:</em> "We use BDD and Gherkin. It's great! You should try it."
</p>
<p>
<em>Me:</em> "Why?"
</p>
<p>
<em>Interlocutor:</em> "It enables us to <em>organise</em> our tests."
</p>
<p>
<em>Me:</em> "Can't you do that with the <a href="https://wiki.c2.com/?ArrangeActAssert">AAA</a> pattern?"
</p>
<p>
<em>Interlocutor:</em> "..."
</p>
<p>
<em>Me:</em> "Do any non-programmers ever look at your tests?"
</p>
<p>
<em>Interlocutor:</em> "No..."
</p>
<p>
If only programmers look at the test code, then why impose an artificial constraint? <em>Given-when-then</em> is just <em>arrange-act-assert</em> with different names, but free of Gherkin and the tooling that typically comes with it, you're free to write test code that follows normal good coding practices.
</p>
<p>
(As an aside, yes: Sometimes <a href="https://www.dotnetrocks.com/?show=1542">constraints liberate</a>, but from what I've seen of Gherkin-based test code, this doesn't seem to be one of those cases.)
</p>
<p>
Finally, to be quite clear, although I may be repeating myself: If you're using Gherkin to interact with non-programmers on a regular basis, it may be beneficial. I've just never been in that situation, or met anyone other than Dan North who has.
</p>
</div>
<div class="comment-date">2023-10-15 14:35 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.The case of the mysterious comparisonhttps://blog.ploeh.dk/2023/09/25/the-case-of-the-mysterious-comparison2023-09-25T05:58:00+00:00Mark Seemann
<div id="post">
<p>
<em>A ploeh mystery.</em>
</p>
<p>
I was <a href="/2023/09/18/do-orms-reduce-the-need-for-mapping">recently playing around</a> with the example code from my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>, refactoring the <code>Table</code> class to use <a href="/2022/08/22/can-types-replace-validation">a predicative NaturalNumber wrapper</a> to represent a table's seating capacity.
</p>
<p>
Originally, the <code>Table</code> constructor and corresponding read-only data looked like this:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">bool</span> isStandard;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> Reservation[] reservations;
<span style="color:blue;">public</span> <span style="color:blue;">int</span> Capacity { <span style="color:blue;">get</span>; }
<span style="color:blue;">private</span> <span style="color:#2b91af;">Table</span>(<span style="color:blue;">bool</span> <span style="font-weight:bold;color:#1f377f;">isStandard</span>, <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">capacity</span>, <span style="color:blue;">params</span> Reservation[] <span style="font-weight:bold;color:#1f377f;">reservations</span>)
{
<span style="color:blue;">this</span>.isStandard = isStandard;
Capacity = capacity;
<span style="color:blue;">this</span>.reservations = reservations;
}</pre>
</p>
<p>
Since I wanted to show an example of how wrapper types can help make preconditions explicit, I changed it to this:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">bool</span> isStandard;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> Reservation[] reservations;
<span style="color:blue;">public</span> NaturalNumber Capacity { <span style="color:blue;">get</span>; }
<span style="color:blue;">private</span> <span style="color:#2b91af;">Table</span>(<span style="color:blue;">bool</span> <span style="font-weight:bold;color:#1f377f;">isStandard</span>, NaturalNumber <span style="font-weight:bold;color:#1f377f;">capacity</span>, <span style="color:blue;">params</span> Reservation[] <span style="font-weight:bold;color:#1f377f;">reservations</span>)
{
<span style="color:blue;">this</span>.isStandard = isStandard;
Capacity = capacity;
<span style="color:blue;">this</span>.reservations = reservations;
}</pre>
</p>
<p>
The only thing I changed was the type of <code>Capacity</code> and <code>capacity</code>.
</p>
<p>
As I did that, two tests failed.
</p>
<h3 id="5942663d531c41c491e2b79116008c5e">
Evidence <a href="#5942663d531c41c491e2b79116008c5e">#</a>
</h3>
<p>
Both tests failed in the same way, so I only show one of the failures:
</p>
<p>
<pre>Ploeh.Samples.Restaurants.RestApi.Tests.MaitreDScheduleTests.Schedule
Source: MaitreDScheduleTests.cs line 16
Duration: 340 ms
Message:
FsCheck.Xunit.PropertyFailedException :
Falsifiable, after 2 tests (0 shrinks) (StdGen (48558275,297233133)):
Original:
<null>
(Ploeh.Samples.Restaurants.RestApi.MaitreD,
[|Ploeh.Samples.Restaurants.RestApi.Reservation|])
---- System.InvalidOperationException : Failed to compare two elements in the array.
-------- System.ArgumentException : At least one object must implement IComparable.
Stack Trace:
----- Inner Stack Trace -----
GenericArraySortHelper`1.Sort(T[] keys, Int32 index, Int32 length, IComparer`1 comparer)
Array.Sort[T](T[] array, Int32 index, Int32 length, IComparer`1 comparer)
EnumerableSorter`2.QuickSort(Int32[] keys, Int32 lo, Int32 hi)
EnumerableSorter`1.Sort(TElement[] elements, Int32 count)
OrderedEnumerable`1.ToList()
Enumerable.ToList[TSource](IEnumerable`1 source)
<span style="color: red;">MaitreD.Allocate(IEnumerable`1 reservations)</span> line 91
<span style="color: red;"><>c__DisplayClass21_0.<Schedule>b__4(<>f__AnonymousType7`2 <>h__TransparentIdentifier1)</span> line 114
<>c__DisplayClass2_0`3.<CombineSelectors>b__0(TSource x)
SelectIPartitionIterator`2.GetCount(Boolean onlyIfCheap)
Enumerable.Count[TSource](IEnumerable`1 source)
<span style="color: red;">MaitreDScheduleTests.ScheduleImp(MaitreD sut, Reservation[] reservations)</span> line 31
<span style="color: red;"><>c.<Schedule>b__0_2(ValueTuple`2 t)</span> line 22
ForAll@15.Invoke(Value arg00)
Testable.evaluate[a,b](FSharpFunc`2 body, a a)
----- Inner Stack Trace -----
Comparer.Compare(Object a, Object b)
ObjectComparer`1.Compare(T x, T y)
EnumerableSorter`2.CompareAnyKeys(Int32 index1, Int32 index2)
ComparisonComparer`1.Compare(T x, T y)
ArraySortHelper`1.SwapIfGreater(T[] keys, Comparison`1 comparer, Int32 a, Int32 b)
ArraySortHelper`1.IntroSort(T[] keys, Int32 lo, Int32 hi, Int32 depthLimit, Comparison`1 comparer)
GenericArraySortHelper`1.Sort(T[] keys, Int32 index, Int32 length, IComparer`1 comparer)</pre>
</p>
<p>
The code highlighted with red is user code (i.e. my code). The rest comes from .NET or <a href="https://fscheck.github.io/FsCheck/">FsCheck</a>.
</p>
<p>
While a stack trace like that can look intimidating, I usually navigate to the top stack frame of my own code. As I reproduce my investigation, see if you can spot the problem before I did.
</p>
<h3 id="fded951b0b3a4cac848941153e84eaa6">
Understand before resolving <a href="#fded951b0b3a4cac848941153e84eaa6">#</a>
</h3>
<p>
Before starting the investigation proper, we might as well acknowledge what seems evident. I had a fully passing test suite, then I edited two lines of code, which caused the above error. The two nested exception messages contain obvious clues: <em>Failed to compare two elements in the array,</em> and <em>At least one object must implement IComparable</em>.
</p>
<p>
The only edit I made was to change an <code>int</code> to a <code>NaturalNumber</code>, and <code>NaturalNumber</code> didn't implement <code>IComparable</code>. It seems straightforward to just make <code>NaturalNumber</code> implement that interface and move on, and as it turns out, that <em>is</em> the solution.
</p>
<p>
As I describe in <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a>, when troubleshooting, first seek to understand the problem. I've seen too many people go immediately into 'action mode' when faced with a problem. It's often a suboptimal strategy.
</p>
<p>
First, if the immediate solution turns out not to work, you can waste much time thrashing, trying various 'fixes' without understanding the problem.
</p>
<p>
Second, even if the resolution is easy, as is the case here, if you don't understand the underlying cause and effect, you can easily build a <a href="https://en.wikipedia.org/wiki/Cargo_cult">cargo cult</a>-like 'understanding' of programming. This could become one such experience: <em>All wrapper types must implement <code>IComparable</code></em>, or some nonsense like that.
</p>
<p>
Unless people are getting hurt or you are bleeding money because of the error, seek first to understand, and only then fix the problem.
</p>
<h3 id="f881a7f048144dd2a2521e336675d052">
First clue <a href="#f881a7f048144dd2a2521e336675d052">#</a>
</h3>
<p>
The top user stack frame is the <code>Allocate</code> method:
</p>
<p>
<pre><span style="color:blue;">private</span> IEnumerable<Table> <span style="font-weight:bold;color:#74531f;">Allocate</span>(
IEnumerable<Reservation> <span style="font-weight:bold;color:#1f377f;">reservations</span>)
{
List<Table> <span style="font-weight:bold;color:#1f377f;">allocation</span> = Tables.ToList();
<span style="font-weight:bold;color:#8f08c4;">foreach</span> (var <span style="font-weight:bold;color:#1f377f;">r</span> <span style="font-weight:bold;color:#8f08c4;">in</span> reservations)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">table</span> = allocation.Find(<span style="font-weight:bold;color:#1f377f;">t</span> => t.Fits(r.Quantity));
<span style="font-weight:bold;color:#8f08c4;">if</span> (table <span style="color:blue;">is</span> { })
{
allocation.Remove(table);
allocation.Add(table.Reserve(r));
}
}
<span style="font-weight:bold;color:#8f08c4;">return</span> allocation;
}</pre>
</p>
<p>
The stack trace points to line 91, which is the first line of code, where it calls <code>Tables.ToList()</code>. This is also consistent with the stack trace, which indicates that the exception is thrown from <a href="https://learn.microsoft.com/dotnet/api/system.linq.enumerable.tolist">ToList</a>.
</p>
<p>
I am, however, not used to <code>ToList</code> throwing exceptions, so I admit that I was nonplussed. Why would <code>ToList</code> try to sort the input? It usually doesn't do that.
</p>
<p>
Now, I <em>did</em> notice the <code>OrderedEnumerable`1</code> on the stack frame above <code>Enumerable.ToList</code>, but this early in the investigation, I failed to connect the dots.
</p>
<p>
What does the caller look like? It's that scary <code>DisplayClass21</code>...
</p>
<h3 id="5250f81716324ff1918bea2e57d08ef4">
Immediate caller <a href="#5250f81716324ff1918bea2e57d08ef4">#</a>
</h3>
<p>
The code that calls <code>Allocate</code> is the <code>Schedule</code> method, the System Under Test:
</p>
<p>
<pre><span style="color:blue;">public</span> IEnumerable<TimeSlot> <span style="font-weight:bold;color:#74531f;">Schedule</span>(
IEnumerable<Reservation> <span style="font-weight:bold;color:#1f377f;">reservations</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span>
<span style="color:blue;">from</span> r <span style="color:blue;">in</span> reservations
<span style="color:blue;">group</span> r <span style="color:blue;">by</span> r.At <span style="color:blue;">into</span> g
<span style="color:blue;">orderby</span> g.Key
<span style="color:blue;">let</span> seating = <span style="color:blue;">new</span> Seating(SeatingDuration, g.Key)
<span style="color:blue;">let</span> overlapping = reservations.Where(seating.Overlaps)
<span style="color:blue;">select</span> <span style="color:blue;">new</span> TimeSlot(g.Key, Allocate(overlapping).ToList());
}</pre>
</p>
<p>
While it does <code>orderby</code>, it doesn't seem to be sorting the input to <code>Allocate</code>. While <code>overlapping</code> is a filtered subset of <code>reservations</code>, the code doesn't sort <code>reservations</code>.
</p>
<p>
Okay, moving on, what does the caller of that method look like?
</p>
<h3 id="0db463ec75d64f93a1b188af9fe731f3">
Test implementation <a href="#0db463ec75d64f93a1b188af9fe731f3">#</a>
</h3>
<p>
The caller of the <code>Schedule</code> method is this test implementation:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">static</span> <span style="color:blue;">void</span> <span style="color:#74531f;">ScheduleImp</span>(
MaitreD <span style="font-weight:bold;color:#1f377f;">sut</span>,
Reservation[] <span style="font-weight:bold;color:#1f377f;">reservations</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = sut.Schedule(reservations);
Assert.Equal(
reservations.Select(<span style="font-weight:bold;color:#1f377f;">r</span> => r.At).Distinct().Count(),
actual.Count());
Assert.Equal(
actual.Select(<span style="font-weight:bold;color:#1f377f;">ts</span> => ts.At).OrderBy(<span style="font-weight:bold;color:#1f377f;">d</span> => d),
actual.Select(<span style="font-weight:bold;color:#1f377f;">ts</span> => ts.At));
Assert.All(actual, <span style="font-weight:bold;color:#1f377f;">ts</span> => AssertTables(sut.Tables, ts.Tables));
Assert.All(
actual,
<span style="font-weight:bold;color:#1f377f;">ts</span> => AssertRelevance(reservations, sut.SeatingDuration, ts));
}</pre>
</p>
<p>
Notice how the first line of code calls <code>Schedule</code>, while the rest is 'just' assertions.
</p>
<p>
Because I had noticed that <code>OrderedEnumerable`1</code> on the stack, I was on the lookout for an expression that would sort an <code>IEnumerable<T></code>. The <code>ScheduleImp</code> method surprised me, though, because the <code>reservations</code> parameter is an array. If there was any problem sorting it, it should have blown up much earlier.
</p>
<p>
I really should be paying more attention, but despite my best resolution to proceed methodically, I was chasing the wrong clue.
</p>
<p>
Which line of code throws the exception? The stack trace says line 31. That's not the <code>sut.Schedule(reservations)</code> call. It's the first assertion following it. I failed to notice that.
</p>
<h3 id="5116edf953f1438b9cb4b37c5b043bda">
Property <a href="#5116edf953f1438b9cb4b37c5b043bda">#</a>
</h3>
<p>
I was stumped, and not knowing what to do, I looked at the fourth and final piece of user code in that stack trace:
</p>
<p>
<pre>[Property]
<span style="color:blue;">public</span> Property <span style="font-weight:bold;color:#74531f;">Schedule</span>()
{
<span style="font-weight:bold;color:#8f08c4;">return</span> Prop.ForAll(
(<span style="color:blue;">from</span> rs <span style="color:blue;">in</span> Gens.Reservations
<span style="color:blue;">from</span> m <span style="color:blue;">in</span> Gens.MaitreD(rs)
<span style="color:blue;">select</span> (m, rs)).ToArbitrary(),
<span style="font-weight:bold;color:#1f377f;">t</span> => ScheduleImp(t.m, t.rs));
}</pre>
</p>
<p>
No sorting there. What's going on?
</p>
<p>
In retrospect, I'm struggling to understand what was going on in my mind. Perhaps you're about to lose patience with me. I was chasing the wrong 'clue', just as I said above that 'other' people do, but surely, it's understood, that I don't.
</p>
<h3 id="70ad2e90704d4d31bc0d045fff16a011">
WYSIATI <a href="#70ad2e90704d4d31bc0d045fff16a011">#</a>
</h3>
<p>
In <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a> I spend some time discussing how code relates to human cognition. I'm no neuroscientist, but I try to read books on other topics than programming. I was partially inspired by <a href="/ref/thinking-fast-and-slow">Thinking, Fast and Slow</a> in which <a href="https://en.wikipedia.org/wiki/Daniel_Kahneman">Daniel Kahneman</a> (among many other topics) presents how <em>System 1</em> (the inaccurate <em>fast</em> thinking process) mostly works with what's right in front of it: <em>What You See Is All There Is</em>, or WYSIATI.
</p>
<p>
That <code>OrderedEnumerable`1</code> in the stack trace had made me look for an <code>IEnumerable<T></code> as the culprit, and in the source code of the <code>Allocate</code> method, one parameter is clearly what I was looking for. I'll repeat that code here for your benefit:
</p>
<p>
<pre><span style="color:blue;">private</span> IEnumerable<Table> <span style="font-weight:bold;color:#74531f;">Allocate</span>(
IEnumerable<Reservation> <span style="font-weight:bold;color:#1f377f;">reservations</span>)
{
List<Table> <span style="font-weight:bold;color:#1f377f;">allocation</span> = Tables.ToList();
<span style="font-weight:bold;color:#8f08c4;">foreach</span> (var <span style="font-weight:bold;color:#1f377f;">r</span> <span style="font-weight:bold;color:#8f08c4;">in</span> reservations)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">table</span> = allocation.Find(<span style="font-weight:bold;color:#1f377f;">t</span> => t.Fits(r.Quantity));
<span style="font-weight:bold;color:#8f08c4;">if</span> (table <span style="color:blue;">is</span> { })
{
allocation.Remove(table);
allocation.Add(table.Reserve(r));
}
}
<span style="font-weight:bold;color:#8f08c4;">return</span> allocation;
}</pre>
</p>
<p>
Where's the <code>IEnumerable<T></code> in that code?
</p>
<p>
<code>reservations</code>, right?
</p>
<h3 id="c99f8e1238284cc88868d5fe39f43f2a">
Revelation <a href="#c99f8e1238284cc88868d5fe39f43f2a">#</a>
</h3>
<p>
As WYSIATI 'predicts', the brain gloms on to what's prominent. I was looking for <code>IEnumerable<T></code>, and it's right there in the method declaration as the parameter <code>IEnumerable<Reservation> <span style="font-weight:bold;color:#1f377f;">reservations</span></code>.
</p>
<p>
As covered in multiple places (<a href="/code-that-fits-in-your-head">my book</a>, <a href="/ref/programmers-brain">The Programmer's Brain</a>), the human brain has limited short-term memory. Apparently, while chasing the <code>IEnumerable<T></code> clue, I'd already managed to forget another important datum.
</p>
<p>
Which line of code throws the exception? This one:
</p>
<p>
<pre>List<Table> <span style="font-weight:bold;color:#1f377f;">allocation</span> = Tables.ToList();</pre>
</p>
<p>
The <code>IEnumerable<T></code> isn't <code>reservations</code>, but <code>Tables</code>.
</p>
<p>
While the code doesn't explicitly say <code>IEnumerable<Table> Tables</code>, that's just what it is.
</p>
<p>
Yes, it took me way too long to notice that I'd been barking up the wrong tree all along. Perhaps you immediately noticed that, but have pity on me. I don't think this kind of human error is uncommon.
</p>
<h3 id="288f57cb1a4648a1926164e64aebfbe2">
The culprit <a href="#288f57cb1a4648a1926164e64aebfbe2">#</a>
</h3>
<p>
Where does <code>Tables</code> come from? It's a read-only property originally injected via the constructor:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">MaitreD</span>(
TimeOfDay <span style="font-weight:bold;color:#1f377f;">opensAt</span>,
TimeOfDay <span style="font-weight:bold;color:#1f377f;">lastSeating</span>,
TimeSpan <span style="font-weight:bold;color:#1f377f;">seatingDuration</span>,
IEnumerable<Table> <span style="font-weight:bold;color:#1f377f;">tables</span>)
{
OpensAt = opensAt;
LastSeating = lastSeating;
SeatingDuration = seatingDuration;
Tables = tables;
}</pre>
</p>
<p>
Okay, in the test then, where does it come from? That's the <code>m</code> in the above property, repeated here for your convenience:
</p>
<p>
<pre>[Property]
<span style="color:blue;">public</span> Property <span style="font-weight:bold;color:#74531f;">Schedule</span>()
{
<span style="font-weight:bold;color:#8f08c4;">return</span> Prop.ForAll(
(<span style="color:blue;">from</span> rs <span style="color:blue;">in</span> Gens.Reservations
<span style="color:blue;">from</span> m <span style="color:blue;">in</span> Gens.MaitreD(rs)
<span style="color:blue;">select</span> (m, rs)).ToArbitrary(),
<span style="font-weight:bold;color:#1f377f;">t</span> => ScheduleImp(t.m, t.rs));
}</pre>
</p>
<p>
The <code>m</code> variable is generated by <code>Gens.MaitreD</code>, so let's follow that clue:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">static</span> Gen<MaitreD> <span style="color:#74531f;">MaitreD</span>(
IEnumerable<Reservation> <span style="font-weight:bold;color:#1f377f;">reservations</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span>
<span style="color:blue;">from</span> seatingDuration <span style="color:blue;">in</span> Gen.Choose(1, 6)
<span style="color:blue;">from</span> tables <span style="color:blue;">in</span> Tables(reservations)
<span style="color:blue;">select</span> <span style="color:blue;">new</span> MaitreD(
TimeSpan.FromHours(18),
TimeSpan.FromHours(21),
TimeSpan.FromHours(seatingDuration),
tables);
}</pre>
</p>
<p>
We're not there yet, but close. The <code>tables</code> variable is generated by this <code>Tables</code> helper function:
</p>
<p>
<pre><span style="color:gray;">///</span><span style="color:green;"> </span><span style="color:gray;"><</span><span style="color:gray;">summary</span><span style="color:gray;">></span>
<span style="color:gray;">///</span><span style="color:green;"> Generate a table configuration that can at minimum accommodate all</span>
<span style="color:gray;">///</span><span style="color:green;"> reservations.</span>
<span style="color:gray;">///</span><span style="color:green;"> </span><span style="color:gray;"></</span><span style="color:gray;">summary</span><span style="color:gray;">></span>
<span style="color:gray;">///</span><span style="color:green;"> </span><span style="color:gray;"><</span><span style="color:gray;">param</span> <span style="color:gray;">name</span><span style="color:gray;">=</span><span style="color:gray;">"</span>reservations<span style="color:gray;">"</span><span style="color:gray;">></span><span style="color:green;">The reservations to accommodate</span><span style="color:gray;"></</span><span style="color:gray;">param</span><span style="color:gray;">></span>
<span style="color:gray;">///</span><span style="color:green;"> </span><span style="color:gray;"><</span><span style="color:gray;">returns</span><span style="color:gray;">></span><span style="color:green;">A generator of valid table configurations.</span><span style="color:gray;"></</span><span style="color:gray;">returns</span><span style="color:gray;">></span>
<span style="color:blue;">private</span> <span style="color:blue;">static</span> Gen<IEnumerable<Table>> <span style="color:#74531f;">Tables</span>(
IEnumerable<Reservation> <span style="font-weight:bold;color:#1f377f;">reservations</span>)
{
<span style="color:green;">// Create a table for each reservation, to ensure that all</span>
<span style="color:green;">// reservations can be allotted a table.</span>
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">tables</span> = reservations.Select(<span style="font-weight:bold;color:#1f377f;">r</span> => Table.Standard(r.Quantity));
<span style="font-weight:bold;color:#8f08c4;">return</span>
<span style="color:blue;">from</span> moreTables <span style="color:blue;">in</span>
Gen.Choose(1, 12).Select(
<span style="font-weight:bold;color:#1f377f;">i</span> => Table.Standard(<span style="color:blue;">new</span> NaturalNumber(i))).ArrayOf()
<span style="color:blue;">let</span> allTables =
tables.Concat(moreTables).OrderBy(<span style="font-weight:bold;color:#1f377f;">t</span> => t.Capacity)
<span style="color:blue;">select</span> allTables.AsEnumerable();
}</pre>
</p>
<p>
And there you have it: <code>OrderBy(<span style="font-weight:bold;color:#1f377f;">t</span> => t.Capacity)</code>!
</p>
<p>
The <code>Capacity</code> property was exactly the property I changed from <code>int</code> to <code>NaturalNumber</code> - the change that made the test fail.
</p>
<p>
As expected, the fix was to let <code>NaturalNumber</code> implement <code>IComparable<NaturalNumber></code>.
</p>
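<p>
As a minimal sketch of that fix, and not necessarily the code from the book's repository, the wrapper can delegate comparison to the integer it wraps. The field name <code>number</code>, the exception message, and the exact shape of the constructor are assumptions made for illustration:
</p>
<p>
<pre>using System;

public readonly struct NaturalNumber : IComparable<NaturalNumber>
{
    // Hypothetical field name; the real type may store its value differently.
    private readonly int number;

    public NaturalNumber(int candidate)
    {
        if (candidate < 1)
            throw new ArgumentOutOfRangeException(
                nameof(candidate),
                "The value must be a positive (non-zero) number.");

        number = candidate;
    }

    // Delegate the comparison to the wrapped integer.
    public int CompareTo(NaturalNumber other)
    {
        return number.CompareTo(other.number);
    }
}</pre>
</p>
<p>
With this in place, <code>Comparer<NaturalNumber>.Default</code> picks up the generic interface, so <code>OrderBy(t => t.Capacity)</code> can sort the tables again.
</p>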
<h3 id="534a507621b840dfb566cdb359261840">
Conclusion <a href="#534a507621b840dfb566cdb359261840">#</a>
</h3>
<p>
I thought this little troubleshooting session was interesting enough to write down. I spent perhaps twenty minutes on it before I understood what was going on. Not disastrously long, but enough time that I was relieved when I figured it out.
</p>
<p>
Apart from the obvious (look for the problem where it is), there is one other useful lesson to be learned, I think.
</p>
<p>
<a href="https://learn.microsoft.com/dotnet/standard/linq/deferred-execution-lazy-evaluation">Deferred execution</a> can confuse even the most experienced programmer. It took me some time before it dawned on me that even though the <code>MaitreD</code> constructor had run and the object was 'safely' initialised, it actually wasn't.
</p>
<p>
The implication is that there's a 'disconnect' between the constructor and the <code>Allocate</code> method. The error actually happens during initialisation (i.e. in the caller of the constructor), but it only manifests when you run the method.
</p>
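<p>
The disconnect can be reproduced in isolation. The following is only a sketch; the <code>Uncomparable</code> class and the <code>Program</code> entry point are made up for the occasion and have nothing to do with the restaurant code base. Defining the sorted query succeeds, and the exception only surfaces once something enumerates it:
</p>
<p>
<pre>using System;
using System.Linq;

// Hypothetical type that, like the original NaturalNumber, implements
// neither IComparable nor IComparable<T>.
public sealed class Uncomparable
{
}

public static class Program
{
    public static void Main()
    {
        var items = new[] { new Uncomparable(), new Uncomparable() };

        // No exception here: OrderBy only describes a query.
        var sorted = items.OrderBy(x => x);
        Console.WriteLine("Query defined without error.");

        try
        {
            // Enumerating the query is what finally runs the sort.
            _ = sorted.ToList();
        }
        catch (InvalidOperationException e)
        {
            Console.WriteLine("Sorting failed on enumeration: " + e.Message);
        }
    }
}</pre>
</p>
<p>
The distance between the <code>OrderBy</code> call and the <code>ToList</code> call is only a few lines here. In the restaurant code base, the two calls were in different classes, separated by a constructor.
</p>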
<p>
Ever since <a href="/2013/07/20/linq-versus-the-lsp">I discovered the IReadOnlyCollection<T> interface in 2013</a> I've resolved to favour it over <code>IEnumerable<T></code>. This is one example of why that's a good idea.
</p>
<p>
Despite my best intentions, I, too, cut corners from time to time. I've done it here, by accepting <code>IEnumerable<Table></code> instead of <code>IReadOnlyCollection<Table></code> as a constructor parameter. I really should have known better, and now I've paid the price.
</p>
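<p>
To illustrate why the stricter parameter type helps, here's a self-contained sketch. The <code>Restaurant</code> class is a hypothetical stand-in for <code>MaitreD</code>, reduced to a collection of integer capacities:
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for MaitreD.
public sealed class Restaurant
{
    public Restaurant(IReadOnlyCollection<int> capacities)
    {
        Capacities = capacities;
    }

    public IReadOnlyCollection<int> Capacities { get; }
}

public static class Program
{
    public static void Main()
    {
        IEnumerable<int> query = new[] { 4, 2, 6 }.OrderBy(c => c);

        // 'new Restaurant(query)' doesn't compile, because a lazy query is
        // not an IReadOnlyCollection<int>. The caller has to materialise the
        // sequence first, so a failing sort would throw right here, before
        // the constructor runs.
        var restaurant = new Restaurant(query.ToList());

        Console.WriteLine(restaurant.Capacities.Count);
    }
}</pre>
</p>
<p>
The design choice is to push evaluation to the caller. Any exception caused by the query then surfaces where the query is composed, rather than in whichever method happens to enumerate it first.
</p>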
<p>
This is particularly ironic because I also love <a href="https://www.haskell.org/">Haskell</a> so much. Haskell is lazy by default, so you'd think that I run into such issues all the time. An expression like <code>OrderBy(<span style="font-weight:bold;color:#1f377f;">t</span> => t.Capacity)</code>, however, wouldn't have compiled in Haskell unless the sort key implemented the <a href="https://hackage.haskell.org/package/base/docs/Data-Ord.html#t:Ord">Ord</a> type class. Even C#'s type system can express that a generic type must implement an interface, but <a href="https://learn.microsoft.com/dotnet/api/system.linq.enumerable.orderby">OrderBy</a> doesn't do that.
</p>
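<p>
As a sketch of what such a constraint could look like in C#, consider the following extension method. <code>OrderByComparable</code> is a hypothetical helper, not part of LINQ; it only wraps the built-in <code>OrderBy</code> and adds a generic constraint on the key type:
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Linq;

public static class ComparableExtensions
{
    // Hypothetical helper: same behaviour as Enumerable.OrderBy, but the
    // compiler rejects key types that don't implement IComparable<TKey>.
    public static IOrderedEnumerable<T> OrderByComparable<T, TKey>(
        this IEnumerable<T> source,
        Func<T, TKey> keySelector)
        where TKey : IComparable<TKey>
    {
        return source.OrderBy(keySelector);
    }
}</pre>
</p>
<p>
Under that assumption, an expression like <code>OrderByComparable(t => t.Capacity)</code> would fail to compile for as long as <code>NaturalNumber</code> doesn't implement <code>IComparable<NaturalNumber></code>. The trade-off is that the constraint also rejects key types that only implement the non-generic <code>IComparable</code> interface.
</p>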
<p>
This problem could have been caught at compile-time, but unfortunately it wasn't.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="7207e4dc0287435facea31fc9ce49d36">
<div class="comment-author"><a href="https://github.com/JesHansen">Jes Hansen</a> <a href="#7207e4dc0287435facea31fc9ce49d36">#</a></div>
<div class="comment-content">
<p>
I made a <a href="https://github.com/dotnet/runtime/issues/92691">pull request</a> describing the issue.
</p>
<p>
As this is likely a breaking change I don't have high hopes for it to be fixed, though…
</p>
</div>
<div class="comment-date">2023-09-27 09:40 UTC</div>
</div>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.Do ORMs reduce the need for mapping?https://blog.ploeh.dk/2023/09/18/do-orms-reduce-the-need-for-mapping2023-09-18T14:40:00+00:00Mark Seemann
<div id="post">
<p>
<em>With some Entity Framework examples in C#.</em>
</p>
<p>
In a recent comment, a reader <a href="/2023/07/17/works-on-most-machines#4012c2cddcb64a068c0b06b7989a676e">asked me to expand on my position</a> on <a href="https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping">object-relational mappers</a> (ORMs), which is that I'm not a fan:
</p>
<blockquote>
<p>
I consider ORMs a waste of time: they create more problems than they solve.
</p>
<footer><cite><a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>, subsection 12.2.2, footnote</cite></footer>
</blockquote>
<p>
While I acknowledge that only a Sith deals in absolutes, I favour clear assertions over guarded language. I don't really mean it that categorically, but I do stand by the general sentiment. In this article I'll attempt to describe why I don't reach for ORMs when querying or writing to a relational database.
</p>
<p>
As always, any exploration of this kind is made in a <em>context</em>, and this article is no exception. Before proceeding, allow me to delineate the scope. If your context differs from mine, what I write may not apply to your situation.
</p>
<h3 id="a29a6dfd90604a358c5e2f8e76941f80">
Scope <a href="#a29a6dfd90604a358c5e2f8e76941f80">#</a>
</h3>
<p>
It's been decades since I last worked on a system where the database 'came first'. The last time that happened, the database was hidden behind an XML-based <a href="https://en.wikipedia.org/wiki/Remote_procedure_call">RPC</a> API that tunnelled through HTTP. Not a <a href="https://en.wikipedia.org/wiki/REST">REST</a> API by a long shot.
</p>
<p>
Since then, I've worked on various systems. Some used relational databases, some document databases, some worked with CSV, or really old legacy APIs, etc. Common to these systems was that they were <em>not</em> designed around a database. Rather, they were developed with an eye to the <a href="https://en.wikipedia.org/wiki/Dependency_inversion_principle">Dependency Inversion Principle</a>, keeping storage details out of the Domain Model. Many were developed with test-driven development (TDD).
</p>
<p>
When I evaluate whether or not to use an ORM in situations like these, the core application logic is my main design driver. As I describe in <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>, I usually develop (vertical) feature slices one at a time, utilising an <a href="/outside-in-tdd">outside-in TDD</a> process, during which I also figure out how to save or retrieve data from persistent storage.
</p>
<p>
Thus, in systems like these, storage implementation is an artefact of the software architecture. If a relational database is involved, the schema must adhere to the needs of the code; not the other way around.
</p>
<p>
To be clear, then, this article doesn't discuss typical <a href="https://en.wikipedia.org/wiki/Create,_read,_update_and_delete">CRUD</a>-heavy applications that are mostly forms over relational data, with little or no application logic. If you're working with such a code base, an ORM might be useful. I can't really tell, since I last worked with such systems at a time when ORMs didn't exist.
</p>
<h3 id="b6446ab3f8b8410da2679b4fb915a69e">
The usual suspects <a href="#b6446ab3f8b8410da2679b4fb915a69e">#</a>
</h3>
<p>
The most common criticism of ORMs that I've come across concerns the queries they generate. People who are skilled in writing <a href="https://en.wikipedia.org/wiki/SQL">SQL</a> by hand, or who are concerned about performance, may look at the SQL that an ORM generates and dislike it for that reason.
</p>
<p>
It's my impression that ORMs have come a long way over the decades, but frankly, the generated SQL is not really what concerns me. It never was.
</p>
<p>
In the abstract, Ted Neward already outlined the problems in the seminal article <a href="https://blogs.newardassociates.com/blog/2006/the-vietnam-of-computer-science.html">The Vietnam of Computer Science</a>. That problem description may, however, be too theoretical to connect with most programmers, so I'll try a more example-driven angle.
</p>
<h3 id="6908e1b735ee41068baeeb9482a15953">
Database operations without an ORM <a href="#6908e1b735ee41068baeeb9482a15953">#</a>
</h3>
<p>
Once more I turn to the trusty example code base that accompanies <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>. In it, I used <a href="https://en.wikipedia.org/wiki/Microsoft_SQL_Server">SQL Server</a> as the example database, and ADO.NET as the data access technology.
</p>
<p>
I considered this more than adequate for saving and reading restaurant reservations. Here, for example, is the code that creates a new reservation row in the database:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> Task <span style="font-weight:bold;color:#74531f;">Create</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">restaurantId</span>, Reservation <span style="font-weight:bold;color:#1f377f;">reservation</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">if</span> (reservation <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> ArgumentNullException(nameof(reservation));
    <span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">conn</span> = <span style="color:blue;">new</span> SqlConnection(ConnectionString);
    <span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">cmd</span> = <span style="color:blue;">new</span> SqlCommand(createReservationSql, conn);
    cmd.Parameters.AddWithValue(<span style="color:#a31515;">"@Id"</span>, reservation.Id);
    cmd.Parameters.AddWithValue(<span style="color:#a31515;">"@RestaurantId"</span>, restaurantId);
    cmd.Parameters.AddWithValue(<span style="color:#a31515;">"@At"</span>, reservation.At);
    cmd.Parameters.AddWithValue(<span style="color:#a31515;">"@Name"</span>, reservation.Name.ToString());
    cmd.Parameters.AddWithValue(<span style="color:#a31515;">"@Email"</span>, reservation.Email.ToString());
    cmd.Parameters.AddWithValue(<span style="color:#a31515;">"@Quantity"</span>, reservation.Quantity);
    <span style="color:blue;">await</span> conn.OpenAsync().ConfigureAwait(<span style="color:blue;">false</span>);
    <span style="color:blue;">await</span> cmd.ExecuteNonQueryAsync().ConfigureAwait(<span style="color:blue;">false</span>);
}

<span style="color:blue;">private</span> <span style="color:blue;">const</span> <span style="color:blue;">string</span> createReservationSql = <span style="color:maroon;">@"
    INSERT INTO [dbo].[Reservations] (
        [PublicId], [RestaurantId], [At], [Name], [Email], [Quantity])
    VALUES (@Id, @RestaurantId, @At, @Name, @Email, @Quantity)"</span>;</pre>
</p>
<p>
Yes, there's mapping, even if it's 'only' from a Domain Object to command parameter strings. As I'll argue later, if there's a way to escape such mapping, I'm not aware of it. ORMs don't seem to solve that problem.
</p>
<p>
This, however, seems to be the reader's main concern:
</p>
<blockquote>
<p>
"I can work with raw SQL ofcourse... but the mapping... oh the mapping..."
</p>
<footer><cite><a href="/2023/07/17/works-on-most-machines#4012c2cddcb64a068c0b06b7989a676e">qfilip</a></cite></footer>
</blockquote>
<p>
It's not a concern that I share, but again I'll remind you that if your context differs substantially from mine, what doesn't concern me could reasonably concern you.
</p>
<p>
You may argue that the above example isn't representative, since it only involves a single table. No foreign key relationships are involved, so perhaps the example is artificially easy.
</p>
<p>
In order to work with a slightly more complex schema, I decided to port the read-only in-memory restaurant database (the one that keeps track of the restaurants - the <em>tenants</em> - of the system) to SQL Server.
</p>
<h3 id="ef3d04206a20442dbd2c01336c48fd28">
Restaurants schema <a href="#ef3d04206a20442dbd2c01336c48fd28">#</a>
</h3>
<p>
In the book's sample code base, I'd only stored restaurant configurations as JSON config files, since I considered it out of scope to include an online tenant management system. Converting to a relational model wasn't hard, though. Here's the database schema:
</p>
<p>
<pre><span style="color:blue;">CREATE</span> <span style="color:blue;">TABLE</span> [dbo]<span style="color:gray;">.</span>[Restaurants]<span style="color:blue;"> </span><span style="color:gray;">(</span>
[Id] <span style="color:blue;">INT</span> <span style="color:gray;">NOT</span> <span style="color:gray;">NULL,</span>
[Name] <span style="color:blue;">NVARCHAR </span><span style="color:gray;">(</span>50<span style="color:gray;">)</span> <span style="color:gray;">NOT</span> <span style="color:gray;">NULL</span> <span style="color:blue;">UNIQUE</span><span style="color:gray;">,</span>
[OpensAt] <span style="color:blue;">TIME</span> <span style="color:gray;">NOT</span> <span style="color:gray;">NULL,</span>
[LastSeating] <span style="color:blue;">TIME</span> <span style="color:gray;">NOT</span> <span style="color:gray;">NULL,</span>
[SeatingDuration] <span style="color:blue;">TIME</span> <span style="color:gray;">NOT</span> <span style="color:gray;">NULL</span>
<span style="color:blue;">PRIMARY</span> <span style="color:blue;">KEY</span> <span style="color:blue;">CLUSTERED </span><span style="color:gray;">(</span>[Id] <span style="color:blue;">ASC</span><span style="color:gray;">)</span>
<span style="color:gray;">)</span>
<span style="color:blue;">CREATE</span> <span style="color:blue;">TABLE</span> [dbo]<span style="color:gray;">.</span>[Tables]<span style="color:blue;"> </span><span style="color:gray;">(</span>
[Id] <span style="color:blue;">INT</span> <span style="color:gray;">NOT</span> <span style="color:gray;">NULL</span> <span style="color:blue;">IDENTITY</span><span style="color:gray;">,</span>
[RestaurantId] <span style="color:blue;">INT</span> <span style="color:gray;">NOT</span> <span style="color:gray;">NULL</span> <span style="color:blue;">REFERENCES</span> [dbo]<span style="color:gray;">.</span>[Restaurants]<span style="color:gray;">(</span>Id<span style="color:gray;">),</span>
[Capacity] <span style="color:blue;">INT</span> <span style="color:gray;">NOT</span> <span style="color:gray;">NULL,</span>
[IsCommunal] <span style="color:blue;">BIT</span> <span style="color:gray;">NOT</span> <span style="color:gray;">NULL</span>
<span style="color:blue;">PRIMARY</span> <span style="color:blue;">KEY</span> <span style="color:blue;">CLUSTERED </span><span style="color:gray;">(</span>[Id] <span style="color:blue;">ASC</span><span style="color:gray;">)</span>
<span style="color:gray;">)</span></pre>
</p>
<p>
This little subsystem requires two database tables: One that keeps track of the overall restaurant configuration, such as name, opening and closing times, and another database table that lists all a restaurant's physical tables.
</p>
<p>
You may argue that this is still too simple to realistically capture the intricacies of existing database systems, but conversely I'll remind you that the scope of this article is the sort of system where you develop and design the application first; not a system where you're given a relational database upon which you must create an application.
</p>
<p>
Had I been given this assignment in a realistic setting, a relational database probably wouldn't have been my first choice. Some kind of document database, or even blob storage, strikes me as a better fit. Still, this article is about ORMs, so I'll pretend that there are external circumstances that dictate a relational database.
</p>
<p>
To test the system, I also created a script to populate these tables. Here's part of it:
</p>
<p>
<pre><span style="color:blue;">INSERT</span> <span style="color:blue;">INTO</span> [dbo]<span style="color:gray;">.</span>[Restaurants]<span style="color:blue;"> </span><span style="color:gray;">(</span>[Id]<span style="color:gray;">,</span> [Name]<span style="color:gray;">,</span> [OpensAt]<span style="color:gray;">,</span> [LastSeating]<span style="color:gray;">,</span> [SeatingDuration]<span style="color:gray;">)</span>
<span style="color:blue;">VALUES </span><span style="color:gray;">(</span>1<span style="color:gray;">,</span> <span style="color:red;">N'Hipgnosta'</span><span style="color:gray;">,</span> <span style="color:red;">'18:00'</span><span style="color:gray;">,</span> <span style="color:red;">'21:00'</span><span style="color:gray;">,</span> <span style="color:red;">'6:00'</span><span style="color:gray;">)</span>
<span style="color:blue;">INSERT</span> <span style="color:blue;">INTO</span> [dbo]<span style="color:gray;">.</span>[Tables]<span style="color:blue;"> </span><span style="color:gray;">(</span>[RestaurantId]<span style="color:gray;">,</span> [Capacity]<span style="color:gray;">,</span> [IsCommunal]<span style="color:gray;">)</span>
<span style="color:blue;">VALUES </span><span style="color:gray;">(</span>1<span style="color:gray;">,</span> 10<span style="color:gray;">,</span> 1<span style="color:gray;">)</span>
<span style="color:blue;">INSERT</span> <span style="color:blue;">INTO</span> [dbo]<span style="color:gray;">.</span>[Restaurants]<span style="color:blue;"> </span><span style="color:gray;">(</span>[Id]<span style="color:gray;">,</span> [Name]<span style="color:gray;">,</span> [OpensAt]<span style="color:gray;">,</span> [LastSeating]<span style="color:gray;">,</span> [SeatingDuration]<span style="color:gray;">)</span>
<span style="color:blue;">VALUES </span><span style="color:gray;">(</span>2112<span style="color:gray;">,</span> <span style="color:red;">N'Nono'</span><span style="color:gray;">,</span> <span style="color:red;">'18:00'</span><span style="color:gray;">,</span> <span style="color:red;">'21:00'</span><span style="color:gray;">,</span> <span style="color:red;">'6:00'</span><span style="color:gray;">)</span>
<span style="color:blue;">INSERT</span> <span style="color:blue;">INTO</span> [dbo]<span style="color:gray;">.</span>[Tables]<span style="color:blue;"> </span><span style="color:gray;">(</span>[RestaurantId]<span style="color:gray;">,</span> [Capacity]<span style="color:gray;">,</span> [IsCommunal]<span style="color:gray;">)</span>
<span style="color:blue;">VALUES </span><span style="color:gray;">(</span>2112<span style="color:gray;">,</span> 6<span style="color:gray;">,</span> 1<span style="color:gray;">)</span>
<span style="color:blue;">INSERT</span> <span style="color:blue;">INTO</span> [dbo]<span style="color:gray;">.</span>[Tables]<span style="color:blue;"> </span><span style="color:gray;">(</span>[RestaurantId]<span style="color:gray;">,</span> [Capacity]<span style="color:gray;">,</span> [IsCommunal]<span style="color:gray;">)</span>
<span style="color:blue;">VALUES </span><span style="color:gray;">(</span>2112<span style="color:gray;">,</span> 4<span style="color:gray;">,</span> 1<span style="color:gray;">)</span>
<span style="color:blue;">INSERT</span> <span style="color:blue;">INTO</span> [dbo]<span style="color:gray;">.</span>[Tables]<span style="color:blue;"> </span><span style="color:gray;">(</span>[RestaurantId]<span style="color:gray;">,</span> [Capacity]<span style="color:gray;">,</span> [IsCommunal]<span style="color:gray;">)</span>
<span style="color:blue;">VALUES </span><span style="color:gray;">(</span>2112<span style="color:gray;">,</span> 2<span style="color:gray;">,</span> 0<span style="color:gray;">)</span>
<span style="color:blue;">INSERT</span> <span style="color:blue;">INTO</span> [dbo]<span style="color:gray;">.</span>[Tables]<span style="color:blue;"> </span><span style="color:gray;">(</span>[RestaurantId]<span style="color:gray;">,</span> [Capacity]<span style="color:gray;">,</span> [IsCommunal]<span style="color:gray;">)</span>
<span style="color:blue;">VALUES </span><span style="color:gray;">(</span>2112<span style="color:gray;">,</span> 2<span style="color:gray;">,</span> 0<span style="color:gray;">)</span>
<span style="color:blue;">INSERT</span> <span style="color:blue;">INTO</span> [dbo]<span style="color:gray;">.</span>[Tables]<span style="color:blue;"> </span><span style="color:gray;">(</span>[RestaurantId]<span style="color:gray;">,</span> [Capacity]<span style="color:gray;">,</span> [IsCommunal]<span style="color:gray;">)</span>
<span style="color:blue;">VALUES </span><span style="color:gray;">(</span>2112<span style="color:gray;">,</span> 4<span style="color:gray;">,</span> 0<span style="color:gray;">)</span>
<span style="color:blue;">INSERT</span> <span style="color:blue;">INTO</span> [dbo]<span style="color:gray;">.</span>[Tables]<span style="color:blue;"> </span><span style="color:gray;">(</span>[RestaurantId]<span style="color:gray;">,</span> [Capacity]<span style="color:gray;">,</span> [IsCommunal]<span style="color:gray;">)</span>
<span style="color:blue;">VALUES </span><span style="color:gray;">(</span>2112<span style="color:gray;">,</span> 4<span style="color:gray;">,</span> 0<span style="color:gray;">)</span></pre>
</p>
<p>
There are more rows than this, but this should give you an idea of what the data looks like.
</p>
<h3 id="ba5d810c332945398ab2a870711357f1">
Reading restaurant data without an ORM <a href="#ba5d810c332945398ab2a870711357f1">#</a>
</h3>
<p>
Due to the foreign key relationship, reading restaurant data from the database is a little more involved than reading from a single table.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> Task<Restaurant?> <span style="font-weight:bold;color:#74531f;">GetRestaurant</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">name</span>)
{
<span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">cmd</span> = <span style="color:blue;">new</span> SqlCommand(readByNameSql);
cmd.Parameters.AddWithValue(<span style="color:#a31515;">"@Name"</span>, name);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">restaurants</span> = <span style="color:blue;">await</span> ReadRestaurants(cmd);
<span style="font-weight:bold;color:#8f08c4;">return</span> restaurants.SingleOrDefault();
}
<span style="color:blue;">private</span> <span style="color:blue;">const</span> <span style="color:blue;">string</span> readByNameSql = <span style="color:maroon;">@"
SELECT [Id], [Name], [OpensAt], [LastSeating], [SeatingDuration]
FROM [dbo].[Restaurants]
WHERE [Name] = @Name
SELECT [RestaurantId], [Capacity], [IsCommunal]
FROM [dbo].[Tables]
JOIN [dbo].[Restaurants]
ON [dbo].[Tables].[RestaurantId] = [dbo].[Restaurants].[Id]
WHERE [Name] = @Name"</span>;</pre>
</p>
<p>
There's more than one option when deciding how to construct the query. You could make one query with a join, in which case you'd get rows with repeated data that you'd then need to de-duplicate. Alternatively, you could do as I've done here: query each table to get multiple result sets.
</p>
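<p>
To illustrate the duplicate detection that the single-query option implies, here's a rough sketch. It assumes a hypothetical <code>JoinedRow</code> type that holds one row of a joined result; neither that type nor the <code>ByRestaurant</code> helper exists in the example code base:
</p>
<p>
<pre>using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of one row from a single joined query. The
// restaurant columns repeat once for every table the restaurant has.
public sealed record JoinedRow(
    int RestaurantId, string Name, int Capacity, bool IsCommunal);

public static class JoinedRows
{
    // Groups the rows by restaurant Id, so that the repeated restaurant
    // columns can be read once per group and the table columns per row.
    public static ILookup<int, JoinedRow> ByRestaurant(
        IEnumerable<JoinedRow> rows)
    {
        return rows.ToLookup(r => r.RestaurantId);
    }
}</pre>
</p>
<p>
Each group would still have to be mapped to a <code>Restaurant</code> object, so this alternative moves the mapping work around rather than removing it.
</p>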
<p>
I'm not claiming that this is better in any way. I only chose this option because I found the code that I had to write less offensive.
</p>
<p>
Since the <code>IRestaurantDatabase</code> interface defines three different kinds of queries (<code>GetAll()</code>, <code>GetRestaurant(int id)</code>, and <code>GetRestaurant(string name)</code>), I invoked the <a href="https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming)">rule of three</a> and extracted a helper method:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">async</span> Task<IEnumerable<Restaurant>> <span style="font-weight:bold;color:#74531f;">ReadRestaurants</span>(SqlCommand <span style="font-weight:bold;color:#1f377f;">cmd</span>)
{
    <span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">conn</span> = <span style="color:blue;">new</span> SqlConnection(ConnectionString);
    cmd.Connection = conn;
    <span style="color:blue;">await</span> conn.OpenAsync();
    <span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">rdr</span> = <span style="color:blue;">await</span> cmd.ExecuteReaderAsync();
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">restaurants</span> = Enumerable.Empty<Restaurant>();
    <span style="font-weight:bold;color:#8f08c4;">while</span> (<span style="color:blue;">await</span> rdr.ReadAsync())
        restaurants = restaurants.Append(ReadRestaurantRow(rdr));
    <span style="font-weight:bold;color:#8f08c4;">if</span> (<span style="color:blue;">await</span> rdr.NextResultAsync())
        <span style="font-weight:bold;color:#8f08c4;">while</span> (<span style="color:blue;">await</span> rdr.ReadAsync())
            restaurants = ReadTableRow(rdr, restaurants);
    <span style="font-weight:bold;color:#8f08c4;">return</span> restaurants;
}</pre>
</p>
<p>
The <code>ReadRestaurants</code> method does the overall work of opening the database connection, executing the query, and moving through rows and result sets. Again, we'll find mapping code hidden in helper methods:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">static</span> Restaurant <span style="color:#74531f;">ReadRestaurantRow</span>(SqlDataReader <span style="font-weight:bold;color:#1f377f;">rdr</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> Restaurant(
(<span style="color:blue;">int</span>)rdr[<span style="color:#a31515;">"Id"</span>],
(<span style="color:blue;">string</span>)rdr[<span style="color:#a31515;">"Name"</span>],
<span style="color:blue;">new</span> MaitreD(
<span style="color:blue;">new</span> TimeOfDay((TimeSpan)rdr[<span style="color:#a31515;">"OpensAt"</span>]),
<span style="color:blue;">new</span> TimeOfDay((TimeSpan)rdr[<span style="color:#a31515;">"LastSeating"</span>]),
(TimeSpan)rdr[<span style="color:#a31515;">"SeatingDuration"</span>]));
}</pre>
</p>
<p>
As the name suggests, <code>ReadRestaurantRow</code> reads a row from the <code>Restaurants</code> table and converts it into a <code>Restaurant</code> object. At this time, however, it creates each <code>MaitreD</code> object without any tables. This is possible because one of the <code>MaitreD</code> constructors takes a <code>params</code> array as the last parameter:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">MaitreD</span>(
TimeOfDay <span style="font-weight:bold;color:#1f377f;">opensAt</span>,
TimeOfDay <span style="font-weight:bold;color:#1f377f;">lastSeating</span>,
TimeSpan <span style="font-weight:bold;color:#1f377f;">seatingDuration</span>,
<span style="color:blue;">params</span> Table[] <span style="font-weight:bold;color:#1f377f;">tables</span>) :
<span style="color:blue;">this</span>(opensAt, lastSeating, seatingDuration, tables.AsEnumerable())
{
}</pre>
</p>
<p>
Only when the <code>ReadRestaurants</code> method moves on to the next result set can it add tables to each restaurant:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">static</span> IEnumerable<Restaurant> <span style="color:#74531f;">ReadTableRow</span>(
SqlDataReader <span style="font-weight:bold;color:#1f377f;">rdr</span>,
IEnumerable<Restaurant> <span style="font-weight:bold;color:#1f377f;">restaurants</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">restaurantId</span> = (<span style="color:blue;">int</span>)rdr[<span style="color:#a31515;">"RestaurantId"</span>];
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">capacity</span> = (<span style="color:blue;">int</span>)rdr[<span style="color:#a31515;">"Capacity"</span>];
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">isCommunal</span> = (<span style="color:blue;">bool</span>)rdr[<span style="color:#a31515;">"IsCommunal"</span>];
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">table</span> = isCommunal ? Table.Communal(capacity) : Table.Standard(capacity);
<span style="font-weight:bold;color:#8f08c4;">return</span> restaurants.Select(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Id == restaurantId ? AddTable(r, table) : r);
}</pre>
</p>
<p>
As was also the case in <code>ReadRestaurantRow</code>, this method uses string-based indexers on the <code>rdr</code> to extract the data. I'm no fan of stringly-typed code, but at least I have automated tests that exercise these methods.
</p>
<p>
Could an ORM help by creating strongly-typed classes that model database tables? To a degree; I'll discuss that later.
</p>
<p>
In any case, since the entire code base follows the <a href="https://www.destroyallsoftware.com/screencasts/catalog/functional-core-imperative-shell">Functional Core, Imperative Shell</a> architecture, the entire Domain Model is made of immutable data types with pure functions. Thus, <code>ReadTableRow</code> has to iterate over all <code>restaurants</code> and add the table when the <code>Id</code> matches. <code>AddTable</code> does that:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">static</span> Restaurant <span style="color:#74531f;">AddTable</span>(Restaurant <span style="font-weight:bold;color:#1f377f;">restaurant</span>, Table <span style="font-weight:bold;color:#1f377f;">table</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> restaurant.Select(<span style="font-weight:bold;color:#1f377f;">m</span> => m.WithTables(m.Tables.Append(table).ToArray()));
}</pre>
</p>
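<p>
If the combination of <code>Select</code> and <code>WithTables</code> looks unfamiliar, the following sketch shows the general copy-and-update idea. The two classes are heavily simplified, hypothetical stand-ins, not the actual classes from the book's code base; tables are reduced to plain capacities to keep the example short:
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in: an immutable object that holds table capacities.
public sealed class MaitreD
{
    public MaitreD(IEnumerable<int> tableCapacities)
    {
        TableCapacities = tableCapacities.ToList();
    }

    public IReadOnlyCollection<int> TableCapacities { get; }

    // Returns a new object; the original is left unchanged.
    public MaitreD WithTables(params int[] tableCapacities)
    {
        return new MaitreD(tableCapacities);
    }
}

// Simplified stand-in: an immutable container around a MaitreD.
public sealed class Restaurant
{
    public Restaurant(int id, MaitreD maitreD)
    {
        Id = id;
        MaitreD = maitreD;
    }

    public int Id { get; }
    public MaitreD MaitreD { get; }

    // Copies the restaurant, replacing its MaitreD with the mapped one.
    public Restaurant Select(Func<MaitreD, MaitreD> selector)
    {
        return new Restaurant(Id, selector(MaitreD));
    }
}</pre>
</p>
<p>
Under these assumptions, <code>restaurant.Select(m => m.WithTables(m.TableCapacities.Append(4).ToArray()))</code> produces a new <code>Restaurant</code> with one more table, while the original object stays as it was.
</p>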
<p>
I can think of other ways to solve the overall mapping task when using ADO.NET, but this was what made most sense to me.
</p>
<h3 id="3eda5ebf7165478698c756d59af43a1e">
Reading restaurants with Entity Framework <a href="#3eda5ebf7165478698c756d59af43a1e">#</a>
</h3>
<p>
Does an ORM like <a href="https://en.wikipedia.org/wiki/Entity_Framework">Entity Framework</a> (EF) improve things? To a degree, but not enough to outweigh the disadvantages it also brings.
</p>
<p>
In order to investigate, I followed <a href="https://learn.microsoft.com/ef/core/managing-schemas/scaffolding">the EF documentation to scaffold</a> code from a database I'd set up for only that purpose. For the <code>Tables</code> table it created the following <code>Table</code> class, and for the <code>Restaurants</code> table a similar <code>Restaurant</code> class.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">partial</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Table</span>
{
<span style="color:blue;">public</span> <span style="color:blue;">int</span> Id { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">int</span> RestaurantId { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">int</span> Capacity { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">bool</span> IsCommunal { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">virtual</span> Restaurant Restaurant { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; } = <span style="color:blue;">null</span>!;
}</pre>
</p>
<p>
Hardly surprising. Also, hardly object-oriented, but more about that later, too.
</p>
<p>
Entity Framework didn't, by itself, add a <code>Tables</code> collection to the <code>Restaurant</code> class, so I had to do that by hand, as well as modify the <a href="https://learn.microsoft.com/dotnet/api/microsoft.entityframeworkcore.dbcontext">DbContext</a>-derived class to tell it about this relationship:
</p>
<p>
<pre>entity.OwnsMany(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Tables, <span style="font-weight:bold;color:#1f377f;">b</span> =>
{
b.Property<<span style="color:blue;">int</span>>(<span style="font-weight:bold;color:#1f377f;">t</span> => t.Id).ValueGeneratedOnAdd();
b.HasKey(<span style="font-weight:bold;color:#1f377f;">t</span> => t.Id);
});</pre>
</p>
<p>
I thought that such a simple foreign key relationship would be something an ORM would help with, but apparently not.
</p>
<p>
With that in place, I could now rewrite the above <code>GetRestaurant</code> method to use Entity Framework instead of ADO.NET:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> Task<Restaurants.Restaurant?> <span style="font-weight:bold;color:#74531f;">GetRestaurant</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">name</span>)
{
    <span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">db</span> = <span style="color:blue;">new</span> RestaurantsContext(ConnectionString);
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">dbRestaurant</span> = <span style="color:blue;">await</span> db.Restaurants.FirstOrDefaultAsync(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Name == name);
    <span style="font-weight:bold;color:#8f08c4;">if</span> (dbRestaurant == <span style="color:blue;">null</span>)
        <span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
    <span style="font-weight:bold;color:#8f08c4;">return</span> ToDomainModel(dbRestaurant);
}</pre>
</p>
<p>
The method now queries the database, and EF automatically returns a populated object. This would be nice if it was the right kind of object, but alas, it isn't. <code>GetRestaurant</code> still has to call a helper method to convert to the correct Domain Object:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">static</span> Restaurants.Restaurant <span style="color:#74531f;">ToDomainModel</span>(Restaurant <span style="font-weight:bold;color:#1f377f;">restaurant</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> Restaurants.Restaurant(
restaurant.Id,
restaurant.Name,
<span style="color:blue;">new</span> MaitreD(
<span style="color:blue;">new</span> TimeOfDay(restaurant.OpensAt),
<span style="color:blue;">new</span> TimeOfDay(restaurant.LastSeating),
restaurant.SeatingDuration,
restaurant.Tables.Select(ToDomainModel).ToList()));
}</pre>
</p>
<p>
While this helper method converts an EF <code>Restaurant</code> object to a proper Domain Object (<code>Restaurants.Restaurant</code>), it also needs another helper to convert the <code>table</code> objects:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">static</span> Restaurants.Table <span style="color:#74531f;">ToDomainModel</span>(Table <span style="font-weight:bold;color:#1f377f;">table</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">if</span> (table.IsCommunal)
        <span style="font-weight:bold;color:#8f08c4;">return</span> Restaurants.Table.Communal(table.Capacity);
    <span style="font-weight:bold;color:#8f08c4;">else</span>
        <span style="font-weight:bold;color:#8f08c4;">return</span> Restaurants.Table.Standard(table.Capacity);
}</pre>
</p>
<p>
As should be clear by now, using vanilla EF doesn't reduce the need for mapping.
</p>
<p>
Granted, the mapping code is a bit simpler, but you still need to remember to map <code>restaurant.Name</code> to the right constructor parameter, <code>restaurant.OpensAt</code> and <code>restaurant.LastSeating</code> to <em>their</em> correct places, <code>table.Capacity</code> to a constructor argument, and so on. If you make changes to the database schema or the Domain Model, you'll need to edit this code.
</p>
<h3 id="b85eb83e5acc4d66baaf9ec51c3be02d">
Encapsulation <a href="#b85eb83e5acc4d66baaf9ec51c3be02d">#</a>
</h3>
<p>
This is the point where more than one reader wonders: <em>Can't you just..?</em>
</p>
<p>
In short, no, I can't just.
</p>
<p>
The most common reaction is most likely that I'm doing this all wrong. I'm supposed to use the EF classes as my Domain Model.
</p>
<p>
But I can't, and I won't. I can't because I already have classes in place that serve that purpose. I also will not, because it would violate the Dependency Inversion Principle. As I recently described, <a href="/2023/09/04/decomposing-ctfiyhs-sample-code-base">the architecture is Ports and Adapters</a>, or, if you will, <a href="/ref/clean-architecture">Clean Architecture</a>. The database <a href="https://en.wikipedia.org/wiki/Adapter_pattern">Adapter</a> should depend on the Domain Model; the Domain Model shouldn't depend on the database implementation.
</p>
<p>
Okay, but couldn't I have generated the EF classes in the Domain Model? After all, a class like the above <code>Table</code> is just a <a href="https://en.wikipedia.org/wiki/Plain_old_CLR_object">POCO</a> Entity. It doesn't depend on the Entity Framework. I could have those classes in my Domain Model, put my <code>DbContext</code> in the data access layer, and have the best of both worlds. Right?
</p>
<p>
The code shown so far hints at a particular API afforded by the Domain Model. If you've read <a href="/code-that-fits-in-your-head">my book</a>, you already know what comes next. Here's the <code>Table</code> Domain Model's API:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Table</span>
{
<span style="color:blue;">public</span> <span style="color:blue;">static</span> Table <span style="color:#74531f;">Standard</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">seats</span>)
<span style="color:blue;">public</span> <span style="color:blue;">static</span> Table <span style="color:#74531f;">Communal</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">seats</span>)
<span style="color:blue;">public</span> <span style="color:blue;">int</span> Capacity { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">int</span> RemainingSeats { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> Table <span style="font-weight:bold;color:#74531f;">Reserve</span>(Reservation <span style="font-weight:bold;color:#1f377f;">reservation</span>)
<span style="color:blue;">public</span> T <span style="font-weight:bold;color:#74531f;">Accept</span><<span style="color:#2b91af;">T</span>>(ITableVisitor<T> <span style="font-weight:bold;color:#1f377f;">visitor</span>)
}</pre>
</p>
<p>
A couple of qualities of this design should be striking: There's <em>no</em> visible constructor - not even one that takes parameters. Instead, the type affords two static creation functions. One creates a standard table, the other a communal table. My book describes the difference between these types, and so does <a href="/2020/01/27/the-maitre-d-kata">the Maître d' kata</a>.
</p>
<p>
This isn't some frivolous design choice of mine, but rather quite deliberate. That <code>Table</code> class is a <a href="/2018/06/25/visitor-as-a-sum-type">Visitor-encoded sum type</a>. You can debate whether I should have modelled a table as a <a href="https://en.wikipedia.org/wiki/Tagged_union">sum type</a> or a polymorphic object, but now that I've chosen a sum type, it should be explicit in the API design.
</p>
<blockquote>
<p>
"Explicit is better than implicit."
</p>
<footer><cite><a href="https://peps.python.org/pep-0020/">The Zen of Python</a></cite></footer>
</blockquote>
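<p>
To make the Visitor encoding concrete, here's a minimal sketch of how such a class could be put together. It's an illustration under assumptions, not the code from the book: the visitor method names (<code>VisitStandard</code>, <code>VisitCommunal</code>), the private fields, and the <code>DescribeVisitor</code> example are all hypothetical, and <code>Reserve</code> and <code>RemainingSeats</code> are left out to keep the sketch short.
</p>

```csharp
using System;

// Hypothetical visitor interface: one method per case of the sum type.
public interface ITableVisitor<T>
{
    T VisitStandard(int seats);
    T VisitCommunal(int seats);
}

public sealed class Table
{
    private readonly bool isCommunal;
    private readonly int seats;

    // Private constructor: callers must choose one of the two cases below.
    private Table(bool isCommunal, int seats)
    {
        this.isCommunal = isCommunal;
        this.seats = seats;
    }

    public static Table Standard(int seats) => new Table(false, seats);

    public static Table Communal(int seats) => new Table(true, seats);

    public int Capacity => seats;

    // The only way to learn which case a table is: supply a handler for both.
    public T Accept<T>(ITableVisitor<T> visitor)
    {
        if (visitor is null)
            throw new ArgumentNullException(nameof(visitor));

        return isCommunal
            ? visitor.VisitCommunal(seats)
            : visitor.VisitStandard(seats);
    }
}

// Example visitor (an assumption for illustration) that renders a description.
public sealed class DescribeVisitor : ITableVisitor<string>
{
    public string VisitStandard(int seats) => $"Standard table for {seats}";

    public string VisitCommunal(int seats) => $"Communal table for {seats}";
}
```

<p>
With a design along these lines, <code>Table.Communal(6).Accept(new DescribeVisitor())</code> produces a description of a communal table, and there's no public constructor or settable <code>IsCommunal</code> flag for a mapper to latch on to.
</p>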
<p>
When we program, we make many mistakes. It's important to discover the mistakes as soon as possible. With a compiled language, <a href="/2011/04/29/Feedbackmechanismsandtradeoffs">the first feedback you get is from the compiler</a>. I favour leveraging the compiler, and its type system, to prevent as many mistakes as possible. That's what <a href="https://buttondown.email/hillelwayne/archive/making-illegal-states-unrepresentable/">Hillel Wayne calls <em>constructive</em> data</a>. <a href="https://blog.janestreet.com/effective-ml-video/">Make illegal states unrepresentable</a>.
</p>
<p>
I could, had I thought of it at the time, have introduced <a href="/2022/08/22/can-types-replace-validation">a predicative natural-number wrapper of integers</a>, in which case I could have strengthened the contract of <code>Table</code> even further:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Table</span>
{
<span style="color:blue;">public</span> <span style="color:blue;">static</span> Table <span style="color:#74531f;">Standard</span>(NaturalNumber <span style="font-weight:bold;color:#1f377f;">capacity</span>)
<span style="color:blue;">public</span> <span style="color:blue;">static</span> Table <span style="color:#74531f;">Communal</span>(NaturalNumber <span style="font-weight:bold;color:#1f377f;">capacity</span>)
<span style="color:blue;">public</span> NaturalNumber Capacity { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">int</span> RemainingSeats { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> Table <span style="font-weight:bold;color:#74531f;">Reserve</span>(Reservation <span style="font-weight:bold;color:#1f377f;">reservation</span>)
<span style="color:blue;">public</span> T <span style="font-weight:bold;color:#74531f;">Accept</span><<span style="color:#2b91af;">T</span>>(ITableVisitor<T> <span style="font-weight:bold;color:#1f377f;">visitor</span>)
}</pre>
</p>
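<p>
A wrapper like that could be sketched as follows. This is only an assumption about what such a type might look like, not code from the repository: the <code>TryCreate</code> name, the choice of a class rather than a struct, and the decision to let natural numbers start at 1 are all illustrative.
</p>

```csharp
#nullable enable
using System;

// Hypothetical predicative wrapper: an instance can only exist if the
// wrapped integer is at least 1.
public sealed class NaturalNumber : IEquatable<NaturalNumber>
{
    private readonly int value;

    private NaturalNumber(int value)
    {
        this.value = value;
    }

    // Returns null instead of throwing when the candidate is out of range.
    public static NaturalNumber? TryCreate(int candidate)
    {
        return candidate < 1 ? null : new NaturalNumber(candidate);
    }

    // Explicit conversion back to int for interoperability.
    public static explicit operator int(NaturalNumber number) => number.value;

    public bool Equals(NaturalNumber? other) =>
        !(other is null) && value == other.value;

    public override bool Equals(object? obj) => Equals(obj as NaturalNumber);

    public override int GetHashCode() => value.GetHashCode();

    public override string ToString() => value.ToString();
}
```

<p>
Since the constructor is private, the only way to obtain a <code>NaturalNumber</code> is via <code>TryCreate</code>, so a method that takes one as a parameter never has to check for zero or negative values.
</p>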
<p>
The point is that I take <a href="/encapsulation-and-solid">encapsulation</a> seriously, and my interpretation of the concept is heavily inspired by <a href="https://en.wikipedia.org/wiki/Bertrand_Meyer">Bertrand Meyer</a>'s <a href="/ref/oosc">Object-Oriented Software Construction</a>. That view of encapsulation emphasises <em>contracts</em> (preconditions, invariants, postconditions) rather than information hiding.
</p>
<p>
As I described in a previous article, <a href="/2022/08/22/can-types-replace-validation">you can't model all preconditions and invariants with types</a>, but you can still let the type system do much heavy lifting.
</p>
<p>
This principle applies to all classes that are part of the Domain Model; not only <code>Table</code>, but also <code>Restaurant</code>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Restaurant</span>
{
<span style="color:blue;">public</span> <span style="color:#2b91af;">Restaurant</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">id</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">name</span>, MaitreD <span style="font-weight:bold;color:#1f377f;">maitreD</span>)
<span style="color:blue;">public</span> <span style="color:blue;">int</span> Id { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">string</span> Name { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> MaitreD MaitreD { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> Restaurant <span style="font-weight:bold;color:#74531f;">WithId</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">newId</span>)
<span style="color:blue;">public</span> Restaurant <span style="font-weight:bold;color:#74531f;">WithName</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">newName</span>)
<span style="color:blue;">public</span> Restaurant <span style="font-weight:bold;color:#74531f;">WithMaitreD</span>(MaitreD <span style="font-weight:bold;color:#1f377f;">newMaitreD</span>)
<span style="color:blue;">public</span> Restaurant <span style="font-weight:bold;color:#74531f;">Select</span>(Func<MaitreD, MaitreD> <span style="font-weight:bold;color:#1f377f;">selector</span>)
}</pre>
</p>
<p>
While this class does have a public constructor, it makes use of another design choice that Entity Framework doesn't support: It nests one rich object (<code>MaitreD</code>) inside another. Why does it do that?
</p>
<p>
Again, this is far from a frivolous design choice I made just to be difficult. Rather, it's a result of a need-to-know principle (which strikes me as closely related to the <a href="https://en.wikipedia.org/wiki/Single-responsibility_principle">Single Responsibility Principle</a>): A class should only contain the information it needs in order to perform its job.
</p>
<p>
The <code>MaitreD</code> class does all the heavy lifting when it comes to deciding whether or not to accept reservations, how to allocate tables, etc. It doesn't, however, need to know the <code>id</code> or <code>name</code> of the restaurant in order to do that. Keeping that information out of <code>MaitreD</code>, and instead in the <code>Restaurant</code> wrapper, makes the code simpler and easier to use.
</p>
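<p>
As a rough sketch of how the nesting plays out, the <code>Restaurant</code> wrapper can be an immutable class whose <code>Select</code> method applies a function to the nested <code>MaitreD</code> and leaves <code>Id</code> and <code>Name</code> untouched. The method bodies below are assumptions, since only the signatures are shown above, and the <code>MaitreD</code> class here is a deliberately tiny stand-in with a single hypothetical <code>SeatingDuration</code> property.
</p>

```csharp
using System;

// Stand-in for the real MaitreD, reduced to one property for illustration.
public sealed class MaitreD
{
    public MaitreD(TimeSpan seatingDuration)
    {
        SeatingDuration = seatingDuration;
    }

    public TimeSpan SeatingDuration { get; }

    public MaitreD WithSeatingDuration(TimeSpan newSeatingDuration) =>
        new MaitreD(newSeatingDuration);
}

public sealed class Restaurant
{
    public Restaurant(int id, string name, MaitreD maitreD)
    {
        Id = id;
        Name = name ?? throw new ArgumentNullException(nameof(name));
        MaitreD = maitreD ?? throw new ArgumentNullException(nameof(maitreD));
    }

    public int Id { get; }
    public string Name { get; }
    public MaitreD MaitreD { get; }

    // Copy-and-update methods; the original object is never mutated.
    public Restaurant WithId(int newId) => new Restaurant(newId, Name, MaitreD);

    public Restaurant WithName(string newName) =>
        new Restaurant(Id, newName, MaitreD);

    public Restaurant WithMaitreD(MaitreD newMaitreD) =>
        new Restaurant(Id, Name, newMaitreD);

    // Transforms the nested MaitreD while keeping Id and Name as they are.
    public Restaurant Select(Func<MaitreD, MaitreD> selector)
    {
        if (selector is null)
            throw new ArgumentNullException(nameof(selector));

        return WithMaitreD(selector(MaitreD));
    }
}
```

<p>
A caller could then write something like <code>restaurant.Select(m => m.WithSeatingDuration(TimeSpan.FromHours(2)))</code> to change the seating duration without ever touching the restaurant's <code>Id</code> or <code>Name</code>.
</p>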
<p>
The bottom line of all this is that I value encapsulation over 'easy' database mapping.
</p>
<h3 id="3cfc364c9a87463aaac98bc358d062d5">
Limitations of Entity Framework <a href="#3cfc364c9a87463aaac98bc358d062d5">#</a>
</h3>
<p>
The promise of an object-relational mapper is that it automates mapping between objects and database. Is that promise realised?
</p>
<p>
In its current incarnation, <a href="https://stackoverflow.com/q/77039584/126014">it doesn't look as though Entity Framework supports mapping to and from the Domain Model</a>. With the above tweaks, it supports the database schema that I've described, but only via 'Entity classes'. I still have to map to and from the 'Entity objects' and the actual Domain Model. Not much is gained.
</p>
<p>
One should, of course, be careful not to draw too strong inferences from this example. First, proving anything impossible is generally difficult. Just because <em>I</em> can't find a way to do what I want, I can't conclude that it's impossible. That a few other people tell me, too, that it's impossible still doesn't constitute strong evidence.
</p>
<p>
Second, even if it's impossible today, it doesn't follow that it will be impossible forever. Perhaps Entity Framework will support my Domain Model in the future.
</p>
<p>
Third, we can't conclude that just because Entity Framework (currently) doesn't support my Domain Model it follows that no object-relational mapper (ORM) does. There might be another ORM out there that perfectly supports my design, but I'm just not aware of it.
</p>
<p>
Based on my experience and what I see, read, and hear, I don't think any of that is likely. Things might change, though.
</p>
<h3 id="4002bb8f9ec64b14958233b16b93ec10">
Net benefit or drawback? <a href="#4002bb8f9ec64b14958233b16b93ec10">#</a>
</h3>
<p>
Perhaps, despite all of this, you still prefer ORMs. You may compare my ADO.NET code to my Entity Framework code and conclude that the EF code still looks simpler. After all, when using ADO.NET I have to jump through some hoops to load the correct tables associated with each restaurant, whereas EF automatically handles that for me. The EF version requires fewer lines of code.
</p>
<p>
In isolation, the fewer lines of code the better. This seems like an argument for using an ORM after all, even if the original promise remains elusive. Take what you can get.
</p>
<p>
On the other hand, when you take on a dependency, there's usually a cost that comes along. A library like Entity Framework isn't free. While you don't pay a licence fee for it, it comes with other costs. You have to learn how to use it, and so do your colleagues. You also have to keep up to date with changes.
</p>
<p>
Every time some exotic requirement comes up, you'll have to spend time investigating how to address it with that ORM's API. This may lead to a game of <a href="https://en.wikipedia.org/wiki/Whac-A-Mole">Whac-A-Mole</a> where every tweak to the ORM leads you further down the rabbit hole, and couples your code tighter with it.
</p>
<p>
You can only keep up with so many things. What's the best investment of your time, and the time of your team mates? Learning and knowing <a href="https://en.wikipedia.org/wiki/SQL">SQL</a>, or learning and keeping up to date with a particular ORM?
</p>
<p>
I learned SQL decades ago, and that knowledge is still useful. On the other hand, I don't even know how many library and framework APIs I've both learned and forgotten about.
</p>
<p>
As things currently stand, it looks to me as though the net benefit of using a library like Entity Framework is negative. Yes, it might save me a few lines of code, but I'm not ready to pay the costs just outlined.
</p>
<p>
This balance could tip in the future, or my context may change.
</p>
<h3 id="e76ec6ffdf3f44949bab3d86786b26ae">
Conclusion <a href="#e76ec6ffdf3f44949bab3d86786b26ae">#</a>
</h3>
<p>
For the kind of applications that I tend to become involved with, I don't find object-relational mappers particularly useful. When you have a rich Domain Model where the first design priority is encapsulation, assisted by the type system, it looks as though mapping is unavoidable.
</p>
<p>
While you can ask automated tools to generate code that mirrors a database schema (or the other way around), only classes with poor encapsulation are supported. As soon as you do something out of the ordinary like static factory methods or nested objects, apparently Entity Framework gives up.
</p>
<p>
Can we extrapolate from Entity Framework to other ORMs? Can we infer that Entity Framework will never be able to support objects with proper encapsulation, just because it currently doesn't?
</p>
<p>
I can't say, but I'd be surprised if things change soon, if at all. If, on the other hand, it eventually turns out that I can have my cake and eat it too, then why shouldn't I?
</p>
<p>
Until then, however, I don't find that the benefits of ORMs trump the costs of using them.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="75ca5755d2a4445ba4836fc3f6922a5c">
<div class="comment-author">Vlad <a href="#75ca5755d2a4445ba4836fc3f6922a5c">#</a></div>
<div class="comment-content">
<p>
One project I worked on was (among other things) mapping data from the database to rich domain objects in a way similar to what is described
in this article. These objects knew how to do a lot of things but were dependent on related objects, so everything needed to be loaded in advance from the database in order to ensure correctness.
So, having an Order, OrderLine, Person, Address and City, all the rows needed to be loaded in advance, mapped to objects and references set to create the object graph to be able to, say, display shipping
costs based on the person's address.
</p>
<p>
The mapping step involved cumbersome manual coding and was error prone because it was easy to forget to load some list or set some reference. Reflecting on that experience, it seems to
me that sacrificing a bit of purity wrt domain modelling and leaning on an ORM to
lazily load the references would have been much more efficient <strong>and</strong> correct.
</p>
<p>
But I guess it all depends on the context..?
</p>
</div>
<div class="comment-date">2023-09-19 13:17 UTC</div>
</div>
<div class="comment" id="58265df2f91c434696a3e0d21fe1b3b1">
<div class="comment-author">qfilip <a href="#58265df2f91c434696a3e0d21fe1b3b1">#</a></div>
<div class="comment-content">
<p>
Thanks. I've been following recent posts, but I was too lazy to go through the whole PRing things to reply. Maybe that's a good thing, since it forces you to think how to reply, instead of throwing a bunch of words together quickly. Anyways, back to business.
</p>
<p>
I'm not trying to sell one way or the other, because I'm seriously conflicted about both. Since most people on the web tend to fall into the ORM category (in the .NET world at least), I was simply looking for another perspective, from someone more knowledgeable than me.
</p>
<p>
The following is just my thinking out loud...
</p>
<p>
You've used the DB-first approach, scaffolding classes from the DB schema. With EF Core, the usual thing to do is the opposite: write classes to scaffold the DB schema. Now, this doesn't save us from writing those "relational properties", but it allows us to generate DB update scripts. So if you have a class like:
<pre>
class SomeTable
{
public int Id;
public string Name;
}
</pre>
and you add a field:
<pre>
class SomeTable
{
public int Id;
public string Name;
public DateTime Birthday;
}
</pre>
you can run
<pre>
add-migration MyMigration // generate migration file
update-database // execute it
</pre>
</p>
<p>
This gives you a nice way to track DB changes via Git, but it can also introduce conflicts. Two devs cannot edit the same class/table. You have to be really careful when making commits. Another painful thing to do this way is creating DB views and stored procedures. I've honestly never seen a good solution for it. Maybe trying to do these things is a futile effort in the first place.
</p>
<p>
The whole
<pre>
readByNameSql = @"SELECT [Id], [Name], [OpensAt], [LastSeating], [SeatingDuration]...
</pre>
is giving me heebie jeebies. It is easy to change some column name, and introduce a bug. It might be possible to do stuff with string interpolation, but at that point, I'm thinking about creating my own framework...
</p>
<p>
<blockquote>
The most common reaction is most likely that I'm doing this all wrong. I'm supposed to use the EF classes as my Domain Model. - Mark Seemann
</blockquote>
One of the first things that I was taught on my first job was to never expose my domain model to the outside world. The domain model being EF Core classes... These days, I'm thinking quite the opposite. EF Core classes are DTOs for the database (with some boilerplate in order for the framework to do its magic). I also <b>want</b> to expose my domain model to the outside world. Why not? That's the contract after all. But the problem with this is that it adds another layer of mapping. Since my domain model validation is done in the class constructor, deserialization of requests becomes a problem. Ideally, it should sit in a static method. But in that case I have: jsonDto -> domainModel -> dbDto. The No-ORM approach also still requires me to map domainModel to command parameters manually. All of this is a tedious and very error-prone process. Especially if you have a case like the one <a href="#75ca5755d2a4445ba4836fc3f6922a5c">Vlad</a> mentioned above.
</p>
<p>
Minor observation on your code. People rarely map things from DB data to domain models when using EF Core. This is a horrible thing to do. Anyone can run a script against a DB, and corrupt the data. It is something I intend to enforce in future projects, if possible. Thank you F# community.
</p>
<p>
I can't think of anything more to say at the moment. Thanks again for a fully-fledged-article reply :). I also recommend <a href="https://www.youtube.com/watch?v=ZYfdjszs8sU">this video</a>. I haven't had the time to try things he is talking about yet.
</p>
</div>
<div class="comment-date">2023-09-21 19:27 UTC</div>
</div>
<div class="comment" id="b3a8702fe15b416fa20f78d1351c8ca4">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#b3a8702fe15b416fa20f78d1351c8ca4">#</a></div>
<div class="comment-content">
<p>
Vlad, qfilip, thank you for writing.
</p>
<p>
I think your comments warrant another article. I'll post an update here later.
</p>
</div>
<div class="comment-date">2023-09-24 15:57 UTC</div>
</div>
<div class="comment" id="359a7bb0d2c14b8eb2dcb2ac6de4897d">
<div class="comment-author">qfilip <a href="#359a7bb0d2c14b8eb2dcb2ac6de4897d">#</a></div>
<div class="comment-content">
<p>
Quick update from me again. I've been thinking and experimenting with several approaches to solve issues I've written about above. How idealized world works and where do we make compromises. Forgive me for the sample types, I couldn't think of anything else. Let's assume we have this table:
<pre>
type Sheikh = {
// db entity data
Id: Guid
CreatedAt: DateTime
ModifiedAt: DateTime
Deleted: bool
// domain data
Name: string
Email: string // unique constraint here
// relational data
Wives: Wife list
Supercars: Supercar list
}
</pre>
</p>
<p>
I've named the first four fields "entity data". Why would my domain model contain an ID? It shouldn't care about persistence. I may save it to the DB, write it to a text file, or print it on a piece of paper. Don't care. We put IDs because data usually ends up in a DB. I could have used Email here to serve as an ID, because it should be unique, but we also like to standardize this stuff. All IDs shall be uuids.
</p>
<p>
There are also these "CreatedAt", "ModifiedAt" and "Deleted" columns. This is something I usually do when I want soft-delete functionality. Denormalize the data to gain performance. Otherwise, I would need to make... say... an EntityStatus table to keep that data, forcing me to do a JOIN for every read operation and an additional UPDATE EntityStatus for every write operation. So I kinda sidestep "the good practices" to avoid very real complications.
</p>
<p>
Domain data part is what it is, so I can safely skip that part.
</p>
<p>
Relational data part is the most interesting bit. I think this is what keeps me falling back to EntityFramework and why using "relational properties" are unavoidable. Either that, or I'm missing something.
</p>
<p>
Focusing attention on <code>Sheikh</code> table here, with just 2 relations, there are 4 potential scenarios. I don't want to load stuff from the DB, unless they are required, so the scenarios are:
<ul>
<li>Load <code>Sheikh</code> without relational data</li>
<li>Load <code>Sheikh</code> with <code>Wives</code></li>
<li>Load <code>Sheikh</code> with <code>Supercars</code></li>
<li>Load <code>Sheikh</code> with <code>Wives</code> and <code>Supercars</code></li>
</ul>
</p>
<p>
2<sup>NRelations</sup> I guess? I'm three beers in on this, with only six hours left until corporate clock starts ticking, so my math is probably off.
</p>
<p>
God forbid if any of these relations have their own "inner relations" you may or may not need to load. This is where the (magic mapping/to SQL translations) really start getting useful. There will be some code repetition, but you'll just need to add <code>ThenInclude(x => ...)</code> and you're done.
</p>
<p>
Now the flip side. Focusing attention on <code>Supercar</code> table:
<pre>
type Supercar = {
// db entity data
...
// domain data
Vendor: string
Model: string
HorsePower: int
// relational data
Owner: Sheikh
OwnerId: Guid
}
</pre>
</p>
<p>
Pretty much the same as before. Sometimes I'll need <code>Sheikh</code> info, sometimes I won't. One of the F#-specific problems I'm having is that records require all fields to be populated. What if I need just SheikhID to perform some domain logic?
<pre>
let tuneSheikhCars sheikhId hpIncrement cars =
    cars
    |> List.filter (fun x -> x.Owner.Id = sheikhId)
    |> List.map (fun x -> { x with HorsePower = x.HorsePower + hpIncrement })
</pre>
</p>
<p>
Similar goes for inserting new <code>Supercar</code>. I want to query-check first if <code>Owner/Sheikh</code> exists, before attempting insertion. You can pass it as a separate parameter, but code gets messier and messier.
</p>
<p>
No matter how I twist and turn things around, in the real world, I'm not only concerned by current steps I need to take to complete a task, but also with possible future steps. Now, I could define a record that only contains relevant data per each request. But, as seen above, I'd be eventually forced to make ~ 2<sup>NRelations</sup> of such records, instead of one. A reusable one, that serves like a bucket for a preferred persistence mechanism, allowing me to load relations later on, because nothing lives in memory long term.
</p>
<p>
I strayed away slightly here from ORM vs no-ORM discussion that I've started earlier. Because, now I realize that this problem isn't just about mapping things from type <code>A</code> to type <code>B</code>.
</p>
</div>
<div class="comment-date">2023-10-08 23:24 UTC</div>
</div>
<div class="comment" id="f8dc3a0d9ca44cbc88de5b773f4679d0">
<div class="comment-author">opcoder <a href="#f8dc3a0d9ca44cbc88de5b773f4679d0">#</a></div>
<div class="comment-content">
I wonder if EF not having all the features we want isn't a false problem. I feel like we try to use the domain entities as DTOs and vice versa, breaking the SRP.
But if we start writing DTOs and use them with EF, we would need a layer to map between the DTOs and the entities (AutoMapper might help with this?).
I'm sure this has been discussed before.
</div>
<div class="comment-date">2023-10-09 6:56 UTC</div>
</div>
<div class="comment" id="9af5c8eda58f44fcaa24c22515286be8">
<div class="comment-author">qfilip <a href="#9af5c8eda58f44fcaa24c22515286be8">#</a></div>
<div class="comment-content">
<a href="#f8dc3a0d9ca44cbc88de5b773f4679d0">opcoder</a> not really, no... Automapper(s) should only be used for mapping between two "dumb objects" (DTOs). I wouldn't drag in a library even for that, however, as it's relatively simple to write this thing yourself (with tests) and have full control / zero configuration when you come to a point to create some special projections. As for storing domain models in objects, proper OOP objects, with both data and behaviour, I don't like that either. Single reason for that is: constructors. This is where you pass the data to be validated into a domain model, and this is where OOP has a fatal flaw for me. Constructors can only throw exceptions, giving me no room to maneuver. You can use static methods with <code>ValidationResult<T></code> as a return type, but now we're entering a territory where C#, as a language, is totally unprepared for.
</div>
<div class="comment-date"><time>2023-10-13 17:00 UTC</time></div>
</div>
<div class="comment" id="84a111412d674526b05226f903e81af3">
<div class="comment-author">Iker <a href="#84a111412d674526b05226f903e81af3">#</a></div>
<div class="comment-content">
<p>Just my two cents:</p>
<p>Yes, it is possible to map the <code>NaturalNumber</code> object to an E.F class property using <a href="https://learn.microsoft.com/en-us/ef/core/modeling/value-conversions">ValueConverters</a>. Here are a couple of articles talking about this:</p>
<ul>
<li><a href="https://andrewlock.net/strongly-typed-ids-in-ef-core-using-strongly-typed-entity-ids-to-avoid-primitive-obsession-part-4/">Andrew Lock: Using strongly-typed entity IDs to avoid primitive obsession.</a></li>
<li><a href="https://thomaslevesque.com/2020/12/23/csharp-9-records-as-strongly-typed-ids-part-4-entity-framework-core-integration/">Thomas Levesque: Using C# 9 records as strongly-typed ids.</a></li>
</ul>
<p>But even though you can use this, you may still encounter other use cases that you cannot tackle. E.F is just a tool with its limitations, and there will be things you can do with simple C# that you can not do with E.F.</p>
<p>I think you need to consider why you want to use E.F, understand its strengths and weaknesses, and then decide if it suits your project.</p>
<p>Do you want to use EF solely as a data access layer, or do you want it to be your domain layer? Maybe for a big project you can use E.F only as a data access layer and use plain old C# files for the domain layer. In a [small | medium | quick & dirty] project, use it as your domain layer.</p>
<p>There are bad things we already know:</p>
<ul>
<li>Increased complexity.</li>
<li>There will be things you can not do. So you must be careful that you will not need something E.F can not give you.</li>
<li>You need to know how it works. For example, know that accessing <code>myRestaurant.MaitreD</code> implies a new database access (if you have not loaded it previously).</li>
</ul>
<p>But sometimes E.F shines, for example:</p>
<ul>
<li>You are programming against the E.F model, not against a specific database, so it is easier to migrate to another database.</li>
<li>Maybe migrating to another database is rare, but it is very convenient to run tests against an in-memory SQLite database. Tests against a real database can be run in the CI/CD environment, for example.</li>
<li>Having a centralized point to process changes (<code>SaveChanges</code>) allows you to easily do interesting things: save "CreatedBy," "CreatedDate," "ModifiedBy," and "ModifiedDate" fields for all tables or create historical tables (if you do not have access to the <a href="https://learn.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver16">SQL Server temporal tables</a>).</li>
<li>Global query filters allow you to make your application multi-tenant with very little code: all tables implement <code>IByClient</code>, a global filter for these tables... and voilà, your application becomes multi-client with just a few lines.</li>
</ul>
<p>I am not an E.F defender; in fact I have a love & hate relationship with it. But I believe it is a powerful tool for certain projects. As always, the important thing is to think whether it is the right tool for your specific project :)</p>
</div>
<div class="comment-date">2023-10-15 16:43 UTC</div>
</div>
<div class="comment" id="0c76c456b47e42ec872603996ba1cfc0">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#0c76c456b47e42ec872603996ba1cfc0">#</a></div>
<div class="comment-content">
<p>
Thank you, all, for writing. There's more content in your comments than I can address in one piece, but I've written a follow-up article that engages with some of your points: <a href="/2023/10/23/domain-model-first">Domain Model first</a>.
</p>
<p>
Specifically regarding the point of having to hand-write a lot of code to deal with multiple tables joined in various fashions, I grant that while <a href="/2018/09/17/typing-is-not-a-programming-bottleneck">typing isn't a bottleneck</a>, the more code you add, the greater the risk of bugs. I'm not trying to be dismissive of ORMs as a general tool. If you truly, inescapably, have a relational model, then an ORM seems like a good choice. If so, however, I don't see that you can get good encapsulation at the same time.
</p>
<p>
And indeed, an important responsibility of a software architect is to consider trade-offs to find a good solution for a particular problem. Sometimes such a solution involves an ORM, but sometimes, it doesn't. In my world, it usually doesn't.
</p>
<p>
Do I breathe rarefied air, dealing with esoteric problems that mere mortals can never hope to encounter? I don't think so. Rather, I offer the interpretation that I sometimes approach problems in a different way. All I really try to do with these articles is to present to the public the ways I think about problems. I hope, then, that it may inspire other people to consider problems from more than one angle.
</p>
<p>
Finally, from my various customer engagements I get the impression that people also like ORMs because 'entity classes' look strongly typed. As a counter-argument, I suggest that <a href="/2023/10/16/at-the-boundaries-static-types-are-illusory">this may be an illusion</a>.
</p>
</div>
<div class="comment-date">2023-10-23 06:45 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
A first stab at the Brainfuck kata
https://blog.ploeh.dk/2023/09/11/a-first-stab-at-the-brainfuck-kata
2023-09-11T08:07:00+00:00
Mark Seemann
<div id="post">
<p>
<em>I almost gave up, but persevered and managed to produce something that works.</em>
</p>
<p>
As I've <a href="/2023/08/28/a-first-crack-at-the-args-kata">previously mentioned</a>, a customer hired me to swing by to demonstrate test-driven development and <a href="https://stackoverflow.blog/2022/12/19/use-git-tactically/">tactical Git</a>. To make things interesting, we agreed that they'd give me a <a href="https://en.wikipedia.org/wiki/Kata_(programming)">kata</a> at the beginning of the session. I didn't know which problem they'd give me, so I thought it'd be a good idea to come prepared. I decided to seek out katas that I hadn't done before.
</p>
<p>
The demonstration session was supposed to be two hours in front of a participating audience. To align my preparation with that situation, I decided to impose a two-hour time limit to see how far I could get. At the same time, I'd also keep an eye on didactics, preferably proceeding in an order that would be explainable to an audience.
</p>
<p>
Some katas are more complicated than others, so I'm under no illusion that I can complete any, to me unknown, kata in under two hours. My success criterion for the time limit is that I'd like to reach a point that would satisfy an audience. Even if, after two hours, I don't reach a complete solution, I should leave a creative and intelligent audience with a good idea of how to proceed.
</p>
<p>
After a few other katas, I ran into the <a href="https://codingdojo.org/kata/Brainfuck/">Brainfuck</a> kata one Thursday. In this article, I'll describe some of the most interesting things that happened along the way. If you want all the details, the code is <a href="https://github.com/ploeh/BrainfuckCSharp">available on GitHub</a>.
</p>
<h3 id="24034bdadc7c4f9798ada181f99cd46a">
Understanding the problem <a href="#24034bdadc7c4f9798ada181f99cd46a">#</a>
</h3>
<p>
I had heard about <a href="https://en.wikipedia.org/wiki/Brainfuck">Brainfuck</a> before, but never tried to write an interpreter (or a program, for that matter).
</p>
<p>
The <a href="https://codingdojo.org/kata/Brainfuck/">kata description</a> lacks examples, so I decided to search for them elsewhere. The <a href="https://en.wikipedia.org/wiki/Brainfuck">Wikipedia article</a> comes with some examples of small programs (including <a href="https://en.wikipedia.org/wiki/%22Hello,_World!%22_program">Hello, World</a>), so ultimately I used that for reference instead of the kata page.
</p>
<p>
I'm happy I wasn't making my first pass on this problem in front of an audience. I spent the first 45 minutes just trying to understand the examples.
</p>
<p>
You might find me slow, since the rules of the language aren't that complicated. I was, however, confused by the way the examples were presented.
</p>
<p>
As the Wikipedia article explains, in order to add two numbers together, one can use this idiom:
</p>
<p>
<pre>[->+<]</pre>
</p>
<p>
The article then proceeds to list a small, complete program that adds two numbers. This program adds numbers this way:
</p>
<p>
<pre>[ Start your loops with your cell pointer on the loop counter (c1 in our case)
< + Add 1 to c0
> - Subtract 1 from c1
] End your loops with the cell pointer on the loop counter</pre>
</p>
<p>
I couldn't understand why this annotated 'walkthrough' explained the idiom in reverse. Several times, I was on the verge of giving up, feeling that I made absolutely no progress. Finally, it dawned on me that the second example is <em>not</em> an explanation of the first example, but rather a separate example that makes use of the same idea, but expresses it in a different way.
</p>
<p>
Most programming languages have more than one way to do things, and this is also the case here. <code>[->+<]</code> adds two numbers together, but so does <code>[<+>-]</code>.
</p>
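<p>
To see that the two idioms do the same job, here's a minimal sketch that transcribes each loop, instruction by instruction, into plain C# over a two-cell array. The <code>AdditionIdioms</code> class and its <code>AddRight</code> and <code>AddLeft</code> methods are hypothetical names made up for this illustration; they come from neither the kata nor the Wikipedia article.
</p>
<p>
<pre>public static class AdditionIdioms
{
    // [->+<] : the pointer starts on c0, which acts as the loop counter.
    public static byte[] AddRight(byte c0, byte c1)
    {
        var cells = new[] { c0, c1 };
        var p = 0;
        while (cells[p] != 0)   // [
        {
            cells[p]--;         // -
            p++;                // >
            cells[p]++;         // +
            p--;                // <
        }                       // ]
        return cells;           // the sum ends up in c1
    }

    // [<+>-] : the pointer starts on c1, which acts as the loop counter.
    public static byte[] AddLeft(byte c0, byte c1)
    {
        var cells = new[] { c0, c1 };
        var p = 1;
        while (cells[p] != 0)   // [
        {
            p--;                // <
            cells[p]++;         // +
            p++;                // >
            cells[p]--;         // -
        }                       // ]
        return cells;           // the sum ends up in c0
    }
}</pre>
</p>
<p>
With <em>2</em> and <em>5</em> as inputs, <code>AddRight</code> leaves <em>7</em> in the second cell, while <code>AddLeft</code> leaves <em>7</em> in the first cell. Either way, the cell that served as the loop counter is drained to zero.
</p>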
<p>
Once you understand something, it can be difficult to recall why you ever found it confusing. Now that I get this, I'm having trouble explaining what I was originally thinking, and why it confused me.
</p>
<p>
This experience does, however, drive home a point for educators: When you introduce a concept and then provide examples, the first example should be a direct continuation of the introduction, and not some variation. Variations are fine, too, but should follow later and be clearly labelled.
</p>
<p>
After 45 minutes I had finally cracked the code and was ready to get programming.
</p>
<h3 id="4903839fcc6a4d75917b97026359cdc6">
Getting started <a href="#4903839fcc6a4d75917b97026359cdc6">#</a>
</h3>
<p>
The <a href="https://codingdojo.org/kata/Brainfuck/">kata description</a> suggests starting with the <code>+</code>, <code>-</code>, <code>></code>, and <code><</code> instructions to manage memory. I briefly considered that, but on the other hand, I wanted to have some test coverage. Usually, I take advantage of test-driven development, and while I wasn't sure how to proceed, I wanted to have some tests.
</p>
<p>
If I were to exclusively start with memory management, I would need some way to inspect the memory in order to write assertions. This struck me as violating <a href="/encapsulation-and-solid">encapsulation</a>.
</p>
<p>
Instead, I thought that I'd write the simplest program that would produce some output, because if I had output, I would have something to verify.
</p>
<p>
That, on the other hand, meant that I had to consider how to model input and output. The Wikipedia article describes these as
</p>
<blockquote>
<p>
"two streams of bytes for input and output (most often connected to a keyboard and a monitor respectively, and using the ASCII character encoding)."
</p>
<footer><cite><a href="https://en.wikipedia.org/wiki/Brainfuck">Wikipedia</a></cite></footer>
</blockquote>
<p>
Knowing that <a href="https://learn.microsoft.com/archive/blogs/ploeh/console-unit-testing">you can model the console's input and output streams as polymorphic objects</a>, I decided to model the output as a <a href="https://learn.microsoft.com/dotnet/api/system.io.textwriter">TextWriter</a>. The lowest-valued printable <a href="https://en.wikipedia.org/wiki/ASCII">ASCII</a> character is space, which has the byte value <code>32</code>, so I wrote this test:
</p>
<p>
<pre>[Theory]
[InlineData(<span style="color:#a31515;">"++++++++++++++++++++++++++++++++."</span>, <span style="color:#a31515;">" "</span>)] <span style="color:green;">// 32 increments; ASCII 32 is space</span>
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Run</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">program</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">expected</span>)
{
<span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">output</span> = <span style="color:blue;">new</span> StringWriter();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> BrainfuckInterpreter(output);
sut.Run(program);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = output.ToString();
Assert.Equal(expected, actual);
}</pre>
</p>
<p>
As you can see, I wrote the test as a <code>[Theory]</code> (parametrised test) from the outset, since I predicted that I'd add more test cases soon. Strictly speaking, when following the <a href="/2019/10/21/a-red-green-refactor-checklist">red-green-refactor checklist</a>, you shouldn't write more code than absolutely necessary. According to <a href="https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it">YAGNI</a>, you should avoid <a href="https://wiki.c2.com/?SpeculativeGenerality">speculative generality</a>.
</p>
<p>
Sometimes, however, you've gone through a process so many times that you know, with near certainty, what happens next. I've done test-driven development for decades, so I occasionally allow my experience to trump the rules.
</p>
<p>
The Brainfuck program in the <code>[InlineData]</code> attribute increments the same data cell 32 times (you can count the plusses) and then outputs its value. The <code>expected</code> output is the space character, since it has the ASCII code <code>32</code>.
</p>
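<p>
If counting plusses by eye isn't appealing, the relationship between the number of increments and the expected character can be sketched in code. The <code>TestPrograms.Increments</code> helper below is hypothetical and not part of the code base; it only illustrates how the test program is put together.
</p>
<p>
<pre>public static class TestPrograms
{
    // Hypothetical helper: a Brainfuck program that increments the
    // current cell 'count' times and then outputs its value.
    public static string Increments(int count)
    {
        return new string('+', count) + ".";
    }
}</pre>
</p>
<p>
<code>TestPrograms.Increments(32)</code> produces the same program as the one in the <code>[InlineData]</code> attribute, and <code>(char)32</code> is the space character. Since attribute arguments must be compile-time constants, such a helper can't be called from <code>[InlineData]</code> itself, which is one reason the test spells out the program as a literal.
</p>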
<p>
What's the simplest thing that could possibly work? Something like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">BrainfuckInterpreter</span>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> StringWriter output;
<span style="color:blue;">public</span> <span style="color:#2b91af;">BrainfuckInterpreter</span>(StringWriter <span style="font-weight:bold;color:#1f377f;">output</span>)
{
<span style="color:blue;">this</span>.output = output;
}
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Run</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">program</span>)
{
output.Write(<span style="color:#a31515;">' '</span>);
}
}</pre>
</p>
<p>
As is typical with test-driven development (TDD), the first few tests help you design the API, but not the implementation, which, here, is deliberately naive.
</p>
<p>
Since I felt pressed for time, having already spent 45 minutes of my two-hour time limit getting to grips with the problem, I suppose I lingered less on the <em>refactoring</em> phase than perhaps I should have. You'll notice, at least, that the <code>BrainfuckInterpreter</code> class depends on <a href="https://learn.microsoft.com/dotnet/api/system.io.stringwriter">StringWriter</a> rather than its abstract parent class <code>TextWriter</code>, which was the original plan.
</p>
<p>
It's not a disastrous mistake, so when I later discovered it, I easily rectified it.
</p>
<h3 id="301838c8419640358d52babb6d8e04b8">
Implementation outline <a href="#301838c8419640358d52babb6d8e04b8">#</a>
</h3>
<p>
To move on, I added another test case:
</p>
<p>
<pre>[Theory]
[InlineData(<span style="color:#a31515;">"++++++++++++++++++++++++++++++++."</span>, <span style="color:#a31515;">" "</span>)] <span style="color:green;">// 32 increments; ASCII 32 is space</span>
[InlineData(<span style="color:#a31515;">"+++++++++++++++++++++++++++++++++."</span>, <span style="color:#a31515;">"!"</span>)] <span style="color:green;">// 33 increments; ASCII 32 is !</span>
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Run</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">program</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">expected</span>)
{
<span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">output</span> = <span style="color:blue;">new</span> StringWriter();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> BrainfuckInterpreter(output);
sut.Run(program);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = output.ToString();
Assert.Equal(expected, actual);
}</pre>
</p>
<p>
The only change is the addition of the second <code>[InlineData]</code> attribute, which supplies a slightly different Brainfuck program. This one has 33 increments, which corresponds to the ASCII character code for an exclamation mark.
</p>
<p>
Notice that I clearly copied and pasted the comment, but forgot to change the last <code>32</code> to <code>33</code>.
</p>
<p>
In my eagerness to pass both tests, and because I felt the clock ticking, I made another classic TDD mistake: I took too big a step. At this point, it would have been enough to iterate over the program's characters, count the number of plusses, and convert that number to a character. What I did instead was this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">BrainfuckInterpreter</span>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> StringWriter output;
<span style="color:blue;">public</span> <span style="color:#2b91af;">BrainfuckInterpreter</span>(StringWriter <span style="font-weight:bold;color:#1f377f;">output</span>)
{
<span style="color:blue;">this</span>.output = output;
}
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Run</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">program</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">imp</span> = <span style="color:blue;">new</span> InterpreterImp(program, output);
imp.Run();
}
<span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">InterpreterImp</span>
{
<span style="color:blue;">private</span> <span style="color:blue;">int</span> programPointer;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">byte</span>[] data;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">string</span> program;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> StringWriter output;
<span style="color:blue;">internal</span> <span style="color:#2b91af;">InterpreterImp</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">program</span>, StringWriter <span style="font-weight:bold;color:#1f377f;">output</span>)
{
data = <span style="color:blue;">new</span> <span style="color:blue;">byte</span>[30_000];
<span style="color:blue;">this</span>.program = program;
<span style="color:blue;">this</span>.output = output;
}
<span style="color:blue;">internal</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Run</span>()
{
<span style="font-weight:bold;color:#8f08c4;">while</span> (!IsDone)
InterpretInstruction();
}
<span style="color:blue;">private</span> <span style="color:blue;">bool</span> IsDone => program.Length <= programPointer;
<span style="color:blue;">private</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">InterpretInstruction</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">instruction</span> = program[programPointer];
<span style="font-weight:bold;color:#8f08c4;">switch</span> (instruction)
{
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">'+'</span>:
data[0]++;
programPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">'.'</span>:
output.Write((<span style="color:blue;">char</span>)data[0]);
programPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">default</span>:
programPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
}
}
}
}</pre>
</p>
<p>
With only two test cases, all that code isn't warranted, but I was more focused on implementing an interpreter than on moving in small steps. Even with decades of TDD experience, discipline sometimes slips. Or maybe exactly because of it.
</p>
<p>
Once again, I was fortunate enough that this implementation structure turned out to work all the way, but the point of the TDD process is that you can't always know that.
</p>
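<p>
For comparison, the smaller step that would have been enough to pass both test cases might have looked like the following. This is a sketch written after the fact, not code from the repository, and the class name <code>CountingInterpreter</code> is made up to keep it apart from the real one.
</p>
<p>
<pre>using System.IO;
using System.Linq;

public sealed class CountingInterpreter
{
    private readonly TextWriter output;

    public CountingInterpreter(TextWriter output)
    {
        this.output = output;
    }

    public void Run(string program)
    {
        // Count the plusses and write the character with that code.
        // The '.' instruction is ignored, which the two tests can't detect.
        var count = program.Count(c => c == '+');
        output.Write((char)count);
    }
}</pre>
</p>
<p>
Such an implementation is obviously wrong as an interpreter, but it passes the two test cases, and that's the point: the next test case would have had to expose the shortcoming.
</p>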
<p>
You may wonder why I decided to delegate the work to an inner class. I did that because I expected to have to maintain a <code>programPointer</code> over the actual <code>program</code>, and having a class that interprets <em>one</em> program has better encapsulation. I'll remind the reader that when I use the word <em>encapsulation</em>, I don't necessarily mean <em>information hiding</em>. Usually, I think in terms of <em>contracts</em>: Invariants, pre-, and postconditions.
</p>
<p>
With this design, the <code>program</code> is guaranteed to be present as a class field, since it's <code>readonly</code> and assigned upon initialisation. No <a href="/2013/07/08/defensive-coding">defensive coding</a> is required.
</p>
<h3 id="630304784a59460e836bd9d773e56b60">
Remaining memory-management instructions <a href="#630304784a59460e836bd9d773e56b60">#</a>
</h3>
<p>
While I wasn't planning on making use of the <a href="/2019/10/07/devils-advocate">Devil's advocate</a> technique, I did leave one little deliberate mistake in the above implementation: I'd hardcoded the data pointer as <code>0</code>.
</p>
<p>
This made it easy to choose the next test case, and the next one after that, and so on.
</p>
<p>
At the two-hour mark, I had these test cases:
</p>
<p>
<pre>[Theory]
[InlineData(<span style="color:#a31515;">"++++++++++++++++++++++++++++++++."</span>, <span style="color:#a31515;">" "</span>)] <span style="color:green;">// 32 increments; ASCII 32 is space</span>
[InlineData(<span style="color:#a31515;">"+++++++++++++++++++++++++++++++++."</span>, <span style="color:#a31515;">"!"</span>)] <span style="color:green;">// 33 increments; ASCII 32 is !</span>
[InlineData(<span style="color:#a31515;">"+>++++++++++++++++++++++++++++++++."</span>, <span style="color:#a31515;">" "</span>)] <span style="color:green;">// 32 increments after >; ASCII 32 is space</span>
[InlineData(<span style="color:#a31515;">"+++++++++++++++++++++++++++++++++-."</span>, <span style="color:#a31515;">" "</span>)] <span style="color:green;">// 33 increments and 1 decrement; ASCII 32</span>
[InlineData(<span style="color:#a31515;">">+<++++++++++++++++++++++++++++++++."</span>, <span style="color:#a31515;">" "</span>)] <span style="color:green;">// 32 increments after movement; ASCII 32</span>
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Run</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">program</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">expected</span>)
{
<span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">output</span> = <span style="color:blue;">new</span> StringWriter();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> BrainfuckInterpreter(output);
sut.Run(program);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = output.ToString();
Assert.Equal(expected, actual);
}</pre>
</p>
<p>
And this implementation:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">InterpreterImp</span>
{
<span style="color:blue;">private</span> <span style="color:blue;">int</span> programPointer;
<span style="color:blue;">private</span> <span style="color:blue;">int</span> dataPointer;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">byte</span>[] data;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">string</span> program;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> StringWriter output;
<span style="color:blue;">internal</span> <span style="color:#2b91af;">InterpreterImp</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">program</span>, StringWriter <span style="font-weight:bold;color:#1f377f;">output</span>)
{
data = <span style="color:blue;">new</span> <span style="color:blue;">byte</span>[30_000];
<span style="color:blue;">this</span>.program = program;
<span style="color:blue;">this</span>.output = output;
}
<span style="color:blue;">internal</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Run</span>()
{
<span style="font-weight:bold;color:#8f08c4;">while</span> (!IsDone)
InterpretInstruction();
}
<span style="color:blue;">private</span> <span style="color:blue;">bool</span> IsDone => program.Length <= programPointer;
<span style="color:blue;">private</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">InterpretInstruction</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">instruction</span> = program[programPointer];
<span style="font-weight:bold;color:#8f08c4;">switch</span> (instruction)
{
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">'>'</span>:
dataPointer++;
programPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">'<'</span>:
dataPointer--;
programPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">'+'</span>:
data[dataPointer]++;
programPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">'-'</span>:
data[dataPointer]--;
programPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">'.'</span>:
output.Write((<span style="color:blue;">char</span>)data[dataPointer]);
programPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">default</span>:
programPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
}
}
}</pre>
</p>
<p>
I'm only showing the inner <code>InterpreterImp</code> class, since I didn't change the outer <code>BrainfuckInterpreter</code> class.
</p>
<p>
At this point, I had used my two hours, but I think that I managed to leave my imaginary audience with a sketch of a possible solution.
</p>
<h3 id="ff7d802a-14bb-4c1d-bb39-1ecaa9059f03">
Jumps <a href="#ff7d802a-14bb-4c1d-bb39-1ecaa9059f03">#</a>
</h3>
<p>
What remained was the jumping instructions <code>[</code> and <code>]</code>, as well as input.
</p>
<p>
Perhaps I could have kept adding small <code>[InlineData]</code> test cases to my single test method, but I thought I was ready to take on some of the small example programs on the Wikipedia page. I started with the addition example in this manner:
</p>
<p>
<pre> <span style="color:green;">// Copied from https://en.wikipedia.org/wiki/Brainfuck</span>
<span style="color:blue;">const</span> <span style="color:blue;">string</span> addTwoProgram = <span style="color:maroon;">@"
++ Cell c0 = 2
> +++++ Cell c1 = 5
[ Start your loops with your cell pointer on the loop counter (c1 in our case)
< + Add 1 to c0
> - Subtract 1 from c1
] End your loops with the cell pointer on the loop counter
At this point our program has added 5 to 2 leaving 7 in c0 and 0 in c1
but we cannot output this value to the terminal since it is not ASCII encoded
To display the ASCII character ""7"" we must add 48 to the value 7
We use a loop to compute 48 = 6 * 8
++++ ++++ c1 = 8 and this will be our loop counter again
[
< +++ +++ Add 6 to c0
> - Subtract 1 from c1
]
< . Print out c0 which has the value 55 which translates to ""7""!"</span>;
[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">AddTwoValues</span>()
{
<span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">input</span> = <span style="color:blue;">new</span> StringReader(<span style="color:#a31515;">""</span>);
<span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">output</span> = <span style="color:blue;">new</span> StringWriter();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> BrainfuckInterpreter(input, output);
sut.Run(addTwoProgram);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = output.ToString();
Assert.Equal(<span style="color:#a31515;">"7"</span>, actual);
}</pre>
</p>
<p>
I got that test passing, added the next example, got that passing, and so on. My final implementation looks like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">BrainfuckInterpreter</span>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> TextReader input;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> TextWriter output;
<span style="color:blue;">public</span> <span style="color:#2b91af;">BrainfuckInterpreter</span>(TextReader <span style="font-weight:bold;color:#1f377f;">input</span>, TextWriter <span style="font-weight:bold;color:#1f377f;">output</span>)
{
<span style="color:blue;">this</span>.input = input;
<span style="color:blue;">this</span>.output = output;
}
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Run</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">program</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">imp</span> = <span style="color:blue;">new</span> InterpreterImp(program, input, output);
imp.Run();
}
<span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">InterpreterImp</span>
{
<span style="color:blue;">private</span> <span style="color:blue;">int</span> instructionPointer;
<span style="color:blue;">private</span> <span style="color:blue;">int</span> dataPointer;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">byte</span>[] data;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">string</span> program;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> TextReader input;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> TextWriter output;
<span style="color:blue;">internal</span> <span style="color:#2b91af;">InterpreterImp</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">program</span>, TextReader <span style="font-weight:bold;color:#1f377f;">input</span>, TextWriter <span style="font-weight:bold;color:#1f377f;">output</span>)
{
data = <span style="color:blue;">new</span> <span style="color:blue;">byte</span>[30_000];
<span style="color:blue;">this</span>.program = program;
<span style="color:blue;">this</span>.input = input;
<span style="color:blue;">this</span>.output = output;
}
<span style="color:blue;">internal</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Run</span>()
{
<span style="font-weight:bold;color:#8f08c4;">while</span> (!IsDone)
InterpretInstruction();
}
<span style="color:blue;">private</span> <span style="color:blue;">bool</span> IsDone => program.Length <= instructionPointer;
<span style="color:blue;">private</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">InterpretInstruction</span>()
{
WrapDataPointer();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">instruction</span> = program[instructionPointer];
<span style="font-weight:bold;color:#8f08c4;">switch</span> (instruction)
{
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">'>'</span>:
dataPointer++;
instructionPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">'<'</span>:
dataPointer--;
instructionPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">'+'</span>:
data[dataPointer]++;
instructionPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">'-'</span>:
data[dataPointer]--;
instructionPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">'.'</span>:
output.Write((<span style="color:blue;">char</span>)data[dataPointer]);
instructionPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">','</span>:
data[dataPointer] = (<span style="color:blue;">byte</span>)input.Read();
instructionPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">'['</span>:
<span style="font-weight:bold;color:#8f08c4;">if</span> (data[dataPointer] == 0)
MoveToMatchingClose();
<span style="font-weight:bold;color:#8f08c4;">else</span>
instructionPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">case</span> <span style="color:#a31515;">']'</span>:
<span style="font-weight:bold;color:#8f08c4;">if</span> (data[dataPointer] != 0)
MoveToMatchingOpen();
<span style="font-weight:bold;color:#8f08c4;">else</span>
instructionPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
<span style="font-weight:bold;color:#8f08c4;">default</span>:
instructionPointer++;
<span style="font-weight:bold;color:#8f08c4;">break</span>;
}
}
<span style="color:blue;">private</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">WrapDataPointer</span>()
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (dataPointer == -1)
dataPointer = data.Length - 1;
<span style="font-weight:bold;color:#8f08c4;">if</span> (dataPointer == data.Length)
dataPointer = 0;
}
<span style="color:blue;">private</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">MoveToMatchingClose</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">nestingLevel</span> = 1;
<span style="font-weight:bold;color:#8f08c4;">while</span> (0 < nestingLevel)
{
instructionPointer++;
<span style="font-weight:bold;color:#8f08c4;">if</span> (program[instructionPointer] == <span style="color:#a31515;">'['</span>)
nestingLevel++;
<span style="font-weight:bold;color:#8f08c4;">if</span> (program[instructionPointer] == <span style="color:#a31515;">']'</span>)
nestingLevel--;
}
instructionPointer++;
}
<span style="color:blue;">private</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">MoveToMatchingOpen</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">nestingLevel</span> = 1;
<span style="font-weight:bold;color:#8f08c4;">while</span> (0 < nestingLevel)
{
instructionPointer--;
<span style="font-weight:bold;color:#8f08c4;">if</span> (program[instructionPointer] == <span style="color:#a31515;">']'</span>)
nestingLevel++;
<span style="font-weight:bold;color:#8f08c4;">if</span> (program[instructionPointer] == <span style="color:#a31515;">'['</span>)
nestingLevel--;
}
instructionPointer++;
}
}
}</pre>
</p>
<p>
As you can see, I finally discovered that I'd been too concrete when using <code>StringWriter</code>. Now, <code>input</code> is defined as a <a href="https://learn.microsoft.com/dotnet/api/system.io.textreader">TextReader</a>, and <code>output</code> as a <code>TextWriter</code>.
</p>
<p>
When <a href="https://learn.microsoft.com/dotnet/api/system.io.textreader.read">TextReader.Read</a> encounters the end of the input stream, it returns <code>-1</code>, and when you cast that to <code>byte</code>, it becomes <code>255</code>. I admit that I haven't read through the Wikipedia article's <em>ROT13</em> example code to a degree that I understand how it decides to stop processing, but the test passes.
</p>
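<p>
To see the end-of-input behaviour in isolation, here's a minimal sketch. It isn't part of the interpreter: it assumes an empty <code>StringReader</code> as a stand-in for exhausted input, and the variable names are made up for the illustration. The explicit <code>unchecked</code> keeps the wrap-around independent of compiler settings.
</p>
<p>
<pre>using System;
using System.IO;

// An empty reader is already at the end of its input.
using var input = new StringReader("");

int raw = input.Read();              // -1 signals end of input
byte cell = unchecked((byte)raw);    // wraps around to 255

Console.WriteLine($"{raw} -> {cell}"); // prints "-1 -> 255"</pre>
</p>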
<p>
I also realised that the Wikipedia article used the term <em>instruction pointer</em>, so I renamed <code>programPointer</code> to <code>instructionPointer</code>.
</p>
<h3 id="868485959d2a4adfbdaa5e34de5c0484">
Assessment <a href="#868485959d2a4adfbdaa5e34de5c0484">#</a>
</h3>
<p>
Due to the <code>switch/case</code> structure, the <code>InterpretInstruction</code> method has a <a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">cyclomatic complexity</a> of <em>12</em>, which is more than I recommend in my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>.
</p>
<p>
It's not uncommon that <code>switch/case</code> code has high cyclomatic complexity, and this is also a common criticism of the measure. When each <code>case</code> block is as simple as it is here, or delegates to helper methods such as <code>MoveToMatchingClose</code>, you could reasonably argue that the code is still maintainable.
</p>
<p>
<a href="/ref/refactoring">Refactoring</a> lists switch statements as a code smell and suggests better alternatives. Had I followed the kata description's <em>additional constraints</em> to the letter, I should also have made it easy to add new instructions, or rename existing ones. This suggests that one of <a href="https://martinfowler.com/">Martin Fowler</a>'s refactorings might be in order.
</p>
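<p>
One way to make instructions easier to add or rename is to replace the <code>switch</code> with a lookup table. The following is only a sketch under simplifying assumptions, not the kata solution: it handles just the four data instructions, uses local variables instead of the interpreter's fields, and the <code>instructions</code> table is a hypothetical name.
</p>
<p>
<pre>using System;

var data = new byte[8];
var dataPointer = 0;

// One handler per instruction, indexed by character code (ASCII only).
var instructions = new Action[128];
instructions['>'] = () => dataPointer++;
instructions['<'] = () => dataPointer--;
instructions['+'] = () => data[dataPointer]++;
instructions['-'] = () => data[dataPointer]--;

foreach (var c in "+++>++")
    instructions[c]?.Invoke(); // characters without a handler are skipped

Console.WriteLine($"{data[0]} {data[1]}"); // prints "3 2"</pre>
</p>
<p>
With such a table, adding or renaming an instruction is a one-line change, at the cost of moving the control flow out of plain sight.
</p>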
<p>
That is, however, an entirely different kind of exercise, and I thought that I'd already gotten what I wanted out of the kata.
</p>
<h3 id="9ed50e79927541a18a53a37d3c810442">
Conclusion <a href="#9ed50e79927541a18a53a37d3c810442">#</a>
</h3>
<p>
At first glance, the Brainfuck language isn't difficult to understand (but onerous to read). Even so, it took me such a long time to understand the example code that I almost gave up more than once. Still, once I understood how it worked, the interpreter actually wasn't that hard to write.
</p>
<p>
In retrospect, perhaps I should have structured my code differently. Perhaps I should have used polymorphism instead of a switch statement. Perhaps I should have written the code in a more functional style. Regular readers will at least recognise that the code shown here is uncharacteristically imperative for me. I do, however, try to vary my approach to fit the problem at hand (<em>use the right tool for the job</em>, as the old saw goes), and the Brainfuck language is described in such imperative terms that imperative code seemed like the most fitting style.
</p>
<p>
Now that I understand how Brainfuck works, I might later try to <a href="/2020/01/13/on-doing-katas">do the kata with some other constraints</a>. It might prove interesting.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Decomposing CTFiYH's sample code base
https://blog.ploeh.dk/2023/09/04/decomposing-ctfiyhs-sample-code-base
2023-09-04T06:00:00+00:00
Mark Seemann
<div id="post">
<p>
<em>An experience report.</em>
</p>
<p>
In my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a> (CTFiYH) I write in the last chapter:
</p>
<blockquote>
<p>
If you've looked at the book's sample code base, you may have noticed that it looks disconcertingly monolithic. If you consider the full code base that includes the integration tests, as [the following figure] illustrates, there are all of three packages[...]. Of those, only one is production code.
</p>
<p>
<img src="/content/binary/ctfiyh-monolith-architecture.png" alt="Three boxes labelled unit tests, integration tests, and REST API.">
</p>
<p>
[Figure caption:] The packages that make up the sample code base. With only a single production package, it reeks of a monolith.
</p>
<footer><a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a>, subsection 16.2.1.</footer>
</blockquote>
<p>
Later, after discussing dependency cycles, I conclude:
</p>
<blockquote>
<p>
I've been writing F# and Haskell for enough years that I naturally follow the beneficial rules that they enforce. I'm confident that the sample code is nicely decoupled, even though it's packaged as a monolith. But unless you have a similar experience, I recommend that you separate your code base into multiple packages.
</p>
<footer><a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a>, subsection 16.2.2.</footer>
</blockquote>
<p>
Usually, you can't write something that cocksure without it backfiring, but I was really, really confident that this was the case. Still, it's always nagged me, because I believe that I should walk the walk rather than talk the talk. And I do admit that this was one of the few claims in the book that I actually didn't have code to back up.
</p>
<p>
So I decided to spend part of a weekend verifying that what I wrote was true. You won't believe what happened next.
</p>
<h3 id="850effbb523248579dd4c382ed75f923">
Decomposition <a href="#850effbb523248579dd4c382ed75f923">#</a>
</h3>
<p>
Reader, I was right all along.
</p>
<p>
I stopped my experiment when my package graph looked like this:
</p>
<p>
<img src="/content/binary/ctfiyh-decomposed-architecture.png" alt="Ports-and-adapters architecture diagram.">
</p>
<p>
Does that look familiar? It should; it's a poster example of <a href="/2013/12/03/layers-onions-ports-adapters-its-all-the-same">Ports and Adapters</a>, or, if you will, <a href="/ref/clean-architecture">Clean Architecture</a>. Notice how all dependencies point inward, following the <a href="https://en.wikipedia.org/wiki/Dependency_inversion_principle">Dependency Inversion Principle</a>.
</p>
<p>
The Domain Model has no dependencies, while the HTTP Model (Application Layer) only depends on the Domain Model. The outer layer contains the ports and adapters, as well as the <a href="/2011/07/28/CompositionRoot">Composition Root</a>. The Web Host is a small web project that composes everything else. In order to do that, it must reference everything else, either directly (SMTP, SQL, HTTP Model) or transitively (Domain Model).
</p>
<p>
The <a href="https://en.wikipedia.org/wiki/Adapter_pattern">Adapters</a>, on the other hand, depend on the HTTP Model and (not shown) the SDKs that they adapt. The <code>SqlReservationsRepository</code> class, for example, is implemented in the SQL library, adapting the <code>System.Data.SqlClient</code> SDK to look like an <code>IReservationsRepository</code>, which is defined in the HTTP Model.
</p>
<p>
The SMTP library is similar. It contains a concrete implementation called <code>SmtpPostOffice</code> that adapts the <code>System.Net.Mail</code> API to look like an <code>IPostOffice</code> object. Once again, the <code>IPostOffice</code> interface is defined in the HTTP Model.
</p>
<p>
The above figure is not to scale. In reality, the outer ring is quite thin. The SQL library contains only <code>SqlReservationsRepository</code> and some supporting text files with SQL <a href="https://en.wikipedia.org/wiki/Data_definition_language">DDL</a> definitions. The SMTP library contains only the <code>SmtpPostOffice</code> class. And the Web Host contains <code>Program</code>, <code>Startup</code>, and a few configuration file <a href="https://en.wikipedia.org/wiki/Data_transfer_object">DTOs</a> (<em>options</em>, in ASP.NET parlance).
</p>
<h3 id="813610f5bf6446cfb1daaa22423df14c">
Application layer <a href="#813610f5bf6446cfb1daaa22423df14c">#</a>
</h3>
<p>
The majority of code, at least if I use a rough proxy measure like number of files, is in the HTTP Model. I often think of this as the <em>application layer</em>, because it's all the logic that's specific to the application, in contrast to the Domain Model, which ought to contain code that can be used in a variety of application contexts (REST API, web site, batch job, etc.).
</p>
<p>
In this particular case, the application is a REST API, and it turns out that while the Domain Model isn't trivial, more goes on making sure that the REST API behaves correctly: That it returns correctly formatted data, that it <a href="/2022/07/25/an-applicative-reservation-validation-example-in-c">validates input</a>, that it <a href="/2020/11/09/checking-signed-urls-with-aspnet">detects attempts at tampering with URLs</a>, that it <a href="/2020/11/16/redirect-legacy-urls">redirects legacy URLs</a>, etc.
</p>
<p>
This layer also contains the <a href="https://stackoverflow.blog/2022/01/03/favor-real-dependencies-for-unit-testing/">interfaces for the application's real dependencies</a>: <code>IReservationsRepository</code>, <code>IPostOffice</code>, <code>IRestaurantDatabase</code>, and <code>IClock</code>. This explains why the SQL and SMTP packages need to reference the HTTP Model.
</p>
<p>
If you have bought the book, you have access to its example code base, and while it's a Git repository, this particular experiment isn't included. After all, I just made it, two years after finishing the book. Thus, if you want to compare with the code that comes with the book, here's a list of all the files I moved to the HTTP Model package:
</p>
<ul>
<li>AccessControlList.cs</li>
<li>CalendarController.cs</li>
<li>CalendarDto.cs</li>
<li>Day.cs</li>
<li>DayDto.cs</li>
<li>DtoConversions.cs</li>
<li>EmailingReservationsRepository.cs</li>
<li>Grandfather.cs</li>
<li>HomeController.cs</li>
<li>HomeDto.cs</li>
<li>Hypertext.cs</li>
<li>IClock.cs</li>
<li>InMemoryRestaurantDatabase.cs</li>
<li>IPeriod.cs</li>
<li>IPeriodVisitor.cs</li>
<li>IPostOffice.cs</li>
<li>IReservationsRepository.cs</li>
<li>IRestaurantDatabase.cs</li>
<li>Iso8601.cs</li>
<li>LinkDto.cs</li>
<li>LinksFilter.cs</li>
<li>LoggingClock.cs</li>
<li>LoggingPostOffice.cs</li>
<li>LoggingReservationsRepository.cs</li>
<li>Month.cs</li>
<li>NullPostOffice.cs</li>
<li>Period.cs</li>
<li>ReservationDto.cs</li>
<li>ReservationsController.cs</li>
<li>ReservationsRepository.cs</li>
<li>RestaurantDto.cs</li>
<li>RestaurantsController.cs</li>
<li>ScheduleController.cs</li>
<li>SigningUrlHelper.cs</li>
<li>SigningUrlHelperFactory.cs</li>
<li>SystemClock.cs</li>
<li>TimeDto.cs</li>
<li>UrlBuilder.cs</li>
<li>UrlIntegrityFilter.cs</li>
<li>Year.cs</li>
</ul>
<p>
As you can see, this package contains the Controllers, the DTOs, the interfaces, and some REST- and HTTP-specific code such as <a href="/2020/11/02/signing-urls-with-aspnet">SigningUrlHelper</a>, <a href="/2020/11/09/checking-signed-urls-with-aspnet">UrlIntegrityFilter</a>, <a href="/2020/08/24/adding-rest-links-as-a-cross-cutting-concern">LinksFilter</a>, security, <a href="https://en.wikipedia.org/wiki/ISO_8601">ISO 8601</a> formatters, etc.
</p>
<h3 id="eace4360f69240c0b8077a3c1e236473">
Domain Model <a href="#eace4360f69240c0b8077a3c1e236473">#</a>
</h3>
<p>
The Domain Model is small, but not insignificant. Perhaps the most striking quality of it is that (with a single, inconsequential exception) it contains no interfaces. There are no polymorphic types that model application dependencies such as databases, web services, messaging systems, or the system clock. Those are all the purview of the application layer.
</p>
<p>
As the book describes, the architecture is <a href="https://www.destroyallsoftware.com/screencasts/catalog/functional-core-imperative-shell">functional core, imperative shell</a>, and since <a href="/2017/01/27/from-dependency-injection-to-dependency-rejection">dependencies make everything impure</a>, you can't have those in your functional core. While, with a language like C#, <a href="/2020/02/24/discerning-and-maintaining-purity">you can never be sure that a function truly is pure</a>, I believe that the entire Domain Model is <a href="https://en.wikipedia.org/wiki/Referential_transparency">referentially transparent</a>.
</p>
<p>
For those readers who have the book's sample code base, here's a list of the files I moved to the Domain Model:
</p>
<ul>
<li>Email.cs</li>
<li>ITableVisitor.cs</li>
<li>MaitreD.cs</li>
<li>Name.cs</li>
<li>Reservation.cs</li>
<li>ReservationsVisitor.cs</li>
<li>Restaurant.cs</li>
<li>Seating.cs</li>
<li>Table.cs</li>
<li>TimeOfDay.cs</li>
<li>TimeSlot.cs</li>
</ul>
<p>
If the entire Domain Model consists of immutable values and <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a>, and if impure dependencies make everything impure, what's the <code>ITableVisitor</code> interface doing there?
</p>
<p>
This interface doesn't model any external application dependency, but rather represents <a href="/2018/06/25/visitor-as-a-sum-type">a sum type with the Visitor pattern</a>. The interface looks like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">ITableVisitor</span><<span style="color:#2b91af;">T</span>>
{
    T <span style="font-weight:bold;color:#74531f;">VisitStandard</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">seats</span>, Reservation? <span style="font-weight:bold;color:#1f377f;">reservation</span>);
    T <span style="font-weight:bold;color:#74531f;">VisitCommunal</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">seats</span>, IReadOnlyCollection<Reservation> <span style="font-weight:bold;color:#1f377f;">reservations</span>);
}</pre>
</p>
<p>
Restaurant tables are modelled this way because the Domain Model distinguishes between two fundamentally different kinds of tables: Normal restaurant tables, and communal or shared tables. In <a href="https://fsharp.org/">F#</a> or <a href="https://www.haskell.org/">Haskell</a> such a <a href="https://en.wikipedia.org/wiki/Tagged_union">sum type</a> would be a one-liner, but in C# you need to use either the <a href="https://en.wikipedia.org/wiki/Visitor_pattern">Visitor pattern</a> or a <a href="/2018/05/22/church-encoding">Church encoding</a>. For the book, I chose the Visitor pattern in order to keep the code base as object-oriented as possible.
</p>
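<p>
To illustrate how client code consumes such a Visitor-encoded sum type, here's a self-contained sketch. Only the <code>ITableVisitor</code> interface is taken from the code above; the <code>Reservation</code> record, the <code>Table</code> class with its <code>Accept</code> method, and <code>DescribeVisitor</code> are simplified, hypothetical stand-ins, not the book's actual code.
</p>
<p>
<pre>using System;
using System.Collections.Generic;

Table[] tables = { Table.Standard(4), Table.Communal(10) };
foreach (var table in tables)
    Console.WriteLine(table.Accept(new DescribeVisitor()));

// Hypothetical, stripped-down stand-in for the book's Reservation class.
public sealed record Reservation(string Name, int Quantity);

public interface ITableVisitor<T>
{
    T VisitStandard(int seats, Reservation? reservation);
    T VisitCommunal(int seats, IReadOnlyCollection<Reservation> reservations);
}

// Hypothetical Table: the factory method decides which Visit method Accept calls.
public sealed class Table
{
    private readonly bool isStandard;
    private readonly int seats;

    private Table(bool isStandard, int seats)
    {
        this.isStandard = isStandard;
        this.seats = seats;
    }

    public static Table Standard(int seats) => new Table(true, seats);
    public static Table Communal(int seats) => new Table(false, seats);

    // Both tables start out without reservations in this sketch.
    public T Accept<T>(ITableVisitor<T> visitor) =>
        isStandard
            ? visitor.VisitStandard(seats, null)
            : visitor.VisitCommunal(seats, Array.Empty<Reservation>());
}

// A visitor plays the role of a pattern match: one method per case.
public sealed class DescribeVisitor : ITableVisitor<string>
{
    public string VisitStandard(int seats, Reservation? reservation) =>
        $"Standard table for {seats}";

    public string VisitCommunal(
        int seats, IReadOnlyCollection<Reservation> reservations) =>
        $"Communal table for {seats}";
}</pre>
</p>
<p>
Each factory method corresponds to one case of the sum type, and any visitor has to handle both cases, which is what makes the encoding comparable to pattern matching in F# or Haskell.
</p>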
<h3 id="c76f92f5413246958cfd31a6edc44845">
Circular dependencies <a href="#c76f92f5413246958cfd31a6edc44845">#</a>
</h3>
<p>
In the book I wrote:
</p>
<blockquote>
<p>
The passive prevention of cycles [that comes from separate packages] is worth the extra complexity. Unless team members have extensive experience with a language that prevents cycles, I recommend this style of architecture.
</p>
<p>
Such languages do exist, though. F# famously prevents cycles. In it, you can't use a piece of code unless it's already defined above. Newcomers to the language see this as a terrible flaw, but it's actually one of its <a href="https://fsharpforfunandprofit.com/posts/cycles-and-modularity-in-the-wild/">best</a> <a href="http://evelinag.com/blog/2014/06-09-comparing-dependency-networks/">features</a>.
</p>
<p>
Haskell takes a different approach, but ultimately, its explicit treatment of side effects at the type level <a href="/2016/03/18/functional-architecture-is-ports-and-adapters">steers you towards a ports-and-adapters-style architecture</a>. Your code simply doesn't compile otherwise!
</p>
<p>
I've been writing F# and Haskell for enough years that I naturally follow the beneficial rules that they enforce. I'm confident that the sample code is nicely decoupled, even though it's packaged as a monolith. But unless you have a similar experience, I recommend that you separate your code base into multiple packages.
</p>
<footer><a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a>, subsection 16.2.2.</footer>
</blockquote>
<p>
As so far demonstrated in this article, I'm now sufficiently conditioned to be aware of side effects and non-determinism that I know to avoid them and push them to the boundaries of the application. Even so, it turns out that it's insidiously easy to introduce small cycles when the language doesn't stop you.
</p>
<p>
This wasn't much of a problem in the Domain Model, but one small example may still illustrate how easy it is to let your guard down. In the Domain Model, I'd added a class called <code>TimeOfDay</code> (since this code base predates <a href="https://learn.microsoft.com/dotnet/api/system.timeonly">TimeOnly</a>), but without thinking much of it, I'd added this method:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">string</span> <span style="font-weight:bold;color:#74531f;">ToIso8601TimeString</span>()
{
    <span style="font-weight:bold;color:#8f08c4;">return</span> durationSinceMidnight.ToIso8601TimeString();
}</pre>
</p>
<p>
While this doesn't look like much, formatting a time value as an ISO 8601 value isn't a Domain concern. It's an application boundary concern, so belongs in the HTTP Model. And sure enough, once I moved the file that contained the ISO 8601 conversions to the HTTP Model, the <code>TimeOfDay</code> class no longer compiled.
</p>
<p>
In this case, the fix was easy. I removed the method from the <code>TimeOfDay</code> class, but added an extension method to the other ISO 8601 conversions:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">string</span> <span style="font-weight:bold;color:#74531f;">ToIso8601TimeString</span>(<span style="color:blue;">this</span> TimeOfDay <span style="font-weight:bold;color:#1f377f;">timeOfDay</span>)
{
    <span style="font-weight:bold;color:#8f08c4;">return</span> ((TimeSpan)timeOfDay).ToIso8601TimeString();
}</pre>
</p>
<p>
While I had little trouble moving the files to the Domain Model one-by-one, once I started moving files to the HTTP Model, it turned out that this part of the code base contained more coupling.
</p>
<p>
Since I had made many classes and interfaces <a href="/2021/03/01/pendulum-swing-internal-by-default">internal by default</a>, once I started moving types to the HTTP Model, I was faced with either making them public or moving them en bloc. Ultimately, I decided to move eighteen files that were transitively linked to each other in one go. I could have moved them in smaller chunks, but that would have made the <code>internal</code> types invisible to the code that (temporarily) stayed behind. I decided to move them all at once. After all, while I prefer to <a href="https://stackoverflow.blog/2022/12/19/use-git-tactically/">move in small, controlled steps</a>, even moving eighteen files isn't that big an operation.
</p>
<p>
In the end, I still had to make <code>LinksFilter</code>, <code>UrlIntegrityFilter</code>, <code>SigningUrlHelperFactory</code>, and <code>AccessControlList.FromUser</code> public, because I needed to reference them from the Web Host.
</p>
<h3 id="f3da2544f60b4e7f8abc441e3ddaaa7f">
Test dependencies <a href="#f3da2544f60b4e7f8abc441e3ddaaa7f">#</a>
</h3>
<p>
You may have noticed that in the above diagram, it doesn't look as though I separated the two test packages into more packages, and that is, indeed, the case. I've recently described <a href="/2023/07/31/test-driving-the-pyramids-top">how I think about distinguishing kinds of tests</a>, and I don't really care too much whether an automated test exercises only a single function, or a whole bundle of objects. What I do care about is whether a test is simple or complex, fast or slow. That kind of thing.
</p>
<p>
The package labelled "Integration tests" on the diagram is really a small test library that exercises some SQL Server-specific behaviour. Some of the tests in it verify that certain race conditions don't occur. They do that by repeatedly trying to make the race condition occur, until they time out. Since the timeout is 30 seconds per test, this test suite is <em>slow</em>. That's the reason it's a separate library, even though it contains only eight tests. The book contains more details.
</p>
<p>
The "Unit tests" package contains the bulk of the tests: 176 tests, <a href="/2021/02/15/when-properties-are-easier-than-examples">some of which</a> are <a href="https://fscheck.github.io/FsCheck/">FsCheck</a> properties that each run a hundred test cases.
</p>
<p>
Some tests are <a href="/2021/01/25/self-hosted-integration-tests-in-aspnet">self-hosted integration tests</a> that rely on the Web Host, and some of them are more 'traditional' unit tests. Dependencies are transitive, so I drew an arrow from the "Unit tests" package to the Web Host. Some unit tests exercise objects in the HTTP Model, and some exercise the Domain Model.
</p>
<p>
You may have another question: If the Integration tests reference the SQL package, then why not the SMTP package? Why is it okay that the unit tests reference the SMTP package?
</p>
<p>
I want to reiterate that the reason I distinguished between these two test packages was related to execution speed rather than architecture. The few SMTP tests are fast enough, so there's no reason to keep them in a separate package.
</p>
<p>
In fact, the SMTP tests don't verify that <code>SmtpPostOffice</code> can send email. Rather, I treat that class as a <a href="http://xunitpatterns.com/Humble%20Object.html">Humble Object</a>. The few tests that I do have only verify that the system can parse configuration settings:
</p>
<p>
<pre>[Theory]
[InlineData(<span style="color:#a31515;">"m.example.net"</span>, 587, <span style="color:#a31515;">"grault"</span>, <span style="color:#a31515;">"garply"</span>, <span style="color:#a31515;">"g@example.org"</span>)]
[InlineData(<span style="color:#a31515;">"n.example.net"</span>, 465, <span style="color:#a31515;">"corge"</span>, <span style="color:#a31515;">"waldo"</span>, <span style="color:#a31515;">"c@example.org"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">ToPostOfficeReturnsSmtpOffice</span>(
    <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">host</span>,
    <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">port</span>,
    <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">userName</span>,
    <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">password</span>,
    <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">from</span>)
{
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = Create.SmtpOptions(host, port, userName, password, from);
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">db</span> = <span style="color:blue;">new</span> InMemoryRestaurantDatabase();
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = sut.ToPostOffice(db);
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">expected</span> = <span style="color:blue;">new</span> SmtpPostOffice(host, port, userName, password, from, db);
    Assert.Equal(expected, actual);
}</pre>
</p>
<p>
Notice, by the way, the <a href="/2021/05/03/structural-equality-for-better-tests">use of structural equality on a service</a>. Consider doing that more often.
</p>
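<p>
As a rough sketch of what structural equality on a service involves, consider the following. <code>SmtpSettings</code> is a hypothetical, cut-down stand-in, not the book's <code>SmtpPostOffice</code>; the point is only that two instances built from the same values compare equal, which is what lets an assertion like the one above succeed.
</p>
<p>
<pre>using System;

var a = new SmtpSettings("m.example.net", 587);
var b = new SmtpSettings("m.example.net", 587);

Console.WriteLine(a.Equals(b));           // True: same constituent values
Console.WriteLine(ReferenceEquals(a, b)); // False: two distinct objects

// Hypothetical stand-in for a service class with structural equality.
public sealed class SmtpSettings
{
    public SmtpSettings(string host, int port)
    {
        Host = host;
        Port = port;
    }

    public string Host { get; }
    public int Port { get; }

    // Two instances are equal when all their constituent values are equal.
    public override bool Equals(object? obj) =>
        obj is SmtpSettings other && Host == other.Host && Port == other.Port;

    public override int GetHashCode() => HashCode.Combine(Host, Port);
}</pre>
</p>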
<p>
In any case, the separation of automated tests into two packages may not be the final iteration. It's worked well so far, in this context, but it's possible that had things been different, I would have chosen to have more test packages.
</p>
<h3 id="d61ec0b4390b470ba9b3340be8f902f4">
Conclusion <a href="#d61ec0b4390b470ba9b3340be8f902f4">#</a>
</h3>
<p>
In the book, I made a bold claim: Although I had developed the example code as a monolith, I asserted that I'd been so careful that I could easily tease it apart into multiple packages should I choose to do so.
</p>
<p>
This sounded so much like <a href="https://en.wikipedia.org/wiki/Hubris">hubris</a> that I was trepidatious writing it. I wrote it anyway, because, while I hadn't tried, I was convinced that I was right.
</p>
<p>
Still, it nagged me ever since. What if, after all, I was wrong? I've been wrong before.
</p>
<p>
So I decided to finally make the experiment, and to my relief it turned out that I'd been right.
</p>
<p>
Don't try this at home, kids.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
A first crack at the Args kata
https://blog.ploeh.dk/2023/08/28/a-first-crack-at-the-args-kata
2023-08-28T07:28:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Test-driven development in C#.</em>
</p>
<p>
A customer hired me to swing by to demonstrate test-driven development and <a href="https://stackoverflow.blog/2022/12/19/use-git-tactically/">tactical Git</a>. To make things interesting, we agreed that they'd give me a <a href="https://en.wikipedia.org/wiki/Kata_(programming)">kata</a> at the beginning of the session. I didn't know which problem they'd give me, so I thought it'd be a good idea to come prepared. I decided to seek out katas that I hadn't done before.
</p>
<p>
The demonstration session was supposed to be two hours in front of a participating audience. In order to make my preparation aligned to that situation, I decided to impose a two-hour time limit to see how far I could get. At the same time, I'd also keep an eye on didactics, so preferably proceeding in an order that would be explainable to an audience.
</p>
<p>
Some katas are more complicated than others, so I'm under no illusion that I can complete any, to me unknown, kata in under two hours. My success criterion for the time limit is that I'd like to reach a point that would satisfy an audience. Even if, after two hours, I don't reach a complete solution, I should leave a creative and intelligent audience with a good idea of how to proceed.
</p>
<p>
The first kata I decided to try was the <a href="https://codingdojo.org/kata/Args/">Args kata</a>. In this article, I'll describe some of the most interesting things that happened along the way. If you want all the details, the code is <a href="https://github.com/ploeh/ArgsCSharp">available on GitHub</a>.
</p>
<h3 id="e51a29225774493ca6b20b6dde4c0f3e">
Boolean parser <a href="#e51a29225774493ca6b20b6dde4c0f3e">#</a>
</h3>
<p>
In short, the goal of the Args kata is to develop an API for parsing command-line arguments.
</p>
<p>
When you encounter a new problem, it's common to have a few false starts until you develop a promising plan. This happened to me as well, but after a few attempts that I quickly stashed, I realised that this is really a validation problem - as in <a href="https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/">parse, don't validate</a>.
</p>
<p>
The first thing I did after that realisation was to copy verbatim the <code>Validated</code> code from <a href="/2022/07/25/an-applicative-reservation-validation-example-in-c">An applicative reservation validation example in C#</a>. I consider it fair game to reuse general-purpose code like this for a kata.
</p>
<p>
With that basic building block available, I decided to start with a parser that would handle Boolean flags. My reasoning was that this might be the simplest parser, since it doesn't have many failure modes. If the flag is present, the value should be interpreted to be <code>true</code>; otherwise, <code>false</code>.
</p>
<p>
Over a series of iterations, I developed this parametrised <a href="https://xunit.net/">xUnit.net</a> test:
</p>
<p>
<pre>[Theory]
[InlineData(<span style="color:#a31515;">'l'</span>, <span style="color:#a31515;">"-l"</span>, <span style="color:blue;">true</span>)]
[InlineData(<span style="color:#a31515;">'l'</span>, <span style="color:#a31515;">" -l "</span>, <span style="color:blue;">true</span>)]
[InlineData(<span style="color:#a31515;">'l'</span>, <span style="color:#a31515;">"-l -p 8080 -d /usr/logs"</span>, <span style="color:blue;">true</span>)]
[InlineData(<span style="color:#a31515;">'l'</span>, <span style="color:#a31515;">"-p 8080 -l -d /usr/logs"</span>, <span style="color:blue;">true</span>)]
[InlineData(<span style="color:#a31515;">'l'</span>, <span style="color:#a31515;">"-p 8080 -d /usr/logs"</span>, <span style="color:blue;">false</span>)]
[InlineData(<span style="color:#a31515;">'l'</span>, <span style="color:#a31515;">"-l true"</span>, <span style="color:blue;">true</span>)]
[InlineData(<span style="color:#a31515;">'l'</span>, <span style="color:#a31515;">"-l false"</span>, <span style="color:blue;">false</span>)]
[InlineData(<span style="color:#a31515;">'l'</span>, <span style="color:#a31515;">"nonsense"</span>, <span style="color:blue;">false</span>)]
[InlineData(<span style="color:#a31515;">'f'</span>, <span style="color:#a31515;">"-f"</span>, <span style="color:blue;">true</span>)]
[InlineData(<span style="color:#a31515;">'f'</span>, <span style="color:#a31515;">"foo"</span>, <span style="color:blue;">false</span>)]
[InlineData(<span style="color:#a31515;">'f'</span>, <span style="color:#a31515;">""</span>, <span style="color:blue;">false</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">TryParseSuccess</span>(<span style="color:blue;">char</span> <span style="color:#1f377f;">flagName</span>, <span style="color:blue;">string</span> <span style="color:#1f377f;">candidate</span>, <span style="color:blue;">bool</span> <span style="color:#1f377f;">expected</span>)
{
    <span style="color:blue;">var</span> <span style="color:#1f377f;">sut</span> = <span style="color:blue;">new</span> BoolParser(flagName);
    <span style="color:blue;">var</span> <span style="color:#1f377f;">actual</span> = sut.TryParse(candidate);
    Assert.Equal(Validated.Succeed<<span style="color:blue;">string</span>, <span style="color:blue;">bool</span>>(expected), actual);
}</pre>
</p>
<p>
To be clear, this test started as a <code>[Fact]</code> (single, non-parametrised test) that I subsequently converted to a parametrised test, and then added more and more test cases to.
</p>
<p>
The final implementation of <code>BoolParser</code> looks like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">BoolParser</span> : IParser<<span style="color:blue;">bool</span>>
{
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">char</span> flagName;

    <span style="color:blue;">public</span> <span style="color:#2b91af;">BoolParser</span>(<span style="color:blue;">char</span> <span style="color:#1f377f;">flagName</span>)
    {
        <span style="color:blue;">this</span>.flagName = flagName;
    }

    <span style="color:blue;">public</span> Validated<<span style="color:blue;">string</span>, <span style="color:blue;">bool</span>> <span style="color:#74531f;">TryParse</span>(<span style="color:blue;">string</span> <span style="color:#1f377f;">candidate</span>)
    {
        <span style="color:blue;">var</span> <span style="color:#1f377f;">idx</span> = candidate.IndexOf(<span style="color:#a31515;">$"-</span>{flagName}<span style="color:#a31515;">"</span>);
        <span style="color:#8f08c4;">if</span> (idx < 0)
            <span style="color:#8f08c4;">return</span> Validated.Succeed<<span style="color:blue;">string</span>, <span style="color:blue;">bool</span>>(<span style="color:blue;">false</span>);
        <span style="color:blue;">var</span> <span style="color:#1f377f;">nextFlagIdx</span> = candidate[(idx + 2)..].IndexOf(<span style="color:#a31515;">'-'</span>);
        <span style="color:blue;">var</span> <span style="color:#1f377f;">bFlag</span> = nextFlagIdx < 0
            ? candidate[(idx + 2)..]
            : candidate.Substring(idx + 2, nextFlagIdx);
        <span style="color:#8f08c4;">if</span> (<span style="color:blue;">bool</span>.TryParse(bFlag, <span style="color:blue;">out</span> var <span style="color:#1f377f;">b</span>))
            <span style="color:#8f08c4;">return</span> Validated.Succeed<<span style="color:blue;">string</span>, <span style="color:blue;">bool</span>>(b);
        <span style="color:#8f08c4;">return</span> Validated.Succeed<<span style="color:blue;">string</span>, <span style="color:blue;">bool</span>>(<span style="color:blue;">true</span>);
    }
}</pre>
</p>
<p>
This may not be the most elegant solution, but it passes all tests. Since I was under time pressure, I didn't want to spend too much time polishing the implementation details. As long as I'm comfortable with the API design and the test cases, I can always refactor later. (I usually say that <em>later is never</em>, which also turned out to be true this time. On the other hand, it's not that the implementation code is <em>awful</em> in any way. It has a cyclomatic complexity of <em>4</em> and fits within an <a href="/2019/11/04/the-80-24-rule">80 x 24 box</a>. It could be much worse.)
</p>
<p>
The <code>IParser</code> interface came afterwards. It wasn't driven by the above test, but by later developments.
</p>
<h3 id="2d94e2103d6f405f89e4bd0b4e1f2ff3">
Rough proof of concept <a href="#2d94e2103d6f405f89e4bd0b4e1f2ff3">#</a>
</h3>
<p>
Once I had a passable implementation of <code>BoolParser</code>, I developed a similar <code>IntParser</code> to a degree where it supported a happy path. With two parsers, I had enough building blocks to demonstrate how to combine them. At that point, I also had some 40 minutes left, so it was time to produce something that might look useful.
</p>
<p>
At first, I wanted to demonstrate that it's possible to combine the two parsers, so I wrote this test:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">ParseBoolAndIntProofOfConceptRaw</span>()
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">args</span> = <span style="color:#a31515;">"-l -p 8080"</span>;
<span style="color:blue;">var</span> <span style="color:#1f377f;">l</span> = <span style="color:blue;">new</span> BoolParser(<span style="color:#a31515;">'l'</span>).TryParse(args).SelectFailure(<span style="color:#1f377f;">s</span> => <span style="color:blue;">new</span>[] { s });
<span style="color:blue;">var</span> <span style="color:#1f377f;">p</span> = <span style="color:blue;">new</span> IntParser(<span style="color:#a31515;">'p'</span>).TryParse(args).SelectFailure(<span style="color:#1f377f;">s</span> => <span style="color:blue;">new</span>[] { s });
Func<<span style="color:blue;">bool</span>, <span style="color:blue;">int</span>, (<span style="color:blue;">bool</span>, <span style="color:blue;">int</span>)> <span style="color:#1f377f;">createTuple</span> = (<span style="color:#1f377f;">b</span>, <span style="color:#1f377f;">i</span>) => (b, i);
<span style="color:blue;">static</span> <span style="color:blue;">string</span>[] <span style="color:#74531f;">combineErrors</span>(<span style="color:blue;">string</span>[] <span style="color:#1f377f;">s1</span>, <span style="color:blue;">string</span>[] <span style="color:#1f377f;">s2</span>) => s1.Concat(s2).ToArray();
<span style="color:blue;">var</span> <span style="color:#1f377f;">actual</span> = createTuple.Apply(l, combineErrors).Apply(p, combineErrors);
Assert.Equal(Validated.Succeed<<span style="color:blue;">string</span>[], (<span style="color:blue;">bool</span>, <span style="color:blue;">int</span>)>((<span style="color:blue;">true</span>, 8080)), actual);
}</pre>
</p>
<p>
That's really not pretty, and I wouldn't expect an unsuspecting audience to understand what's going on. It doesn't help that C# is inadequate for <a href="/2018/10/01/applicative-functors">applicative functors</a>. While it's possible to implement <a href="/2018/11/05/applicative-validation">applicative validation</a>, the C# API is awkward. (There are ways to make it better than what's on display here, but keep in mind that I came into this exercise unprepared, and had to grab what was closest at hand.)
</p>
<p>
The main point of the above test was only to demonstrate that it's possible to combine two parsers into one. That took me roughly 15 minutes.
</p>
<p>
Armed with that knowledge, I then proceeded to define this base class:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">abstract</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ArgsParser</span><<span style="color:#2b91af;">T1</span>, <span style="color:#2b91af;">T2</span>, <span style="color:#2b91af;">T</span>>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IParser<T1> parser1;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IParser<T2> parser2;
<span style="color:blue;">public</span> <span style="color:#2b91af;">ArgsParser</span>(IParser<T1> <span style="color:#1f377f;">parser1</span>, IParser<T2> <span style="color:#1f377f;">parser2</span>)
{
<span style="color:blue;">this</span>.parser1 = parser1;
<span style="color:blue;">this</span>.parser2 = parser2;
}
<span style="color:blue;">public</span> Validated<<span style="color:blue;">string</span>[], T> <span style="color:#74531f;">TryParse</span>(<span style="color:blue;">string</span> <span style="color:#1f377f;">candidate</span>)
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">l</span> = parser1.TryParse(candidate).SelectFailure(<span style="color:#1f377f;">s</span> => <span style="color:blue;">new</span>[] { s });
<span style="color:blue;">var</span> <span style="color:#1f377f;">p</span> = parser2.TryParse(candidate).SelectFailure(<span style="color:#1f377f;">s</span> => <span style="color:blue;">new</span>[] { s });
Func<T1, T2, T> <span style="color:#1f377f;">create</span> = Create;
<span style="color:#8f08c4;">return</span> create.Apply(l, CombineErrors).Apply(p, CombineErrors);
}
<span style="color:blue;">protected</span> <span style="color:blue;">abstract</span> T <span style="color:#74531f;">Create</span>(T1 <span style="color:#1f377f;">x1</span>, T2 <span style="color:#1f377f;">x2</span>);
<span style="color:blue;">private</span> <span style="color:blue;">static</span> <span style="color:blue;">string</span>[] <span style="color:#74531f;">CombineErrors</span>(<span style="color:blue;">string</span>[] <span style="color:#1f377f;">s1</span>, <span style="color:blue;">string</span>[] <span style="color:#1f377f;">s2</span>)
{
<span style="color:#8f08c4;">return</span> s1.Concat(s2).ToArray();
}
}</pre>
</p>
<p>
While I'm not a fan of inheritance, this seemed the fastest way to expand on the proof of concept. The class encapsulates the ugly details of the <code>ParseBoolAndIntProofOfConceptRaw</code> test, while leaving just enough room for a derived class:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ProofOfConceptParser</span> : ArgsParser<<span style="color:blue;">bool</span>, <span style="color:blue;">int</span>, (<span style="color:blue;">bool</span>, <span style="color:blue;">int</span>)>
{
<span style="color:blue;">public</span> <span style="color:#2b91af;">ProofOfConceptParser</span>() : <span style="color:blue;">base</span>(<span style="color:blue;">new</span> BoolParser(<span style="color:#a31515;">'l'</span>), <span style="color:blue;">new</span> IntParser(<span style="color:#a31515;">'p'</span>))
{
}
<span style="color:blue;">protected</span> <span style="color:blue;">override</span> (<span style="color:blue;">bool</span>, <span style="color:blue;">int</span>) <span style="color:#74531f;">Create</span>(<span style="color:blue;">bool</span> <span style="color:#1f377f;">x1</span>, <span style="color:blue;">int</span> <span style="color:#1f377f;">x2</span>)
{
<span style="color:#8f08c4;">return</span> (x1, x2);
}
}</pre>
</p>
<p>
This class only defines which parsers to use and how to translate successful results to a single object. Here, because this is still a proof of concept, the resulting object is just a tuple.
</p>
<p>
The corresponding test looks like this:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">ParseBoolAndIntProofOfConcept</span>()
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">sut</span> = <span style="color:blue;">new</span> ProofOfConceptParser();
<span style="color:blue;">var</span> <span style="color:#1f377f;">actual</span> = sut.TryParse(<span style="color:#a31515;">"-l -p 8080"</span>);
Assert.Equal(Validated.Succeed<<span style="color:blue;">string</span>[], (<span style="color:blue;">bool</span>, <span style="color:blue;">int</span>)>((<span style="color:blue;">true</span>, 8080)), actual);
}</pre>
</p>
<p>
At this point, I hit the two-hour mark, but I think I managed to produce enough code to convince a hypothetical audience that a complete solution is within grasp.
</p>
<p>
What remained was to
</p>
<ul>
<li>add proper error handling to <code>IntParser</code></li>
<li>add a corresponding <code>StringParser</code></li>
<li>improve the <code>ArgsParser</code> API</li>
<li>add better demo examples of the improved <code>ArgsParser</code> API</li>
</ul>
<p>
While I could leave this as an exercise to the reader, I couldn't just leave the code like that.
</p>
<h3 id="0a1a987c02f8421785382a6973eccd47">
Finishing the kata <a href="#0a1a987c02f8421785382a6973eccd47">#</a>
</h3>
<p>
For my own satisfaction, I decided to complete the kata, which I did in another hour.
</p>
<p>
Although I had started with an abstract base class, I know <a href="/2018/02/19/abstract-class-isomorphism">how to refactor it to a <code>sealed</code> class with an injected Strategy</a>. I did that for the existing class, and also added one that supports three parsers instead of two:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ArgsParser</span><<span style="color:#2b91af;">T1</span>, <span style="color:#2b91af;">T2</span>, <span style="color:#2b91af;">T3</span>, <span style="color:#2b91af;">T</span>>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IParser<T1> parser1;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IParser<T2> parser2;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IParser<T3> parser3;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> Func<T1, T2, T3, T> create;
<span style="color:blue;">public</span> <span style="color:#2b91af;">ArgsParser</span>(
IParser<T1> <span style="color:#1f377f;">parser1</span>,
IParser<T2> <span style="color:#1f377f;">parser2</span>,
IParser<T3> <span style="color:#1f377f;">parser3</span>,
Func<T1, T2, T3, T> <span style="color:#1f377f;">create</span>)
{
<span style="color:blue;">this</span>.parser1 = parser1;
<span style="color:blue;">this</span>.parser2 = parser2;
<span style="color:blue;">this</span>.parser3 = parser3;
<span style="color:blue;">this</span>.create = create;
}
<span style="color:blue;">public</span> Validated<<span style="color:blue;">string</span>[], T> <span style="color:#74531f;">TryParse</span>(<span style="color:blue;">string</span> <span style="color:#1f377f;">candidate</span>)
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">x1</span> = parser1.TryParse(candidate).SelectFailure(<span style="color:#1f377f;">s</span> => <span style="color:blue;">new</span>[] { s });
<span style="color:blue;">var</span> <span style="color:#1f377f;">x2</span> = parser2.TryParse(candidate).SelectFailure(<span style="color:#1f377f;">s</span> => <span style="color:blue;">new</span>[] { s });
<span style="color:blue;">var</span> <span style="color:#1f377f;">x3</span> = parser3.TryParse(candidate).SelectFailure(<span style="color:#1f377f;">s</span> => <span style="color:blue;">new</span>[] { s });
<span style="color:#8f08c4;">return</span> create
.Apply(x1, CombineErrors)
.Apply(x2, CombineErrors)
.Apply(x3, CombineErrors);
}
<span style="color:blue;">private</span> <span style="color:blue;">static</span> <span style="color:blue;">string</span>[] <span style="color:#74531f;">CombineErrors</span>(<span style="color:blue;">string</span>[] <span style="color:#1f377f;">s1</span>, <span style="color:blue;">string</span>[] <span style="color:#1f377f;">s2</span>)
{
<span style="color:#8f08c4;">return</span> s1.Concat(s2).ToArray();
}
}</pre>
</p>
<p>
Granted, that's a bit of boilerplate, but if you imagine this as supplied by a reusable library, you only have to write this once.
</p>
<p>
I was now ready to parse the kata's central example, <code>"-l -p 8080 -d /usr/logs"</code>, to a strongly typed value:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">record</span> <span style="color:#2b91af;">TestConfig</span>(<span style="color:blue;">bool</span> <span style="color:#1f377f;">DoLog</span>, <span style="color:blue;">int</span> <span style="color:#1f377f;">Port</span>, <span style="color:blue;">string</span> <span style="color:#1f377f;">Directory</span>);
[Theory]
[InlineData(<span style="color:#a31515;">"-l -p 8080 -d /usr/logs"</span>)]
[InlineData(<span style="color:#a31515;">"-p 8080 -l -d /usr/logs"</span>)]
[InlineData(<span style="color:#a31515;">"-d /usr/logs -l -p 8080"</span>)]
[InlineData(<span style="color:#a31515;">" -d /usr/logs -l -p 8080 "</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">ParseConfig</span>(<span style="color:blue;">string</span> <span style="color:#1f377f;">args</span>)
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">sut</span> = <span style="color:blue;">new</span> ArgsParser<<span style="color:blue;">bool</span>, <span style="color:blue;">int</span>, <span style="color:blue;">string</span>, TestConfig>(
<span style="color:blue;">new</span> BoolParser(<span style="color:#a31515;">'l'</span>),
<span style="color:blue;">new</span> IntParser(<span style="color:#a31515;">'p'</span>),
<span style="color:blue;">new</span> StringParser(<span style="color:#a31515;">'d'</span>),
(<span style="color:#1f377f;">b</span>, <span style="color:#1f377f;">i</span>, <span style="color:#1f377f;">s</span>) => <span style="color:blue;">new</span> TestConfig(b, i, s));
<span style="color:blue;">var</span> <span style="color:#1f377f;">actual</span> = sut.TryParse(args);
Assert.Equal(
Validated.Succeed<<span style="color:blue;">string</span>[], TestConfig>(
<span style="color:blue;">new</span> TestConfig(<span style="color:blue;">true</span>, 8080, <span style="color:#a31515;">"/usr/logs"</span>)),
actual);
}</pre>
</p>
<p>
This test parses some variations of the example input into an immutable <a href="https://learn.microsoft.com/dotnet/csharp/language-reference/builtin-types/record">record</a>.
</p>
<p>
What happens if the input is malformed? Here's an example of that:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">FailToParseConfig</span>()
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">sut</span> = <span style="color:blue;">new</span> ArgsParser<<span style="color:blue;">bool</span>, <span style="color:blue;">int</span>, <span style="color:blue;">string</span>, TestConfig>(
<span style="color:blue;">new</span> BoolParser(<span style="color:#a31515;">'l'</span>),
<span style="color:blue;">new</span> IntParser(<span style="color:#a31515;">'p'</span>),
<span style="color:blue;">new</span> StringParser(<span style="color:#a31515;">'d'</span>),
(<span style="color:#1f377f;">b</span>, <span style="color:#1f377f;">i</span>, <span style="color:#1f377f;">s</span>) => <span style="color:blue;">new</span> TestConfig(b, i, s));
<span style="color:blue;">var</span> <span style="color:#1f377f;">actual</span> = sut.TryParse(<span style="color:#a31515;">"-p aityaity"</span>);
Assert.True(actual.Match(
onFailure: <span style="color:#1f377f;">ss</span> => ss.Contains(<span style="color:#a31515;">"Expected integer for flag '-p', but got \"aityaity\"."</span>),
onSuccess: <span style="color:#1f377f;">_</span> => <span style="color:blue;">false</span>));
Assert.True(actual.Match(
onFailure: <span style="color:#1f377f;">ss</span> => ss.Contains(<span style="color:#a31515;">"Missing value for flag '-d'."</span>),
onSuccess: <span style="color:#1f377f;">_</span> => <span style="color:blue;">false</span>));
}</pre>
</p>
<p>
Of particular interest is that, as promised by applicative validation, parsing failures don't short-circuit. The input value <code>"-p aityaity"</code> has two problems, and both are reported by <code>TryParse</code>.
</p>
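<p>
For comparison, here's a minimal Haskell sketch of the same applicative composition. It's not part of the kata code base: the <code>Validated</code> type, its instances, the <code>Config</code> type, and the hard-coded parse results are all assumptions that stand in for the C# <code>Validated</code> class and the output of the three parsers.
</p>
<p>
<pre>-- Hypothetical stand-in for the C# Validated type; not part of the kata code.
data Validated e a = Failure e | Success a deriving (Eq, Show)

instance Functor (Validated e) where
  fmap _ (Failure e) = Failure e
  fmap f (Success x) = Success (f x)

-- Failures are combined with (<>) instead of short-circuiting.
instance Semigroup e => Applicative (Validated e) where
  pure = Success
  Failure e1 <*> Failure e2 = Failure (e1 <> e2)
  Failure e1 <*> Success _  = Failure e1
  Success _  <*> Failure e2 = Failure e2
  Success f  <*> Success x  = Success (f x)

data Config = Config Bool Int String deriving (Eq, Show)

-- Assumed parser results for the input "-l -p 8080 -d /usr/logs".
good :: Validated [String] Config
good = Config <$> Success True <*> Success 8080 <*> Success "/usr/logs"

-- Assumed parser results for the input "-p aityaity".
bad :: Validated [String] Config
bad =
  Config
    <$> Success False
    <*> Failure ["Expected integer for flag '-p', but got \"aityaity\"."]
    <*> Failure ["Missing value for flag '-d'."]</pre>
</p>
<p>
In this sketch, <code>good</code> evaluates to <code>Success (Config True 8080 "/usr/logs")</code>, while <code>bad</code> evaluates to a <code>Failure</code> that holds both error messages, in the order the parsers were applied.
</p>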
<p>
At this point I was happy that I had sufficiently demonstrated the viability of the design. I decided to call it a day.
</p>
<h3 id="0d715ed2918d48c3840fe2f3ac7a6966">
Conclusion <a href="#0d715ed2918d48c3840fe2f3ac7a6966">#</a>
</h3>
<p>
As I did the Args kata, I found it interesting enough to warrant an article. Once I realised that I could use applicative parsing as the basis for the API, the rest followed.
</p>
<p>
There's room for improvement, but while <a href="/2020/01/13/on-doing-katas">doing katas</a> is valuable, there are diminishing returns in perfecting the code. Get the main functionality working, learn from it, and move on to another exercise.
</p>
</div><hr>
Compile-time type-checked truth tables https://blog.ploeh.dk/2023/08/21/compile-time-type-checked-truth-tables 2023-08-21T08:07:00+00:00 Mark Seemann
<div id="post">
<p>
<em>With simple and easy-to-understand examples in F# and Haskell.</em>
</p>
<p>
<a href="https://blog.testdouble.com/authors/eve-ragins/">Eve Ragins</a> recently published an article called <a href="https://blog.testdouble.com/posts/2023-08-14-using-truth-tables/">Why you should use truth tables in your job</a>. It's a good article. You should read it.
</p>
<p>
In it, she outlines how creating a <a href="https://en.wikipedia.org/wiki/Truth_table">Truth Table</a> can help you smoke out edge cases or unclear requirements.
</p>
<p>
I agree, and it also beautifully explains why I find <a href="https://en.wikipedia.org/wiki/Algebraic_data_type">algebraic data types</a> so useful.
</p>
<p>
With languages like <a href="https://fsharp.org/">F#</a> or <a href="https://www.haskell.org/">Haskell</a>, this kind of modelling is part of the language, and you even get statically-typed compile-time checking that tells you whether you've handled all combinations.
</p>
<p>
Eve Ragins points out that there are other, socio-technical benefits from drawing up a truth table that you can, perhaps, print out, or otherwise share with non-technical stakeholders. Thus, the following is in no way meant as a full replacement, but rather as examples of how certain languages have affordances that enable you to think like this while programming.
</p>
<h3 id="9e4397fd77a043db8043dac6aff81c4a">
F# <a href="#9e4397fd77a043db8043dac6aff81c4a">#</a>
</h3>
<p>
I'm not going to go through Eve Ragins' blow-by-blow walkthrough, explaining how you construct a truth table. Rather, I'm just briefly going to show how simple it is to do the same in F#.
</p>
<p>
Most of the inputs in her example are Boolean values, which already exist in the language, but we need a type for the item status:
</p>
<p>
<pre><span style="color:blue;">type</span> ItemStatus = NotAvailable | Available | InUse</pre>
</p>
<p>
As is typical in F#, a type declaration is just a one-liner.
</p>
<p>
Now for something a little more interesting. In Eve Ragins' final table, there's a footnote that says that the dash/minus symbol indicates that the value is irrelevant. If you look a little closer, it turns out that the <code>should_field_be_editable</code> value is irrelevant whenever the <code>should_field_show</code> value is <code>FALSE</code>.
</p>
<p>
So instead of a <code>bool * bool</code> tuple, you really have a three-state type like this:
</p>
<p>
<pre><span style="color:blue;">type</span> FieldState = Hidden | ReadOnly | ReadWrite</pre>
</p>
<p>
It would probably have taken a few iterations to learn this if you'd jumped straight into pattern matching in F#, but since F# requires you to define types and functions before you can use them, I list the type now.
</p>
<p>
That's all you need to produce a truth table in F#:
</p>
<p>
<pre><span style="color:blue;">let</span> decide requiresApproval canUserApprove itemStatus =
    <span style="color:blue;">match</span> requiresApproval, canUserApprove, itemStatus <span style="color:blue;">with</span>
    | <span style="color:blue;">true</span>,  <span style="color:blue;">true</span>,  NotAvailable <span style="color:blue;">-></span> Hidden
    | <span style="color:blue;">false</span>, <span style="color:blue;">true</span>,  NotAvailable <span style="color:blue;">-></span> Hidden
    | <span style="color:blue;">true</span>,  <span style="color:blue;">false</span>, NotAvailable <span style="color:blue;">-></span> Hidden
    | <span style="color:blue;">false</span>, <span style="color:blue;">false</span>, NotAvailable <span style="color:blue;">-></span> Hidden
    | <span style="color:blue;">true</span>,  <span style="color:blue;">true</span>,  Available    <span style="color:blue;">-></span> ReadWrite
    | <span style="color:blue;">false</span>, <span style="color:blue;">true</span>,  Available    <span style="color:blue;">-></span> Hidden
    | <span style="color:blue;">true</span>,  <span style="color:blue;">false</span>, Available    <span style="color:blue;">-></span> ReadOnly
    | <span style="color:blue;">false</span>, <span style="color:blue;">false</span>, Available    <span style="color:blue;">-></span> Hidden
    | <span style="color:blue;">true</span>,  <span style="color:blue;">true</span>,  InUse        <span style="color:blue;">-></span> ReadOnly
    | <span style="color:blue;">false</span>, <span style="color:blue;">true</span>,  InUse        <span style="color:blue;">-></span> Hidden
    | <span style="color:blue;">true</span>,  <span style="color:blue;">false</span>, InUse        <span style="color:blue;">-></span> ReadOnly
    | <span style="color:blue;">false</span>, <span style="color:blue;">false</span>, InUse        <span style="color:blue;">-></span> Hidden
</pre>
</p>
<p>
I've called the function <code>decide</code> because it wasn't clear to me what else to call it.
</p>
<p>
What's so nice about F# pattern matching is that the compiler can tell if you've missed a combination. If you forget a combination, you get a helpful <a href="https://learn.microsoft.com/dotnet/fsharp/language-reference/compiler-messages/fs0025">Incomplete pattern match</a> compiler warning that points out the combination that you missed.
</p>
<p>
And as I argue in my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>, you should turn warnings into errors. This would also be helpful in a case like this, since you'd be prevented from forgetting an edge case.
</p>
<h3 id="e8bce005200f4f3394eed71cffe955d0">
Haskell <a href="#e8bce005200f4f3394eed71cffe955d0">#</a>
</h3>
<p>
You can do the same exercise in Haskell, and the result is strikingly similar:
</p>
<p>
<pre><span style="color:blue;">data</span> ItemStatus = NotAvailable | Available | InUse <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Eq</span>, <span style="color:#2b91af;">Show</span>)
<span style="color:blue;">data</span> FieldState = Hidden | ReadOnly | ReadWrite <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Eq</span>, <span style="color:#2b91af;">Show</span>)
<span style="color:#2b91af;">decide</span> <span style="color:blue;">::</span> <span style="color:#2b91af;">Bool</span> <span style="color:blue;">-></span> <span style="color:#2b91af;">Bool</span> <span style="color:blue;">-></span> <span style="color:blue;">ItemStatus</span> <span style="color:blue;">-></span> <span style="color:blue;">FieldState</span>
decide True True NotAvailable = Hidden
decide False True NotAvailable = Hidden
decide True False NotAvailable = Hidden
decide False False NotAvailable = Hidden
decide True True Available = ReadWrite
decide False True Available = Hidden
decide True False Available = ReadOnly
decide False False Available = Hidden
decide True True InUse = ReadOnly
decide False True InUse = Hidden
decide True False InUse = ReadOnly
decide False False InUse = Hidden</pre>
</p>
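<p>
To make the relationship between <code>FieldState</code> and the two output columns of the original table explicit, here's a small sketch. The helper functions <code>shouldFieldShow</code> and <code>shouldFieldBeEditable</code> are hypothetical; they're named after the columns in Eve Ragins' table and aren't part of the example code. The sketch assumes that <code>ReadOnly</code> means <em>shown, but not editable</em>, and that <code>ReadWrite</code> means <em>shown and editable</em>.
</p>
<p>
<pre>data FieldState = Hidden | ReadOnly | ReadWrite deriving (Eq, Show)

-- Hypothetical helpers (not from the article) that recover the two
-- output columns of the original truth table from a FieldState value.
shouldFieldShow :: FieldState -> Bool
shouldFieldShow Hidden    = False
shouldFieldShow ReadOnly  = True
shouldFieldShow ReadWrite = True

-- Nothing models the dash in the table: when the field is hidden,
-- whether it's editable is irrelevant.
shouldFieldBeEditable :: FieldState -> Maybe Bool
shouldFieldBeEditable Hidden    = Nothing
shouldFieldBeEditable ReadOnly  = Just False
shouldFieldBeEditable ReadWrite = Just True</pre>
</p>
<p>
Two Boolean values allow four combinations, but only three of them are meaningful here, which is why a three-case type is the better fit.
</p>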
<p>
Just like in F#, if you forget a combination, the compiler will tell you:
</p>
<p>
<pre>LibrarySystem.hs:8:1: <span style="color:red;">warning:</span> [<span style="color:red;">-Wincomplete-patterns</span>]
Pattern match(es) are non-exhaustive
In an equation for `decide':
Patterns of type `Bool', `Bool', `ItemStatus' not matched:
False False NotAvailable
<span style="color:blue;">|</span>
<span style="color:blue;">8 |</span> <span style="color:red;">decide True True NotAvailable = Hidden</span>
<span style="color:blue;">|</span> <span style="color:red;">^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^...</span></pre>
</p>
<p>
To be clear, that combination is <em>not</em> missing from the above code example. This compiler warning was one I subsequently caused by commenting out a line.
</p>
<p>
It's also possible to turn warnings into errors in Haskell.
</p>
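<p>
One way to do that, sketched here under the assumption that you only want this particular warning to be fatal, is to put a GHC options pragma at the top of the module. The <code>describe</code> function is a hypothetical example that only serves to illustrate the flag.
</p>
<p>
<pre>{-# OPTIONS_GHC -Werror=incomplete-patterns #-}

data FieldState = Hidden | ReadOnly | ReadWrite deriving (Eq, Show)

-- Hypothetical function, used only to illustrate the flag.
-- With the pragma above, deleting any one of the three equations
-- makes compilation fail instead of merely emitting a warning.
describe :: FieldState -> String
describe Hidden    = "hidden"
describe ReadOnly  = "read-only"
describe ReadWrite = "read-write"</pre>
</p>
<p>
The more sweeping alternative is the plain <code>-Werror</code> flag, which turns every enabled warning into an error.
</p>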
<h3 id="52ba967712374cd9aa91b21bc61aa8a1">
Conclusion <a href="#52ba967712374cd9aa91b21bc61aa8a1">#</a>
</h3>
<p>
I love languages with algebraic data types because they don't just enable modelling like this, they <em>encourage</em> it. This makes it much easier to write code that handles various special cases that I'd easily overlook in other languages. In languages like F# and Haskell, the compiler will tell you if you forgot to deal with a combination.
</p>
</div><hr>
Replacing Mock and Stub with a Fake https://blog.ploeh.dk/2023/08/14/replacing-mock-and-stub-with-a-fake 2023-08-14T07:23:00+00:00 Mark Seemann
<div id="post">
<p>
<em>A simple C# example.</em>
</p>
<p>
A reader recently wrote me about my 2013 article <a href="/2013/10/23/mocks-for-commands-stubs-for-queries">Mocks for Commands, Stubs for Queries</a>, commenting that the 'final' code looks suspect. Since it looks like the following, that's hardly an overstatement.
</p>
<p>
<pre><span style="color:blue;">public</span> User <span style="font-weight:bold;color:#74531f;">GetUser</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">userId</span>)
{
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">u</span> = <span style="color:blue;">this</span>.userRepository.Read(userId);
    <span style="font-weight:bold;color:#8f08c4;">if</span> (u.Id == 0)
        <span style="color:blue;">this</span>.userRepository.Create(1234);
    <span style="font-weight:bold;color:#8f08c4;">return</span> u;
}</pre>
</p>
<p>
Can you spot what's wrong?
</p>
<h3 id="a1c674ca5ff049ddac08a7618f95e57b">
Missing test cases <a href="#a1c674ca5ff049ddac08a7618f95e57b">#</a>
</h3>
<p>
You might point out that this example seems to violate <a href="https://en.wikipedia.org/wiki/Command%E2%80%93query_separation">Command Query Separation</a>, and probably other design principles as well. I agree that the example is a bit odd, but that's not what I have in mind.
</p>
<p>
The problem with the above example is that while it correctly calls the <code>Read</code> method with the <code>userId</code> parameter, it calls <code>Create</code> with the hardcoded constant <code>1234</code>. It really ought to call <code>Create</code> with <code>userId</code>.
</p>
<p>
Does this mean that the technique that I described in 2013 is wrong? I don't think so. Rather, I left the code in a rather unhelpful state. What I had in mind with that article was the technique I called <em>data flow verification</em>. As soon as I had delivered that message, I was, according to my own goals, done. I wrapped up the article, leaving the code as shown above.
</p>
<p>
As the reader remarked, it's noteworthy that an article about better unit testing leaves the System Under Test (SUT) in an obviously defective state.
</p>
<p>
The short response is that at least one test case is missing. Since this was only demo code to show an example, the entire test suite is this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">SomeControllerTests</span>
{
[Theory]
[InlineData(1234)]
[InlineData(9876)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">GetUserReturnsCorrectValue</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">userId</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">expected</span> = <span style="color:blue;">new</span> User();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">td</span> = <span style="color:blue;">new</span> Mock<IUserRepository>();
td.Setup(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Read(userId)).Returns(expected);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> SomeController(td.Object);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = sut.GetUser(userId);
Assert.Equal(expected, actual);
}
[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">UserIsSavedIfItDoesNotExist</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">td</span> = <span style="color:blue;">new</span> Mock<IUserRepository>();
td.Setup(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Read(1234)).Returns(<span style="color:blue;">new</span> User { Id = 0 });
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> SomeController(td.Object);
sut.GetUser(1234);
td.Verify(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Create(1234));
}
}</pre>
</p>
<p>
There are three test cases: two for the parametrised <code>GetUserReturnsCorrectValue</code> method and one test case for the <code>UserIsSavedIfItDoesNotExist</code> test. Since the latter only verifies the hardcoded value <code>1234</code>, the <a href="/2019/10/07/devils-advocate">Devil's advocate</a> can get by with using that hardcoded value as well.
</p>
<h3 id="f0bc9ac185d345c29e61efec2ee7e6b9">
Adding a test case <a href="#f0bc9ac185d345c29e61efec2ee7e6b9">#</a>
</h3>
<p>
The solution to that problem is simple enough. Add another test case by converting <code>UserIsSavedIfItDoesNotExist</code> to a parametrised test:
</p>
<p>
<pre>[Theory]
[InlineData(1234)]
[InlineData(9876)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">UserIsSavedIfItDoesNotExist</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">userId</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">td</span> = <span style="color:blue;">new</span> Mock<IUserRepository>();
td.Setup(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Read(userId)).Returns(<span style="color:blue;">new</span> User { Id = 0 });
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> SomeController(td.Object);
sut.GetUser(userId);
td.Verify(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Create(userId));
}</pre>
</p>
<p>
There's no reason to edit the other test method; this should be enough to elicit a change to the SUT:
</p>
<p>
<pre><span style="color:blue;">public</span> User <span style="font-weight:bold;color:#74531f;">GetUser</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">userId</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">u</span> = <span style="color:blue;">this</span>.userRepository.Read(userId);
<span style="font-weight:bold;color:#8f08c4;">if</span> (u.Id == 0)
<span style="color:blue;">this</span>.userRepository.Create(userId);
<span style="font-weight:bold;color:#8f08c4;">return</span> u;
}</pre>
</p>
<p>
When you use <a href="http://xunitpatterns.com/Mock%20Object.html">Mocks</a> (or, rather, <a href="http://xunitpatterns.com/Test%20Spy.html">Spies</a>) and <a href="http://xunitpatterns.com/Test%20Stub.html">Stubs</a>, the Data Flow Verification technique is useful.
</p>
<p>
On the other hand, I no longer use Spies or Stubs since <a href="/2022/10/17/stubs-and-mocks-break-encapsulation">they tend to break encapsulation</a>.
</p>
<h3 id="0864ba24b8a841828c3aaba0bc84877c">
Fake <a href="#0864ba24b8a841828c3aaba0bc84877c">#</a>
</h3>
<p>
These days, I tend to <a href="https://stackoverflow.blog/2022/01/03/favor-real-dependencies-for-unit-testing/">only model real application dependencies as Test Doubles</a>, and when I do, I use <a href="http://xunitpatterns.com/Fake%20Object.html">Fakes</a>.
</p>
<p>
<img src="/content/binary/dos-equis-fakes.jpg" alt="Dos Equis meme with the text: I don't always use Test Doubles, but when I do, I use Fakes.">
</p>
<p>
While the article series <a href="/2019/02/18/from-interaction-based-to-state-based-testing">From interaction-based to state-based testing</a> goes into more details, I think that this small example is a good opportunity to demonstrate the technique.
</p>
<p>
The <code>IUserRepository</code> interface is defined like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IUserRepository</span>
{
User <span style="font-weight:bold;color:#74531f;">Read</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">userId</span>);
<span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Create</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">userId</span>);
}</pre>
</p>
<p>
A typical Fake is an in-memory collection:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">FakeUserRepository</span> : Collection<User>, IUserRepository
{
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Create</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">userId</span>)
{
Add(<span style="color:blue;">new</span> User { Id = userId });
}
<span style="color:blue;">public</span> User <span style="font-weight:bold;color:#74531f;">Read</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">userId</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">user</span> = <span style="color:blue;">this</span>.SingleOrDefault(<span style="font-weight:bold;color:#1f377f;">u</span> => u.Id == userId);
<span style="font-weight:bold;color:#8f08c4;">if</span> (user == <span style="color:blue;">null</span>)
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> User { Id = 0 };
<span style="font-weight:bold;color:#8f08c4;">return</span> user;
}
}</pre>
</p>
<p>
In my experience, they're typically easy to implement by inheriting from a collection base class. Such an object exhibits typical traits of a Fake object: It fulfils the implied contract, but it lacks some of the 'ilities'.
</p>
<p>
The contract of a Repository is typically that if you add an Entity, you'd expect to be able to retrieve it later. If the Repository offers a <code>Delete</code> method (this one doesn't), you'd expect the deleted Entity to be gone, so that you <em>can't</em> retrieve it. And so on. The <code>FakeUserRepository</code> class fulfils such a contract.
</p>
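<p>
To make that contract concrete, here's a minimal sketch of a round-trip check against the Fake. It repeats the interface and the Fake from above so that it can run on its own. The <code>User</code> class is an assumption: a stand-in with only an <code>Id</code> property, since the real class isn't shown here. The top-level statements stand in for a test method.
</p>
<p>
<pre>using System;
using System.Collections.ObjectModel;
using System.Linq;

// Round-trip: an Entity that was created can be read back.
var db = new FakeUserRepository();
db.Create(1234);
Console.WriteLine(db.Read(1234).Id); // 1234

// An unknown ID yields the 'empty' User with Id 0, and nothing is added.
Console.WriteLine(db.Read(9876).Id); // 0
Console.WriteLine(db.Count);         // 1

// Hypothetical minimal User; the real class may have more members.
public sealed class User
{
    public int Id { get; set; }
}

public interface IUserRepository
{
    User Read(int userId);
    void Create(int userId);
}

public sealed class FakeUserRepository : Collection<User>, IUserRepository
{
    public void Create(int userId)
    {
        Add(new User { Id = userId });
    }

    public User Read(int userId)
    {
        var user = this.SingleOrDefault(u => u.Id == userId);
        if (user == null)
            return new User { Id = 0 };
        return user;
    }
}</pre>
</p>
<p>
A check like this says nothing about persistence or thread-safety; it only exercises the implied contract that whatever goes in can be read back.
</p>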
<p>
On the other hand, you'd also expect a proper Repository implementation to support more than that:
</p>
<ul>
<li>You'd expect a proper implementation to persist data so that you can reboot or change computers without losing data.</li>
<li>You'd expect a proper implementation to correctly handle multiple threads.</li>
<li>You <em>may</em> expect a proper implementation to support <a href="https://en.wikipedia.org/wiki/ACID">ACID</a> transactions.</li>
</ul>
<p>
The <code>FakeUserRepository</code> does none of that, but in the context of a unit test, it doesn't matter. The data exists as long as the object exists, and that's until it goes out of scope. As long as a test needs the Repository, it remains in scope, and the data is there.
</p>
<p>
Likewise, each test runs in a single thread. Even when tests run in parallel, each test has its own Fake object, so there's no shared state. Therefore, even though <code>FakeUserRepository</code> isn't thread-safe, it doesn't have to be.
</p>
<h3 id="a479805231124c1e8d74856a2cce2762">
Testing with the Fake <a href="#a479805231124c1e8d74856a2cce2762">#</a>
</h3>
<p>
You can now rewrite the tests to use <code>FakeUserRepository</code>:
</p>
<p>
<pre>[Theory]
[InlineData(1234)]
[InlineData(9876)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">GetUserReturnsCorrectValue</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">userId</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">expected</span> = <span style="color:blue;">new</span> User { Id = userId };
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">db</span> = <span style="color:blue;">new</span> FakeUserRepository { expected };
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> SomeController(db);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = sut.GetUser(userId);
Assert.Equal(expected, actual);
}
[Theory]
[InlineData(1234)]
[InlineData(9876)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">UserIsSavedIfItDoesNotExist</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">userId</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">db</span> = <span style="color:blue;">new</span> FakeUserRepository();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> SomeController(db);
sut.GetUser(userId);
Assert.Single(db, <span style="font-weight:bold;color:#1f377f;">u</span> => u.Id == userId);
}</pre>
</p>
<p>
Instead of asking a Spy whether or not a particular method was called (which is an implementation detail), the <code>UserIsSavedIfItDoesNotExist</code> test verifies the posterior state of the database.
</p>
<h3 id="5ce6d826519d44ec905096a2098513ca">
Conclusion <a href="#5ce6d826519d44ec905096a2098513ca">#</a>
</h3>
<p>
In my experience, using Fakes simplifies unit tests. While you may have to edit the Fake implementation from time to time, you edit that code in a single place. The alternative is to edit <em>all</em> affected tests, every time you change something about a dependency. That alternative is also known as <a href="https://en.wikipedia.org/wiki/Shotgun_surgery">Shotgun Surgery</a>, and is considered an antipattern.
</p>
<p>
The code base that accompanies my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a> has more realistic examples of this technique, and much else.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="0afe67b375254fe193a3fd10234a1ce9">
<div class="comment-author"><a href="https://github.com/AmiradelBeyg">AmirB</a> <a href="#0afe67b375254fe193a3fd10234a1ce9">#</a></div>
<div class="comment-content">
<p>
<p>
Hi Mark,
</p>
<p>
Firstly, thank you for another insightful article.
</p>
<p>
I'm curious about using Fakes and testing exceptions. In scenarios where dynamic mocks (like Moq) are employed, we can mock a method to throw an exception, allowing us to test the expected behavior of the System Under Test (SUT).
In your example, if we were using Moq, we could create a test to mock the UserRepository's Read method to throw a specific exception (e.g., SqlException). This way, we could ensure that the controller responds appropriately, perhaps with an internal server response.
However, I'm unsure about how to achieve a similar test using Fakes. Is this type of test suitable for Fakes, or do such tests not align with the intended use of Fakes? Personally, I avoid using try-catch blocks in repositories or controllers and prefer handling exceptions in middleware (e.g., ErrorHandler). In such cases, I write separate unit tests for the middleware. Could this be a more fitting approach?
Your guidance would be much appreciated.
</p>
<p>
(And yes, I remember your advice about framing questions —it's in your 'Code that Fits in Your Head' book! :D )
</p>
<p>
Thanks
</p>
</p>
</div>
<div class="comment-date">2024-01-15 03:10 UTC</div>
</div>
<div class="comment" id="d0ac4c5ecf444fb2a18b993ca16822de">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#d0ac4c5ecf444fb2a18b993ca16822de">#</a></div>
<div class="comment-content">
<p>
Thank you for writing. That's a question that warrants an article or two. I've now published an article titled <a href="/2024/01/29/error-categories-and-category-errors">Error categories and category errors</a>. It's not a direct answer to your question, but I found it useful to first outline my thinking on errors in general.
</p>
<p>
I'll post an update here when I also have an answer to your specific question.
</p>
</div>
<div class="comment-date">2024-01-30 7:13 UTC</div>
</div>
<div class="comment" id="5844fd8b3ca94d318c5295bc1b3e2c80">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#5844fd8b3ca94d318c5295bc1b3e2c80">#</a></div>
<div class="comment-content">
<p>
AmirB, once again, thank you for writing. I've now published an article titled <a href="/2024/02/26/testing-exceptions">Testing exceptions</a> that attempts to answer your question.
</p>
</div>
<div class="comment-date">2024-02-26 6:57 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
NonEmpty catamorphism
https://blog.ploeh.dk/2023/08/07/nonempty-catamorphism
2023-08-07T11:40:00+00:00
Mark Seemann
<div id="post">
<p>
<em>The universal API for generic non-empty collections, with examples in C# and Haskell.</em>
</p>
<p>
This article is part of an <a href="/2019/04/29/catamorphisms">article series about catamorphisms</a>. A catamorphism is a <a href="/2017/10/04/from-design-patterns-to-category-theory">universal abstraction</a> that describes how to digest a data structure into a potentially more compact value.
</p>
<p>
I was recently doing some work that required a data structure like a collection, but with the additional constraint that it should be guaranteed to have at least one element. I've known about <a href="https://www.haskell.org/">Haskell</a>'s <a href="https://hackage.haskell.org/package/base/docs/Data-List-NonEmpty.html">NonEmpty</a> type, and <a href="/2017/12/11/semigroups-accumulate">how to port it to C#</a>, for years. This time I needed to implement it in a third language, and since I had a little extra time available, I thought it'd be interesting to pursue a conjecture of mine: It seems as though you can implement most (all?) of a generic data structure's API based on its catamorphism.
</p>
<p>
While I could make a guess as to how a catamorphism might look for a non-empty collection, I wasn't sure. A quick web search revealed nothing conclusive, so I decided to deduce it from first principles. As this article series demonstrates, you can derive the catamorphism from a type's isomorphic <a href="https://bartoszmilewski.com/2017/02/28/f-algebras/">F-algebra</a>.
</p>
<p>
The beginning of this article presents the catamorphism in C#, with an example. The rest of the article describes how to deduce the catamorphism. That part of the article presents my work in Haskell. Readers not comfortable with Haskell can just read the first part, and consider the rest of the article as an optional appendix.
</p>
<h3 id="1e7c622eea7d4a7bad2edc9958d865ce">
C# catamorphism <a href="#1e7c622eea7d4a7bad2edc9958d865ce">#</a>
</h3>
<p>
This article will use a custom C# class called <code><span style="color:#2b91af;">NonEmptyCollection</span><<span style="color:#2b91af;">T</span>></code>, which is near-identical to the <code><span style="color:#2b91af;">NotEmptyCollection</span><<span style="color:#2b91af;">T</span>></code> originally introduced in the article <a href="/2017/12/11/semigroups-accumulate">Semigroups accumulate</a>.
</p>
<p>
I don't know why I originally chose to name the class <code>NotEmptyCollection</code> instead of <code>NonEmptyCollection</code>, but it's annoyed me ever since. I've finally decided to rectify that mistake, so from now on, the name is <code>NonEmptyCollection</code>.
</p>
<p>
The catamorphism for <code>NonEmptyCollection</code> is this instance method:
</p>
<p>
<pre><span style="color:blue;">public</span> TResult Aggregate<<span style="color:#2b91af;">TResult</span>>(Func<T, IReadOnlyCollection<T>, TResult> algebra)
{
<span style="color:blue;">return</span> algebra(Head, Tail);
}</pre>
</p>
<p>
Because the <code>NonEmptyCollection</code> class is really just a glorified tuple, the <code>algebra</code> is any function which produces a single value from the two constituent values.
</p>
<p>
It's easy to fall into the trap of thinking of the catamorphism as 'reducing' the data structure to a more compact form. While this is a common kind of operation, loss of data is not inevitable. You can, for example, return a new collection, essentially doing nothing:
</p>
<p>
<pre><span style="color:blue;">var</span> nec = <span style="color:blue;">new</span> NonEmptyCollection<<span style="color:blue;">int</span>>(42, 1337, 2112, 666);
<span style="color:blue;">var</span> same = nec.Aggregate((x, xs) => <span style="color:blue;">new</span> NonEmptyCollection<<span style="color:blue;">int</span>>(x, xs.ToArray()));</pre>
</p>
<p>
This <code>Aggregate</code> method enables you to safely find a maximum value:
</p>
<p>
<pre><span style="color:blue;">var</span> nec = <span style="color:blue;">new</span> NonEmptyCollection<<span style="color:blue;">int</span>>(42, 1337, 2112, 666);
<span style="color:blue;">var</span> max = nec.Aggregate((x, xs) => xs.Aggregate(x, Math.Max));</pre>
</p>
<p>
or to <a href="/2020/02/03/non-exceptional-averages">safely calculate an average</a>:
</p>
<p>
<pre><span style="color:blue;">var</span> nec = <span style="color:blue;">new</span> NonEmptyCollection<<span style="color:blue;">int</span>>(42, 1337, 2112, 666);
<span style="color:blue;">var</span> average = nec.Aggregate((x, xs) => xs.Aggregate(x, (a, b) => a + b) / (xs.Count + 1.0));</pre>
</p>
<p>
Both of the last two examples use the built-in <a href="https://learn.microsoft.com/dotnet/api/system.linq.enumerable.aggregate">Aggregate</a> function to accumulate the <code>xs</code>. They use the overload that takes a seed, for which they supply <code>x</code>. This means that there's guaranteed to be at least that one value.
</p>
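<p>
As a minimal sketch of why the seed matters, the following finds a minimum, both for a collection with several values and for one with only a head. The <code>NonEmptyCollection</code> class shown here is a stripped-down stand-in, not the real class: I assume a constructor that takes a head and a <code>params</code> tail (consistent with the usage above) and <code>Head</code> and <code>Tail</code> properties (the names used by <code>Aggregate</code>).
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Linq;

var nec = new NonEmptyCollection<int>(42, 1337, 2112, 666);
// Seeded with the head, so the built-in Aggregate always has a value to return.
var min = nec.Aggregate((x, xs) => xs.Aggregate(x, Math.Min));
Console.WriteLine(min); // 42

// With only a head, the tail is empty and the seed is returned unchanged.
var single = new NonEmptyCollection<int>(7);
Console.WriteLine(single.Aggregate((x, xs) => xs.Aggregate(x, Math.Min))); // 7

// Stand-in for the real class; only the members needed for this example.
public sealed class NonEmptyCollection<T>
{
    public NonEmptyCollection(T head, params T[] tail)
    {
        Head = head;
        Tail = tail;
    }

    public T Head { get; }
    public IReadOnlyCollection<T> Tail { get; }

    public TResult Aggregate<TResult>(Func<T, IReadOnlyCollection<T>, TResult> algebra)
    {
        return algebra(Head, Tail);
    }
}</pre>
</p>
<p>
Contrast this with the seedless overload of the built-in <code>Aggregate</code> method, which throws an <code>InvalidOperationException</code> when the sequence is empty.
</p>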
<p>
The catamorphism given here is not unique. You can create a trivial variation by swapping the two function arguments, so that <code>x</code> comes after <code>xs</code>.
</p>
<h3 id="3b188484128d4cef935a13426d8fb51b">
NonEmpty F-algebra <a href="#3b188484128d4cef935a13426d8fb51b">#</a>
</h3>
<p>
As in the <a href="/2019/05/27/list-catamorphism">previous article</a>, I'll use <code>Fix</code> and <code>cata</code> as explained in <a href="https://bartoszmilewski.com">Bartosz Milewski</a>'s excellent <a href="https://bartoszmilewski.com/2017/02/28/f-algebras/">article on F-algebras</a>.
</p>
<p>
As always, start with the underlying endofunctor:
</p>
<p>
<pre><span style="color:blue;">data</span> NonEmptyF a c = NonEmptyF { <span style="color:blue;">head</span> :: a, <span style="color:blue;">tail</span> :: ListFix a }
<span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Eq</span>, <span style="color:#2b91af;">Show</span>, <span style="color:#2b91af;">Read</span>)
<span style="color:blue;">instance</span> <span style="color:blue;">Functor</span> (<span style="color:blue;">NonEmptyF</span> a) <span style="color:blue;">where</span>
<span style="color:blue;">fmap</span> _ (NonEmptyF x xs) = NonEmptyF x xs</pre>
</p>
<p>
Instead of using Haskell's standard list (<code>[]</code>) for the tail, I've used <code>ListFix</code> from <a href="/2019/05/27/list-catamorphism">the article on list catamorphism</a>. This should, hopefully, demonstrate how you can build on already established definitions derived from first principles.
</p>
<p>
Since a non-empty collection is really just a glorified tuple of <em>head</em> and <em>tail</em>, there's no recursion, and thus, the carrier type <code>c</code> is not used. You could argue that going through all of these motions is overkill, but it still provides some insights. This is similar to the <a href="/2019/05/06/boolean-catamorphism">Boolean catamorphism</a> and <a href="/2019/05/20/maybe-catamorphism">Maybe catamorphism</a>.
</p>
<p>
The <code>fmap</code> function ignores the mapping argument (often called <code>f</code>), since the <code>Functor</code> instance maps <code>NonEmptyF a c</code> to <code>NonEmptyF a c1</code>, but the <code>c</code> or <code>c1</code> type is not used.
</p>
<p>
As was the case when deducing the recent catamorphisms, Haskell isn't too happy about defining instances for a type like <code>Fix (NonEmptyF a)</code>. To address that problem, you can introduce a <code>newtype</code> wrapper:
</p>
<p>
<pre><span style="color:blue;">newtype</span> NonEmptyFix a =
NonEmptyFix { unNonEmptyFix :: Fix (NonEmptyF a) } <span style="color:blue;">deriving</span> (<span style="color:#2b91af;">Eq</span>, <span style="color:#2b91af;">Show</span>, <span style="color:#2b91af;">Read</span>)</pre>
</p>
<p>
You can define <code>Functor</code>, <code>Applicative</code>, <code>Monad</code>, etc. instances for this type without resorting to any funky GHC extensions. Keep in mind that, ultimately, the purpose of all this code is just to figure out what the catamorphism looks like. This code isn't intended for actual use.
</p>
<p>
A helper function makes it easier to define <code>NonEmptyFix</code> values:
</p>
<p>
<pre><span style="color:#2b91af;">createNonEmptyF</span> <span style="color:blue;">::</span> a <span style="color:blue;">-></span> <span style="color:blue;">ListFix</span> a <span style="color:blue;">-></span> <span style="color:blue;">NonEmptyFix</span> a
createNonEmptyF x xs = NonEmptyFix $ Fix $ NonEmptyF x xs</pre>
</p>
<p>
Here's how to use it:
</p>
<p>
<pre>ghci> createNonEmptyF 42 $ consF 1337 $ consF 2112 nilF
NonEmptyFix {
unNonEmptyFix = Fix (NonEmptyF 42 (ListFix (Fix (ConsF 1337 (Fix (ConsF 2112 (Fix NilF)))))))}</pre>
</p>
<p>
While this is quite verbose, keep in mind that the code shown here isn't meant to be used in practice. The goal is only to deduce catamorphisms from more basic universal abstractions, and you now have all you need to do that.
</p>
<h3 id="ccc6b13fcf794ec39f98fdd3e0c61460">
Haskell catamorphism <a href="#ccc6b13fcf794ec39f98fdd3e0c61460">#</a>
</h3>
<p>
At this point, you have two out of three elements of an F-algebra. You have an endofunctor (<code>NonEmptyF a</code>), and an object <code>c</code>, but you still need to find a morphism <code>NonEmptyF a c -> c</code>. Notice that the algebra you have to find is the function that reduces the functor to its <em>carrier type</em> <code>c</code>, not the 'data type' <code>a</code>. This takes some time to get used to, but that's how catamorphisms work. This doesn't mean, however, that you get to ignore <code>a</code>, as you'll see.
</p>
<p>
As in the previous articles, start by writing a function that will become the catamorphism, based on <code>cata</code>:
</p>
<p>
<pre>nonEmptyF = cata alg . unNonEmptyFix
<span style="color:blue;">where</span> alg (NonEmptyF x xs) = <span style="color:blue;">undefined</span></pre>
</p>
<p>
While this compiles, with its <code>undefined</code> implementation of <code>alg</code>, it obviously doesn't do anything useful. I find, however, that it helps me think. How can you return a value of the type <code>c</code> from <code>alg</code>? You could pass a function argument to the <code>nonEmptyF</code> function and use it with <code>x</code> and <code>xs</code>:
</p>
<p>
<pre><span style="color:#2b91af;">nonEmptyF</span> <span style="color:blue;">::</span> (a <span style="color:blue;">-></span> <span style="color:blue;">ListFix</span> a <span style="color:blue;">-></span> c) <span style="color:blue;">-></span> <span style="color:blue;">NonEmptyFix</span> a <span style="color:blue;">-></span> c
nonEmptyF f = cata alg . unNonEmptyFix
<span style="color:blue;">where</span> alg (NonEmptyF x xs) = f x xs</pre>
</p>
<p>
This works. Since <code>cata</code> has the type <code>Functor f => (f a -> a) -> Fix f -> a</code>, that means that <code>alg</code> has the type <code>f a -> a</code>. In the case of <code>NonEmptyF</code>, the compiler infers that the <code>alg</code> function has the type <code>NonEmptyF a c -> c1</code>, which fits the bill, since <code>c</code> may be the same type as <code>c1</code>.
</p>
<p>
This, then, is the catamorphism for a non-empty collection. This one is just a single function. It's still not the only possible catamorphism, since you could trivially flip the arguments to <code>f</code>.
</p>
<p>
I've chosen this representation because the arguments <code>x</code> and <code>xs</code> are defined in the same order as the order of <code>head</code> before <code>tail</code>. Notice how this is the same order as the above C# <code>Aggregate</code> method.
</p>
<h3 id="2ecc4634c63e40e4a9d47be4bffa4d5f">
Basis <a href="#2ecc4634c63e40e4a9d47be4bffa4d5f">#</a>
</h3>
<p>
You can implement most other useful functionality with <code>nonEmptyF</code>. Here's the <code>Semigroup</code> instance and a useful helper function:
</p>
<p>
<pre><span style="color:#2b91af;">toListFix</span> <span style="color:blue;">::</span> <span style="color:blue;">NonEmptyFix</span> a <span style="color:blue;">-></span> <span style="color:blue;">ListFix</span> a
toListFix = nonEmptyF consF
<span style="color:blue;">instance</span> <span style="color:blue;">Semigroup</span> (<span style="color:blue;">NonEmptyFix</span> a) <span style="color:blue;">where</span>
xs <> ys =
nonEmptyF (\x xs' -> createNonEmptyF x $ xs' <> toListFix ys) xs</pre>
</p>
<p>
The implementation uses <code>nonEmptyF</code> to operate on <code>xs</code>. Inside the lambda expression, it converts <code>ys</code> to a list, and uses <a href="/2019/05/27/list-catamorphism">the <code>ListFix</code> <code>Semigroup</code> instance</a> to concatenate the tail <code>xs'</code> with it.
</p>
<p>
Here's the <code>Functor</code> instance:
</p>
<p>
<pre><span style="color:blue;">instance</span> <span style="color:blue;">Functor</span> <span style="color:blue;">NonEmptyFix</span> <span style="color:blue;">where</span>
<span style="color:blue;">fmap</span> f = nonEmptyF (\x xs -> createNonEmptyF (f x) $ <span style="color:blue;">fmap</span> f xs)</pre>
</p>
<p>
Like the <code>Semigroup</code> instance, this <code>fmap</code> implementation uses <code>fmap</code> on <code>xs</code>, which is the <code>ListFix</code> <code>Functor</code> instance.
</p>
<p>
The <code>Applicative</code> instance is much harder to write from scratch (or, at least, I couldn't come up with a simpler way):
</p>
<p>
<pre><span style="color:blue;">instance</span> <span style="color:blue;">Applicative</span> <span style="color:blue;">NonEmptyFix</span> <span style="color:blue;">where</span>
pure x = createNonEmptyF x nilF
liftA2 f xs ys =
nonEmptyF
(\x xs' ->
nonEmptyF
(\y ys' ->
createNonEmptyF
(f x y)
(liftA2 f (consF x nilF) ys' <> liftA2 f xs' (consF y ys')))
ys)
xs</pre>
</p>
<p>
While that looks complicated, it's not <em>that</em> bad. It uses <code>nonEmptyF</code> to 'loop' over the <code>xs</code>, and then a nested call to <code>nonEmptyF</code> to 'loop' over the <code>ys</code>. The inner lambda expression uses <code>f x y</code> to calculate the head, but it also needs to calculate all other combinations of values in <code>xs</code> and <code>ys</code>.
</p>
<p>
<img src="/content/binary/non-empty-applicative-x-y.png" alt="Boxes labelled x, x1, x2, x3 over other boxes labelled y, y1, y2, y3. The x and y box are connected by an arrow labelled f.">
</p>
<p>
First, it keeps <code>x</code> fixed and 'loops' through all the remaining <code>ys'</code>; that's the <code>liftA2 f (consF x nilF) ys'</code> part:
</p>
<p>
<img src="/content/binary/non-empty-applicative-x-ys.png" alt="Boxes labelled x, x1, x2, x3 over other boxes labelled y, y1, y2, y3. The x and y1, y2, y3 boxes are connected by three arrows labelled with a single f.">
</p>
<p>
Then it 'loops' over all the remaining <code>xs'</code> and all the <code>ys</code>; that is, <code>liftA2 f xs' (consF y ys')</code>.
</p>
<p>
<img src="/content/binary/non-empty-applicative-xs-ys.png" alt="Boxes labelled x, x1, x2, x3 over other boxes labelled y, y1, y2, y3. The x1, x2, x3 boxes are connected to the y, y1, y2, y3 boxes by arrows labelled with a single f.">
</p>
<p>
Both <code>liftA2</code> calls inside the lambda expression use the <code>ListFix</code> <code>Applicative</code> instance.
</p>
<p>
You'll be happy to see, I think, that the <code>Monad</code> instance is simpler:
</p>
<p>
<pre><span style="color:blue;">instance</span> <span style="color:blue;">Monad</span> <span style="color:blue;">NonEmptyFix</span> <span style="color:blue;">where</span>
xs >>= f =
nonEmptyF (\x xs' ->
nonEmptyF
(\y ys -> createNonEmptyF y $ ys <> (xs' >>= toListFix . f)) (f x)) xs</pre>
</p>
<p>
And fortunately, <code>Foldable</code> and <code>Traversable</code> are even simpler:
</p>
<p>
<pre><span style="color:blue;">instance</span> <span style="color:blue;">Foldable</span> <span style="color:blue;">NonEmptyFix</span> <span style="color:blue;">where</span>
<span style="color:blue;">foldr</span> f seed = nonEmptyF (\x xs -> f x $ <span style="color:blue;">foldr</span> f seed xs)
<span style="color:blue;">instance</span> <span style="color:blue;">Traversable</span> <span style="color:blue;">NonEmptyFix</span> <span style="color:blue;">where</span>
traverse f = nonEmptyF (\x xs -> liftA2 createNonEmptyF (f x) (traverse f xs))</pre>
</p>
<p>
Finally, you can implement conversions to and from the <code>NonEmpty</code> type from <code>Data.List.NonEmpty</code>:
</p>
<p>
<pre><span style="color:#2b91af;">toNonEmpty</span> <span style="color:blue;">::</span> <span style="color:blue;">NonEmptyFix</span> a <span style="color:blue;">-></span> <span style="color:blue;">NonEmpty</span> a
toNonEmpty = nonEmptyF (\x xs -> x :| toList xs)
<span style="color:#2b91af;">fromNonEmpty</span> <span style="color:blue;">::</span> <span style="color:blue;">NonEmpty</span> a <span style="color:blue;">-></span> <span style="color:blue;">NonEmptyFix</span> a
fromNonEmpty (x :| xs) = createNonEmptyF x $ fromList xs</pre>
</p>
<p>
This demonstrates that <code>NonEmptyFix</code> is isomorphic to <code>NonEmpty</code>.
</p>
<h3 id="f57599f8fcbb4b02af85816dff99a790">
Conclusion <a href="#f57599f8fcbb4b02af85816dff99a790">#</a>
</h3>
<p>
The catamorphism for a non-empty collection is a single function that produces a single value from the head and the tail of the collection. While it's possible to implement a 'standard fold' (<code>foldr</code> in Haskell), the non-empty catamorphism doesn't require a seed to get started. The data structure guarantees that there's always at least one value available, and this value can then be used to 'kick off' a fold.
</p>
<p>
In C# one can define the catamorphism as the above <code>Aggregate</code> method. You could then define all other instance functions based on <code>Aggregate</code>.
</p>
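<p>
As a sketch of what that could look like, here are two functions defined exclusively in terms of <code>Aggregate</code>. They're written as extension methods rather than instance methods only to keep the example short, and the names <code>Select</code> and <code>Count</code> are my choice for illustration. As before, the <code>NonEmptyCollection</code> class is a stripped-down stand-in with an assumed constructor and <code>Head</code> and <code>Tail</code> properties.
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Linq;

var nec = new NonEmptyCollection<int>(42, 1337, 2112, 666);
var doubled = nec.Select(i => i * 2);
Console.WriteLine(doubled.Head);    // 84
Console.WriteLine(doubled.Count()); // 4

public static class NonEmptyCollectionExtensions
{
    // Maps the head and every tail element; the result is non-empty by construction.
    public static NonEmptyCollection<TResult> Select<T, TResult>(
        this NonEmptyCollection<T> source,
        Func<T, TResult> selector)
    {
        return source.Aggregate((x, xs) =>
            new NonEmptyCollection<TResult>(selector(x), xs.Select(selector).ToArray()));
    }

    // The head always counts for one, so the result is never less than 1.
    public static int Count<T>(this NonEmptyCollection<T> source)
    {
        return source.Aggregate((x, xs) => xs.Count + 1);
    }
}

// Stand-in for the real class; only the members needed for this example.
public sealed class NonEmptyCollection<T>
{
    public NonEmptyCollection(T head, params T[] tail)
    {
        Head = head;
        Tail = tail;
    }

    public T Head { get; }
    public IReadOnlyCollection<T> Tail { get; }

    public TResult Aggregate<TResult>(Func<T, IReadOnlyCollection<T>, TResult> algebra)
    {
        return algebra(Head, Tail);
    }
}</pre>
</p>
<p>
Neither function reads <code>Head</code> or <code>Tail</code> directly; each only goes through <code>Aggregate</code>, which mirrors how the Haskell instances above are all based on <code>nonEmptyF</code>.
</p>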
<p>
<strong>Next:</strong> <a href="/2019/06/03/either-catamorphism">Either catamorphism</a>.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Test-driving the pyramid's top
https://blog.ploeh.dk/2023/07/31/test-driving-the-pyramids-top
2023-07-31T07:00:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Some thoughts on TDD related to integration and systems testing.</em>
</p>
<p>
My recent article <a href="/2023/07/17/works-on-most-machines">Works on most machines</a> elicited some responses. Upon reflection, it seems that most of the responses relate to the top of the <a href="https://martinfowler.com/bliki/TestPyramid.html">Test Pyramid</a>.
</p>
<p>
While I don't have a one-shot solution that addresses all concerns, I nonetheless hope that I can suggest some ideas and inspire a reader or two. That's all. I intend nothing of the following to be prescriptive. I describe my own professional experience: What has worked for me. Perhaps it could also work for you. Use the ideas if they inspire you. Ignore them if you find them impractical.
</p>
<h3 id="5031910a0b37420bbcea753fc9a31dd0">
The Test Pyramid <a href="#5031910a0b37420bbcea753fc9a31dd0">#</a>
</h3>
<p>
The Test Pyramid is often depicted like this:
</p>
<p>
<img src="/content/binary/standard-test-pyramid.png" alt="Standard Test Pyramid, which is really a triangle with three layers: Unit tests, integration tests, and UI tests.">
</p>
<p>
This seems to indicate that while the majority of tests should be unit tests, you should also have a substantial number of integration tests, and quite a few UI tests.
</p>
<p>
Perhaps the following is obvious, but the Test Pyramid is an idea; it's a way to communicate a concept in a compelling way. What one should take away from it, I think, is only this: The number of tests in each category should form a <a href="https://en.wikipedia.org/wiki/Total_order">total order</a>, where the <em>unit test</em> category is the maximum. In other words, you should have more unit tests than you have tests in the next category, and so on.
</p>
<p>
No-one says that you can only have three levels, or that they have to have the same height. Finally, the above figure isn't even a <a href="https://en.wikipedia.org/wiki/Pyramid_(geometry)">pyramid</a>, but rather a <a href="https://en.wikipedia.org/wiki/Triangle">triangle</a>.
</p>
<p>
I sometimes think of the Test Pyramid like this:
</p>
<p>
<img src="/content/binary/test-pyramid-perspective.png" alt="Test pyramid in perspective.">
</p>
<p>
To be honest, it's not so much whether or not the pyramid is shown in perspective, but rather that the <em>unit test</em> base is significantly more voluminous than the other levels, and that the top is quite small.
</p>
<h3 id="7b385ee20e7e4054afc625a15525285f">
Levels <a href="#7b385ee20e7e4054afc625a15525285f">#</a>
</h3>
<p>
In order to keep the above discussion as recognisable as possible, I've used the labels <em>unit tests</em>, <em>integration tests</em>, and <em>UI tests</em>. It's easy to get caught up in a discussion about how these terms are defined. Exactly what is a <em>unit test?</em> How does it differ from an <em>integration test?</em>
</p>
<p>
There's no universally accepted definition of a <em>unit test</em>, so it tends to be counter-productive to spend too much time debating the finer points of what to call the tests in each layer.
</p>
<p>
Instead, I find the following criteria useful:
</p>
<ol>
<li>In-process tests</li>
<li>Tests that involve more than one process</li>
<li>Tests that can only be performed in production</li>
</ol>
<p>
I'll describe each in a little more detail. Along the way, I'll address some of the reactions to <a href="/2023/07/17/works-on-most-machines">Works on most machines</a>.
</p>
<h3 id="7501caecf9d74c26aa15661fdc7982d7">
In-process tests <a href="#7501caecf9d74c26aa15661fdc7982d7">#</a>
</h3>
<p>
The <em>in-process</em> category corresponds roughly to the Test Pyramid's <em>unit test</em> level. It includes 'traditional' unit tests such as tests of stand-alone functions or methods on objects, but also <a href="/2012/06/27/FacadeTest">Facade Tests</a>. The latter may involve multiple modules or objects, perhaps even from multiple libraries. Many people may call them <em>integration tests</em> because they integrate more than one module.
</p>
<p>
As long as an automated test runs in a single process, in memory, it tends to be fast and leave no persistent state behind. This is almost exclusively the kind of test I tend to test-drive. I often follow an <a href="/outside-in-tdd">outside-in TDD</a> process, an example of which is shown in my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>.
</p>
<p>
Consider an example from the source code that accompanies the book:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task ReserveTableAtTheVaticanCellar()
{
<span style="color:blue;">using</span> <span style="color:blue;">var</span> api = <span style="color:blue;">new</span> SelfHostedApi();
<span style="color:blue;">var</span> client = api.CreateClient();
<span style="color:blue;">var</span> timeOfDayLaterThanLastSeatingAtTheOtherRestaurants =
TimeSpan.FromHours(21.5);
<span style="color:blue;">var</span> at = DateTime.Today.AddDays(433).Add(
timeOfDayLaterThanLastSeatingAtTheOtherRestaurants);
<span style="color:blue;">var</span> dto = Some.Reservation.WithDate(at).ToDto();
<span style="color:blue;">var</span> response =
<span style="color:blue;">await</span> client.PostReservation(<span style="color:#a31515;">"The Vatican Cellar"</span>, dto);
response.EnsureSuccessStatusCode();
}</pre>
</p>
<p>
I think of a test like this as an automated acceptance test. It uses an internal test-specific domain-specific language (<a href="http://xunitpatterns.com/Test%20Utility%20Method.html">test utilities</a>) to exercise the REST service's API. It uses <a href="/2021/01/25/self-hosted-integration-tests-in-aspnet">ASP.NET self-hosting</a> to run both the service and the HTTP client in the same process.
</p>
<p>
Even though this may, at first glance, look like an integration test, it's an artefact of test-driven development. Since it does cut across both HTTP layer and domain model, some readers may think of it as an integration test. It uses a <a href="/2019/02/18/from-interaction-based-to-state-based-testing">stateful in-memory data store</a>, so it doesn't involve more than a single process.
</p>
<h3 id="2f036c4726d74c6285b1b0a759a54269">
Tests that span processes <a href="#2f036c4726d74c6285b1b0a759a54269">#</a>
</h3>
<p>
There are aspects of software that you can't easily drive with tests. I'll return to some really gnarly examples in the third category, but in between, we find concerns that are hard, but still possible to test. The reason that they are hard is often because they involve more than one process.
</p>
<p>
The most common example is data access. Many software systems save or retrieve data. With test-driven development, you're supposed to let the tests inform your API design decisions in such a way that everything that involves difficult, error-prone code is factored out of the data access layer, and into another part of the code that <em>can</em> be tested in process. This development technique ought to drain the hard-to-test components of logic, leaving behind a <a href="http://xunitpatterns.com/Humble%20Object.html">Humble Object</a>.
</p>
<p>
One reaction to <a href="/2023/07/17/works-on-most-machines">Works on most machines</a> concerns exactly that idea:
</p>
<blockquote>
<p>
"As a developer, you need to test HumbleObject's behavior."
</p>
<footer><cite><a href="https://twitter.com/ladeak87/status/1680915766764351489">ladeak</a></cite></footer>
</blockquote>
<p>
It's almost tautologically part of the definition of a Humble Object that you're <em>not</em> supposed to test it. Still, realistically, ladeak has a point.
</p>
<p>
When I wrote the example code for <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a>, I applied the Humble Object pattern to the data access component. For a good while, I had a <code>SqlReservationsRepository</code> class that was so simple, so drained of logic, that it couldn't possibly fail.
</p>
<p>
Until, of course, the inevitable happened: There was a bug in the <code>SqlReservationsRepository</code> code. Not to make a long story out of it, but even with a really low <a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">cyclomatic complexity</a>, I'd accidentally swapped two database columns when reading from a table.
</p>
<p>
Whenever possible, when I discover a bug, I first write an automated test that exposes that bug, and only then do I fix the problem. This is congruent with <a href="/2023/01/23/agilean">my lean bias</a>. If a defect can occur once, it can occur again in the future, so it's better to have a regression test.
</p>
<p>
The problem with this bug is that it was in a Humble Object. So, ladeak is right. Sooner or later, you'll have to test the Humble Object, too.
</p>
<p>
That's when I had to bite the bullet and add a test library that tests against the database.
</p>
<p>
One such test looks like this:
</p>
<p>
<pre>[Theory]
[InlineData(Grandfather.Id, <span style="color:#a31515;">"2022-06-29 12:00"</span>, <span style="color:#a31515;">"e@example.gov"</span>, <span style="color:#a31515;">"Enigma"</span>, 1)]
[InlineData(Grandfather.Id, <span style="color:#a31515;">"2022-07-27 11:40"</span>, <span style="color:#a31515;">"c@example.com"</span>, <span style="color:#a31515;">"Carlie"</span>, 2)]
[InlineData(2, <span style="color:#a31515;">"2021-09-03 14:32"</span>, <span style="color:#a31515;">"bon@example.edu"</span>, <span style="color:#a31515;">"Jovi"</span>, 4)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task CreateAndReadRoundTrip(
<span style="color:blue;">int</span> restaurantId,
<span style="color:blue;">string</span> at,
<span style="color:blue;">string</span> email,
<span style="color:blue;">string</span> name,
<span style="color:blue;">int</span> quantity)
{
<span style="color:blue;">var</span> expected = <span style="color:blue;">new</span> Reservation(
Guid.NewGuid(),
DateTime.Parse(at, CultureInfo.InvariantCulture),
<span style="color:blue;">new</span> Email(email),
<span style="color:blue;">new</span> Name(name),
quantity);
<span style="color:blue;">var</span> connectionString = ConnectionStrings.Reservations;
<span style="color:blue;">var</span> sut = <span style="color:blue;">new</span> SqlReservationsRepository(connectionString);
<span style="color:blue;">await</span> sut.Create(restaurantId, expected);
<span style="color:blue;">var</span> actual = <span style="color:blue;">await</span> sut.ReadReservation(restaurantId, expected.Id);
Assert.Equal(expected, actual);
}</pre>
</p>
<p>
The entire test runs in a special context where a database is automatically created before the test runs, and torn down once the test has completed.
</p>
<blockquote>
<p>
"When building such behavior, you can test against a shared instance of the service in your dev team or run that service on your dev machine in a container."
</p>
<footer><cite><a href="https://twitter.com/ladeak87/status/1680915837782224905">ladeak</a></cite></footer>
</blockquote>
<p>
Yes, those are two options. A third, in the spirit of <a href="/ref/goos">GOOS</a>, is to strongly favour technologies that support automation. Believe it or not, you can automate <a href="https://en.wikipedia.org/wiki/Microsoft_SQL_Server">SQL Server</a>. You don't need a Docker container for it. That's what I did in the above test.
</p>
<p>
I can see how a Docker container with an external dependency can be useful too, so I'm not trying to dismiss that technology. The point is, however, that simpler alternatives may exist. I, and others, did test-driven development for more than a decade before Docker existed.
</p>
<h3 id="4884dac317d84c70ac4824a5ee3fe922">
Tests that can only be performed in production <a href="#4884dac317d84c70ac4824a5ee3fe922">#</a>
</h3>
<p>
The last category comprises tests that you can only perform on a production system. What might be examples of that?
</p>
<p>
I've run into a few over the years. One such test is what I call a <a href="https://en.wikipedia.org/wiki/Smoke_testing_(software)">Smoke Test</a>: Metaphorically speaking, turn it on and see if it develops smoke. These kinds of tests are good at catching configuration errors. Does the web server have the right connection string to the database? A test can verify whether that's the case, but it makes no sense to run such a test on a development machine, or against a test system, or a staging environment. You want to verify that the production system is correctly configured. Only a test against the production system can do that.
</p>
<p>
For every configuration value, you may want to consider a Smoke Test.
</p>
<p>
There are other kinds of tests you can only perform in production. Sometimes, it's not technical concerns, but rather legal or financial constraints, that dictate circumstances.
</p>
<p>
A few years ago I worked with a software organisation that, among other things, integrated with the Danish <a href="https://en.wikipedia.org/wiki/Personal_identification_number_(Denmark)">personal identification number system (CPR)</a>. Things may have changed since, but back then, an organisation had to have a legal agreement with CPR before being granted access to its integration services. It's an old system (originally from 1968) with a proprietary data integration protocol.
</p>
<p>
We test-drove a parser of the data format, but that still left behind a Humble Object that would actually perform the data transfers. How do we test that Humble Object?
</p>
<p>
Back then, at least, there was no test system for the CPR service, and it was illegal to query the live system unless you had a business reason. And software testing did not constitute a legal reason.
</p>
<p>
The only <em>legal</em> option was to make the Humble Object as simple and foolproof as possible, and then observe how it worked in actual production situations. Containers wouldn't help in such a situation.
</p>
<p>
It's possible to write automated tests against production systems, but unless you're careful, they're difficult to write and maintain. At least, go easy on the assertions, since you can't assume much about the run-time data and behaviour of a live system. Smoke tests are mostly just 'pings', so can be written to be fairly maintenance-free, but you shouldn't need many of them.
</p>
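<p>
As a minimal sketch of such a 'ping', assuming an xUnit.net test project and a hypothetical <code>SMOKE_TEST_BASE_URL</code> environment variable that points at the production system, a Smoke Test could look like this. The class name, test name, and variable name are invented for illustration.
</p>
<p>
<pre>using System;
using System.Net.Http;
using System.Threading.Tasks;
using Xunit;

public sealed class SmokeTests
{
    [Fact]
    public async Task HomeResourceResponds()
    {
        // Hypothetical environment variable holding the production URL.
        var baseUrl =
            Environment.GetEnvironmentVariable("SMOKE_TEST_BASE_URL");
        Assert.False(string.IsNullOrWhiteSpace(baseUrl));

        using var client =
            new HttpClient { BaseAddress = new Uri(baseUrl!) };
        // An empty request URI means: request the base address itself.
        using var response = await client.GetAsync("");

        // Go easy on the assertions: only check that the system answers.
        response.EnsureSuccessStatusCode();
    }
}</pre>
</p>
<p>
The test deliberately asserts nothing about the response body, since little can be assumed about the data in a live system.
</p>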
<p>
Other kinds of tests against production are likely to be fragile, so it pays to minimise their number. That's the top of the pyramid.
</p>
<h3 id="32214fa81f2f4dc188b0990d4308bded">
User interfaces <a href="#32214fa81f2f4dc188b0990d4308bded">#</a>
</h3>
<p>
I no longer develop user interfaces, so take the following with a pinch of salt.
</p>
<p>
The 'original' Test Pyramid that I've depicted above has <em>UI tests</em> at the pyramid's top. That doesn't necessarily match the categories I've outlined here; don't assume parity.
</p>
<p>
UI tests may or may not involve more than one process, but they are often difficult to maintain for other reasons. Perhaps this is where the pyramid metaphor starts to break down. <a href="https://en.wikipedia.org/wiki/All_models_are_wrong">All models are wrong, but some are useful</a>.
</p>
<p>
Back when I still programmed user interfaces, I'd usually test-drive them via a <a href="https://martinfowler.com/bliki/SubcutaneousTest.html">subcutaneous API</a>, and rely on some variation of <a href="https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller">MVC</a> to keep the rendered controls in sync. Still, once in a while, you need to verify that the user interface looks as it's supposed to. Often, the best tool for that job is the good old Mark I Eyeball.
</p>
<p>
This still means that you need to run the application from time to time.
</p>
<blockquote>
<p>
"Docker is also very useful for enabling others to run your software on their machines. Recently, we've been exploring some apps that consisted of ~4 services (web servers) and a database. All of them written in different technologies (PHP, Java, C#). You don't have to setup environment variables. You don't need to have relevant SDKs to build projects etc. Just run docker command, and spin them instantly on your PC."
</p>
<footer><cite><a href="/2023/07/17/works-on-most-machines#4012c2cddcb64a068c0b06b7989a676e">qfilip</a></cite></footer>
</blockquote>
<p>
That sounds like a reasonable use case. I've never found myself in such circumstances, but I can imagine the utility that containers offer in a situation like that. Here's how I envision the scenario:
</p>
<p>
<img src="/content/binary/app-with-services-in-containers.png" alt="A box with arrows to three other boxes, which again have arrows to a database symbol.">
</p>
<p>
The boxes with rounded corners symbolise containers.
</p>
<p>
Again, my original goal with the <a href="/2023/07/17/works-on-most-machines">previous article</a> wasn't to convince you that container technologies are unequivocally bad. Rather, it was to suggest that test-driven development (TDD) solves many of the problems that people seem to think can only be solved with containers. Since TDD has many other beneficial side effects, it's worth considering instead of mindlessly reaching for containers, which may represent only a local maximum.
</p>
<p>
How could TDD address qfilip's concern?
</p>
<p>
When I test-drive software, I <a href="https://stackoverflow.blog/2022/01/03/favor-real-dependencies-for-unit-testing/">favour real dependencies</a>, and I <a href="/2019/02/18/from-interaction-based-to-state-based-testing">favour Fake objects over Mocks and Stubs</a>. Were I to return to user-interface programming today, I'd define its external dependencies as one or more interfaces, and implement a <a href="http://xunitpatterns.com/Fake%20Object.html">Fake Object</a> for each.
</p>
<p>
Not only would this enable me to simulate the external dependencies with the Fakes; if I implemented the Fakes as part of the production code, I'd even be able to spin up the system, using the Fakes instead of the real system.
</p>
<p>
<img src="/content/binary/app-with-fake-dependencies.png" alt="App box with arrows pointing to itself.">
</p>
<p>
A Fake is an implementation that 'almost works'. A common example is an in-memory collection instead of a database. It's neither persistent nor thread-safe, but it's internally consistent. What you add, you can retrieve, until you delete it again. For the purposes of starting the app in order to verify that the user interface looks correct, that should be good enough.
</p>
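<p>
To illustrate, here is a minimal sketch of such a Fake. It assumes a simplified <code>Reservation</code> record and a hypothetical <code>IReservationsRepository</code> interface; only the <code>Create</code> and <code>ReadReservation</code> calls mirror the database test shown earlier, while <code>Delete</code> and the class name are assumptions made for the example.
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Simplified stand-in for a real domain type (assumption).
public sealed record Reservation(
    Guid Id, DateTime At, string Email, string Name, int Quantity);

// Hypothetical interface describing the external dependency.
public interface IReservationsRepository
{
    Task Create(int restaurantId, Reservation reservation);
    Task&lt;Reservation?&gt; ReadReservation(int restaurantId, Guid id);
    Task Delete(int restaurantId, Guid id);
}

// A Fake: neither persistent nor thread-safe, but internally consistent.
public sealed class FakeReservationsRepository : IReservationsRepository
{
    private readonly Dictionary&lt;(int, Guid), Reservation&gt; store = new();

    public Task Create(int restaurantId, Reservation reservation)
    {
        store[(restaurantId, reservation.Id)] = reservation;
        return Task.CompletedTask;
    }

    public Task&lt;Reservation?&gt; ReadReservation(int restaurantId, Guid id)
    {
        // Yields null when nothing has been stored under that key.
        store.TryGetValue((restaurantId, id), out var reservation);
        return Task.FromResult&lt;Reservation?&gt;(reservation);
    }

    public Task Delete(int restaurantId, Guid id)
    {
        store.Remove((restaurantId, id));
        return Task.CompletedTask;
    }
}</pre>
</p>
<p>
What you <code>Create</code>, you can read back with <code>ReadReservation</code>, until you <code>Delete</code> it again. All data disappears when the process exits.
</p>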
<p>
Another related example is <a href="https://particular.net/nservicebus">NServiceBus</a>, which comes with a <a href="https://docs.particular.net/transports/learning/">file transport that is clearly labeled as not for production use</a>. While it's called the <em>Learning Transport</em>, it's also useful for exploratory testing on a development machine. While this example clearly makes use of an external resource (the file system), it illustrates how a Fake implementation can alleviate the need for a container.
</p>
<h3 id="687363d13cf24a569b1dea6a45f8771e">
Uses for containers <a href="#687363d13cf24a569b1dea6a45f8771e">#</a>
</h3>
<p>
Ultimately, it's still useful to be able to stand up an entire system, as qfilip suggests, and if containers are a good way to do that, it doesn't bother me. At the risk of sounding like a broken record, I never intended to say that containers are useless.
</p>
<p>
When I worked as a Software Development Engineer at Microsoft, I had two computers: A laptop and a rather beefy <a href="https://en.wikipedia.org/wiki/Computer_tower">tower PC</a>. I've always done all programming on laptops, so I repurposed the tower as a <a href="https://en.wikipedia.org/wiki/Microsoft_Virtual_Server">virtual server</a> with all my system's components on separate virtual machines (VMs). The database in one VM, the application server in another, and so on. I no longer remember what all the components were, but I seem to recall that I had four VMs running on that one box.
</p>
<p>
While I didn't use it much, I found it valuable to occasionally verify that all components could talk to each other on a realistic network topology. This was in 2008, and <a href="https://en.wikipedia.org/wiki/Docker_(software)">Docker wasn't around then</a>, but I could imagine it would have made that task easier.
</p>
<p>
I don't dispute that Docker and <a href="https://en.wikipedia.org/wiki/Kubernetes">Kubernetes</a> are useful, but the job of a software architect is to carefully identify the technologies on which a system should be based. The more technology dependencies you take on, the more rigid the design.
</p>
<p>
After a few decades of programming, my experience is that as a programmer and architect, I can find better alternatives than depending on container technologies. If testers and IT operators find containers useful to do their jobs, then that's fine by me. Since my code <a href="/2023/07/17/works-on-most-machines">works on most machines</a>, it works in containers, too.
</p>
<h3 id="be02c64c095e474eaa54ab1750a2d471">
Truly Humble Objects <a href="#be02c64c095e474eaa54ab1750a2d471">#</a>
</h3>
<p>
One last response, and I'll wrap this up.
</p>
<blockquote>
<p>
"As a developer, you need to test HumbleObject's behavior. What if a DatabaseConnection or a TCP conn to a message queue is down?"
</p>
<footer><cite><a href="https://twitter.com/ladeak87/status/1680915766764351489">ladeak</a></cite></footer>
</blockquote>
<p>
How should such situations be handled? There may always be special cases, but in general, I can think of two reactions:
</p>
<ul>
<li>Log the error</li>
<li>Retry the operation</li>
</ul>
<p>
Assuming that the Humble Object is a polymorphic type (i.e. inherits a base class or implements an interface), you should be able to extract each of these behaviours to general-purpose components.
</p>
<p>
In order to log errors, you can either use a <a href="https://en.wikipedia.org/wiki/Decorator_pattern">Decorator</a> or a global exception handler. Most frameworks provide a way to catch (otherwise) unhandled exceptions, exactly for this purpose, so you don't have to add such functionality to a Humble Object.
</p>
<p>
Retry logic can also be delegated to a third-party component. For .NET I'd start looking at <a href="https://www.thepollyproject.org/">Polly</a>, but I'd be surprised if other platforms don't have similar libraries that implement the stability patterns from <a href="/ref/release-it">Release It</a>.
</p>
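<p>
As a rough sketch of how logging and retrying can live outside a Humble Object, consider two Decorators over a hypothetical <code>IMessageChannel</code> interface. All names here are invented for illustration, and the retry logic is hand-rolled rather than based on Polly, with no delay between attempts, to keep the example self-contained.
</p>
<p>
<pre>using System;
using System.Threading.Tasks;

// Hypothetical interface that a Humble Object would implement.
public interface IMessageChannel
{
    Task Send(string message);
}

// Decorator that logs a failure and lets the exception propagate.
public sealed class LoggingMessageChannel : IMessageChannel
{
    private readonly IMessageChannel inner;
    private readonly Action&lt;string&gt; log;

    public LoggingMessageChannel(IMessageChannel inner, Action&lt;string&gt; log)
    {
        this.inner = inner;
        this.log = log;
    }

    public async Task Send(string message)
    {
        try
        {
            await inner.Send(message);
        }
        catch (Exception e)
        {
            log($"Send failed: {e.Message}");
            throw;
        }
    }
}

// Decorator that makes up to maxAttempts attempts before giving up.
public sealed class RetryingMessageChannel : IMessageChannel
{
    private readonly IMessageChannel inner;
    private readonly int maxAttempts;

    public RetryingMessageChannel(IMessageChannel inner, int maxAttempts)
    {
        if (maxAttempts &lt; 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        this.inner = inner;
        this.maxAttempts = maxAttempts;
    }

    public async Task Send(string message)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                await inner.Send(message);
                return;
            }
            catch (Exception) when (attempt &lt; maxAttempts)
            {
                // Swallow and try again; the final failure propagates.
            }
        }
    }
}</pre>
</p>
<p>
Given some Humble Object <code>humble</code> that implements the interface, a composition such as <code>new LoggingMessageChannel(new RetryingMessageChannel(humble, 3), Console.WriteLine)</code> retries up to three times and logs only if the last attempt fails. Both Decorators can be tested in process against a test-specific implementation of the interface.
</p>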
<p>
Something more specialised, like a fail-over mechanism, sounds like a good reason to wheel out the <a href="https://en.wikipedia.org/wiki/Chain-of-responsibility_pattern">Chain of Responsibility</a> pattern.
</p>
<p>
All of these can be tested independently of any Humble Object.
</p>
<h3 id="ea544f519e0b4114a21bb094d9798c6c">
Conclusion <a href="#ea544f519e0b4114a21bb094d9798c6c">#</a>
</h3>
<p>
In a recent article I reflected on my experience with TDD and speculated that a side effect of that process is code flexible enough to work on most machines. Thus, I've never encountered the need for containers.
</p>
<p>
Readers responded with comments that struck me as mostly related to the upper levels of the Test Pyramid. In this article, I've attempted to address some of those concerns. I still get by without containers.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.Is software getting worse?https://blog.ploeh.dk/2023/07/24/is-software-getting-worse2023-07-24T06:02:00+00:00Mark Seemann
<div id="post">
<p>
<em>A rant, with some examples.</em>
</p>
<p>
I've been a software user for thirty years.
</p>
<p>
My first PC was <a href="https://en.wikipedia.org/wiki/DOS">DOS</a>-based. In my first job, I used <a href="https://en.wikipedia.org/wiki/OS/2">OS/2</a>, in the next, <a href="https://en.wikipedia.org/wiki/Windows_3.1x">Windows 3.11</a>, <a href="https://en.wikipedia.org/wiki/Windows_NT">NT</a>, and later incarnations of Windows.
</p>
<p>
I wrote my first web sites in <a href="https://en.wikipedia.org/wiki/Arachnophilia">Arachnophilia</a>, and my first professional software in Visual Basic, Visual C++, and <a href="https://en.wikipedia.org/wiki/Visual_InterDev">Visual InterDev</a>.
</p>
<p>
I used <a href="https://en.wikipedia.org/wiki/Terminate_(software)">Terminate</a> with my first modem. If I recall correctly, it had a built-in email downloader and offline reader. Later, I switched to Outlook for email. I've used <a href="https://en.wikipedia.org/wiki/Netscape_Navigator">Netscape Navigator</a>, <a href="https://en.wikipedia.org/wiki/Internet_Explorer">Internet Explorer</a>, Firefox, and Chrome to surf the web.
</p>
<p>
I've written theses, articles, reports, etc. in <a href="https://en.wikipedia.org/wiki/WordPerfect">WordPerfect</a> for DOS and MS Word for Windows. I wrote my new book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a> in <a href="https://www.texstudio.org/">TeXstudio</a>. Yes, it was written entirely in <a href="https://en.wikipedia.org/wiki/LaTeX">LaTeX</a>.
</p>
<h3 id="c3ee042288d94295bec33308840d075f">
Updates <a href="#c3ee042288d94295bec33308840d075f">#</a>
</h3>
<p>
For the first fifteen years, new software versions were rare. You'd get a version of AutoCAD, Windows, or Visual C++, and you'd use it for years. After a few years, a new version would come out, and that would be a big deal.
</p>
<p>
Interim service releases were rare, too, since there was no network-based delivery mechanism. Software came on <a href="https://en.wikipedia.org/wiki/Floppy_disk">floppy disks</a>, and later on <a href="https://en.wikipedia.org/wiki/Compact_disc">CD</a>s.
</p>
<p>
Even if a bug fix was easy to make, it was difficult for a software vendor to <em>distribute</em> it, so most software releases were well-tested. Granted, software had bugs back then, and some of them you learned to work around.
</p>
<p>
When a new version came out, the same forces were at work. The new version had to be as solid and stable as the previous one. Again, I grant that even in those days, this wasn't always the case. Usually, a bad release spelled the demise of a company, because release times were so long that competitors could take advantage of a bad software release.
</p>
<p>
Usually, however, software updates were <em>improvements</em>, and you looked forward to them.
</p>
<h3 id="865a6ab3b3c64982bbeb31ccf64db6b5">
Decay <a href="#865a6ab3b3c64982bbeb31ccf64db6b5">#</a>
</h3>
<p>
I no longer look forward to updates. These days, software is delivered over the internet, and some applications update automatically.
</p>
<p>
From a security perspective it can be a good idea to stay up-to-date, and for years, I diligently did that. Lately, however, I've become more conservative. Particularly when it comes to Windows, I ignore all suggestions to update it until it literally forces the update on me.
</p>
<p>
Just like <a href="https://tvtropes.org/pmwiki/pmwiki.php/Main/StarTrekMovieCurse">even-numbered Star Trek movies don't suck</a>, the same pattern seems to be true for Windows: <a href="https://en.wikipedia.org/wiki/Windows_XP">Windows XP</a> was good, <a href="https://en.wikipedia.org/wiki/Windows_7">Windows 7</a> was good, and <a href="https://en.wikipedia.org/wiki/Windows_10">Windows 10</a> wasn't bad either. I kept putting off Windows 11 for as long as possible, but now I use it, and I can't say that I'm surprised that I don't like it.
</p>
<p>
This article, however, isn't a rant about Windows in particular. This seems to be a general trend, and it's been noticeable for years.
</p>
<h3 id="f3bd0d4e750b4d9aa53ee9a6c0f59ddc">
Examples <a href="#f3bd0d4e750b4d9aa53ee9a6c0f59ddc">#</a>
</h3>
<p>
I think that the first time I noticed a particular application degrading was <a href="https://en.wikipedia.org/wiki/Vivino">Vivino</a>. It started out as a local company here in Copenhagen, and I was a fairly early adopter. Initially, it was great: If you like wine, but don't know that much about it, you could photograph a bottle's label, and it'd automatically recognise the wine and register it in your 'wine library'. I found it useful that I could look up my notes about a wine I'd had a year ago to remind me what I thought of it. As time went on, however, I started to notice errors in my wine library. It might be double entries, or wines that were silently changed to another vintage, etc. Eventually it got so bad that I lost trust in the application and uninstalled it.
</p>
<p>
Another example is <a href="https://www.sublimetext.com/">Sublime Text</a>, which I used for writing articles for this blog. I even bought a licence for it. Version 3 was great, but version 4 was weird from the outset. One thing was that they changed how they indicated which parts of a file I'd edited after opening it, and I never understood the idea behind the visuals. Worse was that auto-closing of HTML stopped working. Since I'm that <a href="https://rakhim.org/honestly-undefined/19/">weird dude who writes raw HTML</a>, such a feature is quite important to me. If I write an HTML tag, I expect the editor to automatically add the closing tag, and place my cursor between the two. Sublime Text stopped doing that consistently, and eventually it became annoying enough that I thought: <em>Why bother?</em> Now I write in <a href="https://code.visualstudio.com/">Visual Studio Code</a>.
</p>
<p>
Microsoft is almost a chapter in itself, but to be fair, I don't consider Microsoft products <em>worse</em> than others. There's just so much of it, and since I've always been working in the Microsoft tech stack, I use a lot of it. Thus, <a href="https://en.wikipedia.org/wiki/Selection_bias">selection bias</a> clearly is at work here. Still, while I don't think Microsoft is worse than the competition, it seems to be part of the trend.
</p>
<p>
For years, my login screen was stuck on the same mountain lake, even though I tried every remedy suggested on the internet. Eventually, however, a new version of Windows fixed the issue. So, granted, sometimes new versions improve things.
</p>
<p>
Now, however, I have another problem with <a href="https://en.wikipedia.org/wiki/Windows_spotlight">Windows Spotlight</a>. It shows nice pictures, and there used to be an option to see where the picture was taken. Since I repaved my machine, this option is gone. Again, I've scoured the internet for resolutions to this problem, but so far neither rebooting, regedit changes, nor anything else has solved it.
</p>
<p>
Those sound like small problems, so let's consider something more serious. Until half a year ago, Outlook was able to detect whether I was writing an email in English or Danish. It could even handle the hybrid scenario where parts of an email were in English, and parts in Danish. Since I repaved my machine, this feature no longer works. Outlook doesn't recognise Danish when I write it. One thing is the red squiggly lines under most words, but that's not even the worst. The worst part of this is that even though I'm writing in Danish, Outlook thinks I'm writing in English, so it silently auto-corrects Danish words to whatever looks adjacent in English.
</p>
<p>
<img src="/content/binary/outlook-language-bug.png" alt="Screen shot of Outlook language bug.">
</p>
<p>
This became so annoying that I contacted Microsoft support about it, but while they had me try a number of things, nothing worked. They eventually had to give up and suggested that I reinstall my machine - which, at that point, I'd done two weeks before.
</p>
<p>
This used to work, but now it doesn't.
</p>
<h3 id="b314b021b97f455184e44c52ab584afa">
It's not all bad <a href="#b314b021b97f455184e44c52ab584afa">#</a>
</h3>
<p>
I could go on with other examples, but I think that this suffices. After all, I don't think it makes for a compelling read.
</p>
<p>
Of course, not everything is bad. While it looks as though I'm particularly harping on Microsoft, I rarely detect problems with <a href="https://visualstudio.microsoft.com/">Visual Studio</a> or Code, and I usually install updates as soon as they are available. The same is true for much other software I use. <a href="https://www.getpaint.net/">Paint.NET</a> is awesome, <a href="https://www.getmusicbee.com/">MusicBee</a> is solid, and even the <a href="https://www.sonos.com/">Sonos</a> Windows app, while horrific, is at least consistently so.
</p>
<h3 id="708ec0a2be1d42f083c1c2d37c24221d">
Conclusion <a href="#708ec0a2be1d42f083c1c2d37c24221d">#</a>
</h3>
<p>
It seems to me that some software is actually getting worse, and that this is a more recent trend.
</p>
<p>
The point isn't that some software is bad. This has always been the case. What seems new to me is that software that <em>used to be good</em> deteriorates. While this wasn't entirely unheard of in the nineties (I'm looking at you, WordPerfect), this is becoming much more noticeable.
</p>
<p>
Perhaps it's just <a href="https://en.wikipedia.org/wiki/Frequency_illusion">frequency illusion</a>, or perhaps it's because I use software much more than I did in the nineties. Still, I can't shake the feeling that some software is deteriorating.
</p>
<p>
Why does this happen? I don't know, but my own bias suggests that it's because there's less focus on regression testing. Many of the problems I see look like regression bugs to me. A good engineering team could have caught them with automated regression tests, but these days, it seems as though many teams rely on releasing often and then letting users do the testing.
</p>
<p>
The problem with that approach, however, is that if you don't have good automated tests, fixing one regression may resurrect another.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="16748f443dbe46f6b9f039e3de5165bf">
<div class="comment-author"><a href="https://carlosschults.net">Carlos Schults</a> <a href="#16748f443dbe46f6b9f039e3de5165bf">#</a></div>
<div class="comment-content">
<p>I don't think your perception that software is getting worse is wrong, Mark. I've been an Evernote user since 2011. And, for a good portion of those years, I've been a paid customer.</p>
<p>
I'm certainly not alone in my perception that the application is becoming worse with each new release, to the point of, at times, becoming
unusable. Syncing problems with the mobile version, weird changes in the UI that don't accomplish anything, general sluggishness, and, above all,
not listening to the users regarding long-needed features.
</p>
</div>
<div class="comment-date">2023-07-29 18:05 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Works on most machines
https://blog.ploeh.dk/2023/07/17/works-on-most-machines
2023-07-17T08:01:00+00:00
Mark Seemann
<div id="post">
<p>
<em>TDD encourages deployment flexibility. Functional programming also helps.</em>
</p>
<p>
Recently several of the podcasts I subscribe to have had episodes about various container technologies, of which <a href="https://en.wikipedia.org/wiki/Kubernetes">Kubernetes</a> dominates. I tune out of such content, since it has nothing to do with me.
</p>
<p>
I've never found containerisation relevant. I remember being fascinated when I first heard of <a href="https://en.wikipedia.org/wiki/Docker_(software)">Docker</a>, and for a while, I awaited a reason to use it. It never materialised.
</p>
<p>
I'd test-drive whatever system I was working on, and deploy it to production. Usually, it'd just work.
</p>
<p>
Since my process already produced good results, why make it more complicated?
</p>
<p>
Occasionally, I would become briefly aware of the lack of containers in my life, but then I'd forget about it again. Until now, I haven't thought much about it, and it's probably only the random coincidence of a few podcast episodes back-to-back that made me think more about it.
</p>
<h3 id="98b5a360a5ba413ca4dbccce86cbe331">
Be liberal with what system you run on <a href="#98b5a360a5ba413ca4dbccce86cbe331">#</a>
</h3>
<p>
When I was a beginner programmer a few years ago, things were different. I'd write code that <a href="https://blog.codinghorror.com/the-works-on-my-machine-certification-program/">worked on my machine</a>, but not always on the test server.
</p>
<p>
As I gained experience, this tended to happen less often. This doubtlessly has multiple causes, and increased experience is likely one of them, but I also think that my interest in loose coupling and test-driven development plays a role.
</p>
<p>
Increasingly I developed an ethos of writing software that would work on most machines, instead of only my own. It seems reminiscent of <a href="https://en.wikipedia.org/wiki/Robustness_principle">Postel's law</a>: Be liberal with what system you run on.
</p>
<p>
Test-driven development helps in that regard, because you write code that must be able to execute in at least two contexts: The test context, and the actual system context. These two contexts both exist on your machine.
</p>
<p>
A colleague once taught me: <em>The most difficult generalisation step is going from one to two</em>. Once you've generalised to two cases, it's much easier to generalise to three, four, or <em>n</em> cases.
</p>
<p>
It seems to me that such from-one-to-two-cases generalisation is an inadvertent by-product of test-driven development. Once your code already matches two different contexts, making it even more flexible isn't that much extra work. It's not even <a href="https://wiki.c2.com/?SpeculativeGenerality">speculative generality</a> because you also need to make it work on the production system and (one hopes) on a build server or continuous delivery pipeline. That's 3-4 contexts. Odds are that software that runs successfully in four separate contexts runs successfully on many more systems.
</p>
<h3 id="a0315ee333ff454fb1bd6814f5806121">
General-purpose modules <a href="#a0315ee333ff454fb1bd6814f5806121">#</a>
</h3>
<p>
In <a href="/ref/a-philosophy-of-software-design">A Philosophy of Software Design</a> John Ousterhout argues that one should aim for designing general-purpose objects or modules, rather than specialised APIs. He calls them <em>deep modules</em> and their counterparts <em>shallow modules</em>. On the surface, this seems to go against the grain of <a href="https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it">YAGNI</a>, but the way I understand the book, the point is rather that general-purpose solutions also solve special cases, and, when done right, the code doesn't have to be more complicated than the one that handles the special case.
</p>
<p>
As I write in <a href="https://www.goodreads.com/review/show/5498011140">my review of the book</a>, I think that there's a connection with test-driven development. General-purpose code is code that works in more than one situation, including automated testing environments. This is almost tautological. If it doesn't work in an automated test, an argument could be made that it's insufficiently general.
</p>
<p>
Likewise, general-purpose software should be able to work when deployed to more than one machine. It should even work on machines where other versions of that software already exist.
</p>
<p>
When you have general-purpose software, though, do you really need containers?
</p>
<h3 id="960f22bd15fb48b1a3d7dfddc9f60408">
Isolation <a href="#960f22bd15fb48b1a3d7dfddc9f60408">#</a>
</h3>
<p>
While I've routinely made use of test-driven development since 2003, I started my shift towards functional programming around ten years later. I think that this has amplified my code's flexibility.
</p>
<p>
As <a href="https://jessitron.com/">Jessica Kerr</a> <a href="http://www.functionalgeekery.com/episode-8-jessica-kerr">pointed out years ago</a>, a corollary of <a href="https://en.wikipedia.org/wiki/Referential_transparency">referential transparency</a> is that <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a> are <em>isolated</em> from their environment. Only input arguments affect the output of a pure function.
</p>
<p>
Ultimately, you may need to query the environment about various things, but in functional programming, querying the environment is impure, so you <a href="/2016/03/18/functional-architecture-is-ports-and-adapters">push it to the boundary of the system</a>. Functional programming encourages you to <a href="/2018/11/19/functional-architecture-a-definition">explicitly consider and separate impure actions from pure functions</a>. This implies that the environment-specific code is small, cohesive, and easy to review.
</p>
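<p>
As a minimal sketch of that separation, consider the following. The <code>Greeter</code> class and its greeting rules are hypothetical, invented only for illustration. The impure reads happen once, at the boundary, and the pure function only sees its arguments:
</p>
<p>
<pre>using System;

// Impure boundary: query the environment once, at the edge of the system.
var now = DateTime.Now;
var userName = Environment.UserName;

// Pure core: the result depends only on the arguments.
Console.WriteLine(Greeter.Greet(now.Hour, userName));

// Hypothetical example class, not taken from any real code base.
public static class Greeter
{
    // Deterministic and free of side effects.
    public static string Greet(int hour, string name)
    {
        if (hour >= 18)
            return $"Good evening, {name}.";
        if (hour >= 12)
            return $"Good afternoon, {name}.";
        return $"Good morning, {name}.";
    }
}</pre>
</p>
<p>
Since <code>Greet</code> never consults the clock or the machine itself, it behaves the same in a unit test, on a build server, and in production.
</p>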
<h3 id="4775b8ab484c4833a6a5a86bac3b8b8e">
Conclusion <a href="#4775b8ab484c4833a6a5a86bac3b8b8e">#</a>
</h3>
<p>
For a while, when Docker was new, I expected it to be a technology that I'd eventually pick up and make part of my tool belt. As the years went by, that never happened. As a programmer, I've never had the need.
</p>
<p>
I think that a major contributor to that is that since I mostly develop software with test-driven development, the resulting software is already robust or flexible enough to run in multiple environments. Adding functional programming to the mix helps to achieve isolation from the run-time environment.
</p>
<p>
All of this seems to collaborate to enable code to work not just on my machine, but on most machines. Including containers.
</p>
<p>
Perhaps there are other reasons to use containers and Kubernetes. In a devops context, I could imagine that it makes deployment and operations easier. I don't know much about that, but I also don't mind. If someone wants to take the code I've written and run it in a container, that's fine. It's going to run there too.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="4012c2cddcb64a068c0b06b7989a676e">
<div class="comment-author">qfilip <a href="#4012c2cddcb64a068c0b06b7989a676e">#</a></div>
<div class="comment-content">
<p>
Commenting for the first time. I hope I made these changes in proper manner. Anyway...
</p>
<p>
Kubernetes usually also means the usage of cloud infrastructure, and as such, it can be automated (and change-tracked)
in various interesting ways. Is it worth it? Well, that depends as always... Docker isn't the only container technology supported by
k8s, but since it's the most popular one... they go hand in hand.
</p>
<p>
Docker is also very useful for enabling others to run your software on their machines. Recently,
we've been exploring some apps that consisted of ~4 services (web servers) and a database. All of them written
in different technologies (PHP, Java, C#). You don't have to setup environment variables. You don't need to have relevant SDKs
to build projects etc. Just run docker command, and spin them instantly on your PC.
</p>
<p>So there's that...</p>
<p>
Unrelated to the topic above, I'd like to ask you, if you could write an article on the specific subject. Or, if
the answer is short, comment me back. As an F# enthusiast, I find yours and <a href="https://fsharpforfunandprofit.com">Scott's</a>
blog very valuable. One thing I've failed to find here is why you don't like ORMs. I think the words were
<i>they solve a problem that we shouldn't have in the first place</i>. Since F# doesn't play too well with
Entity Framework, and I pretty much can't live without it... I'm curious if I'm missing something.
A different approach, way of thinking. I can work with raw SQL ofcourse... but the mapping... oh the mapping...
</p>
</div>
<div class="comment-date">2023-07-18 22:30 UTC</div>
</div>
<div class="comment" id="04d9d2a2e9884b0ba2a0049898b98e5f">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#04d9d2a2e9884b0ba2a0049898b98e5f">#</a></div>
<div class="comment-content">
<p>
I'm contemplating turning my response into a new article, but it may take some time before I get to it. I'll post here once I have a more thorough response.
</p>
</div>
<div class="comment-date">2023-07-23 13:56 UTC</div>
</div>
<div class="comment" id="2adbe12cf0e541e7b76cf39037c6a96c">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#2adbe12cf0e541e7b76cf39037c6a96c">#</a></div>
<div class="comment-content">
<p>
qfilip, thank you for writing. I've now published <a href="/2023/07/31/test-driving-the-pyramids-top">the article</a> that, among many other things, responds to your comment about containers.
</p>
<p>
I'll get back to your question about <a href="https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping">ORMs</a> as soon as possible.
</p>
</div>
<div class="comment-date">2023-07-31 07:01 UTC</div>
</div>
<div class="comment" id="1702d8d84d024d3b83b683e2589460f5">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#1702d8d84d024d3b83b683e2589460f5">#</a></div>
<div class="comment-content">
<p>
I'm still considering how to best address the question about ORMs, but in the meanwhile, I'd like to point interested readers to Ted Neward's famous article <a href="https://blogs.newardassociates.com/blog/2006/the-vietnam-of-computer-science.html">The Vietnam of Computer Science</a>.
</p>
</div>
<div class="comment-date">2023-08-14 20:01 UTC</div>
</div>
<div class="comment" id="74654595140446239d66bcb85fc51234">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#74654595140446239d66bcb85fc51234">#</a></div>
<div class="comment-content">
<p>
Finally, I'm happy to announce that I've written an article trying to explain my position: <a href="/2023/09/18/do-orms-reduce-the-need-for-mapping">Do ORMs reduce the need for mapping?</a>.
</p>
</div>
<div class="comment-date">2023-09-18 14:50 UTC</div>
</div>
</div><hr>
AI for doc comments
https://blog.ploeh.dk/2023/07/10/ai-for-doc-comments
2023-07-10T06:02:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A solution in search of a problem?</em>
</p>
<p>
I was recently listening to <a href="https://www.dotnetrocks.com/details/1850">a podcast episode</a> where the guest (among other things) enthused about how advances in <a href="https://en.wikipedia.org/wiki/Large_language_model">large language models</a> mean that you can now get these systems to write <a href="https://learn.microsoft.com/dotnet/csharp/language-reference/xmldoc/">XML doc comments</a>.
</p>
<p>
You know, these things:
</p>
<p>
<pre><span style="color:gray;">///</span><span style="color:green;"> </span><span style="color:gray;"><</span><span style="color:gray;">summary</span><span style="color:gray;">></span>
<span style="color:gray;">///</span><span style="color:green;"> Scorbles a dybliad.</span>
<span style="color:gray;">///</span><span style="color:green;"> </span><span style="color:gray;"></</span><span style="color:gray;">summary</span><span style="color:gray;">></span>
<span style="color:gray;">///</span><span style="color:green;"> </span><span style="color:gray;"><</span><span style="color:gray;">param</span> <span style="color:gray;">name</span><span style="color:gray;">=</span><span style="color:gray;">"</span>dybliad<span style="color:gray;">"</span><span style="color:gray;">></span><span style="color:green;">The dybliad to scorble.</span><span style="color:gray;"></</span><span style="color:gray;">param</span><span style="color:gray;">></span>
<span style="color:gray;">///</span><span style="color:green;"> </span><span style="color:gray;"><</span><span style="color:gray;">param</span> <span style="color:gray;">name</span><span style="color:gray;">=</span><span style="color:gray;">"</span>flag<span style="color:gray;">"</span><span style="color:gray;">></span>
<span style="color:gray;">///</span><span style="color:green;"> A flag that controls whether scorbling is done pre- or postvotraid.</span>
<span style="color:gray;">///</span><span style="color:green;"> </span><span style="color:gray;"></</span><span style="color:gray;">param</span><span style="color:gray;">></span>
<span style="color:gray;">///</span><span style="color:green;"> </span><span style="color:gray;"><</span><span style="color:gray;">returns</span><span style="color:gray;">></span><span style="color:green;">The scorbled dybliad.</span><span style="color:gray;"></</span><span style="color:gray;">returns</span><span style="color:gray;">></span>
<span style="color:blue;">public</span> <span style="color:blue;">string</span> Scorble(<span style="color:blue;">string</span> dybliad, <span style="color:blue;">bool</span> flag)</pre>
</p>
<p>
And it struck me how that's not the first time I've encountered that notion. Finally, you no longer need to write those tedious documentation comments in your code. Instead, you can get <a href="https://github.com/features/copilot">GitHub Copilot</a> or <a href="https://en.wikipedia.org/wiki/ChatGPT">ChatGPT</a> to write them for you.
</p>
<p>
When was the last time you wrote such comments?
</p>
<p>
I'm sure that there are readers who wrote some just yesterday, but generally, I rarely encounter them in the wild.
</p>
<p>
As a rule, I only write them when my modelling skills fail me so badly that I need to <a href="http://butunclebob.com/ArticleS.TimOttinger.ApologizeIncode">apologise in code</a>. Whenever I run into such a situation, I may as well take advantage of the format already in place for such things, but it's not taking up a big chunk of my time.
</p>
<p>
It's been a decade since I ran into a code base where doc comments were mandatory. When I had to write comments, I'd use <a href="https://submain.com/ghostdoc/">GhostDoc</a>, which used heuristics to produce 'documentation' on par with modern AI tools.
</p>
<p>
Whether you use GhostDoc, GitHub Copilot, or write the comments yourself, most of them tend to be equally inane and vacuous. Good design only amplifies this quality. The better names you use, and the more you leverage the type system to <a href="https://blog.janestreet.com/effective-ml-video">make illegal states unrepresentable</a>, the less you need the kind of documentation furnished by doc comments.
</p>
<p>
I find it striking that more than one person waxes poetic about AI's ability to produce doc comments.
</p>
<p>
Is that, ultimately, the only thing we'll entrust to large language models?
</p>
<p>
I <a href="/2022/12/05/github-copilot-preliminary-experience-report">know that they can do more than that</a>, but are we going to let them? Or are automatic doc comments a solution in search of a problem?
</p>
</div><hr>
Validating or verifying emails
https://blog.ploeh.dk/2023/07/03/validating-or-verifying-emails
2023-07-03T05:41:00+00:00
Mark Seemann
<div id="post">
<p>
<em>On separating preconditions from business rules.</em>
</p>
<p>
My recent article <a href="/2023/06/26/validation-and-business-rules">Validation and business rules</a> elicited this question:
</p>
<blockquote>
<p>
"Regarding validation should be pure function, lets have user registration as an example, is checking the email address uniqueness a validation or a business rule? It may not be pure since the check involves persistence mechanism."
</p>
<footer><cite><a href="https://twitter.com/Cherif_b/status/1673906245172969473">Cherif BOUCHELAGHEM</a></cite></footer>
</blockquote>
<p>
This is a great opportunity to examine some heuristics in greater detail. As always, this mostly presents how I think about problems like this, and so doesn't represent any rigid universal truth.
</p>
<p>
The specific question is easily answered, but when the topic is email addresses and validation, I foresee several follow-up questions that I also find interesting.
</p>
<h3 id="579eacb264e94f3bb80aa6d5020df26f">
Uniqueness constraint <a href="#579eacb264e94f3bb80aa6d5020df26f">#</a>
</h3>
<p>
A new user signs up for a system, and as part of the registration process, you want to verify that the email address is unique. Is that validation or a business rule?
</p>
<p>
Again, I'm going to put the cart before the horse and first use the definition to answer the question.
</p>
<blockquote>
<p>
Validation is a <a href="https://en.wikipedia.org/wiki/Pure_function">pure function</a> that decides whether data is acceptable.
</p>
<footer><cite><a href="/2023/06/26/validation-and-business-rules">Validation and business rules</a></cite></footer>
</blockquote>
<p>
Can you implement the uniqueness constraint with a pure function? Not easily. What most systems would do, I'm assuming, is to keep track of users in some sort of data store. This means that in order to check whether or not an email address is unique, you'd have to query that database.
</p>
<p>
Querying a database is non-deterministic because you could be making multiple subsequent queries with the same input, yet receive differing responses. In this particular example, imagine that you ask the database whether <em>ann.siebel@example.com</em> is already registered, and the answer is <em>no, that address is new to us</em>.
</p>
<p>
Database queries are snapshots in time. All that answer tells you is that at the time of the query, the address would be unique in your database. While that answer travels over the network back to your code, a concurrent process might add that very address to the database. Thus, the next time you ask the same question: <em>Is ann.siebel@example.com already registered?</em> the answer would be: <em>Yes, we already know of that address</em>.
</p>
<p>
Verifying that the address is unique (most likely) involves an impure action, and so according to the above definition isn't a validation step. By the <a href="https://en.wikipedia.org/wiki/Law_of_excluded_middle">law of the excluded middle</a>, then, it must be a business rule.
</p>
<p>
Using a different rule of thumb, <a href="https://en.wikipedia.org/wiki/Robert_C._Martin">Robert C. Martin</a> arrives at the same conclusion:
</p>
<blockquote>
<p>
"Uniqueness is semantic not syntactic, so I vote that uniqueness is a business rule not a validation rule."
</p>
<footer><cite><a href="https://twitter.com/unclebobmartin/status/1674023070611263493">Robert C. Martin</a></cite></footer>
</blockquote>
<p>
This highlights a point about this kind of analysis. Using functional purity is a heuristic shortcut to sorting verification problems. Those that are deterministic and have no side effects are validation problems, and those that are either non-deterministic or have side effects are not.
</p>
<p>
Being able to sort problems in this way is useful because it enables you to choose the right tool for the job, and to avoid the wrong tool. In this case, trying to address the uniqueness constraint with validation is likely to cause trouble.
</p>
<p>
Why is that? Because of what I already described. A database query is a snapshot in time. If you make a decision based on that snapshot, it may be the wrong decision once you reach a conclusion. Granted, when discussing user registration, the risk of several processes concurrently trying to register the same email address probably isn't that big, but in other domains, contention may be a substantial problem.
</p>
<p>
Being able to identify a uniqueness constraint as something that <em>isn't</em> validation enables you to avoid that kind of attempted solution. Instead, you may contemplate other designs. If you keep users in a relational database, the easiest solution is to put a uniqueness constraint on the <code>Email</code> column and let the database deal with the problem. Just be prepared to handle the exception that the <code>INSERT</code> statement may generate.
</p>
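<p>
To illustrate the shape of that design, here's a rough sketch. All names (<code>IUserStore</code>, <code>DuplicateEmailException</code>, <code>InMemoryUserStore</code>, <code>Registration</code>) are hypothetical, and an in-memory collection stands in for the relational database so that the example is self-contained. A real data access library would throw its own exception type when the constraint is violated. The point is that the code doesn't ask whether the address is unique before inserting; it attempts the insert and handles the failure:
</p>
<p>
<pre>using System;
using System.Collections.Specialized;

var store = new InMemoryUserStore();
Console.WriteLine(Registration.Register(store, "ann.siebel@example.com"));
Console.WriteLine(Registration.Register(store, "ann.siebel@example.com"));

// Hypothetical exception; stands in for whatever the data store throws.
public sealed class DuplicateEmailException : Exception
{
    public DuplicateEmailException(string email)
        : base($"{email} is already registered.") { }
}

// Hypothetical port for the data store that holds the users.
public interface IUserStore
{
    // Throws DuplicateEmailException if the address already exists.
    void Insert(string email);
}

// In-memory stand-in for a table with a uniqueness constraint on Email.
public sealed class InMemoryUserStore : IUserStore
{
    private readonly object gate = new object();
    private readonly StringCollection emails = new StringCollection();

    public void Insert(string email)
    {
        // Check and insert happen as one atomic step, as in the database.
        lock (gate)
        {
            if (emails.Contains(email))
                throw new DuplicateEmailException(email);
            emails.Add(email);
        }
    }
}

public static class Registration
{
    // No up-front uniqueness query: attempt the insert, handle the failure.
    public static string Register(IUserStore store, string email)
    {
        try
        {
            store.Insert(email);
            return "Registered";
        }
        catch (DuplicateEmailException)
        {
            return "Already registered";
        }
    }
}</pre>
</p>
<p>
The first call prints <code>Registered</code> and the second prints <code>Already registered</code>. Since the store performs the check and the insert as a single atomic step, there's no window in which a concurrent registration can invalidate a decision already made.
</p>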
<p>
If you have another kind of data store, there are other ways to model the constraint. You can even do so using lock-free architectures, but that's out of scope for this article.
</p>
<h3 id="4ebc57fb6a4d40d4a168b454f63804fb">
Validation checks preconditions <a href="#4ebc57fb6a4d40d4a168b454f63804fb">#</a>
</h3>
<p>
<a href="/encapsulation-and-solid">Encapsulation</a> is an important part of object-oriented programming (<a href="/2022/10/24/encapsulation-in-functional-programming">and functional programming as well</a>). As I've often outlined, I base my understanding of encapsulation on <a href="/ref/oosc">Object-Oriented Software Construction</a>. I consider <em>contract</em> (preconditions, invariants, and postconditions) essential to encapsulation.
</p>
<p>
I'll borrow a figure from my article <a href="/2022/08/22/can-types-replace-validation">Can types replace validation?</a>:
</p>
<p>
<img src="/content/binary/validation-as-a-function-from-data-to-type.png" alt="An arrow labelled 'validation' pointing from a document to the left labelled 'Data' to a box to the right labelled 'Type'.">
</p>
<p>
The role of validation is to answer the question: <em>Does the data make sense?</em>
</p>
<p>
This question, and its answer, is typically context-dependent. What 'makes sense' means may differ. This is even true for email addresses.
</p>
<p>
When I wrote the example code for my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>, I had to contemplate how to model email addresses. Here's an excerpt from the book:
</p>
<blockquote>
<p>
Email addresses are <a href="https://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx/">notoriously difficult to validate</a>, and even if you had a full implementation of the SMTP specification, what good would it do you?
</p>
<p>
Users can easily give you a bogus email address that fits the spec. The only way to really validate an email address is to send a message to it and see if that provokes a response (such as the user clicking on a validation link). That would be a long-running asynchronous process, so even if you'd want to do that, you can't do it as a blocking method call.
</p>
<p>
The bottom line is that it makes little sense to validate the email address, apart from checking that it isn't null. For that reason, I'm not going to validate it more than I've already done.
</p>
<footer><cite><a href="/ctfiyh">Code That Fits in Your Head</a>, p. 102</cite></footer>
</blockquote>
<p>
In this example, I decided that the only precondition I would need to demand was that the email address isn't null. This was motivated by the operations I needed to perform with the email address - or rather, in this case, the operations I didn't need to perform. The only thing I needed to do with the address was to save it in a database and send emails:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> Task EmailReservationCreated(<span style="color:blue;">int</span> restaurantId, Reservation reservation)
{
    <span style="color:blue;">if</span> (reservation <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="color:blue;">throw</span> <span style="color:blue;">new</span> ArgumentNullException(nameof(reservation));
    <span style="color:blue;">var</span> r = <span style="color:blue;">await</span> RestaurantDatabase.GetRestaurant(restaurantId).ConfigureAwait(<span style="color:blue;">false</span>);
    <span style="color:blue;">var</span> subject = <span style="color:#a31515;">$"Your reservation for </span>{r?.Name}<span style="color:#a31515;">."</span>;
    <span style="color:blue;">var</span> body = CreateBodyForCreated(reservation);
    <span style="color:blue;">var</span> email = reservation.Email.ToString();
    <span style="color:blue;">await</span> Send(subject, body, email).ConfigureAwait(<span style="color:blue;">false</span>);
}</pre>
</p>
<p>
This code example suggests why I made it a precondition that <code>Email</code> mustn't be null. Had null been allowed, I would have had to resort to <a href="/2013/07/08/defensive-coding">defensive coding, which is exactly what encapsulation makes redundant</a>.
</p>
<p>
Validation is a process that determines whether data is useful in a particular context. In this particular case, all it takes is to check the <code>Email</code> property on the <a href="https://en.wikipedia.org/wiki/Data_transfer_object">DTO</a>. The sample code that comes with <a href="/ctfiyh">Code That Fits in Your Head</a> shows the basics, while <a href="/2022/07/25/an-applicative-reservation-validation-example-in-c">An applicative reservation validation example in C#</a> contains a more advanced solution.
</p>
<h3 id="55a584c6f15d4899beac8e40190068d5">
Preconditions are context-dependent <a href="#55a584c6f15d4899beac8e40190068d5">#</a>
</h3>
<p>
I would assume that a normal user registration process has little need to validate an ostensible email address. A system may want to verify the address, but that's a completely different problem. It usually involves sending an email to the address in question and having some asynchronous process register whether the user verifies that email. For an article related to this problem, see <a href="/2019/12/02/refactoring-registration-flow-to-functional-architecture">Refactoring registration flow to functional architecture</a>.
</p>
<p>
Perhaps you've been reading this with mounting frustration: <em>How about validating the address according to the SMTP spec?</em>
</p>
<p>
Indeed, that sounds like something one should do, but turns out to be rarely necessary. As already outlined, users can easily supply a bogus address like <code>foo@bar.com</code>. It's valid according to the spec, and so what? How does that information help you?
</p>
<p>
In most contexts where I've found myself, validating according to the SMTP specification is a distraction. One might, however, imagine scenarios where it might be required. If, for example, you need to sort addresses according to user name or host name, or perform some filtering on those parts, it might be warranted to actually <em>require</em> that the address is valid.
</p>
<p>
This would imply a <a href="https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/">validation step that attempts to parse</a> the address. Once again, parsing here implies translating less-structured data (a string) to more-structured data. On .NET, I'd consider using the <a href="https://learn.microsoft.com/dotnet/api/system.net.mail.mailaddress">MailAddress</a> class which already comes with <a href="https://learn.microsoft.com/dotnet/api/system.net.mail.mailaddress.trycreate">built-in parser functions</a>.
</p>
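<p>
As a small sketch of what such a parsing step might look like, assuming a .NET version where <code>MailAddress.TryCreate</code> is available, the following turns a string into a <code>MailAddress</code> and reads the parts that a sorting or filtering feature would need. The example address is only a placeholder:
</p>
<p>
<pre>using System;
using System.Net.Mail;

// Parse the less-structured string into a more-structured MailAddress.
if (MailAddress.TryCreate("ann.siebel@example.com", out var address))
{
    // The parsed value exposes the parts that the raw string hides.
    Console.WriteLine(address.User); // ann.siebel
    Console.WriteLine(address.Host); // example.com
}
else
{
    Console.WriteLine("Not a parsable address.");
}</pre>
</p>
<p>
Downstream code can then take a <code>MailAddress</code> rather than a <code>string</code>, which means that it doesn't have to repeat the check.
</p>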
<p>
The point being that your needs determine your preconditions, which again determine what validation should do. The preconditions are context-dependent, and so is validation.
</p>
<h3 id="6e66e8eb91f742109a02def661115656">
Conclusion <a href="#6e66e8eb91f742109a02def661115656">#</a>
</h3>
<p>
Email addresses offer a welcome opportunity to discuss the difference between validation and verification in a way that is specific, but still, I hope, easy to extrapolate from.
</p>
<p>
Validation is a translation from one (less-structured) data format to another. Typically, the more-structured data format is an object, a record, or a hash map (depending on language). Thus, validation is determined by two forces: What the input data looks like, and what the desired object requires; that is, its preconditions.
</p>
<p>
Validation is always a translation with the potential for error. Some input, being less-structured, can't be represented by the more-structured format. In addition to parsing, a validation function must also be able to fail in a composable manner. That is, fortunately, <a href="/2020/12/14/validation-a-solved-problem">a solved problem</a>.
</p>
</div><hr>
Validation and business rules
https://blog.ploeh.dk/2023/06/26/validation-and-business-rules
2023-06-26T06:05:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A definition of validation as distinguished from business rules.</em>
</p>
<p>
This article suggests a definition of <em>validation</em> in software development. <em>A</em> definition, not <em>the</em> definition. It presents how I currently distinguish between validation and business rules. I find the distinction useful, although perhaps it's a case of reversed causality. The following definition of <em>validation</em> is useful because, if defined like that, <a href="/2020/12/14/validation-a-solved-problem">it's a solved problem</a>.
</p>
<p>
My definition is this:
</p>
<p>
<em>Validation is a <a href="https://en.wikipedia.org/wiki/Pure_function">pure function</a> that decides whether data is acceptable.</em>
</p>
<p>
I've used the word <em>acceptable</em> because it suggests a link to <a href="https://en.wikipedia.org/wiki/Robustness_principle">Postel's law</a>. When validating, you may want to allow for some flexibility in input, even if, strictly speaking, it's not entirely on spec.
</p>
<p>
That's not, however, the key ingredient in my definition. The key is that validation should be a pure function.
</p>
<p>
While this may sound like an arbitrary requirement, there's a method to my madness.
</p>
<h3 id="8b91bccdf17f42fa9cfc93599e35bd6c">
Business rules <a href="#8b91bccdf17f42fa9cfc93599e35bd6c">#</a>
</h3>
<p>
Before I explain the benefits of the above definition, I think it'll be useful to outline typical problems that developers face. My thesis in <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a> is that understanding limits of human cognition is a major factor in making a code base sustainable. This again explains why <a href="/encapsulation-and-solid">encapsulation</a> is such an important idea. You want to <em>confine</em> knowledge in small containers that fit in your head. Information shouldn't leak out of these containers, because that would require you to keep track of too much stuff when you try to understand other code.
</p>
<p>
When discussing encapsulation, I emphasise <em>contract</em> over information hiding. A contract, in the spirit of <a href="/ref/oosc">Object-Oriented Software Construction</a>, is a set of preconditions, invariants, and postconditions. Preconditions are particularly relevant to the topic of validation, but I've often experienced that some developers struggle to identify where validation ends and business rules begin.
</p>
<p>
Consider an online restaurant reservation system as an example. We'd like to implement a feature that enables users to make reservations. In order to meet that end, we decide to introduce a <code>Reservation</code> class. What are the preconditions for creating a valid instance of such a class?
</p>
<p>
When I go through such an exercise, people quickly identify requirements such as these:
</p>
<ul>
<li>The reservation should have a date and time.</li>
<li>The reservation should contain the number of guests.</li>
<li>The reservation should contain the name or email (or other data) about the person making the reservation.</li>
</ul>
<p>
A common suggestion is that the restaurant should also be able to accommodate the reservation; that is, it shouldn't be fully booked, it should have an available table at the desired time of an appropriate size, etc.
</p>
<p>
That, however, isn't a precondition for creating a valid <code>Reservation</code> object. That's a business rule.
</p>
<h3 id="5c9983d374e84212bbd37e7cd2476287">
Preconditions are self-contained <a href="#5c9983d374e84212bbd37e7cd2476287">#</a>
</h3>
<p>
How do you distinguish between a precondition and a business rule? And what does that have to do with input validation?
</p>
<p>
Notice that in the above examples, the three preconditions I've listed are self-contained. They are statements about the object or value's constituent parts. On the other hand, the requirement that the restaurant should be able to accommodate the reservation deals with a wider context: The table layout of the restaurant, prior reservations, opening and closing times, and other business rules as well.
</p>
<p>
Validation is, as <a href="https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/">Alexis King points out</a>, a parsing problem. You receive less-structured data (<a href="https://en.wikipedia.org/wiki/Comma-separated_values">CSV</a>, <a href="https://en.wikipedia.org/wiki/JSON">JSON</a>, <a href="https://en.wikipedia.org/wiki/XML">XML</a>, etc.) and attempt to project it to a more-structured format (C# objects, <a href="https://fsharp.org/">F#</a> records, <a href="https://clojure.org/">Clojure</a> maps, etc.). This succeeds when the input satisfies the preconditions, and fails otherwise.
</p>
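<p>
As a minimal sketch of such a projection, consider the following C# code. The <code>ReservationDto</code> and <code>Reservation</code> types, the <code>TryParse</code> name, and the fixed date format are assumptions made for illustration, not code from an actual system.
</p>
<p>
<pre>using System;
using System.Globalization;

// Hypothetical less-structured input, e.g. deserialised from JSON.
public sealed class ReservationDto
{
    public string? At { get; set; }
    public string? Email { get; set; }
    public string? Name { get; set; }
    public int Quantity { get; set; }
}

// Hypothetical more-structured value.
public sealed record Reservation(DateTime At, string Email, string Name, int Quantity);

public static class ReservationParser
{
    // The result depends only on dto: no clock, no database, no table layout.
    // Returns null when a precondition isn't satisfied.
    public static Reservation? TryParse(ReservationDto dto)
    {
        // Assumed input format; a fixed format and culture keep parsing deterministic.
        var hasDate = DateTime.TryParseExact(
            dto.At,
            "yyyy-MM-dd HH:mm",
            CultureInfo.InvariantCulture,
            DateTimeStyles.None,
            out var at);
        if (!hasDate)
            return null;

        var email = dto.Email;
        if (email is null || email.Trim().Length == 0)
            return null;

        // A missing name is tolerated; it becomes the empty string.
        if (dto.Quantity >= 1)
            return new Reservation(at, email, dto.Name ?? "", dto.Quantity);

        return null;
    }
}</pre>
</p>
<p>
All checks are statements about the constituent parts of the input. Nothing in this sketch asks whether the restaurant has a free table.
</p>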
<p>
Why can't we add more preconditions than required? Consider <a href="https://en.wikipedia.org/wiki/Robustness_principle">Postel's law</a>. An operation (and that includes object constructors) should be liberal in what it accepts. While you have to draw the line somewhere (you can't really work with a reservation if the date is missing), an object shouldn't require <em>more</em> than it needs.
</p>
<p>
In general we observe that the fewer preconditions, the easier it is to create an object (or equivalent functional data structure). As a counter-example, this explains why <a href="https://en.wikipedia.org/wiki/Active_record_pattern">Active Record</a> is antithetical to unit testing. One precondition is that there's a database available, and while not impossible to automate in tests, it's quite the hassle. It's easier to work with <a href="https://en.wikipedia.org/wiki/Plain_old_Java_object">POJOs</a> in tests. And unit tests, being <a href="/2011/11/10/TDDimprovesreusability">the first clients of an API</a>, tell you how easy it is to use that API.
</p>
<h3 id="56b5624f416c4648aab2b27216c10bb1">
Contracts with third parties <a href="#56b5624f416c4648aab2b27216c10bb1">#</a>
</h3>
<p>
If validation is fundamentally parsing, it seems reasonable that operations should be pure functions. After all, a parser operates on unchanging (less-structured) data. A programming-language parser takes contents of text files as input. There's little need for more input than that, and the output is expected to be deterministic. Not surprisingly, <a href="https://www.haskell.org/">Haskell</a> is well-suited for writing parsers.
</p>
<p>
You don't, however, have to buy the argument that validation is essentially parsing, so consider another perspective.
</p>
<p>
Validation is a data transformation step you perform to deal with input. Data comes from a source external to your system. It can be a user filling in a form, another program making an HTTP request, or a batch job that receives files over <a href="https://en.wikipedia.org/wiki/File_Transfer_Protocol">FTP</a>.
</p>
<p>
Even if you don't have a formal agreement with any third party, <a href="https://www.hyrumslaw.com/">Hyrum's law</a> implies that a contract does exist. It behoves you to pay attention to that, and make it as explicit as possible.
</p>
<p>
Such a contract should be stable. Third parties should be able to rely on deterministic behaviour. If they supply data one day, and you accept it, you can't reject the same data the next day on grounds that it was malformed. At best, you may <a href="/2021/12/13/backwards-compatibility-as-a-profunctor">be contravariant in input as time passes</a>; in other words, you may accept things tomorrow that you didn't accept today, but you may not reject tomorrow what you accepted today.
</p>
<p>
Likewise, you can't have validation rules that erratically accept data one minute, reject the same data the next minute, only to accept it later. This implies that validation must, at least, be deterministic: The same input should always produce the same output.
</p>
<p>
That's half of the way to <a href="https://en.wikipedia.org/wiki/Referential_transparency">referential transparency</a>. Do you need side effects in your validation logic? Hardly, so you might as well implement it as pure functions.
</p>
<h3 id="45e45eee8b8947dc9eb2743de5adac13">
Putting the cart before the horse <a href="#45e45eee8b8947dc9eb2743de5adac13">#</a>
</h3>
<p>
You may still think that my definition smells of a solution in search of a problem. Yes, pure functions are convenient, but does it naturally follow that validation should be implemented as pure functions? Isn't this a case of poor <a href="https://en.wikipedia.org/wiki/Retroactive_continuity">retconning</a>?
</p>
<p>
<img src="/content/binary/distinguishing-between-validation-and-business-rules.png" alt="Two buckets with a 'lid' labeled 'applicative validation' conveniently fitting over the validation bucket.">
</p>
<p>
When faced with the question: <em>What is validation, and what are business rules?</em> it's almost as though I've conveniently sized the <em>Validation</em> sorting bucket so that it perfectly aligns with <a href="/2018/11/05/applicative-validation">applicative validation</a>. Then, the <em>Business rules</em> bucket fits whatever is left. (In the figure, the two buckets are of equal size, which hardly reflects reality. I estimate that the <em>Business rules</em> bucket is much larger, but had I tried to illustrate that, too, in the figure, it would have looked off-kilter.)
</p>
<p>
This is suspiciously convenient, but consider this: My experience is that this perspective on validation works well. To a great degree, this is because <a href="/2020/12/14/validation-a-solved-problem">I consider validation a solved problem</a>. It's productive to be able to take a chunk of a larger problem and put it aside: <em>We know how to deal with this. There are no risks there.</em>
</p>
<p>
Definitions do, I believe, rarely spring fully formed from some <a href="https://en.wikipedia.org/wiki/Theory_of_forms">Platonic ideal</a>. Rather, people observe what works and eventually extract a condensed description and call it a definition. That's what I've attempted to do here.
</p>
<h3 id="62140574219942c9b34e6b84fe53bfb2">
Business rules change <a href="#62140574219942c9b34e6b84fe53bfb2">#</a>
</h3>
<p>
Let's return to the perspective of validation as a technical contract between your system and a third party. While that contract should be as stable as possible, business rules change.
</p>
<p>
Consider the online restaurant reservation example. Imagine that you're the third-party programmer, and that you've developed a client that can make reservations on behalf of users. When a user wants to make a reservation, there's always a risk that it's not possible. Your client should be able to handle that scenario.
</p>
<p>
Now the restaurant becomes so popular that it decides to change a rule. Earlier, you could make reservations for one, three, or five people, even though the restaurant only has tables for two, four, or six people. Based on its new-found popularity, the restaurant decides that it only accepts reservations for entire tables. Unless it's on the same day and they still have a free table.
</p>
<p>
This changes the <em>behaviour</em> of the system, but not the contract. A reservation for three is still <em>valid</em>, but will be declined because of the new rule.
</p>
<blockquote>
<p>
"Things that change at the same rate belong together. Things that change at different rates belong apart."
</p>
<footer><cite><a href="https://www.facebook.com/notes/kent-beck/naming-from-the-outside-in/464270190272517">Kent Beck</a></cite></footer>
</blockquote>
<p>
Business rules change at different rates than preconditions, so it makes sense to decouple those concerns.
</p>
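<p>
To illustrate the decoupling, here's a sketch of how the new rule might look as a function that's separate from validation. The <code>MaitreD</code> class, the <code>WillAccept</code> name, and its parameters are hypothetical, the table sizes are the ones from the example above, and I've assumed that the caller has already determined whether a table is free.
</p>
<p>
<pre>using System;

public static class MaitreD
{
    // Hypothetical business rule. It needs context beyond the reservation itself:
    // the current time and whether the restaurant still has a free table.
    public static bool WillAccept(int quantity, DateTime at, DateTime now, bool hasFreeTable)
    {
        if (!hasFreeTable)
            return false;

        // The restaurant only has tables for two, four, or six people.
        var fillsTable = quantity == 2 || quantity == 4 || quantity == 6;
        // Same-day reservations are exempt from the 'entire tables' rule.
        var isSameDay = at.Date == now.Date;
        return fillsTable || isSameDay;
    }
}</pre>
</p>
<p>
A reservation for three people next week is still valid input, but <code>WillAccept</code> returns <code>false</code> for it. If the restaurant changes its mind again, only this function changes, while the validation code stays as it is.
</p>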
<h3 id="ac220ae336c24be88aea366e64de39b4">
Conclusion <a href="#ac220ae336c24be88aea366e64de39b4">#</a>
</h3>
<p>
Since validation is a solved problem, it's useful to be able to identify what is validation, and what is something else. As long as an 'input rule' is self-contained (or parametrisable), deterministic, and has no side-effects, you can model it with applicative validation.
</p>
<p>
It's equally useful to be able to spot when applicative validation isn't a good fit. While I'm sure that someone has published a <code>ValidationT</code> monad transformer for Haskell, I'm not sure I would recommend going that route. In other words, if some business operation involves impure actions, it's not going to fit the mold of applicative validation.
</p>
<p>
This doesn't mean that you can't implement business rules with pure functions. You can, but in my experience, abstractions other than applicative validation are more useful in those cases.
</p>
</div><hr>
When is an implementation detail an implementation detail?https://blog.ploeh.dk/2023/06/19/when-is-an-implementation-detail-an-implementation-detail2023-06-19T06:10:00+00:00Mark Seemann
<div id="post">
<p>
<em>On the tension between encapsulation and testability.</em>
</p>
<p>
This article is part of a series called <a href="/2023/02/13/epistemology-of-interaction-testing">Epistemology of interaction testing</a>. A <a href="/2023/03/13/confidence-from-facade-tests">previous article in the series</a> elicited this question:
</p>
<blockquote>
<p>
"following your suggestion, aren’t we testing implementation details?"
</p>
<footer><cite><a href="https://www.relativisticramblings.com/">Christer van der Meeren</a></cite></footer>
</blockquote>
<p>
This frequently-asked question reminds me of an old joke. I think that I first heard it in the eighties, a time when phones had <a href="https://en.wikipedia.org/wiki/Rotary_dial">rotary dials</a>, everyone smoked, you'd receive mail through your apartment door's <a href="https://en.wikipedia.org/wiki/Letter_box">letter slot</a>, and unemployment was high. It goes like this:
</p>
<p>
<em>A painter gets a helper from the unemployment office. A few days later the lady from the office calls the painter and apologizes deeply for the mistake.</em>
</p>
<p>
<em>"What mistake?"</em>
</p>
<p>
<em>"I'm so sorry, instead of a painter we sent you a gynaecologist. Please just let him go, we'll send you a..."</em>
</p>
<p>
<em>"Let him go? Are you nuts, he's my best worker! At the last job, they forgot to leave us the keys, and the guy painted the whole room through the letter slot!"</em>
</p>
<p>
I always think of this joke when the topic is testability. Should you test everything through a system's public API, or do you choose to expose some internal APIs in order to make the code more testable?
</p>
<h3 id="815100cf9c3b492f8abf392528fc5b1e">
Letter slots <a href="#815100cf9c3b492f8abf392528fc5b1e">#</a>
</h3>
<p>
Consider the simplest kind of program you could write: <a href="https://en.wikipedia.org/wiki/%22Hello,_World!%22_program">Hello world</a>. If you didn't consider automated testing, then an <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> C# implementation might look like this:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Program</span>
{
    <span style="color:blue;">private</span> <span style="color:blue;">static</span> <span style="color:blue;">void</span> Main(<span style="color:blue;">string</span>[] args)
    {
        Console<span style="color:#b4b4b4;">.</span>WriteLine(<span style="color:#a31515;">"Hello, World!"</span>);
    }
}</pre>
</p>
<p>
(Yes, I know that with modern C# you can write such a program using a single <a href="https://learn.microsoft.com/dotnet/csharp/fundamentals/program-structure/top-level-statements">top-level statement</a>, but I'm writing for a broader audience, and only use C# as an example language.)
</p>
<p>
How do we test a program like that? Of course, no-one seriously suggests that we <em>really</em> need to test something that simple, but what if we make it a little more complex? What if we make it possible to supply a name as a command-line argument? What if we want to internationalise the program? What if we want to add a <em>help</em> feature? What if we want to add a feature so that we can send a <em>hello</em> to another recipient, on another machine? When does the behaviour become sufficiently complex to warrant automated testing, and how do we achieve that goal?
</p>
<p>
For now, I wish to focus on <em>how</em> to achieve the goal of testing software. For the sake of argument, then, assume that we want to test the above <em>hello world</em> program.
</p>
<p>
As given, we can run the program and verify that it prints <em>Hello, World!</em> to the console. This is easy to do as a manual test, but harder if you want to automate it.
</p>
<p>
You could write a test framework that automatically starts a new operating-system process (the program) and waits until it exits. This framework should be able to handle processes that exit with success and failure status codes, as well as processes that hang, or never start, or keep restarting... Such a framework also requires a way to capture the standard output stream in order to verify that the expected text is written to it.
</p>
<p>
I'm sure such frameworks exist for various operating systems and programming languages. There is, however, a simpler solution if you can live with the trade-off: You could open the API of your source code a bit:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Program</span>
{
    <span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">void</span> Main(<span style="color:blue;">string</span>[] args)
    {
        Console.WriteLine(<span style="color:#a31515;">"Hello, World!"</span>);
    }
}</pre>
</p>
<p>
While I haven't changed the structure or the layout of the source code, I've made both class and method <code>public</code>. This means that I can now write a normal C# unit test that calls <code>Program.Main</code>.
</p>
<p>
I still need a way to observe the behaviour of the program, but <a href="https://stackoverflow.com/a/2139303/126014">there are known ways of redirecting the Console output in .NET</a> (and I'd be surprised if that wasn't the case on other platforms and programming languages).
</p>
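<p>
As a sketch of what such redirection could look like, the following helper swaps the standard output writer with <code>Console.SetOut</code>, calls <code>Program.Main</code>, and restores the original writer afterwards. The <code>ProgramTests</code> class and <code>CaptureOutput</code> method are hypothetical names, and I've left out any particular test framework.
</p>
<p>
<pre>using System;
using System.IO;

public class Program
{
    public static void Main(string[] args)
    {
        Console.WriteLine("Hello, World!");
    }
}

public static class ProgramTests
{
    // Hypothetical helper: runs Main and returns what it wrote to standard output.
    public static string CaptureOutput(string[] args)
    {
        var original = Console.Out;
        using var writer = new StringWriter();
        try
        {
            Console.SetOut(writer);
            Program.Main(args);
            return writer.ToString();
        }
        finally
        {
            // Always restore the real console, even if Main throws.
            Console.SetOut(original);
        }
    }
}</pre>
</p>
<p>
A test can then compare the result of <code>CaptureOutput(new string[0])</code> to <code>"Hello, World!"</code> followed by <code>Environment.NewLine</code>. Since <code>Console</code> is shared global state, tests written this way shouldn't run in parallel.
</p>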
<p>
As we add more and more features to the command-line program, we may be able to keep testing by calling <code>Program.Main</code> and asserting against the redirected <a href="https://learn.microsoft.com/dotnet/api/system.console">Console</a>. As the complexity of the program grows, however, this starts to look like painting a room through the letter slot.
</p>
<h3 id="4681beea02be47b89c9a5d7c2618fad7">
Adding new APIs <a href="#4681beea02be47b89c9a5d7c2618fad7">#</a>
</h3>
<p>
Real programs are usually more than just a command-line utility. They may be smartphone apps that react to user input or network events, or web services that respond to HTTP requests, or complex asynchronous systems that react to, and send, messages over durable queues. Even good old batch jobs are likely to pull data from files in order to write to a database, or the other way around. Thus, the interface to the rest of the world is likely larger than just a single <code>Main</code> method.
</p>
<p>
Smartphone apps or message-based systems have event handlers. Web sites or services have classes, methods, or functions that handle incoming HTTP requests. These are essentially event handlers, too. This increases the size of the 'test surface': There is more than a single method you can invoke in order to exercise the system.
</p>
<p>
Even so, a real program will soon grow to a size where testing entirely through the real-world-facing API becomes reminiscent of painting through a letter slot. <a href="https://www.infoq.com/presentations/integration-tests-scam/">J.B. Rainsberger explains that one major problem is the combinatorial explosion of required test cases</a>.
</p>
<p>
Another problem is that the system may produce side effects that you care about. As a basic example, consider a system that, as part of its operation, sends emails. When testing this system, you want to verify that under certain circumstances, the system sends certain emails. How do you do that?
</p>
<p>
If the system has <em>absolutely no concessions to testability</em>, I can think of two options:
</p>
<ul>
<li>You contact the person to whom the system sends the email, and ask him or her to verify receipt of the email. You do that <em>every time</em> you test.</li>
<li>You deploy the System Under Test in an environment with an <a href="https://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol">SMTP</a> gateway that redirects all email to another address.</li>
</ul>
<p>
Clearly the first option is unrealistic. The second option is a little better, but you still have to open an email inbox and look for the expected message. Doing so programmatically is, again, technically possible, and I'm sure that there are <a href="https://en.wikipedia.org/wiki/Post_Office_Protocol">POP3</a> or <a href="https://en.wikipedia.org/wiki/Internet_Message_Access_Protocol">IMAP</a> assertion libraries out there. Still, this seems complicated, error-prone, and slow.
</p>
<p>
What could we do instead? I would usually introduce a polymorphic interface such as <code>IPostOffice</code> as a way to substitute the real <code>SmtpPostOffice</code> with a <a href="https://martinfowler.com/bliki/TestDouble.html">Test Double</a>.
</p>
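<p>
Here's a minimal sketch of that idea. Only the <code>IPostOffice</code> name comes from the text above; the <code>Send</code> signature, the <code>SpyPostOffice</code> Test Double, and the <code>Greeter</code> class that stands in for the System Under Test are assumptions made for the sake of the example.
</p>
<p>
<pre>using System;

// Hypothetical shape of the interface; only its name is given in the text.
public interface IPostOffice
{
    void Send(string recipient, string subject, string body);
}

// Test Double (a Spy) that records what was sent instead of talking to SMTP.
public sealed class SpyPostOffice : IPostOffice
{
    public int SentCount { get; private set; }
    public string? LastRecipient { get; private set; }
    public string? LastSubject { get; private set; }

    public void Send(string recipient, string subject, string body)
    {
        SentCount++;
        LastRecipient = recipient;
        LastSubject = subject;
    }
}

// Hypothetical System Under Test. It depends on the interface, not on SMTP.
public sealed class Greeter
{
    private readonly IPostOffice postOffice;

    public Greeter(IPostOffice postOffice)
    {
        this.postOffice = postOffice;
    }

    public void Welcome(string email)
    {
        postOffice.Send(email, "Welcome", "Thank you for signing up.");
    }
}</pre>
</p>
<p>
A test can pass a <code>SpyPostOffice</code> to <code>Greeter</code>, call <code>Welcome</code>, and then assert that <code>SentCount</code> is <em>1</em> and that <code>LastRecipient</code> is the expected address. No inbox is involved.
</p>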
<p>
Notice what happens in these cases: We introduce (or make public) new APIs in order to facilitate automated testing.
</p>
<h3 id="0cc85d98c7ae437aa1156e72fc25dcf6">
Application-boundary API and internal APIs <a href="#0cc85d98c7ae437aa1156e72fc25dcf6">#</a>
</h3>
<p>
It's helpful to distinguish between the real-world-facing API and everything else. In this diagram, I've indicated the public-facing API as a thin green slice facing upwards (assuming that external stimulus - button clicks, HTTP requests, etc. - arrives from above).
</p>
<p>
<img src="/content/binary/public-and-internal-system-apis.png" alt="A box depicting a program, with a small green slice indicating public-facing APIs, and internal blue slices indicating internal APIs.">
</p>
<p>
The real-world-facing API is the code that <em>must</em> be present for the software to work. It could be a button-click handler or an ASP.NET <em>action method</em>:
</p>
<p>
<pre>[HttpPost(<span style="color:#a31515;">"restaurants/{restaurantId}/reservations"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task&lt;ActionResult&gt; Post(<span style="color:blue;">int</span> restaurantId, ReservationDto dto)</pre>
</p>
<p>
Of course, if you're using another web framework or another programming language, the details differ, but the application <em>has</em> to have code that handles an HTTP <em>POST</em> request on matching addresses. Or a button click, or a message that arrives on a message bus. You get the point.
</p>
<p>
These APIs are fairly fixed. If you change them, you change the externally observable behaviour of the system. Such changes are likely breaking changes.
</p>
<p>
Based on which framework and programming language you're using, the shape of these APIs will be given. Like I did with the above <code>Main</code> method, you can make it <code>public</code> and use it for testing.
</p>
<p>
A software system of even middling complexity will usually also be decomposed into smaller components. In the figure, I've indicated such subdivisions as boxes with gray outlines. Each of these may present an API to other parts of the system. I've indicated these APIs with light blue.
</p>
<p>
The total size of internal APIs is likely to be larger than the public-facing API. On the other hand, you can (theoretically) change these internal interfaces without breaking the observable behaviour of the system. This is called <a href="/ref/refactoring">refactoring</a>.
</p>
<p>
These internal APIs will often have <code>public</code> access modifiers. That doesn't make them real-world-facing. Be careful not to confuse programming-language <a href="https://en.wikipedia.org/wiki/Access_modifiers">access modifiers</a> with architectural concerns. Objects or their members can have <code>public</code> access modifiers even if the object plays an exclusively internal role. <a href="/2011/05/31/AttheBoundaries,ApplicationsareNotObject-Oriented">At the boundaries, applications aren't object-oriented</a>. And <a href="/2022/05/02/at-the-boundaries-applications-arent-functional">neither are they functional</a>.
</p>
<p>
Likewise, as the original <code>Main</code> method example shows, public APIs may be implemented with a <code>private</code> access modifier.
</p>
<p>
Why do such internal APIs exist? Is it only to support automated testing?
</p>
<h3 id="e6bf8d9b8617435f914d976c05cc9731">
Decomposition <a href="#e6bf8d9b8617435f914d976c05cc9731">#</a>
</h3>
<p>
If we introduce new code, such as the above <code>IPostOffice</code> interface, in order to facilitate testing, we have to be careful that it doesn't lead to <a href="https://dhh.dk/2014/test-induced-design-damage.html">test-induced design damage</a>. The idea that one might introduce an API exclusively to support automated testing rubs some people the wrong way.
</p>
<p>
On the other hand, we do introduce (or make public) APIs for other reasons, too. One common reason is that we want to decompose an application's source code so that parallel development is possible. One person (or team) works on one part, and other people work on other parts. If those parts need to communicate, we need to agree on a contract.
</p>
<p>
Such a contract exists for purely internal reasons. End users don't care, and never know of it. You can change it without impacting users, but you may need to coordinate with other teams.
</p>
<p>
What remains, though, is that we do decompose systems into internal parts, and we've done this since before <a href="https://en.wikipedia.org/wiki/David_Parnas">Parnas</a> wrote <em>On the Criteria to Be Used in Decomposing Systems into Modules</em>.
</p>
<p>
Successful test-driven development introduces <a href="http://wiki.c2.com/?SoftwareSeam">seams</a> where they ought to be in any case.
</p>
<h3 id="0df7cb8e24734176b20f4d61e7e24264">
Testing implementation details <a href="#0df7cb8e24734176b20f4d61e7e24264">#</a>
</h3>
<p>
An internal seam is an implementation detail. Even so, when designed with care, it can serve multiple purposes. It enables teams to develop in parallel, and it enables automated testing.
</p>
<p>
Consider the example from <a href="/2023/04/03/an-abstract-example-of-refactoring-from-interaction-based-to-property-based-testing">a previous article in this series</a>. I'll repeat one of the tests here:
</p>
<p>
<pre>[Theory]
[AutoData]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">HappyPath</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">state</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">code</span>, (<span style="color:blue;">string</span>, <span style="color:blue;">bool</span>, Uri) <span style="font-weight:bold;color:#1f377f;">knownState</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">response</span>)
{
    _repository.Add(state, knownState);
    _stateValidator
        .Setup(<span style="font-weight:bold;color:#1f377f;">validator</span> => validator.Validate(code, knownState))
        .Returns(<span style="color:blue;">true</span>);
    _renderer
        .Setup(<span style="font-weight:bold;color:#1f377f;">renderer</span> => renderer.Success(knownState))
        .Returns(response);
    _target
        .Complete(state, code)
        .Should().Be(response);
}</pre>
</p>
<p>
This test exercises a <em>happy-path</em> case by manipulating <code>IStateValidator</code> and <code>IRenderer</code> Test Doubles. It's a common approach to testability, and what <a href="https://en.wikipedia.org/wiki/David_Heinemeier_Hansson">dhh</a> would label test-induced design damage. While I'm sympathetic to that position, that's not my point. My point is that I consider <code>IStateValidator</code> and <code>IRenderer</code> internal APIs. End users (who probably don't even know what C# is) don't care about these interfaces.
</p>
<p>
Tests like these test against implementation details.
</p>
<p>
This need not be a problem. If you've designed good, stable seams then these tests can serve you for a long time. Testing against implementation details becomes a problem if those details change. Since it's hard to predict how things change in the future, it behoves us to decouple tests from implementation details as much as possible.
</p>
<p>
The alternative, however, is mail-slot testing, which comes with its own set of problems. Thus, judicious introduction of seams is helpful, even if it couples tests to implementation details.
</p>
<p>
Actually, in the question I quoted above, Christer van der Meeren asked whether my proposed alternative isn't testing implementation details. And, yes, that style of testing <em>also</em> relies on implementation details for testing. It's just a different way to design seams. Instead of designing seams around polymorphic objects, we design them around pure functions and immutable data.
</p>
<p>
There are, I think, advantages to functional programming, but when it comes to relying on implementation details, it's only on par with object-oriented design. Not worse, not better, but the same.
</p>
<h3 id="d5e57741df0247d5a75879a75ca588dc">
Conclusion <a href="#d5e57741df0247d5a75879a75ca588dc">#</a>
</h3>
<p>
Every API in use carries a cost. You need to keep the API stable so that users can use it tomorrow like they did yesterday. This can make it difficult to evolve or improve an API, because you risk introducing a breaking change.
</p>
<p>
There are APIs that a system <em>must</em> have. Software exists to be used, and whether that entails a user clicking on a button or another computer system sending a message to your system, your code must handle such stimulus. This is your real-world-facing contract, and you need to be careful to keep it consistent. The smaller that surface area is, the simpler that task is.
</p>
<p>
The same line of reasoning applies to internal APIs. While end users aren't impacted by changes in internal seams, other code is. If you change an implementation detail, this could cost maintenance work somewhere else. (Modern IDEs can handle some changes like that automatically, such as method renames. In those cases, the cost of change is low.) Therefore, it pays to minimise the internal seams as much as possible. One way to do this is by <a href="/2022/11/21/decouple-to-delete">decoupling to delete code</a>.
</p>
<p>
Still, some internal APIs are warranted. They help you decompose a large system into smaller subparts. While there's a potential maintenance cost with every internal API, there's also the advantage of working with smaller, independent units of code. Often, the benefits are larger than the cost.
</p>
<p>
When done well, such internal seams are useful testing APIs as well. They're still implementation details, though.
</p>
</div><hr>
Collatz sequences by function compositionhttps://blog.ploeh.dk/2023/06/12/collatz-sequences-by-function-composition2023-06-12T05:27:00+00:00Mark Seemann
<div id="post">
<p>
<em>Mostly in C#, with a few lines of Haskell code.</em>
</p>
<p>
A <a href="/2023/05/08/is-cyclomatic-complexity-really-related-to-branch-coverage">recent article</a> elicited more comments than usual, and I've been so unusually buried in work that only now do I have a little time to respond to some of them. In <a href="/2023/05/08/is-cyclomatic-complexity-really-related-to-branch-coverage#02568f995d91432da540858644b61e89">one comment</a> <a href="http://github.com/neongraal">Struan Judd</a> offers a refactored version of my <a href="https://en.wikipedia.org/wiki/Collatz_conjecture">Collatz sequence</a> in order to shed light on the relationship between <a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">cyclomatic complexity</a> and test case coverage.
</p>
<p>
Struan Judd's agenda is different from what I have in mind in this article, but the comment inspired me to refactor my own code. I wanted to see what it would look like with this constraint: It should be possible to test odd input numbers without exercising the code branches related to even numbers.
</p>
<p>
The problem with more naive implementations of Collatz sequence generators is that (apart from when the input is <em>1</em>) the sequence ends with a tail of even numbers halving down to <em>1</em>. I'll start with a simple example to show what I mean.
</p>
<h3 id="ccd16365f7ce4842870f90e43267c33f">
Standard recursion <a href="#ccd16365f7ce4842870f90e43267c33f">#</a>
</h3>
<p>
At first I thought that my confusion originated from the imperative structure of the original example. For more than a decade, I've preferred functional programming (FP), and even when I write object-oriented code, I tend to use concepts and patterns from FP. Thus I, naively, rewrote my Collatz generator as a recursive function:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IReadOnlyCollection<<span style="color:blue;">int</span>> Sequence(<span style="color:blue;">int</span> n)
{
    <span style="color:blue;">if</span> (n < 1)
        <span style="color:blue;">throw</span> <span style="color:blue;">new</span> ArgumentOutOfRangeException(
            nameof(n),
            <span style="color:#a31515;">$"Only natural numbers allowed, but given </span>{n}<span style="color:#a31515;">."</span>);
    <span style="color:blue;">if</span> (n == 1)
        <span style="color:blue;">return</span> <span style="color:blue;">new</span>[] { n };
    <span style="color:blue;">else</span>
        <span style="color:blue;">if</span> (n % 2 == 0)
            <span style="color:blue;">return</span> <span style="color:blue;">new</span>[] { n }.Concat(Sequence(n / 2)).ToArray();
        <span style="color:blue;">else</span>
            <span style="color:blue;">return</span> <span style="color:blue;">new</span>[] { n }.Concat(Sequence(n * 3 + 1)).ToArray();
}</pre>
</p>
<p>
Recursion is usually not recommended in C#, because a sufficiently long sequence could blow the call stack. I wouldn't write production C# code like this, but you could do something like this in <a href="https://fsharp.org/">F#</a> or <a href="https://www.haskell.org/">Haskell</a> where the languages offer solutions to that problem. In other words, the above example is only for educational purposes.
</p>
<p>
It doesn't, however, solve the problem that confused me: If you want to test the branch that deals with odd numbers, you can't avoid also exercising the branch that deals with even numbers.
</p>
<h3 id="4587120e2d9e483aba8cf7297704eb28">
Calculating the next value <a href="#4587120e2d9e483aba8cf7297704eb28">#</a>
</h3>
<p>
In functional programming, you solve most problems by decomposing them into smaller problems and then compose the smaller <a href="/2018/03/05/some-design-patterns-as-universal-abstractions">Lego bricks</a> with standard combinators. It seemed like a natural refactoring step to first pull the calculation of the next value into an independent function:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">int</span> Next(<span style="color:blue;">int</span> n)
{
    <span style="color:blue;">if</span> ((n % 2) == 0)
        <span style="color:blue;">return</span> n / 2;
    <span style="color:blue;">else</span>
        <span style="color:blue;">return</span> n * 3 + 1;
}</pre>
</p>
<p>
This function has a cyclomatic complexity of <em>2</em> and no loops or recursion. Test cases that exercise the even branch never touch the odd branch, and vice versa.
</p>
<p>
A parametrised test might look like this:
</p>
<p>
<pre>[Theory]
[InlineData( 2, 1)]
[InlineData( 3, 10)]
[InlineData( 4, 2)]
[InlineData( 5, 16)]
[InlineData( 6, 3)]
[InlineData( 7, 22)]
[InlineData( 8, 4)]
[InlineData( 9, 28)]
[InlineData(10, 5)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> NextExamples(<span style="color:blue;">int</span> n, <span style="color:blue;">int</span> expected)
{
    <span style="color:blue;">int</span> actual = Collatz.Next(n);
    Assert.Equal(expected, actual);
}</pre>
</p>
<p>
The <code>NextExamples</code> test obviously defines more than the two test cases that are required to cover the <code>Next</code> function, but since <a href="/2015/11/16/code-coverage-is-a-useless-target-measure">code coverage shouldn't be used as a target measure</a>, I felt that more than two test cases were warranted. This often happens, and should be considered normal.
</p>
<h3 id="e46d3353db8b4d50bcbe595dbbe3dbd0">
A Haskell proof of concept <a href="#e46d3353db8b4d50bcbe595dbbe3dbd0">#</a>
</h3>
<p>
While I had a general idea about the direction in which I wanted to go, I felt that I lacked some standard functional building blocks in C#: Most notably an infinite, lazy sequence generator. Before moving on with the C# code, I threw together a proof of concept in Haskell.
</p>
<p>
The <code>next</code> function is just a one-liner (if you ignore the optional type declaration):
</p>
<p>
<pre><span style="color:#2b91af;">next</span> <span style="color:blue;">::</span> <span style="color:blue;">Integral</span> a <span style="color:blue;">=></span> a <span style="color:blue;">-></span> a
next n = <span style="color:blue;">if</span> <span style="color:blue;">even</span> n <span style="color:blue;">then</span> n `div` 2 <span style="color:blue;">else</span> n * 3 + 1</pre>
</p>
<p>
A few examples in GHCi suggest that it works as intended:
</p>
<p>
<pre>ghci> next 2
1
ghci> next 3
10
ghci> next 4
2
ghci> next 5
16</pre>
</p>
<p>
Haskell comes with enough built-in functions that this was all I needed to implement a Collatz-sequence generator:
</p>
<p>
<pre><span style="color:#2b91af;">collatz</span> <span style="color:blue;">::</span> <span style="color:blue;">Integral</span> a <span style="color:blue;">=></span> a <span style="color:blue;">-></span> [a]
collatz n = (<span style="color:blue;">takeWhile</span> (1 <) $ <span style="color:blue;">iterate</span> next n) ++ [1]</pre>
</p>
<p>
Again, a few examples suggest that it works as intended:
</p>
<p>
<pre>ghci> collatz 1
[1]
ghci> collatz 2
[2,1]
ghci> collatz 3
[3,10,5,16,8,4,2,1]
ghci> collatz 4
[4,2,1]
ghci> collatz 5
[5,16,8,4,2,1]</pre>
</p>
<p>
I should point out, for good measure, that since this is a proof of concept I didn't add a Guard Clause against zero or negative numbers. I'll keep that in the C# code.
</p>
<h3 id="3c46180459334e86a278f9934fa4b032">
Generator <a href="#3c46180459334e86a278f9934fa4b032">#</a>
</h3>
<p>
While C# does come with a <a href="https://learn.microsoft.com/dotnet/api/system.linq.enumerable.takewhile">TakeWhile</a> function, there's no direct equivalent to Haskell's <a href="https://hackage.haskell.org/package/base/docs/Prelude.html#v:iterate">iterate</a> function. It's not difficult to implement, though:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IEnumerable<T> Iterate<<span style="color:#2b91af;">T</span>>(Func<T, T> f, T x)
{
    <span style="color:blue;">var</span> current = x;
    <span style="color:blue;">while</span> (<span style="color:blue;">true</span>)
    {
        <span style="color:blue;">yield</span> <span style="color:blue;">return</span> current;
        current = f(current);
    }
}</pre>
</p>
<p>
While this <code>Iterate</code> implementation has a cyclomatic complexity of only <em>2</em>, it exhibits the same kind of problem as the previous attempts at a Collatz-sequence generator: You can't test one branch without testing the other. Here, it even seems as though it's impossible to test the branch that skips the loop.
</p>
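<p>
As a minimal sketch of how a caller has to bound such an infinite sequence, a test might truncate it with <code>Take</code>. This assumes that <code>Iterate</code> is defined on a static class named <code>Generator</code> and that the tests use xUnit.net, as the <code>NextExamples</code> test does; the doubling function and the test names are only illustrative:
</p>
<p>
<pre><span style="color:blue;">using</span> System.Linq;
<span style="color:blue;">using</span> Xunit;

<span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">IterateTests</span>
{
    [Fact]
    <span style="color:blue;">public</span> <span style="color:blue;">void</span> IterateDoublingTakeFive()
    {
        // Iterate never terminates on its own, so Take bounds the enumeration.
        // Generator is an assumed class name for the home of Iterate.
        <span style="color:blue;">var</span> actual = Generator.Iterate(x => x * 2, 1).Take(5);
        Assert.Equal(<span style="color:blue;">new</span>[] { 1, 2, 4, 8, 16 }, actual);
    }
}</pre>
</p>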
<p>
In Haskell the <code>iterate</code> function is simply a lazily-evaluated recursive function, but that's not going to solve the problem in the C# case. On the other hand, it helps to know that the <code>yield</code> keyword in C# is just syntactic sugar over a compiler-generated <a href="https://en.wikipedia.org/wiki/Iterator_pattern">Iterator</a>.
</p>
<p>
Just for the exercise, then, I decided to write an explicit Iterator instead.
</p>
<h3 id="a834d92c5d864109aff31cb270c06b83">
Iterator <a href="#a834d92c5d864109aff31cb270c06b83">#</a>
</h3>
<p>
For the sole purpose of demonstrating that it's possible to refactor the code so that branches are independent of each other, I rewrote the <code>Iterate</code> function to return an explicit <code>IEnumerable<T></code>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IEnumerable<T> Iterate<<span style="color:#2b91af;">T</span>>(Func<T, T> f, T x)
{
    <span style="color:blue;">return</span> <span style="color:blue;">new</span> Iterable<T>(f, x);
}</pre>
</p>
<p>
The <code><span style="color:#2b91af;">Iterable</span><<span style="color:#2b91af;">T</span>></code> class is a private helper class, and only exists to return an <code>IEnumerator<T></code>:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Iterable</span><<span style="color:#2b91af;">T</span>> : IEnumerable<T>
{
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> Func<T, T> f;
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> T x;

    <span style="color:blue;">public</span> <span style="color:#2b91af;">Iterable</span>(Func<T, T> f, T x)
    {
        <span style="color:blue;">this</span>.f = f;
        <span style="color:blue;">this</span>.x = x;
    }

    <span style="color:blue;">public</span> IEnumerator<T> GetEnumerator()
    {
        <span style="color:blue;">return</span> <span style="color:blue;">new</span> Iterator<T>(f, x);
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        <span style="color:blue;">return</span> GetEnumerator();
    }
}</pre>
</p>
<p>
The <code><span style="color:#2b91af;">Iterator</span><<span style="color:#2b91af;">T</span>></code> class does the heavy lifting:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Iterator</span><<span style="color:#2b91af;">T</span>> : IEnumerator<T>
{
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> Func<T, T> f;
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> T original;
    <span style="color:blue;">private</span> <span style="color:blue;">bool</span> iterating;

    <span style="color:blue;">internal</span> <span style="color:#2b91af;">Iterator</span>(Func<T, T> f, T x)
    {
        <span style="color:blue;">this</span>.f = f;
        original = x;
        Current = x;
    }

    <span style="color:blue;">public</span> T Current { <span style="color:blue;">get</span>; <span style="color:blue;">private</span> <span style="color:blue;">set</span>; }

    [MaybeNull]
    <span style="color:blue;">object</span> IEnumerator.Current => Current;

    <span style="color:blue;">public</span> <span style="color:blue;">void</span> Dispose()
    {
    }

    <span style="color:blue;">public</span> <span style="color:blue;">bool</span> MoveNext()
    {
        <span style="color:blue;">if</span> (iterating)
            Current = f(Current);
        <span style="color:blue;">else</span>
            iterating = <span style="color:blue;">true</span>;
        <span style="color:blue;">return</span> <span style="color:blue;">true</span>;
    }

    <span style="color:blue;">public</span> <span style="color:blue;">void</span> Reset()
    {
        Current = original;
        iterating = <span style="color:blue;">false</span>;
    }
}</pre>
</p>
<p>
I can't think of a situation where I would write code like this in a real production code base. Again, I want to stress that this is only an exploration of what's possible. What this does show is that all members have low cyclomatic complexity, and none of them involve looping or recursion. Only one method, <code>MoveNext</code>, has a cyclomatic complexity greater than one, and its branches are independent.
</p>
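<p>
The two paths through <code>MoveNext</code> can be observed through the public <code>Iterate</code> method. The following is only a sketch, assuming that <code>Iterate</code> lives on a static class named <code>Generator</code> and that the tests use xUnit.net; the test names, the seed <em>42</em>, and the functions passed in are hypothetical. Note that the second test has to call <code>MoveNext</code> once before it can reach the path that applies <code>f</code>:
</p>
<p>
<pre><span style="color:blue;">using</span> System;
<span style="color:blue;">using</span> System.Collections.Generic;
<span style="color:blue;">using</span> Xunit;

<span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">IteratorTests</span>
{
    [Fact]
    <span style="color:blue;">public</span> <span style="color:blue;">void</span> FirstMoveNextYieldsSeedWithoutCallingFunction()
    {
        // If the first MoveNext called f, this test would fail with an exception.
        Func<<span style="color:blue;">int</span>, <span style="color:blue;">int</span>> f =
            _ => <span style="color:blue;">throw</span> <span style="color:blue;">new</span> InvalidOperationException(<span style="color:#a31515;">"f should not be called."</span>);
        <span style="color:blue;">using</span> IEnumerator<<span style="color:blue;">int</span>> sut = Generator.Iterate(f, 42).GetEnumerator();

        <span style="color:blue;">bool</span> moved = sut.MoveNext();

        Assert.True(moved);
        Assert.Equal(42, sut.Current);
    }

    [Fact]
    <span style="color:blue;">public</span> <span style="color:blue;">void</span> SecondMoveNextAppliesFunctionOnce()
    {
        <span style="color:blue;">using</span> IEnumerator<<span style="color:blue;">int</span>> sut =
            Generator.Iterate(x => x + 1, 42).GetEnumerator();

        sut.MoveNext(); // Yields the seed.
        sut.MoveNext(); // Applies f to the seed.

        Assert.Equal(43, sut.Current);
    }
}</pre>
</p>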
<h3 id="f8030072341448bd80c8c0af055abc10">
Composition <a href="#f8030072341448bd80c8c0af055abc10">#</a>
</h3>
<p>
All Lego bricks are now in place, enabling me to compose the <code>Sequence</code> like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IReadOnlyCollection<<span style="color:blue;">int</span>> Sequence(<span style="color:blue;">int</span> n)
{
    <span style="color:blue;">if</span> (n < 1)
        <span style="color:blue;">throw</span> <span style="color:blue;">new</span> ArgumentOutOfRangeException(
            nameof(n),
            <span style="color:#a31515;">$"Only natural numbers allowed, but given </span>{n}<span style="color:#a31515;">."</span>);
    <span style="color:blue;">return</span> Generator.Iterate(Next, n).TakeWhile(i => 1 < i).Append(1).ToList();
}</pre>
</p>
<p>
This function has a cyclomatic complexity of <em>2</em>, and each branch can be exercised independently of the other.
</p>
<p>
Which is what I wanted to accomplish.
</p>
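<p>
A sketch of two tests, one per branch, might look like the following. It assumes that <code>Sequence</code> is defined on the same <code>Collatz</code> class as <code>Next</code> and that the tests use xUnit.net; the test names are hypothetical. The expected sequence for <em>5</em> is the one that the Haskell proof of concept produced:
</p>
<p>
<pre><span style="color:blue;">using</span> System;
<span style="color:blue;">using</span> Xunit;

<span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">SequenceTests</span>
{
    [Fact]
    <span style="color:blue;">public</span> <span style="color:blue;">void</span> SequenceRejectsZero()
    {
        // Takes only the Guard Clause branch of Sequence.
        Assert.Throws<ArgumentOutOfRangeException>(() => Collatz.Sequence(0));
    }

    [Fact]
    <span style="color:blue;">public</span> <span style="color:blue;">void</span> SequenceOfFive()
    {
        // Takes only the branch that composes Iterate, TakeWhile, and Append.
        <span style="color:blue;">var</span> actual = Collatz.Sequence(5);
        Assert.Equal(<span style="color:blue;">new</span>[] { 5, 16, 8, 4, 2, 1 }, actual);
    }
}</pre>
</p>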
<h3 id="e32a8549e23446ba8c9cffd5739d62d6">
Conclusion <a href="#e32a8549e23446ba8c9cffd5739d62d6">#</a>
</h3>
<p>
I'm still re-orienting myself when it comes to understanding the relationship between cyclomatic complexity and test coverage. As part of that work, I wanted to refactor the Collatz code I originally showed. This article shows one way to decompose and reassemble the function in such a way that all branches are independent of each other, so that each can be covered by test cases without exercising the other branch.
</p>
<p>
I don't know if this is useful to anyone else, but I found the hours well-spent.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="b43aefd2fa5f4e7a916a31587fa4886e">
<div class="comment-author"><a href="https://about.me/tysonwilliams">Tyson Williams</a> <a href="#b43aefd2fa5f4e7a916a31587fa4886e">#</a></div>
<div class="comment-content">
<p>
I really like this article. So much so that I tried to implement this approach for a recursive function at my work. However, I realized that there are some required conditions.
</p>
<p>
First, the recursive function must be tail recursive. Second, the recursive function must be closed (i.e. the output is a subset/subtype of the input). Neither of those were true for my function at work. An example of a function that doesn't satisfy either of these conditions is the function that computes the depth of a tree.
</p>
<p>
A less serious issue is that your code, as currently implemented, requires that there only be one base case value. The issue is that you have duplicated code: the unique base case value appears both in the call to TakeWhile and in the subsequent call to Append. Instead of repeating yourself, I recommend defining an extension method on Enumerable called TakeUntil that works like TakeWhile but also returns the first value on which the predicate returned false. <a href="https://stackoverflow.com/questions/2242318/how-could-i-take-1-more-item-from-linqs-takewhile/6817553#6817553">Here</a> is an implementation of that extension method.
</p>
</div>
<div class="comment-date">2023-06-22 13:45 UTC</div>
</div>
<div class="comment" id="420c5aef12504e048d5f8c6d2691f0fa">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#420c5aef12504e048d5f8c6d2691f0fa">#</a></div>
<div class="comment-content">
<p>
Tyson, thank you for writing. I suppose that you can't share the function that you mention, so I'll have to discuss it in general terms.
</p>
<p>
As far as I can tell you can always(?) <a href="/2015/12/22/tail-recurse">refactor non-tail-recursive functions to tail-recursive implementations</a>. In practice, however, there's rarely need for that, since you can usually separate the problem into a general-purpose library function on the one hand, and your special function on the other. Examples of general-purpose functions are the various maps and folds. If none of the standard functions do the trick, the type's associated <a href="/2019/04/29/catamorphisms">catamorphism</a> ought to.
</p>
<p>
One example of that is computing the depth of a tree, which we've <a href="/2019/08/05/rose-tree-catamorphism">already discussed</a>.
</p>
<p>
I don't insist that any of this is universally true, so if you have another counter-example, I'd be keen to see it.
</p>
<p>
You are, of course, right about using a <code>TakeUntil</code> extension instead. I was, however, trying to use as many built-in components as possible, so as to not unduly confuse casual readers.
</p>
</div>
<div class="comment-date">2023-06-27 12:35 UTC</div>
</div>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.The Git repository that vanishedhttps://blog.ploeh.dk/2023/06/05/the-git-repository-that-vanished2023-06-05T06:38:00+00:00Mark Seemann
<div id="post">
<p>
<em>A pair of simple operations resurrected it.</em>
</p>
<p>
The other day I had an 'interesting' experience. I was about to create a small pull request, so I checked out a new branch in Git and switched to my editor in order to start coding when the battery on my laptop died.
</p>
<p>
Clearly, when this happens, the computer immediately stops, without any graceful shutdown.
</p>
<p>
I plugged in the laptop and booted it. When I navigated to the source code folder I was working on, the files were there, but it was no longer a Git repository!
</p>
<h3 id="25e1ded964e041ba82394682a7ce046b">
Git is fixable <a href="#25e1ded964e041ba82394682a7ce046b">#</a>
</h3>
<p>
Git is more complex, and more powerful, than most developers care to deal with. Over the years, I've observed hundreds of people interact with Git in various ways, and most tend to give up at the first sign of trouble.
</p>
<p>
The point of this article isn't to point fingers at anyone, but rather to serve as a gentle reminder that Git tends to be eminently fixable.
</p>
<p>
Often, when people run into problems with Git, the only recourse they see is to delete the repository and clone it again. I've seen people do that enough times to realise that it might be helpful to point out: <em>You may not have to do that.</em>
</p>
<h3 id="c7afb00a22cd49f0a86b3c2e9f560d91">
Corruption <a href="#c7afb00a22cd49f0a86b3c2e9f560d91">#</a>
</h3>
<p>
Since I <a href="https://stackoverflow.blog/2022/12/19/use-git-tactically/">use Git tactically</a> I have many repositories on my machine that have no remotes. In those cases, deleting the entire directory and cloning it from the remote isn't an option. I do take backups, though.
</p>
<p>
Still, in this story, the repository I was working with <em>did</em> have a remote. Even so, I was reluctant to delete everything and start over, since I had multiple branches and stashes I'd used for various experiments. Many of those I'd never pushed to the remote, so starting over would mean that I'd lose all of that. It was, perhaps, not a catastrophe, but I would certainly prefer to restore my local repository, if possible.
</p>
<p>
The symptoms were these: When you work with Git in Git Bash, the prompt will indicate which branch you're on. That information was absent, so I was already worried. A quick query confirmed my fears:
</p>
<p>
<pre>$ git status
fatal: not a git repository (or any of the parent directories): .git</pre>
</p>
<p>
All the source code was there, but it looked as though the Git repository was gone. The code still compiled, but there was no source history.
</p>
<p>
Since all code files were there, I had hope. It helps knowing that Git, too, is file-based, and all files are in a hidden directory called <code>.git</code>. If all the source code was still there, perhaps the <code>.git</code> files were there, too. Why wouldn't they be?
</p>
<p>
<pre>$ ls .git
COMMIT_EDITMSG description gitk.cache hooks/ info/ modules/ objects/ packed-refs
config FETCH_HEAD HEAD index logs/ ms-persist.xml ORIG_HEAD refs/</pre>
</p>
<p>
Jolly good! The <code>.git</code> files were still there.
</p>
<p>
I now had a hypothesis: The unexpected shutdown of my machine had left some 'dangling pointers' in <code>.git</code>. A modern operating system may delay writes to disk, so perhaps my <code>git checkout</code> command had never made it all the way to disk - or, at least, not all of it.
</p>
<p>
If the repository was 'merely' corrupted in the sense that a few of the reference pointers had gone missing, perhaps it was fixable.
</p>
<h3 id="a875078b3cbf466cbc8bf3396f9f8ce2">
Empty-headed <a href="#a875078b3cbf466cbc8bf3396f9f8ce2">#</a>
</h3>
<p>
A few web searches indicated that the problem might be with the <code>HEAD</code> file, so I investigated its contents:
</p>
<p>
<pre>$ cat .git/HEAD
</pre>
</p>
<p>
That was all. No output. The <code>HEAD</code> file was empty.
</p>
<p>
That file is not supposed to be empty. It's supposed to contain a commit ID or a reference that tells the Git CLI what the current <em>head</em> is - that is, which commit is currently checked out.
</p>
<p>
While I had checked out a new branch when my computer shut down, I hadn't written any code yet. Thus, the easiest remedy would be to restore the head to <code>master</code>. So I opened the <code>HEAD</code> file in Vim and added this to it:
</p>
<p>
<pre>ref: refs/heads/master</pre>
</p>
<p>
And just like that, the entire Git repository returned!
</p>
<h3 id="ce1dfafa82dd4ea0b4c6d5603f0111cf">
Bad object <a href="#ce1dfafa82dd4ea0b4c6d5603f0111cf">#</a>
</h3>
<p>
The branches, the history, everything looked as though it was restored. A little more investigation, however, revealed one more problem:
</p>
<p>
<pre>$ git log --oneline --all
fatal: bad object refs/heads/some-branch</pre>
</p>
<p>
While a normal <code>git log</code> command worked fine, as soon as I added the <code>--all</code> switch, I got that <code>bad object</code> error message, with the name of the branch I had just created before the computer shut down. (The name of that branch wasn't <code>some-branch</code> - that's just a surrogate I'm using for this article.)
</p>
<p>
Perhaps this was the same kind of problem, so I explored the <code>.git</code> directory further and soon discovered a <code>some-branch</code> file in <code>.git/refs/heads/</code>. What did the contents look like?
</p>
<p>
<pre>$ cat .git/refs/heads/some-branch
</pre>
</p>
<p>
Another empty file!
</p>
<p>
Since I had never committed any work to that branch, the easiest fix was to simply delete the file:
</p>
<p>
<pre>$ rm .git/refs/heads/some-branch</pre>
</p>
<p>
That solved that problem as well. No more <code>fatal: bad object</code> error when using the <code>--all</code> switch with <code>git log</code>.
</p>
<p>
No more problems have shown up since then.
</p>
<h3 id="018956b830c74f3797fff92449045b37">
Conclusion <a href="#018956b830c74f3797fff92449045b37">#</a>
</h3>
<p>
My experience with Git is that it's so powerful that you can often run into trouble. On the other hand, it's also so powerful that you can use it to extricate yourself from trouble. Learning how to do that will teach you how to use Git to your advantage.
</p>
<p>
The problem that I ran into here wasn't fixable with the Git CLI itself, but turned out to still be easily remedied. A Git guru like <a href="https://megakemp.com/">Enrico Campidoglio</a> could most likely have solved my problems without even searching the web. The details of how to solve the problems were new to me, but it took me a few web searches and perhaps five to ten minutes to fix them.
</p>
<p>
The point of this article, then, isn't in the details. It's that it pays to do a little investigation when you run into problems with Git. I already knew that, but I thought that this little story was a good occasion to share that knowledge.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.Favour flat code file foldershttps://blog.ploeh.dk/2023/05/29/favour-flat-code-file-folders2023-05-29T19:20:00+00:00Mark Seemann
<div id="post">
<p>
<em>How code files are organised is hardly related to sustainability of code bases.</em>
</p>
<p>
My recent article <a href="/2023/05/15/folders-versus-namespaces">Folders versus namespaces</a> prompted some reactions. A few kind people <a href="https://twitter.com/Savlambda/status/1658453377489960960">shared how they organise code bases</a>, both on Twitter and in the comments. Most reactions, however, carry the (subliminal?) subtext that organising code in file folders is how things are done.
</p>
<p>
I'd like to challenge that notion.
</p>
<p>
As is usually my habit, I mostly do this to make you think. I don't insist that I'm universally right in all contexts, and that everyone else is wrong. I only write to suggest that alternatives exist.
</p>
<p>
The <a href="/2023/05/15/folders-versus-namespaces">previous article</a> wasn't a recommendation; it was only an exploration of an idea. As I describe in <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>, I recommend flat folder structures. Put most code files in the same directory.
</p>
<h3 id="58c997618e794a26bc366859b216e226">
Finding files <a href="#58c997618e794a26bc366859b216e226">#</a>
</h3>
<p>
People usually dislike that advice. <em>How can I find anything?!</em>
</p>
<p>
Let's start with a counter-question: How can you find anything if you have a deep file hierarchy? If you've organised code files in subfolders of subfolders of folders, you typically start with a collapsed view of the tree.
</p>
<p>
<img src="/content/binary/mostly-collapsed-solution-explorer-tree.png" alt="Mostly-collapsed Solution Explorer tree.">
</p>
<p>
Those of my readers who know a little about search algorithms will point out that a <a href="https://en.wikipedia.org/wiki/Search_tree">search tree</a> is an efficient data structure for locating content. The assumption, however, is that you already know (or can easily construct) the <em>path</em> you should follow.
</p>
<p>
In a view like the above, <em>most</em> files are hidden in one of the collapsed folders. If you want to find, say, the <code>Iso8601.cs</code> file, where do you look for it? Which path through the tree do you take?
</p>
<p>
<em>Unfair!</em>, you protest. You don't know what the <code>Iso8601.cs</code> file does. Let me enlighten you: That file contains functions that render dates and times in <a href="https://en.wikipedia.org/wiki/ISO_8601">ISO 8601</a> formats. These are used to transmit dates and times between systems in a platform-neutral way.
</p>
<p>
So where do you look for it?
</p>
<p>
It's probably not in the <code>Controllers</code> or <code>DataAccess</code> directories. Could it be in the <code>Dtos</code> folder? <code>Rest</code>? <code>Models</code>?
</p>
<p>
Unless your first guess is correct, you'll have to open more than one folder before you find what you're looking for. If each of these folders has subfolders of its own, that only exacerbates the problem.
</p>
<p>
If you're curious, some programmer (me) decided to put the <code>Iso8601.cs</code> file in the <code>Dtos</code> directory, and perhaps you already guessed that. That's not the point, though. The point is this: 'Organising' code files in folders is only efficient if you can unerringly predict the correct path through the tree. You'll have to get it right the first time, every time. If you don't, it's not the most efficient way.
</p>
<p>
Most modern code editors come with features that help you locate files. In <a href="https://visualstudio.microsoft.com/">Visual Studio</a>, for example, you just hit <kbd>Ctrl</kbd>+<kbd>,</kbd> and type a bit of the file name: <em>iso</em>:
</p>
<p>
<img src="/content/binary/visual-studio-go-to-all.png" alt="Visual Studio Go To All dialog.">
</p>
<p>
Then hit <kbd>Enter</kbd> to open the file. In <a href="https://code.visualstudio.com/">Visual Studio Code</a>, the corresponding keyboard shortcut is <kbd>Ctrl</kbd>+<kbd>p</kbd>, and I'd be highly surprised if other editors didn't have a similar feature.
</p>
<p>
To conclude, so far: Organising files in a folder hierarchy is <em>at best</em> on par with your editor's built-in search feature, but is likely to be less productive.
</p>
<h3 id="f3dffaf6e73e42a6a10e1f403b8bf37b">
Navigating a code base <a href="#f3dffaf6e73e42a6a10e1f403b8bf37b">#</a>
</h3>
<p>
What if you don't quite know the name of the file you're looking for? In such cases, the file system is even less helpful.
</p>
<p>
I've seen people work like this:
</p>
<ol>
<li>Look at some code. Identify another code item they'd like to view. (Examples may include: Looking at a unit test and wanting to see the <a href="https://en.wikipedia.org/wiki/System_under_test">SUT</a>, or looking at a class and wanting to see the base class.)</li>
<li>Move focus to the editor's folder view (in Visual Studio called the <em>Solution Explorer</em>).</li>
<li>Scroll to find the file in question.</li>
<li>Double-click said file.</li>
</ol>
<p>
Regardless of how the files are organised, you could, instead, <em>go to definition</em> (<kbd>F12</kbd> with my Visual Studio keyboard layout) in a single action. Granted, how well this works varies with editor and language. Still, even when editor support is less optimal (e.g. a code base with a mix of <a href="https://fsharp.org/">F#</a> and C#, or a <a href="https://www.haskell.org/">Haskell</a> code base), I can often find things faster with a search (<kbd>Ctrl</kbd>+<kbd>Shift</kbd>+<kbd>f</kbd>) than via the file system.
</p>
<p>
A modern editor has efficient tools that can help you find what you're looking for. Looking through the file system is often the least efficient way to find the code you're looking for.
</p>
<h3 id="b050833e38ab4ee7a1ed3429979d8405">
Large code bases <a href="#b050833e38ab4ee7a1ed3429979d8405">#</a>
</h3>
<p>
Do I recommend that you dump thousands of code files in a single directory, then?
</p>
<p>
Hardly, but a question like that presupposes that code bases have thousands of code files. Or more, even. And I've seen such code bases.
</p>
<p>
Likewise, it's a common complaint that Visual Studio is slow when opening solutions with hundreds of projects. And the day Microsoft fixes that problem, people are going to complain that it's slow when opening a solution with thousands of projects.
</p>
<p>
Again, there's an underlying assumption: That a 'real' code base <em>must</em> be so big.
</p>
<p>
Consider alternatives: Could you decompose the code base into multiple smaller code bases? Could you extract subsystems of the code base and package them as reusable packages? Yes, you can do all those things.
</p>
<p>
Usually, I'd pull code bases apart long before they hit a thousand files. Extract modules, libraries, utilities, etc. and put them in separate code bases. Use existing package managers to distribute these smaller pieces of code. Keep the code bases small, and you don't need to organise the files.
</p>
<h3 id="64b5d272a18d41778a021540cd710fd1">
Maintenance <a href="#64b5d272a18d41778a021540cd710fd1">#</a>
</h3>
<p>
<em>But, if all files are mixed together in a single folder, how do we keep the code maintainable?</em>
</p>
<p>
Once more, implicit (but false) assumptions underlie such questions. The assumption is that 'neatly' organising files in hierarchies somehow makes the code easier to maintain. Really, though, it's more akin to a teenager who 'cleans' his room by sweeping everything off the floor only to throw it into his cupboard. It does enable hoovering the floor, but it doesn't make it easier to find anything. The benefit is mostly superficial.
</p>
<p>
Still, consider a tree.
</p>
<p>
<img src="/content/binary/file-tree.png" alt="A tree of folders with files.">
</p>
<p>
This may not be the way you're used to seeing files and folders rendered, but this diagram emphasises the tree structure and makes what happens next starker.
</p>
<p>
The way that most languages work, putting code files in folders makes little difference to the compiler. If the classes in my <code>Controllers</code> folder need some classes from the <code>Dtos</code> folder, you just use them. You may need to import the corresponding namespace, but modern editors make that a breeze.
</p>
<p>
<img src="/content/binary/two-files-coupled-across-tree-branches.png" alt="A tree of folders with files. Two files connect across the tree's branches.">
</p>
<p>
In the above tree, the two files that now communicate are coloured orange. Notice that they span across two main branches of the tree.
</p>
<p>
Thus, even though the files are organised in a tree, it has no impact on the maintainability of the code base. Code can reference other code in other parts of the tree. You can <a href="http://evelinag.com/blog/2014/06-09-comparing-dependency-networks/">easily create cycles in a language like C#</a>, and organising files in trees makes no difference.
</p>
<p>
Most languages, however, enforce that library dependencies form a <a href="https://en.wikipedia.org/wiki/Directed_acyclic_graph">directed acyclic graph</a> (i.e. <a href="/2013/12/03/layers-onions-ports-adapters-its-all-the-same">if the data access library references the domain model, the domain model can't reference the data access library</a>). The C# compiler, like the compilers of most other languages, enforces what <a href="/ref/appp">Robert C. Martin calls the Acyclic Dependencies Principle</a>. Preventing cycles prevents <a href="https://en.wikipedia.org/wiki/Spaghetti_code">spaghetti code</a>, which is <a href="/2022/11/21/decouple-to-delete">key to a maintainable code base</a>.
</p>
<p>
(Ironically, <a href="/2015/04/15/c-will-eventually-get-all-f-features-right">one of the more controversial features of F# is actually one of its greatest strengths: It doesn't allow cycles</a>.)
</p>
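<p>
To make the acyclic-graph idea concrete, here's a minimal sketch of a check for cycles among references. It isn't how any compiler does it; the <code>DependencyGraph</code> class, its <code>HasCycle</code> method, and the representation of references as an array of tuples are all assumptions made for illustration.
</p>
<p>
<pre>using System.Linq;

public static class DependencyGraph
{
    // Hypothetical helper: each tuple says that 'From' references 'To'.
    public static bool HasCycle((string From, string To)[] references)
    {
        // Depth-first search. 'path' holds the nodes between the starting
        // point and the current node; meeting one of them again is a cycle.
        bool Visit(string node, string[] path)
        {
            if (path.Contains(node))
                return true;

            var longerPath = path.Append(node).ToArray();
            return references
                .Where(r => r.From == node)
                .Any(r => Visit(r.To, longerPath));
        }

        return references
            .Select(r => r.From)
            .Distinct()
            .Any(start => Visit(start, new string[0]));
    }
}</pre>
</p>
<p>
Given the single reference <code>("DataAccess", "DomainModel")</code>, <code>HasCycle</code> returns <code>false</code>. If <code>Controllers</code> references <code>Dtos</code> and <code>Dtos</code> references <code>Controllers</code>, it returns <code>true</code>, regardless of which folders the files live in. The search favours brevity over efficiency, so treat it as a toy rather than a tool.
</p>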
<h3 id="ffe0131017254522acd40d2445929f24">
Tidiness <a href="#ffe0131017254522acd40d2445929f24">#</a>
</h3>
<p>
Even so, I do understand the lure of organising code files in an elaborate hierarchy. It looks so <em>neat</em>.
</p>
<p>
Previously, I've <a href="/2021/05/17/against-consistency">touched on the related topic of consistency</a>, and while I'm a bit of a neat freak myself, I have to realise that tidiness seems to be largely unrelated to the sustainability of a code base.
</p>
<p>
As another example in this category, I've seen more than one code base with consistently beautiful documentation. Every method was adorned with formal <a href="https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/xmldoc/">XML documentation</a> with every input parameter as well as output described.
</p>
<p>
Every new phase in a method was delineated with another neat comment, nicely adorned with a 'comment frame' and aligned with other comments.
</p>
<p>
It was glorious.
</p>
<p>
Alas, that documentation sat on top of 750-line methods with a cyclomatic complexity above 50. The methods were so long that <a href="/2019/12/23/the-case-of-the-mysterious-curly-bracket">developers had to introduce artificial variable scopes to avoid naming collisions</a>.
</p>
<p>
The reason I was invited to look at that code in the first place was that the organisation had trouble with maintainability, and they asked me to help.
</p>
<p>
It was neat, yet unmaintainable.
</p>
<p>
This discussion about tidiness may seem like a digression, but I think it's important to make the implicit explicit. If I'm not much mistaken, preference for order is a major reason that so many developers want to organise code files into hierarchies.
</p>
<h3 id="47227afb0a674330bb1b3556f751799d">
Organising principles <a href="#47227afb0a674330bb1b3556f751799d">#</a>
</h3>
<p>
What other motivations for file hierarchies could there be? How about the directory structure as an organising principle?
</p>
<p>
The two most common organising principles are <a href="/2023/05/15/folders-versus-namespaces">those that I experimented with in the previous article</a>:
</p>
<ol>
<li>By technical role (Controller, View Model, DTO, etc.)</li>
<li>By feature</li>
</ol>
<p>
A technical leader might hope that, by presenting a directory structure to team members, it imparts an organising principle on the code to be.
</p>
<p>
It may even do so, but is that actually a benefit?
</p>
<p>
It might subtly discourage developers from introducing code that doesn't fit into the predefined structure. If you organise code by technical role, developers might put most code in Controllers, producing mostly procedural <a href="https://martinfowler.com/eaaCatalog/transactionScript.html">Transaction Scripts</a>. If you organise by feature, this might encourage duplication because developers don't have a natural place to put general-purpose code.
</p>
<p>
<em>You can put truly shared code in the root folder,</em> the counter-argument might be. This is true, but:
</p>
<ol>
<li>This seems to be implicitly discouraged by the folder structure. After all, the hierarchy is there for a reason, right? Thus, any file you place in the root seems to suggest a failure of organisation.</li>
<li>On the other hand, if you flout that not-so-subtle hint and put many code files in the root, what advantage does the hierarchy furnish?</li>
</ol>
<p>
In <em>Information Distribution Aspects of Design Methodology</em> <a href="https://en.wikipedia.org/wiki/David_Parnas">David Parnas</a> writes about documentation standards:
</p>
<blockquote>
<p>
"standards tend to force system structure into a standard mold. A standard [..] makes some assumptions about the system. [...] If those assumptions are violated, the [...] organization fits poorly and the vocabulary must be stretched or misused."
</p>
<footer><cite><a href="https://en.wikipedia.org/wiki/David_Parnas">David Parnas</a>, <em>Information Distribution Aspects of Design Methodology</em></cite></footer>
</blockquote>
<p>
(The above quote is on the surface about documentation standards, and I've deliberately butchered it a bit (clearly marked) to make it easier to spot the more general mechanism.)
</p>
<p>
In the same paper, Parnas describes the danger of making hard-to-change decisions too early. Applied to directory structure, the lesson is that you should postpone designing a file hierarchy until you know more about the problem. Start with a flat directory structure and add folders later, if at all.
</p>
<h3 id="426c5128ef804c6abfe6005d267cb624">
Beyond files? <a href="#426c5128ef804c6abfe6005d267cb624">#</a>
</h3>
<p>
My claim is that you don't need <em>much</em> in the way of directory hierarchy. It doesn't follow, however, that we may never leverage such options. Even though I left most of the example code for <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a> in a single folder, I did add a specialised folder as an <a href="/ref/ddd">anti-corruption layer</a>. Folders do have their uses.
</p>
<blockquote>
<p>
"Why not take it to the extreme and place most code in a single file? If we navigate by "namespace view" and search, do we need all those files?"
</p>
<footer><cite><a href="https://twitter.com/ronnieholm/status/1662219232652963840">Ronnie Holm</a></cite></footer>
</blockquote>
<p>
Following a thought to its extreme end can shed light on a topic. Why not, indeed, put all code in a single file?
</p>
<p>
Curious thought, but possibly not new. I've never programmed in <a href="https://en.wikipedia.org/wiki/Smalltalk">Smalltalk</a>, but as I understand it, the language came with tooling that was both <a href="https://en.wikipedia.org/wiki/Integrated_development_environment">IDE</a> and execution environment. Programmers would write source code in the editor, but although the code was persisted to disk, it may not have been as text files.
</p>
<p>
Even if I completely misunderstand how Smalltalk worked, it's not inconceivable that you could have a development environment based directly on a database. Not that I think that this sounds like a good idea, but it sounds technically possible.
</p>
<p>
Whether we do it one way or another seems mostly to be a question of tooling. What problems would you have if you wrote an entire C# (Java, Python, F#, or similar) code base as a single file? It becomes more difficult to look at two or more parts of the code base at the same time. Still, Visual Studio can actually give you split windows of the same file, but I don't know how it scales if you need multiple views over the same huge file.
</p>
<h3 id="fd3145a641ad4de18dbab9616e2ed4b7">
Conclusion <a href="#fd3145a641ad4de18dbab9616e2ed4b7">#</a>
</h3>
<p>
I recommend flat directory structures for code files. Put most code files in the root of a library or app. Of course, if your system is composed from multiple libraries (dependencies), each library has its own directory.
</p>
<p>
Subfolders aren't <em>prohibited</em>, only generally discouraged. Legitimate reasons to create subfolders may emerge as the code base evolves.
</p>
<p>
My misgivings about code file directory hierarchies mostly stem from the impact they have on developers' minds. This may manifest as <a href="https://en.wikipedia.org/wiki/Magical_thinking">magical thinking</a> or <a href="https://en.wikipedia.org/wiki/Cargo_cult">cargo-cult programming</a>: Erect elaborate directory structures to keep out the evil spirits of spaghetti code.
</p>
<p>
It doesn't work that way.
</p>
</div><hr>
Visual Studio Code snippet to make URLs relative
https://blog.ploeh.dk/2023/05/23/visual-studio-code-snippet-to-make-urls-relative
2023-05-23T19:23:00+00:00 Mark Seemann
<div id="post">
<p>
<em>Yes, it involves JSON and regular expressions.</em>
</p>
<p>
Ever since I <a href="/2013/03/03/moving-the-blog-to-jekyll">migrated the blog off dasBlog</a> I've been <a href="https://rakhim.org/honestly-undefined/19/">writing the articles in raw HTML</a>. The reason is mostly a historical artefact: Originally, I used <a href="https://en.wikipedia.org/wiki/Windows_Live_Writer">Windows Live Writer</a>, but <a href="https://jekyllrb.com/">Jekyll</a> had no support for that, and since I'd been doing web development for more than a decade already, raw HTML seemed like a reliable and durable alternative. I increasingly find that relying on skill and knowledge is a far more durable strategy than relying on technology.
</p>
<p>
For a decade I used <a href="https://www.sublimetext.com/">Sublime Text</a> to write articles, but over the years, I found it degrading in quality. I only used Sublime Text to author blog posts, so when I recently repaved my machine, I decided to see if I could do without it.
</p>
<p>
Since I was already using <a href="https://code.visualstudio.com/">Visual Studio Code</a> for much of my programming, I decided to give it a go for articles as well. It always takes time when you decide to move off a tool you've been using for a decade, but after some initial frustrations, I quickly found a new modus operandi.
</p>
<p>
One benefit of rocking the boat is that it prompts you to reassess the way you do things. Naturally, this happened here as well.
</p>
<h3 id="28218d2cd10945e0886bd528cf2d792f">
My quest for relative URLs <a href="#28218d2cd10945e0886bd528cf2d792f">#</a>
</h3>
<p>
I'd been using a few Sublime Text snippets to automate a few things, like the markup for the section heading you see above this paragraph. Figuring out how to replicate that snippet in Visual Studio Code wasn't too hard, but as I was already perusing <a href="https://code.visualstudio.com/docs/editor/userdefinedsnippets">the snippet documentation</a>, I started investigating other options.
</p>
<p>
One little annoyance I'd lived with for years was adding links to other articles on the blog.
</p>
<p>
While I write an article, I run the site on my local machine. When linking to other articles, I sometimes use the existing page address off the public site, and sometimes I just copy the page address from <code>localhost</code>. In both cases, I want the URL to be relative so that I can navigate the site even if I'm offline. I've written enough articles on planes or while travelling without internet that this is an important use case for me.
</p>
<p>
For example, if I want to link to the article <a href="/2023/01/02/adding-nuget-packages-when-offline">Adding NuGet packages when offline</a>, I want the URL to be <code>/2023/01/02/adding-nuget-packages-when-offline</code>, but that's not the address I get when I copy from the browser's address bar. Here, I get the full URL, with either <code>http://localhost:4000/</code> or <code>https://blog.ploeh.dk/</code> as the origin.
</p>
<p>
For years, I've been manually stripping the origin away, as well as the trailing <code>/</code>. Looking through the Visual Studio Code snippet documentation, however, I eyed an opportunity to automate that workflow.
</p>
<h3 id="f13d4bb7acf84183b9ac18088206f0ca">
Snippet <a href="#f13d4bb7acf84183b9ac18088206f0ca">#</a>
</h3>
<p>
I wanted a piece of editor automation that could modify a URL after I'd pasted it into the article. After a few iterations, I've settled on a <em>surround-with</em> snippet that works pretty well. It looks like this:
</p>
<p>
<pre><span style="color:#2e75b6;">"Make URL relative"</span>: {
    <span style="color:#2e75b6;">"prefix"</span>: <span style="color:#a31515;">"urlrel"</span>,
    <span style="color:#2e75b6;">"body"</span>: [ <span style="color:#a31515;">"${TM_SELECTED_TEXT/^(?:http(?:s?):\\/\\/(?:[^\\/]+))(.+)\\//$1/}"</span> ],
    <span style="color:#2e75b6;">"description"</span>: <span style="color:#a31515;">"Make URL relative."</span>
}</pre>
</p>
<p>
Don't you just love regular expressions? Write once, scrutinise forever.
</p>
<p>
I don't want to go over all the details, because I've already forgotten most of them, but essentially this expression strips away the URL origin starting with either <code>http</code> or <code>https</code> until it finds the first slash <code>/</code>.
</p>
<p>
The thing that makes it useful, though, is the <code>TM_SELECTED_TEXT</code> variable that tells Visual Studio Code that this snippet works on <em>selected</em> text.
</p>
<p>
When I paste a URL into an <code>a</code> tag, at first nothing happens because no text is selected. I can then use <kbd>Shift</kbd> + <kbd>Alt</kbd> + <kbd>→</kbd> to expand the selection, at which point the Visual Studio Code lightbulb (<em>Code Action</em>) appears:
</p>
<p>
<img src="/content/binary/make-url-relative-screen-shot.png" alt="Screen shot of the make-URL-relative code snippet in action.">
</p>
<p>
Running the snippet removes the URL's origin, as well as the trailing slash, and I can move on to write the link text.
</p>
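<p>
If you'd like to see what the regular expression does outside the editor, the following is a small sketch in C#. The <code>RelativeUrl</code> class and its <code>From</code> method are hypothetical names, and the pattern is an equivalent of the one in the snippet, written without the JSON escaping. Like the snippet, it assumes that the pasted URL ends with a slash.
</p>
<p>
<pre>using System.Text.RegularExpressions;

public static class RelativeUrl
{
    // The non-capturing group consumes the origin (scheme and host, with
    // port if any). The capturing group is the path; the final slash is
    // matched, but left outside the group.
    private const string Pattern = "^(?:https?://[^/]+)(.+)/";

    public static string From(string absoluteUrl)
    {
        // Replace the whole match with the captured path only.
        return Regex.Replace(absoluteUrl, Pattern, "$1");
    }
}</pre>
</p>
<p>
Calling <code>RelativeUrl.From</code> with <code>https://blog.ploeh.dk/2023/01/02/adding-nuget-packages-when-offline/</code> returns <code>/2023/01/02/adding-nuget-packages-when-offline</code>, and the same goes for the corresponding <code>http://localhost:4000/</code> address.
</p>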
<h3 id="4e72443f63fe42e382e854fdc9a8d07a">
Conclusion <a href="#4e72443f63fe42e382e854fdc9a8d07a">#</a>
</h3>
<p>
After I started using Visual Studio Code to write blog posts, I've created a few custom snippets to support my authoring workflow. Most of them are fairly mundane, but the <em>make-URLs-relative</em> snippet took me a few iterations to get right.
</p>
<p>
I'm not expecting many of my readers to have this particular need, but I hope that this outline showcases the capabilities of Visual Studio Code snippets, and perhaps inspires you to look into creating custom snippets for your own purposes.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="8d84d3c52d134dcda81f7b63faccb58b">
<div class="comment-author"><a href="https://chamook.lol">Adam Guest</a> <a href="#8d84d3c52d134dcda81f7b63faccb58b">#</a></div>
<div class="comment-content">
<p>
Seems like a useful function to have, so I naturally wondered if I could <del>make it worse</del>
<ins>implement a similar function in Emacs</ins>.
</p>
<p>
Emacs lisp has support for regular expressions, only typically with a bunch of extra slashes
included, so I needed to figure out how to work with the currently selected text to get this to work.
The currently selected text is referred to as the "region" and by specifying <code>"r"</code> as a parameter
for the <code>interactive</code> call we can pass the start and end positions for the region directly to the function.
</p>
<p>
I came up with this rather basic function:
</p>
<p>
<pre>
(defun make-url-relative (start end)
  "Convert the selected absolute url to a relative one.
This is very simple and relies on the url starting with http/https, and removes each character up to the
first slash in the path."
  (interactive "r")
  (replace-regexp-in-region "https?://[^/]+" "" start end))
</pre>
</p>
<p>
With this function included in config somewhere, it can be called by selecting a url and using <kbd>M-x</kbd>
<code>make-url-relative</code> (or assigned to a key binding as required).
</p>
<p>
I'm not sure if there's an already existing package for this functionality, but I hadn't really thought to look for it before
so thanks for the idea 😊
</p>
</div>
<div class="comment-date">2023-05-24 11:20 UTC</div>
</div>
</div>
<hr>
Folders versus namespaces
https://blog.ploeh.dk/2023/05/15/folders-versus-namespaces
2023-05-15T06:01:00+00:00 Mark Seemann
<div id="post">
<p>
<em>What if you allow folder and namespace structure to diverge?</em>
</p>
<p>
I'm currently writing C# code with some first-year computer-science students. Since most things are new to them, they sometimes do things in ways that are 'not the way we usually do things'. As an example, teachers have instructed them to use namespaces, but apparently no-one has told them that the file folder structure has to mirror the namespace structure.
</p>
<p>
The compiler doesn't care, but as long as I've been programming in C#, it's been idiomatic to do it that way. There's even <a href="https://learn.microsoft.com/dotnet/fundamentals/code-analysis/style-rules/ide0130">a static code analysis rule</a> about it.
</p>
<p>
The first couple of times they'd introduce a namespace without a corresponding directory, I'd point out that they are supposed to keep those things in sync. One day, however, it struck me: What happens if you flout that convention?
</p>
<h3 id="dac7601856e94a4e88315cb2e80f74e5">
A common way to organise code files <a href="#dac7601856e94a4e88315cb2e80f74e5">#</a>
</h3>
<p>
Code scaffolding tools and wizards will often nudge you to organise your code according to technical concerns: Controllers, models, views, etc. I'm sure you've encountered more than one code base organised like this:
</p>
<p>
<img src="/content/binary/code-organised-by-tech-responsibility.png" alt="Code organised into folders like Controllers, Models, DataAccess, etc.">
</p>
<p>
You'll put all your Controller classes in the <em>Controllers</em> directory, and make sure that the namespace matches. Thus, in such a code base, the full name of the <code>ReservationsController</code> might be <code>Ploeh.Samples.Restaurants.RestApi.Controllers.ReservationsController</code>.
</p>
<p>
A common criticism is that this is the <em>wrong</em> way to organise the code.
</p>
<h3 id="ea3a6df18fa641638f18e0b554c1ee7c">
The problem with trees <a href="#ea3a6df18fa641638f18e0b554c1ee7c">#</a>
</h3>
<p>
The complaint that this is the wrong way to organise code implies that a correct way exists. I write about this in <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>:
</p>
<blockquote>
<p>
Should you create a subdirectory for Controllers, another for Models, one for Filters, and so on? Or should you create a subdirectory for each feature?
</p>
<p>
Few people like my answer: <em>Just put all files in one directory.</em> Be wary of creating subdirectories just for the sake of 'organising' the code.
</p>
<p>
File systems are <em>hierarchies</em>; they are trees: a specialised kind of acyclic graph in which any two vertices are connected by exactly one path. Put another way, each vertex can have at most one parent. Even more bluntly: If you put a file in a hypothetical <code>Controllers</code> directory, you can't <em>also</em> put it in a <code>Calendar</code> directory.
</p>
</blockquote>
<p>
But what if you could?
</p>
<h3 id="bcdc2397846c45a9bcb3cea0e71cfa7a">
Namespaces disconnected from directory hierarchy <a href="#bcdc2397846c45a9bcb3cea0e71cfa7a">#</a>
</h3>
<p>
The code that accompanies <em>Code That Fits in Your Head</em> is organised as advertised: 65 files in a single directory. (Tests go in separate directories, though, as they belong to separate libraries.)
</p>
<p>
If you decide to ignore the convention that namespace structure should mirror folder structure, however, you now have a second axis of variability.
</p>
<p>
As an experiment, I decided to try that idea with the book's code base. The above screen shot shows the stereotypical organisation according to technical responsibility, after I moved things around. To be clear: This isn't how the book's example code is organised, but an experiment I only now carried out.
</p>
<p>
If you open the <code>ReservationsController.cs</code> file, however, I've now declared that it belongs to a namespace called <code>Ploeh.Samples.Restaurants.RestApi.Reservations</code>. Using Visual Studio's <em>Class View</em>, things look different from the <em>Solution Explorer:</em>
</p>
<p>
<img src="/content/binary/code-organised-by-namespace.png" alt="Code organised into namespaces according to feature: Calendar, Reservations, etc.">
</p>
<p>
Here I've organised the namespaces according to feature, rather than technical role. The screen shot shows the <em>Reservations</em> feature opened, while other features remain closed.
</p>
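<p>
In code, the divergence might look like the following sketch. The file path in the comment reflects the folder layout of the experiment, and the class body is elided, since it's irrelevant to the point.
</p>
<p>
<pre>// File: Controllers/ReservationsController.cs
// The folder says 'technical role'; the namespace says 'feature'.
namespace Ploeh.Samples.Restaurants.RestApi.Reservations
{
    public class ReservationsController
    {
        // Members omitted.
    }
}</pre>
</p>
<p>
The compiler accepts this without complaint, because it only looks at the namespace declaration, not at the directory in which the file resides.
</p>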
<h3 id="19763ef496cc424f8fd419bc4ad699d6">
Initial reactions <a href="#19763ef496cc424f8fd419bc4ad699d6">#</a>
</h3>
<p>
This article isn't a recommendation. It's nothing but an initial exploration of an idea.
</p>
<p>
Do I like it? So far, I think I still prefer flat directory structures. Even though this idea gives two axes of variability, you still have to make judgment calls. It's easy enough with Controllers, but where do you put cross-cutting concerns? Where do you put domain logic that seems to encompass everything else?
</p>
<p>
As an example, the code base that accompanies <em>Code That Fits in Your Head</em> is a multi-tenant system. Each restaurant is a separate tenant, but I've modelled restaurants as part of the domain model, and I've put that 'feature' in its own namespace. Perhaps that's a mistake; at least, I now have the code wart that I have to import the <code>Ploeh.Samples.Restaurants.RestApi.Restaurants</code> namespace to implement the <code>ReservationsController</code>, because its constructor looks like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">ReservationsController</span>(
    IClock <span style="font-weight:bold;color:#1f377f;">clock</span>,
    IRestaurantDatabase <span style="font-weight:bold;color:#1f377f;">restaurantDatabase</span>,
    IReservationsRepository <span style="font-weight:bold;color:#1f377f;">repository</span>)
{
    Clock = clock;
    RestaurantDatabase = restaurantDatabase;
    Repository = repository;
}</pre>
</p>
<p>
The <code>IRestaurantDatabase</code> interface is defined in the <code>Restaurants</code> namespace, but the Controller needs it in order to look up the restaurant (i.e. tenant) in question.
</p>
<p>
You could argue that this isn't a problem with namespaces, but rather a code smell indicating that I should have organised the code in a different way.
</p>
<p>
That may be so, but that, in turn, implies a deeper problem: Assigning files to hierarchies may not, after all, help much. It looks as though things are organised, but if the assignment of things to buckets is done without a predictable system, then what benefit does it provide? Does it make things easier to find, or is the sense of order mostly illusory?
</p>
<p>
I tend to still believe that this is the case. This isn't a nihilistic or defeatist position, but rather a realisation that order must arise from other origins.
</p>
<h3 id="305d423972e0422fa5ecdd04326cd132">
Conclusion <a href="#305d423972e0422fa5ecdd04326cd132">#</a>
</h3>
<p>
I was recently repeatedly encountering student code with a disregard for the convention that namespace structure should follow directory structure (or the other way around). Taking a cue from <a href="https://en.wikipedia.org/wiki/Kent_Beck">Kent Beck</a> I decided to investigate what happens if you forget about the rules and instead pursue what that new freedom might bring.
</p>
<p>
In this article, I briefly show an example where I reorganised a code base so that the file structure is according to implementation detail, but the namespace hierarchy is according to feature. Clearly, I could also have done it the other way around.
</p>
<p>
What if, instead of two, you have three organising principles? I don't know. I can't think of a third kind of hierarchy in a language like C#.
</p>
<p>
After a few hours reorganising the code, I'm not scared away from this idea. It might be worth revisiting in a larger code base. On the other hand, I'm still not convinced that forcing a hierarchy over a sophisticated software design is particularly beneficial.
</p>
<p>
<ins datetime="2023-05-30T12:22Z"><strong>P.S. 2023-05-30.</strong> This article is only a report on an experiment. For my general recommendation regarding code file organisation, see <a href="/2023/05/29/favour-flat-code-file-folders">Favour flat code file folders</a>.</ins>
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="6c209fa61ad34ef3aa8290b06a964aaf">
<div class="comment-author"><a href="http://github.com/m4rsh">Markus Schmits</a> <a href="#6c209fa61ad34ef3aa8290b06a964aaf">#</a></div>
<div class="comment-content">
<p>
Hi Mark,
<br> While reading your book "Code That Fits in Your Head", your latest blog entry caught my attention, as I am struggling in software development with similar issues.
<br> I find it hard, to put all classes into one project directory, as it feels overwhelming, when the number of classes increases.
<br> In the following, I would like to specify possible organising principles in my own words.
</p>
<p> <b> Postulations </b>
<br>- Folders should help the programmer (and reader) to keep the code organised
<br> - Namespaces should reflect the hierarchical organisation of the code base
<br> - Cross-cutting concerns should be addressed by modularity.
</p>
<p> <b> Definitions </b>
<br> 1. Folders
<br> - the allocation of classes in a project with similar technical concerns into folders should help the programmer in the first place, by visualising this similarity
<br> - the benefit lies just in the organisation, i.e. storage of code, not in the expression of hierarchy
</p>
<p>
2. Namespaces
<br> - expression of hierarchy can be achieved by namespaces, which indicate the relationship between allocated classes
<br> - classes can be organised in folders with same designation
<br> - the namespace designation could vary by concerns, although the classes are placed in same folders, as the technical concern of the class shouldn't affect the hierarchical organisation
</p>
<p>
3. Cross-cutting concerns
<br> - classes, which aren't related to a single task, could be indicated by a special namespace
<br> - they could be placed in a different folder, to signalize different affiliations
<br> - or even placed in a different assembly
</p>
<p>
<b> Summing up </b>
<br> A hierarchy should come by design. The organisation of code in folders should help the programmer or reader to grasp the file structure, not necessarily the program hierarchy.
<br>Folders should be a means, not an expression of design. Folders and their designations could change (or disappear) over time in development. Thus, explicit connection of namespace to folder designation seems not desirable, but it's not forbidden.
</p>
<p>
All views above are my own. Please let me know what you think.
</p>
<p>
Best regards,
<br>Markus
</p>
</div>
<div class="comment-date">2023-05-18 19:13 UTC</div>
</div>
<div class="comment" id="3178e0d2d3494f7db7188ed455b78103">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#3178e0d2d3494f7db7188ed455b78103">#</a></div>
<div class="comment-content">
<p>
Markus, thank you for writing. You can, of course, organise code according to various principles, and what works in one case may not be the best fit in another case. The main point of this article was to suggest, as an idea, that folder hierarchy and namespace hierarchy don't <em>have</em> to match.
</p>
<p>
Based on reader reactions, however, I realised that I may have failed to clearly communicate my fundamental position, so I wrote <a href="/2023/05/29/favour-flat-code-file-folders">another article about that</a>. I do, indeed, favour flat folder hierarchies.
</p>
<p>
That is not to say that you can't have any directories in your code base, but rather that I'm sceptical that any such hierarchy addresses real problems.
</p>
<p>
For instance, you write that
</p>
<blockquote>
<p>
"Folders should help the programmer (and reader) to keep the code organised"
</p>
</blockquote>
<p>
If I focus on the word <em>should</em>, then I agree: Folders <em>should</em> help the programmer keep the code organised. In my view, then, it follows that if a tree structure does <em>not</em> assist in doing that, then that structure is of no use and should not be implemented (or should be abandoned if already in place).
</p>
<p>
I do get the impression from many people that they consider a directory tree vital to be able to navigate and understand a code base. What I've tried to outline in <a href="/2023/05/29/favour-flat-code-file-folders">my more recent article</a> is that I don't accept that as an undisputable axiom.
</p>
<p>
What I <em>do</em> find helpful as an organising principle is focusing on dependencies as a directed acyclic graph. Cyclic dependencies between objects are a main source of complexity. Keep dependency graphs directed and <a href="/2022/11/21/decouple-to-delete">make code easy to delete</a>.
</p>
<p>
Organising code files in a tree structure doesn't help achieve that goal. This is the reason I consider code folder hierarchies a red herring: Perhaps not explicitly detrimental to sustainability, but usually nothing but a distraction.
</p>
<p>
How, then, do you organise a large code base? I hope that I answer that question, too, in my more recent article <a href="/2023/05/29/favour-flat-code-file-folders">Favour flat code file folders</a>.
</p>
</div>
<div class="comment-date">2023-06-13 6:11 UTC</div>
</div>
</div>
<hr>
Is cyclomatic complexity really related to branch coverage?
https://blog.ploeh.dk/2023/05/08/is-cyclomatic-complexity-really-related-to-branch-coverage
2023-05-08T05:38:00+00:00 Mark Seemann
<div id="post">
<p>
<em>A genuine case of doubt and bewilderment.</em>
</p>
<p>
Regular readers of this blog may be used to its confident and opinionated tone. I write that way, not because I'm always convinced that I'm right, but because prose with too many caveats and qualifications tends to bury the message in verbose and circumlocutory ambiguity.
</p>
<p>
This time, however, I write to solicit feedback, and because I'm surprised to the edge of bemusement by a recent experience.
</p>
<h3 id="6ebe2eae2250441c918433ba40ce8e86">
Collatz sequence <a href="#6ebe2eae2250441c918433ba40ce8e86">#</a>
</h3>
<p>
Consider the following code:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Collatz</span>
{
    <span style="color:blue;">public</span> <span style="color:blue;">static</span> IReadOnlyCollection<<span style="color:blue;">int</span>> <span style="color:#74531f;">Sequence</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">n</span>)
    {
        <span style="font-weight:bold;color:#8f08c4;">if</span> (n < 1)
            <span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> ArgumentOutOfRangeException(
                nameof(n),
                <span style="color:#a31515;">$"Only natural numbers allowed, but given </span>{n}<span style="color:#a31515;">."</span>);
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sequence</span> = <span style="color:blue;">new</span> List<<span style="color:blue;">int</span>>();
        <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">current</span> = n;
        <span style="font-weight:bold;color:#8f08c4;">while</span> (current != 1)
        {
            sequence.Add(current);
            <span style="font-weight:bold;color:#8f08c4;">if</span> (current % 2 == 0)
                current = current / 2;
            <span style="font-weight:bold;color:#8f08c4;">else</span>
                current = current * 3 + 1;
        }
        sequence.Add(current);
        <span style="font-weight:bold;color:#8f08c4;">return</span> sequence;
    }
}</pre>
</p>
<p>
As the names imply, the <code>Sequence</code> function calculates the <a href="https://en.wikipedia.org/wiki/Collatz_conjecture">Collatz sequence</a> for a given natural number.
</p>
<p>
Please don't tune out if that sounds mathematical and difficult, because it really isn't. While the Collatz conjecture still evades mathematical proof, the sequence is easy to calculate and understand. Given a number, produce a sequence starting with that number and stop when you arrive at 1. Every new number in the sequence is based on the previous number. If the input is even, divide it by two. If it's odd, multiply it by three and add one. Repeat until you arrive at one.
</p>
<p>
The conjecture is that any natural number will produce a finite sequence. That's the unproven part, but that doesn't concern us. In this article, I'm only interested in the above code, which computes such sequences.
</p>
<p>
Here are a few examples:
</p>
<p>
<pre>> Collatz.Sequence(1)
List<<span style="color:blue;">int</span>>(1) { 1 }
> Collatz.Sequence(2)
List<<span style="color:blue;">int</span>>(2) { 2, 1 }
> Collatz.Sequence(3)
List<<span style="color:blue;">int</span>>(8) { 3, 10, 5, 16, 8, 4, 2, 1 }
> Collatz.Sequence(4)
List<<span style="color:blue;">int</span>>(3) { 4, 2, 1 }</pre>
</p>
<p>
While there seems to be a general tendency for the sequence to grow as the input gets larger, that's clearly not a rule. The examples show that the sequence for <code>3</code> is longer than the sequence for <code>4</code>.
</p>
<p>
All this, however, just sets the stage. The problem doesn't really have anything to do with Collatz sequences. I only ran into it while working with a Collatz sequence implementation that looked a lot like the above.
</p>
<h3 id="08c0cb2794184e9da8b9f72e6c9ce985">
Cyclomatic complexity <a href="#08c0cb2794184e9da8b9f72e6c9ce985">#</a>
</h3>
<p>
What is the <a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">cyclomatic complexity</a> of the above <code>Sequence</code> function? If you need a reminder of how to count cyclomatic complexity, this is a good opportunity to take a moment to refresh your memory, count the number, and compare it with my answer.
</p>
<p>
Apart from the opportunity for exercise, it was a rhetorical question. The answer is <em>4</em>.
</p>
<p>
This means that we'd need <a href="/2019/12/09/put-cyclomatic-complexity-to-good-use">at least four unit tests to cover all branches</a>. Right? Right?
</p>
<p>
Okay, let's try.
</p>
<h3 id="d10688e0a13241c7b7124a5ce8f063ef">
Branch coverage <a href="#d10688e0a13241c7b7124a5ce8f063ef">#</a>
</h3>
<p>
Before we start, let's make the ritual <a href="/2015/11/16/code-coverage-is-a-useless-target-measure">denouncement of code coverage as a target metric</a>. The point isn't to reach 100% code coverage as such, but to <a href="/2018/11/12/what-to-test-and-not-to-test">gain confidence that you've added tests that cover whatever is important to you</a>. Also, the best way to do that is usually with TDD, which isn't the situation I'm discussing here.
</p>
<p>
The first branch that we might want to cover is the Guard Clause. This is easily addressed with an <a href="https://xunit.net/">xUnit.net</a> test:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">ThrowOnInvalidInput</span>()
{
    Assert.Throws<ArgumentOutOfRangeException>(() => Collatz.Sequence(0));
}</pre>
</p>
<p>
This test calls the <code>Sequence</code> function with <code>0</code>, which (in this context, at least) isn't a <a href="https://en.wikipedia.org/wiki/Natural_number">natural number</a>.
</p>
<p>
If you measure test coverage (or, in this case, just think it through), there are no surprises yet. One branch is covered, the rest aren't. That's 25%.
</p>
<p>
(If you use the <a href="https://learn.microsoft.com/dotnet/core/testing/unit-testing-code-coverage">free code coverage option for .NET</a>, it will surprisingly tell you that you're only at 16% branch coverage. It deems the cyclomatic complexity of the <code>Sequence</code> function to be 6, not 4, and 1/6 is 16.67%. Why it thinks it's 6 is not entirely clear to me, but Visual Studio agrees with me that the cyclomatic complexity is 4. In this particular case, it doesn't matter anyway. The conclusion that follows remains the same.)
</p>
<p>
Let's add another test case, and perhaps one that gives the algorithm a good exercise.
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Example</span>()
{
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = Collatz.Sequence(5);
    Assert.Equal(<span style="color:blue;">new</span>[] { 5, 16, 8, 4, 2, 1 }, actual);
}</pre>
</p>
<p>
As expected, the test passes. What's the branch coverage now?
</p>
<p>
Try to think it through instead of relying exclusively on a tool. The algorithm is simple enough that you can emulate execution in your head, or perhaps with the assistance of a notepad. How many branches does it execute when the input is <code>5</code>?
</p>
<p>
Branch coverage is now 100%. (Even the <em>dotnet</em> coverage tool agrees, despite its weird cyclomatic complexity value.) All branches are exercised.
</p>
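<p>
If you'd rather not trust your own head, one way to check the claim is to instrument a port of the algorithm so that it records which branches it takes. The following Haskell sketch is not the C# code from above, only a rough translation made for illustration. The <code>Branch</code> type, its four labels, and the <code>branchesHit</code> function are hypothetical names, and the labelling is a simplification that doesn't necessarily match how any particular coverage tool counts branches.
</p>
<p>
<pre>import Data.List (nub, sort)

-- Hypothetical labels for the branches of the Sequence function.
data Branch = GuardThrows | LoopExits | EvenStep | OddStep
  deriving (Eq, Ord, Show)

-- Lists the distinct branches taken for a given input, sorted by label.
-- Like the C# original, this only terminates if the Collatz sequence
-- for n reaches 1.
branchesHit :: Int -> [Branch]
branchesHit n
  | n < 1 = [GuardThrows]
  | otherwise = sort (nub (go n))
  where
    go 1 = [LoopExits]
    go c
      | even c = EvenStep : go (c `div` 2)
      | otherwise = OddStep : go (c * 3 + 1)</pre>
</p>
<p>
In GHCi, <code>branchesHit 0</code> evaluates to <code>[GuardThrows]</code>, and <code>branchesHit 5</code> to <code>[LoopExits,EvenStep,OddStep]</code>. Between them, the two inputs touch all four labels.
</p>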
<p>
Two tests produce 100% branch coverage of a function with a cyclomatic complexity of 4.
</p>
<h3 id="7a4d9c5fbb8e4e94a52c61990565e38f">
Surprise <a href="#7a4d9c5fbb8e4e94a52c61990565e38f">#</a>
</h3>
<p>
That's what befuddles me. I thought that cyclomatic complexity and branch coverage were related. I thought that the number of branches was a good indicator of the number of tests you'd need to cover all branches. I even wrote <a href="/2019/12/09/put-cyclomatic-complexity-to-good-use">an article to that effect</a>, and no-one contradicted me.
</p>
<p>
That, in itself, is no proof of anything, but the notion that the article presents seems to be widely accepted. I never considered it controversial, and the only reason I didn't cite anyone is that this seems to be 'common knowledge'. I wasn't aware of a particular source I could cite.
</p>
<p>
Now, however, it seems that it's wrong. Is it wrong, or am I missing something?
</p>
<p>
To be clear, I completely understand why the above two tests are sufficient to fully cover the function. I also believe that I fully understand why the cyclomatic complexity is 4.
</p>
<p>
I am also painfully aware that the above two tests in no way fully specify the Collatz sequence. That's not the point.
</p>
<p>
The point is that it's possible to cover this function with only two tests, despite the cyclomatic complexity being 4. That surprises me.
</p>
<p>
Is this a known thing?
</p>
<p>
I'm sure it is. I've long since given up discovering anything new in programming.
</p>
<h3 id="829b5f9b3e9449d1a023d2bacff5b58c">
Conclusion <a href="#829b5f9b3e9449d1a023d2bacff5b58c">#</a>
</h3>
<p>
I recently encountered a function that performed a Collatz calculation similar to the one I've shown here. It exhibited the same trait, and since it had no Guard Clause, I could fully cover it with a single test case. That function even had a cyclomatic complexity of 6, so you can perhaps imagine my befuddlement.
</p>
<p>
Is it wrong, then, that cyclomatic complexity suggests a minimum number of test cases in order to cover all branches?
</p>
<p>
It seems so, but that's new to me. I don't mind being wrong on occasion. It's usually an opportunity to learn something new. If you have any insights, please <a href="https://github.com/ploeh/ploeh.github.com#comments">leave a comment</a>.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="02568f995d91432da540858644b61e89">
<div class="comment-author"><a href="http://github.com/neongraal">Struan Judd</a> <a href="#02568f995d91432da540858644b61e89">#</a></div>
<div class="comment-content">
<p>
My first thought is that the code looks like an unrolled recursive function, so perhaps if it's
refactored into a driver function and a "continuation passing style" it might make the cyclomatic
complexity match the covering tests.
</p>
<p>
So given the following:
<pre>public delegate void ResultFunc(IEnumerable<int> result);
public delegate void ContFunc(int n, ResultFunc result, ContFunc cont);

public static void Cont(int n, ResultFunc result, ContFunc cont) {
    if (n == 1) {
        result(new[] { n });
        return;
    }

    void Result(IEnumerable<int> list) => result(list.Prepend(n));

    if (n % 2 == 0)
        cont(n / 2, Result, cont);
    else
        cont(n * 3 + 1, Result, cont);
}

public static IReadOnlyCollection<int> Continuation(int n) {
    if (n < 1)
        throw new ArgumentOutOfRangeException(
            nameof(n),
            $"Only natural numbers allowed, but given {n}.");

    var output = new List<int>();

    void Output(IEnumerable<int> list) => output = list.ToList();

    Cont(n, Output, Cont);

    return output;
}</pre>
</p>
<p>
I calculate the cyclomatic complexity of <code>Continuation</code> to be <em>2</em> and <code>Cont</code> to be <em>3</em>.
</p>
<p>
And it would seem you need 5 tests to properly cover the code, 3 for <code>Cont</code> and 2 for <code>Continuation</code>.
</p>
<p>
But however you write the "n >=1" case for <code>Continuation</code> you will have to cover some of <code>Cont</code>.
</p>
</div>
<div class="comment-date">2023-05-08 10:11 UTC</div>
</div>
<div class="comment" id="896f7e7c979144438a6e7f1a66dd72ea">
<div class="comment-author">Jeroen Heijmans <a href="#896f7e7c979144438a6e7f1a66dd72ea">#</a></div>
<div class="comment-content">
<p>
There is a relation between cyclomatic complexity and branches to cover, but it's not one of equality: cyclomatic
complexity is an upper bound for the number of test cases needed to cover all branches. There's a nice example illustrating this in the
<a href="https://en.wikipedia.org/w/index.php?title=Cyclomatic_complexity#Implications_for_software_testing">Wikipedia
article on cyclomatic complexity</a>, which also explains the relation with path coverage (for which
cyclomatic complexity is a lower bound).
</p>
</div>
<div class="comment-date">2023-05-08 15:03 UTC</div>
</div>
<div class="comment" id="b683f78855f8440389b973e24c88c253">
<div class="comment-author"><a href="https://github.com/bretthall">Brett Hall</a> <a href="#b683f78855f8440389b973e24c88c253">#</a></div>
<div class="comment-content">
<p>
I find cyclomatic complexity to be overly pedantic at times, and you will need four tests if you get really pedantic.
First, test the guard clause as you already did. Then, test with 1 in order to test the <code>while</code> loop body
not being run. Then, test with 2 in order to test that the <code>while</code> is executed, but we only hit the <code>if</code>
part of the <code>if/else</code>. Finally, test with 3 in order to hit the <code>else</code> inside of the <code>while</code>.
That's four tests where each test is only testing one of the branches (some tests hit more than one branch, but the
"extra branch" is already covered by another test). Again, this is being really pedantic and I wouldn't test this
function as laid out above (I'd probably put in the test with 1, since it's an edge case, but otherwise test as you did).
</p>
<p>
I don't think there's a rigorous relationship between cyclomatic complexity and number of tests. In simple cases, treating
things as though the relationship exists can be helpful. But once you start having interrelated branches in a function,
things get murky, and you may have to go to pedantic lengths in order to maintain the relationship. The same
thing goes for code coverage, which can be 100% even though you haven't actually tested all paths through your code if
there are multiple branches in the function that depend on each other.
</p>
</div>
<div class="comment-date">2023-05-08 15:30 UTC</div>
</div>
<div class="comment" id="61939b516c0e4c2caab7c6e8a3302595">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#61939b516c0e4c2caab7c6e8a3302595">#</a></div>
<div class="comment-content">
<p>
Thank you, all, for writing. I'm extraordinarily busy at the moment, so it'll take me longer than usual to respond. Rest assured, however, that I haven't forgotten.
</p>
</div>
<div class="comment-date">2023-05-11 12:42 UTC</div>
</div>
<div class="comment" id="e91eeb8bd09f446ab863f51ae30afad9">
<div class="comment-author"><a href="https://www.nikolamilekic.com">Nikola Milekic</a> <a href="#e91eeb8bd09f446ab863f51ae30afad9">#</a></div>
<div class="comment-content">
<p>
If we agree to the definition of cyclomatic complexity as the number of independent paths through a section of code, then the number of tests needed to cover that section <strong>must be</strong> the same per definition, <strong>if those tests are also independent</strong>. Independence is crucial here, and is also the main source of confusion. Both the <code>while</code> and <code>if</code> forks depend on the same variable (<code>current</code>), and so they are not independent.
</p>
<p>
The second test you wrote is similarly not independent, as it ends up tracing multiple paths through <code>if</code>: odd for 5, and even for 16, 8, etc., and so ends up covering all paths. Had you picked 2 instead of 5 for the test, that would have been more independent, as it would not have traced the <code>else</code> path, requiring one additional test.
</p>
<p>
The standard way of computing cyclomatic complexity assumes independence, which simply is not possible in this case.
</p>
</div>
<div class="comment-date">2023-06-02 00:38 UTC</div>
</div>
<div class="comment" id="1fafb3fa289a415f9102dc8d6defc464">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#1fafb3fa289a415f9102dc8d6defc464">#</a></div>
<div class="comment-content">
<p>
Struan, thank you for writing, and please accept my apologies for the time it took me to respond. I agree with your calculations of cyclomatic complexity of your refactored code.
</p>
<p>
I agree with what you write, but you can't write a sentence like "however you write the "n >=1" case for [...] you will have to cover some of [..]" and expect me to just ignore it. To be clear, I agree with you in the particular case of the methods you provided, but you inspired me to refactor my code with that rule as a specific constraint. You can see the results in my new article <a href="/2023/06/12/collatz-sequences-by-function-composition">Collatz sequences by function composition</a>.
</p>
<p>
Thank you for the inspiration.
</p>
</div>
<div class="comment-date">2023-06-12 5:46 UTC</div>
</div>
<div class="comment" id="2878d9f87f90405aa64ed1d1400d8d2b">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#2878d9f87f90405aa64ed1d1400d8d2b">#</a></div>
<div class="comment-content">
<p>
Jeroen, thank you for writing, and please accept my apologies for the time it took me to respond. I should have read that Wikipedia article more closely, instead of just linking to it.
</p>
<p>
What still puzzles me is that I've been aware of, and actively used, cyclomatic complexity for more than a decade, and this distinction has never come up, and no-one has called me out on it.
</p>
<p>
As <a href="https://en.wikipedia.org/wiki/Ward_Cunningham#%22Cunningham's_Law%22">Cunningham's law</a> says, <em>the best way to get the right answer on the Internet is not to ask a question; it's to post the wrong answer.</em> Even so, I posted <a href="/2019/12/09/put-cyclomatic-complexity-to-good-use">Put cyclomatic complexity to good use</a> in 2019, and no-one contradicted it.
</p>
<p>
I don't mention this as an argument that I'm right. Obviously, I was wrong, but no-one told me. Have I had something in my teeth all these years, too?
</p>
</div>
<div class="comment-date">2023-06-12 6:35 UTC</div>
</div>
<div class="comment" id="da62194dabc947d0b3ecd7c4258b0e86">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#da62194dabc947d0b3ecd7c4258b0e86">#</a></div>
<div class="comment-content">
<p>
Brett, thank you for writing, and please accept my apologies for the time it took me to respond. I suppose that I failed to make my overall motivation clear. When doing proper test-driven development (TDD), one doesn't need cyclomatic complexity in order to think about coverage. When following the <a href="/2019/10/21/a-red-green-refactor-checklist">red-green-refactor checklist</a>, you only add enough code to pass all tests. With that process, cyclomatic complexity is rarely useful, and I tend to ignore it.
</p>
<p>
I do, however, often coach programmers in unit testing and TDD, and people new to the technique often struggle with basics. They add too much code, instead of the simplest thing that could possibly work, or they can't think of a good next test case to write.
</p>
<p>
When teaching TDD I sometimes suggest cyclomatic complexity as a metric to help decision-making. <em>Did we add more code to the System Under Test than warranted by tests? Is it okay to forgo writing a test of a one-liner with cyclomatic complexity of one?</em>
</p>
<p>
The metric is also useful in hybrid scenarios where you already have production code, and now you want to add <a href="https://en.wikipedia.org/wiki/Characterization_test">characterisation tests</a>: Which test cases should you <em>at least</em> write?
</p>
<p>
Another way to answer such questions is to run a code-coverage tool, but that often takes time. I find it useful to teach people about cyclomatic complexity, because it's a lightweight heuristic always at hand.
</p>
</div>
<div class="comment-date">2023-06-12 7:24 UTC</div>
</div>
<div class="comment" id="01b5af5ccab04843911cde37104c4a7c">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#01b5af5ccab04843911cde37104c4a7c">#</a></div>
<div class="comment-content">
<p>
Nikola, thank you for writing. The emphasis on independence is useful; I used compatible thinking in my new article <a href="/2023/06/12/collatz-sequences-by-function-composition">Collatz sequences by function composition</a>. By now, including the other comments to this article, it seems that we've been able to cover the problem better, and I, at least, feel that I've learned something.
</p>
<p>
I don't think, however, that the standard way of computing cyclomatic complexity assumes independence. You can easily compute the cyclomatic complexity of the above <code>Sequence</code> function, even though its branches aren't independent. Tooling such as Visual Studio seems to agree with me.
</p>
</div>
<div class="comment-date">2023-06-13 5:32 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Refactoring pure function composition without breaking existing tests
https://blog.ploeh.dk/2023/05/01/refactoring-pure-function-composition-without-breaking-existing-tests
2023-05-01T06:44:00+00:00
Mark Seemann
<div id="post">
<p>
<em>An example modifying a Haskell Gossiping Bus Drivers implementation.</em>
</p>
<p>
This is an article in a series of articles about the <a href="/2023/02/13/epistemology-of-interaction-testing">epistemology of interaction testing</a>. In short, this collection of articles discusses how to test the composition of <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a>. While a pure function is <a href="/2015/05/07/functional-design-is-intrinsically-testable">intrinsically testable</a>, how do you test the composition of pure functions? As the introductory article outlines, I consider it mostly a matter of establishing confidence. With <a href="/2018/11/12/what-to-test-and-not-to-test">enough test coverage</a> you can be confident that the composition produces the desired outputs.
</p>
<p>
Keep in mind that if you compose pure functions into a larger pure function, the composition is still pure. This implies that you can still test it by supplying input and verifying that the output is correct.
</p>
<p>
Tests that exercise the composition do so by verifying observable behaviour. This makes them more robust to refactoring. You'll see an example of that later in this article.
</p>
<h3 id="32b583422c354d0f8468406f0486a762">
Gossiping bus drivers <a href="#32b583422c354d0f8468406f0486a762">#</a>
</h3>
<p>
I recently did the <a href="https://kata-log.rocks/gossiping-bus-drivers-kata">Gossiping Bus Drivers</a> kata in <a href="https://www.haskell.org/">Haskell</a>. At first, I added the tests suggested in the kata description.
</p>
<p>
<pre>{-# OPTIONS_GHC -Wno-type-defaults #-}
<span style="color:blue;">module</span> Main <span style="color:blue;">where</span>

<span style="color:blue;">import</span> GossipingBusDrivers
<span style="color:blue;">import</span> Test.HUnit
<span style="color:blue;">import</span> Test.Framework.Providers.HUnit (<span style="color:#2b91af;">hUnitTestToTests</span>)
<span style="color:blue;">import</span> Test.Framework (<span style="color:#2b91af;">defaultMain</span>)

<span style="color:#2b91af;">main</span> <span style="color:blue;">::</span> <span style="color:#2b91af;">IO</span> ()
main = defaultMain $ hUnitTestToTests $ TestList [
  <span style="color:#a31515;">"Kata examples"</span> ~: <span style="color:blue;">do</span>
    (routes, expected) <-
      [
        ([[3, 1, 2, 3],
          [3, 2, 3, 1],
          [4, 2, 3, 4, 5]],
         Just 5),
        ([[2, 1, 2],
          [5, 2, 8]],
         Nothing)
      ]
    <span style="color:blue;">let</span> actual = drive routes
    <span style="color:blue;">return</span> $ expected ~=? actual
  ]</pre>
</p>
<p>
As I prefer them, these tests are <a href="/2018/04/30/parametrised-unit-tests-in-haskell">parametrised HUnit tests</a>.
</p>
<p>
The problem with those suggested test cases is that they don't provide enough confidence that an implementation is correct. In fact, I wrote this implementation to pass them:
</p>
<p>
<pre>drive routes = <span style="color:blue;">if</span> <span style="color:blue;">length</span> routes == 3 <span style="color:blue;">then</span> Just 5 <span style="color:blue;">else</span> Nothing</pre>
</p>
<p>
This is clearly incorrect. It just looks at the number of routes and returns a fixed value for each count. It doesn't look at the contents of the routes.
</p>
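<p>
To make that concrete, you can feed the cheating implementation three routes that never share a stop. The following sketch repeats the implementation with a type signature that I've assumed for the occasion; <code>disjointRoutes</code> is a hypothetical value made up for this illustration.
</p>
<p>
<pre>-- The deliberately wrong implementation, with an assumed type signature.
drive :: [[Int]] -> Maybe Int
drive routes = if length routes == 3 then Just 5 else Nothing

-- Three drivers that never visit the same stop, so no gossip can spread.
disjointRoutes :: [[Int]]
disjointRoutes = [[1], [2], [3]]</pre>
</p>
<p>
Here, <code>drive disjointRoutes</code> returns <code>Just 5</code>, although drivers who never meet can never exchange gossip, so a correct implementation would have to return <code>Nothing</code>.
</p>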
<p>
Even if you don't <a href="/2019/10/07/devils-advocate">try to deliberately cheat</a>, I'm not convinced that these two tests are enough. You could <em>try</em> to write the correct implementation, but how do you know that you've correctly dealt with various edge cases?
</p>
<h3 id="0e40e3994772481f855b266452685852">
Helper function <a href="#0e40e3994772481f855b266452685852">#</a>
</h3>
<p>
The kata description isn't hard to understand, so while the suggested test cases seem insufficient, I knew what was required. Perhaps I could write a proper implementation without additional tests. After all, I was convinced that it'd be possible to do it with a <a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">cyclomatic complexity</a> of <em>1</em>, and since <a href="/2013/04/02/why-trust-tests">a test function also has a cyclomatic complexity of <em>1</em></a>, there's always that tension in test-driven development: Why write test code to exercise code with a cyclomatic complexity of <em>1</em>?
</p>
<p>
To be clear: There are often good reasons to write tests even in this case, and this seems like one of them. <a href="/2019/12/09/put-cyclomatic-complexity-to-good-use">Cyclomatic complexity indicates a minimum number of test cases</a>, not necessarily a sufficient number.
</p>
<p>
Even though Haskell's type system is expressive, I soon found myself second-guessing the behaviour of various expressions that I'd experimented with. Sometimes I find GHCi (the Haskell <a href="https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop">REPL</a>) sufficiently edifying, but in this case I thought that I might want to keep some test cases around for a helper function that I was developing:
</p>
<p>
<pre><span style="color:blue;">import</span> Data.List
<span style="color:blue;">import</span> <span style="color:blue;">qualified</span> Data.Map.Strict <span style="color:blue;">as</span> Map
<span style="color:blue;">import</span> Data.Map.Strict (<span style="color:#2b91af;">(!)</span>)
<span style="color:blue;">import</span> <span style="color:blue;">qualified</span> Data.Set <span style="color:blue;">as</span> Set
<span style="color:blue;">import</span> Data.Set (<span style="color:blue;">Set</span>)

<span style="color:#2b91af;">evaluateStop</span> <span style="color:blue;">::</span> (<span style="color:blue;">Functor</span> f, <span style="color:blue;">Foldable</span> f, <span style="color:blue;">Ord</span> k, <span style="color:blue;">Ord</span> a)
             => f (k, Set a) -> f (k, Set a)
evaluateStop stopsAndDrivers =
  <span style="color:blue;">let</span> gossip (stop, driver) = Map.insertWith Set.union stop driver
      gossipAtStops = <span style="color:blue;">foldl</span>' (<span style="color:blue;">flip</span> gossip) Map.empty stopsAndDrivers
  <span style="color:blue;">in</span> <span style="color:blue;">fmap</span> (\(stop, _) -> (stop, gossipAtStops ! stop)) stopsAndDrivers</pre>
</p>
<p>
I was fairly confident that this function worked as I intended, but I wanted to be sure. I needed some examples, so I added these tests:
</p>
<p>
<pre><span style="color:#a31515;">"evaluateStop examples"</span> ~: <span style="color:blue;">do</span>
  (stopsAndDrivers, expected) <- [
      ([(1, fromList [1]), (2, fromList [2]), (1, fromList [1])],
       [(1, fromList [1]), (2, fromList [2]), (1, fromList [1])]),
      ([(1, fromList [1]), (2, fromList [2]), (1, fromList [2])],
       [(1, fromList [1, 2]), (2, fromList [2]), (1, fromList [1, 2])]),
      ([(1, fromList [1, 2, 3]), (1, fromList [2, 3, 4])],
       [(1, fromList [1, 2, 3, 4]), (1, fromList [1, 2, 3, 4])])
    ]
  <span style="color:blue;">let</span> actual = evaluateStop stopsAndDrivers
  <span style="color:blue;">return</span> $ fromList expected ~=? fromList actual</pre>
</p>
<p>
They do, indeed, pass.
</p>
<p>
The idea behind that <code>evaluateStop</code> function is to evaluate the state at each 'minute' of the simulation. The first line of each test case is the state before the drivers meet, and the second line is the <em>expected</em> state after all drivers have gossiped.
</p>
<p>
My plan was to use some sort of left fold to keep evaluating states until all information has disseminated to all drivers.
</p>
<h3 id="06888628b5da4b4bb76fc59812c019da">
Property <a href="#06888628b5da4b4bb76fc59812c019da">#</a>
</h3>
<p>
Since I have already extolled the virtues of property-based testing in this article series, I wondered whether I could add some properties instead of relying on examples. Well, I did manage to add one <a href="https://hackage.haskell.org/package/QuickCheck">QuickCheck</a> property:
</p>
<p>
<pre>testProperty <span style="color:#a31515;">"drive image"</span> $ \ (routes :: [NonEmptyList Int]) ->
  <span style="color:blue;">let</span> actual = drive $ <span style="color:blue;">fmap</span> getNonEmpty routes
  <span style="color:blue;">in</span> isJust actual ==>
     <span style="color:blue;">all</span> (\i -> 0 <= i && i <= 480) actual</pre>
</p>
<p>
There's not much to talk about here. The property only states that the result of the <code>drive</code> function must be between <code>0</code> and <code>480</code>, if it exists.
</p>
<p>
Such a property could vacuously pass if <code>drive</code> always returns <code>Nothing</code>, so I used the <code>==></code> QuickCheck combinator to make sure that the property is actually exercising only the <code>Just</code> cases.
</p>
<p>
Since the <code>drive</code> function only returns a number, apart from verifying its <a href="https://en.wikipedia.org/wiki/Image_(mathematics)">image</a> I couldn't think of any other general property to add.
</p>
<p>
You can always come up with more specific properties that explicitly set up more constrained test scenarios, but is it worth it?
</p>
<p>
It's always worthwhile to stop and think. If you're writing a 'normal' example-based test, consider whether a property would be better. Likewise, if you're about to write a property, consider whether an example would be better.
</p>
<p>
'Better' can mean more than one thing. Preventing regressions is one thing, but making the code maintainable is another. If you're writing a property that is too complicated, it might be better to write a simpler example-based test.
</p>
<p>
I could definitely think of some complicated properties, but I found that more examples might make the test code easier to understand.
</p>
<h3 id="64eac896c8ea4127bacccb8ca01cf2fb">
More examples <a href="#64eac896c8ea4127bacccb8ca01cf2fb">#</a>
</h3>
<p>
After all that angst and soul-searching, I added a few more examples to the first parametrised test:
</p>
<p>
<pre><span style="color:#a31515;">"Kata examples"</span> ~: <span style="color:blue;">do</span>
  (routes, expected) <-
    [
      ([[3, 1, 2, 3],
        [3, 2, 3, 1],
        [4, 2, 3, 4, 5]],
       Just 5),
      ([[2, 1, 2],
        [5, 2, 8]],
       Nothing),
      ([[1, 2, 3, 4, 5],
        [5, 6, 7, 8],
        [3, 9, 6]],
       Just 13),
      ([[1, 2, 3],
        [2, 1, 3],
        [2, 4, 5, 3]],
       Just 5),
      ([[1, 2],
        [2, 1]],
       Nothing),
      ([[1]],
       Just 0),
      ([[2],
        [2]],
       Just 1)
    ]
  <span style="color:blue;">let</span> actual = drive routes
  <span style="color:blue;">return</span> $ expected ~=? actual</pre>
</p>
<p>
The first two test cases are the same as before, and the last two are some edge cases I added myself. The middle three I adopted from <a href="https://dodona.ugent.be/en/activities/1792896126/">another page about the kata</a>. Since those examples turned out to be off by one, I did those examples on paper to verify that I understood what the expected value was. Then I adjusted them to my one-indexed results.
</p>
<h3 id="52417ab56528457891928ea551017f75">
Drive <a href="#52417ab56528457891928ea551017f75">#</a>
</h3>
<p>
The <code>drive</code> function now correctly implements the kata, I hope. At least it passes all the tests.
</p>
<p>
<pre><span style="color:#2b91af;">drive</span> <span style="color:blue;">::</span> (<span style="color:blue;">Num</span> b, <span style="color:blue;">Enum</span> b, <span style="color:blue;">Ord</span> a) <span style="color:blue;">=></span> [[a]] <span style="color:blue;">-></span> <span style="color:#2b91af;">Maybe</span> b
drive routes =
  <span style="color:green;">-- Each driver starts with a single gossip. Any kind of value will do, as
</span>  <span style="color:green;">-- long as each is unique. Here I use the one-based index of each route,
</span>  <span style="color:green;">-- since it fulfills the requirements.
</span>  <span style="color:blue;">let</span> drivers = <span style="color:blue;">fmap</span> Set.singleton [1 .. <span style="color:blue;">length</span> routes]
      goal = Set.unions drivers
      stops = transpose $ <span style="color:blue;">fmap</span> (<span style="color:blue;">take</span> 480 . <span style="color:blue;">cycle</span>) routes
      propagation =
        <span style="color:blue;">scanl</span> (\ds ss -> <span style="color:blue;">snd</span> <$> evaluateStop (<span style="color:blue;">zip</span> ss ds)) drivers stops
  <span style="color:blue;">in</span> <span style="color:blue;">fmap</span> <span style="color:blue;">fst</span> $ find (<span style="color:blue;">all</span> (== goal) . <span style="color:blue;">snd</span>) $ <span style="color:blue;">zip</span> [0 ..] propagation</pre>
</p>
<p>
Haskell code can be information-dense, and if you don't have an integrated development environment (IDE) around, this may be hard to read.
</p>
<p>
<code>drivers</code> is a list of sets. Each set represents the gossip that a driver knows. At the beginning, each only knows one piece of gossip. The expression initialises each driver with a <code>singleton</code> set. Each piece of gossip is represented by a number, simply going from <code>1</code> to the number of routes. Incidentally, this is also the number of drivers, so you can consider the number <code>1</code> as a placeholder for the gossip that driver <em>1</em> knows, and so on.
</p>
<p>
The <code>goal</code> is the union of all the gossip. Once every driver's knowledge is equal to the <code>goal</code> the simulation can stop.
</p>
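<p>
As a small illustration of those two values, here's what they might look like for three routes. This is only a sketch: the number of drivers is hard-coded as an assumption, whereas the <code>drive</code> function derives it from <code>length routes</code>.
</p>
<p>
<pre>import qualified Data.Set as Set
import Data.Set (Set)

-- Three drivers, each initially knowing only their own piece of gossip.
-- The count of three is an assumption made for this example.
drivers :: [Set Int]
drivers = fmap Set.singleton [1 .. 3]

-- Everything there is to know: the union of all initial gossip.
goal :: Set Int
goal = Set.unions drivers</pre>
</p>
<p>
With these definitions, <code>goal</code> is the set containing <code>1</code>, <code>2</code>, and <code>3</code>, and none of the initial <code>drivers</code> are equal to it.
</p>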
<p>
Since <code>evaluateStop</code> simulates one stop, the <code>drive</code> function needs a list of stops to fold. That's the <code>stops</code> value. In the very first example, you have three routes: <code>[3, 1, 2, 3]</code>, <code>[3, 2, 3, 1]</code>, and <code>[4, 2, 3, 4, 5]</code>. The first time the drivers stop (after one minute), the stops are <code>3</code>, <code>3</code>, and <code>4</code>. That is, the first element in <code>stops</code> would be the list <code>[3, 3, 4]</code>. The next one would be <code>[1, 2, 2]</code>, then <code>[2, 3, 3]</code>, and so on.
</p>
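<p>
To see the shape of that value, here is a minimal sketch that isolates the expression. The name <code>stopsPerMinute</code> and its <code>minutes</code> parameter are hypothetical, introduced only for illustration; the <code>drive</code> function computes the same thing inline with the literal <code>480</code>. The sketch assumes that no route is empty, since <code>cycle</code> fails on an empty list.
</p>
<p>
<pre>import Data.List (transpose)

-- Hypothetical helper; the article computes this inline as 'stops'.
-- Each route is repeated indefinitely, cut off after the given number of
-- minutes, and the routes are then transposed so that each inner list holds
-- the stops that all drivers visit during the same minute.
stopsPerMinute :: Int -> [[a]] -> [[a]]
stopsPerMinute minutes routes = transpose $ fmap (take minutes . cycle) routes

-- The first three minutes of the first example from the kata.
exampleStops :: [[Int]]
exampleStops =
  take 3 $ stopsPerMinute 480 [[3, 1, 2, 3], [3, 2, 3, 1], [4, 2, 3, 4, 5]]
-- [[3,3,4],[1,2,2],[2,3,3]]</pre>
</p>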
<p>
My plan all along was to use some sort of left fold to repeatedly run <code>evaluateStop</code> over each minute. Since I need to produce a list of states, <code>scanl</code> was an appropriate choice. The lambda expression that I have to pass to it, though, is more complicated than I appreciate. We'll return to that in a moment.
</p>
<p>
The <code>drive</code> function can now index the <code>propagation</code> list by zipping it with the infinite list <code>[0 ..]</code>, <code>find</code> the first element where <code>all</code> sets are equal to the <code>goal</code> set, and then return that index. That produces the correct results.
</p>
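<p>
The combination of <code>scanl</code>, <code>zip [0 ..]</code>, and <code>find</code> may be easier to follow on a smaller problem. The following is only a sketch: <code>firstIndexWhere</code> is a hypothetical helper, not part of the kata code, and the example uses plain addition instead of <code>evaluateStop</code>.
</p>
<p>
<pre>import Data.List (find)

-- Hypothetical helper: run a left scan and return the index of the first
-- state that satisfies the predicate. Index 0 is the initial state, so the
-- index equals the number of inputs consumed to reach that state.
firstIndexWhere :: (s -> Bool) -> (s -> x -> s) -> s -> [x] -> Maybe Int
firstIndexWhere done step initial inputs =
  fmap fst $ find (done . snd) $ zip [0 ..] $ scanl step initial inputs

-- scanl (+) 0 [1, 2, 3, 4, 5] is [0, 1, 3, 6, 10, 15], and the first state
-- that is at least 10 sits at index 4.
reached :: Maybe Int
reached = firstIndexWhere (>= (10 :: Int)) (+) 0 [1, 2, 3, 4, 5]

-- No state ever reaches 100, so there is no index to return.
neverReached :: Maybe Int
neverReached = firstIndexWhere (>= (100 :: Int)) (+) 0 [1, 2, 3, 4, 5]</pre>
</p>
<p>
In <code>drive</code>, the state is the list of drivers' gossip sets, the inputs are the per-minute stops, and the index is the number of minutes.
</p>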
<h3 id="2b84136e78154ebdb81654efabf3d987">
The need for a better helper function <a href="#2b84136e78154ebdb81654efabf3d987">#</a>
</h3>
<p>
As I already warned, I wasn't happy with the lambda expression passed to <code>scanl</code>. It looks complicated and arcane. Is there a better way to express the same behaviour? Usually, when confronted with a nasty lambda expression like that, in Haskell my first instinct is to see if <a href="https://pointfree.io/">pointfree.io</a> has a better option. Alas, <code>(((<span style="color:blue;">snd</span> <$>) . evaluateStop) .) . <span style="color:blue;">flip</span> <span style="color:blue;">zip</span></code> hardly seems an improvement. That <code>flip zip</code> expression to the right, however, suggests that it might help to flip the order of the values that go into <code>evaluateStop</code>.
</p>
<p>
When I developed the <code>evaluateStop</code> helper function, I found it intuitive to define it over a list of tuples, where the first element in the tuple is the stop, and the second element is the set of gossip that the driver at that stop knows.
</p>
<p>
The tuples don't <em>have</em> to be in that order, though. Perhaps if I flip the tuples that would make the lambda expression more readable. It was worth a try.
</p>
<h3 id="ca808b5a5fa3473c9046704e4bcc7357">
Confidence <a href="#ca808b5a5fa3473c9046704e4bcc7357">#</a>
</h3>
<p>
Since this article is part of a small series about the epistemology of testing composed functions, let's take a moment to reflect on the confidence we may have in the <code>drive</code> function.
</p>
<p>
Keep in mind the goal of the kata: Calculate the number of minutes it takes for all gossip to spread to all drivers. There are a few tests that verify that: seven examples and a fairly vacuous QuickCheck property. Is that enough to be confident that the function is correct?
</p>
<p>
If it isn't, I think the best option you have is to add more examples. For the sake of argument, however, let's assume that the tests are good enough.
</p>
<p>
When summarising the tests that cover the <code>drive</code> function, I didn't count the three examples that exercise <code>evaluateStop</code>. Do these three test cases improve your confidence in the <code>drive</code> function? A bit, perhaps, but keep in mind that <em>the kata description doesn't mandate that function.</em> It's just a helper function I created in order to decompose the problem.
</p>
<p>
Granted, having tests that cover a helper function does, to a degree, increase my confidence in the code. I have confidence in the function itself, but that is largely irrelevant, because the problem I'm trying to solve is <em>not</em> implementing this particular function. On the other hand, my confidence in <code>evaluateStop</code> means that I have increased confidence in the code that calls it.
</p>
<p>
Compared to interaction-based testing, I'm not <em>testing</em> that <code>drive</code> calls <code>evaluateStop</code>, but I can still verify that this happens. I can just look at the code.
</p>
<p>
The composition is already there in the code. What do I gain from replicating that composition with <a href="http://xunitpatterns.com/Test%20Stub.html">Stubs</a> and <a href="http://xunitpatterns.com/Test%20Spy.html">Spies</a>?
</p>
<p>
It's not a breaking change if I decide to implement <code>drive</code> in a different way.
</p>
<p>
What gives me confidence when composing pure functions isn't that I've subjected the composition to an interaction-based test. Rather, it's that the function is composed from trustworthy components.
</p>
<h3 id="a414a1e6b9e947d2b4632fcea7b916cf">
Strangler <a href="#a414a1e6b9e947d2b4632fcea7b916cf">#</a>
</h3>
<p>
My main grievance with Stubs and Spies is that <a href="/2022/10/17/stubs-and-mocks-break-encapsulation">they break encapsulation</a>. This may sound abstract, but it is a real problem. It is the underlying reason that so many tests break when you refactor code.
</p>
<p>
This example code base, as other functional code that I write, avoids interaction-based testing. This makes it easier to refactor the code, as I will now demonstrate.
</p>
<p>
My goal is to change the <code>evaluateStop</code> helper function by flipping the tuples. If I just edit it, however, I'm going to (temporarily) break the <code>drive</code> function.
</p>
<p>
Katas typically result in small code bases where you can get away with a lot of bad practices that wouldn't work in a larger code base. To be honest, the refactoring I have in mind can be completed in a few minutes with a brute-force approach. Imagine, however, that we can't break compatibility of the <code>evaluateStop</code> function for the time being. Perhaps, had we had a larger code base, there would have been other code that depended on this function. At the very least, the tests do.
</p>
<p>
Instead of brute-force changing the function, I'm going to make use of the <a href="https://martinfowler.com/bliki/StranglerFigApplication.html">Strangler</a> pattern, as I've also described in my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>.
</p>
<p>
Leave the existing function alone, and add a new one. You can typically copy and paste the existing code and then make the necessary changes. In that way, you break neither client code nor tests, because nothing depends on the new function yet.
</p>
<p>
<pre><span style="color:#2b91af;">evaluateStop'</span> <span style="color:blue;">::</span> (<span style="color:blue;">Functor</span> f, <span style="color:blue;">Foldable</span> f, <span style="color:blue;">Ord</span> k, <span style="color:blue;">Ord</span> a)
=> f (Set a, k) -> f (Set a, k)
evaluateStop' driversAndStops =
<span style="color:blue;">let</span> gossip (driver, stop) = Map.insertWith Set.union stop driver
gossipAtStops = <span style="color:blue;">foldl</span>' (<span style="color:blue;">flip</span> gossip) Map.empty driversAndStops
<span style="color:blue;">in</span> <span style="color:blue;">fmap</span> (\(_, stop) -> (gossipAtStops ! stop, stop)) driversAndStops</pre>
</p>
<p>
In a language like C# you can often get away with overloading a method name, but Haskell doesn't have overloading. Since I consider this side-by-side situation to be temporary, I've appended a prime after the function name. This is a fairly normal convention in Haskell, I gather.
</p>
<p>
The only change this function represents is that I've swapped the tuple order.
</p>
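<p>
While both functions exist side by side, that claim can be checked directly. The following sketch assumes that the original <code>evaluateStop</code> has the same implementation with the tuple components in the opposite order; that definition is my reconstruction for illustration. The name <code>agreesWithOriginal</code> is hypothetical.
</p>
<p>
<pre>import Data.List (foldl')
import Data.Map.Strict ((!))
import qualified Data.Map.Strict as Map
import Data.Set (Set)
import qualified Data.Set as Set
import Data.Tuple (swap)

-- Assumed shape of the original function: (stop, gossip) tuples.
evaluateStop :: (Functor f, Foldable f, Ord k, Ord a)
             => f (k, Set a) -> f (k, Set a)
evaluateStop stopsAndDrivers =
  let gossip (stop, driver) = Map.insertWith Set.union stop driver
      gossipAtStops = foldl' (flip gossip) Map.empty stopsAndDrivers
  in  fmap (\(stop, _) -> (stop, gossipAtStops ! stop)) stopsAndDrivers

-- The new function from the article: (gossip, stop) tuples.
evaluateStop' :: (Functor f, Foldable f, Ord k, Ord a)
              => f (Set a, k) -> f (Set a, k)
evaluateStop' driversAndStops =
  let gossip (driver, stop) = Map.insertWith Set.union stop driver
      gossipAtStops = foldl' (flip gossip) Map.empty driversAndStops
  in  fmap (\(_, stop) -> (gossipAtStops ! stop, stop)) driversAndStops

-- Hypothetical check: swapping the tuples on the way in and out of the old
-- function should give the same result as calling the new function.
agreesWithOriginal :: [(Set Int, Int)] -> Bool
agreesWithOriginal xs =
  fmap swap (evaluateStop (fmap swap xs)) == evaluateStop' xs</pre>
</p>
<p>
A predicate like this could also serve as a temporary QuickCheck property, to be deleted together with the old function.
</p>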
<p>
Once you've added the new function, you may want to copy, paste and edit the tests. Or perhaps you want to do the tests first. During this process, make <a href="https://www.industriallogic.com/blog/whats-this-about-micro-commits/">micro-commits</a> so that you can easily suspend your 'refactoring' activity if something more important comes up.
</p>
<p>
Once everything is in place, you can change the <code>drive</code> function:
</p>
<p>
<pre><span style="color:#2b91af;">drive</span> <span style="color:blue;">::</span> (<span style="color:blue;">Num</span> b, <span style="color:blue;">Enum</span> b, <span style="color:blue;">Ord</span> a) <span style="color:blue;">=></span> [[a]] <span style="color:blue;">-></span> <span style="color:#2b91af;">Maybe</span> b
drive routes =
<span style="color:green;">-- Each driver starts with a single gossip. Any kind of value will do, as
</span> <span style="color:green;">-- long as each is unique. Here I use the one-based index of each route,
</span> <span style="color:green;">-- since it fulfills the requirements.
</span> <span style="color:blue;">let</span> drivers = <span style="color:blue;">fmap</span> Set.singleton [1 .. <span style="color:blue;">length</span> routes]
goal = Set.unions drivers
stops = transpose $ <span style="color:blue;">fmap</span> (<span style="color:blue;">take</span> 480 . <span style="color:blue;">cycle</span>) routes
propagation =
<span style="color:blue;">scanl</span> (\ds ss -> <span style="color:blue;">fst</span> <$> evaluateStop' (<span style="color:blue;">zip</span> ds ss)) drivers stops
<span style="color:blue;">in</span> <span style="color:blue;">fmap</span> <span style="color:blue;">fst</span> $ find (<span style="color:blue;">all</span> (== goal) . <span style="color:blue;">snd</span>) $ <span style="color:blue;">zip</span> [0 ..] propagation</pre>
</p>
<p>
Notice that the type of <code>drive</code> hasn't changed, and neither has the behaviour. This means that although I've changed the composition (the <em>interaction</em>), no tests broke.
</p>
<p>
Finally, once I moved all code over, I deleted the old function and renamed the new one to take its place.
</p>
<h3 id="6be4ec66c14c4adbbb8bcb60307c7ba3">
Was it all worth it? <a href="#6be4ec66c14c4adbbb8bcb60307c7ba3">#</a>
</h3>
<p>
At first glance, it doesn't look as though much was gained. What happens if I eta-reduce the new lambda expression?
</p>
<p>
<pre><span style="color:#2b91af;">drive</span> <span style="color:blue;">::</span> (<span style="color:blue;">Num</span> b, <span style="color:blue;">Enum</span> b, <span style="color:blue;">Ord</span> a) <span style="color:blue;">=></span> [[a]] <span style="color:blue;">-></span> <span style="color:#2b91af;">Maybe</span> b
drive routes =
<span style="color:green;">-- Each driver starts with a single gossip. Any kind of value will do, as
</span> <span style="color:green;">-- long as each is unique. Here I use the one-based index of each route,
</span> <span style="color:green;">-- since it fulfills the requirements.
</span> <span style="color:blue;">let</span> drivers = <span style="color:blue;">fmap</span> Set.singleton [1 .. <span style="color:blue;">length</span> routes]
goal = Set.unions drivers
stops = transpose $ <span style="color:blue;">fmap</span> (<span style="color:blue;">take</span> 480 . <span style="color:blue;">cycle</span>) routes
propagation = <span style="color:blue;">scanl</span> (((<span style="color:blue;">fmap</span> <span style="color:blue;">fst</span> . evaluateStop) .) . <span style="color:blue;">zip</span>) drivers stops
<span style="color:blue;">in</span> <span style="color:blue;">fmap</span> <span style="color:blue;">fst</span> $ find (<span style="color:blue;">all</span> (== goal) . <span style="color:blue;">snd</span>) $ <span style="color:blue;">zip</span> [0 ..] propagation</pre>
</p>
<p>
Not much better. I can now fit the <code>propagation</code> expression on a single line of code and still stay within an <a href="/2019/11/04/the-80-24-rule">80x24 box</a>, but that's about it. Is <code>((<span style="color:blue;">fmap</span> <span style="color:blue;">fst</span> . evaluateStop) .) . <span style="color:blue;">zip</span></code> more readable than what we had before?
</p>
<p>
Hardly, I admit. I might consider reverting, and since I've been <a href="https://stackoverflow.blog/2022/12/19/use-git-tactically/">using Git tactically</a>, I have that option.
</p>
<p>
If I hadn't tried, though, I wouldn't have known.
</p>
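<p>
If you want to convince yourself that the point-free form is only a rewrite of the lambda expression, a small sketch may help. Here <code>step</code> is a stand-in for <code>evaluateStop</code>, and the names <code>pointed</code> and <code>pointFree</code> are hypothetical. I picked <code>reverse</code> only because it makes the effect easy to see.
</p>
<p>
<pre>-- Stand-in for evaluateStop; any function on a list of pairs will do here.
step :: [(a, b)] -> [(a, b)]
step = reverse

-- The lambda form: zip the two lists, run the step, keep the first components.
pointed :: [a] -> [b] -> [a]
pointed = \ds ss -> fmap fst (step (zip ds ss))

-- The eta-reduced form. The outer (.) waits for the first argument to zip,
-- and the section ((fmap fst . step) .) composes over the second.
pointFree :: [a] -> [b] -> [a]
pointFree = ((fmap fst . step) .) . zip</pre>
</p>
<p>
With these definitions, both <code>pointed "abc" [1, 2, 3]</code> and <code>pointFree "abc" [1, 2, 3]</code> evaluate to <code>"cba"</code>.
</p>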
<h3 id="5bf92cabf46e4ec9a6a41b51dd29e876">
Conclusion <a href="#5bf92cabf46e4ec9a6a41b51dd29e876">#</a>
</h3>
<p>
When composing one pure function with another, how can you test that the outer function correctly calls the inner function?
</p>
<p>
In the same way that you test any other pure function. The only way you can observe whether a pure function works as intended is to compare its actual output to the output you expect its input to produce. How it arrives at that output is irrelevant. It could be looking up all results in a big table. As long as the result is correct, the function is correct.
</p>
<p>
In this article, you saw an example of how to test a composed function, as well as how to refactor it without breaking tests.
</p>
<p>
<strong>Next:</strong> <a href="/2023/06/19/when-is-an-implementation-detail-an-implementation-detail">When is an implementation detail an implementation detail?</a>
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.Are pull requests bad because they originate from open-source development?https://blog.ploeh.dk/2023/04/24/are-pull-requests-bad-because-they-originate-from-open-source-development2023-04-24T06:08:00+00:00Mark Seemann
<div id="post">
<p>
<em>I don't think so, and at least find the argument flawed.</em>
</p>
<p>
Increasingly I come across a quote that goes like this:
</p>
<blockquote>
<p>
Pull requests were invented for open source projects where you want to gatekeep changes from people you don't know and don't trust to change the code safely.
</p>
</blockquote>
<p>
If you're wondering where that 'quote' comes from, then read on. I'm not trying to stand up a straw man, but I had to do a bit of digging in order to find the source of what almost seems like a <a href="https://en.wikipedia.org/wiki/Meme">meme</a>.
</p>
<h3 id="c347774c419941a9987c74c95b6f91cd">
Quote investigation <a href="#c347774c419941a9987c74c95b6f91cd">#</a>
</h3>
<p>
The quote is usually attributed to <a href="https://www.davefarley.net/">Dave Farley</a>, who is a software luminary that <a href="https://www.goodreads.com/review/show/4812673890">I respect tremendously</a>. Even with the attribution, the source is typically missing, but after asking around, <a href="https://twitter.com/MitjaBezensek/status/1626165418296590336">Mitja Bezenšek pointed me in the right direction</a>.
</p>
<p>
The source is most likely a video, from which I've transcribed a longer passage:
</p>
<blockquote>
<p>
"Pull requests were invented to gatekeep access to open-source projects. In open source, it's very common that not everyone is given free access to changing the code, so contributors will issue a pull request so that a trusted person can then approve the change.
</p>
<p>
"I think this is really bad way to organise a development team.
</p>
<p>
"If you can't trust your team mates to make changes carefully, then your version control system is not going to fix that for you."
</p>
<footer><cite><a href="https://youtu.be/UQrlEXU6RM8">Dave Farley</a></cite></footer>
</blockquote>
<p>
I've made an effort to transcribe as faithfully as possible, but if you really want to be sure what Dave Farley said, watch the video. The quote comes twelve minutes in.
</p>
<h3 id="60fb5776c68b464d9ae77ad601e8c99b">
My biases <a href="#60fb5776c68b464d9ae77ad601e8c99b">#</a>
</h3>
<p>
I agree that the argument sounds compelling, but I find it flawed. Before I proceed to put forward my arguments I want to make my own biases clear. Arguing against someone like Dave Farley is not something I take lightly. As far as I can tell, he's worked on systems more impressive than any I can showcase. I also think he has more industry experience than I have.
</p>
<p>
That doesn't necessarily make him right, but on the other hand, why should you side with me, with my less impressive résumé?
</p>
<p>
My objective is not to attack Dave Farley, or any other person for that matter. My agenda is the argument itself. I do, however, find it intellectually honest to cite sources, with the associated risk that my argument may look like a personal attack. To steelman my opponent, then, I'll try to put my own biases on display. To the degree I'm aware of them.
</p>
<p>
I prefer pull requests over pair and ensemble programming. I've tried all three, and I do admit that real-time collaboration has obvious advantages, but I find pairing or ensemble programming exhausting.
</p>
<p>
Since <a href="https://www.goodreads.com/review/show/440837121">I read <em>Quiet</em></a> a decade ago, I've been alert to the introspective side of my personality. Although I agree with <a href="http://www.exampler.com/about/">Brian Marick</a> that one should <a href="https://podcast.oddly-influenced.dev/episodes/not-a-ted-talk-relevant-results-from-psychology">be wary of understanding personality traits as destiny</a>, I mostly prefer solo activities.
</p>
<p>
Increasingly, since I became self-employed, I've arranged my life to maximise the time I can work from home. The exercise regimen I've chosen for myself is independent of other people: I run, and lift weights at home. You may have noticed that I like writing. I like reading as well. And, hardly surprising, I prefer writing code in splendid isolation.
</p>
<p>
Even so, I find it perfectly possible to have meaningful relationships with other people. After all, I've been married to the same woman for decades, my (mostly) grown kids haven't fled from home, and I have friends that I've known for decades.
</p>
<p>
In a toot that I can no longer find, Brian Marick asked (and I paraphrase from memory): <em>If you've tried a technique and didn't like it, what would it take to make you like it?</em>
</p>
<p>
As a self-professed introvert, social interaction <em>does</em> tire me, but I still enjoy hanging out with friends or family. What makes those interactions different? Well, often, there's good food and wine involved. Perhaps ensemble programming would work better for me with a bottle of Champagne.
</p>
<p>
Other forces influence my preferences as well. I like the <a href="/2023/02/20/a-thought-on-workplace-flexibility-and-asynchrony">flexibility provided by asynchrony</a>, and similarly dislike having to be somewhere at a specific time.
</p>
<p>
Having to be somewhere also involves transporting myself there, which I also don't appreciate.
</p>
<p>
In short, I prefer pull requests over pairing and ensemble programming. All of that, however, is just my subjective opinion, and <a href="/2020/10/12/subjectivity">that's not an argument</a>.
</p>
<h3 id="c83cc60f53e049edbdae29dca4402563">
Counter-examples <a href="#c83cc60f53e049edbdae29dca4402563">#</a>
</h3>
<p>
The above tirade about my biases is <em>not</em> a refutation of Dave Farley's argument. Rather, I wanted to put my own blind spots on display. If you suspect me of <a href="https://en.wikipedia.org/wiki/Motivated_reasoning">motivated reasoning</a>, that just might be the case.
</p>
<p>
All that said, I want to challenge the argument.
</p>
<p>
First, it includes an appeal to <em>trust</em>, which is <a href="/2023/03/20/on-trust-in-software-development">a line of reasoning with which I don't agree</a>. You can't trust your colleagues, just like you can't trust yourself. A code review serves more purposes than keeping malicious actors out of the code base. It also helps catch mistakes, security issues, or misunderstandings. It can also improve shared understanding of common goals and standards. Yes, this is <em>also</em> possible with other means, such as pair or ensemble programming, but from that, it doesn't follow that code reviews <em>can't</em> do that. They can. I've lived that dream.
</p>
<p>
If you take away the appeal to trust, though, there isn't much left of the argument. What remains is essentially: <em>Pull requests were invented to solve a particular problem in open-source development. Internal software development is not open source. Pull requests are bad for internal software development.</em>
</p>
<p>
That an invention was done in one context, however, doesn't preclude it from being useful in another. Git was invented to address an open-source problem. Should we stop using Git for internal software development?
</p>
<p>
<a href="https://en.wikipedia.org/wiki/Solar_cell">Solar panels were originally developed for satellites and space probes</a>. Does that mean that we shouldn't use them on Earth?
</p>
<p>
<a href="https://en.wikipedia.org/wiki/Global_Positioning_System">GPS was invented for use by the US military</a>. Does that make civilian use wrong?
</p>
<h3 id="a30f67d73f0e488aac23dccb723370f8">
Are pull requests bad? <a href="#a30f67d73f0e488aac23dccb723370f8">#</a>
</h3>
<p>
I find the original <em>argument</em> logically flawed, but if I insist on logic, I'm also obliged to admit that my <a href="/ref/predicate-logic">possible-world counter-examples</a> don't prove that pull requests are good.
</p>
<p>
Dave Farley's claim may still turn out to be true. Not because of the argument he gives, but perhaps for other reasons.
</p>
<p>
I think I understand where the dislike of pull requests comes from. As they are often practised, pull requests can sit for days with no-one looking at them. This creates unnecessary delays. If this is the only way you know of working with pull requests, no wonder you don't like them.
</p>
<p>
<a href="/2021/06/21/agile-pull-requests">I advocate a more agile workflow for pull requests</a>. I consider that congruent with <a href="/2023/01/23/agilean">my view on agile development</a>.
</p>
<h3 id="e0d3bb23dda34ae98241ff2bad442794">
Conclusion <a href="#e0d3bb23dda34ae98241ff2bad442794">#</a>
</h3>
<p>
Pull requests are often misused, but they don't have to be. On the other hand, that's just my experience and subjective preference.
</p>
<p>
Dave Farley has argued that pull requests are a bad way to organise a development team. I've argued that the argument is logically flawed.
</p>
<p>
The question remains unsettled. I've attempted to refute one particular argument, and even if you accept my counter-examples, pull requests may still be bad. Or good.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="bdd051fb26464bdbbc056ddea07712d5">
<div class="comment-author"><a href="https://cwb.dk/">Casper Weiss Bang</a> <a href="#bdd051fb26464bdbbc056ddea07712d5">#</a></div>
<div class="comment-content">
<p>
Another important angle, for me, is that pull requests are not merely code review. They can also be a way of enforcing a variety of automated checks, e.g. running tests or linting. This enforces quality too - so I'd argue to use pull requests even if you don't do peer review (I do on my hobby projects at least, for the exact reasons you mentioned in <a href="https://blog.ploeh.dk/2023/03/20/on-trust-in-software-development/">On trust in software development</a> - I don't trust myself to be perfect.)
</p>
</div>
<div class="comment-date">2023-04-26 10:26 UTC</div>
</div>
<div class="comment" id="9b022dd663d34feba170006de5b66af4">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#9b022dd663d34feba170006de5b66af4">#</a></div>
<div class="comment-content">
<p>
Casper, thank you for writing. Indeed, other readers have made similar observations on other channels (Twitter, Mastodon). That, too, can be a benefit.
</p>
<p>
In order to once more steel-man 'the other side', they'd probably say that you can run automated checks in your Continuous Delivery pipeline, and halt it if automated checks fail.
</p>
<p>
When done this way, it's useful to be able to also run the same tests on your dev box. I consider that a good practice anyway.
</p>
</div>
<div class="comment-date">2023-04-28 14:49 UTC</div>
</div>
</div>
<hr>
A restaurant example of refactoring from example-based to property-based testinghttps://blog.ploeh.dk/2023/04/17/a-restaurant-example-of-refactoring-from-example-based-to-property-based-testing2023-04-17T06:37:00+00:00Mark Seemann
<div id="post">
<p>
<em>A C# example with xUnit.net and FsCheck.</em>
</p>
<p>
This is the second comprehensive example that accompanies the article <a href="/2023/02/13/epistemology-of-interaction-testing">Epistemology of interaction testing</a>. In that article, I argue that in a code base that leans toward functional programming (FP), property-based testing is a better fit than interaction-based testing. In this example, I will show how to refactor realistic <a href="/2019/02/18/from-interaction-based-to-state-based-testing">state-based tests</a> into (state-based) property-based tests.
</p>
<p>
The <a href="/2023/04/03/an-abstract-example-of-refactoring-from-interaction-based-to-property-based-testing">previous article</a> showed a <a href="https://en.wikipedia.org/wiki/Minimal_reproducible_example">minimal and self-contained example</a> that had the advantage of being simple, but the disadvantage of being perhaps too abstract and unrelatable. In this article, then, I will attempt to show a more realistic and concrete example. It actually doesn't start with interaction-based testing, since it's already written in the style of <a href="https://www.destroyallsoftware.com/screencasts/catalog/functional-core-imperative-shell">Functional Core, Imperative Shell</a>. On the other hand, it shows how to refactor from concrete example-based tests to property-based tests.
</p>
<p>
I'll use the online restaurant reservation code base that accompanies my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>.
</p>
<h3 id="e7aaa6310292411ab830de17f5906777">
Smoke test <a href="#e7aaa6310292411ab830de17f5906777">#</a>
</h3>
<p>
I'll start with a simple test which was, if I remember correctly, the second test I wrote for this code base. It was a smoke test that I wrote to drive a <a href="https://wiki.c2.com/?WalkingSkeleton">walking skeleton</a>. It verifies that if you post a valid reservation request to the system, you receive an HTTP response in the <code>200</code> range.
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task <span style="font-weight:bold;color:#74531f;">PostValidReservation</span>()
{
<span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">api</span> = <span style="color:blue;">new</span> LegacyApi();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">expected</span> = <span style="color:blue;">new</span> ReservationDto
{
At = DateTime.Today.AddDays(778).At(19, 0)
.ToIso8601DateTimeString(),
Email = <span style="color:#a31515;">"katinka@example.com"</span>,
Name = <span style="color:#a31515;">"Katinka Ingabogovinanana"</span>,
Quantity = 2
};
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">response</span> = <span style="color:blue;">await</span> api.PostReservation(expected);
response.EnsureSuccessStatusCode();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = <span style="color:blue;">await</span> response.ParseJsonContent<ReservationDto>();
Assert.Equal(expected, actual, <span style="color:blue;">new</span> ReservationDtoComparer());
}</pre>
</p>
<p>
Over the lifetime of the code base, I embellished and edited the test to reflect the evolution of the system as well as my understanding of it. Thus, when I wrote it, it may not have looked exactly like this. Even so, I kept it around even though other, more detailed tests eventually superseded it.
</p>
<p>
One characteristic of this test is that it's quite concrete. When I originally wrote it, I hard-coded the date and time as well. Later, however, <a href="/2021/01/11/waiting-to-happen">I discovered that I had to make the time relative to the system clock</a>. Thus, as you can see, the <code>At</code> property isn't a literal value, but all other properties (<code>Email</code>, <code>Name</code>, and <code>Quantity</code>) are.
</p>
<p>
This test is far from abstract or data-driven. Is it possible to turn such a test into a property-based test? Yes, I'll show you how.
</p>
<p>
A word of warning before we proceed: Tests with concrete, literal, easy-to-understand examples are valuable as programmer documentation. A person new to the code base can peruse such tests and learn about the system. Thus, this test is <em>already quite valuable as it is</em>. In a real, living code base, I'd prefer leaving it as it is, instead of turning it into a property-based test.
</p>
<p>
Since it's a simple and concrete test, on the other hand, it's easy to understand, and thus also a good place to start. Thus, I'm going to refactor it into a property-based test; not because I think that you should (I don't), but because I think it'll be easy for you, the reader, to follow along. In other words, it's a good introduction to the process of turning a concrete test into a property-based test.
</p>
<h3 id="9dabcfae9e284a0ab9748cf817f4b2f9">
Adding parameters <a href="#9dabcfae9e284a0ab9748cf817f4b2f9">#</a>
</h3>
<p>
This code base already uses <a href="https://fscheck.github.io/FsCheck/">FsCheck</a> so it makes sense to stick to that framework for property-based testing. While it's written in <a href="https://fsharp.org/">F#</a> you can use it from C# as well. The easiest way to use it is as a parametrised test. This is possible with the <a href="https://www.nuget.org/packages/FsCheck.Xunit">FsCheck.Xunit</a> glue library. In fact, as I refactor the <code>PostValidReservation</code> test, it'll look much like the <a href="https://github.com/AutoFixture/AutoFixture">AutoFixture</a>-driven tests from <a href="/2023/04/03/an-abstract-example-of-refactoring-from-interaction-based-to-property-based-testing">the previous article</a>.
</p>
<p>
When turning concrete examples into properties, it helps to consider whether literal values are representative of an equivalence class. In other words, is that particular value important, or is there a wider set of values that would be just as good? For example, why is the test making a reservation 778 days in the future? Why not 777 or 779? Is the value <em>778</em> important? Not really. What's important is that the reservation is in the future. How far in the future actually isn't important. Thus, we can replace the literal value <code>778</code> with a parameter:
</p>
<p>
<pre>[Property]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task <span style="font-weight:bold;color:#74531f;">PostValidReservation</span>(PositiveInt <span style="font-weight:bold;color:#1f377f;">days</span>)
{
<span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">api</span> = <span style="color:blue;">new</span> LegacyApi();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">expected</span> = <span style="color:blue;">new</span> ReservationDto
{
At = DateTime.Today.AddDays((<span style="color:blue;">int</span>)days).At(19, 0)
.ToIso8601DateTimeString(),
<span style="color:green;">// The rest of the test...</span></pre>
</p>
<p>
Notice that I've replaced the literal value <code>778</code> with the method parameter <code>days</code>. The <code>PositiveInt</code> type is a type from FsCheck. It's a wrapper around <code>int</code> that guarantees that the value is positive. This is important because we don't want to make a reservation in the past. The <code>PositiveInt</code> type is a good choice because it's a type that's already available with FsCheck, and the framework knows how to generate valid values. Since it's a wrapper, though, the test needs to unwrap the value before using it. This is done with the <code>(int)days</code> cast.
</p>
<p>
Notice, also, that I've replaced the <code>[Fact]</code> attribute with the <code>[Property]</code> attribute that comes with FsCheck.Xunit. This is what enables FsCheck to automatically generate test cases and feed them to the test method. You can't always do this, as you'll see later, but when you can, it's a nice and succinct way to express a property-based test.
</p>
<p>
Already, the <code>PostValidReservation</code> test method is 100 test cases (the FsCheck default), rather than one.
</p>
<p>
What about <code>Email</code> and <code>Name</code>? Is it important for the test that these values are exactly <em>katinka@example.com</em> and <em>Katinka Ingabogovinanana</em> or might other values do? The answer is that it's not important. What's important is that the values are valid, and essentially any non-null string is. Thus, we can replace the literal values with parameters:
</p>
<p>
<pre>[Property]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task <span style="font-weight:bold;color:#74531f;">PostValidReservation</span>(
PositiveInt <span style="font-weight:bold;color:#1f377f;">days</span>,
StringNoNulls <span style="font-weight:bold;color:#1f377f;">email</span>,
StringNoNulls <span style="font-weight:bold;color:#1f377f;">name</span>)
{
<span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">api</span> = <span style="color:blue;">new</span> LegacyApi();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">expected</span> = <span style="color:blue;">new</span> ReservationDto
{
At = DateTime.Today.AddDays((<span style="color:blue;">int</span>)days).At(19, 0)
.ToIso8601DateTimeString(),
Email = email.Item,
Name = name.Item,
Quantity = 2
};
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">response</span> = <span style="color:blue;">await</span> api.PostReservation(expected);
response.EnsureSuccessStatusCode();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">actual</span> = <span style="color:blue;">await</span> response.ParseJsonContent<ReservationDto>();
Assert.Equal(expected, actual, <span style="color:blue;">new</span> ReservationDtoComparer());
}</pre>
</p>
<p>
The <code>StringNoNulls</code> type is another FsCheck wrapper, this time around <code>string</code>. It ensures that FsCheck will generate no null strings. This time, however, a cast isn't possible, so instead I had to pull the wrapped string out of the value with the <code>Item</code> property.
</p>
<p>
That's enough conversion to illustrate the process.
</p>
<p>
What about the literal values <em>19</em>, <em>0</em>, or <em>2</em>? Shouldn't we parametrise those as well? While we could, that takes a bit more effort. The problem is that with these values, any old positive integer isn't going to work. For example, the number <em>19</em> is the hour component of the reservation time; that is, the reservation is for 19:00. Clearly, we can't just let FsCheck generate any positive integer, because most integers aren't going to work. For example, <em>5</em> doesn't work because it's in the early morning, and the restaurant isn't open at that time.
</p>
<p>
Like other property-based testing frameworks, FsCheck has an API that enables you to constrain value generation, but it doesn't work with the type-based approach I've used so far. Unlike <code>PositiveInt</code>, there's no <code>TimeBetween16And21</code> wrapper type.
</p>
<p>
You'll see what you can do to control how FsCheck generates values, but I'll use another test for that.
</p>
<h3 id="80844424b40a48f4931c78d91d865323">
Parametrised unit test <a href="#80844424b40a48f4931c78d91d865323">#</a>
</h3>
<p>
The <code>PostValidReservation</code> test is a high-level smoke test that gives you an idea about how the system works. It doesn't, however, reveal much about the possible variations in input. To drive such behaviour, I wrote and evolved the following state-based test:
</p>
<p>
<pre>[Theory]
[InlineData(1049, 19, 00, <span style="color:#a31515;">"juliad@example.net"</span>, <span style="color:#a31515;">"Julia Domna"</span>, 5)]
[InlineData(1130, 18, 15, <span style="color:#a31515;">"x@example.com"</span>, <span style="color:#a31515;">"Xenia Ng"</span>, 9)]
[InlineData( 956, 16, 55, <span style="color:#a31515;">"kite@example.edu"</span>, <span style="color:blue;">null</span>, 2)]
[InlineData( 433, 17, 30, <span style="color:#a31515;">"shli@example.org"</span>, <span style="color:#a31515;">"Shanghai Li"</span>, 5)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task <span style="font-weight:bold;color:#74531f;">PostValidReservationWhenDatabaseIsEmpty</span>(
<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">days</span>,
<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">hours</span>,
<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">minutes</span>,
<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">email</span>,
<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">name</span>,
<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">quantity</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">at</span> = DateTime.Now.Date + <span style="color:blue;">new</span> TimeSpan(days, hours, minutes, 0);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">db</span> = <span style="color:blue;">new</span> FakeDatabase();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> ReservationsController(
<span style="color:blue;">new</span> SystemClock(),
<span style="color:blue;">new</span> InMemoryRestaurantDatabase(Grandfather.Restaurant),
db);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">expected</span> = <span style="color:blue;">new</span> Reservation(
<span style="color:blue;">new</span> Guid(<span style="color:#a31515;">"B50DF5B1-F484-4D99-88F9-1915087AF568"</span>),
at,
<span style="color:blue;">new</span> Email(email),
<span style="color:blue;">new</span> Name(name ?? <span style="color:#a31515;">""</span>),
quantity);
<span style="color:blue;">await</span> sut.Post(expected.ToDto());
Assert.Contains(expected, db.Grandfather);
}</pre>
</p>
<p>
This test gives more details, without exercising all possible code paths of the system. It's still a <a href="/2012/06/27/FacadeTest">Facade Test</a> that covers 'just enough' of the integration with underlying components to provide confidence that things work as they should. All the business logic is implemented by a class called <code>MaitreD</code>, which is covered by its own set of targeted unit tests.
</p>
<p>
While parametrised, this is still only four test cases, so perhaps you don't have sufficient confidence that everything works as it should. Perhaps, as I've outlined in <a href="/2023/02/13/epistemology-of-interaction-testing">the introductory article</a>, it would help if we converted it to an FsCheck property.
</p>
<h3 id="707d58026e914b708e6394b5d1d2abad">
Parametrised property <a href="#707d58026e914b708e6394b5d1d2abad">#</a>
</h3>
<p>
I find it safest to refactor this parametrised test to a property in a series of small steps. This implies that I need to keep the <code>[InlineData]</code> attributes around for a while longer, removing one or two literal values at a time, turning them into randomly generated values.
</p>
<p>
From the previous test we know that the <code>Email</code> and <code>Name</code> values are almost unconstrained, so it's trivial to have FsCheck generate them. That the change itself is easy is good, because combining an <code>[InlineData]</code>-driven <code>[Theory]</code> with an FsCheck property is enough of a mouthful for one refactoring step:
</p>
<p>
<pre>[Theory]
[InlineData(1049, 19, 00, 5)]
[InlineData(1130, 18, 15, 9)]
[InlineData( 956, 16, 55, 2)]
[InlineData( 433, 17, 30, 5)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">PostValidReservationWhenDatabaseIsEmpty</span>(
<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">days</span>,
<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">hours</span>,
<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">minutes</span>,
<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">quantity</span>)
{
Prop.ForAll(
(<span style="color:blue;">from</span> r <span style="color:blue;">in</span> Gens.Reservation
<span style="color:blue;">select</span> r).ToArbitrary(),
<span style="color:blue;">async</span> <span style="font-weight:bold;color:#1f377f;">r</span> =>
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">at</span> = DateTime.Now.Date + <span style="color:blue;">new</span> TimeSpan(days, hours, minutes, 0);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">db</span> = <span style="color:blue;">new</span> FakeDatabase();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> ReservationsController(
<span style="color:blue;">new</span> SystemClock(),
<span style="color:blue;">new</span> InMemoryRestaurantDatabase(Grandfather.Restaurant),
db);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">expected</span> = r
.WithQuantity(quantity)
.WithDate(at);
<span style="color:blue;">await</span> sut.Post(expected.ToDto());
Assert.Contains(expected, db.Grandfather);
}).QuickCheckThrowOnFailure();
}</pre>
</p>
<p>
I've now managed to get rid of the <code>email</code> and <code>name</code> parameters, so I've also removed those values from the <code>[InlineData]</code> attributes. Instead, I've asked FsCheck to generate a valid reservation <code>r</code>, which comes with both valid <code>Email</code> and <code>Name</code>.
</p>
<p>
It turned out that this code base already had some custom generators in a static class called <code>Gens</code>, so I reused those:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">static</span> Gen<Email> Email =>
<span style="color:blue;">from</span> s <span style="color:blue;">in</span> ArbMap.Default.GeneratorFor<NonWhiteSpaceString>()
<span style="color:blue;">select</span> <span style="color:blue;">new</span> Email(s.Item);
<span style="color:blue;">internal</span> <span style="color:blue;">static</span> Gen<Name> Name =>
<span style="color:blue;">from</span> s <span style="color:blue;">in</span> ArbMap.Default.GeneratorFor<StringNoNulls>()
<span style="color:blue;">select</span> <span style="color:blue;">new</span> Name(s.Item);
<span style="color:blue;">internal</span> <span style="color:blue;">static</span> Gen<Reservation> Reservation =>
<span style="color:blue;">from</span> id <span style="color:blue;">in</span> ArbMap.Default.GeneratorFor<Guid>()
<span style="color:blue;">from</span> d <span style="color:blue;">in</span> ArbMap.Default.GeneratorFor<DateTime>()
<span style="color:blue;">from</span> e <span style="color:blue;">in</span> Email
<span style="color:blue;">from</span> n <span style="color:blue;">in</span> Name
<span style="color:blue;">from</span> q <span style="color:blue;">in</span> ArbMap.Default.GeneratorFor<PositiveInt>()
<span style="color:blue;">select</span> <span style="color:blue;">new</span> Reservation(id, d, e, n, q.Item);</pre>
</p>
<p>
As was also the case with <a href="https://github.com/AnthonyLloyd/CsCheck">CsCheck</a>, you typically use <a href="/2022/03/28/monads">syntactic sugar for monads</a> (which in C# is query syntax) to compose complex <a href="/2023/02/27/test-data-generator-monad">test data generators</a> from simpler generators. This enables me to generate an entire <code>Reservation</code> object with a single expression.
</p>
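<p>
Query syntax is only sugar over <code>Select</code> and <code>SelectMany</code> methods, so the same composition can be written with explicit method calls. The following is a minimal sketch, not code from the restaurant code base: the class name, the property name, and the dice example are made up for illustration, and it assumes FsCheck 3 with FsCheck.Xunit. It builds the same generator twice, once with query syntax and once in the form that the C# compiler translates the query into:
</p>
<p>
<pre>using FsCheck;
using FsCheck.Fluent;
using FsCheck.Xunit;

// Hypothetical example; not part of the article's code base.
public class QuerySyntaxSketch
{
    [Property]
    public Property SumOfTwoDiceIsBetween2And12()
    {
        // Query syntax: two 'from' clauses compose two generators.
        var sugared =
            from x in Gen.Choose(1, 6)
            from y in Gen.Choose(1, 6)
            select x + y;
        // The method-call form that the query above translates to.
        var desugared = Gen.Choose(1, 6).SelectMany(
            x => Gen.Choose(1, 6),
            (x, y) => x + y);

        // Pair the two generators so that one property checks both.
        var pairs =
            from a in sugared
            from b in desugared
            select (a, b);
        return Prop.ForAll(
            pairs.ToArbitrary(),
            t => t.a >= 2 && t.a <= 12 && t.b >= 2 && t.b <= 12);
    }
}</pre>
</p>
<p>
Both forms produce a generator of the same type. With more than two or three generators involved, as in <code>Gens.Reservation</code>, the query syntax tends to be the more readable of the two.
</p>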
<h3 id="05d4d9e8c07b4162bd1b65347200456f">
Time of day <a href="#05d4d9e8c07b4162bd1b65347200456f">#</a>
</h3>
<p>
Some of the values (such as the reservation's name and email address) that are involved in the <code>PostValidReservationWhenDatabaseIsEmpty</code> test don't really matter. Other values are constrained in some way. Even for the reservation <code>r</code> the above version of the test has to override the arbitrarily generated <code>r</code> value with a specific <code>quantity</code> and a specific <code>at</code> value. This is because you can't just reserve any quantity at any time of day. The restaurant has opening hours and actual tables. Most likely, it doesn't have a table for 100 people at 3 in the morning.
</p>
<p>
This particular test actually exercises a particular restaurant called <code>Grandfather.Restaurant</code> (because it was the original restaurant that was <a href="https://en.wikipedia.org/wiki/Grandfather_clause">grandfathered in</a> when the system was expanded to a multi-tenant system). It opens at 16 and has the last seating at 21. This means that the <code>at</code> value has to be between 16 and 21. What's the best way to generate a <code>DateTime</code> value that satisfies this constraint?
</p>
<p>
You could, naively, ask FsCheck to generate an integer between these two values. You'll see how to do that when we get to the <code>quantity</code>. While that would work for the <code>at</code> value, it would only generate the whole hours <em>16:00</em>, <em>17:00</em>, <em>18:00</em>, etcetera. It would be nice if the test could also exercise times such as <em>18:30</em>, <em>20:45</em>, and so on. On the other hand, perhaps we don't want weird reservation times such as <em>17:09:23.282</em>. How do we tell FsCheck to generate a <code>DateTime</code> value like that?
</p>
<p>
It's definitely possible to do from scratch, but I chose to do something else. The following shows how test code and production code can co-exist in a symbiotic relationship. The main business logic component that deals with reservations in the system is a class called <code>MaitreD</code>. One of its methods is used to generate a list of time slots for every day. A user interface can use that list to populate a drop-down list of available times. The method is called <code>Segment</code> and can also be used as a data source for an FsCheck test data generator:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">static</span> Gen<TimeSpan> <span style="color:#74531f;">ReservationTime</span>(
Restaurant <span style="font-weight:bold;color:#1f377f;">restaurant</span>,
DateTime <span style="font-weight:bold;color:#1f377f;">date</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">slots</span> = restaurant.MaitreD
.Segment(date, Enumerable.Empty<Reservation>())
.Select(<span style="font-weight:bold;color:#1f377f;">ts</span> => ts.At.TimeOfDay);
<span style="font-weight:bold;color:#8f08c4;">return</span> Gen.Elements(slots);
}</pre>
</p>
<p>
The <code>Gen.Elements</code> function is an FsCheck combinator that randomly picks a value from a collection. This one, then, picks one of the time-of-day (<code>TimeSpan</code>) values projected from the time slots that <code>MaitreD.Segment</code> generates.
</p>
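<p>
For comparison, here's a rough sketch of what a from-scratch alternative might look like. It isn't what the code base does. It assumes FsCheck 3 with FsCheck.Xunit, hard-codes the opening hour (16) and last seating (21), and assumes that time slots fall on quarter hours. The class and property names are hypothetical:
</p>
<p>
<pre>using System;
using FsCheck;
using FsCheck.Fluent;
using FsCheck.Xunit;

// Hypothetical example; not part of the article's code base.
public class FromScratchTimeSketch
{
    [Property]
    public Property QuarterHourTimesStayWithinOpeningHours()
    {
        // Assumed values, duplicated from the restaurant's configuration.
        var opens = TimeSpan.FromHours(16);
        var lastSeating = TimeSpan.FromHours(21);
        // 16:00 plus 0 to 20 quarter hours covers 16:00, 16:15, ..., 21:00.
        var times =
            from quarters in Gen.Choose(0, 20)
            select opens + TimeSpan.FromMinutes(15 * quarters);
        return Prop.ForAll(
            times.ToArbitrary(),
            t => opens <= t && t <= lastSeating);
    }
}</pre>
</p>
<p>
The drawback of such a generator is that it duplicates knowledge about opening hours and slot size that already lives in the production code. Using <code>MaitreD.Segment</code> as the data source avoids that duplication.
</p>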
<p>
The <code>PostValidReservationWhenDatabaseIsEmpty</code> test can now use the <code>ReservationTime</code> generator to produce a time of day:
</p>
<p>
<pre>[Theory]
[InlineData(5)]
[InlineData(9)]
[InlineData(2)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">PostValidReservationWhenDatabaseIsEmpty</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">quantity</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">today</span> = DateTime.Now.Date;
Prop.ForAll(
(<span style="color:blue;">from</span> days <span style="color:blue;">in</span> ArbMap.Default.GeneratorFor<PositiveInt>()
<span style="color:blue;">from</span> t <span style="color:blue;">in</span> Gens.ReservationTime(Grandfather.Restaurant, today)
<span style="color:blue;">let</span> offset = TimeSpan.FromDays((<span style="color:blue;">int</span>)days) + t
<span style="color:blue;">from</span> r <span style="color:blue;">in</span> Gens.Reservation
<span style="color:blue;">select</span> (r, offset)).ToArbitrary(),
<span style="color:blue;">async</span> <span style="font-weight:bold;color:#1f377f;">t</span> =>
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">at</span> = today + t.offset;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">db</span> = <span style="color:blue;">new</span> FakeDatabase();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> ReservationsController(
<span style="color:blue;">new</span> SystemClock(),
<span style="color:blue;">new</span> InMemoryRestaurantDatabase(Grandfather.Restaurant),
db);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">expected</span> = t.r
.WithQuantity(quantity)
.WithDate(at);
<span style="color:blue;">await</span> sut.Post(expected.ToDto());
Assert.Contains(expected, db.Grandfather);
}).QuickCheckThrowOnFailure();
}</pre>
</p>
<p>
Granted, the test code is getting more and more busy, but there's room for improvement. Before I simplify it, though, I think that it's more prudent to deal with the remaining literal values.
</p>
<p>
Notice that the <code>InlineData</code> attributes now only supply a single value each: The <code>quantity</code>.
</p>
<h3 id="e551e156b8344bc0bd5379084bd8a7ed">
Quantity <a href="#e551e156b8344bc0bd5379084bd8a7ed">#</a>
</h3>
<p>
Like the <code>at</code> value, the <code>quantity</code> is constrained. It must be a positive integer, but it can't be larger than the largest table in the restaurant. That number, however, isn't that hard to find:
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">maxCapacity</span> = restaurant.MaitreD.Tables.Max(<span style="font-weight:bold;color:#1f377f;">t</span> => t.Capacity);</pre>
</p>
<p>
The FsCheck API includes a function that generates a random number within a given range. It's called <code>Gen.Choose</code>, and now that we know the range, we can use it to generate the <code>quantity</code> value. Here, I'm only showing the test-data-generator part of the test, since the rest doesn't change that much. You'll see the full test again after a few more refactorings.
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">today</span> = DateTime.Now.Date;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">restaurant</span> = Grandfather.Restaurant;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">maxCapacity</span> = restaurant.MaitreD.Tables.Max(<span style="font-weight:bold;color:#1f377f;">t</span> => t.Capacity);
Prop.ForAll(
(<span style="color:blue;">from</span> days <span style="color:blue;">in</span> ArbMap.Default.GeneratorFor<PositiveInt>()
<span style="color:blue;">from</span> t <span style="color:blue;">in</span> Gens.ReservationTime(restaurant, today)
<span style="color:blue;">let</span> offset = TimeSpan.FromDays((<span style="color:blue;">int</span>)days) + t
<span style="color:blue;">from</span> quantity <span style="color:blue;">in</span> Gen.Choose(1, maxCapacity)
<span style="color:blue;">from</span> r <span style="color:blue;">in</span> Gens.Reservation
<span style="color:blue;">select</span> (r.WithQuantity(quantity), offset)).ToArbitrary(),</pre>
</p>
<p>
There are now no more literal values in the test. In a sense, the refactoring from parametrised test to property-based test is complete. It could do with a bit of cleanup, though.
</p>
<h3 id="53494d59981c4c32b0dbbd93aa857874">
Simplification <a href="#53494d59981c4c32b0dbbd93aa857874">#</a>
</h3>
<p>
There's no longer any need to pass along the <code>offset</code> variable, and the explicit <code>QuickCheckThrowOnFailure</code> also seems a bit redundant. I can use the <code>[Property]</code> attribute from FsCheck.Xunit instead.
</p>
<p>
<pre>[Property]
<span style="color:blue;">public</span> Property <span style="font-weight:bold;color:#74531f;">PostValidReservationWhenDatabaseIsEmpty</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">today</span> = DateTime.Now.Date;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">restaurant</span> = Grandfather.Restaurant;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">maxCapacity</span> = restaurant.MaitreD.Tables.Max(<span style="font-weight:bold;color:#1f377f;">t</span> => t.Capacity);
<span style="font-weight:bold;color:#8f08c4;">return</span> Prop.ForAll(
(<span style="color:blue;">from</span> days <span style="color:blue;">in</span> ArbMap.Default.GeneratorFor<PositiveInt>()
<span style="color:blue;">from</span> t <span style="color:blue;">in</span> Gens.ReservationTime(restaurant, today)
<span style="color:blue;">let</span> at = today + TimeSpan.FromDays((<span style="color:blue;">int</span>)days) + t
<span style="color:blue;">from</span> quantity <span style="color:blue;">in</span> Gen.Choose(1, maxCapacity)
<span style="color:blue;">from</span> r <span style="color:blue;">in</span> Gens.Reservation
<span style="color:blue;">select</span> r.WithQuantity(quantity).WithDate(at)).ToArbitrary(),
<span style="color:blue;">async</span> <span style="font-weight:bold;color:#1f377f;">expected</span> =>
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">db</span> = <span style="color:blue;">new</span> FakeDatabase();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> ReservationsController(
<span style="color:blue;">new</span> SystemClock(),
<span style="color:blue;">new</span> InMemoryRestaurantDatabase(restaurant),
db);
<span style="color:blue;">await</span> sut.Post(expected.ToDto());
Assert.Contains(expected, db.Grandfather);
});
}</pre>
</p>
<p>
Compared to the initial version of the test, it has become more top-heavy. It's about the same size, though. The original version was 30 lines of code. This version is only 26 lines of code, but it is admittedly more information-dense. The original version had more 'noise' interleaved with the 'signal'. The new variation actually has a better separation of data generation and the test itself. Consider the 'actual' test code:
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">db</span> = <span style="color:blue;">new</span> FakeDatabase();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> ReservationsController(
<span style="color:blue;">new</span> SystemClock(),
<span style="color:blue;">new</span> InMemoryRestaurantDatabase(restaurant),
db);
<span style="color:blue;">await</span> sut.Post(expected.ToDto());
Assert.Contains(expected, db.Grandfather);</pre>
</p>
<p>
If we could somehow separate the data generation from the test itself, we might have something that was quite readable.
</p>
<h3 id="a23c68b065d140c588e30bd1db228879">
Extract test data generator <a href="#a23c68b065d140c588e30bd1db228879">#</a>
</h3>
<p>
The above data generation consists of a bit of initialisation and a query expression. Like all <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a> it's easy to extract:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">static</span> Gen<(Restaurant, Reservation)>
<span style="color:#74531f;">GenValidReservationForEmptyDatabase</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">today</span> = DateTime.Now.Date;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">restaurant</span> = Grandfather.Restaurant;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">capacity</span> = restaurant.MaitreD.Tables.Max(<span style="font-weight:bold;color:#1f377f;">t</span> => t.Capacity);
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">from</span> days <span style="color:blue;">in</span> ArbMap.Default.GeneratorFor<PositiveInt>()
<span style="color:blue;">from</span> t <span style="color:blue;">in</span> Gens.ReservationTime(restaurant, today)
<span style="color:blue;">let</span> at = today + TimeSpan.FromDays((<span style="color:blue;">int</span>)days) + t
<span style="color:blue;">from</span> quantity <span style="color:blue;">in</span> Gen.Choose(1, capacity)
<span style="color:blue;">from</span> r <span style="color:blue;">in</span> Gens.Reservation
<span style="color:blue;">select</span> (restaurant, r.WithQuantity(quantity).WithDate(at));
}</pre>
</p>
<p>
While it's quite specialised, it leaves the test itself small and readable:
</p>
<p>
<pre>[Property]
<span style="color:blue;">public</span> Property <span style="font-weight:bold;color:#74531f;">PostValidReservationWhenDatabaseIsEmpty</span>()
{
<span style="font-weight:bold;color:#8f08c4;">return</span> Prop.ForAll(
GenValidReservationForEmptyDatabase().ToArbitrary(),
<span style="color:blue;">async</span> <span style="font-weight:bold;color:#1f377f;">t</span> =>
{
var (<span style="font-weight:bold;color:#1f377f;">restaurant</span>, <span style="font-weight:bold;color:#1f377f;">expected</span>) = t;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">db</span> = <span style="color:blue;">new</span> FakeDatabase();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> ReservationsController(
<span style="color:blue;">new</span> SystemClock(),
<span style="color:blue;">new</span> InMemoryRestaurantDatabase(restaurant),
db);
<span style="color:blue;">await</span> sut.Post(expected.ToDto());
Assert.Contains(expected, db[restaurant.Id]);
});
}</pre>
</p>
<p>
That's not the only way to separate test and data generation.
</p>
<h3 id="39343db1a22c4d0c93dbfb74e3af6689">
Test as implementation detail <a href="#39343db1a22c4d0c93dbfb74e3af6689">#</a>
</h3>
<p>
The above separation refactors the data-generating expression to a private helper function. Alternatively you can keep all that FsCheck infrastructure code in the public test method and extract the test body itself to a private helper method:
</p>
<p>
<pre>[Property]
<span style="color:blue;">public</span> Property <span style="font-weight:bold;color:#74531f;">PostValidReservationWhenDatabaseIsEmpty</span>()
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">today</span> = DateTime.Now.Date;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">restaurant</span> = Grandfather.Restaurant;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">capacity</span> = restaurant.MaitreD.Tables.Max(<span style="font-weight:bold;color:#1f377f;">t</span> => t.Capacity);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">g</span> = <span style="color:blue;">from</span> days <span style="color:blue;">in</span> ArbMap.Default.GeneratorFor<PositiveInt>()
<span style="color:blue;">from</span> t <span style="color:blue;">in</span> Gens.ReservationTime(restaurant, today)
<span style="color:blue;">let</span> at = today + TimeSpan.FromDays((<span style="color:blue;">int</span>)days) + t
<span style="color:blue;">from</span> quantity <span style="color:blue;">in</span> Gen.Choose(1, capacity)
<span style="color:blue;">from</span> r <span style="color:blue;">in</span> Gens.Reservation
<span style="color:blue;">select</span> (restaurant, r.WithQuantity(quantity).WithDate(at));
<span style="font-weight:bold;color:#8f08c4;">return</span> Prop.ForAll(
g.ToArbitrary(),
<span style="font-weight:bold;color:#1f377f;">t</span> => PostValidReservationWhenDatabaseIsEmptyImp(
t.restaurant,
t.Item2));
}</pre>
</p>
<p>
At first glance, that doesn't look like an improvement, but it has the advantage that the actual test method is now devoid of FsCheck details. If we use that as a yardstick for how decoupled the test is from FsCheck, this seems cleaner.
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">static</span> <span style="color:blue;">async</span> Task <span style="color:#74531f;">PostValidReservationWhenDatabaseIsEmptyImp</span>(
Restaurant <span style="font-weight:bold;color:#1f377f;">restaurant</span>, Reservation <span style="font-weight:bold;color:#1f377f;">expected</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">db</span> = <span style="color:blue;">new</span> FakeDatabase();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> ReservationsController(
<span style="color:blue;">new</span> SystemClock(),
<span style="color:blue;">new</span> InMemoryRestaurantDatabase(restaurant),
db);
<span style="color:blue;">await</span> sut.Post(expected.ToDto());
Assert.Contains(expected, db[restaurant.Id]);
}</pre>
</p>
<p>
Using a property-based testing framework in C# is still more awkward than in a language with better support for monadic composition and pattern matching. That said, more recent versions of C# do have better pattern matching on tuples, but this code base is still on C# 8.
</p>
<p>
If you still think that this looks more complicated than the initial version of the test, then I agree. Property-based testing isn't free, but you get something in return. We started with four test cases and ended with 100. And that's just the default. If you want to increase the number of test cases, that's just an API call away. You could run 1,000 or 10,000 test cases if you wanted to. The only real downside is that the tests take longer to run.
</p>
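<p>
One way to do that, assuming FsCheck.Xunit, is to set the <code>MaxTest</code> property on the <code>[Property]</code> attribute. The following minimal sketch uses a made-up property that has nothing to do with the restaurant code base; only the attribute usage is the point:
</p>
<p>
<pre>using FsCheck.Xunit;

// Hypothetical example; not part of the article's code base.
public class MaxTestSketch
{
    // Ask FsCheck to run 1,000 test cases instead of the default 100.
    [Property(MaxTest = 1000)]
    public bool AdditionIsCommutative(int x, int y)
    {
        return x + y == y + x;
    }
}</pre>
</p>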
<h3 id="624f8d06db274e54b76af869f1a790c5">
Unhappy paths <a href="#624f8d06db274e54b76af869f1a790c5">#</a>
</h3>
<p>
The tests above all test the happy path. A valid request arrives and the system is in a state where it can accept it. This small article series is, you may recall, a response to an email from Sergei Rogovtsev. In his email, he mentioned the need to test both happy path and various error scenarios. Let's cover a few before wrapping up.
</p>
<p>
As I was developing the system and fleshing out its behaviour, I evolved this parametrised test:
</p>
<p>
<pre>[Theory]
[InlineData(<span style="color:blue;">null</span>, <span style="color:#a31515;">"j@example.net"</span>, <span style="color:#a31515;">"Jay Xerxes"</span>, 1)]
[InlineData(<span style="color:#a31515;">"not a date"</span>, <span style="color:#a31515;">"w@example.edu"</span>, <span style="color:#a31515;">"Wk Hd"</span>, 8)]
[InlineData(<span style="color:#a31515;">"2023-11-30 20:01"</span>, <span style="color:blue;">null</span>, <span style="color:#a31515;">"Thora"</span>, 19)]
[InlineData(<span style="color:#a31515;">"2022-01-02 12:10"</span>, <span style="color:#a31515;">"3@example.org"</span>, <span style="color:#a31515;">"3 Beard"</span>, 0)]
[InlineData(<span style="color:#a31515;">"2045-12-31 11:45"</span>, <span style="color:#a31515;">"git@example.com"</span>, <span style="color:#a31515;">"Gil Tan"</span>, -1)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task <span style="font-weight:bold;color:#74531f;">PostInvalidReservation</span>(
    <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">at</span>,
    <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">email</span>,
    <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">name</span>,
    <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">quantity</span>)
{
    <span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">api</span> = <span style="color:blue;">new</span> LegacyApi();
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">response</span> = <span style="color:blue;">await</span> api.PostReservation(
        <span style="color:blue;">new</span> { at, email, name, quantity });
    Assert.Equal(HttpStatusCode.BadRequest, response.StatusCode);
}</pre>
</p>
<p>
The test body itself is about as minimal as it can be. There are five test cases that I added one or two at a time.
</p>
<ul>
<li>The first test case covers what happens if the <code>at</code> value is missing (i.e. null)</li>
<li>The next test case covers a malformed <code>at</code> value</li>
<li>The third test case covers a missing email address</li>
<li>The last two test cases cover non-positive quantities, both <em>0</em> and a negative number</li>
</ul>
<p>
It's possible to combine FsCheck generators that deal with each of these cases, but here I want to demonstrate how it's still possible to keep each error case separate, if that's what you need. First, separate the test body from its data source, like I did above:
</p>
<p>
<pre>[Theory]
[InlineData(<span style="color:blue;">null</span>, <span style="color:#a31515;">"j@example.net"</span>, <span style="color:#a31515;">"Jay Xerxes"</span>, 1)]
[InlineData(<span style="color:#a31515;">"not a date"</span>, <span style="color:#a31515;">"w@example.edu"</span>, <span style="color:#a31515;">"Wk Hd"</span>, 8)]
[InlineData(<span style="color:#a31515;">"2023-11-30 20:01"</span>, <span style="color:blue;">null</span>, <span style="color:#a31515;">"Thora"</span>, 19)]
[InlineData(<span style="color:#a31515;">"2022-01-02 12:10"</span>, <span style="color:#a31515;">"3@example.org"</span>, <span style="color:#a31515;">"3 Beard"</span>, 0)]
[InlineData(<span style="color:#a31515;">"2045-12-31 11:45"</span>, <span style="color:#a31515;">"git@example.com"</span>, <span style="color:#a31515;">"Gil Tan"</span>, -1)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task <span style="font-weight:bold;color:#74531f;">PostInvalidReservation</span>(
    <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">at</span>,
    <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">email</span>,
    <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">name</span>,
    <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">quantity</span>)
{
    <span style="color:blue;">await</span> PostInvalidReservationImp(at, email, name, quantity);
}

<span style="color:blue;">private</span> <span style="color:blue;">static</span> <span style="color:blue;">async</span> Task <span style="color:#74531f;">PostInvalidReservationImp</span>(
    <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">at</span>,
    <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">email</span>,
    <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">name</span>,
    <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">quantity</span>)
{
    <span style="color:blue;">using</span> <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">api</span> = <span style="color:blue;">new</span> LegacyApi();
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">response</span> = <span style="color:blue;">await</span> api.PostReservation(
        <span style="color:blue;">new</span> { at, email, name, quantity });
    Assert.Equal(HttpStatusCode.BadRequest, response.StatusCode);
}</pre>
</p>
<p>
If you consider this refactoring in isolation, it seems frivolous, but it's just preparation for further work. In each subsequent refactoring I'll convert each of the above error cases to a property.
</p>
<h3 id="d1d2ea6091c7440991e24e213c9154b6">
Missing date and time <a href="#d1d2ea6091c7440991e24e213c9154b6">#</a>
</h3>
<p>
Starting from the top, convert the reservation-at-null test case to a property:
</p>
<p>
<pre>[Property]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task <span style="font-weight:bold;color:#74531f;">PostReservationAtNull</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">email</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">name</span>, PositiveInt <span style="font-weight:bold;color:#1f377f;">quantity</span>)
{
    <span style="color:blue;">await</span> PostInvalidReservationImp(<span style="color:blue;">null</span>, email, name, (<span style="color:blue;">int</span>)quantity);
}</pre>
</p>
<p>
I've left the parametrised <code>PostInvalidReservation</code> test in place, but removed the <code>[InlineData]</code> attribute with the <code>null</code> value for the <code>at</code> parameter:
</p>
<p>
<pre>[Theory]
[InlineData(<span style="color:#a31515;">"not a date"</span>, <span style="color:#a31515;">"w@example.edu"</span>, <span style="color:#a31515;">"Wk Hd"</span>, 8)]
[InlineData(<span style="color:#a31515;">"2023-11-30 20:01"</span>, <span style="color:blue;">null</span>, <span style="color:#a31515;">"Thora"</span>, 19)]
[InlineData(<span style="color:#a31515;">"2022-01-02 12:10"</span>, <span style="color:#a31515;">"3@example.org"</span>, <span style="color:#a31515;">"3 Beard"</span>, 0)]
[InlineData(<span style="color:#a31515;">"2045-12-31 11:45"</span>, <span style="color:#a31515;">"git@example.com"</span>, <span style="color:#a31515;">"Gil Tan"</span>, -1)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task <span style="font-weight:bold;color:#74531f;">PostInvalidReservation</span>(</pre>
</p>
<p>
The <code>PostReservationAtNull</code> property can use the FsCheck.Xunit <code>[Property]</code> attribute, because any <code>string</code> can be used for <code>email</code> and <code>name</code>.
</p>
<p>
To be honest, it is, perhaps, cheating a bit to post any positive quantity, because a number like, say, <em>1837</em> would be a problem even if the posted representation was well-formed and valid, since no table of the restaurant has that capacity.
</p>
<p>
Validation does, however, happen before evaluating business rules and application state, so the way the system is currently implemented, the test never fails because of that. The service never gets to that part of handling the request.
</p>
<p>
One might argue that this is relying on (and thereby coupling to) an implementation detail, but honestly, it seems unlikely that the service would begin processing an invalid request - 'invalid' implying that the request makes no sense. Concretely, if the date and time is missing from a reservation, how can the service begin to process it? On which date? At what time?
</p>
<p>
Thus, it's not that likely that this behaviour would change in the future, and therefore unlikely that the test would fail because of a policy change. It is, however, worth considering.
</p>
<h3 id="a93a77fa6f9f42b9810d37a300c1989a">
Malformed date and time <a href="#a93a77fa6f9f42b9810d37a300c1989a">#</a>
</h3>
<p>
The next error case is when the <code>at</code> value is present, but malformed. You can also convert that case to a property:
</p>
<p>
<pre>[Property]
<span style="color:blue;">public</span> Property <span style="font-weight:bold;color:#74531f;">PostMalformedDateAndTime</span>()
{
    <span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">g</span> = <span style="color:blue;">from</span> at <span style="color:blue;">in</span> ArbMap.Default.GeneratorFor<<span style="color:blue;">string</span>>()
                            .Where(<span style="font-weight:bold;color:#1f377f;">s</span> => !DateTime.TryParse(s, <span style="color:blue;">out</span> _))
            <span style="color:blue;">from</span> email <span style="color:blue;">in</span> Gens.Email
            <span style="color:blue;">from</span> name <span style="color:blue;">in</span> Gens.Name
            <span style="color:blue;">from</span> quantity <span style="color:blue;">in</span> Gen.Choose(1, 10)
            <span style="color:blue;">select</span> (at,
                    email: email.ToString(),
                    name: name.ToString(),
                    quantity);
    <span style="font-weight:bold;color:#8f08c4;">return</span> Prop.ForAll(
        g.ToArbitrary(),
        <span style="font-weight:bold;color:#1f377f;">t</span> => PostInvalidReservationImp(t.at, t.email, t.name, t.quantity));
}</pre>
</p>
<p>
Given how simple <code>PostReservationAtNull</code> turned out to be, you may be surprised that this case takes so much code to express. There's not that much going on, though. I reuse the generators I already have for <code>email</code> and <code>name</code>, and FsCheck's built-in <code>Gen.Choose</code> to pick a <code>quantity</code> between <code>1</code> and <code>10</code>. The only slightly tricky expression is for the <code>at</code> value.
</p>
<p>
The distinguishing part of this test is that the <code>at</code> value should be malformed. A randomly generated <code>string</code> is a good starting point. After all, most strings aren't well-formed date-and-time values. Still, <a href="/2016/01/18/make-pre-conditions-explicit-in-property-based-tests">a random string <em>could</em> be interpreted as a date or time, so it's better to explicitly disallow such values</a>. This is possible with the <code>Where</code> function. It's a filter that only allows values through that are <em>not</em> understandable as dates or times - which is the vast majority of them.
</p>
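<p>
For readers more at home in Haskell, the same filtering idea can be sketched with <a href="https://hackage.haskell.org/package/QuickCheck">QuickCheck</a>, the library that FsCheck is a port of. This is only an illustration and not part of the code base: <code>parseAt</code> and <code>genMalformedAt</code> are hypothetical names, and <code>parseAt</code> accepts just a single date-and-time format, so it's a much stricter stand-in for <code>DateTime.TryParse</code>.
</p>
<p>
<pre>import Data.Maybe (isNothing)
import Data.Time (UTCTime, defaultTimeLocale, parseTimeM)
import Test.QuickCheck

-- Hypothetical stand-in for DateTime.TryParse. It only accepts one format.
parseAt :: String -> Maybe UTCTime
parseAt = parseTimeM True defaultTimeLocale "%Y-%m-%d %H:%M"

-- Random strings, minus the few that happen to parse as a date and time.
genMalformedAt :: Gen String
genMalformedAt = arbitrary `suchThat` (isNothing . parseAt)</pre>
</p>
<p>
Here <code>suchThat</code> plays the role of <code>Where</code>: it keeps generating values until one satisfies the predicate.
</p>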
<h3 id="3b130aa436c44e25b0b4b9d9ccf87e9a">
Null email <a href="#3b130aa436c44e25b0b4b9d9ccf87e9a">#</a>
</h3>
<p>
The penultimate error case is when the email address is missing. That one is as easy to express as the missing <code>at</code> value.
</p>
<p>
<pre>[Property]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task <span style="font-weight:bold;color:#74531f;">PostNullEmail</span>(DateTime <span style="font-weight:bold;color:#1f377f;">at</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">name</span>, PositiveInt <span style="font-weight:bold;color:#1f377f;">quantity</span>)
{
    <span style="color:blue;">await</span> PostInvalidReservationImp(at.ToIso8601DateTimeString(), <span style="color:blue;">null</span>, name, (<span style="color:blue;">int</span>)quantity);
}</pre>
</p>
<p>
Again, with the addition of this specific property, I've removed the corresponding <code>[InlineData]</code> attribute from the <code>PostInvalidReservation</code> test. It only has two remaining test cases, both about non-positive quantities.
</p>
<h3 id="c1897ba360a74189a3c49a1bf05fb46a">
Non-positive quantity <a href="#c1897ba360a74189a3c49a1bf05fb46a">#</a>
</h3>
<p>
Finally, we can add a property that checks what happens if the quantity isn't positive:
</p>
<p>
<pre>[Property]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task <span style="font-weight:bold;color:#74531f;">PostNonPositiveQuantity</span>(
    DateTime <span style="font-weight:bold;color:#1f377f;">at</span>,
    <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">email</span>,
    <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">name</span>,
    NonNegativeInt <span style="font-weight:bold;color:#1f377f;">quantity</span>)
{
    <span style="color:blue;">await</span> PostInvalidReservationImp(at.ToIso8601DateTimeString(), email, name, -(<span style="color:blue;">int</span>)quantity);
}</pre>
</p>
<p>
FsCheck doesn't have a wrapper for non-positive integers, but I can use <code>NonNegativeInt</code> and negate it. The point is that I want to include <em>0</em>, which <code>NonNegativeInt</code> does. That wrapper generates integers greater than or equal to zero.
</p>
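<p>
The negation trick carries over to other property-based testing libraries. As a minimal sketch in Haskell with QuickCheck (not code from this code base; <code>genNonPositiveQuantity</code> is a hypothetical name), you can negate the value wrapped by the <code>NonNegative</code> modifier:
</p>
<p>
<pre>import Test.QuickCheck

-- Zero or a negative number: negate a generated non-negative integer.
genNonPositiveQuantity :: Gen Int
genNonPositiveQuantity = negate . getNonNegative <$> arbitrary</pre>
</p>
<p>
Since <code>NonNegative</code> includes <em>0</em>, so does the negated generator.
</p>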
<p>
Since I've now modelled each error case as a separate FsCheck property, I can remove the <code>PostInvalidReservation</code> method.
</p>
<h3 id="1da5bfff9de24adba8fe5f3005f35e69">
Conclusion <a href="#1da5bfff9de24adba8fe5f3005f35e69">#</a>
</h3>
<p>
To be honest, I think that turning these parametrised tests into FsCheck properties is overkill. After all, when I wrote the code base, I found the parametrised tests adequate. I used test-driven development all the way through, and while I also kept the <a href="/2019/10/07/devils-advocate">Devil's Advocate</a> in mind, the tests that I wrote gave me sufficient confidence that the system works as it should.
</p>
<p>
The main point of this article is to show how you <em>can</em> convert example-based tests to property-based tests. After all, just because I felt confident in my test suite it doesn't follow that a few parametrised tests do it for you. <a href="/2018/11/12/what-to-test-and-not-to-test">How much testing you need depends on a variety of factors</a>, so you may need the extra confidence that thousands of test cases can give you.
</p>
<p>
The previous article in this series showed an abstract, but minimal example. This one is more realistic, but also more involved.
</p>
<p>
<strong>Next:</strong> <a href="/2023/05/01/refactoring-pure-function-composition-without-breaking-existing-tests">Refactoring pure function composition without breaking existing tests</a>.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="d7c5114a287c479db91235ded264bd55">
<div class="comment-author"><a href="https://www.relativisticramblings.com/">Christer van der Meeren</a> <a href="#d7c5114a287c479db91235ded264bd55">#</a></div>
<div class="comment-content">
<p>In the section "Missing date and time", you mention that it could be worth considering the coupling of the test to the implementation details regarding validation order and possible false positive test results. Given that you already have a test data generator that produces valid reservations (GenValidReservationForEmptyDatabase), wouldn't it be more or less trivial to just generate valid test data and modify it to make it invalid in the single specific way you want to test?</p>
</div>
<div class="comment-date">2023-04-18 14:00 UTC</div>
</div>
<div class="comment" id="114d5ded264bd7c5a912355287c479db">
<div class="comment-author"><a href="https://github.com/AnthonyLloyd">Anthony Lloyd</a> <a href="#114d5ded264bd7c5a912355287c479db">#</a></div>
<div class="comment-content">
<p>Am I right in thinking shrinking doesn't work in FsCheck with the query syntax? I've just tried with two ints. How would you make it work?</p>
<pre><code style="background-color: #eee;border: 1px solid #999;display:block;padding:5px;">[Fact]
public void ShrinkingTest()
{
    Prop.ForAll(
        (from a1 in Arb.Default.Int32().Generator
         from a2 in Arb.Default.Int32().Generator
         select (a1, a2)).ToArbitrary(),
        t =>
        {
            if (t.a2 > 10)
                throw new System.Exception();
        })
    .QuickCheckThrowOnFailure();
}</code></pre>
</div>
<div class="comment-date">2023-04-18 19:15 UTC</div>
</div>
<div class="comment" id="a784010d2340448ea69d4b30f82074c2">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#a784010d2340448ea69d4b30f82074c2">#</a></div>
<div class="comment-content">
<p>
Christer, thank you for writing. It wouldn't be impossible to address that concern, but I haven't found a good way of doing it without introducing other problems. So, it's a trade-off.
</p>
<p>
What I meant by my remark in the article is that in order to make an (otherwise) valid request, the test needs to know the maximum valid quantity, which varies from restaurant to restaurant. The problem, in a nutshell, is that the test in question operates exclusively against the REST API of the service, and that API doesn't expose any functionality that enables clients to query the configuration of tables for a given restaurant. There's no way to obtain that information.
</p>
<p>
The only two options I can think of are:
</p>
<ul>
<li>Add such a query API to the REST API. In this case, that seems unwarranted.</li>
<li>Add a <a href="http://xunitpatterns.com/Back%20Door%20Manipulation.html">backdoor API</a> to the self-host (<code>LegacyApi</code>).</li>
</ul>
<p>
If I had to, I'd prefer the second option, but it would still require me to add more (test) code to the code base. There's a cost to every line of code.
</p>
<p>
Here, I'm making a bet that the grandfathered restaurant isn't going to change its configuration. The tests are then written with the implicit knowledge that that particular restaurant has a maximum table size of 10, and also particular opening and closing times.
</p>
<p>
This makes those tests more concrete, which makes them more readable. They serve as easy-to-understand examples of how the system works (once the reader has gained the implicit knowledge I just described).
</p>
<p>
It's not perfect. The tests are, perhaps, too obscure for that reason, and they <em>are</em> vulnerable to configuration changes. Even so, the remedies I can think of come with their own disadvantages.
</p>
<p>
So far, I've decided that the best trade-off is to leave things as you see them here. That doesn't mean that I wouldn't change that decision in the future if it turns out that these tests are too brittle.
</p>
</div>
<div class="comment-date">2023-04-19 8:18 UTC</div>
</div>
<div class="comment" id="1690c8861d8d44cfbe23b89613825250">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#1690c8861d8d44cfbe23b89613825250">#</a></div>
<div class="comment-content">
<p>
Anthony, thank you for writing. You're correct that in FsCheck shrinking doesn't work with query syntax; at least in the versions I've used. I'm not sure if that's planned for a future release.
</p>
<p>
As far as I can tell, this is a consequence of the maturity of the library. You have the same issue with <a href="https://hackage.haskell.org/package/QuickCheck">QuickCheck</a>, which also distinguishes between <code>Gen</code> and <code>Arbitrary</code>. While <code>Gen</code> is a monad, <code>Arbitrary</code>'s <code>shrink</code> function is <a href="/2022/08/01/invariant-functors">invariant</a>, which prevents it from being a functor (and hence, also from being a monad).
</p>
<p>
FsCheck is a mature port of QuickCheck, so it has the same limitation. No functor, no query syntax.
</p>
<p>
Later, this limitation was solved by modelling shrinking based on a lazily evaluated shrink tree, which does allow for a monad. The first time I saw that in effect was in <a href="https://hedgehog.qa/">Hedgehog</a>.
</p>
</div>
<div class="comment-date">2023-04-21 6:17 UTC</div>
</div>
<div class="comment" id="774e5ded214bd7c5a912355287c479db">
<div class="comment-author"><a href="https://github.com/AnthonyLloyd">Anthony Lloyd</a> <a href="#774e5ded214bd7c5a912355287c479db">#</a></div>
<div class="comment-content">
<p>Hedgehog does a little better than FsCheck but it doesn't shrink well when the variables are dependent.</p>
<pre><code style="background-color: #eee;border: 1px solid #999;display:block;padding:5px;">[Fact]
public void ShrinkingTest_Hedgehog()
{
    Property.ForAll(
        from a1 in Gen.Int32(Range.ConstantBoundedInt32())
        from a2 in Gen.Int32(Range.ConstantBoundedInt32())
        where a1 > a2
        select (a1, a2))
    .Select(t =>
    {
        if (t.a2 > 10)
            throw new System.Exception();
    })
    .Check(PropertyConfig.Default.WithTests(1_000_000).WithShrinks(1_000_000));
}

[Fact]
public void ShrinkingTest_Hedgehog2()
{
    Property.ForAll(
        from a1 in Gen.Int32(Range.ConstantBoundedInt32())
        from a2 in Gen.Int32(Range.Constant(0, a1))
        select (a1, a2))
    .Select(t =>
    {
        if (t.a2 > 10)
            throw new System.Exception();
    })
    .Check(PropertyConfig.Default.WithTests(1_000_000).WithShrinks(1_000_000));
}

[Fact]
public void ShrinkingTest_CsCheck()
{
    (from a1 in Gen.Int
     from a2 in Gen.Int
     where a1 > a2
     select (a1, a2))
    .Sample((_, a2) =>
    {
        if (a2 > 10)
            throw new Exception();
    }, iter: 1_000_000);
}

[Fact]
public void ShrinkingTest_CsCheck2()
{
    (from a1 in Gen.Int.Positive
     from a2 in Gen.Int[0, a1]
     select (a1, a2))
    .Sample((_, a2) =>
    {
        if (a2 > 10)
            throw new Exception();
    }, iter: 1_000_000);
}</code></pre>
<p>This and the syntax complexity I mentioned in the previous post were the reasons I developed CsCheck. Random shrinking is the key innovation that makes it simpler.</p>
</div>
<div class="comment-date">2023-04-21 16:38 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Anagrams kata as a one-liner
https://blog.ploeh.dk/2023/04/10/anagrams-kata-as-a-one-liner
2023-04-10T08:08:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A futile exercise in code compaction.</em>
</p>
<p>
Recently I was doing the <a href="http://codekata.com/kata/kata06-anagrams/">Anagrams kata</a> in <a href="https://fsharp.org/">F#</a> with <a href="https://github.com/gdziadkiewicz">Grzegorz Dziadkiewicz</a>, and along the way realised that the implementation is essentially a one-liner. I thought it would be fun to redo the exercise in <a href="https://www.haskell.org/">Haskell</a> and see how compact I could make the code.
</p>
<p>
In short, in the exercise, you're given a list of words, and you need to find all the <a href="https://en.wikipedia.org/wiki/Anagram">anagrams</a> in the list. For example, given the list <em>bar, foo, bra</em>, the result should be <em>bar, bra</em>, and <em>foo</em> shouldn't be part of the output, since it's not an anagram of any other word in the list.
</p>
<h3 id="99c78f1d41a540128c07eb8912e4ca92">
A pipeline of transformations <a href="#99c78f1d41a540128c07eb8912e4ca92">#</a>
</h3>
<p>
My idea was to collect all the words in a <a href="https://hackage.haskell.org/package/containers/docs/Data-Map-Strict.html">Map</a> (dictionary) keyed by the string, but sorted. Even if the sorted string is a nonsense word, all anagrams sort to the same sequence of letters:
</p>
<p>
<pre>ghci> sort "bar"
"abr"
ghci> sort "bra"
"abr"</pre>
</p>
<p>
Each key should map to a <a href="https://hackage.haskell.org/package/containers-0.6.7/docs/Data-Set.html">Set</a> of words, since I don't care about the order.
</p>
<p>
Once I have that map of sets, I can throw away the singleton sets, and then the keys. Or perhaps first throw away the keys, and then the singleton sets. The order of those two steps doesn't matter.
</p>
<p>
The reason I don't want the singleton sets is that a set with only one word means that no anagrams were found.
</p>
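<p>
To illustrate that the order of those two steps doesn't matter, here's a minimal sketch of both variations. The names <code>filterThenElems</code>, <code>elemsThenFilter</code>, and <code>sample</code> are hypothetical and only serve this illustration:
</p>
<p>
<pre>import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)
import qualified Data.Set as Set
import Data.Set (Set)

-- Throw away the singleton sets first, then the keys.
filterThenElems :: Map k (Set a) -> [Set a]
filterThenElems = Map.elems . Map.filter ((1 <) . Set.size)

-- Throw away the keys first, then the singleton sets.
elemsThenFilter :: Map k (Set a) -> [Set a]
elemsThenFilter = filter ((1 <) . Set.size) . Map.elems

-- A small hand-written map of the kind described above.
sample :: Map String (Set String)
sample = Map.fromList
  [("abr", Set.fromList ["bar", "bra"]), ("foo", Set.fromList ["foo"])]</pre>
</p>
<p>
Both <code>filterThenElems sample</code> and <code>elemsThenFilter sample</code> evaluate to a list with the single set containing <code>"bar"</code> and <code>"bra"</code>.
</p>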
<h3 id="53831677170c455cb4624971c8e82249">
Creating the map <a href="#53831677170c455cb4624971c8e82249">#</a>
</h3>
<p>
How to create the desired map? The <code>Map</code> module exports the <a href="https://hackage.haskell.org/package/containers/docs/Data-Map-Strict.html#v:fromListWith">fromListWith</a> function that enables you to go through an <a href="https://en.wikipedia.org/wiki/Association_list">association list</a> and combine values along the way, in case you encounter the key more than once. That sounds useful, but means that first I have to convert the list of words to an association list.
</p>
<p>
Importing <a href="https://hackage.haskell.org/package/base/docs/Control-Arrow.html">Control.Arrow</a>, I can do it like this:
</p>
<p>
<pre>ghci> fmap (sort &&& Set.singleton) ["bar", "foo", "bra"]
[("abr",fromList ["bar"]),("foo",fromList ["foo"]),("abr",fromList ["bra"])]</pre>
</p>
<p>
Each element in the list is a pair of a key, and a set containing a single word. Notice that the set containing <code>"bar"</code> has the same key as the set containing <code>"bra"</code>. When using <code>fromListWith</code>, the function will have to unite these two sets whenever it encounters the same key.
</p>
<p>
<pre>ghci> Map.fromListWith Set.union $ fmap (sort &&& Set.singleton) ["bar", "foo", "bra"]
fromList [("abr",fromList ["bar","bra"]),("foo",fromList ["foo"])]</pre>
</p>
<p>
The two anagrams <code>"bar"</code> and <code>"bra"</code> now belong to the same set, while <code>"foo"</code> is still solitary.
</p>
<h3 id="bbd82971c2324018973182c27d613a64">
Finding the anagrams <a href="#bbd82971c2324018973182c27d613a64">#</a>
</h3>
<p>
Now that we've grouped sets according to key, we no longer need the keys:
</p>
<p>
<pre>ghci> Map.elems $ Map.fromListWith Set.union $ fmap (sort &&& Set.singleton) ["bar", "foo", "bra"]
[fromList ["bar","bra"],fromList ["foo"]]</pre>
</p>
<p>
The anagrams are those sets that have more than one element, so we can throw away those that are smaller.
</p>
<p>
<pre>ghci> filter ((1 <) . Set.size) $ Map.elems $ Map.fromListWith Set.union $
fmap (sort &&& Set.singleton) ["bar", "foo", "bra"]
[fromList ["bar","bra"]]</pre>
</p>
<p>
The expression has now grown to such a width that I've broken it into two lines to make it more readable. It really is just one line, though.
</p>
<h3 id="203f41225d944d1689cf1df0d3f8118f">
Function <a href="#203f41225d944d1689cf1df0d3f8118f">#</a>
</h3>
<p>
To save a bit of space, I eta-reduced the expression before I made it a function:
</p>
<p>
<pre><span style="color:#2b91af;">anagrams</span> <span style="color:blue;">::</span> <span style="color:blue;">Ord</span> a <span style="color:blue;">=></span> [[a]] <span style="color:blue;">-></span> <span style="color:blue;">Set</span> (<span style="color:blue;">Set</span> [a])
anagrams =
  Set.fromList . <span style="color:blue;">filter</span> ((1 <) . Set.size) . Map.elems . Map.fromListWith Set.union
  . <span style="color:blue;">fmap</span> (sort &&& Set.singleton)</pre>
</p>
<p>
The leftmost <code>Set.fromList</code> converts the list of anagrams to a <code>Set</code> of anagrams, since I didn't think that it was a postcondition that the anagrams should be returned in a specific order.
</p>
<p>
Unfortunately the expression is still so wide that I found it necessary to break it into two lines.
</p>
<p>
Just for the hell of it, I tried to fix the situation by changing the imports:
</p>
<p>
<pre><span style="color:blue;">import</span> Control.Arrow
<span style="color:blue;">import</span> Data.List (<span style="color:#2b91af;">sort</span>)
<span style="color:blue;">import</span> Data.Map.Strict (<span style="color:#2b91af;">fromListWith</span>, <span style="color:#2b91af;">elems</span>)
<span style="color:blue;">import</span> Data.Set (<span style="color:blue;">Set</span>, <span style="color:#2b91af;">fromList</span>, <span style="color:#2b91af;">singleton</span>)</pre>
</p>
<p>
With this very specific set of imports, the expression now barely fits on a single line:
</p>
<p>
<pre><span style="color:#2b91af;">anagrams</span> <span style="color:blue;">::</span> <span style="color:blue;">Ord</span> a <span style="color:blue;">=></span> [[a]] <span style="color:blue;">-></span> <span style="color:blue;">Set</span> (<span style="color:blue;">Set</span> [a])
anagrams = fromList . <span style="color:blue;">filter</span> ((1 <) . <span style="color:blue;">length</span>) . elems . fromListWith <span style="color:#2b91af;">(<>)</span> . <span style="color:blue;">fmap</span> (sort &&& singleton)</pre>
</p>
<p>
Here, I also took advantage of <code>Semigroup</code> <em>append</em> (<code><></code>) being equal to <code>Set.union</code> for <code>Set</code>.
</p>
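<p>
If you want to convince yourself of the two substitutions, a small sketch like the following checks them on a pair of example sets. The names <code>left</code>, <code>right</code>, <code>appendIsUnion</code>, and <code>lengthIsSize</code> are made up for the occasion:
</p>
<p>
<pre>import qualified Data.Set as Set
import Data.Set (Set)

left, right :: Set String
left = Set.fromList ["bar"]
right = Set.fromList ["bra"]

-- For Set, the Semigroup operation is union.
appendIsUnion :: Bool
appendIsUnion = left <> right == Set.union left right

-- Set is Foldable, so Prelude's length counts its elements, like Set.size.
lengthIsSize :: Bool
lengthIsSize = length (left <> right) == Set.size (left <> right)</pre>
</p>
<p>
Both values evaluate to <code>True</code>. This only demonstrates the equivalence for one example, of course, but it shows why <code>length</code> and <code>(<>)</code> can stand in for <code>Set.size</code> and <code>Set.union</code> without importing them.
</p>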
<p>
Is it readable? Hardly.
</p>
<p>
My main goal with the exercise was to implement the desired functionality as a single expression. Perhaps I was inspired by Dave Thomas, who wrote:
</p>
<blockquote>
<p>
"I hacked a solution together in 25 lines of Ruby."
</p>
<footer><cite><a href="http://codekata.com/kata/kata06-anagrams/">Dave Thomas</a></cite></footer>
</blockquote>
<p>
25 lines of Ruby? I can do it in one line of Haskell!
</p>
<p>
Is that interesting? Does it make sense to compare two languages? Why not? By trying out different languages you learn the strengths and weaknesses of each. There's no denying that Haskell is <em>expressive</em>. On the other hand, what you can't see in this blog post is that compilation takes forever. Not for this code in particular, but in general.
</p>
<p>
I'm sure Dave Thomas was done with his Ruby implementation before my machine had even finished compiling the empty, scaffolded Haskell code.
</p>
<h3 id="3ee6ff0610174f3ea5293baaefbc9ee3">
Performance <a href="#3ee6ff0610174f3ea5293baaefbc9ee3">#</a>
</h3>
<p>
Dave Thomas also wrote:
</p>
<blockquote>
<p>
"It runs on <a href="http://codekata.com/data/wordlist.txt">this wordlist</a> in 1.8s on a 1.7GHz i7."
</p>
</blockquote>
<p>
Usually I don't care that much about performance as long as it's adequate. Or rather, I find that good software architecture with poor algorithms usually beats bad architecture with good algorithms. But I digress.
</p>
<p>
How fares my one-liner against Dave Thomas' implementation?
</p>
<p>
<pre>ghci> :set +s
ghci> length . anagrams . lines <$> readFile "wordlist.txt"
20683
(3.56 secs, 1,749,984,448 bytes)</pre>
</p>
<p>
Oh, 3.56 seconds isn't particularly g... <em>Holy thunk, Batman! 1.7 gigabytes!</em>
</p>
<p>
That's actually disappointing, I admit. If only I could excuse this by running on a weaker machine, but mine is a 1.9 GHz i7. Nominally faster than Dave Thomas' machine.
</p>
<p>
At least, the time it takes to run through that 3.7 MB file is the same order of magnitude.
</p>
<h3 id="224925799e714bb1922f9c41a68207a3">
Tests <a href="#224925799e714bb1922f9c41a68207a3">#</a>
</h3>
<p>
Since I had a good idea about the kind of implementation I was aiming for, I didn't write that many tests. Only three, actually.
</p>
<p>
<pre><span style="color:#2b91af;">main</span> <span style="color:blue;">::</span> <span style="color:#2b91af;">IO</span> ()
main = defaultMain $ hUnitTestToTests (TestList [
  <span style="color:#a31515;">"Examples"</span> ~: <span style="color:blue;">do</span>
    (<span style="color:blue;">words</span>, expected) <-
      [
        ([<span style="color:#a31515;">"foo"</span>, <span style="color:#a31515;">"bar"</span>, <span style="color:#a31515;">"baz"</span>], Set.empty),
        ([<span style="color:#a31515;">"bar"</span>, <span style="color:#a31515;">"foo"</span>, <span style="color:#a31515;">"bra"</span>], Set.fromList [Set.fromList [<span style="color:#a31515;">"bar"</span>, <span style="color:#a31515;">"bra"</span>]]),
        ([<span style="color:#a31515;">"foo"</span>, <span style="color:#a31515;">"bar"</span>, <span style="color:#a31515;">"bra"</span>, <span style="color:#a31515;">"oof"</span>],
         Set.fromList [
          Set.fromList [<span style="color:#a31515;">"foo"</span>, <span style="color:#a31515;">"oof"</span>], Set.fromList [<span style="color:#a31515;">"bar"</span>, <span style="color:#a31515;">"bra"</span>]])
      ]
    <span style="color:blue;">let</span> actual = anagrams <span style="color:blue;">words</span>
    <span style="color:blue;">return</span> $ expected ~=? actual
  ])</pre>
</p>
<p>
As I usually do it in Haskell, these are <a href="/2018/05/07/inlined-hunit-test-lists">inlined</a> <a href="/2018/04/30/parametrised-unit-tests-in-haskell">parametrised HUnit tests</a>.
</p>
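<p>
I didn't write any properties for this kata, but if three examples aren't enough for you, a QuickCheck property could supplement them. The following is only a sketch under a few assumptions: <code>isAnagramGroup</code> and <code>prop_groupsAreAnagrams</code> are hypothetical names, and the generator draws words from the tiny alphabet <em>abc</em> so that anagrams actually occur. The <code>anagrams</code> function is repeated to make the example self-contained.
</p>
<p>
<pre>import Control.Arrow ((&&&))
import Data.List (sort)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Set (Set)
import Test.QuickCheck

-- Same implementation as the two-line version shown earlier.
anagrams :: Ord a => [[a]] -> Set (Set [a])
anagrams =
  Set.fromList . filter ((1 <) . Set.size) . Map.elems . Map.fromListWith Set.union
  . fmap (sort &&& Set.singleton)

-- A group has at least two words, and they all sort to the same letters.
isAnagramGroup :: Set String -> Bool
isAnagramGroup g = Set.size g > 1 && Set.size (Set.map sort g) == 1

-- Words come from a three-letter alphabet to make collisions likely.
prop_groupsAreAnagrams :: Property
prop_groupsAreAnagrams =
  forAll (listOf (listOf (elements "abc"))) $ \ws ->
    all isAnagramGroup (anagrams ws)</pre>
</p>
<p>
You could run it with <code>quickCheck prop_groupsAreAnagrams</code>. Note that it only verifies that each returned group is well-formed, not that every anagram in the input is found.
</p>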
<h3 id="3b4f477f18954c5095d48f450f935092">
Conclusion <a href="#3b4f477f18954c5095d48f450f935092">#</a>
</h3>
<p>
<a href="/2020/01/13/on-doing-katas">Doing katas</a> is a good way to try out new ideas, dumb or otherwise. Implementing the Anagrams kata as a one-liner was fun, but the final code shown here is sufficiently unreadable that I wouldn't recommend it.
</p>
<p>
You could still write the <code>anagrams</code> function based on the idea presented here, but in a shared code base with an indefinite life span, I'd break it up into multiple expressions with descriptive names.
</p>
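<p>
What might that look like? Here's one possible decomposition, offered as a sketch rather than as code from the repository. The helper names <code>groupBySortedLetters</code> and <code>anagramGroups</code> are invented for this illustration:
</p>
<p>
<pre>import Data.List (sort)
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)
import qualified Data.Set as Set
import Data.Set (Set)

anagrams :: Ord a => [[a]] -> Set (Set [a])
anagrams = Set.fromList . anagramGroups . groupBySortedLetters

-- Collect the words in a map keyed by their sorted letters.
groupBySortedLetters :: Ord a => [[a]] -> Map [a] (Set [a])
groupBySortedLetters = Map.fromListWith Set.union . fmap keyed
  where
    keyed word = (sort word, Set.singleton word)

-- Keep only the groups with more than one word. The keys are no longer needed.
anagramGroups :: Map k (Set a) -> [Set a]
anagramGroups = filter hasAnagrams . Map.elems
  where
    hasAnagrams group = Set.size group > 1</pre>
</p>
<p>
It's the same pipeline as before, only with a name attached to each step. It's no longer a one-liner, but each function can now be read, and tested, on its own.
</p>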
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
An abstract example of refactoring from interaction-based to property-based testing
https://blog.ploeh.dk/2023/04/03/an-abstract-example-of-refactoring-from-interaction-based-to-property-based-testing
2023-04-03T06:02:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A C# example with xUnit.net and CsCheck</em>
</p>
<p>
This is the first comprehensive example that accompanies the article <a href="/2023/02/13/epistemology-of-interaction-testing">Epistemology of interaction testing</a>. In that article, I argue that in a code base that leans toward functional programming (FP), property-based testing is a better fit than interaction-based testing. In this example, I will show how to refactor simple interaction-based tests into property-based tests.
</p>
<p>
This small article series was prompted by an email from Sergei Rogovtsev, who was kind enough to furnish <a href="https://github.com/srogovtsev/mocks-in-tests">example code</a>. I'll use his code as a starting point for this example, so I've <a href="https://github.com/ploeh/mocks-in-tests">forked the repository</a>. If you want to follow along, all my work is in a branch called <em>no-mocks</em>. That branch simply continues off the <em>master</em> branch.
</p>
<h3 id="5fe789d2b85b449fa2677bbcbf095bfc">
Interaction-based testing <a href="#5fe789d2b85b449fa2677bbcbf095bfc">#</a>
</h3>
<p>
Sergei Rogovtsev writes:
</p>
<blockquote>
<p>
"A major thing to point out here is that I'm not following TDD here not by my own choice, but because my original question arose in a context of a legacy system devoid of tests, so I choose to present it to you in the same way. I imagine that working from tests would avoid a lot of questions."
</p>
</blockquote>
<p>
Even when using test-driven development (TDD), most code bases I've seen make use of <a href="http://xunitpatterns.com/Test%20Stub.html">Stubs</a> and <a href="http://xunitpatterns.com/Mock%20Object.html">Mocks</a> (or, rather, <a href="http://xunitpatterns.com/Test%20Spy.html">Spies</a>). In an object-oriented context this can make much sense. After all, a catch phrase of object-oriented programming is <em>tell, don't ask</em>.
</p>
<p>
If you base API design on that principle, you're modelling side effects, and it makes sense that tests use Spies to verify those side effects. The book <a href="/ref/goos">Growing Object-Oriented Software, Guided by Tests</a> is a good example of this approach. Thus, even if you follow established good TDD practice, you could easily arrive at a code base reminiscent of Sergei Rogovtsev's example. I've written plenty of such code bases myself.
</p>
<p>
Sergei Rogovtsev then extracts a couple of components, leaving him with a <code>Controller</code> class looking like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">string</span> <span style="font-weight:bold;color:#74531f;">Complete</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">state</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">code</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">knownState</span> = _repository.GetState(state);
<span style="font-weight:bold;color:#8f08c4;">try</span>
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (_stateValidator.Validate(code, knownState))
<span style="font-weight:bold;color:#8f08c4;">return</span> _renderer.Success(knownState);
<span style="font-weight:bold;color:#8f08c4;">else</span>
<span style="font-weight:bold;color:#8f08c4;">return</span> _renderer.Failure(knownState);
}
<span style="font-weight:bold;color:#8f08c4;">catch</span> (Exception <span style="font-weight:bold;color:#1f377f;">e</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> _renderer.Error(knownState, e);
}
}</pre>
</p>
<p>
This code snippet doesn't show the entire class, but only its solitary action method. Keep in mind that the entire repository is available on GitHub if you want to see the surrounding code.
</p>
<p>
The <code>Complete</code> method orchestrates three injected dependencies: <code>_repository</code>, <code>_stateValidator</code>, and <code>_renderer</code>. The question that Sergei Rogovtsev asks is how to test this method. You may think that it's so simple that you don't need to test it, but keep in mind that this is a <a href="https://en.wikipedia.org/wiki/Minimal_reproducible_example">minimal and self-contained example</a> that stands in for something more complicated.
</p>
<p>
The method has a cyclomatic complexity of <em>3</em>, so <a href="/2019/12/09/put-cyclomatic-complexity-to-good-use">you need at least three test cases</a>. That's also what Sergei Rogovtsev's code contains. I'll show each test case in turn, while I refactor them.
</p>
<p>
The overall question is still this: Both <code>IStateValidator</code> and <code>IRenderer</code> interfaces have only a single production implementation, and in both cases the implementations are <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a>. If interaction-based testing is suboptimal, is there a better way to test this code?
</p>
<p>
As I outlined in <a href="/2023/02/13/epistemology-of-interaction-testing">the introductory article</a>, I consider property-based testing a good alternative. In the following, I'll refactor the tests. Since the tests already use <a href="https://github.com/AutoFixture/AutoFixture">AutoFixture</a>, most of the preliminary work can be done without choosing a property-based testing framework. I'll postpone that decision until I need it.
</p>
<h3 id="ca095000df004ef9b7cb7c022071f76b">
State validator <a href="#ca095000df004ef9b7cb7c022071f76b">#</a>
</h3>
<p>
The <code>IStateValidator</code> interface has a single implementation:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">StateValidator</span> : IStateValidator
{
<span style="color:blue;">public</span> <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">Validate</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">code</span>, (<span style="color:blue;">string</span> expectedCode, <span style="color:blue;">bool</span> isMobile, Uri redirect) <span style="font-weight:bold;color:#1f377f;">knownState</span>)
=> code == knownState.expectedCode;
}</pre>
</p>
<p>
The <code>Validate</code> method is a pure function, so it's completely deterministic. It means that you don't have to hide it behind an interface and replace it with a <a href="https://martinfowler.com/bliki/TestDouble.html">Test Double</a> in order to control it. Rather, just feed it proper data. Still, that's not what the interaction-based tests do:
</p>
<p>
<pre>[Theory]
[AutoData]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">HappyPath</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">state</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">code</span>, (<span style="color:blue;">string</span>, <span style="color:blue;">bool</span>, Uri) <span style="font-weight:bold;color:#1f377f;">knownState</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">response</span>)
{
_repository.Add(state, knownState);
_stateValidator
.Setup(<span style="font-weight:bold;color:#1f377f;">validator</span> => validator.Validate(code, knownState))
.Returns(<span style="color:blue;">true</span>);
_renderer
.Setup(<span style="font-weight:bold;color:#1f377f;">renderer</span> => renderer.Success(knownState))
.Returns(response);
_target
.Complete(state, code)
.Should().Be(response);
}</pre>
</p>
<p>
These tests use AutoFixture, which will make it a bit easier to refactor them to properties. It also makes the test a bit more abstract, since you don't get to see concrete test data. In short, the <code>[AutoData]</code> attribute will generate a random <code>state</code> string, a random <code>code</code> string, and so on. If you want to see an example with concrete test data, the next article shows that variation.
</p>
<p>
The test uses <a href="https://github.com/moq/moq4">Moq</a> to control the behaviour of the Test Doubles. It states that the <code>Validate</code> method will return <code>true</code> when called with certain arguments. This is possible because you can redefine its behaviour, but as far as executable specifications go, this test doesn't reflect reality. There's only one <code>Validate</code> implementation, and it doesn't behave like that. Rather, it'll return <code>true</code> when <code>code</code> is equal to <code>knownState.expectedCode</code>. The test poorly communicates that behaviour.
</p>
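<p>
To make that concrete, here's a minimal sketch that exercises the real logic directly, without any Test Double. The <code>ValidatorDemo</code> class, its helper methods, and the literal values are made up for illustration; only the body of <code>Validate</code> mirrors the implementation shown above.
</p>
<p>
<pre>using System;

public static class ValidatorDemo
{
    // Same logic as the StateValidator.Validate method shown earlier.
    public static bool Validate(
        string code,
        (string expectedCode, bool isMobile, Uri redirect) knownState)
        => code == knownState.expectedCode;

    // The result depends only on the arguments, so the test data alone
    // decides which branch a caller would take.
    public static bool MatchingCode()
    {
        var knownState = ("abc", true, new Uri("https://example.com"));
        return Validate("abc", knownState);
    }

    public static bool MismatchedCode()
    {
        var knownState = ("abc", true, new Uri("https://example.com"));
        return Validate("abc1", knownState);
    }
}</pre>
</p>
<p>
Here, <code>MatchingCode</code> returns <code>true</code> and <code>MismatchedCode</code> returns <code>false</code>. The input determines the outcome, which is exactly the relationship that the Moq setup hides.
</p>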
<p>
Even before I replace AutoFixture with CsCheck, I'll prepare the test by making it more honest. I'll replace the <code>code</code> parameter with a <a href="http://xunitpatterns.com/Derived%20Value.html">Derived Value</a>:
</p>
<p>
<pre>[Theory]
[AutoData]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">HappyPath</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">state</span>, (<span style="color:blue;">string</span>, <span style="color:blue;">bool</span>, Uri) <span style="font-weight:bold;color:#1f377f;">knownState</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">response</span>)
{
var (<span style="font-weight:bold;color:#1f377f;">expectedCode</span>, <span style="color:blue;">_</span>, <span style="color:blue;">_</span>) = knownState;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">code</span> = expectedCode;
<span style="color:green;">// The rest of the test...</span></pre>
</p>
<p>
I've removed the <code>code</code> parameter to replace it with a variable derived from <code>knownState</code>. Notice how this <em>documents</em> the overall behaviour of the (sub-)system.
</p>
<p>
This also means that I can now replace the <code>IStateValidator</code> Test Double with the real, pure implementation:
</p>
<p>
<pre>[Theory]
[AutoData]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">HappyPath</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">state</span>, (<span style="color:blue;">string</span>, <span style="color:blue;">bool</span>, Uri) <span style="font-weight:bold;color:#1f377f;">knownState</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">response</span>)
{
var (<span style="font-weight:bold;color:#1f377f;">expectedCode</span>, <span style="color:blue;">_</span>, <span style="color:blue;">_</span>) = knownState;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">code</span> = expectedCode;
_repository.Add(state, knownState);
_renderer
.Setup(<span style="font-weight:bold;color:#1f377f;">renderer</span> => renderer.Success(knownState))
.Returns(response);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> Controller(_repository, <span style="color:blue;">new</span> StateValidator(), _renderer.Object);
sut
.Complete(state, code)
.Should().Be(response);
}</pre>
</p>
<p>
I give the <code>Failure</code> test case the same treatment:
</p>
<p>
<pre>[Theory]
[AutoData]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Failure</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">state</span>, (<span style="color:blue;">string</span>, <span style="color:blue;">bool</span>, Uri) <span style="font-weight:bold;color:#1f377f;">knownState</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">response</span>)
{
var (<span style="font-weight:bold;color:#1f377f;">expectedCode</span>, <span style="color:blue;">_</span>, <span style="color:blue;">_</span>) = knownState;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">code</span> = expectedCode + <span style="color:#a31515;">"1"</span>; <span style="color:green;">// Any extra string will do</span>
_repository.Add(state, knownState);
_renderer
.Setup(<span style="font-weight:bold;color:#1f377f;">renderer</span> => renderer.Failure(knownState))
.Returns(response);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> Controller(_repository, <span style="color:blue;">new</span> StateValidator(), _renderer.Object);
sut
.Complete(state, code)
.Should().Be(response);
}</pre>
</p>
<p>
The third test case is a bit more interesting.
</p>
<h3 id="aa150cf1b817416ea49e30cdc8b27ac7">
An impossible case <a href="#aa150cf1b817416ea49e30cdc8b27ac7">#</a>
</h3>
<p>
Before I make any changes to it, the third test case is this:
</p>
<p>
<pre>[Theory]
[AutoData]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Error</span>(
<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">state</span>,
<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">code</span>,
(<span style="color:blue;">string</span>, <span style="color:blue;">bool</span>, Uri) <span style="font-weight:bold;color:#1f377f;">knownState</span>,
Exception <span style="font-weight:bold;color:#1f377f;">e</span>,
<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">response</span>)
{
_repository.Add(state, knownState);
_stateValidator
.Setup(<span style="font-weight:bold;color:#1f377f;">validator</span> => validator.Validate(code, knownState))
.Throws(e);
_renderer
.Setup(<span style="font-weight:bold;color:#1f377f;">renderer</span> => renderer.Error(knownState, e))
.Returns(response);
_target
.Complete(state, code)
.Should().Be(response);
}</pre>
</p>
<p>
This test case verifies the behaviour of the <code>Controller</code> class when the <code>Validate</code> method throws an exception. If we want to instead use the real, pure implementation, how can we get it to throw an exception? Consider it again:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">Validate</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">code</span>, (<span style="color:blue;">string</span> expectedCode, <span style="color:blue;">bool</span> isMobile, Uri redirect) <span style="font-weight:bold;color:#1f377f;">knownState</span>)
=> code == knownState.expectedCode;</pre>
</p>
<p>
As far as I can tell, there's no way to get this method to throw an exception. You might suggest passing <code>null</code> as the <code>knownState</code> parameter, but that's not possible. This is a new version of C# and the <a href="https://learn.microsoft.com/dotnet/csharp/language-reference/builtin-types/nullable-reference-types">nullable reference types</a> feature is turned on. I spent some fifteen minutes trying to convince the compiler to pass a <code>null</code> argument in place of <code>knownState</code>, but I couldn't make it work in a unit test.
</p>
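<p>
As far as I can tell, part of the explanation is that <code>knownState</code> is a C# value tuple, and value tuples are structs rather than reference types. The closest you get to 'nothing' is <code>default</code>, and even that doesn't make the original method throw. The following is only a sketch; <code>DefaultTupleDemo</code> and <code>ValidateAgainstDefault</code> are hypothetical names that aren't part of the example repository.
</p>
<p>
<pre>using System;

public static class DefaultTupleDemo
{
    // The original, non-throwing version of Validate.
    public static bool Validate(
        string code,
        (string expectedCode, bool isMobile, Uri redirect) knownState)
        => code == knownState.expectedCode;

    public static bool ValidateAgainstDefault(string code)
    {
        // For a value tuple, default is a tuple of default elements:
        // (null, false, null). It is not a null reference.
        (string expectedCode, bool isMobile, Uri redirect) knownState = default;

        // String comparison with == tolerates null operands, so this
        // returns false for any non-null code instead of throwing.
        return Validate(code, knownState);
    }
}</pre>
</p>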
<p>
That's interesting. The <code>Error</code> test is exercising a code path that's impossible in production. Is it redundant?
</p>
<p>
It might be, but here I think that it's more an artefact of the process. Sergei Rogovtsev has provided a minimal example, and as sometimes happens, perhaps it's a bit too minimal. He did write, however, that he considered it essential for the example that the logic involved more than a Boolean <em>true/false</em> condition. In order to keep with the spirit of the example, then, I'm going to modify the <code>Validate</code> method so that it's also possible to make it throw an exception:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">bool</span> <span style="font-weight:bold;color:#74531f;">Validate</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">code</span>, (<span style="color:blue;">string</span> expectedCode, <span style="color:blue;">bool</span> isMobile, Uri redirect) <span style="font-weight:bold;color:#1f377f;">knownState</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (knownState == <span style="color:blue;">default</span>)
<span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> ArgumentNullException(nameof(knownState));
<span style="font-weight:bold;color:#8f08c4;">return</span> code == knownState.expectedCode;
}</pre>
</p>
<p>
The method now throws an exception if you pass it a <code>default</code> value for <code>knownState</code>. From an implementation standpoint, there's no reason to do this, so it's only for the sake of the example. You can now test how the <code>Controller</code> handles an exception:
</p>
<p>
<pre>[Theory]
[AutoData]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">Error</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">state</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">code</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">response</span>)
{
_repository.Add(state, <span style="color:blue;">default</span>);
_renderer
.Setup(<span style="font-weight:bold;color:#1f377f;">renderer</span> => renderer.Error(<span style="color:blue;">default</span>, It.IsAny<Exception>()))
.Returns(response);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> Controller(_repository, <span style="color:blue;">new</span> StateValidator(), _renderer.Object);
sut
.Complete(state, code)
.Should().Be(response);
}</pre>
</p>
<p>
The test no longer has a reference to the specific <code>Exception</code> object that <code>Validate</code> is going to throw, so instead it has to use Moq's <code>It.IsAny</code> API to configure the <code>_renderer</code>. This is, however, only an interim step, since it's now time to treat that dependency in the same way as the validator.
</p>
<h3 id="389a212d80a4445c99917cbd15b043eb">
Renderer <a href="#389a212d80a4445c99917cbd15b043eb">#</a>
</h3>
<p>
The <code>Renderer</code> class has three methods, and they are all pure functions:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Renderer</span> : IRenderer
{
<span style="color:blue;">public</span> <span style="color:blue;">string</span> <span style="font-weight:bold;color:#74531f;">Success</span>((<span style="color:blue;">string</span> expectedCode, <span style="color:blue;">bool</span> isMobile, Uri redirect) <span style="font-weight:bold;color:#1f377f;">knownState</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (knownState.isMobile)
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#a31515;">"{\"success\": true, \"redirect\": \""</span> + knownState.redirect + <span style="color:#a31515;">"\"}"</span>;
<span style="font-weight:bold;color:#8f08c4;">else</span>
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#a31515;">"302 Location: "</span> + knownState.redirect;
}
<span style="color:blue;">public</span> <span style="color:blue;">string</span> <span style="font-weight:bold;color:#74531f;">Failure</span>((<span style="color:blue;">string</span> expectedCode, <span style="color:blue;">bool</span> isMobile, Uri redirect) <span style="font-weight:bold;color:#1f377f;">knownState</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (knownState.isMobile)
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#a31515;">"{\"success\": false, \"redirect\": \"login\"}"</span>;
<span style="font-weight:bold;color:#8f08c4;">else</span>
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#a31515;">"302 Location: login"</span>;
}
<span style="color:blue;">public</span> <span style="color:blue;">string</span> <span style="font-weight:bold;color:#74531f;">Error</span>((<span style="color:blue;">string</span> expectedCode, <span style="color:blue;">bool</span> isMobile, Uri redirect) <span style="font-weight:bold;color:#1f377f;">knownState</span>, Exception <span style="font-weight:bold;color:#1f377f;">e</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (knownState.isMobile)
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#a31515;">"{\"error\": \""</span> + e.Message + <span style="color:#a31515;">"\"}"</span>;
<span style="font-weight:bold;color:#8f08c4;">else</span>
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:#a31515;">"500"</span>;
}
}</pre>
</p>
<p>
Since all three methods are deterministic, automated tests can control their behaviour simply by passing in the appropriate arguments:
</p>
<p>
<pre>[Theory]
[AutoData]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">HappyPath</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">state</span>, (<span style="color:blue;">string</span>, <span style="color:blue;">bool</span>, Uri) <span style="font-weight:bold;color:#1f377f;">knownState</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">response</span>)
{
var (<span style="font-weight:bold;color:#1f377f;">expectedCode</span>, <span style="color:blue;">_</span>, <span style="color:blue;">_</span>) = knownState;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">code</span> = expectedCode;
_repository.Add(state, knownState);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">renderer</span> = <span style="color:blue;">new</span> Renderer();
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> Controller(_repository, renderer);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">expected</span> = renderer.Success(knownState);
sut
.Complete(state, code)
.Should().Be(expected);
}</pre>
</p>
<p>
Instead of configuring an <code>IRenderer</code> Stub, the test can state the <code>expected</code> output: That the output is equal to the output that <code>renderer.Success</code> would return.
</p>
<p>
Notice that the test doesn't <em>require</em> that the implementation calls <code>renderer.Success</code>. It only requires that the output is equal to the output that <code>renderer.Success</code> would return. Thus, it has less of an opinion about the implementation, which means that it's marginally less coupled to it.
</p>
<p>
You might protest that the test now duplicates the implementation code. This is partially true, but no more than the previous incarnation of it. Before, the test used Moq to explicitly <em>require</em> that <code>renderer.Success</code> gets called. Now, there's still coupling, but this refactoring reduces it.
</p>
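<p>
One way to illustrate the difference is to imagine an implementation that produces the same text without ever calling <code>Success</code>. In the sketch below, <code>OutputEqualityDemo</code> and <code>InlinedSuccess</code> are hypothetical names invented for this illustration; <code>Success</code> copies the logic of <code>Renderer.Success</code> shown above.
</p>
<p>
<pre>using System;

public static class OutputEqualityDemo
{
    // Same logic as the Renderer.Success method shown above.
    public static string Success(
        (string expectedCode, bool isMobile, Uri redirect) knownState)
    {
        if (knownState.isMobile)
            return "{\"success\": true, \"redirect\": \"" + knownState.redirect + "\"}";
        else
            return "302 Location: " + knownState.redirect;
    }

    // Hypothetical alternative that never calls Success,
    // but still produces the same text.
    public static string InlinedSuccess(
        (string expectedCode, bool isMobile, Uri redirect) knownState)
    {
        var target = knownState.redirect.ToString();
        return knownState.isMobile
            ? "{\"success\": true, \"redirect\": \"" + target + "\"}"
            : "302 Location: " + target;
    }
}</pre>
</p>
<p>
A test that compares output would accept either method, whereas a Moq expectation on <code>renderer.Success</code> would reject the second one, even though a caller can't tell them apart.
</p>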
<p>
As a side note, this may partially be an artefact of the process. Here I'm refactoring tests while keeping the implementation intact. Had I started with a property, perhaps the test would have turned out differently, and less coupled to the implementation. If you're interested in a successful exercise in using property-based TDD, you may find my article <a href="/2021/06/28/property-based-testing-is-not-the-same-as-partition-testing">Property-based testing is not the same as partition testing</a> interesting.
</p>
<h3 id="81bbcfc59f5c48df8ee15f9d1bad47cf">
Simplification <a href="#81bbcfc59f5c48df8ee15f9d1bad47cf">#</a>
</h3>
<p>
Once you've refactored the tests to use the pure functions as dependencies, you no longer need the interfaces. The interfaces <code>IStateValidator</code> and <code>IRenderer</code> only existed to support testing. Now that the tests no longer use the interfaces, you can delete them.
</p>
<p>
Furthermore, once you've removed those interfaces, there's no reason for the classes to support instantiation. Instead, make them <code>static</code>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">StateValidator</span>
{
<span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">bool</span> <span style="color:#74531f;">Validate</span>(
<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">code</span>,
(<span style="color:blue;">string</span> expectedCode, <span style="color:blue;">bool</span> isMobile, Uri redirect) <span style="font-weight:bold;color:#1f377f;">knownState</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (knownState == <span style="color:blue;">default</span>)
<span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> ArgumentNullException(nameof(knownState));
<span style="font-weight:bold;color:#8f08c4;">return</span> code == knownState.expectedCode;
}
}</pre>
</p>
<p>
You can do the same for the <code>Renderer</code> class.
</p>
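<p>
Applied to <code>Renderer</code>, the result might look like the following sketch. It assumes that the only changes are the added <code>static</code> modifiers and the removed interface declaration; the method bodies are copied from the class shown earlier.
</p>
<p>
<pre>using System;

public static class Renderer
{
    public static string Success(
        (string expectedCode, bool isMobile, Uri redirect) knownState)
    {
        if (knownState.isMobile)
            return "{\"success\": true, \"redirect\": \"" + knownState.redirect + "\"}";
        else
            return "302 Location: " + knownState.redirect;
    }

    public static string Failure(
        (string expectedCode, bool isMobile, Uri redirect) knownState)
    {
        if (knownState.isMobile)
            return "{\"success\": false, \"redirect\": \"login\"}";
        else
            return "302 Location: login";
    }

    public static string Error(
        (string expectedCode, bool isMobile, Uri redirect) knownState,
        Exception e)
    {
        if (knownState.isMobile)
            return "{\"error\": \"" + e.Message + "\"}";
        else
            return "500";
    }
}</pre>
</p>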
<p>
This doesn't change the overall <em>flow</em> of the <code>Controller</code> class' <code>Complete</code> method, although the implementation details have changed a bit:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">string</span> <span style="font-weight:bold;color:#74531f;">Complete</span>(<span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">state</span>, <span style="color:blue;">string</span> <span style="font-weight:bold;color:#1f377f;">code</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">knownState</span> = _repository.GetState(state);
<span style="font-weight:bold;color:#8f08c4;">try</span>
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (StateValidator.Validate(code, knownState))
<span style="font-weight:bold;color:#8f08c4;">return</span> Renderer.Success(knownState);
<span style="font-weight:bold;color:#8f08c4;">else</span>
<span style="font-weight:bold;color:#8f08c4;">return</span> Renderer.Failure(knownState);
}
<span style="font-weight:bold;color:#8f08c4;">catch</span> (Exception <span style="font-weight:bold;color:#1f377f;">e</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> Renderer.Error(knownState, e);
}
}</pre>
</p>
<p>
<code>StateValidator</code> and <code>Renderer</code> are no longer injected dependencies, but rather 'modules' that afford pure functions.
</p>
<p>
Both the <code>Controller</code> class and the tests that cover it are simpler.
</p>
<h3 id="984cd44e8f6545ddafa673a411e188f1">
Properties <a href="#984cd44e8f6545ddafa673a411e188f1">#</a>
</h3>
<p>
So far I've been able to make all these changes without introducing a property-based testing framework. This was possible because the tests already used AutoFixture, which, while not a property-based testing framework, already strongly encourages you to write tests without literal test data.
</p>
<p>
This makes it easy to make the final change to property-based testing. On the other hand, it's a bit unfortunate from a pedagogical perspective. This means that you didn't get to see how to refactor a 'traditional' unit test to a property. The next article in this series will plug that hole, as well as show a more realistic example.
</p>
<p>
It's now time to pick a property-based testing framework. On .NET you have a few choices. Since this code base is C#, you may consider a framework written in C#. I'm not convinced that this is necessarily better, but it's a worthwhile experiment. Here I've used <a href="https://github.com/AnthonyLloyd/CsCheck">CsCheck</a>.
</p>
<p>
Since the tests already used randomly generated test data, the conversion to CsCheck is relatively straightforward. I'm only going to show one of the tests. You can always find the rest of the code in the Git repository.
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">HappyPath</span>()
{
(<span style="color:blue;">from</span> state <span style="color:blue;">in</span> Gen.String
<span style="color:blue;">from</span> expectedCode <span style="color:blue;">in</span> Gen.String
<span style="color:blue;">from</span> isMobile <span style="color:blue;">in</span> Gen.Bool
<span style="color:blue;">let</span> urls = <span style="color:blue;">new</span>[] { <span style="color:#a31515;">"https://example.com"</span>, <span style="color:#a31515;">"https://example.org"</span> }
<span style="color:blue;">from</span> redirect <span style="color:blue;">in</span> Gen.OneOfConst(urls).Select(<span style="font-weight:bold;color:#1f377f;">s</span> => <span style="color:blue;">new</span> Uri(s))
<span style="color:blue;">select</span> (state, (expectedCode, isMobile, redirect)))
.Sample((<span style="font-weight:bold;color:#1f377f;">state</span>, <span style="font-weight:bold;color:#1f377f;">knownState</span>) =>
{
var (<span style="font-weight:bold;color:#1f377f;">expectedCode</span>, <span style="color:blue;">_</span>, <span style="color:blue;">_</span>) = knownState;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">code</span> = expectedCode;
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">repository</span> = <span style="color:blue;">new</span> RepositoryStub();
repository.Add(state, knownState);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">sut</span> = <span style="color:blue;">new</span> Controller(repository);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">expected</span> = Renderer.Success(knownState);
sut
.Complete(state, code)
.Should().Be(expected);
});
}</pre>
</p>
<p>
Compared to the AutoFixture version of the test, this looks more complicated. Part of it is that CsCheck (as far as I know) doesn't have the same <a href="https://www.nuget.org/packages/AutoFixture.Xunit2/">integration with xUnit.net that AutoFixture has</a>. That might be an issue that someone could address; after all, <a href="https://fscheck.github.io/FsCheck/RunningTests.html">FsCheck has framework integration</a>, to name an example.
</p>
<p>
<a href="/2023/02/13/epistemology-of-interaction-testing">Test data generators are monads</a> so you typically leverage whatever syntactic sugar a language offers to simplify monadic composition. <a href="/2022/03/28/monads">In C# that syntactic sugar is query syntax</a>, which explains that initial <code>from</code> block.
</p>
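<p>
If query syntax is unfamiliar, it may help to see what the compiler turns it into. The sketch below uses plain arrays and LINQ as a stand-in for generators, on the assumption that the translation is the same for any type that supplies suitable <code>Select</code> and <code>SelectMany</code> methods. <code>QuerySyntaxDemo</code> and its members are hypothetical names, and no CsCheck API is involved.
</p>
<p>
<pre>using System.Linq;

public static class QuerySyntaxDemo
{
    private static readonly string[] states = { "a", "b" };
    private static readonly bool[] flags = { true, false };

    // Two from clauses followed by a select...
    public static (string, bool)[] WithQuerySyntax() =>
        (from state in states
         from isMobile in flags
         select (state, isMobile)).ToArray();

    // ...correspond to a single SelectMany call with a result selector.
    public static (string, bool)[] WithMethodCalls() =>
        states
            .SelectMany(state => flags, (state, isMobile) => (state, isMobile))
            .ToArray();
}</pre>
</p>
<p>
Both methods return the same four pairs in the same order. With more <code>from</code> clauses the method-call form quickly becomes nested and hard to read, which is why query syntax tends to be the more pleasant option for composing generators.
</p>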
<p>
The test does look too top-heavy for my taste. An equivalent problem appears in the next article, where I also try to address it. In general, the better monad support a language offers, the more elegantly you can address this kind of problem. C# isn't really there yet, whereas languages like <a href="https://fsharp.org/">F#</a> and <a href="https://www.haskell.org/">Haskell</a> offer superior alternatives.
</p>
<h3 id="b0e7738d8b0440bba33caa470fd440fb">
Conclusion <a href="#b0e7738d8b0440bba33caa470fd440fb">#</a>
</h3>
<p>
In this article I've tried to demonstrate how property-based testing is a viable alternative to using Stubs and Mocks for verification of composition. You can try to sabotage the <code>Controller.Complete</code> method in the <em>no-mocks</em> branch and see that one or more properties will fail.
</p>
<p>
While the example code base that I've used for this article has the strength of being small and self-contained, it also suffers from a few weaknesses. It's perhaps a bit too abstract to truly resonate. It also uses AutoFixture to generate test data, which already takes it halfway towards property-based testing. While that makes the refactoring easier, it also means that it may not fully demonstrate how to refactor an example-based test to a property. I'll try to address these shortcomings in the next article.
</p>
<p>
<strong>Next:</strong> <a href="/2023/04/17/a-restaurant-example-of-refactoring-from-example-based-to-property-based-testing">A restaurant example of refactoring from example-based to property-based testing</a>.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="0510331b953a4c308c3b01a918d2b65c">
<div class="comment-author"><a href="https://github.com/srogovtsev">Sergei Rogovtcev</a> <a href="#0510331b953a4c308c3b01a918d2b65c">#</a></div>
<div class="comment-content">
<p>First of all, thanks again for continuing to explore this matter. This was very enlightening, but in the end, I was left with a sense of subtle wrongness, which is very hard to pin down, and even harder to tell apart between "this is actually not right for me" and "this is something new I'm not accustomed to".</p>
<p>I suppose that my main question would center around difference between your tests for <code>IStateValidator</code> and <code>IRenderer</code>. Let's start with the latter:</p>
<blockquote><p>Instead of configuring an <code>IRenderer</code> Stub, the test can state the expected output: That the output is equal to the output that <code>renderer.Success</code> would return.</p></blockquote>
<p>Coupled with the explanation ("[the test] has less of an opinion about the implementation, which means that it's marginally less coupled to it"), this makes a lot of sense, with the only caveat that in more production-like cases comparing the output would be harder (e.g., if <code>IRenderer</code> operates on <code>HttpContext</code> to produce a full HTTP response), but that's a technicality that can be sorted out with proper assertion library. But let's now look at the <code>IStateValidator</code> part:</p>
<blockquote><p>as far as executable specifications go, this test doesn't reflect reality. There's only one <code>Validate</code> implementation, and it doesn't behave like that. Rather, it'll return true when code is equal to <code>knownState.expectedCode</code>. The test poorly communicates that behaviour. </p></blockquote>
<p>Here you act with the opposite intent: you want the test to communicate the specification, and thus be explicitly tied to the logic in the implementation (if not the actual code of it). There are two things about that that bother me. First of all, it's somewhat inconsistent, so it makes it harder for me to choose which path to follow when testing the next code I'd write (or articulating to another developer how they should do it). But what's more important - and that comes from my example being <em>minimal</em>, as you've already noted - is that the validation logic might be more complicated, and thus the setup would be complicated as well. And as you've already mentioned on Twitter, when changing the code in the validator implementation, you might be forced to change the implementation in the test, even if the test is more about the controller itself.</p>
<p>There's also another frame for the same issue: the original test read as (at least for me): "if the state is valid, we return successful response based on this state". It didn't matter what is "valid" nor did it matter what is "successful response". The new test reads as "if state in the repository matches passed code, we return successful response for the state". It still doesn't matter what is "successful response", but the definition of validity <em>does</em> matter. For me, this is a change of test meaning, and I'm not sure I understand where that leads me.</p>
<p>Let's consider the following scenario: we need to add another validity criterion, such as "state in repository has an expiration date, and this date should be in the future". We obviously need to add a couple of tests for this (negative and positive). Where do we add them? I'd say we add them into the tests for the validator itself (which are "not shown" for the purposes of brevity), but then it feels weird that we <em>also</em> need to change this "happy path" test...</p>
</div>
<div class="comment-date">2023-04-03 21:24 UTC</div>
</div>
<div class="comment" id="1a4c308c3b01a85c1b953d2051033b69">
<div class="comment-author"><a href="https://github.com/AnthonyLloyd">Anthony Lloyd</a> <a href="#1a4c308c3b01a85c1b953d2051033b69">#</a></div>
<div class="comment-content">
<p>Thanks for showing CsCheck. I've put in a PR to show how I would refactor the CsCheck tests and will attempt to explain some of the design choices of CsCheck.</p>
<p>First of all, it may be a personal opinion but I don't really tend to use query syntax for CsCheck. I prefer to see the SelectManys and there are a number of additional overloads that simplify ranging and composing Gens.</p>
<p>On the design of CsCheck, I built it to not use reflection, attributes or target test frameworks. I've seen the very difficult problems these lead to (for author and user) when you try to move past simple examples.</p>
<p>I wanted the user to be able to move from simple general type generators to ranged complex types easily in a fluent style. No Arbs, attributes, PositiveInt type etc.</p>
<p>CsCheck has automatic shrinking even for the most complex types that just comes out of composing Gens.</p>
<p>I think some of the reason it was so easy to extend the library to areas such as concurrency testing was because of this simplicity (as well as the random shrinking insight).</p>
<pre><code style="background-color: #eee;border: 1px solid #999;display:block;padding:5px;">Gen<Uri> _genRedirect = Gen.OneOfConst(new Uri("https://example.com"), new Uri("https://example.org"));
[Fact]
public void HappyPath()
{
Gen.Select(Gen.String, Gen.String, Gen.Bool, _genRedirect)
.Sample((state, expectedCode, isMobile, redirect) =>
{
var code = expectedCode;
var knownState = (expectedCode, isMobile, redirect);
var repository = new RepositoryStub { { state, knownState } };
var sut = new Controller(repository);
var actual = sut.Complete(state, code);
var expected = Renderer.Success(knownState);
return actual == expected;
});
}</code></pre>
</div>
<div class="comment-date">2023-04-04 23:56 UTC</div>
</div>
<div class="comment" id="076f05eef3f649eab07392dc2ac83023">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#076f05eef3f649eab07392dc2ac83023">#</a></div>
<div class="comment-content">
<p>
Sergei, thank you for writing. I'm afraid all of this is context-dependent, and I seem to constantly fail to give enough context. It's a fair criticism that I seem to employ inconsistent heuristics when making technical decisions. Part of it is caused by my lack of context. The code base is deliberately stripped of context, which has many other benefits, but gives me little to navigate by. I'm flying blind, so to speak. I've had to (implicitly) imagine some forces acting on the software in order to make technical decisions. Since we haven't explicitly outlined such forces, I've had to make them up as I went. It's quite possible, even, that I've imagined one set of forces in one place, and another set somewhere else. If so, no wonder the decisions are inconsistent.
</p>
<p>
What do I mean by <em>forces?</em> I'm thinking of the surrounding context that you have to take into account when making technical decisions about code: Is the current feature urgent? Is it a bug already in production? Is this a new system with a small code base? Or is it an old system with a large code base? Do you have good automated tests? Do you have a Continuous Delivery pipeline? Are you experiencing problems with code quality? What does the team makeup look like? Do you have mostly seasoned veterans who've worked in that code base for years? Or do you have many newcomers? Is the system public-facing or internal? Is it a system, even, or a library or framework? What sort of organisation owns the software? Is it a product group? Or a cost centre? What are the organisation's goals? How are you incentivized? How are other stakeholders incentivized?
</p>
<p>
As you can imagine, I can keep going, asking questions like these, and they may all be relevant. Clearly, we can't expect a self-contained minimal example to also include all such superstructure, so that's what I (inconsistently) have to imagine, on the fly.
</p>
<p>
I admit that the decisions I describe seem inconsistent, and the explanation may simply be what is already implied above: I may have had a different context in mind when I made one, and a variation in mind when I made the other.
</p>
<p>
That's hardly the whole story, though. I didn't start my answer with the above litany of forces only to make a bad excuse for myself. Rather, what I had in mind was to argue that I use a wider context when making decisions. That context is not just technical, but includes, among many other considerations, the team structure.
</p>
<p>
As an example, I was recently working with some students in a university setting. These are people in their early twenties, with only a few months of academic programming under their belt, as well as, perhaps, a few years of hobby programming. They'd just been introduced to Git and GitHub a few weeks earlier, C# a month before that. I was trying to teach them how to use Git and GitHub, how to structure decent C# code, and many other things. During our project, they did send me a few pull requests I would have immediately rejected from a professional programmer. In this particular context, however, that would have been counter-productive. These students were doing a good job, based on their level of experience, and they needed the sense of accomplishment that I would (often, but not always) accept their code.
</p>
<p>
I could have insisted on a higher code quality, and I would also have been able to teach it to anyone patient enough to listen. One thing I've learned all too slowly in my decades of working with other people is that most people aren't as patient with me as I'd like them to be. I need to explicitly consider how to motivate my collaborators.
</p>
<p>
Here's another example: Years ago, I worked with a rag-tag team hastily assembled via word-of-mouth of some fine European freelancers. My challenge here was another. These people were used to being on top of their game - usually the ones brought in to an organisation because they were the best. I needed them to work together, and among other things, it meant showing them that even though they might think that their way was the best way, other ways exist. I wanted them to be able to work together and produce code with shared ownership. At the beginning, I was rather strict with my standards, clearly bruising a few egos, but ultimately several members have told me what a positive transformative experience it was for them. It was a positive transformative experience for me, too.
</p>
<p>
I discuss all of this because you, among various points, mention the need to be able to articulate to other developers how to make technical decisions about tests. The point is that there's a lot of context that goes into making decisions, and hardly a one-size-fits-all heuristic.
</p>
<p>
What <em>usually</em> guides me is an emphasis on <em>coupling</em>, and that's also, I believe, what ultimately motivated me here. There's always going to be <em>some</em> coupling between tests and production code, but the less the better. For example, when considering how to write an assertion, I consider whether a change in production code's behaviour would cause a test to break.
</p>
<p>
Consider, for example, the <code>renderer</code> in the present example. How important is the exact output? What happens if I change a character in the string that is being returned?
</p>
<p>
That's a good example of context being important. If that output is part of an implementation of a network protocol or some other technical spec, just one character change could, indeed, imply that your implementation is off spec. In that case, we do want to test the exact output, and we do want the test to fail if the output changes.
</p>
<p>
On the other hand, if the output is a piece of UI, or perhaps an error message, then the exact wording is likely to change over time. Since this doesn't really imply a change in <em>behaviour</em>, changing such a string output shouldn't break a test.
</p>
<p>
You need that wider context in order to make decisions like that: If we change the System Under Test in this way, will the test break? Should it? What if we change it in another way?
</p>
<p>
This is relevant in order to address your final concern: What if you now decide that the expiration date should be in the future? The way you describe it, it sounds like this strengthens the preconditions of the system - in other words, it <a href="/2021/12/13/backwards-compatibility-as-a-profunctor">breaks backwards compatibility</a>. So yes, making that change may break existing tests, but this could be an indication that it's also going to break existing clients.
</p>
<p>
<em>If</em> you have any clients, that is. Again, you know your context better than I do, so only you can decide whether making such a change is okay. I can think of situations where it is, but I usually find myself in contexts where it isn't, so I tend to err on the side of avoiding breaking changes.
</p>
</div>
<div class="comment-date">2023-04-09 14:28 UTC</div>
</div>
<div class="comment" id="f2fe826964404792954ddaa4155c7be6">
<div class="comment-author"><a href="https://github.com/srogovtsev">Sergei Rogovtcev</a> <a href="#f2fe826964404792954ddaa4155c7be6">#</a></div>
<div class="comment-content">
<p>Mark, thank you for taking the time to discuss this.</p>
<p>Having the magic word "architect" somewhere in my title, I know the importance of context, and in fact that would usually be the first counter-question that I have for somebody coming at me with a question: "what is your context?". So here we are, me being contractually obliged to strip as much context from my code as possible, and you having to reinvent it back from your experience. On the other hand, this allows us to point out which decisions are actually context-driven, and how different contexts affect them.</p>
<p>With that in mind, I can actually propose two different contexts to reframe the decisions above, so that we could arrive at more insights.</p>
<p>The first would be an imaginary context, which I had in mind when writing the code, but haven't thought of communicating: the <code>renderer</code> is as important as the <code>validator</code>. In case of "mobile" state the consumer is actually a mobile application, so we need to know we've produced the right JSON it will consume, and in case of non-"mobile" state the consumer is the browser, which again needs to be properly redirected. In my mind, this is no less important than the validation logic itself, because breaking it will break at least one consumer (mobile), and more likely both of them. Thus, according to the logic above, this is a compatibility issue, and as such we need to explicitly spell this behavior in the tests. Which gives us six outcome branches... six tests? Or something more complicated? This is especially interesting, considering the fact that we can test the <code>renderer</code> in isolation, so we'd be either duplicating our tests... or just discarding the isolated tests for the <code>renderer</code>?</p>
<p>And then here's the actual real context, which I can thankfully divulge to this extent: this is, in fact, a <em>migration</em> problem, when we move from one externally-facing framework (i.e. ASP.NET Web API) to another (i.e. ASP.NET Core). So I am not, in fact, concerned about the validation at all - I'm concerned about the data being properly <em>passed</em> to the validator (because the validator already existed, and worked properly, I'm just calling it from another framework), its result properly handled in the controller (which I am replacing), and then I'm concerned that despite the heavy changes between ASP.NET versions I'm still rendering the output in exactly the same way.</p>
<p>Now that I'm thinking about it, it seems grossly unfair that I've hidden this context beforehand, but then, I didn't see how it was affecting the decisions in test design. So hopefully we can still find some use in this.</p>
</div>
<div class="comment-date">2023-04-10 18:03 UTC</div>
</div>
<div class="comment" id="82b07a802bd54507afb0aa90bfd827c8">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#82b07a802bd54507afb0aa90bfd827c8">#</a></div>
<div class="comment-content">
<p>
Anthony, thank you for writing. I didn't intend to disparage CsCheck. Having once run <a href="https://github.com/AutoFixture/AutoFixture/">a moderately successful open source project</a>, I've learned that it's a good idea to limit the scope to a functional minimum. I'm not complaining about the design decisions made for CsCheck.
</p>
<p>
I was a bit concerned that a casual reader might compare the CsCheck example with the previous code and be put off by the apparent complexity. I admit that your example looks less complex than mine, so to address that particular concern, I could have used code similar to yours.
</p>
<p>
Whether one prefers the syntactic sugar of query syntax or the more explicit use of <code>SelectMany</code> is, to a degree, subjective. There are, on the other hand, some cases where one is objectively better than the other. One day I should write an article about that.
</p>
<p>
I agree that the test-framework integration that exists in FsCheck is less than ideal. I'm not wishing for anything like that. Something declarative, however, would be nice. Contrary to you, I consider wrapper types like <code>PositiveInt</code> to be a boon, but perhaps not like they're implemented in FsCheck. (And this, by the way, isn't to harp on FsCheck either; I only bring up those examples because FsCheck has more of that kind of API than Hedgehog does.) The Haskell <a href="https://hackage.haskell.org/package/QuickCheck">QuickCheck</a> approach is nice: While a wrapper like <a href="https://hackage.haskell.org/package/QuickCheck/docs/Test-QuickCheck.html#t:Positive">Positive</a> is predefined in the package, it's just an <code>Arbitrary</code> instance. There's really nothing special about it, and you can easily define domain-specific wrappers for your own testing purposes. Here's an example: <a href="/2019/09/02/naming-newtypes-for-quickcheck-arbitraries">Naming newtypes for QuickCheck Arbitraries</a>. I'm wondering if something similar wouldn't be possible with interfaces in .NET.
</p>
</div>
<div class="comment-date">2023-04-13 8:46 UTC</div>
</div>
<div class="comment" id="ee233e2d67954197afcaa98d7637fda1">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#ee233e2d67954197afcaa98d7637fda1">#</a></div>
<div class="comment-content">
<p>
Sergei, thank you for writing. I'm sorry if I came across as condescending or implying that you weren't aware that context matters. Of course you are. I was mostly trying to reconstruct my own decision process, which is far less explicit than you might like.
</p>
<p>
Regarding the renderer component, I understand that testing such a thing in reality may be more involved than the toy example we're looking at. My first concern is to avoid duplicating efforts too much. Again, however, the external behaviour should be the primary concern. I'm increasingly <a href="/2021/03/01/pendulum-swing-internal-by-default">shifting towards making more and more things internal</a>, as long as I can still test them via the application boundary. As I understand it, this is the same concern that <a href="https://dannorth.net/introducing-bdd/">made Dan North come up with behaviour-driven development</a>. I'm also concerned primarily with testing the behaviour of systems, just without all the Gherkin overhead.
</p>
<p>
There comes a time, however, where testing everything through an external API becomes too awkward. I'm not averse to introducing or making public classes or functions to enable testing at a different abstraction level. Doing that, however, represents a bet that it's possible to keep the new component's API stable enough that it isn't going to cause too much test churn. Still, if we imagine that we've already made such a decision, and that we now have some renderer component, then it's only natural to test it thoroughly. Then, in order to avoid duplicating assertions, we can state, as I did in this article, that the overall system should expect to see whatever the renderer component returns.
</p>
<p>
That was, perhaps, too wordy an explanation. Perhaps this is more helpful: Don't repeat yourself. What has been asserted in one place shouldn't be asserted in another place.
</p>
<p>
The other example you mention, about migrating to another framework, reminds me of two things.
</p>
<p>
The first is that we shouldn't forget about other ways of verifying code correctness. We've mostly been discussing black-box testing, and while it can be <a href="/2019/10/07/devils-advocate">an interesting exercise to imagine an adversary developer</a>, in general <a href="/2023/03/20/on-trust-in-software-development">that's rarely the reality</a>. Are there other ways to verify that methods are called with the correct values? How about <em>looking</em> at the code? Consistent code reviews are good at detecting bugs.
</p>
<p>
The second observation is that if you already have two working components (validator and renderer) you can treat them as <a href="https://en.wikipedia.org/wiki/Test_oracle">Test Oracles</a>. This still works well with property-based testing. Write tests based on equivalence classes and get a property-based testing framework to sample from those classes. Then use the Test Oracles to define the expected output. That's essentially what I have done in this article.
</p>
<p>
Does it <em>prove</em> that your framework-based code calls the components with the correct arguments? No, not like if you'd used a Test Spy. Property-based testing produces knowledge reminiscent of the kind of knowledge produced by experimental physics; not the kind of axiomatic knowledge produced by mathematics. That's why I named this article series <em>Epistemology of interaction testing.</em>
</p>
<p>
Is it wrong to test with Stubs or Spies in a case like this? Not necessarily. Ultimately, what I try to do with this blog is to investigate and present alternatives. Only once we have alternatives do we have choices.
</p>
</div>
<div class="comment-date">2023-04-15 15:53 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
More functional pits of success
https://blog.ploeh.dk/2023/03/27/more-functional-pits-of-success
2023-03-27T05:36:00+00:00
Mark Seemann
<div id="post">
<p>
<em>FAQ: What are the other pits of success of functional programming?</em>
</p>
<p>
People who have seen my presentation <a href="https://youtu.be/US8QG9I1XW0">Functional architecture: the pits of success</a> occasionally write to ask: <em>What are the other pits?</em>
</p>
<p>
The talk is about some of the design goals that we often struggle with in object-oriented programming, but which tend to happen automatically in functional programming (FP). In the presentation I cover three pits of success, but I also mention that there are more. In a one-hour conference presentation, I simply didn't have time to discuss more than three.
</p>
<p>
It's a natural question, then, to ask what are the pits of success that I don't cover in the talk?
</p>
<p>
I've been digging through my notes and found the following:
</p>
<ul>
<li>Parallelism</li>
<li>Ports and adapters</li>
<li>Services, Entities, Value Objects</li>
<li>Testability</li>
<li>Composition</li>
<li>Package and component principles</li>
<li>CQS</li>
<li>Encapsulation</li>
</ul>
<p>
Finding a lost list like this, more than six years after I jotted it down, presents a bit of a puzzle to me, too. In this post, I'll see if I can reconstruct some of the points.
</p>
<h3 id="08cc3545b4324d0ab76e8f38d6367324">
Parallelism <a href="#08cc3545b4324d0ab76e8f38d6367324" title="permalink">#</a>
</h3>
<p>
When most things are immutable you don't have to worry about multiple threads updating the same shared resource. Much has already been said and written about this in the context of functional programming, to the degree that for some people, it's the main (or only?) reason to adopt FP.
</p>
<p>
Even so, I had (and still have) a version of the presentation that included this advantage. When I realised that I had to cut some content for time, it was easy to cut this topic in favour of other benefits. After all, this one was already well known in 2016.
</p>
<h3 id="243b258ac0ad4d8fa517bcbc7f2b5e7a">
Ports and adapters <a href="#243b258ac0ad4d8fa517bcbc7f2b5e7a" title="permalink">#</a>
</h3>
<p>
This was one of the three benefits I kept in the talk. I've also covered it on my blog in the article <a href="/2016/03/18/functional-architecture-is-ports-and-adapters">Functional architecture is Ports and Adapters</a>.
</p>
<h3 id="18060ca17da341148f9027a290f06652">
Services, Entities, Value Objects <a href="#18060ca17da341148f9027a290f06652" title="permalink">#</a>
</h3>
<p>
This was the second topic I included in the talk. I don't think that there's an explicit article here on the blog that deals with this particular subject matter, so if you want the details, you'll have to view the recording.
</p>
<p>
In short, though, what I had in mind was that <a href="/ref/ddd">Domain-Driven Design</a> explicitly distinguishes between Services, Entities, and Value Objects, which often seems to pull in the opposite direction of the object-oriented notion of <em>data with behaviour</em>. In FP, on the contrary, it's natural to separate data from behaviour. Since behaviour often implements business logic, and since business logic tends to change at a different rate than data, it's a good idea to keep them apart.
</p>
<h3 id="17d84bcee48a42a989760327b779bf59">
Testability <a href="#17d84bcee48a42a989760327b779bf59" title="permalink">#</a>
</h3>
<p>
The third pit of success I covered in the talk was testability. I've also covered this here on the blog: <a href="/2015/05/07/functional-design-is-intrinsically-testable">Functional design is intrinsically testable</a>. <a href="https://en.wikipedia.org/wiki/Pure_function">Pure functions</a> are trivial to test: Supply some input and verify the output.
</p>
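<p>
To illustrate what 'supply some input and verify the output' amounts to, here's a minimal sketch. The <code>discount</code> function is a hypothetical example invented for this purpose; it doesn't come from the talk or the linked article.
</p>

```haskell
-- Hypothetical pure function: the result depends only on the arguments.
discount :: Int -> Int -> Int
discount percent price = price - (price * percent) `div` 100

-- A test needs no setup, Stubs, or teardown: call the function and
-- compare the return value with the expected value.
main :: IO ()
main = do
  print (discount 10 200 == 180)
  print (discount 0 200 == 200)
```

<p>
Both comparisons evaluate to <code>True</code>, and they will do so every time, since nothing outside the function can influence the result.
</p>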
<h3 id="d2ac6cdd43794ade93ad664d72ed1ea1">
Composition <a href="#d2ac6cdd43794ade93ad664d72ed1ea1" title="permalink">#</a>
</h3>
<p>
Pure functions compose. In the simplest case, use the return value from one function as input for another function. In more complex cases, you may need various combinators in order to be able to 'click' functions together.
</p>
<p>
I don't have a single article about this. Rather, I have scores: <a href="/2017/10/04/from-design-patterns-to-category-theory">From design patterns to category theory</a>.
</p>
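<p>
As a small sketch of the two cases described above, consider ordinary function composition and one combinator from Haskell's base library. All function names here are hypothetical and only serve the illustration.
</p>

```haskell
import Control.Monad ((>=>))
import Text.Read (readMaybe)

-- Simplest case: the output of one function is the input of the next.
increment :: Int -> Int
increment = (+ 1)

double :: Int -> Int
double = (* 2)

incrementThenDouble :: Int -> Int
incrementThenDouble = double . increment

-- These functions return Maybe values, so (.) no longer fits.
parseInt :: String -> Maybe Int
parseInt = readMaybe

reciprocal :: Int -> Maybe Double
reciprocal 0 = Nothing
reciprocal n = Just (1 / fromIntegral n)

-- Kleisli composition (>=>) is a combinator that 'clicks' them together.
parseThenInvert :: String -> Maybe Double
parseThenInvert = parseInt >=> reciprocal

main :: IO ()
main = do
  print (incrementThenDouble 3)  -- 8
  print (parseThenInvert "4")    -- Just 0.25
  print (parseThenInvert "0")    -- Nothing
  print (parseThenInvert "four") -- Nothing
```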
<h3 id="c3137baaeb50413e9caa4ebf2acde8e3">
Package and component principles <a href="#c3137baaeb50413e9caa4ebf2acde8e3" title="permalink">#</a>
</h3>
<p>
When it comes to this one, I admit, I no longer remember what I had in mind. Perhaps I was thinking about <a href="https://fsharpforfunandprofit.com/posts/cycles-and-modularity-in-the-wild/">Scott Wlaschin's studies of cycles and modularity</a>. Perhaps I did, again, have <a href="/2016/03/18/functional-architecture-is-ports-and-adapters">my article about Ports and Adapters</a> in mind, or perhaps it was my later articles on <a href="/2017/02/02/dependency-rejection">dependency rejection</a> that already stirred.
</p>
<h3 id="3554fe1eca684683bb793df1b1d23a87">
CQS <a href="#3554fe1eca684683bb793df1b1d23a87" title="permalink">#</a>
</h3>
<p>
The <a href="https://en.wikipedia.org/wiki/Command%E2%80%93query_separation">Command Query Separation</a> principle states that an operation (i.e. a method) should be either a Command or a Query, but not both. In most programming languages, the onus is on you to maintain that discipline. It's not a design principle that comes easy to most object-oriented programmers.
</p>
<p>
Functional programming, on the other hand, emphasises pure functions, which are all Queries. Commands have side effects, and pure functions can't have side effects. <a href="https://www.haskell.org/">Haskell</a> even makes sure to type-check that pure functions don't perform side effects. If you're interested in a C# explanation of how that works, see <a href="/2020/06/08/the-io-container">The IO Container</a>.
</p>
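<p>
A minimal sketch of how that distinction shows up in Haskell types follows. The names <code>total</code> and <code>addItem</code> are hypothetical, and the mutable reference only serves to give the Command something to change.
</p>

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- A Query: a pure function. Its type contains no IO, so the compiler
-- rejects any attempt to perform a side effect inside it.
total :: [Int] -> Int
total = sum

-- A Command: it changes state and returns no data, as the type IO () states.
addItem :: IORef [Int] -> Int -> IO ()
addItem ref x = modifyIORef ref (x :)

main :: IO ()
main = do
  ref <- newIORef []
  addItem ref 2
  addItem ref 3
  items <- readIORef ref
  print (total items) -- 5
```

<p>
If <code>total</code> tried to call <code>addItem</code>, the code wouldn't compile, because an <code>IO ()</code> action can't be run from inside a function of the type <code>[Int] -&gt; Int</code>.
</p>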
<p>
What impure actions remain after you've implemented most of your code base as pure functions can violate or follow CQS as you see fit. I usually still follow that principle, but since the impure parts of a functional code base tend to be fairly small and isolated to the edges of the application, even if you decide to violate CQS there, it probably makes little difference.
</p>
<h3 id="da267445bdae46bd97249acb0ea12246">
Encapsulation <a href="#da267445bdae46bd97249acb0ea12246" title="permalink">#</a>
</h3>
<p>
Functional programmers don't talk much about <em>encapsulation</em>, but you'll often hear them say that we should <a href="https://blog.janestreet.com/effective-ml-video/">make illegal states unrepresentable</a>. I recently wrote an article that explains that this tends to originate from the same motivation: <a href="/2022/10/24/encapsulation-in-functional-programming">Encapsulation in Functional Programming</a>.
</p>
<p>
In languages like <a href="https://fsharp.org/">F#</a> and Haskell, most type definitions require a single line of code, in contrast to object-oriented programming where types are normally classes, which take up a whole file each. This makes it much easier and more succinct to define <a href="https://www.hillelwayne.com/post/constructive/">constructive data</a> in proper FP languages.
</p>
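<p>
As a hedged illustration of such single-line type definitions, here's a sketch where <code>Contact</code> and <code>firstContact</code> are hypothetical names, while <code>NonEmpty</code> comes from Haskell's base library.
</p>

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE

-- One line defines the whole type. A contact without any contact
-- information can't be constructed.
data Contact = EmailOnly String | PhoneOnly String | EmailAndPhone String String deriving (Eq, Show)

-- NonEmpty guarantees at least one element, so this function is total.
firstContact :: NonEmpty Contact -> Contact
firstContact = NE.head

main :: IO ()
main = print (firstContact (EmailOnly "a@example.com" :| [PhoneOnly "555-0100"]))
```

<p>
The empty list isn't a legal value of <code>NonEmpty Contact</code>, so <code>firstContact</code> needs no error handling for that case.
</p>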
<p>
Furthermore, perhaps most importantly, pure functions are <a href="https://en.wikipedia.org/wiki/Referential_transparency">referentially transparent</a>, and <a href="/2021/07/28/referential-transparency-fits-in-your-head">referential transparency fits in your head</a>.
</p>
<h3 id="f89e19457a344cfc9bd2c38e5d5c5f2b">
Conclusion <a href="#f89e19457a344cfc9bd2c38e5d5c5f2b" title="permalink">#</a>
</h3>
<p>
In a recording of the talk titled <em>Functional architecture: the pits of success</em> I explain that the presentation only discusses three pits of success, but that there are more. Consulting my notes, I found five more that I didn't cover. I've now tried to remedy this lapse.
</p>
<p>
I don't, however, believe that this list is exhaustive. Why should it be?
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
On trust in software development
https://blog.ploeh.dk/2023/03/20/on-trust-in-software-development
2023-03-20T08:55:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Can you trust your colleagues to write good code? Can you trust yourself?</em>
</p>
<p>
I've recently noticed a trend among some agile thought leaders. They talk about <em>trust</em> and <em>gatekeeping</em>. It goes something like this:
</p>
<blockquote>
<p>
Why put up barriers to prevent people from committing code? Don't you trust your colleagues?
</p>
<p>
Gated check-ins, pull requests, reviews are a sign of a dysfunctional organisation.
</p>
</blockquote>
<p>
I'm deliberately paraphrasing. While I could cite multiple examples, I wish to engage with the idea rather than the people who propose it. Thus, I apologise for the seeming use of <a href="https://en.wikipedia.org/wiki/Weasel_word">weasel words</a> in the above paragraph, but my agenda is the opposite of appealing to anonymous authority.
</p>
<p>
If someone asks me: "Don't you trust your colleagues?", my answer is:
</p>
<p>
No, I don't trust my colleagues, as I don't trust myself.
</p>
<h3 id="25501f07623946a1925a83af5b5198a3">
Framing <a href="#25501f07623946a1925a83af5b5198a3">#</a>
</h3>
<p>
I don't trust myself to write defect-free code. I don't trust that I've always correctly understood the requirements. I don't trust that I've written the code in the best possible way. Why should I trust my colleagues to be superhumanly perfect?
</p>
<p>
The <em>trust</em> framing is powerful because few people like to be labeled as <em>mistrusting</em>. When asked <em>"don't you trust your colleagues?"</em> you don't want to answer <em>no</em>. You don't want to come across as suspicious or paranoid. You want to <em>belong</em>.
</p>
<p>
<a href="https://en.wikipedia.org/wiki/Belongingness">The need to belong is fundamental to human nature</a>. When asked if you trust your colleagues, saying "no" implicitly disassociates you from the group.
</p>
<p>
Sometimes the trust framing goes one step further and labels processes such as code reviews or pull requests as <em>gatekeeping</em>. This is still the same framing, but now turns the group dynamics around. Now the question isn't whether <em>you</em> belong, but whether you're excluding others from the group. Most people (me included) want to be nice people, and excluding other people is bullying. Since you don't want to be a bully, you don't want to be a gatekeeper.
</p>
<p>
Framing a discussion about software engineering as one of trust and belonging is powerful and seductive. You're inclined to accept arguments made from that position, and you may not discover the sleight of hand. It's subliminal.
</p>
<p>
Most likely, it's such a fundamental and subconscious part of human psychology that the thought leaders who make the argument don't realise what they are doing. Many of them are professionals that I highly respect; people with more merit, experience, and education than I have. I don't think they're deliberately trying to put one over on you.
</p>
<p>
I do think, on the other hand, that this is an argument to be challenged.
</p>
<h3 id="1a98f4b52b824194b23b5bb836efacb7">
Two kinds of trust <a href="#1a98f4b52b824194b23b5bb836efacb7">#</a>
</h3>
<p>
On the surface, the trust framing seems to be about belonging, or its opposite, exclusion. It implies that if you don't trust your co-workers, you suspect them of malign intent. Organisational dysfunction, it follows, is a Hobbesian state of nature where everyone is out for themselves: Expect your colleague to be a back-stabbing liar out to get you.
</p>
<p>
Indeed, the word <em>trust</em> implies that, too, but that's usually not the reason to introduce guardrails and checks to a software engineering process.
</p>
<p>
Rather, another fundamental human characteristic is <em>fallibility</em>. We make mistakes in all sorts of ways, and we don't make them from malign intent. We make them because we're human.
</p>
<p>
Do we trust our colleagues to make no mistakes? Do we trust that our colleagues have perfect knowledge of requirements, goals, architecture, coding standards, and so on? I don't, just as I don't trust myself to have those qualities.
</p>
<p>
This interpretation of <em>trust</em> is, I believe, better aligned with software engineering. If we institute formal sign-offs, code reviews, and other guardrails, it's not that we suspect co-workers of ill intent. Rather, we're trying to prevent mistakes.
</p>
<h3 id="020b7202b31d421baa9e612ca168f767">
Two wrongs... <a href="#020b7202b31d421baa9e612ca168f767">#</a>
</h3>
<p>
That's not to say that all guardrails are necessary all of the time. The thought leaders I so vaguely refer to will often present alternatives: Pair programming instead of pull requests. Indeed, that can be an efficient and confidence-inducing way to work, in certain contexts. I describe advantages as well as disadvantages in my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>.
</p>
<p>
I've warned about the <em>trust framing</em>, but that doesn't mean that pull requests, code reviews, gated check-ins, or feature branches are always a good idea. Just because one argument is flawed, it doesn't mean that the message is wrong. It could be correct for other reasons.
</p>
<p>
I agree with the likes of <a href="https://martinfowler.com/">Martin Fowler</a> and <a href="https://www.davefarley.net/">Dave Farley</a> that feature branching is a bad idea, and that you should adopt <a href="https://en.wikipedia.org/wiki/Continuous_delivery">Continuous Delivery</a>. <a href="/ref/accelerate">Accelerate</a> strongly suggests that.
</p>
<p>
I also agree that pull requests and formal reviews with sign-offs, <em>as they're usually practised</em>, are at odds with even <a href="https://en.wikipedia.org/wiki/Continuous_integration">Continuous Integration</a>. Again, be aware of common pitfalls in logic. Just because one way to do reviews is counter-productive, it doesn't follow that all reviews are bad.
</p>
<p>
As I have outlined in another article, under the right circumstances, <a href="/2021/06/21/agile-pull-requests">agile pull requests</a> are possible. I've had good results with pull requests like that. Reviewing was everyone's job, and we integrated multiple times a day.
</p>
<p>
Is that way to work always possible? Is it always the best way to work? No, of course not. Context matters. I've worked with another team where it was evident that that process had little chance of working. On the other hand, that team wasn't keen on pair programming either. Then what do you do?
</p>
<h3 id="1c3ff2a877214b76934d3817f24c1318">
Mistakes were made <a href="#1c3ff2a877214b76934d3817f24c1318">#</a>
</h3>
<p>
I rarely have reason to believe that co-workers have malign intent. When we are working together towards a common goal, I trust that they have as much interest in reaching that goal as I have.
</p>
<p>
Does that trust mean that everyone is free to do whatever they want? Of course not. Even with the best of intentions, we make mistakes, there are misunderstandings, or we have incomplete information.
</p>
<p>
This is one among several reasons I practice test-driven development (TDD). Writing a test before implementation code catches many mistakes early in the process. In this context, the point is that I don't trust myself to be perfect.
</p>
<p>
Even with TDD and the best of intentions, there are other reasons to look at other people's work.
</p>
<p>
Last year, I did some freelance programming for a customer, and sometimes I would receive feedback that a function I'd included in a pull request already existed in the code base. I didn't have that knowledge, but the review caught it.
</p>
<p>
Could we have caught that with pair or ensemble programming? Yes, that would work too. There's more than one way to make things work, and they tend to be context-dependent.
</p>
<p>
If everyone on a team has the luxury of being able to work together, then pair or ensemble programming is an efficient way to coordinate work. Little extra process may be required, because everyone is already on the same page.
</p>
<p>
If team members are less fortunate, or have different preferences, they may need to rely on the <a href="/2023/02/20/a-thought-on-workplace-flexibility-and-asynchrony">flexibility offered by asynchrony</a>. This doesn't mean that you can't do Continuous Delivery, even with pull requests and code reviews, but <a href="/2020/03/16/conways-law-latency-versus-throughput">the trade-offs are different</a>.
</p>
<h3 id="8cf2f27540de4461826bfe2b9ef88405">
Conclusion <a href="#8cf2f27540de4461826bfe2b9ef88405">#</a>
</h3>
<p>
There are many good reasons to be critical of code reviews, pull requests, and other processes that seem to slow things down. The lack of trust in co-workers is, however, not one of them.
</p>
<p>
You can easily be swayed by that argument because it touches something deep in our psyche. We want to be trusted, and we want to trust our colleagues. We want to <em>belong</em>.
</p>
<p>
The argument is visceral, but it misrepresents the motivation for process. We don't review code because we believe that all co-workers are really North Korean agents looking to sneak in security breaches if we look away.
</p>
<p>
We look at each other's work because it's human to make mistakes. If we can't all be in the same office at the same time, fast but asynchronous reviews also work.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="f2faef409ab74134938a6ae77404ec43">
<div class="comment-author"><a href="https://about.me/tysonwilliams">Tyson Williams</a> <a href="#f2faef409ab74134938a6ae77404ec43">#</a></div>
<div class="comment-content">
<p>
This reminds me of <a href="https://en.wikipedia.org/wiki/Hanlon%27s_razor">Hanlon's razor</a>.
</p>
<blockquote>
Never attribute to malice that which is adequately explained by stupidity.
</blockquote>
</div>
<div class="comment-date">2023-03-23 23:29 UTC</div>
</div>
<div class="comment" id="c46f7f22b22941f6a24858a40093424c">
<div class="comment-author"><a href="mailto:danielfrostdk@outlook.com">Daniel Frost</a> <a href="#c46f7f22b22941f6a24858a40093424c">#</a></div>
<div class="comment-content">
<p>
Too much technical attribution is still being presented, on the whole, in a field which
is far more of a social and psychological construct. What's even worse is the evidence around organisation and team capabilities is pretty clear towards what makes a good team and organisation.
<br><br>
The technical solutions for reaching trust or belongingness are somewhat a means to an end. They can't stand alone because it only takes two humans to sidetrack them.
<br><br>
Therefore I still absolutely believe that the technical parts of software engineering are by far the less demanding. As technical work most often still is done as individual contributions
based on loose requirements, equally loose leadership and often non-existing enforcement from tooling and scattered human ownership. Even worse subjective perceptions.
That I believe on the other hand has everything to do with trust, gatekeeping, belonging and other human psychology needs.
</p>
</div>
<div class="comment-date">2023-03-27 11:29 UTC</div>
</div>
<div class="comment" id="ebc68487f13d470a92260d91a890c5ad">
<div class="comment-author"><a href="https://www.linkedin.com/in/jakub-kwasniewski/">Jakub Kwaśniewski</a> <a href="#ebc68487f13d470a92260d91a890c5ad">#</a></div>
<div class="comment-content">
<p>
I like to look at it from the other side - I ask my colleagues to review my code, because I trust them.
I trust they can help me make it better and it's in all of our interests to make it better - as soon as the code gets into the main branch, it's not my code anymore - it's the team's code.
</p>
<p>
Also, not tightly connected to the topic of code reviews and pull requests, but still about trust in software development, I had this general thought about trusting yourself in software programming:
<em>Don't trust your past self. Do trust your future self.</em><br>
Don't trust your past self in the meaning: don't hesitate to refactor your code. If you've written the code a while ago, even with the best intentions of making it readable, as you normally have, but now you read it and you don't understand it - don't put too much effort into trying to understand it. If you don't understand it, others won't either. Rewrite it to make it better.<br>
Do trust your future self in the meaning: don't try to be smarter than you are now. If you don't know all the requirements or answers to all the questions, don't guess them when you don't have to. Don't create an abstraction from one use case you have - in future you will have more use cases and you will be able to create a better abstraction - trust me, or actually trust yourself. I guess it's just YAGNI rephrased ;)
</p>
</div>
<div class="comment-date">2023-05-16 21:22 UTC</div>
</div>
<div class="comment" id="b234787e00144b28ad897fc039820ef4">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#b234787e00144b28ad897fc039820ef4">#</a></div>
<div class="comment-content">
<p>
Jakub, thank you for writing. I don't think that I have much to add. More people should use refactoring as a tool for enhancing understanding when they run into incomprehensible code. Regardless of who originally wrote that code.
</p>
</div>
<div class="comment-date">2023-06-13 7:36 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.Confidence from Facade Testshttps://blog.ploeh.dk/2023/03/13/confidence-from-facade-tests2023-03-13T07:15:00+00:00Mark Seemann
<div id="post">
<p>
<em>Recycling an old neologism of mine, I try to illustrate a point about the epistemology of testing function composition.</em>
</p>
<p>
This article continues the introduction of a series on the <a href="/2023/02/13/epistemology-of-interaction-testing">epistemology of interaction testing</a>. In the first article, I attempted to explain how to test the composition of functions. Despite my best efforts, I felt that that article somehow fell short of its potential. Particularly, I felt that I ought to have been able to provide some illustrations.
</p>
<p>
After publishing the first article, I finally found a way to illustrate what I'd been trying to communicate. That's this article. Better late than never.
</p>
<h3 id="0c1034fd05cb4b14a3e6ed985e98d6a3">
Previously, on epistemology of interaction testing <a href="#0c1034fd05cb4b14a3e6ed985e98d6a3">#</a>
</h3>
<p>
A brief summary of the previous article may be in order. The question this article series tries to address is how to unit test composition of functions - particularly <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a>.
</p>
<p>
Consider the illustration from the previous article, repeated here for your convenience:
</p>
<p>
<img src="/content/binary/component-graph.png" alt="Example component graph with four leaves.">
</p>
<p>
When the leaves are pure functions they are <a href="/2015/05/07/functional-design-is-intrinsically-testable">intrinsically testable</a>. That's not the hard part, but how do we test the internal nodes or the root?
</p>
<p>
While most people would reach for <a href="http://xunitpatterns.com/Test%20Stub.html">Stubs</a> and <a href="http://xunitpatterns.com/Test%20Spy.html">Spies</a>, those kinds of <a href="https://martinfowler.com/bliki/TestDouble.html">Test Doubles</a> tend to <a href="/2022/10/17/stubs-and-mocks-break-encapsulation">break encapsulation</a>.
</p>
<p>
What are the alternatives?
</p>
<p>
An alternative I find useful is to test groups of functions composed together. Particularly when they are pure functions, you have no problem with non-deterministic behaviour. On the other hand, this approach seems to run afoul of the problem with combinatorial explosion of integration testing <a href="https://www.infoq.com/presentations/integration-tests-scam/">so eloquently explained by J.B. Rainsberger</a>.
</p>
<p>
What I suggest, however, isn't quite integration testing.
</p>
<h3 id="bec86a649bfb420abbf0a333dd663b57">
Neologism <a href="#bec86a649bfb420abbf0a333dd663b57">#</a>
</h3>
<p>
If it isn't integration testing, then what is it? What do we call it?
</p>
<p>
I'm going to resurrect and recycle an old term of mine: <a href="/2012/06/27/FacadeTest">Facade Tests</a>. Ten years ago I had a more narrow view of a term like 'unit test' than I do today, but the overall idea seems apt in this new context. A Facade Test is a test that exercises a <a href="https://en.wikipedia.org/wiki/Facade_pattern">Facade</a>.
</p>
<p>
These days, I don't find it productive to distinguish narrowly between different kinds of tests. At least not to the degree that I wish to fight over terminology. On the other hand, occasionally it's useful to have a name for a thing, in order to be able to differentiate it from some other thing.
</p>
<p>
The term <em>Facade Tests</em> is my attempt at a <a href="https://martinfowler.com/bliki/Neologism.html">neologism</a>. I hope it helps.
</p>
<h3 id="11713bd3f4bc48d8a9c82b3f955066ad">
Code coverage as a proxy for confidence <a href="#11713bd3f4bc48d8a9c82b3f955066ad">#</a>
</h3>
<p>
The question I'm trying to address is how to test functions that compose other functions - the internal nodes or the root in the above graph. As I tried to explain in the previous article, you need to build confidence that various parts of the composition work. How do you gain confidence in the leaves?
</p>
<p>
One way is to test each leaf individually.
</p>
<p>
<img src="/content/binary/test-and-leaf-no-test.png" alt="A single leaf node and a test module pointing to it.">
</p>
<p>
The first test or two may exercise a tiny slice of the System Under Test (SUT):
</p>
<p>
<img src="/content/binary/test-and-leaf-one-test.png" alt="A single leaf node with a thin slice of space filled in green, driven by a test.">
</p>
<p>
The next few tests may exercise another part of the SUT:
</p>
<p>
<img src="/content/binary/test-and-leaf-two-tests.png" alt="A single leaf node with two thin slices of space filled in green, driven by tests.">
</p>
<p>
Keep adding more tests:
</p>
<p>
<img src="/content/binary/test-and-leaf-several-tests.png" alt="A single leaf node with a triangle of space filled in green, driven by tests.">
</p>
<p>
Stop when you have good confidence that the SUT works as intended:
</p>
<p>
<img src="/content/binary/test-and-leaf-full-coverage.png" alt="A single leaf node fully filled in green, indicating full code coverage by tests.">
</p>
<p>
If you're now thinking of code coverage, I can't blame you. To be clear, I haven't changed my position about code coverage. <a href="/2015/11/16/code-coverage-is-a-useless-target-measure">Code coverage is a useless target measure</a>. On the other hand, there's no harm in having a high degree of code coverage. It still might give you confidence that the SUT works as intended.
</p>
<p>
You may think of the amount of green in the above diagrams as a proxy for confidence. The more green, the more confident you are in the SUT.
</p>
<p>
None of the arguments here hinge on <em>code coverage</em> per se. What matters is confidence.
</p>
<h3 id="1b6d2872c49c4f22aa3f84e541989be7">
Facade testing confidence <a href="#1b6d2872c49c4f22aa3f84e541989be7">#</a>
</h3>
<p>
With all the leaves covered, you can move on to the internal nodes. This is the actual problem that I'm trying to address. We would like to test an internal node, but it has dependencies. Fortunately, the context of this article is that the dependencies are pure functions, so we don't have a problem with non-deterministic behaviour. No need for Test Doubles.
</p>
<p>
It's really simple, then. Just test the internal node until you're confident that it works:
</p>
<p>
<img src="/content/binary/test-and-internal-node-full-coverage.png" alt="An internal node fully filled in blue, indicating full code coverage by tests. Dependencies are depicted as boxes below the internal node, but with only slivers of coverage.">
</p>
<p>
The goal is to build confidence in the internal node, the new SUT. While it has dependencies, covering those with tests is no longer the goal. This is the key difference between Facade Testing and Integration Testing. You're not trying to cover all combinations of code paths in the integrated set of components. You're still just trying to test the new SUT.
</p>
<p>
Whether or not these tests exercise the leaves is irrelevant. The leaves are already covered by other tests. What 'coverage' you get of the leaves is incidental.
</p>
<p>
Once you've built confidence in internal nodes, you can repeat the process with the root node:
</p>
<p>
<img src="/content/binary/test-and-root-full-coverage.png" alt="A root node fully filled in blue, indicating full code coverage by tests. Dependencies are depicted as trees below the root node, but with only slivers of coverage.">
</p>
<p>
The test covers enough of the root node to give you confidence in it. Some of the dependencies are also partially exercised by the tests, but this is still secondary. The way I've drawn the diagram, the left internal node is exercised in such a way that <em>its</em> dependencies (the leaves) are partially exercised. The test apparently also exercises the right internal node, but none of that activity makes it interact with the leaves.
</p>
<p>
These aren't integration tests, so they avoid the problem of combinatorial explosion.
</p>
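<p>
To make the idea a little more concrete, here's a minimal Haskell sketch, under stated assumptions. All the names (<code>normalise</code>, <code>isPalindrome</code>, <code>countPalindromes</code>) are hypothetical and chosen for illustration, and the tests are plain Boolean checks rather than calls to any particular test framework. The two leaves are assumed to already be covered by their own tests, so the Facade Tests only need enough cases to build confidence in the composition:
</p>
<p>
<pre>module Main where

import Control.Monad (unless)
import Data.Char (isAlphaNum, toLower)
import System.Exit (exitFailure)

-- Leaves: small pure functions, assumed to have their own tests elsewhere.
normalise :: String -> String
normalise = map toLower . filter isAlphaNum

isPalindrome :: String -> Bool
isPalindrome s = s == reverse s

-- Internal node: composes the leaves. This is the SUT of the Facade Tests.
countPalindromes :: [String] -> Int
countPalindromes = length . filter (isPalindrome . normalise)

-- Facade Tests: call the internal node directly, with no Test Doubles.
-- They don't try to cover every code path of the leaves.
facadeTests :: [(String, Bool)]
facadeTests =
  [ ("empty input yields zero", countPalindromes [] == 0)
  , ("counts only the palindromes", countPalindromes ["Anna", "Bob", "Carl"] == 2)
  ]

-- Report the names of failing tests and exit with a failure code if there are any.
main :: IO ()
main = do
  let failures = map fst (filter (not . snd) facadeTests)
  mapM_ (putStrLn . ("FAILED: " ++)) failures
  unless (null failures) exitFailure</pre>
</p>
<p>
The tests happen to exercise parts of <code>normalise</code> and <code>isPalindrome</code>, but that 'coverage' is incidental. The only goal is confidence in <code>countPalindromes</code>.
</p>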
<h3 id="8eeade5dc4f0433cbb2506dc50973e48">
Conclusion <a href="#8eeade5dc4f0433cbb2506dc50973e48">#</a>
</h3>
<p>
This article was an attempt to illustrate the prose in the previous article. You can unit test functions that compose other functions by first unit testing the leaf functions and then the compositions. While these tests exercise an 'integration' of components, the purpose is <em>not</em> to test the integration. Thus, they aren't integration tests. They're facade tests.
</p>
<p>
<strong>Next:</strong> <a href="/2023/04/03/an-abstract-example-of-refactoring-from-interaction-based-to-property-based-testing">An abstract example of refactoring from interaction-based to property-based testing</a>.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="3d4420c5b9f9425384a480be778993db">
<div class="comment-author"><a href="https://www.relativisticramblings.com/">Christer van der Meeren</a> <a href="#3d4420c5b9f9425384a480be778993db">#</a></div>
<div class="comment-content">
<p>I really appreciate that you are writing about testing compositions of pure functions. As an F# dev who tries to adhere to the <a href='https://blog.ploeh.dk/2020/03/02/impureim-sandwich/'>impureim sandwich</a> (which, indeed, you <a href='https://blog.ploeh.dk/2022/02/14/a-conditional-sandwich-example/'>helped me with before</a>), this is something I have also been struggling with, and failing to find good answers to.</p>
<p><strong>But following your suggestion, aren’t we testing implementation details?</strong></p>
<p>Using the terminology in this article, I often have a root function that is public, which composes and delegates work to private helper functions. Compared to having all the logic directly in the root function, the code is, unsurprisingly, easier to read and maintain this way. However, all the private helper functions (internal nodes and leaves) as well as the particularities of how the root and the internal nodes compose their “children”, are very much just implementation details of the root function.</p>
<p>I occasionally need to change such code in a way that does not change the public API (at least not significantly enough to cause excessive test maintenance), but which significantly restructures the internal helpers. If I were to test as suggested in this article, I would have many broken tests on my hands. These would be tests of the internal nodes and leaves (which may not exist at all after the refactor, having been replaced with completely different functions) as well as tests of how the root node composes the other functions (which, presumably, would still pass but may not actually test anything useful anymore).</p>
<p>In short, testing in the way suggested here would act as a force to avoid refactoring, which seems counter-productive.</p>
<p>One would also need to use <code>InternalsVisibleTo</code> or similar in order to test those helpers. I’m not very concerned about that on its own (though I’d like to keep the helpers <code>private</code>), but it always smells of testing implementation details, which, as I argue, is what I think we’re doing. (One could alternatively make the helpers public – they’re pure, after all, so presumably no harm done – but that would expose a public API that no-one should actually use, and doesn’t avoid the main problem anyway.)</p>
<p>As a motivating example from my world, consider a system for sending email notifications. The root function accepts a list of notifications that should be sent, together with any auxiliary data (names and other data from all users referenced by the notifications; translated strings; any environment-specific data such as base URLs for links; etc.), and returns the email HTML (or at least a structure that maps trivially to HTML). In doing this, the code has to group notifications in several levels, sort them in various ways, merge similar consecutive notifications in non-trivial ways, hide notifications that the user has not asked to receive (but which must still be passed to the root function since they are needed for other logic), and so on. All in all, I have almost 600 lines of pure code that does this. (In addition, I have 150 lines that fetch everything from the DB and create necessary lookup maps of auxiliary data to pass to the root function. I consider this code “too boring to fail”.)</p>
<p>The pure part of the code was recently significantly revamped. Had I had tests for private/internal helpers, the refactor would likely have been much more painful.</p>
<p>I expect there is no perfect way to make the code both testable and easy to refactor. But I am still eager to hear your thoughts on my concern: <strong>Following your suggestion, aren’t we testing implementation details?</strong></p>
</div>
<div class="comment-date">2023-03-13 14:00 UTC</div>
</div>
<div class="comment" id="af0abf6dad214bfbb452488ca09c8a27">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#af0abf6dad214bfbb452488ca09c8a27">#</a></div>
<div class="comment-content">
<p>
Christer, thank you for writing. The short answer is: <em>yes</em>.
</p>
<p>
Isn't this a separate problem, though? If you're using Stubs and Spies to test interaction, and other tests to verify your implementations, then isn't that a similar problem?
</p>
<p>
I'm going to graze this topic in the future article in this series tentatively titled <em>Refactoring pure function composition without breaking existing tests</em>, but I should probably write another article more specifically about this topic...
</p>
</div>
<div class="comment-date">2023-03-14 9:06 UTC</div>
</div>
<div class="comment" id="ccd772ca4c0946adbd7500d928eda8e0">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#ccd772ca4c0946adbd7500d928eda8e0">#</a></div>
<div class="comment-content">
<p>
Christer (and everyone who may be interested), I've long been wanting to expand on my <a href="#af0abf6dad214bfbb452488ca09c8a27">previous answer</a>, and I finally found the time to write an <a href="/2023/06/19/when-is-an-implementation-detail-an-implementation-detail">article that discusses the implementation-detail question</a>.
</p>
<p>
Apart from that I also want to remind readers that the article <a href="/2023/05/01/refactoring-pure-function-composition-without-breaking-existing-tests">Refactoring pure function composition without breaking existing tests</a> has been available since May 1st, 2023. It shows one example of using the <a href="https://martinfowler.com/bliki/StranglerFigApplication.html">Strangler pattern</a> to refactor pure Facade Tests without breaking them.
</p>
<p>
This doesn't imply that one should irresponsibly make every pure function public. These days, I make things <a href="/2021/03/01/pendulum-swing-internal-by-default">internal by default</a>, but make them public if I think they'd be good <a href="http://wiki.c2.com/?SoftwareSeam">seams</a>. Particularly when following test-driven development, it's possible to <a href="/2021/09/13/unit-testing-private-helper-methods">unit test private helpers</a> via a <code>public</code> API. This does, indeed, have the benefit that you're free to refactor those helpers without impacting test code.
</p>
<p>
The point of this article series isn't that you <em>should</em> make pure functions public and test interactions with property-based testing. The point is that if you <em>already</em> have pure functions and you wish to test how they interact, then property-based testing is a good way to achieve that goal.
</p>
<p>
If, on the other hand, you have a pure function for composing emails, and you can keep all helper functions <code>private</code>, <em>still</em> <a href="/2018/11/12/what-to-test-and-not-to-test">cover it enough to be confident that it works</a>, and do that by only exercising a single <code>public</code> root function (mail-slot testing) then that's preferable. That's what I would aim for as well.
</p>
</div>
<div class="comment-date">2023-06-19 7:04 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.Warnings-as-errors frictionhttps://blog.ploeh.dk/2023/03/06/warnings-as-errors-friction2023-03-06T07:17:00+00:00Mark Seemann
<div id="post">
<p>
<em>TDD friction. Surely that's a bad thing(?)</em>
</p>
<p>
<a href="https://furlough.merecomplexities.com/">Paul Wilson</a> recently wrote on Mastodon:
</p>
<blockquote>
<p>
Software development opinion (warnings as errors)
</p>
<p>
Just seen this via Elixir Radar, <a href="https://curiosum.com/til/warnings-as-errors-elixir-mix-compile">https://curiosum.com/til/warnings-as-errors-elixir-mix-compile</a> on treating warnings as errors, and yeah don't integrate code with warnings. But ....
</p>
<p>
Having worked on projects with this switched on in dev, it's an annoying bit of friction when Test Driving code. Yes, have it switched on in CI, but don't make me fix all the warnings before I can run my failing test.
</p>
<p>
(Using an env variable for the switch is a good compromise here, imo).
</p>
<footer><cite><a href="https://mastodon.social/@paulwilson/109433259807695275">Paul Wilson</a></cite></footer>
</blockquote>
<p>
This made me reflect on similar experiences I've had. I thought perhaps I should write them down.
</p>
<p>
To be clear, <em>this article is not an attack on Paul Wilson</em>. He's right, but since he got me thinking, I only find it honest and respectful to acknowledge that.
</p>
<p>
The remark does, I think, invite more reflection.
</p>
<h3 id="512ad9e57e7b4c3c849cfc5e78c8b977">
Test friction example <a href="#512ad9e57e7b4c3c849cfc5e78c8b977" title="permalink">#</a>
</h3>
<p>
<a href="http://www.exampler.com/">An example would be handy right about now</a>.
</p>
<p>
As I was writing the example code base for <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>, I was following the advice of the book:
</p>
<ul>
<li>Turn on <a href="https://learn.microsoft.com/dotnet/csharp/language-reference/builtin-types/nullable-reference-types">Nullable reference types</a> (only relevant for C#)</li>
<li>Turn on static code analysis or <a href="https://en.wikipedia.org/wiki/Lint_(software)">linters</a></li>
<li>Treat warnings as errors. Yes, also the warnings produced by the two above steps</li>
</ul>
<p>
As Paul Wilson points out, this tends to create friction with test-driven development (TDD). When I started the code base, this was the first TDD test I wrote:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task PostValidReservation()
{
<span style="color:blue;">var</span> response = <span style="color:blue;">await</span> PostReservation(<span style="color:blue;">new</span> {
date = <span style="color:#a31515;">"2023-03-10 19:00"</span>,
email = <span style="color:#a31515;">"katinka@example.com"</span>,
name = <span style="color:#a31515;">"Katinka Ingabogovinanana"</span>,
quantity = 2 });
Assert.True(
response.IsSuccessStatusCode,
<span style="color:#a31515;">$"Actual status code: </span>{response.StatusCode}<span style="color:#a31515;">."</span>);
}</pre>
</p>
<p>
Looks good so far, doesn't it? Are any of the warnings-as-errors settings causing friction? Not directly, but now regard the <code>PostReservation</code> helper method:
</p>
<p>
<pre>[SuppressMessage(
<span style="color:#a31515;">"Usage"</span>,
<span style="color:#a31515;">"CA2234:Pass system uri objects instead of strings"</span>,
Justification = <span style="color:#a31515;">"URL isn't passed as variable, but as literal."</span>)]
<span style="color:blue;">private</span> <span style="color:blue;">async</span> Task<HttpResponseMessage> PostReservation(
<span style="color:blue;">object</span> reservation)
{
<span style="color:blue;">using</span> <span style="color:blue;">var</span> factory = <span style="color:blue;">new</span> WebApplicationFactory<Startup>();
<span style="color:blue;">var</span> client = factory.CreateClient();
<span style="color:blue;">string</span> json = JsonSerializer.Serialize(reservation);
<span style="color:blue;">using</span> <span style="color:blue;">var</span> content = <span style="color:blue;">new</span> StringContent(json);
content.Headers.ContentType.MediaType = <span style="color:#a31515;">"application/json"</span>;
<span style="color:blue;">return</span> <span style="color:blue;">await</span> client.PostAsync(<span style="color:#a31515;">"reservations"</span>, content);
}</pre>
</p>
<p>
Notice the <code>[SuppressMessage]</code> attribute. Without it, the compiler emits this error:
</p>
<blockquote>
error CA2234: Modify 'ReservationsTests.PostReservation(object)' to call 'HttpClient.PostAsync(Uri, HttpContent)' instead of 'HttpClient.PostAsync(string, HttpContent)'.
</blockquote>
<p>
That's an example of friction in TDD. I could have fixed the problem by changing the last line to:
</p>
<p>
<pre><span style="color:blue;">return</span> <span style="color:blue;">await</span> client.PostAsync(<span style="color:blue;">new</span> Uri(<span style="color:#a31515;">"reservations"</span>, UriKind.Relative), content);</pre>
</p>
<p>
This makes the actual code more obscure, which is the reason I didn't like that option. Instead, I chose to add the <code>[SuppressMessage]</code> attribute and write a <code>Justification</code>. It is, perhaps, not much of an explanation, but my position is that, in general, I consider <a href="https://learn.microsoft.com/dotnet/fundamentals/code-analysis/quality-rules/ca2234">CA2234</a> a good and proper rule. It's a specific example of favouring stronger types over <a href="https://blog.codinghorror.com/new-programming-jargon/">stringly typed code</a>. I'm <a href="/2015/01/19/from-primitive-obsession-to-domain-modelling">all for it</a>.
</p>
<p>
If you <a href="/ref/stranger-in-a-strange-land">grok</a> the motivation for the rule (which, evidently, the <a href="https://learn.microsoft.com/dotnet/fundamentals/code-analysis/quality-rules/ca2234">documentation</a> code-example writer didn't) you also know when to safely ignore it. Types are useful because they enable you to encapsulate knowledge and guarantees about data in a way that strings and ints typically don't. Indeed, if you are passing URLs around, pass them around as <a href="https://learn.microsoft.com/dotnet/api/system.uri">Uri</a> objects rather than strings. This prevents simple bugs, such as accidentally swapping the place of two variables because they're both strings.
</p>
<p>
In the above example, however, a URL isn't being passed around as a variable. <em>The value is hard-coded right there in the code.</em> Wrapping it in a <code>Uri</code> object doesn't change that.
</p>
<p>
But I digress...
</p>
<p>
This is an example of friction in TDD. Instead of being able to just plough through, I had to stop and deal with a Code Analysis rule.
</p>
<h3 id="3265182ae4114b9da00d91fe0763f36e">
SUT friction example <a href="#3265182ae4114b9da00d91fe0763f36e" title="permalink">#</a>
</h3>
<p>
But wait! There's more.
</p>
<p>
To pass the test, I had to add this class:
</p>
<p>
<pre> [Route(<span style="color:#a31515;">"[controller]"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ReservationsController</span>
{
<span style="color:gray;">#pragma</span> <span style="color:gray;">warning</span> <span style="color:gray;">disable</span> CA1822 <span style="color:green;">// Mark members as static</span>
<span style="color:blue;">public</span> <span style="color:blue;">void</span> Post() { }
<span style="color:gray;">#pragma</span> <span style="color:gray;">warning</span> <span style="color:gray;">restore</span> CA1822 <span style="color:green;">// Mark members as static</span>
}</pre>
</p>
<p>
I had to suppress <a href="https://learn.microsoft.com/dotnet/fundamentals/code-analysis/quality-rules/ca1822">CA1822</a> as well, because it generated this error:
</p>
<blockquote>
error CA1822: Member Post does not access instance data and can be marked as static (Shared in VisualBasic)
</blockquote>
<p>
Keep in mind that because of my settings, it's an <em>error</em>. The code doesn't compile.
</p>
<p>
You can try to fix it by making the method <code>static</code>, but this then triggers another error:
</p>
<blockquote>
error CA1052: Type 'ReservationsController' is a static holder type but is neither static nor NotInheritable
</blockquote>
<p>
In other words, the class should be static as well:
</p>
<p>
<pre>[Route(<span style="color:#a31515;">"[controller]"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ReservationsController</span>
{
<span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">void</span> Post() { }
}</pre>
</p>
<p>
This compiles. What's not to like? Those Code Analysis rules are there for a reason, aren't they? Yes, but they are general rules that can't predict every corner case. While the code compiles, the test fails.
</p>
<p>
Out of the box, that's just not how that version of ASP.NET works. The MVC model of ASP.NET expects <em>action methods</em> to be instance members.
</p>
<p>
(I'm sure that there's a way to tweak ASP.NET so that it allows static HTTP handlers as well, but I wasn't interested in researching that option. After all, the above code only represents an interim stage during a longer TDD session. Subsequent tests would prompt me to give the <code>Post</code> method some proper behaviour that would make it an instance method anyway.)
</p>
<p>
So I kept the method as an instance method and suppressed the Code Analysis rule.
</p>
<p>
Friction? Demonstrably.
</p>
<h3 id="06c8473479aa404486a88a99fd90cea9">
Opt in <a href="#06c8473479aa404486a88a99fd90cea9" title="permalink">#</a>
</h3>
<p>
Is there a way to avoid the friction? Paul Wilson mentions a couple of options: Using an environment variable, or only turning warnings into errors in your deployment pipeline. A variation on using an environment variable is to only turn on errors for Release builds (for languages where that distinction exists).
</p>
<p>
In general, if you have a useful tool that unfortunately takes a long time to run, making it a scheduled or opt-in tool may be the way to go. A <a href="https://en.wikipedia.org/wiki/Mutation_testing">mutation testing</a> tool like <a href="https://stryker-mutator.io/">Stryker</a> can easily run for hours, so it's not something you want to do for every change you make.
</p>
<p>
Another example is dependency analysis. One of my recent clients had a tool that scanned their code dependencies (NuGet, npm) for versions with known vulnerabilities. This tool would also take its time before delivering a verdict.
</p>
<p>
Making tools opt-in is definitely an option.
</p>
<p>
You may be concerned that this requires discipline that perhaps not all developers have. If a tool is opt-in, will anyone remember to run it?
</p>
<p>
As I also describe in <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a>, you could address that issue with a checklist.
</p>
<p>
Yeah, but <em>do we then need a checklist to remind us to look at the checklist?</em> Right, <a href="https://en.wikipedia.org/wiki/Quis_custodiet_ipsos_custodes%3F">quis custodiet ipsos custodes?</a> Is it going to be <a href="https://en.wikipedia.org/wiki/Turtles_all_the_way_down">turtles all the way down</a>?
</p>
<p>
Well, if no-one in your organisation can be trusted to follow <em>any</em> commonly-agreed-on rules on a regular basis, you're in trouble anyway.
</p>
<h3 id="cad76f35e04f46b9b2703759c03aa3eb">
Good friction? <a href="#cad76f35e04f46b9b2703759c03aa3eb" title="permalink">#</a>
</h3>
<p>
So far, I've spent some time describing the problem. When encountering resistance your natural reaction is to find it disagreeable. You want to accomplish something, and then this rule/technique/tool gets in the way!
</p>
<p>
Despite this, is it possible that this particular kind of friction is beneficial?
</p>
<p>
By (subconsciously, I'm sure) picking a word like 'friction', you've already chosen sides. That word, in general, has a negative connotation. Is it the only word that describes the situation? What if we talked about it instead in terms of safety, assistance, or predictability?
</p>
<p>
Ironically, <em>friction</em> was a main complaint about TDD when it was first introduced.
</p>
<p>
<em>"What do you mean? I have to write a test</em> before <em>I write the implementation? That's going to slow me down!"</em>
</p>
<p>
The TDD and agile movement developed a whole set of standard responses to such objections. <em>Brakes enable you to go faster. If it hurts, do it more often.</em>
</p>
<p>
Try those on for size, only now applied to warnings as errors. Friction is what makes brakes work.
</p>
<h3 id="456884d28f9b484f961f0b142993e042">
Additive mindset <a href="#456884d28f9b484f961f0b142993e042" title="permalink">#</a>
</h3>
<p>
As I age, I'm becoming increasingly aware of a tendency in the software industry. Let's call it the <em>additive mindset</em>.
</p>
<p>
It's a reflex to consider <em>addition</em> a good thing. An API with a wide array of options is better than a narrow API. Software with more features is better than software with few features. More log data provides better insight.
</p>
<p>
More code is better than less code.
</p>
<p>
Obviously, that's not true, but we. keep. behaving as though it is. Just look at the recent hubbub about <a href="https://openai.com/blog/chatgpt/">ChatGPT</a>, or <a href="https://github.com/features/copilot">GitHub Copilot</a>, which I <a href="/2022/12/05/github-copilot-preliminary-experience-report">recently wrote about</a>. Everyone reflexively views them as productivity tools because they can help us produce more code faster.
</p>
<p>
I had a cup of coffee with my wife as I took a break from writing this article, and I told her about it. Her immediate reaction when told about friction was that it's a benefit. She's a doctor, and naturally views procedure, practice, regulation, etcetera as occasionally annoying, but essential to the practice of medicine. Without procedures, patients would die from preventable mistakes and doctors would prescribe morphine to themselves. Checking boxes and signing off on decisions slow you down, and that's half the point. Making you slow down can give you the opportunity to realise that you're about to do something stupid.
</p>
<blockquote>
<p>
Worried that TDD will slow down your programmers? Don't. They probably need slowing down.
</p>
<footer><cite><a href="https://twitter.com/jbrains/status/167297606698008576">J. B. Rainsberger</a></cite></footer>
</blockquote>
<p>
But if TDD is already being touted as a process to make us slow down and think, is it a good idea, then, to slow down TDD with warnings as errors? Are we not interfering with a beneficial and essential process?
</p>
<h3 id="efed5514b66e4da5ab36690d3288ca84">
Alternatives to TDD <a href="#efed5514b66e4da5ab36690d3288ca84" title="permalink">#</a>
</h3>
<p>
I don't have a confident answer to that question. What follows is tentative. I've been doing TDD since 2003 and while I was also an <a href="/2010/12/22/TheTDDApostate">early critic</a>, it's still central to how I write code.
</p>
<p>
When I began doing TDD with all the errors <a href="https://en.wikipedia.org/wiki/Up_to_eleven">dialled to 11</a> I was concerned about the friction, too. While I also believe in linters, the two seem to work at cross purposes. The rule about static members in the above example seems clearly counterproductive. After all, a few commits later I'd written enough code for the <code>Post</code> method that it <em>had</em> to be an instance method after all. The degenerate state was temporary, an artefact of the TDD process, but the rule triggered anyway.
</p>
<p>
What should I think of that?
</p>
<p>
I don't <em>like</em> having to deal with such <a href="https://en.wikipedia.org/wiki/False_positives_and_false_negatives">false positives</a>. The question is whether treating warnings as errors is a net positive or a net negative.
</p>
<p>
It may help to recall why TDD is a useful practice. A major reason is that it provides rapid feedback. There are, however, <a href="/2011/04/29/Feedbackmechanismsandtradeoffs">other ways to produce rapid feedback</a>. Static types, compiler warnings, and static code analysis are other ways.
</p>
<p>
I don't think of these as alternatives to TDD, but rather as complementary. Tests can produce feedback about some implementation details. <a href="https://www.hillelwayne.com/post/constructive/">Constructive data</a> is another option. Compiler warnings and linters enter that mix as well.
</p>
<p>
Here I again speak with some hesitation, but it looks to me as though the TDD practice originated in a dynamically typed tradition (<a href="https://en.wikipedia.org/wiki/Smalltalk">Smalltalk</a>), and even though some Java programmers were early adopters as well, from my perspective it's always looked stronger among the dynamic languages than the compiled languages. The unadulterated TDD tradition still seems to largely ignore the existence of other forms of feedback. Everything must be tested.
</p>
<p>
At the risk of repeating myself, I find TDD invaluable, but I'm happy to receive rapid feedback from heterogeneous sources: Tests, type checkers, compilers, linters, fellow ensemble programmers.
</p>
<p>
This suggests that TDD isn't the only game in town. This may also imply that the friction to TDD caused by treating warnings as errors may not be as costly as first perceived. After all, slowing down something that you rely on 75% of the time isn't quite as bad as slowing down something you rely on 100% of the time.
</p>
<p>
While it's a cost, perhaps it went down...
</p>
<h3 id="235a1aa10437465b945242f8b6a29873">
Simplicity <a href="#235a1aa10437465b945242f8b6a29873" title="permalink">#</a>
</h3>
<p>
As always, circumstances matter. Is it always a good idea to treat warnings as errors?
</p>
<p>
Not really. To be honest, treating warnings as errors is another case of treating a symptom. The reason I recommend it is that I've seen enough code bases where compiler warnings (not errors) have accumulated. In a setting where that happens, treating (new) warnings as errors can help get the situation under control.
</p>
<p>
When I work alone, I don't allow warnings to build up. I rarely tell the compiler to treat warnings as errors in my personal code bases. There's no need. I have zero tolerance for compiler warnings, and I do spot them.
</p>
<p>
If you have a team that never allows compiler warnings to accumulate, is there any reason to treat them as errors? Probably not.
</p>
<p>
This underlines an important point about productivity: A good team without strict process can outperform a poor team with a clearly defined process. Mindset beats tooling. Sometimes.
</p>
<p>
Which mindset is that? Not the additive mindset. Rather, I believe in focusing on simplicity. The alternative to adding things isn't to blindly remove things. You can't add features to a program <em>only</em> by deleting code. Rather, add code, but keep it simple. <a href="/2022/11/21/decouple-to-delete">Decouple to delete</a>.
</p>
<blockquote>
<p>
perfection is attained not when there is nothing more to add, but when there is nothing more to remove.
</p>
<footer><cite>Antoine de Saint Exupéry, <a href="/ref/wind-sand-stars">Wind, Sand And Stars</a></cite></footer>
</blockquote>
<p>
Simple code. Simple tests. Be warned, however, that code simplicity does not imply naive code understandable by everyone. I'll refer you to <a href="https://en.wikipedia.org/wiki/Rich_Hickey">Rich Hickey</a>'s wonderful talk <a href="https://www.infoq.com/presentations/Simple-Made-Easy/">Simple Made Easy</a> and remind you that this was the line of thinking that led to <a href="https://clojure.org/">Clojure</a>.
</p>
<p>
Along the same lines, I tend to consider <a href="https://www.haskell.org/">Haskell</a> to be a vehicle for expressing my thoughts in a <em>simpler</em> way than I can do in <a href="https://fsharp.org/">F#</a>, which again enables simplicity not available in C#. Simpler, not easier.
</p>
<h3 id="eb224aa293984475bf9b105e191b362c">
Conclusion <a href="#eb224aa293984475bf9b105e191b362c" title="permalink">#</a>
</h3>
<p>
Does treating warnings as errors imply TDD friction? It certainly looks that way.
</p>
<p>
Is it worth it, nonetheless? Possibly. It depends on why you need to turn warnings into errors in the first place. In some settings, the benefits of treating warnings as errors may be greater than the cost. If that's the only way you can keep compiler warnings down, then do treat warnings as errors. Such a situation, however, is likely to be a symptom of a more fundamental mindset problem.
</p>
<p>
This almost sounds like a moral judgement, I realise, but that's not my intent. Mindset is shaped by personal preference, but also by organisational and peer pressure, as well as knowledge. If you only know of one way to achieve a goal, you have no choice. Only if you know of more than one way can you choose.
</p>
<p>
Choose the way that leaves the code simpler than the other.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Test Data Generator monad
https://blog.ploeh.dk/2023/02/27/test-data-generator-monad
2023-02-27T07:10:00+00:00
Mark Seemann
<div id="post">
<p>
<em>With examples in C# and F#.</em>
</p>
<p>
This article is an instalment in <a href="/2022/03/28/monads">an article series about monads</a>. In other related series, previous articles described <a href="/2017/09/18/the-test-data-generator-functor">Test Data Generator as a functor</a>, as well as <a href="/2018/11/26/the-test-data-generator-applicative-functor">Test Data Generator as an applicative functor</a>. As is the case with many (but not all) <a href="/2018/03/22/functors">functors</a>, this one also forms a monad.
</p>
<p>
This article expands on the code from the above-mentioned articles about Test Data Generators. Keep in mind that the code is a simplified version of what you'll find in a real property-based testing framework. It lacks shrinking and referentially transparent (pseudo-)random value generation. Probably more things than that, too.
</p>
<h3 id="0855f8f4840c4efc96f102e3913e7357">
SelectMany <a href="#0855f8f4840c4efc96f102e3913e7357">#</a>
</h3>
<p>
A monad must define either a <em>bind</em> or <em>join</em> function. In C#, monadic bind is called <code>SelectMany</code>. For the <code><span style="color:#2b91af;">Generator</span><<span style="color:#2b91af;">T</span>></code> class, you can implement it as an instance method like this:
</p>
<p>
<pre><span style="color:blue;">public</span> Generator<TResult> <span style="font-weight:bold;color:#74531f;">SelectMany</span><<span style="color:#2b91af;">TResult</span>>(Func<T, Generator<TResult>> <span style="font-weight:bold;color:#1f377f;">selector</span>)
{
Func<Random, TResult> <span style="font-weight:bold;color:#1f377f;">newGenerator</span> = <span style="font-weight:bold;color:#1f377f;">r</span> =>
{
Generator<TResult> <span style="font-weight:bold;color:#1f377f;">g</span> = selector(generate(r));
<span style="font-weight:bold;color:#8f08c4;">return</span> g.Generate(r);
};
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> Generator<TResult>(newGenerator);
}</pre>
</p>
<p>
<code>SelectMany</code> enables you to chain generators together. You'll see an example later in the article.
</p>
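<p>
To make the chaining concrete, here's a minimal, self-contained sketch. It assumes that <code>Generator<T></code> is little more than a wrapper around a <code>Func<Random, T></code>, which is a simplification and not necessarily how the class from the earlier articles looks in full. The <code>SelectManyDemo</code> class and its <code>ShortDashes</code> method are hypothetical names invented for this illustration.
</p>
<p>
<pre>using System;

// Assumption: a stripped-down stand-in for the article's Generator<T> class.
public sealed class Generator<T>
{
    private readonly Func<Random, T> generate;

    public Generator(Func<Random, T> generate)
    {
        this.generate = generate;
    }

    public T Generate(Random random)
    {
        return generate(random);
    }

    // Monadic bind: run this generator, feed its value to selector,
    // and run the resulting generator with the same Random object.
    public Generator<TResult> SelectMany<TResult>(Func<T, Generator<TResult>> selector)
    {
        return new Generator<TResult>(r => selector(generate(r)).Generate(r));
    }
}

// Hypothetical example, not part of the original code base.
public static class SelectManyDemo
{
    // First generates a length from 1 to 9, then a string of that many dashes.
    public static Generator<string> ShortDashes()
    {
        var length = new Generator<int>(r => r.Next(1, 10));
        return length.SelectMany(n => new Generator<string>(_ => new string('-', n)));
    }
}</pre>
</p>
<p>
The second generator depends on the value produced by the first. That dependency is what <code>SelectMany</code> adds beyond what a functor's <code>Select</code> can express.
</p>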
<h3 id="36a3fb07ddc9428cb334939251deea36">
Query syntax <a href="#36a3fb07ddc9428cb334939251deea36">#</a>
</h3>
<p>
As the <a href="/2022/03/28/monads">monad article</a> explains, you can enable C# query syntax by adding a special <code>SelectMany</code> overload:
</p>
<p>
<pre><span style="color:blue;">public</span> Generator<TResult> <span style="font-weight:bold;color:#74531f;">SelectMany</span><<span style="color:#2b91af;">U</span>, <span style="color:#2b91af;">TResult</span>>(
Func<T, Generator<U>> <span style="font-weight:bold;color:#1f377f;">k</span>,
Func<T, U, TResult> <span style="font-weight:bold;color:#1f377f;">s</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> SelectMany(<span style="font-weight:bold;color:#1f377f;">x</span> => k(x).Select(<span style="font-weight:bold;color:#1f377f;">y</span> => s(x, y)));
}</pre>
</p>
<p>
The implementation body always looks the same; only the method signature varies from monad to monad. Again, I'll show you an example of using query syntax later in the article.
</p>
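<p>
As a small illustration of what that overload buys you, consider the following sketch. It again assumes a simplified <code>Generator<T></code> that only wraps a <code>Func<Random, T></code>. The <code>QuerySyntaxDemo</code> class and the <code>OrderedPair</code> method are hypothetical names used only here.
</p>
<p>
<pre>using System;

// Assumption: a stripped-down stand-in for the article's Generator<T> class.
public sealed class Generator<T>
{
    private readonly Func<Random, T> generate;

    public Generator(Func<Random, T> generate)
    {
        this.generate = generate;
    }

    public T Generate(Random random)
    {
        return generate(random);
    }

    public Generator<TResult> Select<TResult>(Func<T, TResult> selector)
    {
        return new Generator<TResult>(r => selector(generate(r)));
    }

    public Generator<TResult> SelectMany<TResult>(Func<T, Generator<TResult>> selector)
    {
        return new Generator<TResult>(r => selector(generate(r)).Generate(r));
    }

    // The overload that the C# compiler looks for when it translates
    // a query expression with more than one 'from' clause.
    public Generator<TResult> SelectMany<U, TResult>(
        Func<T, Generator<U>> k,
        Func<T, U, TResult> s)
    {
        return SelectMany(x => k(x).Select(y => s(x, y)));
    }
}

// Hypothetical example, not part of the original code base.
public static class QuerySyntaxDemo
{
    // Draws two digits and returns them as a (smaller, larger) pair.
    public static Generator<(int, int)> OrderedPair()
    {
        var digit = new Generator<int>(r => r.Next(0, 10));
        return from x in digit
               from y in digit
               select (Math.Min(x, y), Math.Max(x, y));
    }
}</pre>
</p>
<p>
The compiler rewrites the two <code>from</code> clauses into a single call to the two-parameter <code>SelectMany</code> overload, so the query expression is only syntactic sugar over the methods defined above.
</p>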
<h3 id="3486b1debaf34921b03fd307d03fadce">
Flatten <a href="#3486b1debaf34921b03fd307d03fadce">#</a>
</h3>
<p>
In <a href="/2022/03/28/monads">the introduction</a> you learned that if you have a <code>Flatten</code> or <code>Join</code> function, you can implement <code>SelectMany</code>, and the other way around. Since we've already defined <code>SelectMany</code> for <code><span style="color:#2b91af;">Generator</span><<span style="color:#2b91af;">T</span>></code>, we can use that to implement <code>Flatten</code>. In this article I use the name <code>Flatten</code> rather than <code>Join</code>. This is an arbitrary choice that doesn't impact behaviour. Perhaps you find it confusing that I'm inconsistent, but I do it in order to demonstrate that the behaviour is the same even if the name is different.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> Generator<T> <span style="color:#74531f;">Flatten</span><<span style="color:#2b91af;">T</span>>(<span style="color:blue;">this</span> Generator<Generator<T>> <span style="font-weight:bold;color:#1f377f;">generator</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> generator.SelectMany(<span style="font-weight:bold;color:#1f377f;">x</span> => x);
}</pre>
</p>
<p>
As you can tell, this function has to be an extension method, since we can't have a class typed <code>Generator<Generator<T>></code>. As usual, when you already have <code>SelectMany</code>, the body of <code>Flatten</code> (or <code>Join</code>) is always the same.
</p>
<h3 id="347da42ffc7a4c8296743de68cf6a622">
Return <a href="#347da42ffc7a4c8296743de68cf6a622">#</a>
</h3>
<p>
Apart from monadic bind, a monad must also define a way to put a normal value into the monad. Conceptually, I call this function <em>return</em> (because that's the name that <a href="https://www.haskell.org/">Haskell</a> uses):
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> Generator<T> <span style="color:#74531f;">Return</span><<span style="color:#2b91af;">T</span>>(T <span style="font-weight:bold;color:#1f377f;">value</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> Generator<T>(<span style="font-weight:bold;color:#1f377f;">_</span> => value);
}</pre>
</p>
<p>
This function ignores the random number generator and always returns <code>value</code>.
</p>
<h3 id="60fdd02680034ebe8ec21cff4fdb47e6">
Left identity <a href="#60fdd02680034ebe8ec21cff4fdb47e6">#</a>
</h3>
<p>
We needed to identify the <em>return</em> function in order to examine <a href="/2022/04/11/monad-laws">the monad laws</a>. Let's see what they look like for the Test Data Generator monad, starting with the left identity law.
</p>
<p>
<pre>[Theory]
[InlineData(17, 0)]
[InlineData(17, 8)]
[InlineData(42, 0)]
[InlineData(42, 1)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">LeftIdentityLaw</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">x</span>, <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">seed</span>)
{
Func<<span style="color:blue;">int</span>, Generator<<span style="color:blue;">string</span>>> <span style="font-weight:bold;color:#1f377f;">h</span> = <span style="font-weight:bold;color:#1f377f;">i</span> => <span style="color:blue;">new</span> Generator<<span style="color:blue;">string</span>>(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Next(i).ToString());
Assert.Equal(
Generator.Return(x).SelectMany(h).Generate(<span style="color:blue;">new</span> Random(seed)),
h(x).Generate(<span style="color:blue;">new</span> Random(seed)));
}</pre>
</p>
<p>
Notice that the test can't directly compare the two generators, because equality isn't clearly defined for that class. Instead, the test has to call <code>Generate</code> in order to produce comparable values; in this case, strings.
</p>
<p>
Since <code>Generate</code> is non-deterministic, the test has to <code>seed</code> the random number generator argument in order to get reproducible results. It can't even declare one <code>Random</code> object and share it across both method calls, since generating values changes the state of the object. Instead, the test has to generate two separate <code>Random</code> objects, one for each call to <code>Generate</code>, but with the same <code>seed</code>.
</p>
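<p>
If the point about seeding seems abstract, this small sketch may help. It uses only <code>System.Random</code>; the <code>SeedDemo</code> class and its method names are hypothetical and exist only for this illustration.
</p>
<p>
<pre>using System;

// Hypothetical helper class for illustration only.
public static class SeedDemo
{
    // Two separate Random objects created with the same seed
    // produce the same sequence, so their first draws are equal.
    public static bool SameSeedSameValue(int seed)
    {
        int first = new Random(seed).Next(100);
        int second = new Random(seed).Next(100);
        return first == second;
    }

    // A shared Random object changes state with every call, so the
    // second draw is the next number in the sequence rather than a
    // replay of the first draw.
    public static (int, int) TwoDrawsFromSharedInstance(int seed)
    {
        var shared = new Random(seed);
        int first = shared.Next(100);
        int second = shared.Next(100);
        return (first, second);
    }
}</pre>
</p>
<p>
This is why each side of the assertion gets its own <code>new Random(seed)</code>: both sides then start from the same state, which they wouldn't if they shared one object.
</p>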
<h3 id="eed0615f4fce4cd79ecc57a734a358ea">
Right identity <a href="#eed0615f4fce4cd79ecc57a734a358ea">#</a>
</h3>
<p>
In a manner similar to above, we can showcase the right identity law as a test.
</p>
<p>
<pre>[Theory]
[InlineData(<span style="color:#a31515;">'a'</span>, 0)]
[InlineData(<span style="color:#a31515;">'a'</span>, 8)]
[InlineData(<span style="color:#a31515;">'j'</span>, 0)]
[InlineData(<span style="color:#a31515;">'j'</span>, 5)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">RightIdentityLaw</span>(<span style="color:blue;">char</span> <span style="font-weight:bold;color:#1f377f;">letter</span>, <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">seed</span>)
{
Func<<span style="color:blue;">char</span>, Generator<<span style="color:blue;">string</span>>> <span style="font-weight:bold;color:#1f377f;">f</span> = <span style="font-weight:bold;color:#1f377f;">c</span> => <span style="color:blue;">new</span> Generator<<span style="color:blue;">string</span>>(<span style="font-weight:bold;color:#1f377f;">r</span> => <span style="color:blue;">new</span> <span style="color:blue;">string</span>(c, r.Next(100)));
Generator<<span style="color:blue;">string</span>> <span style="font-weight:bold;color:#1f377f;">m</span> = f(letter);
Assert.Equal(
m.SelectMany(Generator.Return).Generate(<span style="color:blue;">new</span> Random(seed)),
m.Generate(<span style="color:blue;">new</span> Random(seed)));
}</pre>
</p>
<p>
As always, even a parametrised test constitutes no <em>proof</em> that the law holds. I show the tests to illustrate what the laws look like in 'real' code.
</p>
<h3 id="bca718938dd84e3f9d650e42094768f9">
Associativity <a href="#bca718938dd84e3f9d650e42094768f9">#</a>
</h3>
<p>
The last monad law is the associativity law that describes how (at least) three functions compose. We're going to need three functions. For the demonstration test I'm going to conjure three nonsense functions. While this may not be as intuitive, it on the other hand reduces the noise that more realistic code tends to produce. Later in the article you'll see a more realistic example.
</p>
<p>
<pre>[Theory]
[InlineData(<span style="color:#a31515;">'t'</span>, 0)]
[InlineData(<span style="color:#a31515;">'t'</span>, 28)]
[InlineData(<span style="color:#a31515;">'u'</span>, 0)]
[InlineData(<span style="color:#a31515;">'u'</span>, 98)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="font-weight:bold;color:#74531f;">AssociativityLaw</span>(<span style="color:blue;">char</span> <span style="font-weight:bold;color:#1f377f;">a</span>, <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">seed</span>)
{
Func<<span style="color:blue;">char</span>, Generator<<span style="color:blue;">string</span>>> <span style="font-weight:bold;color:#1f377f;">f</span> = <span style="font-weight:bold;color:#1f377f;">c</span> => <span style="color:blue;">new</span> Generator<<span style="color:blue;">string</span>>(<span style="font-weight:bold;color:#1f377f;">r</span> => <span style="color:blue;">new</span> <span style="color:blue;">string</span>(c, r.Next(100)));
Func<<span style="color:blue;">string</span>, Generator<<span style="color:blue;">int</span>>> <span style="font-weight:bold;color:#1f377f;">g</span> = <span style="font-weight:bold;color:#1f377f;">s</span> => <span style="color:blue;">new</span> Generator<<span style="color:blue;">int</span>>(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Next(s.Length));
Func<<span style="color:blue;">int</span>, Generator<TimeSpan>> <span style="font-weight:bold;color:#1f377f;">h</span> =
<span style="font-weight:bold;color:#1f377f;">i</span> => <span style="color:blue;">new</span> Generator<TimeSpan>(<span style="font-weight:bold;color:#1f377f;">r</span> => TimeSpan.FromDays(r.Next(i)));
Generator<<span style="color:blue;">string</span>> <span style="font-weight:bold;color:#1f377f;">m</span> = f(a);
Assert.Equal(
m.SelectMany(g).SelectMany(h).Generate(<span style="color:blue;">new</span> Random(seed)),
m.SelectMany(<span style="font-weight:bold;color:#1f377f;">x</span> => g(x).SelectMany(h)).Generate(<span style="color:blue;">new</span> Random(seed)));
}</pre>
</p>
<p>
All tests pass.
</p>
<h3 id="3bcc2327700747008fbc5b00865fffc3">
CPR example <a href="#3bcc2327700747008fbc5b00865fffc3">#</a>
</h3>
<p>
Formalities out of the way, let's look at a more realistic example. In the article about the <a href="/2018/11/26/the-test-data-generator-applicative-functor">Test Data Generator applicative functor</a> you saw an example of parsing a <a href="https://en.wikipedia.org/wiki/Personal_identification_number_(Denmark)">Danish personal identification number</a>, in Danish called <em>CPR-nummer</em> (<em>CPR number</em>) for <em>Central Person Register</em>. (It's not a register of central persons, but rather the central register of persons. Danish works slightly differently than English.)
</p>
<p>
CPR numbers have a simple format: <code>DDMMYY-SSSS</code>, where the first six digits indicate a person's birth date, and the last four digits are a sequence number. An example could be <code>010203-1234</code>, which indicates a woman born February 1, 1903.
</p>
<p>
In C# you might model a CPR number as a class with a constructor like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">CprNumber</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">day</span>, <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">month</span>, <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">year</span>, <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">sequenceNumber</span>)
{
<span style="font-weight:bold;color:#8f08c4;">if</span> (year < 0 || 99 < year)
<span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> ArgumentOutOfRangeException(
nameof(year),
<span style="color:#a31515;">"Year must be between 0 and 99, inclusive."</span>);
<span style="font-weight:bold;color:#8f08c4;">if</span> (month < 1 || 12 < month)
<span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> ArgumentOutOfRangeException(
nameof(month),
<span style="color:#a31515;">"Month must be between 1 and 12, inclusive."</span>);
<span style="font-weight:bold;color:#8f08c4;">if</span> (sequenceNumber < 0 || 9999 < sequenceNumber)
<span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> ArgumentOutOfRangeException(
nameof(sequenceNumber),
<span style="color:#a31515;">"Sequence number must be between 0 and 9999, inclusive."</span>);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">fourDigitYear</span> = CalculateFourDigitYear(year, sequenceNumber);
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">daysInMonth</span> = DateTime.DaysInMonth(fourDigitYear, month);
<span style="font-weight:bold;color:#8f08c4;">if</span> (day < 1 || daysInMonth < day)
<span style="font-weight:bold;color:#8f08c4;">throw</span> <span style="color:blue;">new</span> ArgumentOutOfRangeException(
nameof(day),
<span style="color:#a31515;">$"Day must be between 1 and </span>{daysInMonth}<span style="color:#a31515;">, inclusive."</span>);
<span style="color:blue;">this</span>.day = day;
<span style="color:blue;">this</span>.month = month;
<span style="color:blue;">this</span>.year = year;
<span style="color:blue;">this</span>.sequenceNumber = sequenceNumber;
}</pre>
</p>
<p>
The system has been around since 1968 so clearly suffers from a <a href="https://en.wikipedia.org/wiki/Year_2000_problem">Y2k problem</a>, as years are encoded with only two digits. The workaround for this is that the most significant digit of the sequence number, together with the two-digit year, encodes the century. At the time I'm writing this, <a href="https://da.wikipedia.org/wiki/CPR-nummer">the Danish-language Wikipedia entry for CPR-nummer</a> still includes a table that shows how one can derive the century from the sequence number. This enables the CPR system to handle birth dates between 1858 and 2057.
</p>
<p>
The <code>CprNumber</code> constructor has to consult that table in order to determine the century. It uses the <code>CalculateFourDigitYear</code> function for that. Once it has the four-digit year, it can use the <a href="https://learn.microsoft.com/dotnet/api/system.datetime.daysinmonth">DateTime.DaysInMonth</a> method to determine the number of days in the given month. This is used to validate the day parameter.
</p>
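<p>
To make that table lookup a little more concrete, here's a minimal sketch of what such a function might look like. The ranges reflect my reading of the table in the Danish Wikipedia entry, and the class name <code>CenturySketch</code> is invented for the occasion. The actual <code>CalculateFourDigitYear</code> implementation isn't shown here, so treat this as an assumption rather than the real thing:
</p>
<p>
<pre>// Hypothetical stand-in; not the actual implementation used by CprNumber.
public static class CenturySketch
{
    public static int CalculateFourDigitYear(int year, int sequenceNumber)
    {
        // The leading digit of the sequence number narrows down the century.
        // For leading digits 4 through 9, the two-digit year settles it.
        int century;
        if (sequenceNumber <= 3999)
            century = 1900;
        else if (sequenceNumber <= 4999)
            century = year <= 36 ? 2000 : 1900;
        else if (sequenceNumber <= 8999)
            century = year <= 57 ? 2000 : 1800;
        else
            century = year <= 36 ? 2000 : 1900;

        return century + year;
    }
}</pre>
</p>
<p>
Under those assumptions, <code>CenturySketch.CalculateFourDigitYear(3, 1234)</code> returns <code>1903</code>, which agrees with the <code>010203-1234</code> example above. The extremes are 1858 (year 58 with a sequence number between 5000 and 8999) and 2057 (year 57 in the same range).
</p>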
<p>
The <a href="/2018/11/26/the-test-data-generator-applicative-functor">previous article</a> showed a test that made use of a Test Data Generator for the <code>CprNumber</code> class. The generator was referenced as <code>Gen.CprNumber</code>, but how do you define such a generator?
</p>
<h3 id="02e0bc1c345c4c2b910abd5fa338b30c">
CPR number generator <a href="#02e0bc1c345c4c2b910abd5fa338b30c">#</a>
</h3>
<p>
The constructor arguments for <code>month</code>, <code>year</code>, and <code>sequenceNumber</code> are easy to generate. You need a basic generator that produces values between two boundaries. Both <a href="https://en.wikipedia.org/wiki/QuickCheck">QuickCheck</a> and <a href="https://fscheck.github.io/FsCheck">FsCheck</a> call it <code>choose</code>, so I'll reuse that name:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> Generator<<span style="color:blue;">int</span>> <span style="color:#74531f;">Choose</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">min</span>, <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">max</span>)
{
<span style="font-weight:bold;color:#8f08c4;">return</span> <span style="color:blue;">new</span> Generator<<span style="color:blue;">int</span>>(<span style="font-weight:bold;color:#1f377f;">r</span> => r.Next(min, max + 1));
}</pre>
</p>
<p>
The <code>choose</code> functions of QuickCheck and FsCheck consider both boundaries to be inclusive, so I've done the same. That explains the <code>+ 1</code>, since <a href="https://learn.microsoft.com/dotnet/api/system.random.next">Random.Next</a> excludes the upper boundary.
</p>
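<p>
The off-by-one adjustment is easy to get wrong, so a small sketch may help. It leaves out the <code>Generator</code> wrapper and only exercises <code>Random.Next</code>. The <code>ChooseSketch</code> class and the <code>NextInclusive</code> name are hypothetical helpers, not part of the code base discussed here:
</p>
<p>
<pre>using System;

// Hypothetical helper that mirrors the body of Choose without the Generator wrapper.
public static class ChooseSketch
{
    public static int NextInclusive(Random r, int min, int max)
    {
        // Random.Next(min, max) never returns max, hence the + 1.
        return r.Next(min, max + 1);
    }
}</pre>
</p>
<p>
For instance, <code>r.Next(0, 1)</code> can only ever return <code>0</code>, while <code>ChooseSketch.NextInclusive(r, 0, 1)</code> may return either <code>0</code> or <code>1</code>. One caveat: <code>max + 1</code> overflows if <code>max</code> is <code>int.MaxValue</code>, so a sketch like this isn't suited for that edge case.
</p>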
<p>
You can now combine <code>choose</code> with <code>DateTime.DaysInMonth</code> to generate a valid day:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> Generator<<span style="color:blue;">int</span>> <span style="color:#74531f;">Day</span>(<span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">year</span>, <span style="color:blue;">int</span> <span style="font-weight:bold;color:#1f377f;">month</span>)
{
<span style="color:blue;">var</span> <span style="font-weight:bold;color:#1f377f;">daysInMonth</span> = DateTime.DaysInMonth(year, month);
<span style="font-weight:bold;color:#8f08c4;">return</span> Gen.Choose(1, daysInMonth);
}</pre>
</p>
<p>
Let's pause and consider the implications. The point of this example is to demonstrate why it's practically useful that Test Data Generators are monads. Keep in mind that monads are functors you can flatten. When do you need to flatten a functor? Specifically, when do you need to flatten a Test Data Generator? Right now, as it turns out.
</p>
<p>
The <code>Day</code> method returns a <code>Generator<<span style="color:blue;">int</span>></code>, but where do the <code>year</code> and <code>month</code> arguments come from? They'll typically be produced by another Test Data Generator such as <code>choose</code>. Thus, if you only <em>map</em> (<code>Select</code>) over previous Test Data Generators, you'll produce a <code>Generator<Generator<<span style="color:blue;">int</span>>></code>:
</p>
<p>
<pre>Generator<<span style="color:blue;">int</span>> <span style="font-weight:bold;color:#1f377f;">genYear</span> = Gen.Choose(1970, 2050);
Generator<<span style="color:blue;">int</span>> <span style="font-weight:bold;color:#1f377f;">genMonth</span> = Gen.Choose(1, 12);
Generator<(<span style="color:blue;">int</span>, <span style="color:blue;">int</span>)> <span style="font-weight:bold;color:#1f377f;">genYearAndMonth</span> = genYear.Apply(genMonth);
Generator<Generator<<span style="color:blue;">int</span>>> <span style="font-weight:bold;color:#1f377f;">genDay</span> =
genYearAndMonth.Select(<span style="font-weight:bold;color:#1f377f;">t</span> => Gen.Choose(1, DateTime.DaysInMonth(t.Item1, t.Item2)));</pre>
</p>
<p>
This example uses an <code>Apply</code> overload to combine <code>genYear</code> and <code>genMonth</code>. As long as the two generators are independent of each other, you can use the <a href="/2018/10/01/applicative-functors">applicative functor</a> capability to combine them. When, however, you need to produce a new generator from a value produced by a previous generator, the functor or applicative functor capabilities are insufficient. If you try to use <code>Select</code>, as in the above example, you'll produce a nested generator.
</p>
<p>
Since it's a monad, however, you can <code>Flatten</code> it:
</p>
<p>
<pre>Generator<<span style="color:blue;">int</span>> <span style="font-weight:bold;color:#1f377f;">flattened</span> = genDay.Flatten();</pre>
</p>
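<p>
If you wonder how flattening relates to <code>SelectMany</code>, the following sketch may help. It uses a pared-down stand-in for <code>Generator<T></code>, so the member bodies, the type parameter names, and the <code>GeneratorSketch</code> class are assumptions on my part, and may differ from the definitions given earlier:
</p>
<p>
<pre>using System;

// Minimal stand-in for Generator<T>; details may differ from the real class.
public sealed class Generator<T>
{
    private readonly Func<Random, T> generate;

    public Generator(Func<Random, T> generate)
    {
        this.generate = generate;
    }

    public T Generate(Random random) => generate(random);

    // Functor: transform the generated value.
    public Generator<T1> Select<T1>(Func<T, T1> f) =>
        new Generator<T1>(r => f(generate(r)));

    // Monadic bind: run this generator, then run the generator that f returns.
    public Generator<T1> SelectMany<T1>(Func<T, Generator<T1>> f) =>
        new Generator<T1>(r => f(generate(r)).Generate(r));
}

public static class GeneratorSketch
{
    // Flatten is bind with the identity function.
    public static Generator<T> Flatten<T>(this Generator<Generator<T>> nested) =>
        nested.SelectMany(g => g);
}</pre>
</p>
<p>
With these definitions, <code>source.Select(f).Flatten()</code> and <code>source.SelectMany(f)</code> consume the <code>Random</code> object in the same order, so they should generate the same value when given the same seed.
</p>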
<p>
Or you can use <code>SelectMany</code> (monadic <em>bind</em>) to flatten as you go. The <code>CprNumber</code> generator does that, although it uses query syntax as syntactic sugar to make the code more readable:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> Generator<CprNumber> CprNumber =>
<span style="color:blue;">from</span> sequenceNumber <span style="color:blue;">in</span> Gen.Choose(0, 9999)
<span style="color:blue;">from</span> year <span style="color:blue;">in</span> Gen.Choose(0, 99)
<span style="color:blue;">from</span> month <span style="color:blue;">in</span> Gen.Choose(1, 12)
<span style="color:blue;">let</span> fourDigitYear =
TestDataBuilderFunctor.CprNumber.CalculateFourDigitYear(year, sequenceNumber)
<span style="color:blue;">from</span> day <span style="color:blue;">in</span> Gen.Day(fourDigitYear, month)
<span style="color:blue;">select</span> <span style="color:blue;">new</span> CprNumber(day, month, year, sequenceNumber);</pre>
</p>
<p>
The expression first uses <code>Gen.Choose</code> to produce three independent <code>int</code> values: <code>sequenceNumber</code>, <code>year</code>, and <code>month</code>. It then uses the <code>CalculateFourDigitYear</code> function to look up the proper century based on the two-digit <code>year</code> and the <code>sequenceNumber</code>. With that information it can call <code>Gen.Day</code>, and since the expression uses monadic composition, it's flattening as it goes. Thus <code>day</code> is an <code>int</code> value rather than a generator.
</p>
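<p>
To illustrate what 'flattening as it goes' means, here's a sketch that writes the same kind of composition twice: once with query syntax, and once with explicit <code>SelectMany</code> calls. To keep it self-contained, it generates a <code>DateTime</code> instead of a <code>CprNumber</code>, and it relies on a pared-down stand-in for <code>Generator<T></code>. The names <code>DateGen</code>, <code>FromQuery</code>, and <code>FromBind</code> are hypothetical. Keep in mind that the C# compiler doesn't literally produce the nested lambdas shown in <code>FromBind</code>; it uses the two-argument <code>SelectMany</code> overload and compiler-generated intermediate types, but the result ought to be equivalent:
</p>
<p>
<pre>using System;

// Minimal stand-in for Generator<T>; details may differ from the real class.
public sealed class Generator<T>
{
    private readonly Func<Random, T> generate;

    public Generator(Func<Random, T> generate)
    {
        this.generate = generate;
    }

    public T Generate(Random random) => generate(random);

    public Generator<T1> Select<T1>(Func<T, T1> f) =>
        new Generator<T1>(r => f(generate(r)));

    public Generator<T1> SelectMany<T1>(Func<T, Generator<T1>> f) =>
        new Generator<T1>(r => f(generate(r)).Generate(r));

    // The overload the compiler looks for when a query has several from clauses.
    public Generator<T2> SelectMany<T1, T2>(
        Func<T, Generator<T1>> f,
        Func<T, T1, T2> project) =>
        SelectMany(x => f(x).Select(y => project(x, y)));
}

public static class DateGen
{
    public static Generator<int> Choose(int min, int max) =>
        new Generator<int>(r => r.Next(min, max + 1));

    public static Generator<int> Day(int year, int month) =>
        Choose(1, DateTime.DaysInMonth(year, month));

    // Query syntax, in the style of Gen.CprNumber.
    public static Generator<DateTime> FromQuery =>
        from year in Choose(1970, 2050)
        from month in Choose(1, 12)
        from day in Day(year, month)
        select new DateTime(year, month, day);

    // The same composition with explicit, nested SelectMany calls.
    public static Generator<DateTime> FromBind =>
        Choose(1970, 2050).SelectMany(year =>
        Choose(1, 12).SelectMany(month =>
        Day(year, month).Select(day =>
        new DateTime(year, month, day))));
}</pre>
</p>
<p>
Both versions draw a year, then a month, then a day that depends on the two, so given the same seed they should produce the same date. In both cases <code>day</code> is a plain <code>int</code> inside the innermost lambda, which is what makes the final <code>DateTime</code> constructor call possible.
</p>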
<p>
Finally, the entire expression can compose the four <code>int</code> values into a valid <code>CprNumber</code> object.
</p>
<p>
You can consult the <a href="/2018/11/26/the-test-data-generator-applicative-functor">previous article</a> to see <code>Gen.CprNumber</code> in use.
</p>
<h3 id="f3947d3a3c364fb28fb264aff59f7fe7">
Hedgehog CPR generator <a href="#f3947d3a3c364fb28fb264aff59f7fe7">#</a>
</h3>
<p>
You can reproduce the CPR example in F# using one of several property-based testing frameworks. In this example, I'll continue the example from the <a href="/2018/11/26/the-test-data-generator-applicative-functor">previous article</a> as well as the article <a href="/2018/12/10/danish-cpr-numbers-in-f">Danish CPR numbers in F#</a>. You can see a couple of tests in these articles. They use the <code>cprNumber</code> generator, but never show the code.
</p>
<p>
In all the property-based testing frameworks I've seen, generators are called <code>Gen</code>. This is also the case for Hedgehog. The <code>Gen</code> <a href="https://bartoszmilewski.com/2014/01/14/functors-are-containers">container</a> is a monad, and there's a <code>gen</code> <a href="https://learn.microsoft.com/dotnet/fsharp/language-reference/computation-expressions">computation expression</a> that supplies syntactic sugar.
</p>
<p>
You can translate the above example to a Hedgehog <code>Gen</code> value like this:
</p>
<p>
<pre><span style="color:blue;">let</span> cprNumber =
gen {
<span style="color:blue;">let!</span> sequenceNumber = Range.linear 0 9999 |> Gen.int32
<span style="color:blue;">let!</span> year = Range.linear 0 99 |> Gen.int32
<span style="color:blue;">let!</span> month = Range.linear 1 12 |> Gen.int32
<span style="color:blue;">let</span> fourDigitYear = Cpr.calculateFourDigitYear year sequenceNumber
<span style="color:blue;">let</span> daysInMonth = DateTime.DaysInMonth (fourDigitYear, month)
<span style="color:blue;">let!</span> day = Range.linear 1 daysInMonth |> Gen.int32
<span style="color:blue;">return</span> Cpr.tryCreate day month year sequenceNumber }
|> Gen.some</pre>
</p>
<p>
To keep the example simple, I haven't defined an explicit <code>day</code> generator, but instead just inlined <code>DateTime.DaysInMonth</code>.
</p>
<p>
Consult the articles that I linked above to see the <code>Gen.cprNumber</code> generator in use.
</p>
<h3 id="15d1feac289e426ebd5845fc59d3f33e">
Conclusion <a href="#15d1feac289e426ebd5845fc59d3f33e">#</a>
</h3>
<p>
Test Data Generators form monads. This is useful when you need to generate test data that depend on other generated test data. Monadic <em>bind</em> (<code>SelectMany</code> in C#) can flatten the generator functor as you go. This article showed examples in both C# and F#.
</p>
<p>
The same abstraction also exists in the Haskell QuickCheck library, but I haven't shown any Haskell examples. If you've taken the trouble to learn Haskell (which you should), you already know what a monad is.
</p>
<p>
<strong>Next:</strong> <a href="/2022/07/11/functor-relationships">Functor relationships</a>.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="771023c319fb4f03bce6e3beb79294e6">
<div class="comment-author"><a href="https://github.com/AnthonyLloyd">Anthony Lloyd</a> <a href="#771023c319fb4f03bce6e3beb79294e6">#</a></div>
<div class="comment-content">
<p>
<a href="https://github.com/AnthonyLloyd/CsCheck">CsCheck</a> is a full implementation of something along these lines. It uses the same random sample generation in the shrinking step always reducing a Size measure. It turns out to be a better way of shrinking than the QuickCheck way.
</p>
</div>
<div class="comment-date">2023-02-27 22:51 UTC</div>
</div>
<div class="comment" id="21f1a20a1a5944fbbbdd300c8f0e89b0">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#21f1a20a1a5944fbbbdd300c8f0e89b0">#</a></div>
<div class="comment-content">
<p>
Anthony, thank you for writing. You'll be pleased to learn, I take it, that the next article in the series about the <a href="/2023/02/13/epistemology-of-interaction-testing">epistemology of interaction testing</a> uses CsCheck as the example framework.
</p>
</div>
<div class="comment-date">2023-02-28 7:33 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
A thought on workplace flexibility and asynchrony
https://blog.ploeh.dk/2023/02/20/a-thought-on-workplace-flexibility-and-asynchrony
2023-02-20T07:03:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Is an inclusive workplace one that enables people to work at different hours?</em>
</p>
<p>
In the early <a href="https://en.wikipedia.org/wiki/Aughts">noughties</a> I worked for Microsoft Consulting Services in Denmark. In some sense it was quite the competitive working environment with an unhealthy focus on billable hours, customer satisfaction surveys, and stack ranking. On the other hand, since I was mostly on my own on customer engagements, my managers didn't care when and how I worked. As long as I billed and customers were happy, they were happy.
</p>
<p>
That sometimes allowed me great flexibility.
</p>
<p>
At one time I was on a project for a customer in another part of Denmark, and while Denmark isn't that big, it was still understood that I would do most of my work remotely. The main deliverable was the code base for a software system, and while I might email and otherwise communicate with the customer and a few colleagues during the day, we didn't have any fixed schedules. In other words, I could work whenever I wanted, as long as I got the work done.
</p>
<p>
My daughter was a toddler at the time, and as is the norm in Denmark, already in day nursery. My wife is a doctor and was, at that time, working in hospitals - some of the most inflexible workplaces I can think of. She had to leave early in the morning because the hospitals run on fixed schedules.
</p>
<p>
I'd get up to have breakfast with her. After she left for work, I'd work until my daughter woke up. She typically woke up between 8 and 9, so I'd already be 1-2 hours into my work day. I'd stop working, make her breakfast, and take her to day care. We'd typically arrive between 10 and 11 in good spirits. I'd then bicycle home and work until my wife came home with our daughter. Perhaps I'd get a few more hours of work done in the evening.
</p>
<p>
I worked odd hours, and I loved the flexibility. My customers expected me to deliver iterations of the software and generally stay in touch, but they were perfectly happy with mostly asynchronous communication. Back then, it mostly meant email.
</p>
<p>
During the normal work day, I might be unavailable for hours, taking care of my daughter, exercising, grocery shopping, etc. Yet, I still billed more hours than most of my colleagues, and ultimately received an award for my work.
</p>
<p>
In the decades that followed, I haven't always had such flexibility, but that early experience gave me a strong appreciation for asynchronous work.
</p>
<h3 id="3d5f041e6a4d46d5997c67e7693c3006">
Lockdown work wasn't flexible <a href="#3d5f041e6a4d46d5997c67e7693c3006">#</a>
</h3>
<p>
When COVID-19 hit and most countries went into lockdown, many office workers got their first taste of remote work. Many struggled, for a variety of reasons. Some of those reasons are quite real. If you don't have a well-equipped home office, spending eight hours a day on a kitchen chair is hardly ideal working conditions. And no, the sofa isn't a good long-term solution either.
</p>
<p>
Another problem during lockdown is that your entire family may be home, too. If you have kids, you'll have to attend to them. To be clear, if you've only experienced working from home during COVID-19 lockdown, you may have suffered from many of these problems without realising the benefits of flexibility.
</p>
<p>
To add insult to injury, many workplaces tried to carry on as if nothing had changed, apart from the physical location of people. Office hours were still in effect, and work now took place over video calls. If you spent eight hours on Teams or Zoom, that's not flexible working conditions. Rather, it's the worst of both worlds. The only benefit is that you avoid the commute.
</p>
<h3 id="ac7c2c6f7cf54f0ab882275944262f71">
Remote compared to asynchronous work <a href="#ac7c2c6f7cf54f0ab882275944262f71">#</a>
</h3>
<p>
As outlined above, remote work isn't necessarily flexible. Flexibility comes from asynchronous work processes more than physical location. The flexibility is a result of the freedom to choose <em>when</em> to work, more than <em>where</em> to work.
</p>
<p>
Based on my decades of experience working asynchronously from home, I published <a href="/2020/03/16/conways-law-latency-versus-throughput">an article about the trade-off between latency and throughput</a>, comparing working together in an office with working asynchronously from home. The point is that you can make both work, but the way you organise work matters. In-office work is efficient if everyone is at the office at the same time. Remote work is efficient if people can work asynchronously.
</p>
<p>
As is usually the case, there are trade-offs. The disadvantage of working together is that you must all be present simultaneously. Thus, you don't get the flexibility of choosing when to work. The benefit of working asynchronously is exactly that flexibility, but on the other hand, you lose the advantage of the efficient, high-bandwidth communication that comes from being physically in the same room as others.
</p>
<h3 id="219d3cd5d3c94996b5d3b2993b8a5512">
Inclusion through flexibility? <a href="#219d3cd5d3c94996b5d3b2993b8a5512">#</a>
</h3>
<p>
I was recently listening to an episode of the <a href="https://freakonomics.com/series/freakonomics-radio/">Freakonomics Radio</a> podcast. As a side remark, someone mentioned that for women an important workplace criterion is <em>flexibility</em>. This, clearly, has some implications for this discussion.
</p>
<p>
There's a strong statistical tendency for women to have work-life priorities different from men. For example, <a href="https://ec.europa.eu/eurostat/web/products-eurostat-news/-/EDN-20200306-1">Eurostat reports that women are more likely to work part-time</a>. That may be a signifier that although women want to work, they may want to work less than men. Or perhaps with more flexible hours.
</p>
<p>
If that's true, what does it mean for software development?
</p>
<p>
If you want to include people who value flexibility highly (e.g. some women, but also me) then work should be structured to enable people to engage with it when they have the time. That might include early in the morning, late in the evening, or during the weekend.
</p>
<p>
Two workers who value flexibility may not be on the same schedule. When collaborating, they may have to do so asynchronously. Emails, work item trackers, pull requests.
</p>
<h3 id="554f4dbddb6746fe9ca2224df4714049">
Inclusive collaboration <a href="#554f4dbddb6746fe9ca2224df4714049">#</a>
</h3>
<p>
Most software development takes place in teams. Various team members have different skills, which is good, because a modern software system comprises more components than most people can master. Unless you're one of those rainbow unicorns who master modern front-end development, back-end development, DevOps, database design and administration, graphical design, security concerns, cloud computing platforms, reporting and analytics, etc., you'll need to collaborate with team members.
</p>
<p>
You can do so with short-lived Git branches, <a href="/2021/06/21/agile-pull-requests">agile pull requests</a>, and generally well-written communication. No, pull requests and asynchronous reviews don't have to be slow.
</p>
<p>
Recently, I've noticed an increased tendency among some software development thought leaders to extol the virtues of pair- and ensemble programming. These are great collaboration techniques. I've used them with great success in specific contexts. I also write about their advantages in my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>.
</p>
<p>
Pair- and ensemble programming are synchronous collaboration techniques. There are clear advantages to them, but it's a requirement that team members participate at the same time.
</p>
<p>
I'm sure it's fun and stimulating if you're already mostly extravert, but it doesn't strike me as particularly inclusive, time-wise.
</p>
<p>
If you can't be at the office 9-17 you can't participate. Sorry, we can't use you then.
</p>
<p>
What's that, you say? You can work <em>some</em> hours during the day, evenings, and sometimes weekends? But only twenty-five hours a week? Sorry, that doesn't fit our process.
</p>
<h3 id="cd6fdd9f987041e18e35f699952124b2">
A high-throughput alternative <a href="#cd6fdd9f987041e18e35f699952124b2">#</a>
</h3>
<p>
Pair- and ensemble programming are great collaboration techniques, but I've noticed an increased tendency to contrast them to a particular style of siloed, slow solo work with which I'm honestly not familiar. I do, however, consider that a false dichotomy.
</p>
<p>
The alternative to ensemble programming doesn't <em>have</em> to be slow, waterfall-like, feature-branch-based solo work heavy on misunderstandings, integration problems, and rework. It can be asynchronous, pull-based work. <a href="/2023/01/23/agilean">Lean</a>.
</p>
<p>
I've lived that dream. I know that it <em>can</em> work. Is it easy? No. Does it require discipline? Yes. But it's possible, and it's <em>flexible</em>. It enables people to work when they have the time.
</p>
<h3 id="ebbf4488a29a44569e92c491886c3e83">
Conclusion <a href="#ebbf4488a29a44569e92c491886c3e83">#</a>
</h3>
<p>
There are people who would like to work, just not 9-17. Perhaps they <em>can't</em> (for all sorts of socio-economic reasons), or perhaps that just doesn't align with their life choices. Perhaps they're just <a href="/2015/12/04/the-rules-of-attraction-location">not in your time zone</a>.
</p>
<p>
Do you want to include these people, or exclude them?
</p>
</div>
<hr>
Epistemology of interaction testing
https://blog.ploeh.dk/2023/02/13/epistemology-of-interaction-testing
2023-02-13T06:48:00+00:00
Mark Seemann
<div id="post">
<p>
<em>How do we know that components interact correctly?</em>
</p>
<p>
Most software systems are composed as a graph of components. To be clear, I use the word <em>component</em> loosely to mean a collection of functionality - it may be an object, a module, a function, a data type, or perhaps something else I haven't thought of. Some components deal with the bigger picture and will typically coordinate other components that perform more specific tasks. If we think of a component graph as a tree, then some components are leaves.
</p>
<p>
<img src="/content/binary/component-graph.png" alt="Example component graph with four leaves.">
</p>
<p>
Leaf components, being self-contained and without dependencies, are typically the easiest to test. Most test-driven development (TDD) katas focus on these kinds of components: <a href="https://codingdojo.org/kata/Tennis/">Tennis</a>, <a href="https://codingdojo.org/kata/Bowling/">bowling</a>, <a href="http://claysnow.co.uk/recycling-tests-in-tdd/">diamond</a>, <a href="https://codingdojo.org/kata/RomanNumerals/">Roman numerals</a>, <a href="https://kata-log.rocks/gossiping-bus-drivers-kata">gossiping bus drivers</a>, and so on. Even the <a href="https://www.devjoy.com/blog/legacy-code-katas/">legacy security manager kata</a> is simple and quite self-contained. There's nothing wrong with that, and there's good reason to keep such exercises simple. After all, you want to be able to <a href="/2020/01/13/on-doing-katas">complete a kata</a> in a few hours. You can hardly do that if the exercise is to develop an entire web site with user interface, persistent data storage, security, data validation, business logic, third-party integration, emails, instrumentation and logging, and so on.
</p>
<p>
This means that even if you get good at TDD against 'leaf' functionality, you may be struggling when it comes to higher-level components. How does one unit test code that has dependencies?
</p>
<h3 id="66efa9d72a62458e93ff8490ef7585e9">
Interaction-based testing <a href="#66efa9d72a62458e93ff8490ef7585e9">#</a>
</h3>
<p>
A common solution is to <a href="https://en.wikipedia.org/wiki/Dependency_inversion_principle">invert the dependencies</a>. You can, for example, use <a href="/dippp">Dependency Injection</a> to inject <a href="https://martinfowler.com/bliki/TestDouble.html">Test Doubles</a> into the System Under Test (SUT). This enables you to control the behaviour of the dependencies and to verify that the SUT behaves as expected. Not only that, but you can also verify that the SUT interacts with the dependencies as expected. This is called <em>interaction-based testing</em>. It is, perhaps, the most common form of unit testing in the industry, and is explained in exemplary fashion in <a href="/ref/goos">Growing Object-Oriented Software, Guided by Tests</a>.
</p>
<p>
The kinds of Test Doubles most useful with interaction-based testing are <a href="http://xunitpatterns.com/Test%20Stub.html">Stubs</a> and <a href="http://xunitpatterns.com/Mock%20Object.html">Mocks</a>. They are, however, problematic because <a href="/2022/10/17/stubs-and-mocks-break-encapsulation">they break encapsulation</a>. And encapsulation, to be clear, is <a href="/2022/10/24/encapsulation-in-functional-programming">also a concern in functional programming</a>.
</p>
<p>
I have already described how to move <a href="/2019/02/18/from-interaction-based-to-state-based-testing">from interaction-based to state-based testing</a>, and why <a href="/2015/05/07/functional-design-is-intrinsically-testable">functional programming is intrinsically more testable</a>.
</p>
<h3 id="db90b714ddc24393b4340cdd98a19082">
How to test composition of pure functions? <a href="#db90b714ddc24393b4340cdd98a19082">#</a>
</h3>
<p>
When you adopt functional programming (FP) you'll sooner or later need to compose or orchestrate pure functions. How do you test that the composition of pure functions is correct? With interaction-based testing, that's the kind of thing you'd verify with a Mock or <a href="http://xunitpatterns.com/Test%20Spy.html">Spy</a>.
</p>
<p>
You've developed component <em>A</em>, perhaps as a <a href="https://en.wikipedia.org/wiki/Higher-order_function">higher-order function</a>, that depends on another component <em>B</em>. You want to test that <em>A</em> correctly interacts with <em>B</em>, but if interaction-based testing is no longer 'allowed' (because it breaks encapsulation), then what do you do?
</p>
<p>
For a long time, I pondered that question myself, while I was busy enjoying FP making most things easier. It took me some time to understand that the answer, as is often the case, is <a href="https://en.wikipedia.org/wiki/Mu_(negative)">mu</a>. I'll get back to that later.
</p>
<p>
I'm not the only one struggling with this question. Sergei Rogovtcev writes and asks what I interpret as the same question:
</p>
<blockquote>
<p>
"I do have a component A, which is, frankly, some controller doing some checks and processing around a fairly complex state. This process can have several outcomes, let's call them Success, Fail, and Missing (the actual states are not important, but I'd like to have more than two). Then we have a component B, which is responsible for the rendering of the result. Of course, three different states lead to three different renderings, but the renderings are also influenced by state (let's say we have browser, mobile and native clients, and we need to provide different renderings). Originally the components are objects, B having three separate methods, but I can express them as pure functions, at least for the purpose of this discussion - A, and then BSuccess, BFail and BMissing. I can easily test each part of B in isolation; the problem comes when I need to test A, which calls different parts of B. If I use mocks, the solution is simple - I inject a mock of B to A, and then verify that A calls appropriate parts according to the process result. This requires knowing the innards of A, but otherwise it is a well-known and well-understood approach. But if I want to avoid mocks, what do I do? I cannot test A without relying on some code path in B, and this to me means that I'm losing the benefits of unit testing and entering the realm of integration testing."
</p>
</blockquote>
<p>
In his email Sergei Rogovtcev has explicitly given me permission to quote him and engage with this question. As I've outlined, I've grappled with that question myself, so I find the question worthwhile. I can't, however, work with it without questioning the premise. This is not an attack on Sergei Rogovtcev; after all, I had that question myself, so any critique I make is directed as much at my former self as at him.
</p>
<h3 id="034c99ad20644781af4f28db0f45b2dd">
Axiomatic versus scientific knowledge <a href="#034c99ad20644781af4f28db0f45b2dd">#</a>
</h3>
<p>
It may be helpful to elevate the discussion. How do we know that software (or a subsystem thereof) works? You could say that one answer to that is: <em>Passing tests</em>. If all tests are passing, we may have high confidence that the system works.
</p>
<p>
In the parlance of Sergei Rogovtcev, we can easily unit test component <em>B</em> because it's composed from <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a>.
</p>
<p>
How do we unit test component <em>A</em>, though? With Mocks and Stubs, you can prove that the interaction works as intended. The keyword here is <em>prove</em>. If you assume that component <em>B</em> works correctly, 'all' you have to do is to demonstrate that component <em>A</em> correctly interacts with component <em>B</em>. I used to do that all the time and called it <a href="/2013/10/23/mocks-for-commands-stubs-for-queries">data-flow verification</a> or <a href="/2013/04/04/structural-inspection">structural inspection</a>. The idea was that if you could demonstrate that component <em>A</em> correctly interacts with <em>any</em> <a href="https://en.wikipedia.org/wiki/Liskov_substitution_principle">LSP</a>-compliant implementation of component <em>B</em>, and then also demonstrate that in reality (when composed in the <a href="/2011/07/28/CompositionRoot">Composition Root</a>) component <em>A</em> is composed with a component <em>B</em> that has also been demonstrated to work correctly, then the (sub-)system works correctly.
</p>
<p>
This is almost like a mathematical proof. First prove <em>lemma B</em>, then prove <em>theorem A</em> using <em>lemma B</em>. Finally, state <em>corollary C</em>: <em>b</em> is a special case handled by <em>lemma B</em>, so therefore <em>a</em> is covered by <em>theorem A</em>. <a href="https://en.wikipedia.org/wiki/Q.E.D.">Q.E.D.</a>
</p>
<p>
It's a logical and deductive approach to the problem of verifying the composition of the whole from verified parts. It's almost mathematical in the sense that it tries to erect an <a href="https://en.wikipedia.org/wiki/Axiomatic_system">axiomatic system</a>.
</p>
<p>
It's also fundamentally flawed.
</p>
<p>
I didn't understand that a decade ago, and in practice, the method worked well enough - apart from all the problems stemming from poor encapsulation. The problem with that approach is that an axiomatic system is only as strong as its <a href="https://en.wikipedia.org/wiki/Axiom">axioms</a>. What are the axioms in this system? The axioms, or premises, are that each of the components (<em>A</em> and <em>B</em>) is already correct. Based on these premises, this testing approach then proves that the composition is also correct.
</p>
<p>
How do we know that the components work correctly?
</p>
<p>
In this context, the answer is that they pass all tests. This, however, doesn't constitute any kind of <em>proof</em>. Rather, this is experimental knowledge, more reminiscent of science than of mathematics.
</p>
<p>
Why are we trying to <em>prove</em>, then, that composition works correctly? Why not just <em>test</em> it?
</p>
<p>
This observation cuts to the heart of the epistemology of testing. How do we know that software works? Typically not by <em>proving</em> it correct, but by subjecting it to experiments. As I've also outlined in <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>, we can regard automated tests as scientific experiments that we repeat over and over.
</p>
<h3 id="604436572933477d86d64349357d84ae">
Integration testing <a href="#604436572933477d86d64349357d84ae">#</a>
</h3>
<p>
To outline the argument so far: While you <em>can</em> use Mocks and Spies to verify that a component correctly interacts with another component, this may be overkill. You're essentially trying to prove a conjecture based on doubtful evidence.
</p>
<p>
Does it really matter that two components <em>interact</em> correctly? Aren't the components implementation details? Do users care?
</p>
<p>
Users and other stakeholders care about the <em>behaviour</em> of the software system. Why not test that?
</p>
<p>
This is, unfortunately, easier said than done. Sergei Rogovtcev strongly implies that he isn't keen on integration testing. While he doesn't explicitly state why, there are good reasons to be wary of integration testing. <a href="https://www.infoq.com/presentations/integration-tests-scam/">As J.B. Rainsberger eloquently explained</a>, a major problem with integration testing is the combinatorial explosion of test cases. If you ought to write 53,000 test cases to cover all combinations of pathways through integrated components, which test cases do you write? Surely not all 53,000.
</p>
<p>
J.B. Rainsberger's argument is that if you're going to write no more than a dozen unit tests, you're unlikely to cover enough test cases to be confident that the system works.
</p>
<p>
What if, however, you could write hundreds or thousands of test cases?
</p>
<h3 id="3d43f77a5480418780816b3d6a8b9a0f">
Property-based testing <a href="#3d43f77a5480418780816b3d6a8b9a0f">#</a>
</h3>
<p>
You may recall that the premise of this article is functional programming (FP), where <em>property-based testing</em> is a common testing technique. While you can, to a degree, also use this technique in object-oriented programming (OOP), it's often difficult because of side effects and non-deterministic behaviour.
</p>
<p>
When you write a property-based test, you write a single piece of code that evaluates a <em>property</em> of the SUT. The property looks like a parametrised unit test; the difference is that the input is generated randomly, but in a fashion you can control. This enables you to write hundreds or thousands of test cases without having to write them explicitly.
</p>
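<p>
As a minimal sketch of what such a test can look like, here's a Haskell example using <a href="https://hackage.haskell.org/package/QuickCheck">QuickCheck</a>. The functions <code>histogram</code> and <code>mostFrequent</code> are hypothetical stand-ins for two integrated components; they're assumptions made for illustration, not code from any of the examples discussed in this article.
</p>
<p>
<pre>module Main where

import Data.List (group, maximumBy, sort)
import Data.Ord (comparing)
import Test.QuickCheck (quickCheck, withMaxSuccess)

-- Hypothetical 'component B': count how often each distinct value occurs.
histogram :: [Int] -> [(Int, Int)]
histogram xs = [(x, length g) | g@(x : _) <- group (sort xs)]

-- Hypothetical 'component A': composed with B to find a most frequent value.
mostFrequent :: [Int] -> Maybe Int
mostFrequent xs = case histogram xs of
  [] -> Nothing
  hs -> Just (fst (maximumBy (comparing snd) hs))

-- Property of B alone: the counts always add up to the length of the input.
prop_countsSumToLength :: [Int] -> Bool
prop_countsSumToLength xs = sum (map snd (histogram xs)) == length xs

-- Property of A integrated with B: an empty input gives no result,
-- and any result is an element of the input.
prop_mostFrequentIsElement :: [Int] -> Bool
prop_mostFrequentIsElement xs = case mostFrequent xs of
  Nothing -> null xs
  Just x -> x `elem` xs

main :: IO ()
main = do
  -- Each property is evaluated against 1,000 randomly generated lists.
  quickCheck (withMaxSuccess 1000 prop_countsSumToLength)
  quickCheck (withMaxSuccess 1000 prop_mostFrequentIsElement)</pre>
</p>
<p>
The second property exercises the two functions together without replacing either of them with a Test Double. Two short properties stand in for 2,000 test cases that nobody had to write by hand.
</p>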
<p>
Thus, epistemologically, you can use property-based testing with integrated components to produce confidence that the (sub-)system works. In practice, I find that the confidence I get from this technique is at least as high as the one I used to get from unit testing with Stubs and Spies.
</p>
<h3 id="9edd8d0a18cd4e1abbfaad2d7a3a650a">
Examples <a href="#9edd8d0a18cd4e1abbfaad2d7a3a650a">#</a>
</h3>
<p>
All of this is abstract and theoretical, I realise. <a href="http://www.exampler.com/">An example would be handy right about now</a>. Such examples, however, are complex enough to warrant their own articles:
</p>
<ul>
<li><a href="/2023/03/13/confidence-from-facade-tests">Confidence from Facade Tests</a></li>
<li><a href="/2023/04/03/an-abstract-example-of-refactoring-from-interaction-based-to-property-based-testing">An abstract example of refactoring from interaction-based to property-based testing</a></li>
<li><a href="/2023/04/17/a-restaurant-example-of-refactoring-from-example-based-to-property-based-testing">A restaurant example of refactoring from example-based to property-based testing</a></li>
<li><a href="/2023/05/01/refactoring-pure-function-composition-without-breaking-existing-tests">Refactoring pure function composition without breaking existing tests</a></li>
<li><a href="/2023/06/19/when-is-an-implementation-detail-an-implementation-detail">When is an implementation detail an implementation detail?</a></li>
</ul>
<p>
Sergei Rogovtcev was kind enough to furnish a rather abstract, but <a href="https://en.wikipedia.org/wiki/Minimal_reproducible_example">minimal and self-contained</a>, example. I'll go through that first, and then follow up with a more realistic example.
</p>
<h3 id="2908ce4d26244c90bf20db829888205e">
Conclusion <a href="#2908ce4d26244c90bf20db829888205e">#</a>
</h3>
<p>
How do you know that a software system works correctly? Ultimately, if it behaves in the way it's supposed to, it works correctly. Testing an entire system from the outside, however, is rarely viable in itself. The number of possible test cases is just too large.
</p>
<p>
You can partially address that problem by decomposing the system into components. You can then test the components individually, and verify that they interact correctly. This last part is the topic of this article. A common way to address this problem is to use Mocks and Spies to prove interactions correct. It does solve the problem of correctness quite neatly, but has the undesirable side effect of making the tests brittle.
</p>
<p>
An alternative is to use property-based testing to verify that the components integrate correctly. Rather than something that looks like a proof, this is a question of numbers. Throw enough random test cases at the system, and you'll be confident that it works. How many? <a href="/2018/11/12/what-to-test-and-not-to-test">Enough</a>.
</p>
<p>
<strong>Next:</strong> <a href="/2023/03/13/confidence-from-facade-tests">Confidence from Facade Tests</a>.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="f30e0d110f6c42feb75100e08c78beab">
<div class="comment-author"><a href="https://github.com/srogovtsev">Sergei Rogovtcev</a> <a href="#f30e0d110f6c42feb75100e08c78beab">#</a></div>
<div class="comment-content">
<p>First of all, let me thank you for taking time and effort to discuss this.</p>
<p>There's a minor point about integration testing:</p>
<blockquote><p>[SR] strongly implies that he isn't keen on integration testing. While he doesn't explicitly state why...</p></blockquote>
<p>The situation is somewhat more complicated: in fact, I tend to have at least a few integration tests for a feature I'm involved with, starting the coverage from the happy paths (the minimum requirement being to verify that we've wired correctly as many components as can be verified), and then, if possible, extending to error paths, edge cases and so on. Even the code from my email originally had integration tests covering all the outcomes for a single rendering (browser). The problem that I've faced then, and which prompted my question, was exactly the one that you quote from J.B. Rainsberger: combinatorial explosion. As soon as I decided to cover a second rendering (mobile), I saw that I needed to replicate the setups for outcomes (success/fail/missing), but modify the asserts for their rendering. And then again the same for the native client. Unit tests, even with their ungainly break in encapsulation, gave the simple appeal of writing less code...</p>
<p>Hopefully, this seems to be the very same premise that you explore towards the end of your post, leading to property-based testing - which I have been trying to incorporate into my toolset for quite some time, but was always somewhat baffled at how it should work and integrate into object-oriented (and C#-based) code. So I'm very much looking forward to your next installment in this series.</p>
<p>And again, thank you for exploring these matters.</p>
</div>
<div class="comment-date">2023-02-21 13:52 UTC</div>
</div>
<div class="comment" id="73cb2b8d15bf4e10a9e20a89dbad4374">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#73cb2b8d15bf4e10a9e20a89dbad4374">#</a></div>
<div class="comment-content">
<p>
Sergei, thank you for writing. I hope that this small series of articles will be able to at least give you some ideas. I am, however, concerned that I may miss the mark.
</p>
<p>
When discussing problems like this, there's always a risk that the examples we look at are too simple; that they don't adequately represent the real world. For instance, we may look at the example code in the next few articles and calculate how well we've covered all combinations.
</p>
<p>
Perhaps we may find that the combinatorial 'explosion' is only in the tens of thousands, which is within reasonable reach of well-written properties.
</p>
<p>
Then, when we come back to our 'real' problems, the combinatorial explosion may be orders of magnitude larger. You can easily ask a property-based framework to run a property millions of times, but it'll take time. Perhaps this makes the tests so slow that it's not a practical solution.
</p>
<p>
All that said, I think that not all is lost. Part of the solution, however, may be found elsewhere.
</p>
<p>
The more I learn about functional programming (FP), the more I'm amazed at the alternative mindset it offers. Solutions that look in one way in object-oriented programming (OOP) may look completely different in FP. You've probably noticed this yourself. Often, you have to turn a problem on its head to see it 'the FP way'.
</p>
<p>
The following is something that I've not yet thought through rigorously, so perhaps there are flaws in my thinking. I offer it for peer review here.
</p>
<p>
OOP composition tends to be 'deep'. If we think of object composition as a directed (acyclic, hopefully!) graph, typical OOP composition might resemble a graph where each node has only a few children, but the distance from the root to each leaf is great. Since, every time you compose two objects, you have to multiply the number of pathways, this gives you the combinatorial explosion we've discussed. The deeper the graph, the worse it is.
</p>
<p>
In FP I typically find myself composing functions in a more shallow fashion. Instead of having functions that call other functions that call other functions, etc. I tend to have functions that return values that I then pass to other functions, and so on. This produces a shallower and wider composition graph. Doesn't it also reduce the combinations that we need to consider for testing?
</p>
<p>
I haven't subjected this idea to a more formal analysis yet, so this may be wrong. If I'm right, though, this could mean that property-based testing is still a viable solution to the problem.
</p>
<p>
Identifying useful properties is another problem that you also bring up, particularly in the context of OOP. So far, property-based testing is more prevalent in FP, and perhaps there's a reason for that.
</p>
<p>
It seems to me that there's a connection between property-based testing and encapsulation. Essentially, a property is an executable description of some invariant, or pre- or post-condition. Most real-world object-oriented code I've seen, however, isn't encapsulated. If you have poor encapsulation, it's no wonder that it's hard to identify useful properties.
</p>
<p>
Even so, identifying good properties is a skill that you have to learn. It's fairly easy to construct properties that, in a sense, 'reproduce the implementation'. The challenge is to avoid that, and that's not always easy. As an example, it took me years before I found <a href="/2021/06/28/property-based-testing-is-not-the-same-as-partition-testing">a good way to express properties of FizzBuzz without repeating the implementation</a>.
</p>
</div>
<div class="comment-date">2023-02-22 8:00 UTC</div>
</div>
<div class="comment" id="6a9753ec6d54462c9d500112a55105b6">
<div class="comment-author"><a href="https://github.com/srogovtsev">Sergei Rogovtcev</a> <a href="#6a9753ec6d54462c9d500112a55105b6">#</a></div>
<div class="comment-content">
<blockquote><p>This produces a shallower and wider composition graph. Doesn't it also reduce the combinations that we need to consider for testing?</p></blockquote>
<p>Intuitively I'd say that it shouldn't (reduce), because in the end the number of combinations that we consider for testing is the number of states our SUT can be in, which is defined as a combination of all its inputs. But I may, of course, miss something important here.</p>
<p>My own opinion on this, coming from a short-ish brush with FP, is that FP, or, more precisely, more expressive type systems, reduce the number of combinations by reducing the number of possible inputs by virtue of more expressive types. My favorite example is that even a less expressive type system, one with simple <em>int</em> and <em>string</em> instead of all-encompassing <em>var</em>/<em>object</em>, allows us to get rid of all the tests where we pass "foo" to a function that only works on numbers. Explicit nullability gets rid of all the <em>null</em>-related test-cases (and we get an indication where we lack such cases for <em>null</em>-accepting functions). This can be continued by adding more and more cases until we arrive at the (in)famous "if it compiles, it works".</p>
<p>I don't remember whether I've included this guard case in my original email, but I definitely remember thinking of mentioning that I'm confined to the less expressive type system of C#. Even compared to F# (as I remember it from my side studies), I can see how some tests can be made redundant by, for example, introducing a sum type and then relying on the compiler to check for exhaustive match. Sometimes I wonder what a more expressive type system would do to these problems...</p>
</div>
<div class="comment-date">2023-02-22 14:40 UTC</div>
</div>
<div class="comment" id="9fa9a41fd827458aa0a87d854e5e8228">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#9fa9a41fd827458aa0a87d854e5e8228">#</a></div>
<div class="comment-content">
<p>
Sergei, thank you for writing. A more expressive type system certainly does reduce the amount of testing required. While I prefer <a href="https://fsharp.org/">F#</a>, the good news is that most of what F# can do, C# can do, too. Everything is just more verbose in C#. The main stumbling block that people usually complain about is the lack of <a href="https://en.wikipedia.org/wiki/Tagged_union">sum types</a>, but you can use <a href="/2018/06/25/visitor-as-a-sum-type">Visitors as sum types</a>. You get the same benefits as with F# discriminated unions, except with much more <a href="/2019/12/16/zone-of-ceremony">ceremony</a>.
</p>
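<p>
As a minimal sketch of the compiler doing that work, consider the following Haskell code. The <code>Outcome</code> type and the <code>render</code> function are hypothetical, loosely modelled on the success/fail/missing outcomes mentioned above; they aren't taken from the code in the original email.
</p>
<p>
<pre>{-# OPTIONS_GHC -Werror=incomplete-patterns #-}
module Main where

-- Hypothetical sum type describing the possible outcomes of an operation.
data Outcome = Success String | Failure String | Missing

-- With incomplete patterns turned into a compile error, leaving out one of
-- these cases makes the build fail, so no test is needed to catch a
-- forgotten case.
render :: Outcome -> String
render (Success payload) = "OK: " ++ payload
render (Failure reason) = "Error: " ++ reason
render Missing = "Not found"

main :: IO ()
main = mapM_ (putStrLn . render) [Success "42", Failure "timeout", Missing]</pre>
</p>
<p>
Deleting, say, the <code>Missing</code> case from <code>render</code> turns a potential run-time defect into a compiler error. The tests that remain only have to verify what each case returns, not that each case exists.
</p>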
</div>
<div class="comment-date">2023-02-25 17:50 UTC</div>
</div>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Contravariant functors as invariant functors
https://blog.ploeh.dk/2023/02/06/contravariant-functors-as-invariant-functors
2023-02-06T06:42:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Another most likely useless set of invariant functors that nonetheless exist.</em>
</p>
<p>
This article is part of <a href="/2022/08/01/invariant-functors">a series of articles about invariant functors</a>. An invariant functor is a <a href="/2018/03/22/functors">functor</a> that is neither covariant nor contravariant. See the series introduction for more details.
</p>
<p>
It turns out that all <a href="/2021/09/02/contravariant-functors">contravariant functors</a> are also invariant functors.
</p>
<p>
Is this useful? Let me, like in <a href="/2022/12/26/functors-as-invariant-functors">the previous article</a>, be honest and say that if it is, I'm not aware of it. Thus, if you're interested in practical applications, you can stop reading here. This article contains nothing of practical use - as far as I can tell.
</p>
<h3 id="fda7d35329c74cf9ab9753d0b25d1f08">
Because it's there <a href="#fda7d35329c74cf9ab9753d0b25d1f08" title="permalink">#</a>
</h3>
<p>
Why describe something of no practical use?
</p>
<p>
Why do some people climb <a href="https://en.wikipedia.org/wiki/Mount_Everest">Mount Everest</a>? <em>Because it's there</em>, or for other irrational reasons. Which is fine. I've no personal goals that involve climbing mountains, but <a href="/2020/10/12/subjectivity">I happily engage in other irrational and subjective activities</a>.
</p>
<p>
One of them, apparently, is to write articles about software constructs of no practical use, <em>because it's there</em>.
</p>
<p>
All contravariant functors are also invariant functors, even if that's of no practical use. That's just the way it is. This article explains how, and shows a few (useless) examples.
</p>
<p>
I'll start with a few <a href="https://www.haskell.org/">Haskell</a> examples and then move on to showing the equivalent examples in C#. If you're unfamiliar with Haskell, you can skip that section.
</p>
<h3 id="61d5b82994f941db98cb933e6311d396">
Haskell package <a href="#61d5b82994f941db98cb933e6311d396" title="permalink">#</a>
</h3>
<p>
For Haskell you can find an existing definition and implementations in the <a href="https://hackage.haskell.org/package/invariant">invariant</a> package. It already makes most 'common' contravariant functors <code>Invariant</code> instances, including <code>Predicate</code>, <code>Comparison</code>, and <code>Equivalence</code>. Here's an example of using <code>invmap</code> with a predicate.
</p>
<p>
First, we need a predicate. Consider a function that evaluates whether a number is divisible by three:
</p>
<p>
<pre>isDivisbleBy3 :: Integral a => a -> Bool
isDivisbleBy3 = (0 ==) . (`mod` 3)</pre>
</p>
<p>
While this is already <a href="/2021/09/09/the-specification-contravariant-functor">conceptually a contravariant functor</a>, in order to make it an <code>Invariant</code> instance, we have to enclose it in the <code>Predicate</code> wrapper:
</p>
<p>
<pre>ghci> :t Predicate isDivisbleBy3
Predicate isDivisbleBy3 :: Integral a => Predicate a</pre>
</p>
<p>
This is a predicate of some kind of integer. What if we wanted to know if a given duration represented a number of picoseconds divisible by three? Silly example, I know, but in order to demonstrate invariant mapping, we need types that are isomorphic, and <a href="https://hackage.haskell.org/package/time/docs/Data-Time-Clock.html#t:NominalDiffTime">NominalDiffTime</a> is isomorphic to a number of picoseconds via its <code>Enum</code> instance.
</p>
<p>
<pre>p :: Enum a => Predicate a
p = invmap toEnum fromEnum $ Predicate isDivisbleBy3</pre>
</p>
<p>
In other words, it's possible to map the <code>Integral</code> predicate to an <code>Enum</code> predicate, and since <code>NominalDiffTime</code> is an <code>Enum</code> instance, you can now evaluate various durations:
</p>
<p>
<pre>ghci> (getPredicate p) $ secondsToNominalDiffTime 60
True
ghci> (getPredicate p) $ secondsToNominalDiffTime 61
False</pre>
</p>
<p>
This is, as I've already announced, hardly useful, but it's still possible. Unless you have an API that <em>requires</em> an <code>Invariant</code> instance, it's also redundant, because you could just have used <code>contramap</code> with the predicate:
</p>
<p>
<pre>ghci> (getPredicate $ contramap fromEnum $ Predicate isDivisbleBy3) $ secondsToNominalDiffTime 60
True
ghci> (getPredicate $ contramap fromEnum $ Predicate isDivisbleBy3) $ secondsToNominalDiffTime 61
False</pre>
</p>
<p>
When mapping a contravariant functor, only the contravariant mapping argument is required. The <code>Invariant</code> instances for <code>Contravariant</code> functors simply ignore the covariant mapping argument.
</p>
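<p>
The following is a minimal sketch of that idea, using only the <code>Contravariant</code> type class from <em>base</em>. The function name <code>invmapViaContramap</code> is an assumption made for illustration, and the code isn't meant to reproduce the source code of the <em>invariant</em> package. The predicate is redefined on <code>Int</code> to keep the example self-contained.
</p>
<p>
<pre>module Main where

import Data.Functor.Contravariant (Contravariant (..), Predicate (..))

-- Hypothetical helper: an invariant map for any contravariant functor.
-- The covariant argument (a -> b) is ignored; only (b -> a) is used.
invmapViaContramap :: Contravariant f => (a -> b) -> (b -> a) -> f a -> f b
invmapViaContramap _ g = contramap g

isDivisibleBy3 :: Int -> Bool
isDivisibleBy3 = (0 ==) . (`mod` 3)

-- Map the Int predicate to a Char predicate via Char's Enum instance.
charPredicate :: Predicate Char
charPredicate = invmapViaContramap toEnum fromEnum (Predicate isDivisibleBy3)

main :: IO ()
main = do
  print (getPredicate charPredicate 'c') -- fromEnum 'c' is 99, so True
  print (getPredicate charPredicate 'd') -- fromEnum 'd' is 100, so False</pre>
</p>
<p>
Since the first argument is never evaluated, you could even pass <code>undefined</code> in place of <code>toEnum</code> and get the same results, which underscores how redundant the covariant mapping is for a contravariant functor.
</p>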
<h3 id="5a36c7d729f54ce98f21cd714f050d55">
Specification as an invariant functor in C# <a href="#5a36c7d729f54ce98f21cd714f050d55" title="permalink">#</a>
</h3>
<p>
My earlier article <a href="/2021/09/09/the-specification-contravariant-functor">The Specification contravariant functor</a> takes a more object-oriented view on predicates by examining the <a href="https://en.wikipedia.org/wiki/Specification_pattern">Specification pattern</a>.
</p>
<p>
As outlined in <a href="/2022/08/01/invariant-functors">the introduction</a>, while it's possible to add a method called <code>InvMap</code>, it'd be more <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> to add a non-standard <code>Select</code> method:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> ISpecification<T1> Select<<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">T1</span>>(
<span style="color:blue;">this</span> ISpecification<T> source,
Func<T, T1> tToT1,
Func<T1, T> t1ToT)
{
<span style="color:blue;">return</span> source.ContraMap(t1ToT);
}</pre>
</p>
<p>
This implementation ignores <code>tToT1</code> and delegates to the existing <code>ContraMap</code> method.
</p>
<p>
Here's a unit test that demonstrates an example equivalent to the above Haskell example:
</p>
<p>
<pre>[Theory]
[InlineData(60, <span style="color:blue;">true</span>)]
[InlineData(61, <span style="color:blue;">false</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> InvariantMappingExample(<span style="color:blue;">long</span> seconds, <span style="color:blue;">bool</span> expected)
{
ISpecification<<span style="color:blue;">long</span>> spec = <span style="color:blue;">new</span> IsDivisibleBy3Specification();
ISpecification<TimeSpan> mappedSpec =
spec.Select(ticks => <span style="color:blue;">new</span> TimeSpan(ticks), ts => ts.Ticks);
Assert.Equal(
expected,
mappedSpec.IsSatisfiedBy(TimeSpan.FromSeconds(seconds)));
}</pre>
</p>
<p>
Again, while this is hardly useful, it's possible.
</p>
<h3 id="d016726d91834c73998a7615aeab6c3a">
Conclusion <a href="#d016726d91834c73998a7615aeab6c3a" title="permalink">#</a>
</h3>
<p>
All contravariant functors are invariant functors. You simply use the 'normal' contravariant mapping function (<code>contramap</code> in Haskell). This enables you to add an invariant mapping (<code>invmap</code>) that only uses the contravariant argument (<code>b -> a</code>) and ignores the covariant argument (<code>a -> b</code>).
</p>
<p>
Invariant functors are, however, not particularly useful, so neither is this result. Still, it's there, so it deserves a mention. Enough of that, though.
</p>
<p>
<strong>Next:</strong> <a href="/2022/03/28/monads">Monads</a>.
</p>
</div>
<hr>
Built-in alternatives to applicative assertions
https://blog.ploeh.dk/2023/01/30/built-in-alternatives-to-applicative-assertions
2023-01-30T08:08:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Why make things so complicated?</em>
</p>
<p>
Several readers reacted to my small article series on <a href="/2022/11/07/applicative-assertions">applicative assertions</a>, pointing out that error-collecting assertions are already supported in more than one unit-testing framework.
</p>
<blockquote>
<p>
"In the Java world this seems similar to the result gained by Soft Assertions in AssertJ. <a href="https://assertj.github.io/doc/#assertj-core-soft-assertions">https://assertj.github.io/doc/#assertj-c...</a> if you’re after a target for functionality (without the adventures through monad land)"
</p>
<footer><cite><a href="https://twitter.com/joshuamck/status/1597190184134590464">Josh McK</a></cite></footer>
</blockquote>
<p>
While I'm not familiar with the details of Java unit-testing frameworks, the situation is similar in .NET, it turns out.
</p>
<blockquote>
<p>
"Did you know there is Assert.Multiple in NUnit and now also in xUnit .Net? It seems to have quite an overlap with what you're doing here.
</p>
<p>
"For a quick overview, I found this blogpost helpful: <a href="https://www.thomasbogholm.net/2021/11/25/xunit-2-4-2-pre-multiple-asserts-in-one-test/">https://www.thomasbogholm.net/2021/11/25/xunit-2-4-2-pre-multiple-asserts-in-one-test/</a>"
</p>
<footer><cite><a href="https://twitter.com/DoCh_Dev/status/1597158737357459456">DoCh_Dev</a></cite></footer>
</blockquote>
<p>
I'm not surprised to learn that something like this exists, but let's take a quick look.
</p>
<h3 id="d7c0c78093084c08aa22ccaa7b86cb8a">
NUnit Assert.Multiple <a href="#d7c0c78093084c08aa22ccaa7b86cb8a" title="permalink">#</a>
</h3>
<p>
Let's begin with <a href="https://nunit.org/">NUnit</a>, as this seems to be the first .NET unit-testing framework to support error-collecting assertions. As a beginning, the <a href="https://docs.nunit.org/articles/nunit/writing-tests/assertions/multiple-asserts.html">documentation example</a> works as it's supposed to:
</p>
<p>
<pre>[Test]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> ComplexNumberTest()
{
ComplexNumber result = SomeCalculation();
Assert.Multiple(() =>
{
Assert.AreEqual(5.2, result.RealPart, <span style="color:#a31515;">"Real part"</span>);
Assert.AreEqual(3.9, result.ImaginaryPart, <span style="color:#a31515;">"Imaginary part"</span>);
});
}</pre>
</p>
<p>
When you run the test, it fails (as expected) with this error message:
</p>
<p>
<pre>Message:
Multiple failures or warnings in test:
1) Real part
Expected: 5.2000000000000002d
But was: 5.0999999999999996d
2) Imaginary part
Expected: 3.8999999999999999d
But was: 4.0d</pre>
</p>
<p>
That seems to work well enough, but how does it actually work? I'm not interested in reading the NUnit source code - after all, the concept of <a href="/encapsulation-and-solid">encapsulation</a> is that one should be able to make use of the capabilities of an object without knowing all implementation details. Instead, I'll guess: Perhaps <code>Assert.Multiple</code> executes the code block in a <code>try/catch</code> block and collects the various exceptions thrown by the nested assertions.
</p>
<p>
Does it catch all exception types, or only a subset?
</p>
<p>
Let's try with the kind of composed assertion that I <a href="/2022/11/28/an-initial-proof-of-concept-of-applicative-assertions-in-c">previously investigated</a>:
</p>
<p>
<pre>[Test]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> HttpExample()
{
<span style="color:blue;">var</span> deleteResp = <span style="color:blue;">new</span> HttpResponseMessage(HttpStatusCode.BadRequest);
<span style="color:blue;">var</span> getResp = <span style="color:blue;">new</span> HttpResponseMessage(HttpStatusCode.OK);
Assert.Multiple(() =>
{
deleteResp.EnsureSuccessStatusCode();
Assert.That(getResp.StatusCode, Is.EqualTo(HttpStatusCode.NotFound));
});
}</pre>
</p>
<p>
This test fails (again, as expected). What's the error message?
</p>
<p>
<pre>Message:
System.Net.Http.HttpRequestException :↩
Response status code does not indicate success: 400 (Bad Request).</pre>
</p>
<p>
(I've wrapped the result over multiple lines for readability. The <code>↩</code> symbol indicates where I've wrapped the text. I'll do that again later in this article.)
</p>
<p>
Notice that I'm using <a href="/2020/09/28/ensuresuccessstatuscode-as-an-assertion">EnsureSuccessStatusCode as an assertion</a>. This seems to spoil the behaviour of <code>Assert.Multiple</code>. It only reports the first status code error, but not the second one.
</p>
<p>
I admit that I don't fully understand what's going on here. In fact, I <em>have</em> taken a cursory glance at the relevant NUnit source code without being enlightened.
</p>
<p>
One hypothesis might be that NUnit assertions throw special <code>Exception</code> sub-types that <code>Assert.Multiple</code> catches. In order to test that, I wrote a few more tests in <a href="https://fsharp.org/">F#</a> with <a href="http://www.swensensoftware.com/unquote/">Unquote</a>, assuming that, since Unquote hardly throws NUnit exceptions, the behaviour might be similar to above.
</p>
<p>
<pre>[<Test>]
<span style="color:blue;">let</span> Test4 () =
<span style="color:blue;">let</span> x = 1
<span style="color:blue;">let</span> y = 2
<span style="color:blue;">let</span> z = 3
Assert.Multiple (<span style="color:blue;">fun</span> () <span style="color:blue;">-></span>
x =! y
y =! z)</pre>
</p>
<p>
The <code>=!</code> operator is an Unquote operator that I usually read as <em>must equal</em>. How does that error message look?
</p>
<p>
<pre>Message:
Multiple failures or warnings in test:
1)
1 = 2
false
2)
2 = 3
false</pre>
</p>
<p>
Somehow, <code>Assert.Multiple</code> understands Unquote error messages, but not <code>HttpRequestException</code>. As I wrote, I don't fully understand why it behaves this way. To a degree, I'm intellectually curious enough that I'd like to know. On the other hand, from a maintainability perspective, as a user of NUnit, I shouldn't have to understand such details.
</p>
<h3 id="1f28e534b0f94e93bb12cbb951fa663f">
xUnit.net Assert.Multiple <a href="#1f28e534b0f94e93bb12cbb951fa663f" title="permalink">#</a>
</h3>
<p>
How fares the <a href="https://xunit.net/">xUnit.net</a> port of <code>Assert.Multiple</code>?
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> HttpExample()
{
<span style="color:blue;">var</span> deleteResp = <span style="color:blue;">new</span> HttpResponseMessage(HttpStatusCode.BadRequest);
<span style="color:blue;">var</span> getResp = <span style="color:blue;">new</span> HttpResponseMessage(HttpStatusCode.OK);
Assert.Multiple(
() => deleteResp.EnsureSuccessStatusCode(),
() => Assert.Equal(HttpStatusCode.NotFound, getResp.StatusCode));
}</pre>
</p>
<p>
The API is, you'll notice, not quite identical. Where the NUnit <code>Assert.Multiple</code> method takes a single delegate as input, the xUnit.net method takes an array of actions. The difference is not only at the level of API; the behaviour is different, too:
</p>
<p>
<pre>Message:
Multiple failures were encountered:
---- System.Net.Http.HttpRequestException :↩
Response status code does not indicate success: 400 (Bad Request).
---- Assert.Equal() Failure
Expected: NotFound
Actual: OK</pre>
</p>
<p>
This error message reports both problems, as we'd like it to do.
</p>
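<p>
One way to get this kind of behaviour is to run each action separately and collect whatever exceptions are thrown. The following Haskell sketch illustrates that idea. The <code>collectFailures</code> function is a hypothetical helper, and nothing here is meant to describe how xUnit.net is actually implemented.
</p>
<p>
<pre>module Main where

import Control.Exception (SomeException, displayException, try)
import Data.Either (lefts)

-- Hypothetical helper: run every action, even if earlier ones throw,
-- and return one message per failed action.
-- Catching SomeException is deliberately broad; it's a sketch, not
-- production-grade exception handling.
collectFailures :: [IO ()] -> IO [String]
collectFailures actions = do
  results <- mapM try actions :: IO [Either SomeException ()]
  pure (map displayException (lefts results))

main :: IO ()
main = do
  failures <-
    collectFailures
      [ ioError (userError "Response status code does not indicate success: 400")
      , ioError (userError "Expected NotFound, but got OK")
      , pure () -- A passing 'assertion' contributes no failure.
      ]
  putStrLn ("Failures encountered: " ++ show (length failures))
  mapM_ putStrLn failures</pre>
</p>
<p>
Because each action is wrapped individually, the first exception doesn't prevent the second action from running. Two of the three actions fail, so <code>failures</code> contains two entries. A single delegate that contains several statements can't offer the same guarantee unless the assertions themselves cooperate, since an ordinary exception aborts the rest of the block.
</p>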
<p>
I also tried writing equivalent tests in F#, with and without Unquote, and they behave consistently with this result.
</p>
<p>
If I had to use something like <code>Assert.Multiple</code>, I'd trust the xUnit.net variant more than NUnit's implementation.
</p>
<h3 id="c51fb932618c4464a2e9be05869005d4">
Assertion scopes <a href="#c51fb932618c4464a2e9be05869005d4" title="permalink">#</a>
</h3>
<p>
Apparently, <a href="https://fluentassertions.com/">Fluent Assertions</a> offers yet another alternative.
</p>
<blockquote>
<p>
"Hey @ploeh, been reading your applicative assertion series. I recently discovered Assertion Scopes, so I'm wondering what is your take on them since it seems to me they are solving this problem in C# already. <a href="https://fluentassertions.com/introduction#assertion-scopes">https://fluentassertions.com/introduction#assertion-scopes</a>"
</p>
<footer><cite><a href="https://twitter.com/JernejGoricki/status/1597704973839904768">Jernej Gorički</a></cite></footer>
</blockquote>
<p>
The linked documentation contains this example:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> DocExample()
{
    <span style="color:blue;">using</span> (<span style="color:blue;">new</span> AssertionScope())
    {
        5.Should().Be(10);
        <span style="color:#a31515;">"Actual"</span>.Should().Be(<span style="color:#a31515;">"Expected"</span>);
    }
}</pre>
</p>
<p>
It fails in the expected manner:
</p>
<p>
<pre>Message:
Expected value to be 10, but found 5 (difference of -5).
Expected string to be "Expected" with a length of 8, but "Actual" has a length of 6,↩
differs near "Act" (index 0).</pre>
</p>
<p>
How does it fare when subjected to the <code>EnsureSuccessStatusCode</code> test?
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> HttpExample()
{
    <span style="color:blue;">var</span> deleteResp = <span style="color:blue;">new</span> HttpResponseMessage(HttpStatusCode.BadRequest);
    <span style="color:blue;">var</span> getResp = <span style="color:blue;">new</span> HttpResponseMessage(HttpStatusCode.OK);
    <span style="color:blue;">using</span> (<span style="color:blue;">new</span> AssertionScope())
    {
        deleteResp.EnsureSuccessStatusCode();
        getResp.StatusCode.Should().Be(HttpStatusCode.NotFound);
    }
}</pre>
</p>
<p>
That test produces this error output:
</p>
<p>
<pre>Message:
System.Net.Http.HttpRequestException :↩
Response status code does not indicate success: 400 (Bad Request).</pre>
</p>
<p>
Again, <code>EnsureSuccessStatusCode</code> prevents further assertions from being evaluated. I can't say that I'm that surprised.
</p>
<h3 id="2f325809027243889d703e927a115efc">
Implicit or explicit <a href="#2f325809027243889d703e927a115efc" title="permalink">#</a>
</h3>
<p>
You might protest that using <code>EnsureSuccessStatusCode</code> and treating the resulting <code>HttpRequestException</code> as an assertion is unfair and unrealistic. Possibly. As usual, such judgements depend on a multitude of considerations, and there's no one-size-fits-all answer.
</p>
<p>
My intent with this article isn't to attack or belittle the APIs I've examined. Rather, I wanted to explore their boundaries by stress-testing them. That's one way to gain a better understanding. Being aware of an API's limitations and quirks can prevent subtle bugs.
</p>
<p>
Even if you'd <em>never</em> use <code>EnsureSuccessStatusCode</code> as an assertion, perhaps you or a colleague might inadvertently do something to the same effect.
</p>
<p>
I'm not surprised that both NUnit's <code>Assert.Multiple</code> and Fluent Assertions' <code>AssertionScope</code> behave in a less consistent manner than xUnit.net's <code>Assert.Multiple</code>. The clue is in the API.
</p>
<p>
The xUnit.net API looks like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">void</span> <span style="color:#74531f;">Multiple</span>(<span style="color:blue;">params</span> Action[] <span style="color:#1f377f;">checks</span>)</pre>
</p>
<p>
Notice that each assertion is explicitly a separate action. This enables the implementation to isolate it and treat it independently of other actions.
</p>
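<p>
To illustrate why that signature helps, here's a minimal sketch of how such a method <em>could</em> be written. This is not xUnit.net's actual source code. The class name <code>MultipleSketch</code> and the choice of <code>AggregateException</code> as the carrier of failures are assumptions I've made for the sake of the example.
</p>
<p>
<pre>using System;
using System.Collections.Generic;

// Hypothetical stand-in; not the xUnit.net implementation.
public static class MultipleSketch
{
    public static void Multiple(params Action[] checks)
    {
        var failures = new List<Exception>();
        foreach (var check in checks)
        {
            try
            {
                // Each check runs in isolation, so an exception thrown
                // by one check can't prevent the next one from running.
                check();
            }
            catch (Exception e)
            {
                failures.Add(e);
            }
        }

        // Report all collected failures together.
        if (failures.Count > 0)
            throw new AggregateException(
                "Multiple failures were encountered:", failures);
    }
}</pre>
</p>
<p>
Because every action gets its own <code>try/catch</code> block, such an implementation doesn't need to distinguish between assertion exceptions and any other kind of exception, such as <code>HttpRequestException</code>.
</p>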
<p>
Neither the NUnit nor the Fluent Assertions API is that explicit. Instead, you can write arbitrary code inside the 'scope' of multiple assertions. For <code>AssertionScope</code>, the notion of a 'scope' is plain to see. For the NUnit API it's more implicit, but the scope is effectively the extent of the method:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">void</span> <span style="color:#74531f;">Multiple</span>(TestDelegate <span style="color:#1f377f;">testDelegate</span>)</pre>
</p>
<p>
That <code>testDelegate</code> can have as many (nested, even) assertions as you'd like, so the <code>Multiple</code> implementation needs to somehow demarcate when it begins and when it ends.
</p>
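<p>
One conceivable way to do that is with ambient state. The following is only a sketch under that assumption, with a thread-static depth counter. The names <code>ScopeSketch</code> and <code>Check</code> are hypothetical, and the code is not taken from NUnit or Fluent Assertions.
</p>
<p>
<pre>using System;
using System.Collections.Generic;

// Hypothetical illustration of a scope-based design.
public static class ScopeSketch
{
    // Ambient state, implicitly shared by all code on the thread.
    [ThreadStatic] private static int depth;
    [ThreadStatic] private static List<string> failures;

    public static void Multiple(Action testDelegate)
    {
        if (depth == 0)
            failures = new List<string>();
        depth++;
        try
        {
            testDelegate();
        }
        finally
        {
            depth--;
        }

        // Only the outermost scope reports the recorded failures.
        if (depth == 0 && failures.Count > 0)
            throw new Exception(
                string.Join(Environment.NewLine, failures));
    }

    public static void Check(bool condition, string message)
    {
        if (condition)
            return;
        if (depth > 0)
            failures.Add(message);        // in a scope: record and go on
        else
            throw new Exception(message); // outside a scope: fail now
    }
}</pre>
</p>
<p>
In this sketch, only failures reported via <code>Check</code> are collected. Any other exception escapes <code>testDelegate</code> immediately, skips the remaining statements, and the failures recorded so far are never reported.
</p>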
<p>
The <code>testDelegate</code> can be implemented in a different file, or even in a different library, and it has no way to communicate or coordinate with its surrounding scope. This reminds me of an Ambient Context, an idiom that <a href="/2019/01/21/some-thoughts-on-anti-patterns">Steven van Deursen convinced me was an anti-pattern</a>. The surrounding context changes the behaviour of the code block it surrounds, and it's quite implicit.
</p>
<blockquote>
<p>
Explicit is better than implicit.
</p>
<footer><cite>Tim Peters, <a href="https://peps.python.org/pep-0020/">The Zen of Python</a></cite></footer>
</blockquote>
<p>
The xUnit.net API, at least, looks a bit saner. Still, this kind of API is quirky enough that it reminds me of <a href="https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule">Greenspun's tenth rule</a>; that these APIs are ad-hoc, informally-specified, bug-ridden, slow implementations of half of <a href="/2018/10/01/applicative-functors">applicative functors</a>.
</p>
<h3 id="95b723f05a7f4b4c88b5e716f5f82989">
Conclusion <a href="#95b723f05a7f4b4c88b5e716f5f82989" title="permalink">#</a>
</h3>
<p>
Not surprisingly, popular unit-testing and assertion libraries come with facilities to compose assertions. Also, not surprisingly, these APIs are crude and require you to learn their implementation details.
</p>
<p>
Would I use them if I had to? I probably would. As <a href="https://www.infoq.com/presentations/Simple-Made-Easy/">Rich Hickey put it</a>, they're already <em>at hand</em>. That makes them easy, but not necessarily simple. APIs that compel you to learn their internal implementation details aren't simple.
</p>
<p>
<a href="/2017/10/04/from-design-patterns-to-category-theory">Universal abstractions</a>, on the other hand, you only have to learn one time. Once you understand what an applicative functor is, you know what to expect from it, and which capabilities it has.
</p>
<p>
In languages with good support for applicative functors, I would favour an assertion API based on that abstraction, if given a choice. At the moment, though, that's not much of an option. Even <a href="https://hackage.haskell.org/package/HUnit/docs/Test-HUnit-Base.html#t:Assertion">HUnit assertions</a> are based on side effects.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="e3269279066146f985c8405f6d3ad286">
<div class="comment-author">Joker_vD <a href="#e3269279066146f985c8405f6d3ad286">#</a></div>
<div class="comment-content">
<p>
Just a reminder: in .NET, method's execution <b>cannot</b> be resumed after an exception is thrown, there is just simply no
way to do this, at all.
Which means that NUnit's Assert.Multiple absolutely cannot work the way you guess it probably does, by running the delegate and
resuming its execution after it throws an exception until the delegate returns.
</p>
<p>
How could it work then? Well, considering that documentation to almost every Assert's method has
"Returns without throwing an exception when inside a multiple assert block" line in it,
I would assume that Assert.Multiple sets a global flag which makes actual assertions store the failures in some global hidden
context instead of throwing them, then runs the delegate and after it finishes or throws, collects and clears all those failures
from the context and resets the global flag.
</p>
<p>
Cursory inspection of NUnit's source code supports this idea, except that apparently it's not just a boolean flag but a
"depth" counter; and assertions report the failures
<a href="https://github.com/nunit/nunit/blob/62059054137de84b711353765d474779db95f731/src/NUnitFramework/framework/Assert.cs#L371">just the way I've speculated</a>.
I personally hate such side-channels but you have to admit, they allow for some nifty, seemingly impossible magical tricks
(a.k.a. "spooky action at the distance").
</p>
<p>
Also, why do you assume that Unquote would not throw NUnit's assertions?
It literally has "Unquote integrates configuration-free with all exception-based unit testing frameworks including xUnit.net,
NUnit, MbUnit, Fuchu, and MSTest" in its README, and indeed, if you look at
<a href="https://github.com/SwensenSoftware/unquote/blob/78b071043c42372f3693a07e5562520046873ebc/src/Unquote/Assertions.fs">its source code</a>,
you'll see that at runtime it tries to locate any testing framework it's aware of and use its assertions.
More funny party tricks, this time with reflection!
</p>
<p>
I understand that after working in more pure/functional programming environments one does start to slowly forget about
those terrible things, but: those horrorterrors <i>still</i> exist, and people <i>keep making</i> more of them.
Now, if you can, have a good night :)
</p>
</div>
<div class="comment-date">2023-01-31 03:00 UTC</div>
</div>
<div class="comment" id="e0e7c5b258d54c30b87f157e8746150d">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#e0e7c5b258d54c30b87f157e8746150d">#</a></div>
<div class="comment-content">
<p>
Joker_vD, thank you for explaining those details. I admit that I hadn't thought too deeply about implementation details, for the reasons I briefly mentioned in the post.
</p>
<blockquote>
<p>
"I understand that after working in more pure/functional programming environments one does start to slowly forget about those terrible things"
</p>
</blockquote>
<p>
Yes, that summarises my current thinking well, I'm afraid.
</p>
</div>
<div class="comment-date">2023-01-30 6:49 UTC</div>
</div>
<div class="comment" id="821541be129a4ea7976ab33f71d3637a">
<div class="comment-author"><a href="https://github.com/MaxKot">Max Kiselev</a> <a href="#821541be129a4ea7976ab33f71d3637a">#</a></div>
<div class="comment-content">
<p>
NUnit has <a href="https://docs.nunit.org/articles/nunit/writing-tests/assertions/classic-assertions/Assert.DoesNotThrow.html">Assert.DoesNotThrow</a>
and Fluent Assertions has <a href="https://fluentassertions.com/exceptions/">.Should().NotThrow()</a>. I did not check Fluent Assertions, but NUnit
does gather failures of Assert.DoesNotThrow inside Assert.Multiple into a multi-error report. One might argue that asserting that a delegate should not
throw is another application of the "explicit is better than implicit" philosophy. Here's what Fluent Assertions has to say on that matter:
</p>
<blockquote>
<p>
"We know that a unit test will fail anyhow if an exception was thrown, but this syntax returns a clearer description of the exception that was thrown
and fits better to the AAA syntax."
</p>
</blockquote>
<p>
As a side note, you might also want to take a look at NUnit's Assert.That syntax. It allows you to construct complex conditions tested against a single
actual value:
</p>
<p>
<pre style="font-family:Consolas;font-size:13px;color:black;background:white;"><span style="color:blue;">int</span> <span style="color:#1f377f;">actual</span> = 3;
Assert.That (actual, Is.GreaterThan (0).And.LessThanOrEqualTo (2).And.Matches (Has.Property (<span style="color:#a31515;">"P"</span>).EqualTo (<span style="color:#a31515;">"a"</span>)));</pre>
</p>
<p>
A failure is then reported like this:
</p>
<p>
<pre>Expected: greater than 0 and less than or equal to 2 and property P equal to "a"
But was: 3</pre>
</p>
</div>
<div class="comment-date">2023-01-31 18:35 UTC</div>
</div>
<div class="comment" id="815b4f0e18284dccb3ce38dbb476eb4d">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#815b4f0e18284dccb3ce38dbb476eb4d">#</a></div>
<div class="comment-content">
<p>
Max, thank you for writing. I have to admit that I never understood the point of <a href="https://docs.nunit.org/articles/nunit/writing-tests/assertions/assertion-models/constraint.html">NUnit's constraint model</a>, but your example clearly illustrates how it may be useful. It enables you to compose assertions.
</p>
<p>
It's interesting to try to understand the underlying reason for that. I took a cursory glance at that <code>IResolveConstraint</code> API, and as far as I can tell, it may form a <a href="/2017/10/06/monoids">monoid</a> (I'm not entirely sure about the <code>ConstraintStatus</code> enum, but even so, it may be 'close enough' to be composable).
</p>
<p>
I can see how that may be useful when making assertions against complex objects (i.e. objects composed from other objects).
</p>
<p>
In xUnit.net you'd typically address that problem with custom <a href="https://learn.microsoft.com/dotnet/api/system.collections.generic.iequalitycomparer-1">IEqualityComparers</a>. This is more verbose, but also strikes me as more reusable. One disadvantage of that approach, however, is that when tests fail, the assertion message is typically useless.
</p>
<p>
This is the reason I favour Unquote: Instead of inventing a Boolean algebra(?) from scratch, it uses the existing language and still gives you good error messages. Alas, that only works in F#.
</p>
<p>
In general, though, I'm inclined to think that all of these APIs address symptoms rather than solve real problems. Granted, they're useful whenever you need to make assertions against values that you don't control, but for your own APIs, <a href="/2021/05/03/structural-equality-for-better-tests">a simpler solution is to model values as immutable data with structural equality</a>.
</p>
<p>
Another question is whether aiming for clear assertion messages is optimising for the right concern. At least with TDD, <a href="/2022/12/12/when-do-tests-fail">I don't think that it is</a>.
</p>
</div>
<div class="comment-date">2023-02-02 7:53 UTC</div>
</div>
</div>
<hr>
Agilean
https://blog.ploeh.dk/2023/01/23/agilean
2023-01-23T07:55:00+00:00
Mark Seemann
<div id="post">
<p>
<em>There are other agile methodologies than scrum.</em>
</p>
<p>
More than twenty years after <a href="https://agilemanifesto.org/">the Agile Manifesto</a> it looks as though there's only one kind of agile process left: <a href="https://en.wikipedia.org/wiki/Scrum_(software_development)">Scrum</a>.
</p>
<p>
I recently held a workshop and as a side remark I mentioned that I don't consider scrum the best development process. This surprised some attendees, who politely inquired about my reasoning.
</p>
<h3 id="8837c4f67d694f93ad0b708cc1739705">
My experience with scrum <a href="#8837c4f67d694f93ad0b708cc1739705" title="permalink">#</a>
</h3>
<p>
The first nine years I worked as a professional programmer, the companies I worked in used various <a href="https://en.wikipedia.org/wiki/Waterfall_model">waterfall</a> processes. When I joined the Microsoft Dynamics Mobile team in 2008 they were already using scrum. That was my first exposure to it, and I liked it. Looking back on it today, we weren't particularly dogmatic about the process, being more interested in getting things done.
</p>
<p>
One telling fact is that we took turns being Scrum Master. Every sprint we'd rotate that role.
</p>
<p>
We did test-driven development, and had two-week sprints. This being a Microsoft development organisation, we had a dedicated build master, tech writers, specialised testers, and security reviews.
</p>
<p>
I liked it. It's easily one of the most professional software organisations I've worked in. I think it was a good place to work for many reasons. Scrum may have been a contributing factor, but hardly the only reason.
</p>
<p>
I have no issues with scrum as we practised it then. I recall later attending a presentation by <a href="https://en.wikipedia.org/wiki/Mike_Cohn">Mike Cohn</a> where he outlined four quadrants of team maturity. You'd start with scrum, but use retrospectives to evaluate what worked and what didn't. Then you'd adjust. A mature, self-organising team would arrive at its own process, perhaps initiated with scrum, but now bearing little resemblance to it.
</p>
<p>
I like scrum when viewed like that. When it becomes rigid and empty ceremony, I don't. If all you do is daily stand-ups, sprints, and backlogs, you may be doing scrum, but probably not agile.
</p>
<h3 id="1bb8e521e78341768ca7f5942f3ace4a">
Continuous deployment <a href="#1bb8e521e78341768ca7f5942f3ace4a" title="permalink">#</a>
</h3>
<p>
After Microsoft I joined a startup so small that formal process was unnecessary. Around that time I also became interested in <a href="https://en.wikipedia.org/wiki/Lean_software_development">lean software development</a>. In the beginning, I learned a lot from <a href="https://www.linkedin.com/in/martin-jul-39a12/">Martin Jul</a> who seemed to use the now-defunct <a href="https://ative.dk/">Ative</a> blog as a public notepad as he was reading works of <a href="https://en.wikipedia.org/wiki/W._Edwards_Deming">Deming</a>. I suppose, if you want a more canonical introduction to the topic, that you might start with one of <a href="http://www.poppendieck.com/">the Poppendiecks'</a> books, but since I've only read <a href="/ref/implementing-lean">Implementing Lean Software Development</a>, that's the only one I can recommend.
</p>
<p>
Around 2014 I returned to a regular customer. The team had, in my absence, been busy implementing <a href="https://en.wikipedia.org/wiki/Continuous_deployment">continuous deployment</a>. Instead of artificial periods like 'sprints' we had a <a href="https://en.wikipedia.org/wiki/Kanban_board">kanban board</a> to keep track of our work. We used a variation of <a href="https://en.wikipedia.org/wiki/Feature_toggle">feature flags</a> and marked features as done when they were complete and in production.
</p>
<p>
Why wait until <em>next</em> Friday if the feature is <em>done, done</em> on a Wednesday? Why wait until the <em>next</em> Monday to identify what to work on next, if you're ready to take on new work on a Thursday? Why not move towards <em>one-piece flow?</em>
</p>
<p>
An effective self-organising team typically already knows what it's doing. Much process is introduced in order to give external stakeholders visibility into what a team is doing.
</p>
<p>
I found, in that organisation, that continuous deployment eliminated most of that need. At one time I asked a stakeholder what he thought of the feature I'd deployed a week before - a feature that <em>he had requested</em>. He replied that he hadn't had time to look at it yet.
</p>
<p>
The usual inquiries about status (<em>Is it done yet? When is it done?</em>) were gone. The team moved faster than the stakeholders could keep up. That also gave us <a href="/2022/09/19/when-to-refactor">enough slack to keep the code base in good order</a>. We also used test-driven development (TDD) throughout.
</p>
<p>
TDD with continuous deployment and a kanban board strikes me as congenial with the ideas of lean software development, but that's not all.
</p>
<h3 id="9f5e725164f54d32b5d56a28d3b1619e">
Stop-the-line issues <a href="#9f5e725164f54d32b5d56a28d3b1619e" title="permalink">#</a>
</h3>
<p>
An <a href="https://en.wikipedia.org/wiki/Andon_(manufacturing)">andon cord</a> is a central concept in <a href="https://en.wikipedia.org/wiki/Lean_manufacturing">lean manufacturing</a>. If a worker (or anyone, really) discovers a problem during production, he or she pulls the andon cord and <em>stops the production line</em>. Then everyone investigates and determines what to do about the problem. Errors are not allowed to accumulate.
</p>
<p>
I think that I've internalised this notion to such a degree that I only recently connected it to lean software development.
</p>
<p>
In <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>, I recommend turning compiler warnings into errors at the beginning of a code base. Don't allow warnings to pile up. Do the same with static code analysis and linters.
</p>
<p>
When discussing software engineering with developers, I'm beginning to realise that this runs even deeper.
</p>
<ul>
<li>Turn warnings into errors. Don't allow warnings to accumulate.</li>
<li>The correct number of unhandled exceptions in production is zero. If you observe an unhandled exception in your production logs, fix it. Don't let them accumulate.</li>
<li>The correct number of known bugs is zero. Don't let bugs accumulate.</li>
</ul>
<p>
If you're used to working on a code base with hundreds of known bugs, and frequent exceptions in production, this may sound unrealistic. If you deal with issues as soon as they arise, however, this is not only possible - it's faster.
</p>
<p>
In lean software development, bugs are stop-the-line issues. When something unexpected happens, you stop what you're doing and make fixing the problem the top priority. You build quality in.
</p>
<p>
This has been my modus operandi for years, but I only recently connected the dots to realise that this is a typical lean practice. I may have picked it up from there. Or perhaps it's just common sense.
</p>
<h3 id="e2513b8e32b148cbab8dbd6b75c67348">
Conclusion <a href="#e2513b8e32b148cbab8dbd6b75c67348" title="permalink">#</a>
</h3>
<p>
When Agile was new and exciting, there were <a href="https://en.wikipedia.org/wiki/Extreme_programming">extreme programming</a> and scrum, and possibly some lesser known techniques. Lean was around the corner, but didn't come to my attention, at least, until around 2010. Then it seems to have faded away again.
</p>
<p>
Today, agile looks synonymous with scrum, but I find lean software development more efficient. Why divide work into artificial time periods when you can release continuously? Why <em>plan</em> bug fixing when it's more efficient to stop the line and deal with the problem as it arises?
</p>
<p>
That may sound counter-intuitive, but it works because it prevents technical debt from accumulating.
</p>
<p>
Lean software development is, in my experience, a better agile methodology than scrum.
</p>
</div><hr>
In the long run
https://blog.ploeh.dk/2023/01/16/in-the-long-run
2023-01-16T08:28:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Software design decisions should be time-aware.</em>
</p>
<p>
A common criticism of modern capitalism is that maximising <a href="https://en.wikipedia.org/wiki/Shareholder_value">shareholder value</a> leads to various detrimental outcomes, both for society and, possibly, for the maximising organisation itself. One major problem is when company leadership is incentivised to optimise stock market price for the next quarter, or some other short term. When considering only the short term, decision makers may (rationally) decide to sacrifice long-term benefits for short-term gains.
</p>
<p>
We often see similar behaviour in democracies. Politicians tend to optimise within a time frame that coincides with the election period. Getting re-elected is more important than good policy in the next period.
</p>
<p>
These observations are crude generalisations. Some democratic politicians and CEOs take longer views. Inherent in the context, however, is an incentive to short-term thinking.
</p>
<p>
This, it strikes me, is frequently the case in software development.
</p>
<p>
Particularly in the context of <a href="https://en.wikipedia.org/wiki/Scrum_(software_development)">scrum</a> there's a focus on delivering at the end of every sprint. I've observed developers and other stakeholders together engage in short-term thinking in order to meet those arbitrary and fictitious deadlines.
</p>
<p>
Even when deadlines are more remote than two weeks, project members rarely think beyond some perceived end date. As I describe in <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>, a <em>project</em> is rarely a good way to organise software development work. Projects end. Successful software doesn't.
</p>
<p>
Regardless of the specific circumstances, a too myopic focus on near-term goals gives you an incentive to cut corners. To not care about code quality.
</p>
<h3 id="5cd528d178504f2ba849019e5d6d425e">
...we're all dead <a href="#5cd528d178504f2ba849019e5d6d425e" title="permalink">#</a>
</h3>
<p>
As <a href="https://en.wikipedia.org/wiki/John_Maynard_Keynes">Keynes</a> once quipped:
</p>
<blockquote>
<p>
"In the long run we are all dead."
</p>
<footer><cite>John Maynard Keynes</cite></footer>
</blockquote>
<p>
Clearly, while you can be too short-sighted, you can also take too long a view. Sometimes deadlines matter, and software not used makes no-one happy.
</p>
<p>
Working software remains the ultimate test of value, but as I've tried to express many times before, this does <em>not</em> imply that anything else is worthless.
</p>
<p>
You can't measure code quality. <a href="/2019/03/04/code-quality-is-not-software-quality">Code quality isn't software quality</a>. Low code quality <a href="https://martinfowler.com/articles/is-quality-worth-cost.html">slows you down</a>, and that, eventually, costs you money, blood, sweat, and tears.
</p>
<p>
This is, however, not difficult to predict. All it takes is a slightly wider time horizon. Consider the impact of your decisions past the next deadline.
</p>
<h3 id="b0f0f05bd00142379c2d713adde6eb77">
Conclusion <a href="#b0f0f05bd00142379c2d713adde6eb77" title="permalink">#</a>
</h3>
<p>
Don't be too short-sighted, but don't forget the immediate value of what you do. Your <a href="/2019/03/18/the-programmer-as-decision-maker">decisions</a> matter. The impact is not always immediate. Consider what consequences short-term optimisations may have in a longer perspective.
</p>
</div><hr>
The IO monad
https://blog.ploeh.dk/2023/01/09/the-io-monad
2023-01-09T07:39:00+00:00
Mark Seemann
<div id="post">
<p>
<em>The IO container forms a monad. An article for object-oriented programmers.</em>
</p>
<p>
This article is an instalment in <a href="/2022/03/28/monads">an article series about monads</a>. A previous article described <a href="/2020/06/22/the-io-functor">the IO functor</a>. As is the case with many (but not all) <a href="/2018/03/22/functors">functors</a>, this one also forms a monad.
</p>
<h3 id="bee075f5bf474b099146cbc7c96f4888">
SelectMany <a href="#bee075f5bf474b099146cbc7c96f4888" title="permalink">#</a>
</h3>
<p>
A monad must define either a <em>bind</em> or <em>join</em> function. In C#, monadic bind is called <code>SelectMany</code>. In a recent article, I gave an example of <a href="/2020/06/15/io-container-in-a-parallel-c-universe">what <em>IO</em> might look like in C#</a>. Notice that it already comes with a <code>SelectMany</code> function:
</p>
<p>
<pre><span style="color:blue;">public</span> IO<TResult> <span style="color:#74531f;">SelectMany</span><<span style="color:#2b91af;">TResult</span>>(Func<T, IO<TResult>> <span style="color:#1f377f;">selector</span>)</pre>
</p>
<p>
Unlike other monads, the IO implementation is considered a black box, but if you're interested in a prototypical implementation, I already posted <a href="/2020/07/13/implementation-of-the-c-io-container">a sketch</a> in 2020.
</p>
<h3 id="1ba6a4e6da014fc3a511e71bdba4f0a9">
Query syntax <a href="#1ba6a4e6da014fc3a511e71bdba4f0a9" title="permalink">#</a>
</h3>
<p>
I have also, already, demonstrated <a href="/2020/06/29/syntactic-sugar-for-io">syntactic sugar for IO</a>. In that article, however, I used an implementation of the required <code>SelectMany</code> overload that is more explicit than it has to be. The <a href="/2022/03/28/monads">monad introduction</a> makes the prediction that you can always implement that overload in the same way, and yet here I didn't.
</p>
<p>
That's an oversight on my part. You can implement it like this instead:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IO<TResult> <span style="color:#74531f;">SelectMany</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">U</span>, <span style="color:#2b91af;">TResult</span>>(
    <span style="color:blue;">this</span> IO<T> <span style="color:#1f377f;">source</span>,
    Func<T, IO<U>> <span style="color:#1f377f;">k</span>,
    Func<T, U, TResult> <span style="color:#1f377f;">s</span>)
{
    <span style="color:#8f08c4;">return</span> source.SelectMany(<span style="color:#1f377f;">x</span> => k(x).Select(<span style="color:#1f377f;">y</span> => s(x, y)));
}</pre>
</p>
<p>
Indeed, the conjecture from the introduction still holds.
</p>
<h3 id="f234f9e9d3724a30b8465d176a0d98a6">
Join <a href="#f234f9e9d3724a30b8465d176a0d98a6" title="permalink">#</a>
</h3>
<p>
In <a href="/2022/03/28/monads">the introduction</a> you learned that if you have a <code>Flatten</code> or <code>Join</code> function, you can implement <code>SelectMany</code>, and the other way around. Since we've already defined <code>SelectMany</code> for <code>IO<T></code>, we can use that to implement <code>Join</code>. In this article I use the name <code>Join</code> rather than <code>Flatten</code>. This is an arbitrary choice that doesn't impact behaviour. Perhaps you find it confusing that I'm inconsistent, but I do it in order to demonstrate that the behaviour is the same even if the name is different.
</p>
<p>
The concept of a monad is universal, but the names used to describe its components differ from language to language. What C# calls <code>SelectMany</code>, Scala calls <code>flatMap</code>, and what <a href="https://www.haskell.org/">Haskell</a> calls <code>join</code>, other languages may call <code>Flatten</code>.
</p>
<p>
You can always implement <code>Join</code> by using <code>SelectMany</code> with the identity function:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IO<T> <span style="color:#74531f;">Join</span><<span style="color:#2b91af;">T</span>>(<span style="color:blue;">this</span> IO<IO<T>> <span style="color:#1f377f;">source</span>)
{
    <span style="color:#8f08c4;">return</span> source.SelectMany(<span style="color:#1f377f;">x</span> => x);
}</pre>
</p>
<p>
In C# the identity function is <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatically</a> given as the lambda expression <code><span style="color:#1f377f;">x</span> => x</code> since C# doesn't come with a built-in identity function.
</p>
<h3 id="c57bcf1e74144046bc7191280d78519b">
Return <a href="#c57bcf1e74144046bc7191280d78519b" title="permalink">#</a>
</h3>
<p>
Apart from monadic bind, a monad must also define a way to put a normal value into the monad. Conceptually, I call this function <em>return</em> (because that's the name that Haskell uses). In <a href="/2020/06/22/the-io-functor">the IO functor</a> article, I wrote that the <code>IO<T></code> constructor corresponds to <em>return</em>. That's not strictly true, though, since the constructor takes a <code>Func<T></code> and not a <code>T</code>.
</p>
<p>
This issue is, however, trivially addressed:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IO<T> <span style="color:#74531f;">Return</span><<span style="color:#2b91af;">T</span>>(T <span style="color:#1f377f;">x</span>)
{
    <span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> IO<T>(() => x);
}</pre>
</p>
<p>
Take the value <code>x</code> and wrap it in a lazily-evaluated function.
</p>
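<p>
To see <code>Return</code> and <code>Join</code> work together, here is a minimal, self-contained sketch. It assumes a stripped-down <code>IO<T></code> that does nothing but wrap a <code>Func<T></code>. The <code>UnsafeRun</code> method is a hypothetical escape hatch, added purely so that the result can be observed; it isn't part of the IO container described in this article series.
</p>
<p>
<pre>using System;

// Nest one IO value inside another, flatten it, and only then force it.
IO<IO<int>> nested = IO.Return(IO.Return(42));
IO<int> flat = nested.Join();
Console.WriteLine(flat.UnsafeRun()); // prints 42

// Simplified stand-in for the article's IO<T>: a wrapper over Func<T>.
public sealed class IO<T>
{
    private readonly Func<T> item;

    public IO(Func<T> item)
    {
        this.item = item;
    }

    // Monadic bind: nothing runs until the resulting IO is forced.
    public IO<TResult> SelectMany<TResult>(Func<T, IO<TResult>> selector)
    {
        return new IO<TResult>(() => selector(item()).UnsafeRun());
    }

    // Hypothetical: exists only to make this sketch observable.
    public T UnsafeRun()
    {
        return item();
    }
}

public static class IO
{
    public static IO<T> Return<T>(T x)
    {
        return new IO<T>(() => x);
    }

    public static IO<T> Join<T>(this IO<IO<T>> source)
    {
        return source.SelectMany(x => x);
    }
}</pre>
</p>
<p>
Neither <code>Return</code> nor <code>Join</code> evaluates anything in this sketch. Evaluation only happens when <code>UnsafeRun</code> is called.
</p>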
<h3 id="e0548aa855224081a5c1c3ec80f23951">
Laws <a href="#e0548aa855224081a5c1c3ec80f23951" title="permalink">#</a>
</h3>
<p>
While <a href="/2020/07/06/referential-transparency-of-io">IO values are referentially transparent</a>, you can't compare them. You also can't 'run' them by any means other than running a program. This makes it hard to talk meaningfully about the <a href="/2022/04/11/monad-laws">monad laws</a>.
</p>
<p>
For example, the left identity law is:
</p>
<p>
<pre>return >=> h ≡ h</pre>
</p>
<p>
Note the implied equality. The composition of <code>return</code> and <code>h</code> should be equal to <code>h</code>, for some reasonable definition of equality. How do we define that?
</p>
<p>
Somehow we must imagine that two alternative compositions would produce the same observable effects <a href="https://en.wikipedia.org/wiki/Ceteris_paribus">ceteris paribus</a>. If you somehow imagine that you have two parallel universes, one with one composition (say <code>return >=> h</code>) and one with another (<code>h</code>), if all else in those two universes were equal, then you would observe no difference in behaviour.
</p>
<p>
That may be useful as a thought experiment, but isn't particularly practical. Unfortunately, things <em>do</em> change when non-deterministic behaviour and side effects are involved. As a simple example, consider an IO action that gets the current time and prints it to the console. That involves both non-determinism and a side effect.
</p>
<p>
In Haskell, that's a straightforward composition of two <code>IO</code> actions:
</p>
<p>
<pre>> h () = getCurrentTime >>= print</pre>
</p>
<p>
How do we compare two compositions? By running them?
</p>
<p>
<pre>> return () >>= h
2022-06-25 16:47:30.6540847 UTC
> h ()
2022-06-25 16:47:37.5281265 UTC</pre>
</p>
<p>
The outputs are not the same, because time goes by. Can we thereby conclude that the monad laws don't hold for IO? Not quite.
</p>
<p>
The IO Container is referentially transparent, but evaluation isn't. Thus, we have to pretend that two alternatives will lead to the same evaluation behaviour, all things being equal.
</p>
<p>
This property seems to hold for both the identity and associativity laws. Whether or not you compose with <em>return</em>, or in which evaluation order you compose actions, it doesn't affect the outcome.
</p>
<p>
For completeness' sake, the <a href="/2020/07/13/implementation-of-the-c-io-container">C# implementation sketch</a> is just a wrapper over a <code>Func<T></code>. We can also think of such a function as a function from <a href="/2018/01/15/unit-isomorphisms">unit</a> to <code>T</code> - in pseudo-C# <code>() => T</code>. That's a function; in other words: <a href="/2022/11/14/the-reader-monad">The Reader monad</a>. We already know that the Reader monad obeys the monad laws, so the C# implementation, at least, should be okay.
</p>
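<p>
As a rough illustration of that argument, the following sketch works directly on <code>Func<T></code> values and checks the left identity law for one deterministic function <code>h</code>. The <code>Thunk</code> class and its <code>Return</code> and <code>SelectMany</code> members are hypothetical names chosen for this example, and a single passing comparison is of course not a proof.
</p>
<p>
<pre>using System;

// A deterministic Kleisli arrow, so that comparing results is meaningful.
Func<int, Func<string>> h = i => () => new string('*', i);

Func<string> left = Thunk.Return(3).SelectMany(h); // (return >=> h) applied to 3
Func<string> right = h(3);                          // h applied to 3

Console.WriteLine(left() == right()); // prints True

// Hypothetical helpers: Func<T> treated as a function from unit to T.
public static class Thunk
{
    public static Func<T> Return<T>(T x)
    {
        return () => x;
    }

    // Monadic bind: run the source, feed the result to the selector, run that.
    public static Func<TResult> SelectMany<T, TResult>(
        this Func<T> source,
        Func<T, Func<TResult>> selector)
    {
        return () => selector(source())();
    }
}</pre>
</p>
<p>
With a non-deterministic <code>h</code>, such as one that reads the clock, the same comparison would fail for the reasons discussed above, even though the law still holds in the ceteris paribus sense.
</p>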
<h3 id="74154d1a95a44a91a8d534fbfda59d8d">
Conclusion <a href="#74154d1a95a44a91a8d534fbfda59d8d" title="permalink">#</a>
</h3>
<p>
IO forms a monad, among other abstractions. This is what enables Haskell programmers to compose an arbitrary number of impure actions with monadic bind without ever having to force evaluation. In C# <a href="/2020/06/15/io-container-in-a-parallel-c-universe">it might have looked the same</a>, except that it doesn't.
</p>
<p>
<strong>Next:</strong> <a href="/2023/02/27/test-data-generator-monad">Test Data Generator monad</a>.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Adding NuGet packages when offline
https://blog.ploeh.dk/2023/01/02/adding-nuget-packages-when-offline
2023-01-02T05:41:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A fairly trivial technical detective story.</em>
</p>
<p>
I was recently in an aeroplane, writing code, when I realised that I needed to add a couple of NuGet packages to my code base. I was on one of those less-travelled flights in Europe, on board an <a href="https://en.wikipedia.org/wiki/Embraer_E-Jet_family">Embraer E190</a>, and as is usually the case on those 1½-hour flights, there was no WiFi.
</p>
<p>
Adding a NuGet package typically requires that you're online so that the tools can query the relevant NuGet repository. You'll need to download the package, so if you're offline, you're just out of luck, right?
</p>
<p>
Fortunately, I'd previously used the packages I needed in other projects, on the same laptop. While <a href="/2014/01/29/nuget-package-restore-considered-harmful">I'm no fan of package restore</a>, I know that the local NuGet tools cache packages somewhere on the local machine.
</p>
<p>
So, perhaps I could entice the tools to reuse a cached package...
</p>
<p>
First, I simply tried adding a package that I needed:
</p>
<p>
<pre>$ dotnet add package unquote
Determining projects to restore...
Writing C:\Users\mark\AppData\Local\Temp\tmpF3C.tmp
info : X.509 certificate chain validation will use the default trust store selected by .NET.
info : Adding PackageReference for package 'unquote' into project '[redacted]'.
error: Unable to load the service index for source https://api.nuget.org/v3/index.json.
error: No such host is known. (api.nuget.org:443)
error: No such host is known.</pre>
</p>
<p>
Fine plan, but no success.
</p>
<p>
Clearly the <code>dotnet</code> tool was trying to access <code>api.nuget.org</code>, which, obviously, couldn't be reached because my laptop was in flight mode. It occurred to me, though, that the reason that the tool was querying <code>api.nuget.org</code> was that it wanted to see which version of the package was the most recent. After all, I hadn't specified a version.
</p>
<p>
What if I were to specify a version? Would the tool use the cached version of the package?
</p>
<p>
That seemed worth a try, but which versions did I already have on my laptop?
</p>
<p>
I don't go around remembering which version numbers I've used of various NuGet packages, but I expected the NuGet tooling to have that information available, somewhere.
</p>
<p>
But where? Keep in mind that I was offline, so couldn't easily look this up.
</p>
<p>
On the other hand, I knew that these days, most Windows applications keep data of that kind somewhere in <code>AppData</code>, so I started spelunking around there, looking for something that might be promising.
</p>
<p>
After looking around a bit, I found a subdirectory named <code>AppData\Local\NuGet\v3-cache</code>. This directory contained a handful of subdirectories obviously named with GUIDs. Each of these contained a multitude of <code>.dat</code> files. The names of those files, however, looked promising:
</p>
<p>
<pre>list_antlr_index.dat
list_autofac.dat
list_autofac.extensions.dependencyinjection.dat
list_autofixture.automoq.dat
list_autofixture.automoq_index.dat
list_autofixture.automoq_range_2.0.0-3.6.7.dat
list_autofixture.automoq_range_3.30.3-3.50.5.dat
list_autofixture.automoq_range_3.50.6-4.17.0.dat
list_autofixture.automoq_range_3.6.8-3.30.2.dat
list_autofixture.dat
...</pre>
</p>
<p>
and so on.
</p>
<p>
These files were clearly(?) named <code>list_[package-name].dat</code> or <code>list_[package-name]_index.dat</code>, so I started looking around for one named after the package I was looking for (<a href="https://www.nuget.org/packages/Unquote/">Unquote</a>).
</p>
<p>
Often, both files are present, which was also the case for Unquote.
</p>
<p>
<pre>$ ls list_unquote* -l
-rw-r--r-- 1 mark 197609 348 Oct 1 18:38 list_unquote.dat
-rw-r--r-- 1 mark 197609 42167 Sep 23 21:29 list_unquote_index.dat</pre>
</p>
<p>
As you can tell, <code>list_unquote_index.dat</code> is much larger than <code>list_unquote.dat</code>. Since I didn't know what the format of these files was, I decided to look at the smallest one first. It had this content:
</p>
<p>
<pre>{
"versions": [
"1.3.0",
"2.0.0",
"2.0.1",
"2.0.2",
"2.0.3",
"2.1.0",
"2.1.1",
"2.2.0",
"2.2.1",
"2.2.2",
"3.0.0",
"3.1.0",
"3.1.1",
"3.1.2",
"3.2.0",
"4.0.0",
"5.0.0",
"6.0.0-rc.1",
"6.0.0-rc.2",
"6.0.0-rc.3",
"6.0.0",
"6.1.0"
]
}</pre>
</p>
<p>
A list of versions. Sterling. It looked as though version 6.1.0 was the most recent one on my machine, so I tried to add that one to my code base:
</p>
<p>
<pre>$ dotnet add package unquote --version 6.1.0
Determining projects to restore...
Writing C:\Users\mark\AppData\Local\Temp\tmp815D.tmp
info : X.509 certificate chain validation will use the default trust store selected by .NET.
info : Adding PackageReference for package 'unquote' into project '[redacted]'.
info : Restoring packages for [redacted]...
info : Package 'unquote' is compatible with all the specified frameworks in project '[redacted]'.
info : PackageReference for package 'unquote' version '6.1.0' added to file '[redacted]'.
info : Generating MSBuild file [redacted].
info : Writing assets file to disk. Path: [redacted]
log : Restored [redacted] (in 397 ms).</pre>
</p>
<p>
Jolly good! That worked.
</p>
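<p>
If you'd rather not read such a file by eye, a few lines of C# can pull out the last listed version. This is only a sketch: the JSON literal is an abbreviated copy of the content shown above, and treating the last array element as the most recent version is an assumption based on that listing, not documented NuGet behaviour.
</p>
<p>
<pre>using System;
using System.Linq;
using System.Text.Json;

// Abbreviated stand-in for the content of list_unquote.dat.
string json = @"{ ""versions"": [ ""5.0.0"", ""6.0.0"", ""6.1.0"" ] }";

using JsonDocument doc = JsonDocument.Parse(json);
var versions = doc.RootElement
    .GetProperty("versions")
    .EnumerateArray()
    .Select(e => e.GetString())
    .ToArray();

// Assumption: the file lists versions in ascending order.
Console.WriteLine(versions.Last()); // prints 6.1.0</pre>
</p>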
<p>
This way I managed to install all the NuGet packages I needed. This was fortunate, because I had so little time to transfer to my connecting flight that I never got to open the laptop before I was airborne again - in another E190 without WiFi, and another session of offline programming.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="7387859cda784fdf8612ddb34666d49b">
<div class="comment-author"><a href="https://github.com/asherber">Aaron Sherber</a> <a href="#7387859cda784fdf8612ddb34666d49b">#</a></div>
<div class="comment-content">
<p>
A postscript to your detective story might note that the primary NuGet cache lives at <code>%userprofile%\.nuget\packages</code> on Windows and <code>~/.nuget/packages</code> on Mac and Linux. The folder names there are much easier to decipher than the folders and files in the http cache.
</p>
</div>
<div class="comment-date">2023-01-02 13:46 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Functors as invariant functors
https://blog.ploeh.dk/2022/12/26/functors-as-invariant-functors
2022-12-26T13:05:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A most likely useless set of invariant functors that nonetheless exist.</em>
</p>
<p>
This article is part of <a href="/2022/08/01/invariant-functors">a series of articles about invariant functors</a>. An invariant functor is a <a href="/2018/03/22/functors">functor</a> that is neither covariant nor contravariant. See the series introduction for more details.
</p>
<p>
It turns out that all <a href="/2018/03/22/functors">functors</a> are also invariant functors.
</p>
<p>
Is this useful? Let me be honest and say that if it is, I'm not aware of it. Thus, if you're interested in practical applications, you can stop reading here. This article contains nothing of practical use - as far as I can tell.
</p>
<h3 id="d748d07afe5342e9847e858849053911">
Because it's there <a href="#d748d07afe5342e9847e858849053911" title="permalink">#</a>
</h3>
<p>
Why describe something of no practical use?
</p>
<p>
Why do some people climb <a href="https://en.wikipedia.org/wiki/Mount_Everest">Mount Everest</a>? <em>Because it's there</em>, or for other irrational reasons. Which is fine. I've no personal goals that involve climbing mountains, but <a href="/2020/10/12/subjectivity">I happily engage in other irrational and subjective activities</a>.
</p>
<p>
One of them, apparently, is to write articles about software constructs of no practical use, <em>because it's there</em>.
</p>
<p>
All functors are also invariant functors, even if that's of no practical use. That's just the way it is. This article explains how, and shows a few (useless) examples.
</p>
<p>
I'll start with a few <a href="https://www.haskell.org/">Haskell</a> examples and then move on to showing the equivalent examples in C#. If you're unfamiliar with Haskell, you can skip that section.
</p>
<h3 id="268965f94deb48d79335afea6e9b85e4">
Haskell package <a href="#268965f94deb48d79335afea6e9b85e4" title="permalink">#</a>
</h3>
<p>
For Haskell you can find an existing definition and implementations in the <a href="https://hackage.haskell.org/package/invariant">invariant</a> package. It already makes most common functors <code>Invariant</code> instances, including <code>[]</code> (list), <code>Maybe</code>, and <code>Either</code>. Here's an example of using <code>invmap</code> with a small list:
</p>
<p>
<pre>ghci> invmap secondsToNominalDiffTime nominalDiffTimeToSeconds [0.1, 60]
[0.1s,60s]</pre>
</p>
<p>
Here I'm using the <a href="https://hackage.haskell.org/package/time">time</a> package to convert fixed-point decimals into <code>NominalDiffTime</code> values.
</p>
<p>
How is this different from normal functor mapping with <code>fmap</code>? In observable behaviour, it's not:
</p>
<p>
<pre>ghci> fmap secondsToNominalDiffTime [0.1, 60]
[0.1s,60s]</pre>
</p>
<p>
When invariantly mapping a functor, only the covariant mapping function <code>a -> b</code> is used. Here, that's <code>secondsToNominalDiffTime</code>. The contravariant mapping function <code>b -> a</code> (<code>nominalDiffTimeToSeconds</code>) is simply ignored.
</p>
<p>
While the <em>invariant</em> package already defines certain common functors as <code>Invariant</code> instances, every <code>Functor</code> instance can be converted to an <code>Invariant</code> instance. There are two ways to do that: <code>invmapFunctor</code> and <code>WrappedFunctor</code>.
</p>
<p>
In order to demonstrate, we need a custom <code>Functor</code> instance. This one should do:
</p>
<p>
<pre>data Pair a = Pair (a, a) deriving (Eq, Show, Functor)</pre>
</p>
<p>
If you just want to perform an ad-hoc invariant mapping, you can use <code>invmapFunctor</code>:
</p>
<p>
<pre>ghci> invmapFunctor secondsToNominalDiffTime nominalDiffTimeToSeconds $ Pair (0.1, 60)
Pair (0.1s,60s)</pre>
</p>
<p>
I can't think of any reason to do this, but it's possible.
</p>
<p>
<code>WrappedFunctor</code> is perhaps marginally more relevant. If you run into a function that takes an <code>Invariant</code> argument, you can convert any <code>Functor</code> to an <code>Invariant</code> instance by wrapping it in <code>WrappedFunctor</code>:
</p>
<p>
<pre>ghci> invmap secondsToNominalDiffTime nominalDiffTimeToSeconds $ WrapFunctor $ Pair (0.1, 60)
WrapFunctor {unwrapFunctor = Pair (0.1s,60s)}</pre>
</p>
<p>
A realistic, useful example still escapes me, but there it is.
</p>
<h3 id="2578032e937f4a68997f10b0e2412ffb">
Pair as an invariant functor in C# <a href="#2578032e937f4a68997f10b0e2412ffb" title="permalink">#</a>
</h3>
<p>
What would the above Haskell example look like in C#? First, we're going to need a <code>Pair</code> data structure:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Pair</span><<span style="color:#2b91af;">T</span>>
{
<span style="color:blue;">public</span> <span style="color:#2b91af;">Pair</span>(T x, T y)
{
X = x;
Y = y;
}
<span style="color:blue;">public</span> T X { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> T Y { <span style="color:blue;">get</span>; }
<span style="color:green;">// More members follow...</span></pre>
</p>
<p>
Making <code>Pair<T></code> a functor is so easy that Haskell can do it automatically with the <code>DeriveFunctor</code> extension. In C# you must explicitly write the function:
</p>
<p>
<pre><span style="color:blue;">public</span> Pair<T1> Select<<span style="color:#2b91af;">T1</span>>(Func<T, T1> selector)
{
<span style="color:blue;">return</span> <span style="color:blue;">new</span> Pair<T1>(selector(X), selector(Y));
}</pre>
</p>
<p>
An example equivalent to the above <code>fmap</code> example might be this, here expressed as a unit test:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> FunctorExample()
{
Pair<<span style="color:blue;">long</span>> sut = <span style="color:blue;">new</span> Pair<<span style="color:blue;">long</span>>(
TimeSpan.TicksPerSecond / 10,
TimeSpan.TicksPerSecond * 60);
Pair<TimeSpan> actual = sut.Select(ticks => <span style="color:blue;">new</span> TimeSpan(ticks));
Assert.Equal(
<span style="color:blue;">new</span> Pair<TimeSpan>(
TimeSpan.FromSeconds(.1),
TimeSpan.FromSeconds(60)),
actual);
}</pre>
</p>
<p>
You can trivially make <code>Pair<T></code> an invariant functor by giving it a function equivalent to <code>invmap</code>. As I outlined in <a href="/2022/08/01/invariant-functors">the introduction</a> it's possible to add an <code>InvMap</code> method to the class, but it might be more <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> to instead add a <code>Select</code> overload:
</p>
<p>
<pre><span style="color:blue;">public</span> Pair<T1> Select<<span style="color:#2b91af;">T1</span>>(Func<T, T1> tToT1, Func<T1, T> t1ToT)
{
<span style="color:blue;">return</span> Select(tToT1);
}</pre>
</p>
<p>
Notice that this overload simply ignores the <code>t1ToT</code> argument and delegates to the normal <code>Select</code> overload. That's consistent with the Haskell package. This unit test shows an example:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> InvariantFunctorExample()
{
Pair<<span style="color:blue;">long</span>> sut = <span style="color:blue;">new</span> Pair<<span style="color:blue;">long</span>>(
TimeSpan.TicksPerSecond / 10,
TimeSpan.TicksPerSecond * 60);
Pair<TimeSpan> actual =
sut.Select(ticks => <span style="color:blue;">new</span> TimeSpan(ticks), ts => ts.Ticks);
Assert.Equal(
<span style="color:blue;">new</span> Pair<TimeSpan>(
TimeSpan.FromSeconds(.1),
TimeSpan.FromSeconds(60)),
actual);
}</pre>
</p>
<p>
I can't think of a reason to do this in C#. In Haskell, at least, you have enough power of abstraction to describe something as simply an <code>Invariant</code> functor, and then let client code decide whether to use <code>Maybe</code>, <code>[]</code>, <code>Endo</code>, or a custom type like <code>Pair</code>. You can't do that in C#, so the abstraction is even less useful here.
</p>
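<p>
The same trick applies to any other functor. As a sketch, here is an equally useless invariant mapping for <code>IEnumerable<T></code>. The <code>EnumerableInvariant</code> class and the <code>InvMap</code> extension method are hypothetical names invented for this example; nothing like them ships with .NET.
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Linq;

IEnumerable<long> ticks = new[]
{
    TimeSpan.TicksPerSecond / 10,
    TimeSpan.TicksPerSecond * 60
};

// Only the first function is ever called; the second is ignored.
IEnumerable<TimeSpan> actual =
    ticks.InvMap(t => new TimeSpan(t), ts => ts.Ticks);

Console.WriteLine(string.Join(", ", actual));

public static class EnumerableInvariant
{
    // Hypothetical invariant map: delegates to the ordinary functor map.
    public static IEnumerable<T1> InvMap<T, T1>(
        this IEnumerable<T> source,
        Func<T, T1> tToT1,
        Func<T1, T> t1ToT)
    {
        return source.Select(tToT1);
    }
}</pre>
</p>
<p>
As with <code>Pair<T></code>, the <code>t1ToT</code> argument is accepted only to satisfy the shape of an invariant map.
</p>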
<h3 id="8dbd7bd0f9c64fa5a0ac7c397617521b">
Conclusion <a href="#8dbd7bd0f9c64fa5a0ac7c397617521b" title="permalink">#</a>
</h3>
<p>
All functors are invariant functors. You simply use the normal functor mapping function (<code>fmap</code> in Haskell, <code>map</code> in many other languages, <code>Select</code> in C#). This enables you to add an invariant mapping (<code>invmap</code>) that only uses the covariant argument (<code>a -> b</code>) and ignores the contravariant argument (<code>b -> a</code>).
</p>
<p>
Invariant functors are, however, not particularly useful, so neither is this result. Still, it's there, so deserves a mention. The situation is similar for the next article.
</p>
<p>
<strong>Next:</strong> <a href="/2023/02/06/contravariant-functors-as-invariant-functors">Contravariant functors as invariant functors</a>.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Error-accumulating composable assertions in C#
https://blog.ploeh.dk/2022/12/19/error-accumulating-composable-assertions-in-c
2022-12-19T08:39:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Perhaps the list monoid is all you need for non-short-circuiting assertions.</em>
</p>
<p>
This article is the second instalment in a small article series about <a href="/2022/11/07/applicative-assertions">applicative assertions</a>. It explores a way to compose assertions in such a way that failure messages accumulate rather than short-circuit. It assumes that you've read the <a href="/2022/11/07/applicative-assertions">article series introduction</a> and the <a href="/2022/11/28/an-initial-proof-of-concept-of-applicative-assertions-in-c">previous article</a>.
</p>
<p>
Unsurprisingly, the previous article showed that you can use an <a href="/2018/10/01/applicative-functors">applicative functor</a> to create composable assertions that don't short-circuit. It also concluded that, in C# at least, the API is awkward.
</p>
<p>
This article explores a simpler API.
</p>
<h3 id="dc146cd5b792473db7f7e64cac3c4607">
A clue left by the proof of concept <a href="#dc146cd5b792473db7f7e64cac3c4607" title="permalink">#</a>
</h3>
<p>
The previous article's proof of concept left a clue suggesting a simpler API. Consider, again, how the rather horrible <code>RunAssertions</code> method decides whether or not to throw an exception:
</p>
<p>
<pre><span style="color:blue;">string</span> errors = composition.Match(
onFailure: f => <span style="color:blue;">string</span>.Join(Environment.NewLine, f),
onSuccess: _ => <span style="color:blue;">string</span>.Empty);
<span style="color:blue;">if</span> (!<span style="color:blue;">string</span>.IsNullOrEmpty(errors))
<span style="color:blue;">throw</span> <span style="color:blue;">new</span> Exception(errors);</pre>
</p>
<p>
Even though <code><span style="color:#2b91af;">Validated</span><<span style="color:#2b91af;">F</span>, <span style="color:#2b91af;">S</span>></code> is a <a href="https://en.wikipedia.org/wiki/Tagged_union">sum type</a>, the <code>RunAssertions</code> method declines to take advantage of that. Instead, it reduces <code>composition</code> to a simple type: A <code>string</code>. It then decides to throw an exception if the <code>errors</code> value is not null or empty.
</p>
<p>
This suggests that using a sum type may not be necessary to distinguish between the success and the failure case. Rather, an empty error string is all it takes to indicate success.
</p>
<h3 id="801897c3ffc0455e8e7a7ac7531f2be3">
Non-empty errors <a href="#801897c3ffc0455e8e7a7ac7531f2be3" title="permalink">#</a>
</h3>
<p>
The proof-of-concept assertion type is currently defined as <code>Validated</code> with a particular combination of type arguments: <code>Validated<IReadOnlyCollection<<span style="color:blue;">string</span>>, Unit></code>. Consider, again, this <code>Match</code> expression:
</p>
<p>
<pre><span style="color:blue;">string</span> errors = composition.Match(
onFailure: f => <span style="color:blue;">string</span>.Join(Environment.NewLine, f),
onSuccess: _ => <span style="color:blue;">string</span>.Empty);</pre>
</p>
<p>
Does an empty string unambiguously indicate success? Or is it possible to arrive at an empty string even if <code>composition</code> actually represents a failure case?
</p>
<p>
You can arrive at an empty string from a failure case if the collection of error messages is empty. Consider the type argument that takes the place of the <code>F</code> generic type: <code>IReadOnlyCollection<<span style="color:blue;">string</span>></code>. A collection of this type can be empty, which would also cause the above <code>Match</code> to produce an empty string.
</p>
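<p>
A minimal sketch of that edge case, using only the base class library: joining an empty collection of messages yields an empty string, which is indistinguishable from the value the <code>onSuccess</code> branch returns. The variable names are chosen for this example only.
</p>
<p>
<pre>using System;
using System.Collections.Generic;

// A 'failure' that carries no messages at all.
IReadOnlyCollection<string> noMessages = Array.Empty<string>();

string errors = string.Join(Environment.NewLine, noMessages);

// The failure is now indistinguishable from success.
Console.WriteLine(string.IsNullOrEmpty(errors)); // prints True</pre>
</p>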
<p>
Even so, the proof-of-concept works in practice. The reason it works is that failure cases will never have empty assertion messages. We know this because (in the proof-of-concept code) only two functions produce assertions, and they each populate the error message collection with a string. You may want to revisit the <code>AssertTrue</code> and <code>AssertEqual</code> functions in the <a href="/2022/11/28/an-initial-proof-of-concept-of-applicative-assertions-in-c">previous article</a> to convince yourself that this is true.
</p>
<p>
This is a good example of knowledge that 'we' as developers know, but the code currently doesn't capture. Having to deal with such knowledge taxes your working memory, so why not <a href="/encapsulation-and-solid">encapsulate</a> such information in the type itself?
</p>
<p>
How do you encapsulate the knowledge that a collection is never empty? Introduce a <code>NotEmptyCollection</code> collection. I'll reuse the class from the article <a href="/2017/12/11/semigroups-accumulate">Semigroups accumulate</a> and add a <code>Concat</code> instance method:
</p>
<p>
<pre><span style="color:blue;">public</span> NotEmptyCollection<T> Concat(NotEmptyCollection<T> other)
{
<span style="color:blue;">return</span> <span style="color:blue;">new</span> NotEmptyCollection<T>(Head, Tail.Concat(other).ToArray());
}</pre>
</p>
<p>
Since the two assertion-producing functions both supply an error message in the failure case, it's trivial to change them to return <code>Validated<NotEmptyCollection<<span style="color:blue;">string</span>>, Unit></code> - just change the types used:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> Validated<NotEmptyCollection<<span style="color:blue;">string</span>>, Unit> AssertTrue(
<span style="color:blue;">this</span> <span style="color:blue;">bool</span> condition,
<span style="color:blue;">string</span> message)
{
<span style="color:blue;">return</span> condition
? Succeed<NotEmptyCollection<<span style="color:blue;">string</span>>, Unit>(Unit.Value)
: Fail<NotEmptyCollection<<span style="color:blue;">string</span>>, Unit>(<span style="color:blue;">new</span> NotEmptyCollection<<span style="color:blue;">string</span>>(message));
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> Validated<NotEmptyCollection<<span style="color:blue;">string</span>>, Unit> AssertEqual<<span style="color:#2b91af;">T</span>>(
T expected,
T actual)
{
<span style="color:blue;">return</span> Equals(expected, actual)
? Succeed<NotEmptyCollection<<span style="color:blue;">string</span>>, Unit>(Unit.Value)
: Fail<NotEmptyCollection<<span style="color:blue;">string</span>>, Unit>(
<span style="color:blue;">new</span> NotEmptyCollection<<span style="color:blue;">string</span>>(<span style="color:#a31515;">$"Expected </span>{expected}<span style="color:#a31515;">, but got </span>{actual}<span style="color:#a31515;">."</span>));
}</pre>
</p>
<p>
This change guarantees that the <code>RunAssertions</code> method only produces an empty <code>errors</code> string in success cases.
</p>
<h3 id="e5097dc5d94f4a1480eeaf483ac08336">
Error collection isomorphism <a href="#e5097dc5d94f4a1480eeaf483ac08336" title="permalink">#</a>
</h3>
<p>
Assertions are still defined by the <code>Validated</code> sum type, but the <em>success</em> case carries no information: <code>Validated<NotEmptyCollection<T>, Unit></code>, and the <em>failure</em> case is always guaranteed to contain at least one error message.
</p>
<p>
This suggests that a simpler representation is possible: One that uses a normal collection of errors, and where an empty collection indicates an absence of errors:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Asserted</span><<span style="color:#2b91af;">T</span>>
{
<span style="color:blue;">public</span> <span style="color:#2b91af;">Asserted</span>() : <span style="color:blue;">this</span>(Array.Empty<T>())
{
}
<span style="color:blue;">public</span> <span style="color:#2b91af;">Asserted</span>(T error) : <span style="color:blue;">this</span>(<span style="color:blue;">new</span>[] { error })
{
}
<span style="color:blue;">public</span> <span style="color:#2b91af;">Asserted</span>(IReadOnlyCollection<T> errors)
{
Errors = errors;
}
<span style="color:blue;">public</span> Asserted<T> And(Asserted<T> other)
{
<span style="color:blue;">if</span> (other <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="color:blue;">throw</span> <span style="color:blue;">new</span> ArgumentNullException(nameof(other));
<span style="color:blue;">return</span> <span style="color:blue;">new</span> Asserted<T>(Errors.Concat(other.Errors).ToList());
}
<span style="color:blue;">public</span> IReadOnlyCollection<T> Errors { <span style="color:blue;">get</span>; }
}</pre>
</p>
<p>
The <code><span style="color:#2b91af;">Asserted</span><<span style="color:#2b91af;">T</span>></code> class is scarcely more than a glorified wrapper around a normal collection, but it's isomorphic to <code>Validated<NotEmptyCollection<T>, Unit></code>, which the following two functions prove:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> Asserted<T> FromValidated<<span style="color:#2b91af;">T</span>>(<span style="color:blue;">this</span> Validated<NotEmptyCollection<T>, Unit> v)
{
<span style="color:blue;">return</span> v.Match(
failures => <span style="color:blue;">new</span> Asserted<T>(failures),
_ => <span style="color:blue;">new</span> Asserted<T>());
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> Validated<NotEmptyCollection<T>, Unit> ToValidated<<span style="color:#2b91af;">T</span>>(<span style="color:blue;">this</span> Asserted<T> a)
{
<span style="color:blue;">if</span> (a.Errors.Any())
{
<span style="color:blue;">var</span> errors = <span style="color:blue;">new</span> NotEmptyCollection<T>(
a.Errors.First(),
a.Errors.Skip(1).ToArray());
<span style="color:blue;">return</span> Validated.Fail<NotEmptyCollection<T>, Unit>(errors);
}
<span style="color:blue;">else</span>
<span style="color:blue;">return</span> Validated.Succeed<NotEmptyCollection<T>, Unit>(Unit.Value);
}</pre>
</p>
<p>
You can translate back and forth between <code>Validated<NotEmptyCollection<T>, Unit></code> and <code>Asserted<T></code> without loss of information.
</p>
<p>
A collection, however, gives rise to a <a href="/2017/10/06/monoids">monoid</a>, which suggests a much simpler way to compose assertions than using an applicative functor.
</p>
<h3 id="0ccfc89ad23f4b95932637723ff3c29a">
Asserted truth <a href="#0ccfc89ad23f4b95932637723ff3c29a" title="permalink">#</a>
</h3>
<p>
You can now rewrite the assertion-producing functions to return <code>Asserted<<span style="color:blue;">string</span>></code> instead of <code>Validated<NotEmptyCollection<<span style="color:blue;">string</span>>, Unit></code>.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> Asserted<<span style="color:blue;">string</span>> True(<span style="color:blue;">bool</span> condition, <span style="color:blue;">string</span> message)
{
<span style="color:blue;">return</span> condition ? <span style="color:blue;">new</span> Asserted<<span style="color:blue;">string</span>>() : <span style="color:blue;">new</span> Asserted<<span style="color:blue;">string</span>>(message);
}</pre>
</p>
<p>
This <code>Asserted.True</code> function returns no error messages when <code>condition</code> is <code>true</code>, but a collection with the single element <code>message</code> when it's <code>false</code>.
</p>
<p>
You can use it in a unit test like this:
</p>
<p>
<pre><span style="color:blue;">var</span> assertResponse = Asserted.True(
deleteResp.IsSuccessStatusCode,
<span style="color:#a31515;">$"Actual status code: </span>{deleteResp.StatusCode}<span style="color:#a31515;">."</span>);</pre>
</p>
<p>
You'll see how <code>assertResponse</code> composes with another assertion later in this article. The example continues from <a href="/2022/11/28/an-initial-proof-of-concept-of-applicative-assertions-in-c">the previous article</a>. It's the same test from the same code base.
</p>
<h3 id="519d7e0d757f4dceb6a2833c09d019d8">
Asserted equality <a href="#519d7e0d757f4dceb6a2833c09d019d8" title="permalink">#</a>
</h3>
<p>
You can also rewrite the other assertion-producing function in the same way:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> Asserted<<span style="color:blue;">string</span>> Equal(<span style="color:blue;">object</span> expected, <span style="color:blue;">object</span> actual)
{
<span style="color:blue;">if</span> (Equals(expected, actual))
<span style="color:blue;">return</span> <span style="color:blue;">new</span> Asserted<<span style="color:blue;">string</span>>();
<span style="color:blue;">return</span> <span style="color:blue;">new</span> Asserted<<span style="color:blue;">string</span>>(<span style="color:#a31515;">$"Expected </span>{expected}<span style="color:#a31515;">, but got </span>{actual}<span style="color:#a31515;">."</span>);
}</pre>
</p>
<p>
Again, when the assertion passes, it returns no errors; otherwise, it returns a collection with a single error message.
</p>
<p>
Using it may look like this:
</p>
<p>
<pre><span style="color:blue;">var</span> getResp = <span style="color:blue;">await</span> api.CreateClient().GetAsync(address);
<span style="color:blue;">var</span> assertState = Asserted.Equal(HttpStatusCode.NotFound, getResp.StatusCode);</pre>
</p>
<p>
At this point, each of the assertions is an object that represents a verification step. By themselves, they neither pass nor fail the test. You have to execute them to reach a verdict.
</p>
<h3 id="29825aed26124471803324863ea7d897">
Evaluating assertions <a href="#29825aed26124471803324863ea7d897" title="permalink">#</a>
</h3>
<p>
The above code listing of the <code><span style="color:#2b91af;">Asserted</span><<span style="color:#2b91af;">T</span>></code> class already shows how to combine two <code><span style="color:#2b91af;">Asserted</span><<span style="color:#2b91af;">T</span>></code> objects into one. The <code>And</code> instance method is a binary operation that, together with the parameterless constructor, makes <code><span style="color:#2b91af;">Asserted</span><<span style="color:#2b91af;">T</span>></code> a <a href="/2017/10/06/monoids">monoid</a>.
</p>
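<p>
To make the monoid claim concrete, here's a minimal sketch in Python rather than C#. The <code>Asserted</code> class below, its <code>errors</code> field, and the <code>and_</code> method are hypothetical stand-ins for the C# class, not a translation of it. The sketch only illustrates the two things a monoid needs: an identity element (the assertion with no errors) and an associative binary operation.
</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asserted:
    """Hypothetical stand-in for the C# class: a collection of error messages."""

    errors: tuple = ()

    def and_(self, other: "Asserted") -> "Asserted":
        # The binary operation: concatenate the error messages of both sides.
        return Asserted(self.errors + other.errors)


# The identity element: an assertion that collected no errors.
identity = Asserted()

a = Asserted(("Actual status code: 500.",))
b = Asserted()
c = Asserted(("Expected NotFound, but got OK.",))

# Identity laws: combining with the empty assertion changes nothing.
left_identity_holds = identity.and_(a) == a
right_identity_holds = a.and_(identity) == a

# Associativity: the grouping of the combinations doesn't matter.
associativity_holds = a.and_(b).and_(c) == a.and_(b.and_(c))
```

<p>
Because the operation is associative and has an identity, any number of assertions can be folded into one, regardless of how they're grouped.
</p>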
<p>
Once you've combined all assertions into a single <code><span style="color:#2b91af;">Asserted</span><<span style="color:#2b91af;">T</span>></code> object, you need to <code>Run</code> it to produce a test outcome:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">void</span> Run(<span style="color:blue;">this</span> Asserted<<span style="color:blue;">string</span>> assertions)
{
<span style="color:blue;">if</span> (assertions?.Errors.Any() ?? <span style="color:blue;">false</span>)
{
<span style="color:blue;">var</span> messages = <span style="color:blue;">string</span>.Join(Environment.NewLine, assertions.Errors);
<span style="color:blue;">throw</span> <span style="color:blue;">new</span> Exception(messages);
}
}</pre>
</p>
<p>
If there are no errors, <code>Run</code> does nothing; otherwise it combines all the error messages and throws an exception. As was also the case in the previous article, I've allowed myself a few proof-of-concept shortcuts. <a href="https://learn.microsoft.com/dotnet/standard/design-guidelines/using-standard-exception-types">The framework design guidelines admonish against throwing System.Exception</a>. It might be more appropriate to introduce a new <code>Exception</code> type that also allows enumerating the error messages.
</p>
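<p>
As a rough illustration of that last idea, here's a sketch in Python rather than C#. The names <code>AssertionsFailed</code> and <code>run</code> are hypothetical; nothing like them exists in the code base. The exception keeps the individual messages available for enumeration, while its text is the newline-joined combination.
</p>

```python
import os


class AssertionsFailed(Exception):
    """Hypothetical exception type that keeps the individual error messages."""

    def __init__(self, messages):
        self.messages = tuple(messages)
        # The combined text joins the messages with the platform's newline.
        super().__init__(os.linesep.join(self.messages))


def run(errors):
    """Do nothing when there are no errors; otherwise raise them all at once."""
    messages = tuple(errors)
    if messages:
        raise AssertionsFailed(messages)


try:
    run(["Actual status code: 500.", "Expected NotFound, but got OK."])
except AssertionsFailed as e:
    # The caller can still enumerate each message separately.
    collected = list(e.messages)
```

<p>
A test runner would still report a single failure, but tooling that catches the exception could list each message on its own.
</p>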
<p>
The entire <a href="/2013/06/24/a-heuristic-for-formatting-code-according-to-the-aaa-pattern">assertion phase</a> of the test looks like this:
</p>
<p>
<pre><span style="color:blue;">var</span> assertResponse = Asserted.True(
deleteResp.IsSuccessStatusCode,
<span style="color:#a31515;">$"Actual status code: </span>{deleteResp.StatusCode}<span style="color:#a31515;">."</span>);
<span style="color:blue;">var</span> getResp = <span style="color:blue;">await</span> api.CreateClient().GetAsync(address);
<span style="color:blue;">var</span> assertState = Asserted.Equal(HttpStatusCode.NotFound, getResp.StatusCode);
assertResponse.And(assertState).Run();</pre>
</p>
<p>
You can see the entire test in the previous article. Notice how the two assertion objects are first combined into one with the <code>And</code> binary operation. The result is a single <code>Asserted<<span style="color:blue;">string</span>></code> object on which you can call <code>Run</code>.
</p>
<p>
Like the previous proof of concept, this assertion passes and fails in the same way. It's possible to compose assertions and collect error messages, instead of short-circuiting on the first failure, even without an applicative functor.
</p>
<h3 id="f3d9baf1ad564c05be99b32f0b4a1017">
Method chaining <a href="#f3d9baf1ad564c05be99b32f0b4a1017" title="permalink">#</a>
</h3>
<p>
If you don't like to come up with variable names just to make assertions, it's also possible to use the <code>Asserted</code> API's <a href="https://martinfowler.com/bliki/FluentInterface.html">fluent interface</a>:
</p>
<p>
<pre><span style="color:blue;">var</span> getResp = <span style="color:blue;">await</span> api.CreateClient().GetAsync(address);
Asserted
.True(
deleteResp.IsSuccessStatusCode,
<span style="color:#a31515;">$"Actual status code: </span>{deleteResp.StatusCode}<span style="color:#a31515;">."</span>)
.And(Asserted.Equal(HttpStatusCode.NotFound, getResp.StatusCode))
.Run();</pre>
</p>
<p>
This isn't necessarily better, but it's an option.
</p>
<h3 id="ecabe91c2b9949edbcb4cfed5213ec72">
Conclusion <a href="#ecabe91c2b9949edbcb4cfed5213ec72" title="permalink">#</a>
</h3>
<p>
While it's possible to design non-short-circuiting composable assertions using an applicative functor, it looks as though a simpler solution might solve the same problem. Collect error messages. If none were collected, interpret that as a success.
</p>
<p>
As I wrote in the <a href="/2022/11/07/applicative-assertions">introduction article</a>, however, this may not be the last word. Some assertions return values that can be used for other assertions. That's a scenario that I have not yet investigated in this light, and it may change the conclusion. If so, I'll add more articles to this small article series. As I'm writing this, though, I have no such plans.
</p>
<p>
Did I just, in a roundabout way, write that <em>more research is needed?</em>
</p>
<p>
<strong>Next:</strong> <a href="/2023/01/30/built-in-alternatives-to-applicative-assertions">Built-in alternatives to applicative assertions</a>.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="2aae9a7f4c1145aa92b487925f4b46ba">
<div class="comment-author"><a href="https://ptupitsyn.github.io/">Pavel Tupitsyn</a> <a href="#2aae9a7f4c1145aa92b487925f4b46ba">#</a></div>
<div class="comment-content">
<p>
I think NUnit's <a href="https://docs.nunit.org/articles/nunit/writing-tests/assertions/multiple-asserts.html">Assert.Multiple</a> is worth mentioning in this series. It does not require any complicated APIs, just wrap your existing test with multiple asserts into a delegate.
</p>
</div>
<div class="comment-date">2022-12-20 08:02 UTC</div>
</div>
<div class="comment" id="5b34eb2d997f417bac27738ee7cbb16d">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#5b34eb2d997f417bac27738ee7cbb16d">#</a></div>
<div class="comment-content">
<p>
Pavel, thank you for writing. I'm aware of both that API and similar ones for other testing frameworks. As is usually the case, there are trade-offs to consider. I'm currently working on some material that may turn into another article about that.
</p>
</div>
<div class="comment-date">2022-12-21 20:17 UTC</div>
</div>
<div class="comment" id="f244b81042414b98be67a937cf652648">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#f244b81042414b98be67a937cf652648">#</a></div>
<div class="comment-content">
<p>
A new article is now available: <a href="/2023/01/30/built-in-alternatives-to-applicative-assertions">Built-in alternatives to applicative assertions</a>.
</p>
</div>
<div class="comment-date">2023-01-30 12:31 UTC</div>
</div>
</div>
<hr>
When do tests fail? https://blog.ploeh.dk/2022/12/12/when-do-tests-fail 2022-12-12T08:33:00+00:00 Mark Seemann
<div id="post">
<p>
<em>Optimise for the common scenario.</em>
</p>
<p>
Unit tests occasionally fail. When does that happen? How often? What triggers it? What information is important when tests fail?
</p>
<p>
Regularly I encounter the viewpoint that it should be easy to understand the purpose of a test <em>when it fails</em>. Some people consider test names important, a topic that <a href="/2022/06/13/some-thoughts-on-naming-tests">I've previously discussed</a>. Recently I discussed the <a href="http://xunitpatterns.com/Assertion%20Roulette.html">Assertion Roulette</a> test smell on Twitter, and again I learned some surprising things about what people value in unit tests.
</p>
<h3 id="6673d3ef3f374f849fe7fe38a7b96985">
The importance of clear assertion messages <a href="#6673d3ef3f374f849fe7fe38a7b96985" title="permalink">#</a>
</h3>
<p>
The Assertion Roulette test smell is often simplified to degeneracy, but it really describes situations where it may be a problem if you can't tell which of several assertions actually caused a test to fail.
</p>
<p>
<a href="https://www.joshka.net">Josh McKinney</a> gave a more detailed example than Gerard Meszaros does in <a href="/ref/xunit-patterns">the book</a>:
</p>
<blockquote>
<p>
"Background. In a legacy product, we saw some tests start failing intermittently. They weren’t just flakey, but also failed without providing enough info to fix. One of things which caused time to fix to increase was multiple ways of a single test to fail."
</p>
<footer><cite><a href="https://twitter.com/joshuamck/status/1572528796125003777">Josh McK</a></cite></footer>
</blockquote>
<p>
He goes on:
</p>
<blockquote>
<p>
"I.e. if you fix the first assertion and you know there still could be flakiness, or long cycle times to see the failure. Multiple assertions makes any test problem worse. In an ideal state, they are fine, but every assertion doubles the amount of failures a test catches."
</p>
<footer><cite><a href="https://twitter.com/joshuamck/status/1572529894361534464">Josh McK</a></cite></footer>
</blockquote>
<p>
and concludes:
</p>
<blockquote>
<p>
"the other main way (unrelated) was things like:
</p>
<p>
assertTrue(someListResult.isRmpty())
</p>
<p>
Which tells you what failed, but nothing about how.
</p>
<p>
But the following is worse. You must run the test twice to fix:
</p>
<p>
assertFalse(someList.isEmpty());<br>
assertEqual(expected, list.get(0));"
</p>
<footer><cite><a href="https://twitter.com/joshuamck/status/1572534297403469824">Josh McK</a></cite></footer>
</blockquote>
<p>
The final point is due to the short-circuiting nature of most assertion libraries. That, however, is <a href="/2022/11/07/applicative-assertions">a solvable problem</a>.
</p>
<p>
I find the above a compelling example of why Assertion Roulette may be problematic.
</p>
<p>
It did give me pause, though. How common is this scenario?
</p>
<h3 id="5ae99f4515444bd6ba483beffb819557">
Out of the blue <a href="#5ae99f4515444bd6ba483beffb819557" title="permalink">#</a>
</h3>
<p>
The situation described by Josh McKinney comes with more than a single warning flag. I hope that it's okay to point some of them out. I didn't get the impression from my interaction with Josh McKinney that he considered the situation ideal in any way.
</p>
<p>
First, of course, there's the lack of information about the problem. Here, that's a real problem. As I understand it, it makes it harder to reproduce the problem in a development environment.
</p>
<p>
Next, there are long cycle times, which I interpret to mean that significant time may pass from when you attempt a fix until you can actually observe whether or not it worked. Josh McKinney doesn't say how long, but I wouldn't be surprised if it was measured in days. At least, if the cycle time is measured in days, I can see how this is a problem.
</p>
<p>
Finally, there's the observation that "some tests start failing intermittently". This was the remark that caught my attention. How often does that happen?
</p>
<p>
Tests shouldn't do that. Tests should be deterministic. If they're not, you should work to <a href="https://martinfowler.com/articles/nonDeterminism.html">eradicate non-determinism in tests</a>.
</p>
<p>
I'll be the first to admit that I also write non-deterministic tests. Not by design, but because I make mistakes. I've written many <a href="http://xunitpatterns.com/Erratic%20Test.html">Erratic Tests</a> in my career, and I've documented a few of them here:
</p>
<ul>
<li><a href="/2021/01/11/waiting-to-happen">Waiting to happen</a></li>
<li><a href="/2022/05/23/waiting-to-never-happen">Waiting to never happen</a></li>
<li><a href="/2020/10/05/fortunately-i-dont-squash-my-commits">Fortunately, I don't squash my commits</a></li>
<li><a href="/2016/01/18/make-pre-conditions-explicit-in-property-based-tests">Make pre-conditions explicit in Property-Based Tests</a></li>
</ul>
<p>
While it <em>can</em> happen, it shouldn't be the norm. When it nonetheless happens, eradicating that source of non-determinism should be top priority. Pull the <a href="https://en.wikipedia.org/wiki/Andon_(manufacturing)">andon cord</a>.
</p>
<h3 id="5fd251665441473ca3048a9137f9bc2b">
When tests fail <a href="#5fd251665441473ca3048a9137f9bc2b" title="permalink">#</a>
</h3>
<p>
Ideally, tests should rarely fail. As examined above, you may have Erratic Tests in your test suite, and if you do, these tests will occasionally (or often) fail. As Martin Fowler writes, this is a problem and you should do something about it. He also outlines strategies for it.
</p>
<p>
Once you've eradicated non-determinism in unit tests, then when do tests fail?
</p>
<p>
I can think of a couple of situations.
</p>
<p>
Tests routinely fail as part of the <a href="/2019/10/21/a-red-green-refactor-checklist">red-green-refactor cycle</a>. This is by design. If no test is failing in the <em>red</em> phase, you probably made a mistake (which also regularly <a href="/2019/10/14/tautological-assertion">happens to me</a>), or you may not really be doing test-driven development (TDD).
</p>
<p>
Another situation that may cause a test to fail is if you changed some code and triggered a regression test.
</p>
<p>
In both cases, tests don't just fail <a href="https://amzn.to/3SPdHAO">out of the blue</a>. They fail as an immediate consequence of something you did.
</p>
<h3 id="aaeebfa1a96b47398219817bc3327a9c">
Optimise for the common scenario <a href="#aaeebfa1a96b47398219817bc3327a9c" title="permalink">#</a>
</h3>
<p>
In both cases you're (hopefully) in a tight feedback loop. If you're in a tight feedback loop, then how important is the assertion message really? How important is the test name?
</p>
<p>
You work on the code base, make some changes, run the tests. If one or more tests fail, it's correlated to the change you just made. You should have a good idea of what went wrong. Are code forensics and elaborate documentation really necessary to understand a test that failed because you just did something a few minutes before?
</p>
<p>
The reason I don't care much about test names or whether there's one or more assertions in a unit test is exactly that: When tests fail, it's usually because of something I just did. I don't need diagnostic tools to find the root cause. The root cause is the change that I just made.
</p>
<p>
That's my common scenario, and I try to optimise my processes for the common scenarios.
</p>
<h3 id="21b16c9b2fb649ec859e53b0a1ab431a">
Fast feedback <a href="#21b16c9b2fb649ec859e53b0a1ab431a" title="permalink">#</a>
</h3>
<p>
There's an implied way of working that affects such attitudes. Since I learned about TDD in 2003 I've always relished the fast feedback I get from a test suite. Since I tried continuous deployment around 2014, I consider it central to <a href="/ref/modern-software-engineering">modern software engineering</a> (and <a href="/ref/accelerate">Accelerate</a> strongly suggests so, too).
</p>
<p>
The modus operandi I outline above is one of fast feedback. If you're sitting on a feature branch for weeks before integrating into master, or if you can only deploy two times a year, this influences what works and what doesn't.
</p>
<p>
Both <em>Modern Software Engineering</em> and <em>Accelerate</em> make a strong case that short feedback cycles are pivotal for successful software development organisations.
</p>
<p>
I also understand that that's not the reality for everyone. When faced with long cycle times, a multitude of Erratic Tests, a legacy code base, and so on, other things become important. In those circumstances, tests may fail for different reasons.
</p>
<p>
When you work with TDD, continuous integration (CI), and continuous deployment (CD), then when do tests fail? They fail because you made them fail, only minutes earlier. Fix your code and move forward.
</p>
<h3 id="53cc6ce19bad4bdea3f40fd752f6338d">
Conclusion <a href="#53cc6ce19bad4bdea3f40fd752f6338d" title="permalink">#</a>
</h3>
<p>
When discussing test names and assertion messages, I've been surprised by the emphasis some people put on what I consider to be of secondary importance. I think the explanation is that circumstances differ.
</p>
<p>
With TDD and CI/CD you mostly look at a unit test when you write it, or if some regression test fails because you changed some code (perhaps in response to a test you just wrote). Your test suite may have hundreds or thousands of tests. Most of these pass every time you run the test suite. That's the normal state of affairs.
</p>
<p>
In other circumstances, you may have Erratic Tests that fail unpredictably. You should make it a priority to stop that, but as part of that process, you may need good assertion messages and good test names.
</p>
<p>
Different circumstances call for different reactions, so what works well in one situation may be a liability in other situations. I hope that this article has shed a little light on the forces you may want to consider.
</p>
</div><hr>
GitHub Copilot preliminary experience report https://blog.ploeh.dk/2022/12/05/github-copilot-preliminary-experience-report 2022-12-05T08:37:00+00:00 Mark Seemann
<div id="post">
<p>
<em>Based on a few months of use.</em>
</p>
<p>
I've been evaluating <a href="https://github.com/features/copilot">GitHub Copilot</a> since August 2022. Perhaps it's time to collect my thoughts so far.
</p>
<p>
In short, it's surprisingly good, but also gets a lot of things wrong. It does seem helpful to the experienced programmer, but I don't see it replacing all programmers yet.
</p>
<h3 id="19d8478cc2d5429f9c62138574a30a45">
Not only for boilerplate code <a href="#19d8478cc2d5429f9c62138574a30a45" title="permalink">#</a>
</h3>
<p>
I was initially doubtful. I'd seen some demos where Copilot created fifteen to twenty lines of code to, say, make a REST API request. These examples mostly struck me as auto-generation of something that ought to be a proper abstraction: A method in a reusable library.
</p>
<p>
In general <a href="/2018/09/17/typing-is-not-a-programming-bottleneck">I don't think that typing is a productivity bottleneck</a>, and I'm <a href="/2013/02/04/BewareofProductivityTools">sceptical of productivity tools</a>, and particularly code generators. The more code a code base contains, the more code there is to read. Accelerating code production doesn't strike me as a goal in itself.
</p>
<p>
On the other hand, I'm past fifty and aware that at my age, one often tends to resist new ideas. Perhaps the above justifications are nothing but <a href="https://en.wikipedia.org/wiki/Motivated_reasoning">motivated reasoning</a>. I decided to install the plug-ins and go about my business to see what would happen.
</p>
<p>
It has surprised and delighted me more than once. It's not only for boilerplate code. Here's a recent example.
</p>
<p>
I was doing an exercise in <a href="https://fsharp.org">F#</a> and wrote this type definition:
</p>
<p>
<pre><span style="color:blue;">type</span> Instruction = Enqueue <span style="color:blue;">of</span> int | Dequeue</pre>
</p>
<p>
I wanted to write a property-based test with this type, and since I was using <a href="https://hedgehog.qa/">Hedgehog</a> I needed a generator for the type. While <a href="https://fscheck.github.io/FsCheck/">FsCheck</a> can automatically generate values of custom <a href="https://learn.microsoft.com/dotnet/fsharp/language-reference/discriminated-unions">discriminated unions</a>, Hedgehog has a more <a href="https://peps.python.org/pep-0020/">explicit-is-better-than-implicit</a> design philosophy. In other words, if there's a way to make Hedgehog automatically generate values of a type like the above <code>Instruction</code>, it has so far escaped me.
</p>
<p>
So I started typing <code><span style="color:blue;">let</span> <span style="color:blue;">private</span> genInstruction =</code> after which Copilot suggested the following:
</p>
<p>
<pre><span style="color:blue;">let</span> <span style="color:blue;">private</span> genInstruction =
Gen.frequency
[ 1, Gen.constant Dequeue
1, Gen.int32 (Range.constant 0 10_000) |> Gen.map Enqueue ]</pre>
</p>
<p>
I was, frankly, stunned. While I do understand that Copilot doesn't actually understand the code it suggests, Hedgehog is sufficiently esoteric that I didn't expect Copilot to have enough training data to enable it to make a useful contribution in this niche. I was wrong. I've repeatedly seen Copilot make suggestions to my F# and <a href="https://www.haskell.org/">Haskell</a> code. It's not just for C#, <a href="https://www.javascript.com">JavaScript</a>, or <a href="https://www.python.org/">Python</a> code.
</p>
<p>
The above suggestion was, to be clear, absolutely appropriate and compiled right away. The only detail I changed was the <code>Range</code>, which I replaced with <code>Range.linear</code>. That's not, however, a significant change.
</p>
<p>
Perhaps you're not impressed by three lines of auto-generated code. How much of a productivity improvement is that? Quite a bit, in my case.
</p>
<p>
It wouldn't have taken me long to type those three lines of code, but as I already mentioned, <a href="/2018/09/17/typing-is-not-a-programming-bottleneck">typing isn't a bottleneck</a>. On the other hand, looking up an unfamiliar API can take some time. <a href="/ref/programmers-brain">The Programmer's Brain</a> discusses this kind of problem and suggests exercises to address it. Does Copilot offer a shortcut?
</p>
<p>
While I couldn't remember the details of Hedgehog's API, once I saw the suggestion, I recognised <code>Gen.frequency</code>, so I understood it as an appropriate code suggestion. The productivity gain, if there is one, may come from saving you the effort of looking up unfamiliar APIs, rather than saving you some keystrokes.
</p>
<p>
In this example, I already knew of the <code>Gen.frequency</code> function - I just couldn't recall the exact name and type. This enabled me to evaluate Copilot's suggestion and deem it correct. If I hadn't known that API already, how could I have known whether to trust Copilot?
</p>
<h3 id="424f6a4928d945d7bcb13f2de8f3a98e">
Detectably wrong suggestions <a href="#424f6a4928d945d7bcb13f2de8f3a98e" title="permalink">#</a>
</h3>
<p>
As amazing as Copilot can be, it's hardly faultless. It makes many erroneous suggestions. Sometimes the suggestion is obviously wrong. If you accept it, it doesn't compile. Sometimes the suggestion is only a small edit away from compiling, but at least in such situations you'll be explicitly aware that the suggestion couldn't be used verbatim.
</p>
<p>
Other suggestions are wrong, but less conspicuously so. Here's an example.
</p>
<p>
I was recently subjecting the code base that accompanies <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a> to the <a href="https://en.wikipedia.org/wiki/Mutation_testing">mutation testing</a> tool <a href="https://stryker-mutator.io/">Stryker</a>. Since it did point out a few possible mutations, I decided to add a few tests. One was of a wrapper class called <code>TimeOfDay</code>. Because of static code analysis rules, it came with conversions to and from <a href="https://learn.microsoft.com/dotnet/api/system.timespan">TimeSpan</a>, but these methods weren't covered by any tests.
</p>
<p>
In order to remedy that situation, I started writing an FsCheck property and came as far as:
</p>
<p>
<pre>[Property]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">ConversionsRoundTrip</span>(TimeSpan <span style="color:#1f377f;">timeSpan</span>)</pre>
</p>
<p>
At that point Copilot suggested the following, which I accepted:
</p>
<p>
<pre>[Property]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">ConversionsRoundTrip</span>(TimeSpan <span style="color:#1f377f;">timeSpan</span>)
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">timeOfDay</span> = <span style="color:blue;">new</span> TimeOfDay(timeSpan);
<span style="color:blue;">var</span> <span style="color:#1f377f;">actual</span> = (TimeSpan)timeOfDay;
Assert.Equal(timeSpan, actual);
}</pre>
</p>
<p>
Looks good, doesn't it? Again, I was impressed. It compiled, and it even looks as though Copilot had picked up one of my naming conventions: <a href="/2020/11/30/name-by-role">naming variables by role</a>, in this case <code>actual</code>.
</p>
<p>
While I tend to be on guard, I immediately ran the test suite instead of thinking it through. It failed. Keep in mind that this is a <a href="https://en.wikipedia.org/wiki/Characterization_test">characterisation test</a>, so it was supposed to pass.
</p>
<p>
The <code>TimeOfDay</code> constructor reveals why:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">TimeOfDay</span>(TimeSpan <span style="color:#1f377f;">durationSinceMidnight</span>)
{
<span style="color:#8f08c4;">if</span> (durationSinceMidnight < TimeSpan.Zero ||
TimeSpan.FromHours(24) < durationSinceMidnight)
<span style="color:#8f08c4;">throw</span> <span style="color:blue;">new</span> ArgumentOutOfRangeException(
nameof(durationSinceMidnight),
<span style="color:#a31515;">"Please supply a TimeSpan between 0 and 24 hours."</span>);
<span style="color:blue;">this</span>.durationSinceMidnight = durationSinceMidnight;
}</pre>
</p>
<p>
While FsCheck knows how to generate <code>TimeSpan</code> values, it'll generate arbitrary durations, including negative values and spans much longer than 24 hours. That explains why the test fails.
</p>
<p>
Granted, this is hardly a searing indictment against Copilot. After all, I could have made this mistake myself.
</p>
<p>
Still, that prompted me to look for more issues with the code that Copilot had suggested. Another problem with the code is that it tests the wrong API. The suggested test tries to round-trip via the <code>TimeOfDay</code> class' explicit cast operators, which were already covered by tests. Well, I might eventually have discovered that, too. Keep in mind that I was adding this test to improve the code base's Stryker score. After running the tool again, I would probably eventually have discovered that the score didn't improve. It takes Stryker around 25 minutes to test this code base, though, so it wouldn't have been rapid feedback.
</p>
<p>
Since, however, I examined the code with a critical eye, I noticed this by myself. This would clearly require changing the test code as well.
</p>
<p>
In the end, I wrote this test:
</p>
<p>
<pre>[Property]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">ConversionsRoundTrip</span>(TimeSpan <span style="color:#1f377f;">timeSpan</span>)
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">expected</span> = ScaleToTimeOfDay(timeSpan);
<span style="color:blue;">var</span> <span style="color:#1f377f;">sut</span> = TimeOfDay.ToTimeOfDay(expected);
<span style="color:blue;">var</span> <span style="color:#1f377f;">actual</span> = TimeOfDay.ToTimeSpan(sut);
Assert.Equal(expected, actual);
}
<span style="color:blue;">private</span> <span style="color:blue;">static</span> TimeSpan <span style="color:#74531f;">ScaleToTimeOfDay</span>(TimeSpan <span style="color:#1f377f;">timeSpan</span>)
{
<span style="color:green;">// Convert an arbitrary TimeSpan to a 24-hour TimeSpan.</span>
<span style="color:green;">// The data structure that underlies TimeSpan is a 64-bit integer,</span>
<span style="color:green;">// so first we need to identify the range of possible TimeSpan</span>
<span style="color:green;">// values. It might be easier to understand to calculate</span>
<span style="color:green;">// TimeSpan.MaxValue - TimeSpan.MinValue, but that underflows.</span>
<span style="color:green;">// Instead, the number of possible 64-bit integer values is the same</span>
<span style="color:green;">// as the number of possible unsigned 64-bit integer values.</span>
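<span style="color:blue;">var</span> <span style="color:#1f377f;">range</span> = (<span style="color:blue;">double</span>)<span style="color:blue;">ulong</span>.MaxValue;
<span style="color:blue;">var</span> <span style="color:#1f377f;">domain</span> = TimeSpan.FromHours(24).Ticks;
<span style="color:green;">// Divide as floating point; integer division would truncate the scale to 0.</span>
<span style="color:blue;">var</span> <span style="color:#1f377f;">scale</span> = domain / range;
<span style="color:green;">// Shift by long.MinValue so that negative inputs also map to [0, 24h].</span>
<span style="color:blue;">var</span> <span style="color:#1f377f;">offset</span> = timeSpan.Ticks - (<span style="color:blue;">double</span>)<span style="color:blue;">long</span>.MinValue;
<span style="color:blue;">var</span> <span style="color:#1f377f;">expected</span> = TimeSpan.FromTicks((<span style="color:blue;">long</span>)(offset * scale));
<span style="color:#8f08c4;">return</span> expected;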
}</pre>
</p>
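<p>
The scaling idea described in the code comments can also be sketched with exact integer arithmetic. The following is Python, not C#, and all names in it are hypothetical. It assumes that a <code>TimeSpan</code> is a signed 64-bit count of 100-nanosecond ticks, shifts the input so that the smallest possible value becomes zero, and then scales the result into the range from 0 to 24 hours.
</p>

```python
TICKS_PER_SECOND = 10_000_000  # assumption: one tick is 100 nanoseconds
TICKS_PER_DAY = 24 * 60 * 60 * TICKS_PER_SECOND
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def scale_to_time_of_day(ticks: int) -> int:
    """Map any signed 64-bit tick count to a tick count between 0 and 24 hours."""
    # Shift so that the smallest input becomes 0 and the largest 2**64 - 1.
    offset = ticks - INT64_MIN
    # Multiply before dividing; dividing first would truncate the scale to 0.
    return offset * TICKS_PER_DAY // (2**64 - 1)


lowest = scale_to_time_of_day(INT64_MIN)
highest = scale_to_time_of_day(INT64_MAX)
```

<p>
With this mapping, the smallest input lands on midnight and the largest on exactly 24 hours, so every generated value satisfies the <code>TimeOfDay</code> constructor's precondition.
</p>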
<p>
In this case, Copilot didn't improve my productivity. It may actually have slowed me down a bit.
</p>
<p>
This time, it wasn't too difficult to spot issues with the suggested code. What if the problems are more subtle?
</p>
<h3 id="02cb23fc57394a1c963a2f4ebe75ec48">
Errors that are difficult to detect <a href="#02cb23fc57394a1c963a2f4ebe75ec48" title="permalink">#</a>
</h3>
<p>
How do bugs appear? We write them, thinking that our code is going to do one thing, while the compiler decides otherwise. Even when we actively engage with the code, we may be looking at a bug and still fail to see it.
</p>
<p>
Why should we trust Copilot to produce bug-free code?
</p>
<p>
Here's another example. I was going through the <a href="https://codingdojo.org/kata/Range/">Range kata</a> in F# and made my <code>Endpoint</code> discriminated union a <a href="/2018/03/22/functors">functor</a>:
</p>
<p>
<pre><span style="color:blue;">type</span> Endpoint<'a> = Open <span style="color:blue;">of</span> 'a | Closed <span style="color:blue;">of</span> 'a
<span style="color:blue;">module</span> Endpoint =
<span style="color:blue;">let</span> map f = <span style="color:blue;">function</span>
| Open x <span style="color:blue;">-></span> Open (f x)
| Closed x <span style="color:blue;">-></span> Closed (f x)</pre>
</p>
<p>
Making a type a functor is such a well-understood process that <a href="https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/compiler/derive-functor">Haskell can automate it</a>. I knew how to write the above <code>map</code> function as well, so I wrote it without testing it.
</p>
<p>
After I'd written the <code>map</code> function, I decided to add tests of the two functor laws. Not because I doubted the implementation, but because I might need that as example code later.
</p>
<p>
Copilot suggested this test after I named it:
</p>
<p>
<pre><span style="color:blue;">open</span> Xunit
<span style="color:blue;">open</span> Hedgehog
[<Fact>]
<span style="color:blue;">let</span> ``First functor law`` () = Property.check <| property {
    <span style="color:blue;">let!</span> x = Gen.int32 (Range.linearBounded ())
    <span style="color:blue;">let</span> f = (<span style="color:blue;">fun</span> x <span style="color:blue;">-></span> x + 1)
    <span style="color:blue;">let</span> g = (<span style="color:blue;">fun</span> x <span style="color:blue;">-></span> x * 2)
    <span style="color:blue;">let</span> actual = Endpoint.map f (Endpoint.map g (Open x))
    <span style="color:blue;">let</span> expected = Endpoint.map (f >> g) (Open x)
    Assert.Equal(expected, actual) }</pre>
</p>
<p>
It compiles. It also fails.
</p>
<p>
What's wrong with it?
</p>
<p>
If you know the functor laws by heart, you'll be able to spot that this is clearly not the first functor law. On the other hand, it looks convincingly like the second functor law. Should I just change the name and move on?
</p>
<p>
I can't, though, since the test fails. Could there be a bug in my <code>map</code> function, after all?
</p>
<p>
No, there's an error in the test. I invite you to spot it.
</p>
<p>
In terms of keystrokes, it's easy to fix the problem:
</p>
<p>
<pre><span style="color:blue;">open</span> Xunit
<span style="color:blue;">open</span> Hedgehog
[<Fact>]
<span style="color:blue;">let</span> ``First functor law`` () = Property.check <| property {
    <span style="color:blue;">let!</span> x = Gen.int32 (Range.linearBounded ())
    <span style="color:blue;">let</span> f = (<span style="color:blue;">fun</span> x <span style="color:blue;">-></span> x + 1)
    <span style="color:blue;">let</span> g = (<span style="color:blue;">fun</span> x <span style="color:blue;">-></span> x * 2)
    <span style="color:blue;">let</span> actual = Endpoint.map f (Endpoint.map g (Open x))
    <span style="color:blue;">let</span> expected = Endpoint.map (f << g) (Open x)
    Assert.Equal(expected, actual) }</pre>
</p>
<p>
Spot the edit. I bet it'll take you longer to find it than it took me to type it.
</p>
<p>
The test now passes, but for one who has spent less time worrying over functor laws than I have, troubleshooting this could have taken a long time.
</p>
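<p>
The difference between the two composition operators may be easier to see outside of F#. The following is only a sketch: <code>Then</code> is a hypothetical extension method, introduced here to play the role of F#'s <code>>></code> operator, and the functions <code>f</code> and <code>g</code> mirror the ones in the test.
</p>

```csharp
using System;

public static class Composition
{
    // Forward composition, like F#'s >> operator: apply 'first', then 'second'.
    public static Func<A, C> Then<A, B, C>(this Func<A, B> first, Func<B, C> second)
    {
        return x => second(first(x));
    }

    public static void Main()
    {
        Func<int, int> f = x => x + 1;
        Func<int, int> g = x => x * 2;

        // Mapping g and then f, as the 'actual' value does: f(g(3)).
        Console.WriteLine(f(g(3)));      // 7
        // f >> g applies f first: g(f(3)).
        Console.WriteLine(f.Then(g)(3)); // 8
        // f << g is the same as g >> f, which applies g first: f(g(3)).
        Console.WriteLine(g.Then(f)(3)); // 7
    }
}
```

<p>
Mapping <code>g</code> and then <code>f</code> corresponds to the composition that runs <code>g</code> first, which is why only the backward composition agrees with the nested <code>map</code> calls.
</p>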
<p>
These almost-right suggestions from Copilot both worry me and give me hope.
</p>
<h3 id="7c90d2bd03054906a5f2505baaf78a31">
Copilot for experienced programmers <a href="#7c90d2bd03054906a5f2505baaf78a31" title="permalink">#</a>
</h3>
<p>
When a new technology like Copilot appears, it's natural to speculate on the consequences. <em>Does this mean that programmers will lose their jobs?</em>
</p>
<p>
This is just a preliminary evaluation after a few months, so I could be wrong, but I think we programmers are safe. If you're experienced, you'll be able to tell most of Copilot's hits from its misses. Perhaps you'll get a productivity improvement out of it, but it could also slow you down.
</p>
<p>
The tool is likely to improve over time, so I'm hopeful that this could become a net productivity gain. Still, with this high an error rate, I'm not too worried yet.
</p>
<p>
<a href="/ref/pragmatic-programmer">The Pragmatic Programmer</a> describes a programming style named <em>Programming by Coincidence</em>. People who develop software this way have only a partial understanding of the code they write.
</p>
<blockquote>
<p>
"Fred doesn't know why the code is failing because <em>he didn't know why it worked in the first place.</em>"
</p>
<footer><cite>Andy Hunt and Dave Thomas, <a href="/ref/pragmatic-programmer">The Pragmatic Programmer</a></cite></footer>
</blockquote>
<p>
I've encountered my fair share of these people. When editing code, they make small adjustments and do cursory manual testing until 'it looks like it works'. If they have to start a new feature or are otherwise faced with a metaphorical blank page, they'll copy some code from somewhere else and use that as a starting point.
</p>
<p>
You'd think that Copilot could enhance the productivity of such people, but I'm not sure. It might actually slow them down. These people don't fully understand the code they themselves 'write', so why should we expect them to understand the code that Copilot suggests?
</p>
<p>
If faced with a Copilot suggestion that 'almost works', will they be able to spot if it's a genuinely good suggestion, or whether it's off, like I've described above? If the Copilot code doesn't work, how much time will they waste thrashing?
</p>
<h3 id="f3e17cbf9dbc41f19ae46974d2f28a90">
Conclusion <a href="#f3e17cbf9dbc41f19ae46974d2f28a90" title="permalink">#</a>
</h3>
<p>
GitHub Copilot has the potential to be a revolutionary technology, but it's not, yet. So far, I'm not too worried. It's an assistant, like a pairing partner, but it's up to you to evaluate whether the code that Copilot suggests is useful, correct, and safe. How can you do that unless you already know what you're doing?
</p>
<p>
If you don't have the qualifications to evaluate the suggested code, I fail to see how it's going to help you. Granted, it does have potential to help you move on in less time than you would otherwise have spent. In this article, I showed one example where I would have had to spend significant time looking up API documentation. Instead, Copilot suggested the correct code to use.
</p>
<p>
Pulling in the other direction are the many <a href="https://en.wikipedia.org/wiki/False_positives_and_false_negatives">false positives</a>. Copilot makes many suggestions, and many of them are poor. The ones that are recognisably bad are unlikely to slow you down. I'm more concerned with those that are subtly wrong. They have the potential to waste much time.
</p>
<p>
Which of these forces is strongest? The potential for wasting time is infinite, while the maximum productivity gain you can achieve is 100 percent. That's an asymmetric distribution. There's a long tail of time wasters, but there's no equivalent long tail of improvement.
</p>
<p>
I'm not, however, trying to be pessimistic. I expect to keep Copilot around for the time being. It could very well be here to stay. Used correctly, it seems useful.
</p>
<p>
Is it going to replace programmers? Hardly. Rather, it may enable poor developers to make such a mess of things that you need even more good programmers to subsequently fix things.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
An initial proof of concept of applicative assertions in C#
https://blog.ploeh.dk/2022/11/28/an-initial-proof-of-concept-of-applicative-assertions-in-c
2022-11-28T06:47:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Worthwhile? Not obviously.</em>
</p>
<p>
This article is the first instalment in a small article series about <a href="/2022/11/07/applicative-assertions">applicative assertions</a>. It explores a way to compose assertions in such a way that failure messages accumulate rather than short-circuit. It assumes that you've read the <a href="/2022/11/07/applicative-assertions">article series introduction</a>.
</p>
<p>
Assertions are typically based on throwing exceptions. As soon as one assertion fails, an exception is thrown and no further assertions are evaluated. This is normal short-circuiting behaviour of exceptions. In some cases, however, it'd be useful to keep evaluating other assertions and collect error messages.
</p>
<p>
This article series explores <a href="https://twitter.com/lucasdicioccio/status/1572264819109003265">an intriguing idea</a> to address such issues: Use an <a href="/2018/10/01/applicative-functors">applicative functor</a> to collect multiple assertion messages. I started experimenting with the idea to see where it would lead. The article series serves as a report of what I found. It is neither a recommendation nor a caution. I still find the idea interesting, but I'm not sure whether the complexity is warranted.
</p>
<h3 id="877c6f5ee5b34ecf990d1126b8139720">
Example scenario <a href="#877c6f5ee5b34ecf990d1126b8139720" title="permalink">#</a>
</h3>
<p>
A realistic example is often illustrative, although there's a risk that the realism carries with it some noise that detracts from the core of the matter. I'll reuse an example that I've <a href="https://stackoverflow.blog/2022/11/03/multiple-assertions-per-test-are-fine/">already discussed and explained in greater detail</a>. The code is from the code base that accompanies my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>.
</p>
<p>
This test has two independent assertions:
</p>
<p>
<pre>[Theory]
[InlineData(884, 18, 47, <span style="color:#a31515;">"c@example.net"</span>, <span style="color:#a31515;">"Nick Klimenko"</span>, 2)]
[InlineData(902, 18, 50, <span style="color:#a31515;">"emot@example.gov"</span>, <span style="color:#a31515;">"Emma Otting"</span>, 5)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task DeleteReservation(
<span style="color:blue;">int</span> days, <span style="color:blue;">int</span> hours, <span style="color:blue;">int</span> minutes, <span style="color:blue;">string</span> email, <span style="color:blue;">string</span> name, <span style="color:blue;">int</span> quantity)
{
<span style="color:blue;">using</span> <span style="color:blue;">var</span> api = <span style="color:blue;">new</span> LegacyApi();
<span style="color:blue;">var</span> at = DateTime.Today.AddDays(days).At(hours, minutes)
.ToIso8601DateTimeString();
<span style="color:blue;">var</span> dto = Create.ReservationDto(at, email, name, quantity);
<span style="color:blue;">var</span> postResp = <span style="color:blue;">await</span> api.PostReservation(dto);
Uri address = FindReservationAddress(postResp);
<span style="color:blue;">var</span> deleteResp = <span style="color:blue;">await</span> api.CreateClient().DeleteAsync(address);
Assert.True(
deleteResp.IsSuccessStatusCode,
<span style="color:#a31515;">$"Actual status code: </span>{deleteResp.StatusCode}<span style="color:#a31515;">."</span>);
<span style="color:blue;">var</span> getResp = <span style="color:blue;">await</span> api.CreateClient().GetAsync(address);
Assert.Equal(HttpStatusCode.NotFound, getResp.StatusCode);
}</pre>
</p>
<p>
The test exercises the REST API to first create a reservation, then delete it, and finally check that the reservation no longer exists. Two independent postconditions must be true for the test to pass:
</p>
<ul>
<li>The <code>DELETE</code> request must result in a status code that indicates success.</li>
<li>The resource must no longer exist.</li>
</ul>
<p>
It's conceivable that a bug might fail one of these without invalidating the other.
</p>
<p>
As the test is currently written, it uses <a href="https://xunit.net/">xUnit.net</a>'s standard assertion library. If the <code>Assert.True</code> verification fails, the <code>Assert.Equal</code> statement isn't evaluated.
</p>
<h3 id="cd132aaee5484068a890cc4b7995bed3">
Assertions as validations <a href="#cd132aaee5484068a890cc4b7995bed3" title="permalink">#</a>
</h3>
<p>
Is it possible to evaluate the <code>Assert.Equal</code> postcondition even if the first assertion fails? You could use a <code>try/catch</code> block, but is there a more composable and elegant option? How about an applicative functor?
</p>
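<p>
For comparison, the <code>try/catch</code> option might look something like the following. This is only a sketch under stated assumptions: <code>RunAll</code> is a hypothetical helper, not part of xUnit.net or of the code base discussed here, and the exception messages are made up for the example.
</p>

```csharp
using System;
using System.Collections.Generic;

public static class TryCatchAssertions
{
    // Runs every check, even if an earlier one throws,
    // and collects the exception messages.
    public static IReadOnlyCollection<string> RunAll(params Action[] checks)
    {
        var errors = new List<string>();
        foreach (var check in checks)
        {
            try
            {
                check();
            }
            catch (Exception e)
            {
                errors.Add(e.Message);
            }
        }
        return errors;
    }

    public static void Main()
    {
        var errors = RunAll(
            () => { throw new InvalidOperationException("Actual status code: BadRequest."); },
            () => { throw new InvalidOperationException("Expected NotFound, but got OK."); },
            () => { });

        // Prints both messages, one per line.
        Console.WriteLine(string.Join(Environment.NewLine, errors));
    }
}
```

<p>
It works, but each assertion has to be wrapped in a delegate, and the helper catches every exception type indiscriminately, which is part of why a more composable alternative is worth exploring.
</p>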
<p>
Since I was interested in exploring this question as a proof of concept, I decided to reuse the machinery that I'd already put in place for the article <a href="/2022/07/25/an-applicative-reservation-validation-example-in-c">An applicative reservation validation example in C#</a>: The <code>Validated</code> class and its associated functions. In a sense, you can think of an assertion as a validation of a postcondition.
</p>
<p>
This is not a resemblance I intend to carry too far. What I learn by experimenting with <code>Validated</code> I can apply to a more appropriately-named class like <code>Asserted</code>.
</p>
<p>
Neither of the two above assertions returns a value; they are one-stop assertions. If they succeed, they return nothing; if they fail, they produce an error.
</p>
<p>
It's possible to model this kind of behaviour with <code>Validated</code>. You can model a collection of errors with, well, a collection. To keep the proof of concept simple, I decided to use a collection of strings: <code>IReadOnlyCollection<<span style="color:blue;">string</span>></code>. To model 'nothing' I had to add a <a href="/2018/01/15/unit-isomorphisms">unit</a> type:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Unit</span>
{
<span style="color:blue;">private</span> <span style="color:#2b91af;">Unit</span>() { }
<span style="color:blue;">public</span> <span style="color:blue;">readonly</span> <span style="color:blue;">static</span> Unit Value = <span style="color:blue;">new</span> Unit();
}</pre>
</p>
<p>
This enabled me to define assertions as <code>Validated<IReadOnlyCollection<<span style="color:blue;">string</span>>, Unit></code> values: Either a collection of error messages, or nothing.
</p>
<h3 id="5ac18fb532c04876b00f7de080905a16">
Asserting truth <a href="#5ac18fb532c04876b00f7de080905a16" title="permalink">#</a>
</h3>
<p>
Instead of xUnit.net's <code>Assert.True</code>, you can now define an equivalent function:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> Validated<IReadOnlyCollection<<span style="color:blue;">string</span>>, Unit> AssertTrue(
<span style="color:blue;">this</span> <span style="color:blue;">bool</span> condition,
<span style="color:blue;">string</span> message)
{
<span style="color:blue;">return</span> condition
? Succeed<IReadOnlyCollection<<span style="color:blue;">string</span>>, Unit>(Unit.Value)
: Fail<IReadOnlyCollection<<span style="color:blue;">string</span>>, Unit>(<span style="color:blue;">new</span>[] { message });
}</pre>
</p>
<p>
It simply returns a <code>Success</code> value containing nothing when <code>condition</code> is <code>true</code>, and otherwise a <code>Failure</code> value containing the error <code>message</code>.
</p>
<p>
You can use it like this:
</p>
<p>
<pre><span style="color:blue;">var</span> assertResponse = Validated.AssertTrue(
deleteResp.IsSuccessStatusCode,
<span style="color:#a31515;">$"Actual status code: </span>{deleteResp.StatusCode}<span style="color:#a31515;">."</span>);</pre>
</p>
<p>
Later in the article you'll see how this assertion combines with another assertion.
</p>
<h3 id="adc98673e4734918ac983a5bbb520e15">
Asserting equality <a href="#adc98673e4734918ac983a5bbb520e15" title="permalink">#</a>
</h3>
<p>
Instead of xUnit.net's <code>Assert.Equal</code>, you can also define a function that works the same way but returns a <code>Validated</code> value:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> Validated<IReadOnlyCollection<<span style="color:blue;">string</span>>, Unit> AssertEqual<<span style="color:#2b91af;">T</span>>(
T expected,
T actual)
{
<span style="color:blue;">return</span> Equals(expected, actual)
? Succeed<IReadOnlyCollection<<span style="color:blue;">string</span>>, Unit>(Unit.Value)
: Fail<IReadOnlyCollection<<span style="color:blue;">string</span>>, Unit>(<span style="color:blue;">new</span>[]
{ <span style="color:#a31515;">$"Expected </span>{expected}<span style="color:#a31515;">, but got </span>{actual}<span style="color:#a31515;">."</span> });
}</pre>
</p>
<p>
The <code>AssertEqual</code> function first uses <a href="https://learn.microsoft.com/dotnet/api/system.object.equals">Equals</a> to compare <code>expected</code> with <code>actual</code>. If the result is <code>true</code>, the function returns a <code>Success</code> value containing nothing; otherwise, it returns a <code>Failure</code> value containing a failure message. Since this is only a proof of concept, the failure message is useful, but minimal.
</p>
<p>
Notice that this function returns a value of the same type (<code>Validated<IReadOnlyCollection<<span style="color:blue;">string</span>>, Unit></code>) as <code>AssertTrue</code>.
</p>
<p>
You can use the function like this:
</p>
<p>
<pre><span style="color:blue;">var</span> assertState = Validated.AssertEqual(HttpStatusCode.NotFound, getResp.StatusCode);</pre>
</p>
<p>
Again, you'll see how to combine this assertion with the above <code>assertResponse</code> value later in this article.
</p>
<h3 id="1d29ecff5ea64911918168cb985ed6b5">
Evaluating assertions <a href="#1d29ecff5ea64911918168cb985ed6b5" title="permalink">#</a>
</h3>
<p>
The <code>DeleteReservation</code> test only has two independent assertions, so in my proof of concept, all I needed to do was to figure out a way to combine two applicative assertions into one, and then evaluate it. This rather horrible method does that:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">void</span> RunAssertions(
Validated<IReadOnlyCollection<<span style="color:blue;">string</span>>, Unit> assertion1,
Validated<IReadOnlyCollection<<span style="color:blue;">string</span>>, Unit> assertion2)
{
<span style="color:blue;">var</span> f = Succeed<IReadOnlyCollection<<span style="color:blue;">string</span>>, Func<Unit, Unit, Unit>>((_, __) => Unit.Value);
Func<IReadOnlyCollection<<span style="color:blue;">string</span>>, IReadOnlyCollection<<span style="color:blue;">string</span>>, IReadOnlyCollection<<span style="color:blue;">string</span>>>
combine = (x, y) => x.Concat(y).ToArray();
Validated<IReadOnlyCollection<<span style="color:blue;">string</span>>, Unit> composition = f
.Apply(assertion1, combine)
.Apply(assertion2, combine);
<span style="color:blue;">string</span> errors = composition.Match(
onFailure: f => <span style="color:blue;">string</span>.Join(Environment.NewLine, f),
onSuccess: _ => <span style="color:blue;">string</span>.Empty);
<span style="color:blue;">if</span> (!<span style="color:blue;">string</span>.IsNullOrEmpty(errors))
<span style="color:blue;">throw</span> <span style="color:blue;">new</span> Exception(errors);
}</pre>
</p>
<p>
C# doesn't have good language features for applicative functors the same way that <a href="https://fsharp.org/">F#</a> and <a href="https://www.haskell.org/">Haskell</a> do, and although you can use various tricks to make the programming experience better than what is on display here, I was still doing a proof of concept. If it turns out that this approach is useful and warranted, we can introduce some of the facilities to make the API more palatable. For now, though, we're dealing with all the rough edges.
</p>
<p>
The way that applicative functors work, you typically use a 'lifted' function to combine two (or more) 'lifted' values. Here, 'lifted' means 'being inside the <code>Validated</code> <a href="https://bartoszmilewski.com/2014/01/14/functors-are-containers/">container</a>'.
</p>
<p>
Each of the assertions that I want to combine has the same type: <code>Validated<IReadOnlyCollection<<span style="color:blue;">string</span>>, Unit></code>. Notice that the <code>S</code> (<em>success</em>) generic type argument is <code>Unit</code> in both cases. While it seems redundant, formally I needed a 'lifted' function to combine two <code>Unit</code> values into a single value. This single value can (in principle) have any type I'd like it to have, but since you can't extract any information out of a <code>Unit</code> value, it makes sense to use the <a href="/2018/01/15/unit-isomorphisms#4657dbd4b5fc4abda6e8dee2cac67ea9">monoidal nature of unit</a> to combine two into one.
</p>
<p>
Basically, you just ignore the <code>Unit</code> input values because they carry no information. Also, they're all the same value anyway, since the type is a <a href="https://en.wikipedia.org/wiki/Singleton_pattern">Singleton</a>. In its 'naked' form, the function might be implemented like this: <code>(_, __) => Unit.Value</code>. Due to the <a href="/2019/12/16/zone-of-ceremony">ceremony</a> required by the combination of C# and applicative functors, however, this <a href="/2017/10/06/monoids">monoidal</a> binary operation has to be 'lifted' to a <code>Validated</code> value. That's the <code>f</code> value in the <code>RunAssertions</code> function body.
</p>
<p>
The <code>Validated.Apply</code> function requires as an argument a function that combines the generic <code>F</code> (<em>failure</em>) values into one, in order to deal with the case where there are multiple failures. In this case <code>F</code> is <code>IReadOnlyCollection<<span style="color:blue;">string</span>></code>. Since declarations of <code>Func</code> values in C# require explicit type declarations, that's a bit of a mouthful, but the <code>combine</code> function just concatenates two collections into one.
</p>
<p>
The <code>RunAssertions</code> method can now <code>Apply</code> both <code>assertion1</code> and <code>assertion2</code> to <code>f</code>, which produces a combined <code>Validated</code> value, <code>composition</code>. It then matches on the combined value to produce a <code>string</code> value. If there are no assertion messages, the result is the empty string; otherwise, the function combines the assertion messages with a <code>NewLine</code> between each. Again, this is proof-of-concept code. A more robust and flexible API (if warranted) might keep the errors around as a collection of strongly typed <a href="https://martinfowler.com/bliki/ValueObject.html">Value Objects</a>.
</p>
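<p>
If you haven't read the earlier validation article, the mechanics of <code>Apply</code> may be hard to picture. The following self-contained sketch shows one way the accumulation could work. It's a simplified stand-in and an assumption on my part, not the actual <code>Validated</code> implementation: it has a single <code>Apply</code> overload and therefore uses a curried function, and the <code>Errors</code> alias, the <code>Demo</code> class, and the <code>Check</code> and <code>Run</code> functions are hypothetical names invented for the example.
</p>

```csharp
using System;
using System.Linq;
using Errors = System.Collections.Generic.IReadOnlyCollection<string>;

public sealed class Unit
{
    private Unit() { }
    public static readonly Unit Value = new Unit();
}

// Hypothetical, simplified stand-in for the Validated class; an assumption
// made for illustration, not the implementation from the earlier article.
public sealed class Validated<F, S>
{
    private readonly bool isSuccess;
    private readonly F failure;
    private readonly S success;

    internal Validated(bool isSuccess, F failure, S success)
    {
        this.isSuccess = isSuccess;
        this.failure = failure;
        this.success = success;
    }

    public T Match<T>(Func<F, T> onFailure, Func<S, T> onSuccess)
    {
        return isSuccess ? onSuccess(success) : onFailure(failure);
    }
}

public static class Validated
{
    public static Validated<F, S> Succeed<F, S>(S success)
    {
        return new Validated<F, S>(true, default(F), success);
    }

    public static Validated<F, S> Fail<F, S>(F failure)
    {
        return new Validated<F, S>(false, failure, default(S));
    }

    // Applies a lifted function to a lifted argument. The combine function
    // is only called when both sides are failures.
    public static Validated<F, S> Apply<F, T, S>(
        this Validated<F, Func<T, S>> selector,
        Validated<F, T> source,
        Func<F, F, F> combine)
    {
        return selector.Match(
            onFailure: f1 => source.Match(
                onFailure: f2 => Fail<F, S>(combine(f1, f2)),
                onSuccess: _ => Fail<F, S>(f1)),
            onSuccess: map => source.Match(
                onFailure: f2 => Fail<F, S>(f2),
                onSuccess: x => Succeed<F, S>(map(x))));
    }
}

public static class Demo
{
    public static Validated<Errors, Unit> Check(bool condition, string message)
    {
        return condition
            ? Validated.Succeed<Errors, Unit>(Unit.Value)
            : Validated.Fail<Errors, Unit>(new[] { message });
    }

    public static string Run(bool first, bool second)
    {
        Func<Errors, Errors, Errors> combine = (x, y) => x.Concat(y).ToArray();
        // Curried, so that the single Apply overload above can be chained.
        Func<Unit, Func<Unit, Unit>> ignore = _ => __ => Unit.Value;

        return Validated.Succeed<Errors, Func<Unit, Func<Unit, Unit>>>(ignore)
            .Apply(Check(first, "First check failed."), combine)
            .Apply(Check(second, "Second check failed."), combine)
            .Match(
                onFailure: errors => string.Join(Environment.NewLine, errors),
                onSuccess: _ => string.Empty);
    }

    public static void Main()
    {
        // Both checks fail, so both messages are reported.
        Console.WriteLine(Run(false, false));
    }
}
```

<p>
The four branches in <code>Apply</code> are the whole trick: a failure on either side wins over a success, and only when both sides fail does <code>combine</code> merge the two collections of messages.
</p>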
<p>
Finally, if the resulting <code>errors</code> string is not null or empty, the <code>RunAssertions</code> method throws an exception with the combined error message(s). Here I once more invoked my proof-of-concept privilege to throw an <a href="https://learn.microsoft.com/dotnet/api/system.exception">Exception</a>, even though <a href="https://learn.microsoft.com/dotnet/standard/design-guidelines/using-standard-exception-types">the framework design guidelines admonish against doing so</a>.
</p>
<p>
Ultimately, then, the <a href="/2013/06/24/a-heuristic-for-formatting-code-according-to-the-aaa-pattern">assert phase</a> of the test looks like this:
</p>
<p>
<pre><span style="color:blue;">var</span> assertResponse = Validated.AssertTrue(
deleteResp.IsSuccessStatusCode,
<span style="color:#a31515;">$"Actual status code: </span>{deleteResp.StatusCode}<span style="color:#a31515;">."</span>);
<span style="color:blue;">var</span> getResp = <span style="color:blue;">await</span> api.CreateClient().GetAsync(address);
<span style="color:blue;">var</span> assertState =
Validated.AssertEqual(HttpStatusCode.NotFound, getResp.StatusCode);
Validated.RunAssertions(assertResponse, assertState);</pre>
</p>
<p>
The rest of the test hasn't changed.
</p>
<h3 id="42e68063b7a3472e960031fde5e3f40d">
Outcomes <a href="#42e68063b7a3472e960031fde5e3f40d" title="permalink">#</a>
</h3>
<p>
Running the test with the applicative assertions passes, as expected. In order to verify that it works as it's supposed to, I tried to sabotage the System Under Test (SUT) in various ways. First, I made the <code>Delete</code> method that handles <code>DELETE</code> requests a <a href="https://en.wikipedia.org/wiki/NOP_(code)">no-op</a>, while still returning <code>200 OK</code>. As you'd expect, the result is a test failure with this message:
</p>
<p>
<pre>Message:
System.Exception : Expected NotFound, but got OK.</pre>
</p>
<p>
This is the assertion that verifies that <code>getResp.StatusCode</code> is <code>404 Not Found</code>. It fails because the sabotaged <code>Delete</code> method doesn't delete the reservation.
</p>
<p>
Then I further sabotaged the SUT to also return an incorrect status code (<code>400 Bad Request</code>), which produced this failure message:
</p>
<p>
<pre>Message:
System.Exception : Actual status code: BadRequest.
Expected NotFound, but got OK.</pre>
</p>
<p>
Notice that the message contains information about both failure conditions.
</p>
<p>
Finally, I re-enabled the correct behaviour (deleting the reservation from the data store) while still returning <code>400 Bad Request</code>:
</p>
<p>
<pre>Message:
System.Exception : Actual status code: BadRequest.</pre>
</p>
<p>
As desired, the assertions collect all relevant failure messages.
</p>
<h3 id="4a2c496da999408ea6c6cddd6a4f33d4">
Conclusion <a href="#4a2c496da999408ea6c6cddd6a4f33d4" title="permalink">#</a>
</h3>
<p>
Not surprisingly, it's possible to design a composable assertion API that collects multiple failure messages using an applicative functor. Anyone who knows how <a href="/2018/11/05/applicative-validation">applicative validation</a> works would have been able to predict that outcome. That's not what the above proof of concept was about. What I wanted to see was rather how it would play out in a realistic scenario, and whether using an applicative functor is warranted.
</p>
<p>
Applicative functors don't gel well with C#, so unsurprisingly the API is awkward. It's likely possible to smooth much of the friction, but without good language support and syntactic sugar, it's unlikely to become <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> C#.
</p>
<p>
Rather than taking the edge off the unwieldy API, the implementation of <code>RunAssertions</code> suggests another alternative.
</p>
<p>
<strong>Next:</strong> <a href="/2022/12/19/error-accumulating-composable-assertions-in-c">Error-accumulating composable assertions in C#</a>.
</p>
</div><hr>
Decouple to delete
https://blog.ploeh.dk/2022/11/21/decouple-to-delete
2022-11-21T08:46:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Don't try to predict the future.</em>
</p>
<p>
Do you know why it's called <a href="https://en.wikipedia.org/wiki/Spaghetti_code">spaghetti code</a>? It's a palatable metaphor. You may start with a single spaghetto, but usually, as you wind your fork around it, the whole dish follows along. Unless you're careful, eating spaghetti can be a mess.
</p>
<p>
<img src="/content/binary/spaghetti-le-calandre.jpg" alt="A small spaghetti serving.">
</p>
<p>
Spaghetti code is tangled and everything is directly or transitively connected to everything else. As you try to edit the code, every change you make affects other code. Fix one thing and another thing breaks, cascading through the code base.
</p>
<p>
I was recently <a href="https://www.goodreads.com/review/show/4913194780">reading Clean Architecture</a>, and as <a href="https://en.wikipedia.org/wiki/Robert_C._Martin">Robert C. Martin</a> was explaining the <a href="https://en.wikipedia.org/wiki/Dependency_inversion_principle">Dependency Inversion Principle</a> for the umpteenth time, my brain made a new connection. To be clear: Connecting (coupling) code is bad, but connecting ideas is good.
</p>
<h3 id="51ca1981115f493bb24ac9338419fe91">
What a tangled web we weave <a href="#51ca1981115f493bb24ac9338419fe91" title="permalink">#</a>
</h3>
<p>
It's impractical to write code that depends on nothing else. Most code will call other code, which again calls other code. It behoves us, though, to be careful that the web of dependencies doesn't get too tangled.
</p>
<p>
Imagine a code base where the dependency graph looks like this:
</p>
<p>
<img src="/content/binary/tangled-dependency-graph.png" alt="A connected graph.">
</p>
<p>
Think of each node as a unit of code; a class or a module. While a dependency graph is a <a href="https://en.wikipedia.org/wiki/Directed_graph">directed graph</a>, I didn't indicate the directions. Imagine that most edges point both ways, so that the nodes are interdependent. In other words, the graph has <a href="https://en.wikipedia.org/wiki/Cycle_(graph_theory)">cycles</a>. This is <a href="http://evelinag.com/blog/2014/06-09-comparing-dependency-networks/">not uncommon in C# code</a>.
</p>
<p>
Pick any node in such a graph, and chances are that other nodes depend on it. This makes it hard to make changes to the code in that node, because a change may affect the code that depends on it. As you try to fix the depending code, that change, too, ripples through the network.
</p>
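<p>
To make the ripple effect concrete, you can model the dependency graph as data and ask which units are affected by a change. The following is a minimal sketch; the <code>AffectedBy</code> function and the four-node graph are hypothetical, made up for the example.
</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Ripple
{
    // 'dependsOn' maps each unit of code to the units it depends on.
    // Returns every unit that directly or transitively depends on 'changed'.
    public static ISet<string> AffectedBy(
        string changed,
        IReadOnlyDictionary<string, string[]> dependsOn)
    {
        var affected = new HashSet<string>();
        var queue = new Queue<string>();
        queue.Enqueue(changed);
        while (queue.Count > 0)
        {
            var current = queue.Dequeue();
            var dependents = dependsOn
                .Where(kvp => kvp.Value.Contains(current))
                .Select(kvp => kvp.Key);
            foreach (var dependent in dependents)
            {
                // Skip the changed unit itself and anything already visited.
                if (dependent != changed && affected.Add(dependent))
                    queue.Enqueue(dependent);
            }
        }
        return affected;
    }

    public static void Main()
    {
        var dependsOn = new Dictionary<string, string[]>
        {
            ["A"] = new[] { "B" },
            ["B"] = new[] { "C" },
            ["C"] = new[] { "A" }, // a cycle: A -> B -> C -> A
            ["D"] = new string[0],
        };

        var affected = AffectedBy("C", dependsOn).OrderBy(x => x);
        Console.WriteLine(string.Join(", ", affected)); // A, B
    }
}
```

<p>
In a graph with cycles, a change to any node in the cycle affects every other node in it. Only <code>D</code>, which takes no part in the cycle, is unaffected.
</p>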
<p>
This already explains why tight coupling is problematic.
</p>
<h3 id="61e4982dad794d0085dd7240508f73b7">
It is difficult to make predictions, especially about the future <a href="#61e4982dad794d0085dd7240508f73b7" title="permalink">#</a>
</h3>
<p>
When you write source code, you might be tempted to try to take into account future needs and requirements. There may be a historical explanation for that tendency.
</p>
<blockquote>
<p>
"That is, once it was a sign of failure to change product code. You should have gotten it right the first time."
</p>
<footer><cite><a href="https://twitter.com/marick/status/1566564277573525507">Brian Marick</a></cite></footer>
</blockquote>
<p>
In the days of punchcards, you had to schedule time to use a computer. If you made a mistake in your program, you typically didn't have time to fix it during your timeslot. A mistake could easily cost you days as you scrambled to schedule a new time. Not surprisingly, emphasis was on correctness.
</p>
<p>
With this mindset, it's natural to attempt to future-proof code.
</p>
<h3 id="0f12f362efaf461bbbdf9bddd2e01361">
YAGNI <a href="#0f12f362efaf461bbbdf9bddd2e01361" title="permalink">#</a>
</h3>
<p>
With interactive development environments you can get rapid feedback. If you make a mistake, change the code and observe the outcome. Don't add code because you think that you might need it later. <a href="https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it">You probably will not</a>.
</p>
<p>
While you should avoid <a href="https://wiki.c2.com/?SpeculativeGenerality">speculative generality</a>, that alone is no guarantee of clean code. Unless you're careful, you can easily make a mess by tightly coupling different parts of your code base.
</p>
<p>
How do you produce a code base that is as easy to change as possible?
</p>
<h3 id="89bda4d0fd774c7c969c30e1562aa11b">
Write code that is easy to delete <a href="#89bda4d0fd774c7c969c30e1562aa11b" title="permalink">#</a>
</h3>
<p>
Write code that is easy to change. The ultimate change you can make is to delete code. After that, you can write something else that better does what you need.
</p>
<blockquote>
<p>
"A system where you can delete parts without rewriting others is often called loosely coupled"
</p>
<footer><cite><a href="https://programmingisterrible.com/post/139222674273/how-to-write-disposable-code-in-large-systems">tef</a></cite></footer>
</blockquote>
<p>
I don't mean that you should always delete code in order to make changes, but often, looking at extremes can provide insights into less extreme cases.
</p>
<p>
When you have a tangled web as shown above, most of the code is coupled to other parts. If you delete a node, then you break something else. You'd think that deleting code is the easiest thing in the world, but it's not.
</p>
<p>
What if, on the other hand, you have smaller clusters of nodes that are independent?
</p>
<p>
<img src="/content/binary/less-coupled-dependency-graph.png" alt="A disconnected graph with small islands of connected graphs.">
</p>
<p>
If your dependency graph looks like this, you can at least delete each of the 'islands' without impacting the other sub-graphs.
</p>
<p>
<img src="/content/binary/dependency-graph-without-deleted-subgraph.png" alt="The graph from the previous figure, less one sub-graph.">
</p>
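<p>
You can express the same idea as a simple check: a group of units is safe to delete when nothing outside the group depends on anything inside it. The following sketch assumes a hypothetical <code>CanDelete</code> function and a made-up graph with two islands.
</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Deletion
{
    // 'dependsOn' maps each unit of code to the units it depends on.
    // The candidates can be deleted when no remaining unit depends on them.
    public static bool CanDelete(
        ISet<string> candidates,
        IReadOnlyDictionary<string, string[]> dependsOn)
    {
        return dependsOn
            .Where(kvp => !candidates.Contains(kvp.Key))
            .All(kvp => !kvp.Value.Any(d => candidates.Contains(d)));
    }

    public static void Main()
    {
        // Two islands: A and B depend on each other; C depends on D.
        var dependsOn = new Dictionary<string, string[]>
        {
            ["A"] = new[] { "B" },
            ["B"] = new[] { "A" },
            ["C"] = new[] { "D" },
            ["D"] = new string[0],
        };

        // The whole A-B island can go without breaking anything else.
        Console.WriteLine(CanDelete(new HashSet<string> { "A", "B" }, dependsOn)); // True
        // D alone can't, because C still depends on it.
        Console.WriteLine(CanDelete(new HashSet<string> { "D" }, dependsOn));      // False
    }
}
```

<p>
In the tangled graph from the first figure, such a check would fail for almost any subset smaller than the entire code base.
</p>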
<p>
<a href="https://programmingisterrible.com/post/139222674273/how-to-write-disposable-code-in-large-systems">Writing code that is easy to delete</a> may be a good idea, but even <em>that</em> is easier said than done. Loose coupling is, once more, key to good architecture.
</p>
<h3 id="88c387fb34bc48b382d1eefdc3ee6367">
Add something better <a href="#88c387fb34bc48b382d1eefdc3ee6367" title="permalink">#</a>
</h3>
<p>
Once you've deleted a cluster of code, you have the opportunity to add something that is even less coupled than the island you deleted.
</p>
<p>
<img src="/content/binary/dependency-graph-with-new-subgraphs.png" alt="The graph from the previous figure, with new small graphs added.">
</p>
<p>
If you add new code that is less coupled than the code you deleted, it's even easier to delete again.
</p>
<h3 id="8a7a8d9547eb4d3881a1cced603f8422">
Conclusion <a href="#8a7a8d9547eb4d3881a1cced603f8422" title="permalink">#</a>
</h3>
<p>
Coupling is a key factor in code organisation. Tightly coupled code is difficult to change. Loosely coupled code is easier to change. As a thought experiment, consider how difficult it would be to delete a particular piece of code. The easier it is to delete the code, the less coupled it is.
</p>
<p>
Deleting a small piece of code to add new code in its stead is the ultimate change. You can often get by with a less radical edit, but if all else fails, delete part of your code base and start over. The less coupled the code is, the easier it is to change.
</p>
</div><hr>
The Reader monad
https://blog.ploeh.dk/2022/11/14/the-reader-monad
2022-11-14T06:50:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Normal functions form monads. An article for object-oriented programmers.</em>
</p>
<p>
This article is an instalment in <a href="/2022/03/28/monads">an article series about monads</a>. A previous article described <a href="/2021/08/30/the-reader-functor">the Reader functor</a>. As is the case with many (but not all) <a href="/2018/03/22/functors">functors</a>, Readers also form monads.
</p>
<p>
This article continues where the Reader functor article stopped. It uses the same code base.
</p>
<h3 id="d54d83f22d854e94853271e1a559a1d8">
Flatten <a href="#d54d83f22d854e94853271e1a559a1d8" title="permalink">#</a>
</h3>
<p>
A monad must define either a <em>bind</em> or <em>join</em> function, although you can use other names for both of these functions. <code>Flatten</code> is in my opinion a more intuitive name than <code>join</code>, since a monad is really just a functor that you can flatten. Flattening is relevant if you have a nested functor; in this case a Reader within a Reader. You can flatten such a nested Reader with a <code>Flatten</code> function:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IReader<R, A> <span style="color:#74531f;">Flatten</span><<span style="color:#2b91af;">R</span>, <span style="color:#2b91af;">A</span>>(
<span style="color:blue;">this</span> IReader<R, IReader<R, A>> <span style="color:#1f377f;">source</span>)
{
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> FlattenReader<R, A>(source);
}
<span style="color:blue;">private</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">FlattenReader</span><<span style="color:#2b91af;">R</span>, <span style="color:#2b91af;">A</span>> : IReader<R, A>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IReader<R, IReader<R, A>> source;
<span style="color:blue;">public</span> <span style="color:#2b91af;">FlattenReader</span>(IReader<R, IReader<R, A>> <span style="color:#1f377f;">source</span>)
{
<span style="color:blue;">this</span>.source = source;
}
<span style="color:blue;">public</span> A <span style="color:#74531f;">Run</span>(R <span style="color:#1f377f;">environment</span>)
{
IReader<R, A> <span style="color:#1f377f;">newReader</span> = source.Run(environment);
<span style="color:#8f08c4;">return</span> newReader.Run(environment);
}
}</pre>
</p>
<p>
Since the <code>source</code> Reader is nested, calling its <code>Run</code> method once returns a <code>newReader</code>. You can <code>Run</code> that <code>newReader</code> one more time to get an <code>A</code> value to return.
</p>
<p>
You could easily chain the two calls to <code>Run</code> together, one after the other. That would make the code terser, but here I chose to do it in two explicit steps in order to show what's going on.
</p>
<p>
As in the previous article about <a href="/2022/06/20/the-state-monad">the State monad</a>, a lot of <a href="/2019/12/16/zone-of-ceremony">ceremony</a> is required because this variation of the Reader monad is defined with an interface. You could also define the Reader monad on a 'raw' function of the type <code>Func<R, A></code>, in which case <code>Flatten</code> would be simpler:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> Func<R, A> <span style="color:#74531f;">Flatten</span><<span style="color:#2b91af;">R</span>, <span style="color:#2b91af;">A</span>>(<span style="color:blue;">this</span> Func<R, Func<R, A>> <span style="color:#1f377f;">source</span>)
{
<span style="color:#8f08c4;">return</span> <span style="color:#1f377f;">environment</span> => source(environment)(environment);
}</pre>
</p>
<p>
In this variation <code>source</code> is a function, so you can call it with <code>environment</code>, which returns another function that you can again call with <code>environment</code>. This produces an <code>A</code> value for the function to return.
</p>
<h3 id="e2f72c66681d45949a23a7e574ae5ae7">
SelectMany <a href="#e2f72c66681d45949a23a7e574ae5ae7" title="permalink">#</a>
</h3>
<p>
When you have <code>Flatten</code> you can always define <code>SelectMany</code> (<em>monadic bind</em>) like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IReader<R, B> <span style="color:#74531f;">SelectMany</span><<span style="color:#2b91af;">R</span>, <span style="color:#2b91af;">A</span>, <span style="color:#2b91af;">B</span>>(
<span style="color:blue;">this</span> IReader<R, A> <span style="color:#1f377f;">source</span>,
Func<A, IReader<R, B>> <span style="color:#1f377f;">selector</span>)
{
<span style="color:#8f08c4;">return</span> source.Select(selector).Flatten();
}</pre>
</p>
<p>
First use functor-based mapping. Since the <code>selector</code> returns a Reader, this mapping produces a Reader within a Reader. That's exactly the situation that <code>Flatten</code> addresses.
</p>
<p>
The above <code>SelectMany</code> example works with the <code>IReader<R, A></code> interface, but the 'raw' function version has the exact same implementation:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> Func<R, B> <span style="color:#74531f;">SelectMany</span><<span style="color:#2b91af;">R</span>, <span style="color:#2b91af;">A</span>, <span style="color:#2b91af;">B</span>>(
<span style="color:blue;">this</span> Func<R, A> <span style="color:#1f377f;">source</span>,
Func<A, Func<R, B>> <span style="color:#1f377f;">selector</span>)
{
<span style="color:#8f08c4;">return</span> source.Select(selector).Flatten();
}</pre>
</p>
<p>
Only the method declaration differs.
</p>
<h3 id="8c5f94f7395040a7bcfdb2561c8e3ed3">
Query syntax <a href="#8c5f94f7395040a7bcfdb2561c8e3ed3" title="permalink">#</a>
</h3>
<p>
Monads also enable query syntax in C# (just like they enable other kinds of syntactic sugar in languages like <a href="https://fsharp.org/">F#</a> and <a href="https://www.haskell.org">Haskell</a>). As outlined in the <a href="/2022/03/28/monads">monad introduction</a>, however, you must add a special <code>SelectMany</code> overload:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IReader<R, T1> <span style="color:#74531f;">SelectMany</span><<span style="color:#2b91af;">R</span>, <span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">U</span>, <span style="color:#2b91af;">T1</span>>(
<span style="color:blue;">this</span> IReader<R, T> <span style="color:#1f377f;">source</span>,
Func<T, IReader<R, U>> <span style="color:#1f377f;">k</span>,
Func<T, U, T1> <span style="color:#1f377f;">s</span>)
{
<span style="color:#8f08c4;">return</span> source.SelectMany(<span style="color:#1f377f;">x</span> => k(x).Select(<span style="color:#1f377f;">y</span> => s(x, y)));
}</pre>
</p>
<p>
As already predicted in the monad introduction, this boilerplate overload is always implemented in the same way. Only the signature changes. With it, you could write an expression like this nonsense:
</p>
<p>
<pre>IReader<<span style="color:blue;">int</span>, <span style="color:blue;">bool</span>> <span style="color:#1f377f;">r</span> =
<span style="color:blue;">from</span> dur <span style="color:blue;">in</span> <span style="color:blue;">new</span> MinutesReader()
<span style="color:blue;">from</span> b <span style="color:blue;">in</span> <span style="color:blue;">new</span> Thingy(dur)
<span style="color:blue;">select</span> b;</pre>
</p>
<p>
The <code>MinutesReader</code> class was already shown in the article <a href="/2021/10/04/reader-as-a-contravariant-functor">Reader as a contravariant functor</a>. I couldn't come up with a good name for another reader, so I went with <a href="https://dannorth.net">Dan North</a>'s naming convention that if you don't yet know what to call a class, method, or function, don't <em>pretend</em> that you know. Be explicit that you don't know.
</p>
<p>
Here it is, for the sake of completeness:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Thingy</span> : IReader<<span style="color:blue;">int</span>, <span style="color:blue;">bool</span>>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> TimeSpan timeSpan;
<span style="color:blue;">public</span> <span style="color:#2b91af;">Thingy</span>(TimeSpan <span style="color:#1f377f;">timeSpan</span>)
{
<span style="color:blue;">this</span>.timeSpan = timeSpan;
}
<span style="color:blue;">public</span> <span style="color:blue;">bool</span> <span style="color:#74531f;">Run</span>(<span style="color:blue;">int</span> <span style="color:#1f377f;">environment</span>)
{
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> TimeSpan(timeSpan.Ticks * environment).TotalDays < 1;
}
}</pre>
</p>
<p>
I'm not claiming that this class makes sense. These articles are deliberately kept abstract in order to focus on structure and behaviour, rather than on practical application.
</p>
<h3 id="d5d827cf83d242e6baecbda622ca5cb9">
Return <a href="#d5d827cf83d242e6baecbda622ca5cb9" title="permalink">#</a>
</h3>
<p>
Apart from flattening or monadic bind, a monad must also define a way to put a normal value into the monad. Conceptually, I call this function <em>return</em> (because that's the name that Haskell uses):
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IReader<R, A> <span style="color:#74531f;">Return</span><<span style="color:#2b91af;">R</span>, <span style="color:#2b91af;">A</span>>(A <span style="color:#1f377f;">a</span>)
{
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> ReturnReader<R, A>(a);
}
<span style="color:blue;">private</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ReturnReader</span><<span style="color:#2b91af;">R</span>, <span style="color:#2b91af;">A</span>> : IReader<R, A>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> A a;
<span style="color:blue;">public</span> <span style="color:#2b91af;">ReturnReader</span>(A <span style="color:#1f377f;">a</span>)
{
<span style="color:blue;">this</span>.a = a;
}
<span style="color:blue;">public</span> A <span style="color:#74531f;">Run</span>(R <span style="color:#1f377f;">environment</span>)
{
<span style="color:#8f08c4;">return</span> a;
}
}</pre>
</p>
<p>
This implementation returns the <code>a</code> value and completely ignores the <code>environment</code>. You can do the same with a 'naked' function.
</p>
<h3 id="bd79fe3d97644f0f9edb12f08a0b5d01">
Left identity <a href="#bd79fe3d97644f0f9edb12f08a0b5d01" title="permalink">#</a>
</h3>
<p>
We need to identify the <em>return</em> function in order to examine <a href="/2022/04/11/monad-laws">the monad laws</a>. Now that this is accomplished, let's see what the laws look like for the Reader monad, starting with the left identity law.
</p>
<p>
<pre>[Theory]
[InlineData(UriPartial.Authority, <span style="color:#a31515;">"https://example.com/f?o=o"</span>)]
[InlineData(UriPartial.Path, <span style="color:#a31515;">"https://example.net/b?a=r"</span>)]
[InlineData(UriPartial.Query, <span style="color:#a31515;">"https://example.org/b?a=z"</span>)]
[InlineData(UriPartial.Scheme, <span style="color:#a31515;">"https://example.gov/q?u=x"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">LeftIdentity</span>(UriPartial <span style="color:#1f377f;">a</span>, <span style="color:blue;">string</span> <span style="color:#1f377f;">u</span>)
{
Func<UriPartial, IReader<Uri, UriPartial>> <span style="color:#1f377f;">@return</span> =
<span style="color:#1f377f;">up</span> => Reader.Return<Uri, UriPartial>(up);
Func<UriPartial, IReader<Uri, <span style="color:blue;">string</span>>> <span style="color:#1f377f;">h</span> =
<span style="color:#1f377f;">up</span> => <span style="color:blue;">new</span> UriPartReader(up);
Assert.Equal(
@return(a).SelectMany(h).Run(<span style="color:blue;">new</span> Uri(u)),
h(a).Run(<span style="color:blue;">new</span> Uri(u)));
}</pre>
</p>
<p>
In order to compare the two Reader values, the test has to <code>Run</code> them and then compare the return values.
</p>
<p>
This test and the next use a Reader implementation called <code>UriPartReader</code>, which almost makes sense:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">UriPartReader</span> : IReader<Uri, <span style="color:blue;">string</span>>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> UriPartial part;
<span style="color:blue;">public</span> <span style="color:#2b91af;">UriPartReader</span>(UriPartial <span style="color:#1f377f;">part</span>)
{
<span style="color:blue;">this</span>.part = part;
}
<span style="color:blue;">public</span> <span style="color:blue;">string</span> <span style="color:#74531f;">Run</span>(Uri <span style="color:#1f377f;">environment</span>)
{
<span style="color:#8f08c4;">return</span> environment.GetLeftPart(part);
}
}</pre>
</p>
<p>
Almost.
</p>
<h3 id="ad9edb1097be4eeb96170809272851eb">
Right identity <a href="#ad9edb1097be4eeb96170809272851eb" title="permalink">#</a>
</h3>
<p>
In a similar manner, we can showcase the right identity law as a test.
</p>
<p>
<pre>[Theory]
[InlineData(UriPartial.Authority, <span style="color:#a31515;">"https://example.com/q?u=ux"</span>)]
[InlineData(UriPartial.Path, <span style="color:#a31515;">"https://example.net/q?u=uuz"</span>)]
[InlineData(UriPartial.Query, <span style="color:#a31515;">"https://example.org/c?o=rge"</span>)]
[InlineData(UriPartial.Scheme, <span style="color:#a31515;">"https://example.gov/g?a=rply"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">RightIdentity</span>(UriPartial <span style="color:#1f377f;">a</span>, <span style="color:blue;">string</span> <span style="color:#1f377f;">u</span>)
{
Func<UriPartial, IReader<Uri, <span style="color:blue;">string</span>>> <span style="color:#1f377f;">f</span> =
<span style="color:#1f377f;">up</span> => <span style="color:blue;">new</span> UriPartReader(up);
Func<<span style="color:blue;">string</span>, IReader<Uri, <span style="color:blue;">string</span>>> <span style="color:#1f377f;">@return</span> =
<span style="color:#1f377f;">s</span> => Reader.Return<Uri, <span style="color:blue;">string</span>>(s);
IReader<Uri, <span style="color:blue;">string</span>> <span style="color:#1f377f;">m</span> = f(a);
Assert.Equal(
m.SelectMany(@return).Run(<span style="color:blue;">new</span> Uri(u)),
m.Run(<span style="color:blue;">new</span> Uri(u)));
}</pre>
</p>
<p>
As always, even a parametrised test constitutes no <em>proof</em> that the law holds. I show the tests to illustrate what the laws look like in 'real' code.
</p>
<h3 id="67a2f225bd8f432dbeed30c2ba2b623a">
Associativity <a href="#67a2f225bd8f432dbeed30c2ba2b623a" title="permalink">#</a>
</h3>
<p>
The last monad law is the associativity law that describes how (at least) three functions compose. We're going to need three functions. For the purpose of demonstrating the law, any three pure functions will do. While the following functions are silly and not at all 'realistic', they have the virtue of being as simple as they can be (while still providing a bit of variety). They don't 'mean' anything, so don't worry too much about their behaviour. It is, as far as I can tell, nonsensical.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">F</span> : IReader<<span style="color:blue;">int</span>, <span style="color:blue;">string</span>>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">char</span> c;
<span style="color:blue;">public</span> <span style="color:#2b91af;">F</span>(<span style="color:blue;">char</span> <span style="color:#1f377f;">c</span>)
{
<span style="color:blue;">this</span>.c = c;
}
<span style="color:blue;">public</span> <span style="color:blue;">string</span> <span style="color:#74531f;">Run</span>(<span style="color:blue;">int</span> <span style="color:#1f377f;">environment</span>)
{
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> <span style="color:blue;">string</span>(c, environment);
}
}
<span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">G</span> : IReader<<span style="color:blue;">int</span>, <span style="color:blue;">bool</span>>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">string</span> s;
<span style="color:blue;">public</span> <span style="color:#2b91af;">G</span>(<span style="color:blue;">string</span> <span style="color:#1f377f;">s</span>)
{
<span style="color:blue;">this</span>.s = s;
}
<span style="color:blue;">public</span> <span style="color:blue;">bool</span> <span style="color:#74531f;">Run</span>(<span style="color:blue;">int</span> <span style="color:#1f377f;">environment</span>)
{
<span style="color:#8f08c4;">return</span> environment < 42 || s.Contains(<span style="color:#a31515;">"a"</span>);
}
}
<span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">H</span> : IReader<<span style="color:blue;">int</span>, TimeSpan>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">bool</span> b;
<span style="color:blue;">public</span> <span style="color:#2b91af;">H</span>(<span style="color:blue;">bool</span> <span style="color:#1f377f;">b</span>)
{
<span style="color:blue;">this</span>.b = b;
}
<span style="color:blue;">public</span> TimeSpan <span style="color:#74531f;">Run</span>(<span style="color:blue;">int</span> <span style="color:#1f377f;">environment</span>)
{
<span style="color:#8f08c4;">return</span> b ?
TimeSpan.FromMinutes(environment) :
TimeSpan.FromSeconds(environment);
}
}</pre>
</p>
<p>
Armed with these three classes, we can now demonstrate the Associativity law:
</p>
<p>
<pre>[Theory]
[InlineData(<span style="color:#a31515;">'a'</span>, 0)]
[InlineData(<span style="color:#a31515;">'b'</span>, 1)]
[InlineData(<span style="color:#a31515;">'c'</span>, 42)]
[InlineData(<span style="color:#a31515;">'d'</span>, 2112)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">Associativity</span>(<span style="color:blue;">char</span> <span style="color:#1f377f;">a</span>, <span style="color:blue;">int</span> <span style="color:#1f377f;">i</span>)
{
Func<<span style="color:blue;">char</span>, IReader<<span style="color:blue;">int</span>, <span style="color:blue;">string</span>>> <span style="color:#1f377f;">f</span> = <span style="color:#1f377f;">c</span> => <span style="color:blue;">new</span> F(c);
Func<<span style="color:blue;">string</span>, IReader<<span style="color:blue;">int</span>, <span style="color:blue;">bool</span>>> <span style="color:#1f377f;">g</span> = <span style="color:#1f377f;">s</span> => <span style="color:blue;">new</span> G(s);
Func<<span style="color:blue;">bool</span>, IReader<<span style="color:blue;">int</span>, TimeSpan>> <span style="color:#1f377f;">h</span> = <span style="color:#1f377f;">b</span> => <span style="color:blue;">new</span> H(b);
IReader<<span style="color:blue;">int</span>, <span style="color:blue;">string</span>> <span style="color:#1f377f;">m</span> = f(a);
Assert.Equal(
m.SelectMany(g).SelectMany(h).Run(i),
m.SelectMany(<span style="color:#1f377f;">x</span> => g(x).SelectMany(h)).Run(i));
}</pre>
</p>
<p>
In case you're wondering, the four test cases produce the outputs <code>00:00:00</code>, <code>00:01:00</code>, <code>00:00:42</code>, and <code>00:35:12</code>. The Haskell session in the next section reproduces them.
</p>
<h3 id="a46f77dddd914f0b9c8926dd2c06e9d6">
Haskell <a href="#a46f77dddd914f0b9c8926dd2c06e9d6" title="permalink">#</a>
</h3>
<p>
In Haskell, normal functions <code>a -> b</code> are already <code>Monad</code> instances, which means that you can easily replicate the functions from the <code>Associativity</code> test:
</p>
<p>
<pre>> f c = \env -> replicate env c
> g s = \env -> env < 42 || 'a' `elem` s
> h b = \env -> if b then secondsToDiffTime (toEnum env * 60) else secondsToDiffTime (toEnum env)</pre>
</p>
<p>
I've chosen to write <code>f</code>, <code>g</code>, and <code>h</code> as functions that return lambda expressions in order to emphasise that each of these functions returns a Reader. Since Haskell functions are already curried, I could also have written them in the more normal function style with two normal parameters, but that might have obscured the Reader aspect of each.
</p>
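<p>
As a minimal sketch of that alternative, the same three functions could be written in the curried two-parameter style and compiled from a file instead of being typed into GHCi. The type annotations, the <code>run</code> helper, and <code>main</code> are assumptions added for illustration; they aren't part of the original session.
</p>
<p>
<pre>import Data.Time.Clock (DiffTime, secondsToDiffTime)

-- Each function takes its 'own' argument first and the environment last.
f :: Char -> Int -> String
f c env = replicate env c

g :: String -> Int -> Bool
g s env = env < 42 || 'a' `elem` s

h :: Bool -> Int -> DiffTime
h b env =
  if b
    then secondsToDiffTime (toEnum env * 60)
    else secondsToDiffTime (toEnum env)

-- Hypothetical helper: partially applying f leaves a function of the
-- environment, which composes with >>= just like the lambda versions.
run :: Char -> Int -> DiffTime
run c = f c >>= g >>= h

main :: IO ()
main = mapM_ (print . uncurry run) [('a', 0), ('b', 1), ('c', 42), ('d', 2112)]
-- prints 0s, 60s, 42s, and 2112s, one per line</pre>
</p>
<p>
Because of currying, <code>f 'a'</code> is still a function of type <code>Int -> String</code>, so the Reader composition is unchanged. Only the notation differs.
</p>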
<p>
Here's the composition in action:
</p>
<p>
<pre>> f 'a' >>= g >>= h $ 0
0s
> f 'b' >>= g >>= h $ 1
60s
> f 'c' >>= g >>= h $ 42
42s
> f 'd' >>= g >>= h $ 2112
2112s</pre>
</p>
<p>
In case you are wondering, 2,112 seconds is 35 minutes and 12 seconds, so all outputs fit with the results reported for the C# example.
</p>
<p>
What the above Haskell GHCi (<a href="https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop">REPL</a>) session demonstrates is that it's possible to compose functions with Haskell's monadic bind operator <code>>>=</code> exactly because all functions are (Reader) monads.
</p>
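<p>
The same <code>Monad</code> instance also covers the flattening operation described earlier in this article. The following sketch assumes nothing beyond the <code>base</code> library; <code>flatten</code> and <code>composed</code> are hypothetical names chosen for illustration, while <code>join</code> is the standard function from <code>Control.Monad</code>.
</p>
<p>
<pre>import Control.Monad (join)

-- Hand-written counterpart to the C# Flatten method for 'raw' functions:
-- pass the same environment to the outer and the inner function.
flatten :: (r -> r -> a) -> r -> a
flatten source = \env -> source env env

-- do notation works because functions from Int are a Monad instance.
-- Each line reads from the same Int environment.
composed :: Int -> (Int, String)
composed = do
  doubled <- (* 2)
  shown <- show
  return (doubled, shown)

main :: IO ()
main = do
  print (flatten (+) 21) -- 42
  print (join (+) 21)    -- 42
  print (composed 21)    -- (42,"21")</pre>
</p>
<p>
For functions, <code>join</code> behaves like the hand-written <code>flatten</code>, which is why the two first lines print the same number.
</p>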
<h3 id="fb5e50be9b66464ca61cd4a45eb7e756">
Conclusion <a href="#fb5e50be9b66464ca61cd4a45eb7e756" title="permalink">#</a>
</h3>
<p>
In Haskell, it can occasionally be useful that a function can be used when a <code>Monad</code> is required. Some Haskell libraries are defined in very general terms. Their APIs may enable you to call functions with any monadic input value. You can, say, pass a <a href="/2022/04/25/the-maybe-monad">Maybe</a>, a <a href="/2022/04/19/the-list-monad">List</a>, an <a href="/2022/05/09/an-either-monad">Either</a>, a State, but you can also pass a function.
</p>
<p>
C# and most other languages (F# included) don't come with that level of abstraction, so the fact that a function forms a monad is less useful there. In fact, I can't recall having made explicit use of this knowledge in C#, but one never knows whether that day will arrive.
</p>
<p>
In a similar vein, knowing that <a href="/2018/04/16/endomorphic-composite-as-a-monoid">endomorphisms form monoids</a> (and thereby also <a href="/2017/11/27/semigroups">semigroups</a>) enabled me to <a href="/2020/12/14/validation-a-solved-problem">quickly identify the correct design for a validation problem</a>.
</p>
<p>
Who knows? One day the knowledge that functions are monads may come in handy.
</p>
<p>
<strong>Next:</strong> <a href="/2025/03/03/reactive-monad">Reactive monad</a>.
</p>
</div><hr>
Applicative assertions
https://blog.ploeh.dk/2022/11/07/applicative-assertions
2022-11-07T06:56:00+00:00
Mark Seemann
<div id="post">
<p>
<em>An exploration.</em>
</p>
<p>
In a recent Twitter exchange, <a href="https://lucasdicioccio.github.io/">Lucas DiCioccio</a> made an interesting observation:
</p>
<blockquote>
<p>
"Imho the properties you want of an assertion-framework are really close (the same as?) applicative-validation: one assertion failure with multiple bullet points composed mainly from combinators."
</p>
<footer><cite><a href="https://twitter.com/lucasdicioccio/status/1572264819109003265">Lucas DiCioccio</a></cite></footer>
</blockquote>
<p>
In another branch off my initial tweet <a href="https://www.joshka.net/">Josh McKinney</a> pointed out the short-circuiting nature of standard assertions:
</p>
<blockquote>
<p>
"short circuiting often causes weaker error messages in failing tests than running compound assertions. E.g.
</p>
<p>
<pre>TransferTest {
a.transfer(b,50);
a.shouldEqual(50);
b.shouldEqual(150); // never reached?
}</pre>
</p>
<footer><cite><a href="https://twitter.com/joshuamck/status/1572232484884217864">Josh McK</a></cite></footer>
</blockquote>
<p>
Most standard assertion libraries work by throwing exceptions when an assertion fails. Once you throw an exception, remaining code doesn't execute. This means that you only get the first assertion message. Further assertions are not evaluated.
</p>
<p>
Josh McKinney <a href="https://twitter.com/joshuamck/status/1572528796125003777">later gave more details about a particular scenario</a>. Although in the general case I don't consider the short-circuiting nature of assertions to be a problem, I grant that there are cases where proper assertion composition would be useful.
</p>
<p>
Lucas DiCioccio's suggestion seems worthy of investigation.
</p>
<h3 id="54fc71d7459d4251a79dc16f58bd79b3">
Ongoing exploration <a href="#54fc71d7459d4251a79dc16f58bd79b3" title="permalink">#</a>
</h3>
<p>
<a href="https://twitter.com/ploeh/status/1572282314402721805">I asked</a> Lucas DiCioccio whether he'd done any work with his idea, and the day after <a href="https://twitter.com/lucasdicioccio/status/1572639255582867456">he replied</a> with a <a href="https://www.haskell.org">Haskell</a> proof of concept.
</p>
<p>
I found the idea so interesting that I also wanted to carry out a few proofs of concept myself, perhaps within a more realistic setting.
</p>
<p>
As I'm writing this, I've reached some preliminary conclusions, but I'm also aware that they may not hold in more general cases. I'm posting what I have so far, but you should expect this exploration to evolve over time. If I find out more, I'll update this post with more articles.
</p>
<ul>
<li><a href="/2022/11/28/an-initial-proof-of-concept-of-applicative-assertions-in-c">An initial proof of concept of applicative assertions in C#</a></li>
<li><a href="/2022/12/19/error-accumulating-composable-assertions-in-c">Error-accumulating composable assertions in C#</a></li>
<li><a href="/2023/01/30/built-in-alternatives-to-applicative-assertions">Built-in alternatives to applicative assertions</a></li>
</ul>
<p>
A preliminary summary is in order. Based on the first two articles, applicative assertions look like overkill. I think, however, that it's because of the degenerate nature of the example. Some assertions are essentially one-stop verifications: Evaluate a predicate, and throw an exception if the result is <em>false</em>. These assertions return <a href="/2018/01/15/unit-isomorphisms">unit or void</a>. Examples from <a href="https://xunit.net/">xUnit</a> include <code>Assert.Equal</code>, <code>Assert.True</code>, <code>Assert.False</code>, <code>Assert.All</code>, and <code>Assert.DoesNotContain</code>.
</p>
<p>
These are the kinds of assertions that the initial two articles explore.
</p>
<p>
There are other kinds of assertions that return a value in case of success. xUnit.net examples include <code>Assert.Throws</code>, <code>Assert.Single</code>, <code>Assert.IsAssignableFrom</code>, and some overloads of <code>Assert.Contains</code>. <code>Assert.Single</code>, for example, verifies that a collection contains only a single element. While it throws an exception if the collection is either empty or has more than one element, in the success case it returns the single value. This can be useful if you want to add more assertions based on that value.
</p>
<p>
I haven't experimented with this yet, but as far as I can tell, you'll run into the following problem: If you make such an assertion return an <a href="/2018/10/01/applicative-functors">applicative functor</a>, you'll need some way to handle the success case. Combining it with another assertion-producing function, such as <code>a -> Asserted e b</code> (pseudocode) is possible with <a href="/2018/03/22/functors">functor</a> mapping, but will leave you with a nested functor.
</p>
<p>
You'll probably want to flatten the nested functor, which is exactly what <a href="/2022/03/28/monads">monads</a> do. Monads, on the other hand, short circuit, so you don't want to make your applicative assertion type a monad. Instead, you'll need to use an isomorphic monad container (<a href="/2022/05/09/an-either-monad">Either</a> should do) to move in and out of. Doable, but is the complexity warranted?
</p>
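<p>
To make the idea a little more concrete, here's a minimal Haskell sketch. It isn't taken from any of the linked proofs of concept. The <code>Asserted</code> type borrows its name from the pseudocode above, and <code>toEither</code>, <code>fromEither</code>, <code>andThen</code>, <code>assertSingle</code>, and <code>assertEqual</code> are all hypothetical names invented for the illustration.
</p>
<p>
<pre>-- Hypothetical assertion type; the name follows the pseudocode above.
data Asserted e a = Failed e | Passed a
  deriving (Eq, Show)

instance Functor (Asserted e) where
  fmap _ (Failed e) = Failed e
  fmap f (Passed a) = Passed (f a)

-- Applicative composition collects all failures instead of short-circuiting.
instance Semigroup e => Applicative (Asserted e) where
  pure = Passed
  Failed e1 <*> Failed e2 = Failed (e1 <> e2)
  Failed e1 <*> Passed _  = Failed e1
  Passed _  <*> Failed e2 = Failed e2
  Passed f  <*> Passed a  = Passed (f a)

-- The isomorphism with Either.
toEither :: Asserted e a -> Either e a
toEither (Failed e) = Left e
toEither (Passed a) = Right a

fromEither :: Either e a -> Asserted e a
fromEither = either Failed Passed

-- Dependent composition borrows Either's (short-circuiting) bind.
andThen :: Asserted e a -> (a -> Asserted e b) -> Asserted e b
andThen x f = fromEither (toEither x >>= toEither . f)

assertSingle :: [a] -> Asserted [String] a
assertSingle [x] = Passed x
assertSingle xs =
  Failed ["Expected one element, but got " ++ show (length xs) ++ "."]

assertEqual :: (Eq a, Show a) => a -> a -> Asserted [String] ()
assertEqual expected actual
  | expected == actual = Passed ()
  | otherwise =
      Failed ["Expected " ++ show expected ++ ", but got " ++ show actual ++ "."]

main :: IO ()
main = do
  -- Independent assertions: both failures are reported.
  print (assertEqual 50 40 *> assertEqual 150 160)
  -- Dependent assertions: the second one needs the first one's result.
  print (assertSingle [42] `andThen` assertEqual 42)
  print (assertSingle [1, 2] `andThen` assertEqual 42)</pre>
</p>
<p>
The first expression produces a <code>Failed</code> value that carries both messages, while the last one stops at the <code>assertSingle</code> failure. The explicit <code>andThen</code> makes the short-circuiting visible at the call site, at the cost of the extra conversions.
</p>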
<p>
I realise that the above musings are abstract, and that I really should show rather than tell. I'll add some more content here if I ever collect something worthy of an article. If you ask me now, though, I consider that a bit of a toss-up.
</p>
<p>
The first two examples also suffer from being written in C#, which doesn't have good syntactic support for applicative functors. Perhaps I'll add some articles that use <a href="https://fsharp.org/">F#</a> or Haskell.
</p>
<h3 id="676b3bf45f0841bc9a51d3510d917a6a">
Conclusion <a href="#676b3bf45f0841bc9a51d3510d917a6a" title="permalink">#</a>
</h3>
<p>
There's the occasional need for composable assertions. You can achieve that with an applicative functor, but the question is whether it's warranted. Could you make something simpler based on the <a href="/2022/04/19/the-list-monad">list monad</a>?
</p>
<p>
As I'm writing this, I don't consider that question settled. Even so, you may want to read on.
</p>
<p>
<strong>Next:</strong> <a href="/2022/11/28/an-initial-proof-of-concept-of-applicative-assertions-in-c">An initial proof of concept of applicative assertions in C#</a>.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="e3269279056146f985c8405f6d3ad286">
<div class="comment-author"><a href="https://about.me/tysonwilliams">Tyson Williams</a> <a href="#e3269279056146f985c8405f6d3ad286">#</a></div>
<div class="comment-content">
<blockquote>
Monads, on the other hand, short circuit, so you don't want to make your applicative assertion type a monad.
</blockquote>
<p>
I want my assertion type to be both applicative and monadic.
So does Paul Louth, the creator of Language Ext,
which is most clearly seen via <a href="https://github.com/louthy/language-ext/blob/main/LanguageExt.Tests/ValidationTests.cs#L267-L277">this Validation test code</a>.
So does Alexis King (as you pointed out to me) in her Haskell Validation package,
which violates Haskell's monad type class,
and which she defends <a href="https://hackage.haskell.org/package/monad-validate-1.2.0.1/docs/Control-Monad-Validate.html#:~:text=ValidateT%20and%20the%20Monad%20laws">here</a>.
</p>
<p>
When I want (or typically need) short-circuiting behavior,
then I use the type's monadic API.
When I want "error-collecting behavior",
then I use the type's applicative API.
</p>
<blockquote>
The first two examples also suffer from being written in C#, which doesn't have good syntactic support for applicative functors.
</blockquote>
<p>
The best syntactic support for applicative functors in C# that I have seen is in Language Ext.
<a href="https://github.com/louthy/language-ext/blob/15cd875ea40925e2ca9cd702c84f9142918dbb77/LanguageExt.Tests/ValidationTests.cs#L272-L277">A comment explains</a>
in that same Validation test how it works,
and the line after the comment shows it in action.
</p>
</div>
<div class="comment-date">2023-01-16 21:13 UTC</div>
</div>
<div class="comment" id="a1516515c1b84cb2b8473f2c0321562e">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#a1516515c1b84cb2b8473f2c0321562e">#</a></div>
<div class="comment-content">
<p>
Tyson, thank you for writing. Whether or not you want to enable monadic short-circuiting for assertions or validations depends, I think, on 'developer ergonomics'. It's a trade-off mainly between <em>ease</em> and <em>simplicity</em> as <a href="https://www.infoq.com/presentations/Simple-Made-Easy/">outlined by Rich Hickey</a>. Enabling a monadic API for something that isn't naturally monadic does indeed provide ease of use, in that the compositional capabilities of a monad are readily 'at hand'.
</p>
<p>
If you don't have that capability you'll have to map back and forth between, say, <code>Validation</code> and <code>Either</code> (if using the <a href="https://hackage.haskell.org/package/validation">validation</a> package). This is tedious, but <a href="https://peps.python.org/pep-0020/">explicit</a>.
</p>
<p>
Making validation or assertions monadic makes it easier to compose nested values, but also (in my experience) makes it easier to make mistakes, in the sense that you (or a colleague) may <em>think</em> that the behaviour is error-collecting, whereas in reality it's short-circuiting.
</p>
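<p>
To make the difference concrete, here's a minimal Haskell sketch. It assumes a hand-rolled <code>Validation</code> type rather than the one from the validation package, and the two checks (<code>checkName</code>, <code>checkAge</code>) are hypothetical names invented for the example. The same two failing checks are composed once through <code>Either</code> and once through <code>Validation</code>:
</p>
<p>
<pre>module Main where

-- A hand-rolled Validation type (an assumption for this sketch;
-- the validation package on Hackage offers a fuller version).
data Validation e a = Failure e | Success a deriving (Eq, Show)

instance Functor (Validation e) where
  fmap _ (Failure e) = Failure e
  fmap f (Success a) = Success (f a)

-- Applicative composition collects errors. There is deliberately no Monad instance.
instance Semigroup e => Applicative (Validation e) where
  pure = Success
  Failure e1 <*> Failure e2 = Failure (e1 <> e2)
  Failure e1 <*> Success _  = Failure e1
  Success _  <*> Failure e2 = Failure e2
  Success f  <*> Success a  = Success (f a)

-- Explicit conversions between the two representations.
toEither :: Validation e a -> Either e a
toEither (Failure e) = Left e
toEither (Success a) = Right a

fromEither :: Either e a -> Validation e a
fromEither (Left e)  = Failure e
fromEither (Right a) = Success a

-- Hypothetical checks, invented for illustration.
checkName :: String -> Either [String] String
checkName n = if null n then Left ["name is empty"] else Right n

checkAge :: Int -> Either [String] Int
checkAge a = if a < 0 then Left ["age is negative"] else Right a

-- Either short-circuits: only the first error is reported.
shortCircuit :: Either [String] (String, Int)
shortCircuit = (,) <$> checkName "" <*> checkAge (-1)

-- Validation collects: both errors are reported.
collected :: Validation [String] (String, Int)
collected = (,) <$> fromEither (checkName "") <*> fromEither (checkAge (-1))

main :: IO ()
main = do
  print shortCircuit  -- Left ["name is empty"]
  print collected     -- Failure ["name is empty","age is negative"]</pre>
</p>
<p>
The two expressions look almost identical, which is exactly how a reader may mistake one behaviour for the other. Keeping the conversions explicit makes the choice visible at the call site.
</p>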
<p>
In the end, the trade-off may reduce to how much you trust yourself (and co-workers) to steer clear of mistakes, and how important it is to avoid errors. In this case, how important is it to collect the errors, rather than short-circuiting?
</p>
<p>
You can choose one alternative or the other by weighing such concerns.
</p>
</div>
<div class="comment-date">2023-01-19 8:30 UTC</div>
</div>
</div><hr>
A regular grid emerges https://blog.ploeh.dk/2022/10/31/a-regular-grid-emerges 2022-10-31T06:44:00+00:00 Mark Seemann
<div id="post">
<p>
<em>The code behind a lecture animation.</em>
</p>
<p>
If you've seen my presentation <a href="https://youtu.be/FPEEiX5unWI">Fractal Architecture</a>, you may have wondered how I made the animation where a regular(ish) hexagonal grid emerges from adding more and more blobs to an area.
</p>
<p>
<img src="/content/binary/a-regular-grid-emerges.png" alt="A grid-like structure starting to emerge from tightly packing blobs.">
</p>
<p>
Like <a href="/2021/04/05/mazes-on-voronoi-tesselations">a few</a> <a href="/2021/07/05/fractal-hex-flowers">previous</a> blog posts, today's article appears on <a href="https://observablehq.com">Observable</a>, which is where the animation and the code that creates it lives. <a href="https://observablehq.com/@ploeh/a-regular-grid-emerges">Go there to read it</a>.
</p>
<p>
If you have time, watch the animation evolve. Personally I find it quite mesmerising.
</p>
</div><hr>
Encapsulation in Functional Programming https://blog.ploeh.dk/2022/10/24/encapsulation-in-functional-programming 2022-10-24T05:54:00+00:00 Mark Seemann
<div id="post">
<p>
<em>Encapsulation is only relevant for object-oriented programming, right?</em>
</p>
<p>
The concept of <em>encapsulation</em> is closely related to object-oriented programming (OOP), and you rarely hear the word in discussions about (statically-typed) functional programming (FP). I will argue, however, that the notion is relevant in FP as well. Typically, it just appears with a different catchphrase.
</p>
<h3 id="f0f64bfaaa6f4d22b990fae4775a8b89">
Contracts <a href="#f0f64bfaaa6f4d22b990fae4775a8b89" title="permalink">#</a>
</h3>
<p>
I base my understanding of encapsulation on <a href="/ref/oosc">Object-Oriented Software Construction</a>. I've tried to distil it in my Pluralsight course <a href="/encapsulation-and-solid">Encapsulation and SOLID</a>.
</p>
<p>
In short, encapsulation denotes the distinction between an object's contract and its implementation. An object should fulfil its contract in such a way that client code doesn't need to know about its implementation.
</p>
<p>
Contracts, according to <a href="https://en.wikipedia.org/wiki/Bertrand_Meyer">Bertrand Meyer</a>, describe three properties of objects:
</p>
<ul>
<li>Preconditions: What client code must fulfil in order to successfully interact with the object.</li>
<li>Invariants: Statements about the object that are always true.</li>
<li>Postconditions: Statements that are guaranteed to be true after a successful interaction between client code and object.</li>
</ul>
<p>
You can replace <em>object</em> with <em>value</em> and I'd argue that the same concerns are relevant in FP.
</p>
<p>
In OOP <em>invariants</em> often point to the properties of an object that are guaranteed to remain even in the face of state mutation. As you change the state of an object, the object should guarantee that its state remains valid. These are the properties (i.e. <em>qualities</em>, <em>traits</em>, <em>attributes</em>) that don't vary - i.e. are <em>invariant</em>.
</p>
<p>
An example would be helpful around here.
</p>
<h3 id="72a37691c0c14d4a8673d52f25e7c3e2">
Table mutation <a href="#72a37691c0c14d4a8673d52f25e7c3e2" title="permalink">#</a>
</h3>
<p>
Consider an object that models a table in a restaurant. You may, for example, be working on <a href="/2020/01/27/the-maitre-d-kata">the Maître d' kata</a>. In short, you may decide to model a table as being one of two kinds: Standard tables and communal tables. You can reserve seats at communal tables, but you still share the table with other people.
</p>
<p>
You may decide to model the problem in such a way that when you reserve the table, you change the state of the object. You may decide to describe the contract of <code>Table</code> objects like this:
</p>
<ul>
<li>Preconditions
<ul>
<li>To create a <code>Table</code> object, you must supply a type (standard or communal).</li>
<li>To create a <code>Table</code> object, you must supply the size of the table, which is a measure of its capacity; i.e. how many people can sit at it.</li>
<li>The capacity must be a natural number. <em>One</em> (1) is the smallest valid capacity.</li>
<li>When reserving a table, you must supply a valid reservation.</li>
<li>When reserving a table, the reservation quantity must be less than or equal to the table's remaining capacity.</li>
</ul>
</li>
<li>Invariants
<ul>
<li>The table capacity doesn't change.</li>
<li>The table type doesn't change.</li>
<li>The number of remaining seats is never negative.</li>
<li>The number of remaining seats is never greater than the table's capacity.</li>
</ul>
</li>
<li>Postconditions
<ul>
<li>After reserving a table, the number of remaining seats can't be greater than the previous number of remaining seats minus the reservation quantity.</li>
</ul>
</li>
</ul>
<p>
This list may be incomplete, and if you add more operations, you may have to elaborate on what that means to the contract.
</p>
<p>
In C# you may implement a <code>Table</code> class like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Table</span>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> List<Reservation> reservations;
<span style="color:blue;">public</span> <span style="color:#2b91af;">Table</span>(<span style="color:blue;">int</span> capacity, TableType type)
{
<span style="color:blue;">if</span> (capacity < 1)
<span style="color:blue;">throw</span> <span style="color:blue;">new</span> ArgumentOutOfRangeException(
nameof(capacity),
<span style="color:#a31515;">$"Capacity must be greater than zero, but was: </span>{capacity}<span style="color:#a31515;">."</span>);
reservations = <span style="color:blue;">new</span> List<Reservation>();
Capacity = capacity;
Type = type;
RemainingSeats = capacity;
}
<span style="color:blue;">public</span> <span style="color:blue;">int</span> Capacity { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> TableType Type { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">int</span> RemainingSeats { <span style="color:blue;">get</span>; <span style="color:blue;">private</span> <span style="color:blue;">set</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">void</span> Reserve(Reservation reservation)
{
<span style="color:blue;">if</span> (RemainingSeats < reservation.Quantity)
<span style="color:blue;">throw</span> <span style="color:blue;">new</span> InvalidOperationException(
<span style="color:#a31515;">"The table has no remaining seats."</span>);
<span style="color:blue;">if</span> (Type == TableType.Communal)
RemainingSeats -= reservation.Quantity;
<span style="color:blue;">else</span>
RemainingSeats = 0;
reservations.Add(reservation);
}
}</pre>
</p>
<p>
This class has good encapsulation because it makes sure to fulfil the contract. You can't put it in an invalid state.
</p>
<h3 id="b37e173371f249d99f80d12d64b2bee2">
Immutable Table <a href="#b37e173371f249d99f80d12d64b2bee2" title="permalink">#</a>
</h3>
<p>
Notice that two of the invariants for the above <code>Table</code> class are that the table can't change type or capacity. While OOP often revolves around state mutation, it seems reasonable that some data is immutable. A table doesn't all of a sudden change size.
</p>
<p>
In FP data is immutable. Data doesn't change. Thus, data has that invariant property.
</p>
<p>
If you consider the above contract, it still applies to FP. The specifics change, though. You'll no longer be dealing with <code>Table</code> objects, but rather <code>Table</code> data, and to make reservations, you call a function that returns a new <code>Table</code> value.
</p>
<p>
In <a href="https://fsharp.org/">F#</a> you could model a <code>Table</code> like this:
</p>
<p>
<pre><span style="color:blue;">type</span> Table = <span style="color:blue;">private</span> Standard <span style="color:blue;">of</span> int * Reservation list | Communal <span style="color:blue;">of</span> int * Reservation list
<span style="color:blue;">module</span> Table =
<span style="color:blue;">let</span> standard capacity =
<span style="color:blue;">if</span> 0 < capacity
<span style="color:blue;">then</span> Some (Standard (capacity, []))
<span style="color:blue;">else</span> None
<span style="color:blue;">let</span> communal capacity =
<span style="color:blue;">if</span> 0 < capacity
<span style="color:blue;">then</span> Some (Communal (capacity, []))
<span style="color:blue;">else</span> None
<span style="color:blue;">let</span> remainingSeats = <span style="color:blue;">function</span>
| Standard (capacity, []) <span style="color:blue;">-></span> capacity
| Standard _ <span style="color:blue;">-></span> 0
| Communal (capacity, rs) <span style="color:blue;">-></span> capacity - List.sumBy (<span style="color:blue;">fun</span> r <span style="color:blue;">-></span> r.Quantity) rs
<span style="color:blue;">let</span> reserve r t =
<span style="color:blue;">match</span> t <span style="color:blue;">with</span>
| Standard (capacity, []) <span style="color:blue;">when</span> r.Quantity <= remainingSeats t <span style="color:blue;">-></span>
Some (Standard (capacity, [r]))
| Communal (capacity, rs) <span style="color:blue;">when</span> r.Quantity <= remainingSeats t <span style="color:blue;">-></span>
Some (Communal (capacity, r :: rs))
| _ <span style="color:blue;">-></span> None</pre>
</p>
<p>
While you'll often hear fsharpers say that one should <a href="https://blog.janestreet.com/effective-ml-video/">make illegal states unrepresentable</a>, in practice you often have to rely on <a href="https://www.hillelwayne.com/post/constructive/">predicative</a> data to enforce contracts. I've done this here by making the <code>Table</code> cases <code>private</code>. Code outside the module can't directly create <code>Table</code> data. Instead, it'll have to use one of two functions: <code>Table.standard</code> or <code>Table.communal</code>. These are functions that return <code>Table option</code> values.
</p>
<p>
That's the idiomatic way to model predicative data in statically typed FP. In <a href="https://www.haskell.org/">Haskell</a> such functions are called <a href="https://wiki.haskell.org/Smart_constructors">smart constructors</a>.
</p>
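<p>
As a rough illustration, a Haskell port of the F# module might look like the following sketch. It assumes a <code>Reservation</code> type with nothing but a <code>quantity</code> field, which is a simplification invented for this example. The export list names the <code>Table</code> type without its data constructors, so client code has to go through <code>standard</code> or <code>communal</code>:
</p>
<p>
<pre>module Table
  ( Table            -- exported without its constructors
  , Reservation (..)
  , standard
  , communal
  , remainingSeats
  , reserve
  ) where

-- Simplified stand-in for a real reservation (an assumption of this sketch).
newtype Reservation = Reservation { quantity :: Int } deriving (Eq, Show)

data Table
  = Standard Int [Reservation]
  | Communal Int [Reservation]
  deriving (Eq, Show)

-- Smart constructors: only positive capacities produce a Table.
standard :: Int -> Maybe Table
standard capacity
  | 0 < capacity = Just (Standard capacity [])
  | otherwise    = Nothing

communal :: Int -> Maybe Table
communal capacity
  | 0 < capacity = Just (Communal capacity [])
  | otherwise    = Nothing

remainingSeats :: Table -> Int
remainingSeats (Standard capacity []) = capacity
remainingSeats (Standard _ _)         = 0
remainingSeats (Communal capacity rs) = capacity - sum (map quantity rs)

-- Returns a new Table on success; the input value is never modified.
reserve :: Reservation -> Table -> Maybe Table
reserve r t = case t of
  Standard capacity [] | quantity r <= remainingSeats t ->
    Just (Standard capacity [r])
  Communal capacity rs | quantity r <= remainingSeats t ->
    Just (Communal capacity (r : rs))
  _ -> Nothing</pre>
</p>
<p>
Since <code>Maybe</code> is a monad, reservations compose with <code>>>=</code>; for example, <code>communal 10 >>= reserve (Reservation 3)</code> yields a table with seven remaining seats.
</p>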
<p>
Statically typed FP languages typically use <a href="/2022/04/25/the-maybe-monad">Maybe</a> (<code>Option</code>) or <a href="/2022/05/09/an-either-monad">Either</a> (<code>Result</code>) values to communicate failure, rather than throwing exceptions, but apart from that a smart constructor is just an object constructor.
</p>
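<p>
If callers need to know <em>why</em> an operation failed, <code>Either</code> can carry that information. The following sketch uses a deliberately reduced model, a communal table that only tracks capacity and seats taken, and a hypothetical <code>TableError</code> type. Neither is part of the code shown above:
</p>
<p>
<pre>module Main where

-- Hypothetical error type, invented for this sketch.
data TableError
  = NonPositiveCapacity Int
  | InsufficientSeats Int Int -- requested, remaining
  deriving (Eq, Show)

-- Reduced model: capacity and number of seats already taken.
data Communal = Communal Int Int deriving (Eq, Show)

communal :: Int -> Either TableError Communal
communal capacity
  | 0 < capacity = Right (Communal capacity 0)
  | otherwise    = Left (NonPositiveCapacity capacity)

remainingSeats :: Communal -> Int
remainingSeats (Communal capacity taken) = capacity - taken

reserve :: Int -> Communal -> Either TableError Communal
reserve qty t@(Communal capacity taken)
  | qty <= remainingSeats t = Right (Communal capacity (taken + qty))
  | otherwise               = Left (InsufficientSeats qty (remainingSeats t))

main :: IO ()
main = do
  print (communal 0)                              -- Left (NonPositiveCapacity 0)
  print (communal 4 >>= reserve 3)                -- Right (Communal 4 3)
  print (communal 4 >>= reserve 3 >>= reserve 2)  -- Left (InsufficientSeats 2 1)</pre>
</p>
<p>
The precondition violations that the C# class reports by throwing exceptions become ordinary return values here.
</p>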
<p>
The above F# <code>Table</code> API implements the same contract as the OOP version.
</p>
<p>
If you want to see a more elaborate example of modelling table and reservations in F#, see <a href="/2020/04/27/an-f-implementation-of-the-maitre-d-kata">An F# implementation of the Maître d' kata</a>.
</p>
<h3 id="78027bc1d1414c2fa3604a68c9df6418">
Functional contracts in OOP languages <a href="#78027bc1d1414c2fa3604a68c9df6418" title="permalink">#</a>
</h3>
<p>
You can adopt many FP concepts in OOP languages. My book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a> contains sample code in C# that implements an online restaurant reservation system. It includes a <code>Table</code> class that, at first glance, looks like the above C# class.
</p>
<p>
While it has the same contract, the book's <code>Table</code> class is implemented with the FP design principles in mind. Thus, it's an immutable class with this API:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Table</span>
{
<span style="color:blue;">public</span> <span style="color:blue;">static</span> Table Standard(<span style="color:blue;">int</span> seats)
<span style="color:blue;">public</span> <span style="color:blue;">static</span> Table Communal(<span style="color:blue;">int</span> seats)
<span style="color:blue;">public</span> <span style="color:blue;">int</span> Capacity { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">int</span> RemainingSeats { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> Table Reserve(Reservation reservation)
<span style="color:blue;">public</span> T Accept<<span style="color:#2b91af;">T</span>>(ITableVisitor<T> visitor)
<span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">bool</span> Equals(<span style="color:blue;">object</span>? obj)
<span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">int</span> GetHashCode()
}</pre>
</p>
<p>
Notice that the <code>Reserve</code> method returns a <code>Table</code> object. That's the table with the reservation associated. The original <code>Table</code> instance remains unchanged.
</p>
<p>
The entire book is written in the <a href="https://www.destroyallsoftware.com/screencasts/catalog/functional-core-imperative-shell">Functional Core, Imperative Shell</a> architecture, so all domain models are immutable objects with <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a> as methods.
</p>
<p>
The objects still have contracts. They have proper encapsulation.
</p>
<h3 id="ca2409555a5b4efe9b98e1c65e77256d">
Conclusion <a href="#ca2409555a5b4efe9b98e1c65e77256d" title="permalink">#</a>
</h3>
<p>
Functional programmers may not use the term <em>encapsulation</em> much, but that doesn't mean that they don't share that kind of concern. They often throw around the phrase <em>make illegal states unrepresentable</em> or talk about smart constructors or <a href="https://en.wikipedia.org/wiki/Partial_function">partial versus total functions</a>. It's clear that they care about data modelling that prevents mistakes.
</p>
<p>
The object-oriented notion of <em>encapsulation</em> is ultimately about separating the affordances of an API from its implementation details. An object's contract is an abstract description of the properties (i.e. <em>qualities</em>, <em>traits</em>, or <em>attributes</em>) of the object.
</p>
<p>
Functional programmers care so much about the properties of data and functions that <em>property-based testing</em> is often the preferred way to perform automated testing.
</p>
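<p>
As a sketch of what that can look like, the table contract from earlier can be written as executable properties. The model below is a compact stand-in that mirrors the behaviour of the C# class (kind, capacity, remaining seats); all names are assumptions made for the example. To stay free of dependencies it checks the properties exhaustively over a small range, where a library such as QuickCheck would generate random inputs instead:
</p>
<p>
<pre>module Main where

import Data.Maybe (fromMaybe)

data Kind = Standard | Communal deriving (Eq, Show)

-- Compact stand-in: kind, capacity, remaining seats.
data Table = Table Kind Int Int deriving (Eq, Show)

table :: Kind -> Int -> Maybe Table
table kind cap
  | 0 < cap   = Just (Table kind cap cap)
  | otherwise = Nothing

remainingSeats :: Table -> Int
remainingSeats (Table _ _ remaining) = remaining

reserve :: Int -> Table -> Maybe Table
reserve qty (Table kind cap remaining)
  | qty < 1 || remaining < qty = Nothing
  | kind == Communal           = Just (Table kind cap (remaining - qty))
  | otherwise                  = Just (Table kind cap 0)

-- Invariant: remaining seats stay between zero and the capacity.
propInvariant :: Table -> Bool
propInvariant (Table _ cap remaining) = 0 <= remaining && remaining <= cap

-- Postcondition: after a successful reservation, the remaining seats are at
-- most the previous remaining seats minus the quantity.
propPostcondition :: Int -> Table -> Bool
propPostcondition qty t = case reserve qty t of
  Nothing -> True
  Just t' -> propInvariant t' && remainingSeats t' <= remainingSeats t - qty

-- Every table reachable by at most one prior reservation, paired with a quantity.
samples :: [(Int, Table)]
samples =
  [ (qty, t)
  | kind <- [Standard, Communal]
  , cap <- [-2 .. 12]
  , Just t0 <- [table kind cap]
  , first <- [0 .. 13]
  , let t = fromMaybe t0 (reserve first t0)
  , qty <- [-2 .. 13]
  ]

main :: IO ()
main = print (all (\(qty, t) -> propInvariant t && propPostcondition qty t) samples)
-- True</pre>
</p>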
<p>
Perhaps you can find a functional programmer who might be slightly offended if you suggest that he or she should consider encapsulation. If so, suggest instead that he or she considers the properties of functions and data.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="bb616acbc1ac41cb8f937fe7175ce061">
<div class="comment-author"><a href="http://www.raboof.com/">Atif Aziz</a> <a href="#bb616acbc1ac41cb8f937fe7175ce061">#</a></div>
<div class="comment-content">
<p>
I wonder what's the goal of illustrating OOP-ish examples exclusively in C# and FP-ish ones in F# when you could stick to just one language for the reader. It might not always be as effective depending on the topic, but for encapsulation and the examples shown in this article, a C# version would read just as effectively as an F# one. I mean when you get round to making your points in the <strong>Immutable Table</strong> section of your article, you could demonstrate the ideas with a C# version that's nearly identical to and reads as succinctly as the F# version:
</p>
<pre><span style="color:gray;">#nullable</span> <span style="color:gray;">enable</span>
<span style="color:blue;">readonly</span> <span style="color:blue;">record</span> <span style="color:blue;">struct</span> <span style="color:darkblue;">Reservation</span>(<span style="color:blue;">int</span> <span style="color:#1f377f;">Quantity</span>);
<span style="color:blue;">abstract</span> <span style="color:blue;">record</span> <span style="color:darkblue;">Table</span>;
<span style="color:blue;">record</span> <span style="color:darkblue;">StandardTable</span>(<span style="color:blue;">int</span> <span style="color:#1f377f;">Capacity</span>, <span style="color:darkblue;">Reservation</span>? <span style="color:#1f377f;">Reservation</span>): <span style="color:darkblue;">Table</span>;
<span style="color:blue;">record</span> <span style="color:darkblue;">CommunalTable</span>(<span style="color:blue;">int</span> <span style="color:#1f377f;">Capacity</span>, <span style="color:darkblue;">ImmutableArray</span><<span style="color:darkblue;">Reservation</span>> <span style="color:#1f377f;">Reservations</span>): <span style="color:darkblue;">Table</span>;
<span style="color:blue;">static</span> <span style="color:blue;">class</span> <span style="color:darkblue;">TableModule</span>
{
<span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:darkblue;">StandardTable</span>? <span style="color:darkcyan;">Standard</span>(<span style="color:blue;">int</span> <span style="color:#1f377f;">capacity</span>) =>
0 < capacity ? <span style="color:blue;">new</span> <span style="color:darkblue;">StandardTable</span>(capacity, <span style="color:blue;">null</span>) : <span style="color:blue;">null</span>;
<span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:darkblue;">CommunalTable</span>? <span style="color:darkcyan;">Communal</span>(<span style="color:blue;">int</span> <span style="color:#1f377f;">capacity</span>) =>
0 < capacity ? <span style="color:blue;">new</span> <span style="color:darkblue;">CommunalTable</span>(capacity, <span style="color:darkblue;">ImmutableArray</span><<span style="color:darkblue;">Reservation</span>>.Empty) : <span style="color:blue;">null</span>;
<span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">int</span> <span style="color:darkcyan;">RemainingSeats</span>(<span style="color:blue;">this</span> <span style="color:darkblue;">Table</span> <span style="color:#1f377f;">table</span>) => table <span style="color:#8f08c4;">switch</span>
{
<span style="color:darkblue;">StandardTable</span> { <span style="color:purple;">Reservation</span>: <span style="color:blue;">null</span> } t => t.<span style="color:purple;">Capacity</span>,
<span style="color:darkblue;">StandardTable</span> => 0,
<span style="color:darkblue;">CommunalTable</span> <span style="color:#1f377f;">t</span> => t.<span style="color:purple;">Capacity</span> - t.<span style="color:purple;">Reservations</span>.Sum(<span style="color:#1f377f;">r</span> => r.Quantity)
};
<span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:darkblue;">Table</span>? <span style="color:darkcyan;">Reserve</span>(<span style="color:blue;">this</span> <span style="color:darkblue;">Table</span> <span style="color:#1f377f;">table</span>, <span style="color:darkblue;">Reservation</span> <span style="color:#1f377f;">r</span>) => table <span style="color:#8f08c4;">switch</span>
{
<span style="color:darkblue;">StandardTable</span> <span style="color:#1f377f;">t</span> <span style="color:#8f08c4;">when</span> r.<span style="color:purple;">Quantity</span> <= t.<span style="color:darkcyan;">RemainingSeats</span>() => t <span style="color:blue;">with</span> { <span style="color:purple;">Reservation</span> = r },
<span style="color:darkblue;">CommunalTable</span> <span style="color:#1f377f;">t</span> <span style="color:#8f08c4;">when</span> r.<span style="color:purple;">Quantity</span> <= t.<span style="color:darkcyan;">RemainingSeats</span>() => t <span style="color:blue;">with</span> { <span style="color:purple;">Reservations</span> = t.<span style="color:purple;">Reservations</span>.Add(r) },
<span style="color:blue;">_</span> => <span style="color:blue;">null</span>,
};
}
</pre>
<p>
This way, I can just point someone to your article for enlightenment, 😉 but not leave them feeling frustrated that they need F# to (practice and) model around data instead of state mutating objects. It might still be worthwhile to show an F# version to draw the similarities and also call out some differences; like <code>Table</code> being a true discriminated union in F#, and while it appears to be emulated in C#, they desugar to the same thing in terms of CLR types and hierarchies.
</p>
<p>
By the way, in the C# example above, I modeled the standard table variant differently because if it can hold only one reservation at a time then the model should reflect that.
</p>
</div>
<div class="comment-date">2022-10-27 16:09 UTC</div>
</div>
<div class="comment" id="7069ea2b33a64a1caf7247c3a1543bac">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#7069ea2b33a64a1caf7247c3a1543bac">#</a></div>
<div class="comment-content">
<p>
Atif, thank you for supplying an example of an immutable C# implementation.
</p>
<p>
I already have an example of an immutable, functional C# implementation in <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>, so I wanted to supply something else here. I also tend to find it interesting to compare how to model similar ideas in different languages, and it felt natural to supply an F# example to show how a 'natural' FP implementation might look.
</p>
<p>
Your point is valid, though, so I'm not insisting that this was the right decision.
</p>
</div>
<div class="comment-date">2022-10-28 8:50 UTC</div>
</div>
<div class="comment" id="a7b4d4d0dcc8432fb3b49cb7189d8123">
<div class="comment-author"><a href="https://github.com/sebastianfrelle">Sebastian Frelle Koch</a> <a href="#a7b4d4d0dcc8432fb3b49cb7189d8123">#</a></div>
<div class="comment-content">
<p>I took your idea, Atif, and wrote something that I think is more congruent with the example <a href="#78027bc1d1414c2fa3604a68c9df6418">here</a>. In short, I’m</p>
<ul>
<li>using polymorphism to avoid having to switch over the Table type</li>
<li>hiding subtypes of Table to simplify the interface.</li>
</ul>
<p>Here's the code:</p>
<div class="language-cs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="err">#</span><span class="n">nullable</span> <span class="n">enable</span>
<span class="k">using</span> <span class="nn">System.Collections.Immutable</span><span class="p">;</span>
<span class="k">readonly</span> <span class="n">record</span> <span class="k">struct</span> <span class="nc">Reservation</span><span class="p">(</span><span class="kt">int</span> <span class="n">Quantity</span><span class="p">);</span>
<span class="k">abstract</span> <span class="n">record</span> <span class="n">Table</span>
<span class="p">{</span>
<span class="k">public</span> <span class="k">abstract</span> <span class="n">Table</span><span class="p">?</span> <span class="nf">Reserve</span><span class="p">(</span><span class="n">Reservation</span> <span class="n">r</span><span class="p">);</span>
<span class="k">public</span> <span class="k">abstract</span> <span class="kt">int</span> <span class="nf">RemainingSeats</span><span class="p">();</span>
<span class="k">public</span> <span class="k">static</span> <span class="n">Table</span><span class="p">?</span> <span class="nf">Standard</span><span class="p">(</span><span class="kt">int</span> <span class="n">capacity</span><span class="p">)</span> <span class="p">=></span>
<span class="n">capacity</span> <span class="p">></span> <span class="m">0</span> <span class="p">?</span> <span class="k">new</span> <span class="nf">StandardTable</span><span class="p">(</span><span class="n">capacity</span><span class="p">,</span> <span class="k">null</span><span class="p">)</span> <span class="p">:</span> <span class="k">null</span><span class="p">;</span>
<span class="k">public</span> <span class="k">static</span> <span class="n">Table</span><span class="p">?</span> <span class="nf">Communal</span><span class="p">(</span><span class="kt">int</span> <span class="n">capacity</span><span class="p">)</span> <span class="p">=></span>
<span class="n">capacity</span> <span class="p">></span> <span class="m">0</span> <span class="p">?</span> <span class="k">new</span> <span class="nf">CommunalTable</span><span class="p">(</span>
<span class="n">capacity</span><span class="p">,</span>
<span class="n">ImmutableArray</span><span class="p"><</span><span class="n">Reservation</span><span class="p">>.</span><span class="n">Empty</span><span class="p">)</span> <span class="p">:</span> <span class="k">null</span><span class="p">;</span>
<span class="k">private</span> <span class="n">record</span> <span class="nf">StandardTable</span><span class="p">(</span><span class="kt">int</span> <span class="n">Capacity</span><span class="p">,</span> <span class="n">Reservation</span><span class="p">?</span> <span class="n">Reservation</span><span class="p">)</span> <span class="p">:</span> <span class="n">Table</span>
<span class="p">{</span>
<span class="k">public</span> <span class="k">override</span> <span class="n">Table</span><span class="p">?</span> <span class="nf">Reserve</span><span class="p">(</span><span class="n">Reservation</span> <span class="n">r</span><span class="p">)</span> <span class="p">=></span> <span class="nf">RemainingSeats</span><span class="p">()</span> <span class="k">switch</span>
<span class="p">{</span>
<span class="kt">var</span> <span class="n">seats</span> <span class="n">when</span> <span class="n">seats</span> <span class="p">>=</span> <span class="n">r</span><span class="p">.</span><span class="n">Quantity</span> <span class="p">=></span> <span class="k">this</span> <span class="n">with</span> <span class="p">{</span> <span class="n">Reservation</span> <span class="p">=</span> <span class="n">r</span> <span class="p">},</span>
<span class="n">_</span> <span class="p">=></span> <span class="k">null</span><span class="p">,</span>
<span class="p">};</span>
<span class="k">public</span> <span class="k">override</span> <span class="kt">int</span> <span class="nf">RemainingSeats</span><span class="p">()</span> <span class="p">=></span> <span class="n">Reservation</span> <span class="k">switch</span>
<span class="p">{</span>
<span class="k">null</span> <span class="p">=></span> <span class="n">Capacity</span><span class="p">,</span>
<span class="n">_</span> <span class="p">=></span> <span class="m">0</span><span class="p">,</span>
<span class="p">};</span>
<span class="p">}</span>
<span class="k">private</span> <span class="n">record</span> <span class="nf">CommunalTable</span><span class="p">(</span>
<span class="kt">int</span> <span class="n">Capacity</span><span class="p">,</span>
<span class="n">ImmutableArray</span><span class="p"><</span><span class="n">Reservation</span><span class="p">></span> <span class="n">Reservations</span><span class="p">)</span> <span class="p">:</span> <span class="n">Table</span>
<span class="p">{</span>
<span class="k">public</span> <span class="k">override</span> <span class="n">Table</span><span class="p">?</span> <span class="nf">Reserve</span><span class="p">(</span><span class="n">Reservation</span> <span class="n">r</span><span class="p">)</span> <span class="p">=></span> <span class="nf">RemainingSeats</span><span class="p">()</span> <span class="k">switch</span>
<span class="p">{</span>
<span class="kt">var</span> <span class="n">seats</span> <span class="n">when</span> <span class="n">seats</span> <span class="p">>=</span> <span class="n">r</span><span class="p">.</span><span class="n">Quantity</span> <span class="p">=></span>
<span class="k">this</span> <span class="n">with</span> <span class="p">{</span> <span class="n">Reservations</span> <span class="p">=</span> <span class="n">Reservations</span><span class="p">.</span><span class="nf">Add</span><span class="p">(</span><span class="n">r</span><span class="p">)</span> <span class="p">},</span>
<span class="n">_</span> <span class="p">=></span> <span class="k">null</span><span class="p">,</span>
<span class="p">};</span>
<span class="k">public</span> <span class="k">override</span> <span class="kt">int</span> <span class="nf">RemainingSeats</span><span class="p">()</span> <span class="p">=></span>
<span class="n">Capacity</span> <span class="p">-</span> <span class="n">Reservations</span><span class="p">.</span><span class="nf">Sum</span><span class="p">(</span><span class="n">r</span> <span class="p">=></span> <span class="n">r</span><span class="p">.</span><span class="n">Quantity</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>I’d love to hear your thoughts on this approach. I think that one of its weaknesses is that calls to <code class="language-plaintext highlighter-rouge">Table.Standard()</code> and <code class="language-plaintext highlighter-rouge">Table.Communal()</code> will yield two instances of <code class="language-plaintext highlighter-rouge">Table</code> that can never be equal. For instance, <code class="language-plaintext highlighter-rouge">Table.Standard(4) != Table.Communal(4)</code>, even though they’re both of type <code class="language-plaintext highlighter-rouge">Table?</code> and have the same number of seats.
</p>
<p>
Calling <code class="language-plaintext highlighter-rouge">GetType()</code> on each of the instances reveals that their types are actually <code class="language-plaintext highlighter-rouge">Table+StandardTable</code> and <code class="language-plaintext highlighter-rouge">Table+CommunalTable</code> respectively; however, this isn't transparent to callers. Another solution might be to expose the <code class="language-plaintext highlighter-rouge">Table</code> subtypes and give them private constructors – I just like the simplicity of not exposing the individual types of tables the same way you’re doing <a href="#b37e173371f249d99f80d12d64b2bee2">here</a>, Mark.</p>
</div>
<div class="comment-date">2022-11-29 11:28 UTC</div>
</div>
<div class="comment" id="f925c18ec3a746c393cfae319200baac">
<div class="comment-author"><a href="https://github.com/alexmurari">Alexandre Murari Jr</a> <a href="#f925c18ec3a746c393cfae319200baac">#</a></div>
<div class="comment-content">
<p>
Mark,
</p>
<p>
How do you differentiate encapsulation from abstraction?
</p>
<p>
Here's an excerpt from your book Dependency Injection: Principles, Practices, and Patterns.
</p>
<p>
Section: 1.3 - What to inject and what not to inject
Subsection: 1.3.1 - Stable Dependencies
</p>
<blockquote>
"Other examples [of libraries that do not require to be injected] may include specialized libraries that encapsulate algorithms relevant to your application".
</blockquote>
<p>
In that section, you and Steven were giving examples of stable dependencies that do not require to be injected to keep modularity.
You define a library that "encapsulates an algorithm" as an example.
</p>
<p>
Now, to me, encapsulation is "protecting data integrity", plain and simple.
A class is encapsulated as long as it's impossible or nearly impossible to bring it to an invalid or inconsistent state.
</p>
<p>
Protection of invariants, implementation hiding, bundling data and operations together, pre- and postconditions, Postel's Law all come into play to achieve this goal.
</p>
<p>
Thus, a class, to be "encapsulatable", has to have a state that can be initialized and/or modified by the client code.
</p>
<p>
Now I ask: most of the time when we say that something is encapsulating another, don't we really mean abstracting?
</p>
<p>
Why is it relevant to know that the hypothetical algorithm library protects its invariants by using the term "encapsulate"?
</p>
<p>
Abstraction, under the light of Robert C. Martin's definition of it, makes much more sense in that context: "a specialized library that abstracts algorithms relevant to your application".
It amplifies the essential (by providing a clear API), but eliminates the irrelevant (by hiding the algorithm's implementation details).
</p>
<p>
Granted, there is some overlap between encapsulation and abstraction, especially when you bundle data and operations together (rich domain models), but they are not the same thing; you just use one to achieve the other sometimes.
</p>
<p>
Would it be correct to say that the .NET Framework encapsulates math algorithms in the System.Math class? Is there any state there to be preserved? They're all static methods and constants.
On the other hand, they're surely eliminating some pretty irrelevant (from a consumer POV) trigonometric algorithms.
</p>
<p>
Thanks.
</p>
</div>
<div class="comment-date">2022-12-04 02:35 UTC</div>
</div>
<div class="comment" id="5af1535933bf4b28bb5c2fc14ce0a01a">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#5af1535933bf4b28bb5c2fc14ce0a01a">#</a></div>
<div class="comment-content">
<p>
Alexandre, thank you for writing. How do I distinguish between abstraction and encapsulation?
</p>
<p>
There's much overlap, to be sure.
</p>
<p>
As I write, my view on encapsulation is influenced by Bertrand Meyer's notion of contract. Likewise, I do use Robert C. Martin's notion of amplifying the essentials while hiding the irrelevant details as a guiding light when discussing abstraction.
</p>
<p>
While these concepts may seem synonymous, they're not quite the same. I can't say that I've spent too much time considering how these two words relate, but shooting from the hip I think that <em>abstraction</em> is a wider concept.
</p>
<p>
You don't need to read much of Robert C. Martin before he'll tell you that the <a href="https://en.wikipedia.org/wiki/Dependency_inversion_principle">Dependency Inversion Principle</a> is an important part of abstraction:
</p>
<blockquote>
<p>
"Abstractions should not depend on details. Details should depend on abstractions."
</p>
<footer><cite>Robert C. Martin, <a href="/ref/appp">Agile Principles, Patterns, and Practices in C#</a></cite></footer>
</blockquote>
<p>
It's possible to implement a code base where this isn't true, even if classes have good encapsulation. You could imagine a domain model that depends on database details like a particular ORM. I've seen plenty of those in my career, although I grant that most of them have had poor encapsulation as well. It is not, however, impossible to imagine such a system with good encapsulation, but suboptimal abstraction.
</p>
<p>
Does it go the other way as well? Can we have good abstraction, but poor encapsulation?
</p>
<p>
An example doesn't come immediately to mind, but as I wrote, it's not an ontology that I've given much thought.
</p>
</div>
<div class="comment-date">2022-12-06 22:11 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.Stubs and mocks break encapsulationhttps://blog.ploeh.dk/2022/10/17/stubs-and-mocks-break-encapsulation2022-10-17T08:47:00+00:00Mark Seemann
<div id="post">
<p>
<em>Favour Fakes over dynamic mocks.</em>
</p>
<p>
For a while now, I've <a href="/2019/02/18/from-interaction-based-to-state-based-testing">favoured Fakes over Stubs and Mocks</a>. Using <a href="http://xunitpatterns.com/Fake%20Object.html">Fake Objects</a> over other <a href="https://martinfowler.com/bliki/TestDouble.html">Test Doubles</a> makes test suites more robust. I wrote the code base for my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a> entirely with Fakes and the occasional <a href="http://xunitpatterns.com/Test%20Spy.html">Test Spy</a>, and I rarely had to fix broken tests. No <a href="https://moq.github.io/moq4/">Moq</a>, <a href="https://fakeiteasy.github.io/">FakeItEasy</a>, <a href="https://nsubstitute.github.io/">NSubstitute</a>, nor <a href="https://hibernatingrhinos.com/oss/rhino-mocks">Rhino Mocks</a>. Just hand-written Test Doubles.
</p>
<p>
It recently occurred to me that a way to explain the problem with <a href="http://xunitpatterns.com/Mock%20Object.html">Mocks</a> and <a href="http://xunitpatterns.com/Test%20Stub.html">Stubs</a> is that they break encapsulation.
</p>
<p>
You'll see some examples soon, but first it's important to be explicit about terminology.
</p>
<h3 id="bde25ebecc664e99b529755c3f0829fb">
Terminology <a href="#bde25ebecc664e99b529755c3f0829fb" title="permalink">#</a>
</h3>
<p>
Words like <em>Mocks</em>, <em>Stubs</em>, as well as <em>encapsulation</em>, have different meanings to different people. They've fallen victim to <a href="https://martinfowler.com/bliki/SemanticDiffusion.html">semantic diffusion</a>, if ever they were well-defined to begin with.
</p>
<p>
When I use the words <em>Test Double</em>, <em>Fake</em>, <em>Mock</em>, and <em>Stub</em>, I use them as they are defined in <a href="/ref/xunit-patterns">xUnit Test Patterns</a>. I usually try to avoid the terms <em>Mock</em> and <em>Stub</em> since people use them vaguely and inconsistently. The terms <em>Test Double</em> and <em>Fake</em> fare better.
</p>
<p>
We do need, however, a name for those libraries that generate Test Doubles on the fly. In .NET, they are libraries like Moq, FakeItEasy, and so on, as listed above. Java has <a href="https://site.mockito.org/">Mockito</a>, <a href="https://easymock.org/">EasyMock</a>, <a href="https://jmockit.github.io/">JMockit</a>, and possibly more like that.
</p>
<p>
What do we call such libraries? Most people call them <em>mock libraries</em> or <em>dynamic mock libraries</em>. Perhaps <em>dynamic Test Double library</em> would be more consistent with the <em>xUnit Test Patterns</em> vocabulary, but nobody calls them that. I'll call them <em>dynamic mock libraries</em> to at least emphasise the dynamic, on-the-fly object generation these libraries typically use.
</p>
<p>
Finally, it's important to define <em>encapsulation</em>. This is another concept where people may use the same word and yet mean different things.
</p>
<p>
I base my understanding of encapsulation on <a href="/ref/oosc">Object-Oriented Software Construction</a>. I've tried to distil it in my Pluralsight course <a href="/encapsulation-and-solid">Encapsulation and SOLID</a>.
</p>
<p>
In short, encapsulation denotes the distinction between an object's contract and its implementation. An object should fulfil its contract in such a way that client code doesn't need to know about its implementation.
</p>
<p>
Contracts, according to Meyer, describe three properties of objects:
</p>
<ul>
<li>Preconditions: What client code must fulfil in order to successfully interact with the object.</li>
<li>Invariants: Statements about the object that are always true.</li>
<li>Postconditions: Statements that are guaranteed to be true after a successful interaction between client code and object.</li>
</ul>
<p>
As I'll demonstrate in this article, objects generated by dynamic mock libraries often break their contracts.
</p>
<h3 id="2d05b5926474448589c3953f6796c2a9">
Create-and-read round-trip <a href="#2d05b5926474448589c3953f6796c2a9" title="permalink">#</a>
</h3>
<p>
Consider the <code>IReservationsRepository</code> interface from <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IReservationsRepository</span>
{
Task Create(<span style="color:blue;">int</span> restaurantId, Reservation reservation);
Task<IReadOnlyCollection<Reservation>> ReadReservations(
<span style="color:blue;">int</span> restaurantId, DateTime min, DateTime max);
Task<Reservation?> ReadReservation(<span style="color:blue;">int</span> restaurantId, Guid id);
Task Update(<span style="color:blue;">int</span> restaurantId, Reservation reservation);
Task Delete(<span style="color:blue;">int</span> restaurantId, Guid id);
}</pre>
</p>
<p>
I already discussed some of the contract properties of this interface in <a href="/2021/12/06/the-liskov-substitution-principle-as-a-profunctor">an earlier article</a>. Here, I want to highlight a certain interaction.
</p>
<p>
What is the contract of the <code>Create</code> method?
</p>
<p>
There are a few preconditions:
</p>
<ul>
<li>The client must have a properly initialised <code>IReservationsRepository</code> object.</li>
<li>The client must have a valid <code>restaurantId</code>.</li>
<li>The client must have a valid <code>reservation</code>.</li>
</ul>
<p>
A client that fulfils these preconditions can successfully call and await the <code>Create</code> method. What are the invariants and postconditions?
</p>
<p>
I'll skip the invariants because they aren't relevant to the line of reasoning that I'm pursuing. One postcondition, however, is that the <code>reservation</code> passed to <code>Create</code> must now be 'in' the repository.
</p>
<p>
How does that manifest as part of the object's contract?
</p>
<p>
This implies that a client should be able to retrieve the <code>reservation</code>, either with <code>ReadReservation</code> or <code>ReadReservations</code>. This suggests a kind of property that Scott Wlaschin calls <a href="https://fsharpforfunandprofit.com/posts/property-based-testing-2/">There and back again</a>.
</p>
<p>
Picking <code>ReadReservation</code> for the verification step, we now have a property: If client code successfully calls and awaits <code>Create</code>, it should be able to use <code>ReadReservation</code> to retrieve the reservation it just saved. That's implied by the <code>IReservationsRepository</code> contract.
</p>
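<p>
The round-trip property can be written once against the interface, so that any implementation can be held to it. What follows is only a minimal sketch under stated assumptions: <code>Reservation</code> and <code>IReservationsRepository</code> are simplified stand-ins for the book's types, and <code>RepositoryContract</code> and <code>DictionaryRepository</code> are hypothetical names introduced for illustration.
</p>

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Simplified, hypothetical stand-ins for the book's types.
public sealed record Reservation(Guid Id, DateTime At, string Email, string Name, int Quantity);

public interface IReservationsRepository
{
    Task Create(int restaurantId, Reservation reservation);
    Task<Reservation?> ReadReservation(int restaurantId, Guid id);
}

public static class RepositoryContract
{
    // True when a reservation saved with Create can be read back unchanged.
    public static async Task<bool> RoundTrips(
        IReservationsRepository sut, int restaurantId, Reservation expected)
    {
        await sut.Create(restaurantId, expected);
        var actual = await sut.ReadReservation(restaurantId, expected.Id);
        return expected.Equals(actual);
    }
}

// Hypothetical dictionary-backed implementation, only here to exercise the check.
public sealed class DictionaryRepository : IReservationsRepository
{
    private readonly Dictionary<(int, Guid), Reservation> store = new();

    public Task Create(int restaurantId, Reservation reservation)
    {
        store[(restaurantId, reservation.Id)] = reservation;
        return Task.CompletedTask;
    }

    public Task<Reservation?> ReadReservation(int restaurantId, Guid id)
    {
        // found stays null when the key is absent.
        store.TryGetValue((restaurantId, id), out var found);
        return Task.FromResult<Reservation?>(found);
    }
}
```

<p>
Awaiting <code>RepositoryContract.RoundTrips(new DictionaryRepository(), 1, reservation)</code> yields <code>true</code>, because the object remembers what <code>Create</code> gave it. The check knows nothing about how the implementation stores its data; it only relies on the contract.
</p>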
<h3 id="f787d89599a04573a2c9b7afc28b2315">
SQL implementation <a href="#f787d89599a04573a2c9b7afc28b2315" title="permalink">#</a>
</h3>
<p>
The 'real' implementation of <code>IReservationsRepository</code> used in production is an implementation that stores reservations in SQL Server. This class should obey the contract.
</p>
<p>
While it might be possible to write a true property-based test, running hundreds of randomly generated test cases against a real database is going to take time. Instead, I chose to only write a parametrised test:
</p>
<p>
<pre>[Theory]
[InlineData(Grandfather.Id, <span style="color:#a31515;">"2022-06-29 12:00"</span>, <span style="color:#a31515;">"e@example.gov"</span>, <span style="color:#a31515;">"Enigma"</span>, 1)]
[InlineData(Grandfather.Id, <span style="color:#a31515;">"2022-07-27 11:40"</span>, <span style="color:#a31515;">"c@example.com"</span>, <span style="color:#a31515;">"Carlie"</span>, 2)]
[InlineData(2, <span style="color:#a31515;">"2021-09-03 14:32"</span>, <span style="color:#a31515;">"bon@example.edu"</span>, <span style="color:#a31515;">"Jovi"</span>, 4)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task CreateAndReadRoundTrip(
<span style="color:blue;">int</span> restaurantId,
<span style="color:blue;">string</span> at,
<span style="color:blue;">string</span> email,
<span style="color:blue;">string</span> name,
<span style="color:blue;">int</span> quantity)
{
<span style="color:blue;">var</span> expected = <span style="color:blue;">new</span> Reservation(
Guid.NewGuid(),
DateTime.Parse(at, CultureInfo.InvariantCulture),
<span style="color:blue;">new</span> Email(email),
<span style="color:blue;">new</span> Name(name),
quantity);
<span style="color:blue;">var</span> connectionString = ConnectionStrings.Reservations;
<span style="color:blue;">var</span> sut = <span style="color:blue;">new</span> SqlReservationsRepository(connectionString);
<span style="color:blue;">await</span> sut.Create(restaurantId, expected);
<span style="color:blue;">var</span> actual = <span style="color:blue;">await</span> sut.ReadReservation(restaurantId, expected.Id);
Assert.Equal(expected, actual);
}</pre>
</p>
<p>
The part that we care about is the last three lines:
</p>
<p>
<pre><span style="color:blue;">await</span> sut.Create(restaurantId, expected);
<span style="color:blue;">var</span> actual = <span style="color:blue;">await</span> sut.ReadReservation(restaurantId, expected.Id);
Assert.Equal(expected, actual);</pre>
</p>
<p>
First call <code>Create</code> and subsequently <code>ReadReservation</code>. The value created should equal the value retrieved, which is also the case. All tests pass.
</p>
<h3 id="dc7d9e2730364cf8895d73d2adcd37b0">
Fake <a href="#dc7d9e2730364cf8895d73d2adcd37b0" title="permalink">#</a>
</h3>
<p>
The Fake implementation is effectively an in-memory database, so we expect it to also fulfil the same contract. We can test it with an almost identical test:
</p>
<p>
<pre>[Theory]
[InlineData(RestApi.Grandfather.Id, <span style="color:#a31515;">"2022-06-29 12:00"</span>, <span style="color:#a31515;">"e@example.gov"</span>, <span style="color:#a31515;">"Enigma"</span>, 1)]
[InlineData(RestApi.Grandfather.Id, <span style="color:#a31515;">"2022-07-27 11:40"</span>, <span style="color:#a31515;">"c@example.com"</span>, <span style="color:#a31515;">"Carlie"</span>, 2)]
[InlineData(2, <span style="color:#a31515;">"2021-09-03 14:32"</span>, <span style="color:#a31515;">"bon@example.edu"</span>, <span style="color:#a31515;">"Jovi"</span>, 4)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task CreateAndReadRoundTrip(
<span style="color:blue;">int</span> restaurantId,
<span style="color:blue;">string</span> at,
<span style="color:blue;">string</span> email,
<span style="color:blue;">string</span> name,
<span style="color:blue;">int</span> quantity)
{
<span style="color:blue;">var</span> expected = <span style="color:blue;">new</span> Reservation(
Guid.NewGuid(),
DateTime.Parse(at, CultureInfo.InvariantCulture),
<span style="color:blue;">new</span> Email(email),
<span style="color:blue;">new</span> Name(name),
quantity);
<span style="color:blue;">var</span> sut = <span style="color:blue;">new</span> FakeDatabase();
<span style="color:blue;">await</span> sut.Create(restaurantId, expected);
<span style="color:blue;">var</span> actual = <span style="color:blue;">await</span> sut.ReadReservation(restaurantId, expected.Id);
Assert.Equal(expected, actual);
}</pre>
</p>
<p>
The only difference is that the <code>sut</code> is a different class instance. These test cases also all pass.
</p>
<p>
How is <code>FakeDatabase</code> implemented? That's not important, because it obeys the contract. <code>FakeDatabase</code> has good encapsulation, which makes it possible to use it without knowing anything about its internal implementation details. That, after all, is the point of encapsulation.
</p>
<h3 id="a13bc3a894914ce59b1c11ea9702b3f8">
Dynamic mock <a href="#a13bc3a894914ce59b1c11ea9702b3f8" title="permalink">#</a>
</h3>
<p>
How does a dynamic mock fare if subjected to the same test? Let's try with Moq 4.18.2 (and I'm not choosing Moq to single it out - I chose Moq because it's the dynamic mock library I used to love the most):
</p>
<p>
<pre>[Theory]
[InlineData(RestApi.Grandfather.Id, <span style="color:#a31515;">"2022-06-29 12:00"</span>, <span style="color:#a31515;">"e@example.gov"</span>, <span style="color:#a31515;">"Enigma"</span>, 1)]
[InlineData(RestApi.Grandfather.Id, <span style="color:#a31515;">"2022-07-27 11:40"</span>, <span style="color:#a31515;">"c@example.com"</span>, <span style="color:#a31515;">"Carlie"</span>, 2)]
[InlineData(2, <span style="color:#a31515;">"2021-09-03 14:32"</span>, <span style="color:#a31515;">"bon@example.edu"</span>, <span style="color:#a31515;">"Jovi"</span>, 4)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task CreateAndReadRoundTrip(
<span style="color:blue;">int</span> restaurantId,
<span style="color:blue;">string</span> at,
<span style="color:blue;">string</span> email,
<span style="color:blue;">string</span> name,
<span style="color:blue;">int</span> quantity)
{
<span style="color:blue;">var</span> expected = <span style="color:blue;">new</span> Reservation(
Guid.NewGuid(),
DateTime.Parse(at, CultureInfo.InvariantCulture),
<span style="color:blue;">new</span> Email(email),
<span style="color:blue;">new</span> Name(name),
quantity);
<span style="color:blue;">var</span> sut = <span style="color:blue;">new</span> Mock<IReservationsRepository>().Object;
<span style="color:blue;">await</span> sut.Create(restaurantId, expected);
<span style="color:blue;">var</span> actual = <span style="color:blue;">await</span> sut.ReadReservation(restaurantId, expected.Id);
Assert.Equal(expected, actual);
}</pre>
</p>
<p>
If you've worked a little with dynamic mock libraries, you will not be surprised to learn that all three tests fail. Here's one of the failure messages:
</p>
<p>
<pre>Ploeh.Samples.Restaurants.RestApi.Tests.MoqRepositoryTests.CreateAndReadRoundTrip(↩
restaurantId: 1, at: "2022-06-29 12:00", email: "e@example.gov", name: "Enigma", quantity: 1)
Source: <span style="color:blue;">MoqRepositoryTests.cs</span> line 17
Duration: 1 ms
Message:
Assert.Equal() Failure
Expected: Reservation↩
{↩
At = 2022-06-29T12:00:00.0000000,↩
Email = e@example.gov,↩
Id = c9de4f95-3255-4e1f-a1d6-63591b58ff0c,↩
Name = Enigma,↩
Quantity = 1↩
}
Actual: (null)
Stack Trace:
<span style="color:red;">MoqRepositoryTests.CreateAndReadRoundTrip(↩
Int32 restaurantId, String at, String email, String name, Int32 quantity)</span> line 35
--- End of stack trace from previous location where exception was thrown ---</pre>
</p>
<p>
(I've introduced line breaks and indicated them with the ↩ symbol to make the output more readable. I'll do that again later in the article.)
</p>
<p>
Not surprisingly, the return value of <code>ReadReservation</code> is null. You typically have to configure a dynamic mock in order to give it any sort of behaviour, and I didn't do that here. In that case, the dynamic mock returns the default value for the return type, which in this case is null.
</p>
<p>
You may object that the above example is unfair. How can a dynamic mock know what to do? You have to configure it. That's the whole point of it.
</p>
<h3 id="307eb992e4c54aae8a2f331c05d9d7e4">
Retrieval without creation <a href="#307eb992e4c54aae8a2f331c05d9d7e4" title="permalink">#</a>
</h3>
<p>
Okay, let's set up the dynamic mock:
</p>
<p>
<pre><span style="color:blue;">var</span> dm = <span style="color:blue;">new</span> Mock<IReservationsRepository>();
dm.Setup(r => r.ReadReservation(restaurantId, expected.Id)).ReturnsAsync(expected);
<span style="color:blue;">var</span> sut = dm.Object;</pre>
</p>
<p>
These are the only lines I've changed from the previous listing of the test, which now passes.
</p>
<p>
A common criticism of dynamic-mock-heavy tests is that they mostly 'just test the mocks', and this is exactly what happens here.
</p>
<p>
You can make that more explicit by deleting the <code>Create</code> method call:
</p>
<p>
<pre><span style="color:blue;">var</span> dm = <span style="color:blue;">new</span> Mock<IReservationsRepository>();
dm.Setup(r => r.ReadReservation(restaurantId, expected.Id)).ReturnsAsync(expected);
<span style="color:blue;">var</span> sut = dm.Object;
<span style="color:blue;">var</span> actual = <span style="color:blue;">await</span> sut.ReadReservation(restaurantId, expected.Id);
Assert.Equal(expected, actual);</pre>
</p>
<p>
The test still passes. Clearly it only tests the dynamic mock.
</p>
<p>
You may, again, demur that this is expected, and it doesn't demonstrate that dynamic mocks break encapsulation. Keep in mind, however, the nature of the contract: Upon successful completion of <code>Create</code>, the reservation is 'in' the repository and can later be retrieved, either with <code>ReadReservation</code> or <code>ReadReservations</code>.
</p>
<p>
This variation of the test no longer calls <code>Create</code>, yet <code>ReadReservation</code> still returns the <code>expected</code> value.
</p>
<p>
Do <code>SqlReservationsRepository</code> or <code>FakeDatabase</code> behave like that? No, they don't.
</p>
<p>
Try to delete the <code>Create</code> call from the test that exercises <code>SqlReservationsRepository</code>:
</p>
<p>
<pre><span style="color:blue;">var</span> sut = <span style="color:blue;">new</span> SqlReservationsRepository(connectionString);
<span style="color:blue;">var</span> actual = <span style="color:blue;">await</span> sut.ReadReservation(restaurantId, expected.Id);
Assert.Equal(expected, actual);</pre>
</p>
<p>
Unsurprisingly, the test now fails because <code>actual</code> is null. The same happens if you delete the <code>Create</code> call from the test that exercises <code>FakeDatabase</code>:
</p>
<p>
<pre><span style="color:blue;">var</span> sut = <span style="color:blue;">new</span> FakeDatabase();
<span style="color:blue;">var</span> actual = <span style="color:blue;">await</span> sut.ReadReservation(restaurantId, expected.Id);
Assert.Equal(expected, actual);</pre>
</p>
<p>
Again, the assertion fails because <code>actual</code> is null.
</p>
<p>
The classes <code>SqlReservationsRepository</code> and <code>FakeDatabase</code> behave according to contract, while the dynamic mock doesn't.
</p>
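<p>
To see why the configured dynamic mock falls outside the contract, it may help to write out by hand roughly what the <code>Setup</code> call amounts to. This is a sketch, not Moq's implementation: <code>CannedRepository</code> is a hypothetical class, the types are simplified stand-ins, and returning an empty collection from <code>ReadReservations</code> is an assumption (what a dynamic mock library returns for an unconfigured member depends on the library).
</p>

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Simplified, hypothetical stand-ins for the book's types.
public sealed record Reservation(Guid Id, DateTime At, string Email, string Name, int Quantity);

public interface IReservationsRepository
{
    Task Create(int restaurantId, Reservation reservation);
    Task<IReadOnlyCollection<Reservation>> ReadReservations(
        int restaurantId, DateTime min, DateTime max);
    Task<Reservation?> ReadReservation(int restaurantId, Guid id);
}

// Hand-written approximation of the configured dynamic mock:
// one canned answer and no state.
public sealed class CannedRepository : IReservationsRepository
{
    private readonly int restaurantId;
    private readonly Reservation canned;

    public CannedRepository(int restaurantId, Reservation canned)
    {
        this.restaurantId = restaurantId;
        this.canned = canned;
    }

    // 'Succeeds', but remembers nothing.
    public Task Create(int restaurantId, Reservation reservation) =>
        Task.CompletedTask;

    // Answers for the canned arguments whether or not Create was ever called.
    public Task<Reservation?> ReadReservation(int restaurantId, Guid id) =>
        Task.FromResult<Reservation?>(
            restaurantId == this.restaurantId && id == canned.Id ? canned : null);

    // Never configured, so it never finds anything (assumed behaviour).
    public Task<IReadOnlyCollection<Reservation>> ReadReservations(
        int restaurantId, DateTime min, DateTime max) =>
        Task.FromResult<IReadOnlyCollection<Reservation>>(Array.Empty<Reservation>());
}
```

<p>
Written like this, both breaches are visible in the code: <code>ReadReservation</code> returns a reservation that was never created, and a reservation passed to <code>Create</code> never shows up in <code>ReadReservations</code>.
</p>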
<h3 id="81c7041024ad4c5d99055494c5b0bdf0">
Alternative retrieval <a href="#81c7041024ad4c5d99055494c5b0bdf0" title="permalink">#</a>
</h3>
<p>
There's another way in which the dynamic mock breaks encapsulation. Recall what the contract states: Upon successful completion of <code>Create</code>, the reservation is 'in' the repository and can later be retrieved, either with <code>ReadReservation</code> or <code>ReadReservations</code>.
</p>
<p>
In other words, it should be possible to change the interaction from <code>Create</code> followed by <code>ReadReservation</code> to <code>Create</code> followed by <code>ReadReservations</code>.
</p>
<p>
First, try it with <code>SqlReservationsRepository</code>:
</p>
<p>
<pre><span style="color:blue;">await</span> sut.Create(restaurantId, expected);
<span style="color:blue;">var</span> min = expected.At.Date;
<span style="color:blue;">var</span> max = min.AddDays(1);
<span style="color:blue;">var</span> actual = <span style="color:blue;">await</span> sut.ReadReservations(restaurantId, min, max);
Assert.Contains(expected, actual);</pre>
</p>
<p>
The test still passes, as expected.
</p>
<p>
Second, try the same change with <code>FakeDatabase</code>:
</p>
<p>
<pre><span style="color:blue;">await</span> sut.Create(restaurantId, expected);
<span style="color:blue;">var</span> min = expected.At.Date;
<span style="color:blue;">var</span> max = min.AddDays(1);
<span style="color:blue;">var</span> actual = <span style="color:blue;">await</span> sut.ReadReservations(restaurantId, min, max);
Assert.Contains(expected, actual);</pre>
</p>
<p>
Notice that this is the exact same code as in the <code>SqlReservationsRepository</code> test. That test also passes, as expected.
</p>
<p>
Third, try it with the dynamic mock:
</p>
<p>
<pre><span style="color:blue;">await</span> sut.Create(restaurantId, expected);
<span style="color:blue;">var</span> min = expected.At.Date;
<span style="color:blue;">var</span> max = min.AddDays(1);
<span style="color:blue;">var</span> actual = <span style="color:blue;">await</span> sut.ReadReservations(restaurantId, min, max);
Assert.Contains(expected, actual);</pre>
</p>
<p>
Same code, different <code>sut</code>, and the test fails. The dynamic mock breaks encapsulation. You'll have to go and fix the <code>Setup</code> of it to make the test pass again. That's not the case with <code>SqlReservationsRepository</code> or <code>FakeDatabase</code>.
</p>
<h3 id="09410dcfb758412d991a47f08be885b5">
Dynamic mocks break the SUT, not the tests <a href="#09410dcfb758412d991a47f08be885b5" title="permalink">#</a>
</h3>
<p>
Perhaps you're still not convinced that this is of practical interest. After all, <a href="https://en.wikipedia.org/wiki/Bertrand_Meyer">Bertrand Meyer</a> had limited success getting mainstream adoption of his thoughts on contract-based programming.
</p>
<p>
That dynamic mocks break encapsulation does, however, have real implications.
</p>
<p>
What if, instead of using <code>FakeDatabase</code>, I'd used dynamic mocks when testing my online restaurant reservation system? A test might have looked like this:
</p>
<p>
<pre>[Theory]
[InlineData(1049, 19, 00, <span style="color:#a31515;">"juliad@example.net"</span>, <span style="color:#a31515;">"Julia Domna"</span>, 5)]
[InlineData(1130, 18, 15, <span style="color:#a31515;">"x@example.com"</span>, <span style="color:#a31515;">"Xenia Ng"</span>, 9)]
[InlineData( 956, 16, 55, <span style="color:#a31515;">"kite@example.edu"</span>, <span style="color:blue;">null</span>, 2)]
[InlineData( 433, 17, 30, <span style="color:#a31515;">"shli@example.org"</span>, <span style="color:#a31515;">"Shanghai Li"</span>, 5)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task PostValidReservationWhenDatabaseIsEmpty(
<span style="color:blue;">int</span> days,
<span style="color:blue;">int</span> hours,
<span style="color:blue;">int</span> minutes,
<span style="color:blue;">string</span> email,
<span style="color:blue;">string</span> name,
<span style="color:blue;">int</span> quantity)
{
<span style="color:blue;">var</span> at = DateTime.Now.Date + <span style="color:blue;">new</span> TimeSpan(days, hours, minutes, 0);
<span style="color:blue;">var</span> dm = <span style="color:blue;">new</span> Mock<IReservationsRepository>();
dm.Setup(r => r.ReadReservations(Grandfather.Id, at.Date, at.Date.AddDays(1).AddTicks(-1)))
.ReturnsAsync(Array.Empty<Reservation>());
<span style="color:blue;">var</span> sut = <span style="color:blue;">new</span> ReservationsController(
<span style="color:blue;">new</span> SystemClock(),
<span style="color:blue;">new</span> InMemoryRestaurantDatabase(Grandfather.Restaurant),
dm.Object);
<span style="color:blue;">var</span> expected = <span style="color:blue;">new</span> Reservation(
<span style="color:blue;">new</span> Guid(<span style="color:#a31515;">"B50DF5B1-F484-4D99-88F9-1915087AF568"</span>),
at,
<span style="color:blue;">new</span> Email(email),
<span style="color:blue;">new</span> Name(name ?? <span style="color:#a31515;">""</span>),
quantity);
<span style="color:blue;">await</span> sut.Post(expected.ToDto());
dm.Verify(r => r.Create(Grandfather.Id, expected));
}</pre>
</p>
<p>
This is yet another riff on the <code>PostValidReservationWhenDatabaseIsEmpty</code> test - the gift that keeps giving. I've previously discussed this test in other articles:
</p>
<ul>
<li><a href="/2020/12/07/branching-tests">Branching tests</a></li>
<li><a href="/2021/01/11/waiting-to-happen">Waiting to happen</a></li>
<li><a href="/2021/01/18/parametrised-test-primitive-obsession-code-smell">Parametrised test primitive obsession code smell</a></li>
<li><a href="/2021/09/27/the-equivalence-contravariant-functor">The Equivalence contravariant functor</a></li>
</ul>
<p>
Here I've replaced the <code>FakeDatabase</code> Test Double with a dynamic mock. (I am, again, using Moq, but keep in mind that the fallout of using a dynamic mock is unrelated to specific libraries.)
</p>
<p>
To go 'full dynamic mock' I should also have replaced <code>SystemClock</code> and <code>InMemoryRestaurantDatabase</code> with dynamic mocks, but that's not necessary to illustrate the point I wish to make.
</p>
<p>
This, and other tests, describe the desired outcome of making a reservation against the REST API. It's an interaction that looks like this:
</p>
<p>
<pre>POST /restaurants/90125/reservations?sig=aco7VV%2Bh5sA3RBtrN8zI8Y9kLKGC60Gm3SioZGosXVE%3D HTTP/1.1
content-type: application/json
{
<span style="color:#2e75b6;">"at"</span>: <span style="color:#a31515;">"2022-12-12T20:00"</span>,
<span style="color:#2e75b6;">"name"</span>: <span style="color:#a31515;">"Pearl Yvonne Gates"</span>,
<span style="color:#2e75b6;">"email"</span>: <span style="color:#a31515;">"pearlygates@example.net"</span>,
<span style="color:#2e75b6;">"quantity"</span>: 4
}
HTTP/1.1 201 Created
Content-Length: 151
Content-Type: application/json; charset=utf-8
Location: [...]/restaurants/90125/reservations/82e550b1690742368ea62d76e103b232?sig=fPY1fSr[...]
{
<span style="color:#2e75b6;">"id"</span>: <span style="color:#a31515;">"82e550b1690742368ea62d76e103b232"</span>,
<span style="color:#2e75b6;">"at"</span>: <span style="color:#a31515;">"2022-12-12T20:00:00.0000000"</span>,
<span style="color:#2e75b6;">"email"</span>: <span style="color:#a31515;">"pearlygates@example.net"</span>,
<span style="color:#2e75b6;">"name"</span>: <span style="color:#a31515;">"Pearl Yvonne Gates"</span>,
<span style="color:#2e75b6;">"quantity"</span>: 4
}</pre>
</p>
<p>
What's of interest here is that the response includes the JSON representation of the resource that the interaction created. It's mostly a copy of the posted data, but enriched with a server-generated ID.
</p>
<p>
The code responsible for the database interaction looks like this:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">async</span> Task<ActionResult> TryCreate(Restaurant restaurant, Reservation reservation)
{
<span style="color:blue;">using</span> <span style="color:blue;">var</span> scope = <span style="color:blue;">new</span> TransactionScope(TransactionScopeAsyncFlowOption.Enabled);
<span style="color:blue;">var</span> reservations = <span style="color:blue;">await</span> Repository
.ReadReservations(restaurant.Id, reservation.At)
.ConfigureAwait(<span style="color:blue;">false</span>);
<span style="color:blue;">var</span> now = Clock.GetCurrentDateTime();
<span style="color:blue;">if</span> (!restaurant.MaitreD.WillAccept(now, reservations, reservation))
<span style="color:blue;">return</span> NoTables500InternalServerError();
<span style="color:blue;">await</span> Repository.Create(restaurant.Id, reservation).ConfigureAwait(<span style="color:blue;">false</span>);
scope.Complete();
<span style="color:blue;">return</span> Reservation201Created(restaurant.Id, reservation);
}</pre>
</p>
<p>
The last line of code creates a <code>201 Created</code> response with the <code>reservation</code> as content. Not shown in this snippet is the origin of the <code>reservation</code> parameter, but it's <a href="/2022/07/25/an-applicative-reservation-validation-example-in-c">the input JSON document parsed to a <code>Reservation</code> object</a>. Each <code>Reservation</code> object has <a href="/2022/09/12/coalescing-dtos">an ID that the server creates when it's not supplied by the client</a>.
</p>
<p>
The above <code>TryCreate</code> helper method contains all the database interaction code related to creating a new reservation. It first calls <code>ReadReservations</code> to retrieve the existing reservations. Subsequently, it calls <code>Create</code> if it decides to accept the reservation. The <code>ReadReservations</code> method is actually an <code>internal</code> extension method:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">static</span> Task<IReadOnlyCollection<Reservation>> ReadReservations(
<span style="color:blue;">this</span> IReservationsRepository repository,
<span style="color:blue;">int</span> restaurantId,
DateTime date)
{
<span style="color:blue;">var</span> min = date.Date;
<span style="color:blue;">var</span> max = min.AddDays(1).AddTicks(-1);
<span style="color:blue;">return</span> repository.ReadReservations(restaurantId, min, max);
}</pre>
</p>
<p>
Notice how the dynamic-mock-based test has to replicate this <code>internal</code> implementation detail to the <a href="https://docs.microsoft.com/dotnet/api/system.datetime.ticks">tick</a>. If I ever decide to change this by just one tick, the test is going to fail. That's already bad enough (and something that <code>FakeDatabase</code> gracefully handles), but not what I'm driving towards.
</p>
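<p>
To see what 'to the tick' means in practice, here's a minimal, self-contained sketch of the same arithmetic. The <code>DateRangeDemo</code> class and <code>DayRange</code> method are hypothetical names used only for illustration; they aren't part of the code base.
</p>
<p>
<pre>using System;

public static class DateRangeDemo
{
    // Same arithmetic as the ReadReservations extension method:
    // midnight of the given day up to the last tick before the next midnight.
    public static (DateTime Min, DateTime Max) DayRange(DateTime date)
    {
        var min = date.Date;
        var max = min.AddDays(1).AddTicks(-1);
        return (min, max);
    }

    public static void Main()
    {
        var (min, max) = DayRange(new DateTime(2022, 10, 17, 18, 30, 0));
        Console.WriteLine(min.ToString("o")); // 2022-10-17T00:00:00.0000000
        Console.WriteLine(max.ToString("o")); // 2022-10-17T23:59:59.9999999
    }
}</pre>
</p>
<p>
A dynamic mock set up with these exact arguments only matches as long as the SUT computes exactly the same two values. Changing the upper bound to, say, the next midnight would leave the behaviour essentially intact, yet the setup would no longer match.
</p>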
<p>
At the moment the <code>TryCreate</code> method echoes back the <code>reservation</code>. What if, however, you instead want to query the database and return the record that you got from the database? In this particular case, there's no reason to do that, but perhaps in other cases, something happens in the data layer that either enriches or normalises the data. So you make an innocuous change:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">async</span> Task<ActionResult> TryCreate(Restaurant restaurant, Reservation reservation)
{
<span style="color:blue;">using</span> <span style="color:blue;">var</span> scope = <span style="color:blue;">new</span> TransactionScope(TransactionScopeAsyncFlowOption.Enabled);
<span style="color:blue;">var</span> reservations = <span style="color:blue;">await</span> Repository
.ReadReservations(restaurant.Id, reservation.At)
.ConfigureAwait(<span style="color:blue;">false</span>);
<span style="color:blue;">var</span> now = Clock.GetCurrentDateTime();
<span style="color:blue;">if</span> (!restaurant.MaitreD.WillAccept(now, reservations, reservation))
<span style="color:blue;">return</span> NoTables500InternalServerError();
<span style="color:blue;">await</span> Repository.Create(restaurant.Id, reservation).ConfigureAwait(<span style="color:blue;">false</span>);
<span style="color:blue;">var</span> storedReservation = <span style="color:blue;">await</span> Repository
.ReadReservation(restaurant.Id, reservation.Id)
.ConfigureAwait(<span style="color:blue;">false</span>);
scope.Complete();
<span style="color:blue;">return</span> Reservation201Created(restaurant.Id, storedReservation!);
}</pre>
</p>
<p>
Now, instead of echoing back <code>reservation</code>, the method calls <code>ReadReservation</code> to retrieve the (possibly enriched or normalised) <code>storedReservation</code> and returns that value. Since this value could, conceivably, be null, for now the method uses the <code>!</code> operator to insist that this is not the case. A new test case might be warranted to cover the scenario where the query returns null.
</p>
<p>
This is perhaps a little less efficient because it implies an extra round-trip to the database, but <em>it shouldn't change the behaviour of the system!</em>
</p>
<p>
But when you run the test suite, that <code>PostValidReservationWhenDatabaseIsEmpty</code> test fails:
</p>
<p>
<pre>Ploeh.Samples.Restaurants.RestApi.Tests.ReservationsTests.PostValidReservationWhenDatabaseIsEmpty(↩
days: 433, hours: 17, minutes: 30, email: "shli@example.org", name: "Shanghai Li", quantity: 5)↩
[FAIL]
System.NullReferenceException : Object reference not set to an instance of an object.
Stack Trace:
[...]\Restaurant.RestApi\ReservationsController.cs(94,0): at↩
[...].RestApi.ReservationsController.Reservation201Created↩
(Int32 restaurantId, Reservation r)
[...]\Restaurant.RestApi\ReservationsController.cs(79,0): at↩
[...].RestApi.ReservationsController.TryCreate↩
(Restaurant restaurant, Reservation reservation)
[...]\Restaurant.RestApi\ReservationsController.cs(57,0): at↩
[...].RestApi.ReservationsController.Post↩
(Int32 restaurantId, ReservationDto dto)
[...]\Restaurant.RestApi.Tests\ReservationsTests.cs(73,0): at↩
[...].RestApi.Tests.ReservationsTests.PostValidReservationWhenDatabaseIsEmpty↩
(Int32 days, Int32 hours, Int32 minutes, String email, String name, Int32 quantity)
--- End of stack trace from previous location where exception was thrown ---</pre>
</p>
<p>
Oh, the dreaded <code>NullReferenceException</code>! This happens because <code>ReadReservation</code> returns null, since the dynamic mock isn't configured.
</p>
<p>
The typical reaction that most people have is: <em>Oh no, the tests broke!</em>
</p>
<p>
I think, though, that this is the wrong perspective. The dynamic mock broke the System Under Test (SUT) because it passed an implementation of <code>IReservationsRepository</code> that breaks the contract. The test didn't 'break', because it was never correct from the outset.
</p>
<h3 id="f720608a0e3b45aa88f1831a4bb1f4e8">
Shotgun surgery <a href="#f720608a0e3b45aa88f1831a4bb1f4e8" title="permalink">#</a>
</h3>
<p>
When a test code base uses dynamic mocks, it tends to do so pervasively. Most tests create one or more dynamic mocks that they pass to their SUT. Most of these dynamic mocks break encapsulation, so when you refactor, the dynamic mocks break the SUT.
</p>
<p>
You'll typically need to revisit and 'fix' all the failing tests to accommodate the refactoring:
</p>
<p>
<pre>[Theory]
[InlineData(1049, 19, 00, <span style="color:#a31515;">"juliad@example.net"</span>, <span style="color:#a31515;">"Julia Domna"</span>, 5)]
[InlineData(1130, 18, 15, <span style="color:#a31515;">"x@example.com"</span>, <span style="color:#a31515;">"Xenia Ng"</span>, 9)]
[InlineData( 956, 16, 55, <span style="color:#a31515;">"kite@example.edu"</span>, <span style="color:blue;">null</span>, 2)]
[InlineData( 433, 17, 30, <span style="color:#a31515;">"shli@example.org"</span>, <span style="color:#a31515;">"Shanghai Li"</span>, 5)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task PostValidReservationWhenDatabaseIsEmpty(
<span style="color:blue;">int</span> days,
<span style="color:blue;">int</span> hours,
<span style="color:blue;">int</span> minutes,
<span style="color:blue;">string</span> email,
<span style="color:blue;">string</span> name,
<span style="color:blue;">int</span> quantity)
{
<span style="color:blue;">var</span> at = DateTime.Now.Date + <span style="color:blue;">new</span> TimeSpan(days, hours, minutes, 0);
<span style="color:blue;">var</span> expected = <span style="color:blue;">new</span> Reservation(
<span style="color:blue;">new</span> Guid(<span style="color:#a31515;">"B50DF5B1-F484-4D99-88F9-1915087AF568"</span>),
at,
<span style="color:blue;">new</span> Email(email),
<span style="color:blue;">new</span> Name(name ?? <span style="color:#a31515;">""</span>),
quantity);
<span style="color:blue;">var</span> dm = <span style="color:blue;">new</span> Mock<IReservationsRepository>();
dm.Setup(r => r.ReadReservations(Grandfather.Id, at.Date, at.Date.AddDays(1).AddTicks(-1)))
.ReturnsAsync(Array.Empty<Reservation>());
dm.Setup(r => r.ReadReservation(Grandfather.Id, expected.Id)).ReturnsAsync(expected);
<span style="color:blue;">var</span> sut = <span style="color:blue;">new</span> ReservationsController(
<span style="color:blue;">new</span> SystemClock(),
<span style="color:blue;">new</span> InMemoryRestaurantDatabase(Grandfather.Restaurant),
dm.Object);
<span style="color:blue;">await</span> sut.Post(expected.ToDto());
dm.Verify(r => r.Create(Grandfather.Id, expected));
}</pre>
</p>
<p>
The test now passes (until the next change in the SUT), but notice how top-heavy it becomes. That's a test code smell when using dynamic mocks. Everything has to happen in <a href="/2013/06/24/a-heuristic-for-formatting-code-according-to-the-aaa-pattern">the Arrange phase</a>.
</p>
<p>
You typically have many such tests that you need to edit. The name of this antipattern is <a href="https://en.wikipedia.org/wiki/Shotgun_surgery">Shotgun Surgery</a>.
</p>
<p>
The implication is that <em>refactoring</em> by definition is impossible:
</p>
<blockquote>
<p>
"to refactor, the essential precondition is [...] solid tests"
</p>
<footer><cite>Martin Fowler, <a href="/ref/refactoring">Refactoring</a></cite></footer>
</blockquote>
<p>
You need tests that don't break when you refactor. When you use dynamic mocks, tests tend to fail whenever you make changes in SUTs. Even though you have tests, they don't enable refactoring.
</p>
<p>
To add insult to injury, <a href="/2013/04/02/why-trust-tests">every time you edit existing tests, they become less trustworthy</a>.
</p>
<p>
To address these problems, use Fakes instead of Mocks and Stubs. With the <code>FakeDatabase</code> the entire sample test suite for the online restaurant reservation system gracefully handles the change described above. No tests fail.
</p>
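<p>
As a rough illustration of why a Fake survives such a change, consider the following self-contained sketch. It deliberately uses simplified, synchronous stand-ins (<code>Booking</code>, <code>IBookingRepository</code>, <code>FakeBookingRepository</code>, and <code>UnconfiguredRepository</code> are all hypothetical names) rather than the real <code>Reservation</code>, <code>IReservationsRepository</code>, and <code>FakeDatabase</code> types. The <code>UnconfiguredRepository</code> class only approximates how a dynamic mock without setups behaves, as described above.
</p>
<p>
<pre>#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for Reservation.
public sealed class Booking
{
    public Booking(Guid id, DateTime at) { Id = id; At = at; }
    public Guid Id { get; }
    public DateTime At { get; }
}

// Hypothetical, synchronous stand-in for IReservationsRepository.
public interface IBookingRepository
{
    void Create(int restaurantId, Booking booking);
    IReadOnlyCollection<Booking> ReadBookings(int restaurantId, DateTime min, DateTime max);
    Booking? ReadBooking(int restaurantId, Guid id);
}

// A Fake: a lightweight, but working, in-memory implementation.
public sealed class FakeBookingRepository : IBookingRepository
{
    private readonly List<(int RestaurantId, Booking Booking)> rows =
        new List<(int RestaurantId, Booking Booking)>();

    public void Create(int restaurantId, Booking booking) =>
        rows.Add((restaurantId, booking));

    public IReadOnlyCollection<Booking> ReadBookings(int restaurantId, DateTime min, DateTime max) =>
        rows.Where(r => r.RestaurantId == restaurantId && min <= r.Booking.At && r.Booking.At <= max)
            .Select(r => r.Booking)
            .ToList();

    public Booking? ReadBooking(int restaurantId, Guid id) =>
        rows.Where(r => r.RestaurantId == restaurantId && r.Booking.Id == id)
            .Select(r => r.Booking)
            .FirstOrDefault();
}

// Approximates a dynamic mock without setups: Commands do nothing,
// Queries return empty or null values.
public sealed class UnconfiguredRepository : IBookingRepository
{
    public void Create(int restaurantId, Booking booking) { }

    public IReadOnlyCollection<Booking> ReadBookings(int restaurantId, DateTime min, DateTime max) =>
        Array.Empty<Booking>();

    public Booking? ReadBooking(int restaurantId, Guid id) => null;
}

public static class ContractDemo
{
    // The postcondition discussed in the article: what you Create, you can read back.
    public static bool RoundTripHolds(IBookingRepository repository)
    {
        var expected = new Booking(Guid.NewGuid(), new DateTime(2022, 10, 17, 18, 30, 0));
        repository.Create(1, expected);
        var actual = repository.ReadBooking(1, expected.Id);
        return ReferenceEquals(expected, actual);
    }

    public static void Main()
    {
        Console.WriteLine(RoundTripHolds(new FakeBookingRepository()));  // True
        Console.WriteLine(RoundTripHolds(new UnconfiguredRepository())); // False
    }
}</pre>
</p>
<p>
Because the Fake keeps the postcondition without any test-specific setup, a SUT that starts calling <code>ReadBooking</code> after <code>Create</code> keeps working. The unconfigured double, by contrast, only works for the exact call sequence that a test anticipated.
</p>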
<h3 id="8b30b068b4a4412ca3a797de2760b2be">
Spies <a href="#8b30b068b4a4412ca3a797de2760b2be" title="permalink">#</a>
</h3>
<p>
If you spelunk the test code base for the book, you may also find this Test Double:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">SpyPostOffice</span> :
Collection<SpyPostOffice.Observation>, IPostOffice
{
<span style="color:blue;">public</span> Task EmailReservationCreated(
<span style="color:blue;">int</span> restaurantId,
Reservation reservation)
{
Add(<span style="color:blue;">new</span> Observation(Event.Created, restaurantId, reservation));
<span style="color:blue;">return</span> Task.CompletedTask;
}
<span style="color:blue;">public</span> Task EmailReservationDeleted(
<span style="color:blue;">int</span> restaurantId,
Reservation reservation)
{
Add(<span style="color:blue;">new</span> Observation(Event.Deleted, restaurantId, reservation));
<span style="color:blue;">return</span> Task.CompletedTask;
}
<span style="color:blue;">public</span> Task EmailReservationUpdating(
<span style="color:blue;">int</span> restaurantId,
Reservation reservation)
{
Add(<span style="color:blue;">new</span> Observation(Event.Updating, restaurantId, reservation));
<span style="color:blue;">return</span> Task.CompletedTask;
}
<span style="color:blue;">public</span> Task EmailReservationUpdated(
<span style="color:blue;">int</span> restaurantId,
Reservation reservation)
{
Add(<span style="color:blue;">new</span> Observation(Event.Updated, restaurantId, reservation));
<span style="color:blue;">return</span> Task.CompletedTask;
}
<span style="color:blue;">internal</span> <span style="color:blue;">enum</span> <span style="color:#2b91af;">Event</span>
{
Created = 0,
Updating,
Updated,
Deleted
}
<span style="color:blue;">internal</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Observation</span>
{
<span style="color:blue;">public</span> <span style="color:#2b91af;">Observation</span>(
Event @event,
<span style="color:blue;">int</span> restaurantId,
Reservation reservation)
{
Event = @event;
RestaurantId = restaurantId;
Reservation = reservation;
}
<span style="color:blue;">public</span> Event Event { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">int</span> RestaurantId { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> Reservation Reservation { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">bool</span> Equals(<span style="color:blue;">object</span>? obj)
{
<span style="color:blue;">return</span> obj <span style="color:blue;">is</span> Observation observation &&
Event == observation.Event &&
RestaurantId == observation.RestaurantId &&
EqualityComparer<Reservation>.Default.Equals(Reservation, observation.Reservation);
}
<span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">int</span> GetHashCode()
{
<span style="color:blue;">return</span> HashCode.Combine(Event, RestaurantId, Reservation);
}
}
}</pre>
</p>
<p>
As you can see, I've chosen to name this class with the <em>Spy</em> prefix, indicating that this is a Test Spy rather than a Fake Object. A Spy is a Test Double whose main purpose is to observe and record interactions. Does that break or realise encapsulation?
</p>
<p>
While I favour Fakes whenever possible, consider the interface that <code>SpyPostOffice</code> implements:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IPostOffice</span>
{
Task EmailReservationCreated(<span style="color:blue;">int</span> restaurantId, Reservation reservation);
Task EmailReservationDeleted(<span style="color:blue;">int</span> restaurantId, Reservation reservation);
Task EmailReservationUpdating(<span style="color:blue;">int</span> restaurantId, Reservation reservation);
Task EmailReservationUpdated(<span style="color:blue;">int</span> restaurantId, Reservation reservation);
}</pre>
</p>
<p>
This interface consists entirely of <a href="https://en.wikipedia.org/wiki/Command%E2%80%93query_separation">Commands</a>. There's no way to query the interface to examine the state of the object. Thus, you can't check that postconditions hold <em>exclusively via the interface</em>. Instead, you need an additional <a href="http://xunitpatterns.com/Test%20Spy.html">retrieval interface</a> to examine the posterior state of the object. The <code>SpyPostOffice</code> concrete class exposes such an interface.
</p>
<p>
In a sense, you can view <code>SpyPostOffice</code> as an in-memory message sink. It fulfils the contract.
</p>
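<p>
The following self-contained sketch shows how a test might use such a retrieval interface. To keep it short, it uses a reduced, hypothetical <code>INotifier</code> interface and <code>SpyNotifier</code> class instead of <code>IPostOffice</code> and <code>SpyPostOffice</code>, and records value tuples rather than <code>Observation</code> objects.
</p>
<p>
<pre>using System;
using System.Collections.ObjectModel;
using System.Threading.Tasks;

// Hypothetical, reduced Command-only interface.
public interface INotifier
{
    Task NotifyCreated(int restaurantId, Guid reservationId);
}

// The Collection<T> base class acts as the retrieval interface:
// a test can inspect the recorded observations after exercising the SUT.
public sealed class SpyNotifier :
    Collection<(int RestaurantId, Guid ReservationId)>, INotifier
{
    public Task NotifyCreated(int restaurantId, Guid reservationId)
    {
        Add((restaurantId, reservationId));
        return Task.CompletedTask;
    }
}

public static class SpyDemo
{
    public static async Task Main()
    {
        var spy = new SpyNotifier();
        INotifier notifier = spy; // A SUT would only see the Command interface.
        var id = Guid.NewGuid();

        await notifier.NotifyCreated(1, id);

        Console.WriteLine(spy.Contains((1, id))); // True
        Console.WriteLine(spy.Count);             // 1
    }
}</pre>
</p>
<p>
The assertion asks what ended up in the message sink, not which exact sequence of method calls took place. That's what makes it a check of a postcondition rather than of implementation details.
</p>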
<h3 id="b36b228b2b4f4187918c8c224c9aed04">
Concurrency <a href="#b36b228b2b4f4187918c8c224c9aed04" title="permalink">#</a>
</h3>
<p>
Perhaps you're still not convinced. You may argue, for example, that the (partial) contract that I stated is naive. Consider, again, the implications expressed as code:
</p>
<p>
<pre><span style="color:blue;">await</span> sut.Create(restaurantId, expected);
<span style="color:blue;">var</span> actual = <span style="color:blue;">await</span> sut.ReadReservation(restaurantId, expected.Id);
Assert.Equal(expected, actual);</pre>
</p>
<p>
You may argue that in the face of concurrency, another thread or process could be making changes to the reservation <em>after</em> <code>Create</code>, but <em>before</em> <code>ReadReservation</code>. Thus, you may argue, the contract I've stipulated is false. In a real system, we can't expect that to be the case.
</p>
<p>
I agree.
</p>
<p>
Concurrency makes things much harder. Even in that light, I think the above line of reasoning is appropriate, for two reasons.
</p>
<p>
First, I chose to model <code>IReservationsRepository</code> like I did because I didn't expect high contention on individual reservations. In other words, I don't expect two or more concurrent processes to attempt to modify <em>the same reservation</em> at the same time. Thus, I found it appropriate to model the Repository as
</p>
<blockquote>
<p>
"a collection-like interface for accessing domain objects."
</p>
<footer><cite>Edward Hieatt and Rob Mee, in Martin Fowler, <a href="/ref/peaa">Patterns of Enterprise Application Architecture</a>, <em>Repository</em> pattern</cite></footer>
</blockquote>
<p>
A <em>collection-like interface</em> implies both data retrieval and collection manipulation members. In low-contention scenarios like the reservation system, this turns out to be a useful model. As the aphorism goes, <a href="https://en.wikipedia.org/wiki/All_models_are_wrong">all models are wrong, but some models are useful</a>. Treating <code>IReservationsRepository</code> as a collection accessed in a non-concurrent manner turned out to be useful in this code base.
</p>
<p>
Had I been more worried about data contention, a move towards <a href="https://martinfowler.com/bliki/CQRS.html">CQRS</a> seems promising. This leads to another object model, with different contracts.
</p>
<p>
Second, even in the face of concurrency, most unit test cases are implicitly running on a single thread. While they may run in parallel, each unit test exercises the SUT on a single thread. This implies that reads and writes against Test Doubles are serialised.
</p>
<p>
Even if concurrency is a real concern, you'd still expect that <em>if only one thread is manipulating the Repository object</em>, then what you <code>Create</code> you should be able to retrieve. The contract may be a little looser, but it'd still be a violation of the <a href="https://en.wikipedia.org/wiki/Principle_of_least_astonishment">principle of least surprise</a> if it was any different.
</p>
<h3 id="7db1c826945e417db7eacf65ac5207bb">
Conclusion <a href="#7db1c826945e417db7eacf65ac5207bb" title="permalink">#</a>
</h3>
<p>
In object-oriented programming, encapsulation is the notion of separating the affordances of an object from its implementation details. I find it most practical to think about this in terms of contracts, which again can be subdivided into sets of preconditions, invariants, and postconditions.
</p>
<p>
Polymorphic objects (like interfaces and base classes) come with contracts as well. When you replace 'real' implementations with Test Doubles, the Test Doubles should also fulfil the contracts. Fake objects do that; Test Spies may also fit that description.
</p>
<p>
When Test Doubles obey their contracts, you can refactor your SUT without breaking your test suite.
</p>
<p>
By default, however, dynamic mocks break encapsulation because they don't fulfil the objects' contracts. This leads to fragile tests.
</p>
<p>
Favour Fakes over dynamic mocks. You can read more about this way to write tests by following many of the links in this article, or by reading my book <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a>.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="6b55998427954772bed6da4ed8be8694">
<div class="comment-author">Matthew Wiemer <a href="#6b55998427954772bed6da4ed8be8694">#</a></div>
<div class="comment-content">
<p>
Excellent article exploring the nuances of encapsulation as it relates to testing. That said, the examples here left me with one big question: what exactly is covered by the tests using `FakeDatabase`?
</p>
<p>This line in particular is confusing me (as to its practical use in a "real-world" setting): `var sut = new FakeDatabase();`</p>
<p>
How can I claim to have tested the real system's implementation when the "system under test" is, in this approach, explicitly _not_ my real system? It appears the same criticism of dynamic mocks surfaces: "you're only testing the fake database". Does this approach align with any claim you are testing the "real database"?
</p>
<p>
When testing the data-layer, I have historically written (heavier) tests that integrate with a real database to exercise a system's data-layer (as you describe with `SqlReservationsRepository`). I find myself reaching for dynamic mocks in the context of exercising an application's domain layer -- where the data-layer is a dependency providing indirect input/output. Does this use of mocks violate encapsulation in the way this article describes? I _think_ not, because in that case a dynamic mock is used to represent states that are valid "according to the contract", but I'm hoping you could shed a bit more light on the topic. Am I putting the pieces together correctly?
</p>
<p>
Rephrasing the question using your Reservations example code, I would typically inject `IReservationsRepository` into `MaitreD` (which you opt not to do) and outline the possible database return values (or commands) using dynamic mocks in a test suite of `MaitreD`. What drawbacks, if any, would that approach lead to with respect to encapsulation and test fragility?
</p>
</div>
<div class="comment-date">2022-11-02 20:11 UTC</div>
</div>
<div class="comment" id="98ef29aec08e408382205b49bea697b6">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#98ef29aec08e408382205b49bea697b6">#</a></div>
<div class="comment-content">
<p>
Matthew, thank you for writing. I apologise if the article is unclear about this, but nowhere in the <em>real</em> code base do I have a test of <code>FakeDatabase</code>. I only wrote the tests that exercise the Test Doubles to illustrate the point I was trying to make. These tests only exist for the benefit of this article.
</p>
<p>
The first <code>CreateAndReadRoundTrip</code> test in the article shows a real integration test. The System Under Test (SUT) is the <code>SqlReservationsRepository</code> class, which is part of the production code - not a Test Double.
</p>
<p>
That class implements the <code>IReservationsRepository</code> interface. The point I was trying to make is that the <code>CreateAndReadRoundTrip</code> test already exercises a particular subset of the contract of the interface. Thus, if one replaces one implementation of the interface with another implementation, according to the <a href="https://en.wikipedia.org/wiki/Liskov_substitution_principle">Liskov Substitution Principle</a> (LSP) the test should still pass.
</p>
<p>
This is true for <code>FakeDatabase</code>. While the behaviour is different (it doesn't persist data), it still fulfils the contract. Dynamic mocks, on the other hand, don't automatically follow the LSP. Unless one is careful and explicit, dynamic mocks tend to weaken postconditions. For example, a dynamic mock doesn't automatically return the added reservation when you call <code>ReadReservation</code>.
</p>
<p>
This is an essential flaw of dynamic mock objects that is independent of where you use them. My article <a href="#09410dcfb758412d991a47f08be885b5">already describes</a> how a fairly innocuous change in the production code will cause a dynamic mock to break the test.
</p>
<p>
I no longer inject dependencies into domain models, since doing so <a href="/2017/01/27/from-dependency-injection-to-dependency-rejection">makes the domain model impure</a>. Even if I did, however, I'd still have the same problem with dynamic mocks breaking encapsulation.
</p>
</div>
<div class="comment-date">2022-11-04 7:06 UTC</div>
</div>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Refactoring a saga from the State pattern to the State monad
https://blog.ploeh.dk/2022/10/10/refactoring-a-saga-from-the-state-pattern-to-the-state-monad
2022-10-10T06:27:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A slightly less unrealistic example in C#.</em>
</p>
<p>
This article is one of the examples that I promised in the earlier article <a href="/2022/09/05/the-state-pattern-and-the-state-monad">The State pattern and the State monad</a>. That article examines the relationship between the <a href="https://en.wikipedia.org/wiki/State_pattern">State design pattern</a> and the <a href="/2022/06/20/the-state-monad">State monad</a>. It's deliberately abstract, so one or more examples are in order.
</p>
<p>
In the <a href="/2022/09/26/refactoring-the-tcp-state-pattern-example-to-pure-functions">previous example</a> you saw how to refactor <a href="/ref/dp">Design Patterns</a>' <em>TCP connection</em> example. That example is, unfortunately, hardly illuminating due to its nature, so a second example is warranted.
</p>
<p>
This second example shows how to refactor a stateful asynchronous message handler from the State pattern to the State monad.
</p>
<h3 id="20b0274c10b14a84a07ebd2086ab1fa0">
Shipping policy <a href="#20b0274c10b14a84a07ebd2086ab1fa0" title="permalink">#</a>
</h3>
<p>
Instead of inventing an example from scratch, I decided to use <a href="https://docs.particular.net/tutorials/nservicebus-sagas/1-saga-basics/">an NServiceBus saga tutorial</a> as a foundation. Read on even if you don't know <a href="https://particular.net/nservicebus">NServiceBus</a>. You don't have to know anything about NServiceBus in order to follow along. I just thought that I'd embed the example code in a context that actually executes and does something, instead of faking it with a bunch of unit tests. Hopefully this will help make the example a bit more realistic and relatable.
</p>
<p>
The example is a simple demo of asynchronous message handling. In a web store shipping department, you should only ship an item once you've received the order and a billing confirmation. When working with asynchronous messaging, you can't, however, rely on message ordering, so sometimes the <code>OrderBilled</code> message arrives before the <code>OrderPlaced</code> message, and sometimes it's the other way around.
</p>
<p>
<img src="/content/binary/shipping-policy-state-diagram.png" alt="Shipping policy state diagram.">
</p>
<p>
Only when you've received both messages may you ship the item.
</p>
<p>
It's a simple workflow, and you don't <em>really</em> need the State pattern. So much is clear from the sample code implementation:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ShippingPolicy</span> : Saga<ShippingPolicyData>,
IAmStartedByMessages<OrderBilled>,
IAmStartedByMessages<OrderPlaced>
{
<span style="color:blue;">static</span> ILog log = LogManager.GetLogger<ShippingPolicy>();
<span style="color:blue;">protected</span> <span style="color:blue;">override</span> <span style="color:blue;">void</span> ConfigureHowToFindSaga(SagaPropertyMapper<ShippingPolicyData> mapper)
{
mapper.MapSaga(sagaData => sagaData.OrderId)
.ToMessage<OrderPlaced>(message => message.OrderId)
.ToMessage<OrderBilled>(message => message.OrderId);
}
<span style="color:blue;">public</span> Task Handle(OrderPlaced message, IMessageHandlerContext context)
{
log.Info(<span style="color:#a31515;">$"OrderPlaced message received."</span>);
Data.IsOrderPlaced = <span style="color:blue;">true</span>;
<span style="color:blue;">return</span> ProcessOrder(context);
}
<span style="color:blue;">public</span> Task Handle(OrderBilled message, IMessageHandlerContext context)
{
log.Info(<span style="color:#a31515;">$"OrderBilled message received."</span>);
Data.IsOrderBilled = <span style="color:blue;">true</span>;
<span style="color:blue;">return</span> ProcessOrder(context);
}
<span style="color:blue;">private</span> <span style="color:blue;">async</span> Task ProcessOrder(IMessageHandlerContext context)
{
<span style="color:blue;">if</span> (Data.IsOrderPlaced && Data.IsOrderBilled)
{
<span style="color:blue;">await</span> context.SendLocal(<span style="color:blue;">new</span> ShipOrder() { OrderId = Data.OrderId });
MarkAsComplete();
}
}
}</pre>
</p>
<p>
I don't expect you to be familiar with the NServiceBus API, so don't worry about the base class, the interfaces, or the <code>ConfigureHowToFindSaga</code> method. What you need to know is that this class handles two types of messages: <code>OrderPlaced</code> and <code>OrderBilled</code>. The base class and the framework handle message correlation, hydration and dehydration, and so on.
</p>
<p>
For the purposes of this demo, all you need to know about the <code>context</code> object is that it enables you to send and publish messages. The code sample uses <code>context.SendLocal</code> to send a new <code>ShipOrder</code> Command.
</p>
<p>
Messages arrive asynchronously and conceptually with long wait times between them. You can't just rely on in-memory object state because a <code>ShippingPolicy</code> instance may receive one message and then risk that the server it's running on shuts down before the next message arrives. The NServiceBus framework handles message correlation and hydration and dehydration of state data. The latter is modelled by the <code>ShippingPolicyData</code> class:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ShippingPolicyData</span> : ContainSagaData
{
<span style="color:blue;">public</span> <span style="color:blue;">string</span> OrderId { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">bool</span> IsOrderPlaced { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
<span style="color:blue;">public</span> <span style="color:blue;">bool</span> IsOrderBilled { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
}</pre>
</p>
<p>
Notice that the above sample code inspects and manipulates the <code>Data</code> property defined by the <code>Saga<ShippingPolicyData></code> base class.
</p>
<p>
When the <code>ShippingPolicy</code> methods are called by the NServiceBus framework, the <code>Data</code> is automatically populated. When you modify the <code>Data</code>, the state data is automatically persisted when the message handler shuts down to wait for the next message.
</p>
<h3 id="0edda65734e64a6a99a865c43d6de5be">
Characterisation tests <a href="#0edda65734e64a6a99a865c43d6de5be" title="permalink">#</a>
</h3>
<p>
While you can draw an explicit state diagram like the one above, the sample code doesn't explicitly model the various states as objects. Instead, it relies on reading and writing two Boolean values.
</p>
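<p>
To make the implicit states visible, the following self-contained sketch maps the two Boolean values to named states. The <code>ShippingState</code> enum, its member names, and the <code>FromFlags</code> function are hypothetical and only serve as illustration; the tutorial code contains no such type.
</p>
<p>
<pre>using System;

// Hypothetical names for the four combinations of the two flags.
public enum ShippingState { Initial, AwaitingBilling, AwaitingPlacement, Completed }

public static class ShippingStateDemo
{
    // Derives the implied state from the two Boolean values
    // that ShippingPolicyData stores.
    public static ShippingState FromFlags(bool isOrderPlaced, bool isOrderBilled)
    {
        if (isOrderPlaced && isOrderBilled) return ShippingState.Completed;
        if (isOrderPlaced) return ShippingState.AwaitingBilling;
        if (isOrderBilled) return ShippingState.AwaitingPlacement;
        return ShippingState.Initial;
    }

    public static void Main()
    {
        Console.WriteLine(FromFlags(false, false)); // Initial
        Console.WriteLine(FromFlags(true, false));  // AwaitingBilling
        Console.WriteLine(FromFlags(false, true));  // AwaitingPlacement
        Console.WriteLine(FromFlags(true, true));   // Completed
    }
}</pre>
</p>
<p>
Two Booleans give four combinations, and each corresponds to one situation the saga can be in. Whichever message arrives first, the second message moves the saga to the state where both flags are set.
</p>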
<p>
There's nothing wrong with this implementation. It's the simplest thing that could possibly work, so why make it more complicated?
</p>
<p>
In this article, I <em>am</em> going to make it more complicated. First, I'm going to refactor the above sample code to use the State design pattern, and then I'm going to refactor that code to use the State monad. From a perspective of maintainability, this isn't warranted, but on the other hand, I hope it's educational. The sample code is just complex enough to showcase the structures of the State pattern and the State monad, yet simple enough that the implementation logic doesn't get in the way.
</p>
<p>
Simplicity can be deceiving, however, and no refactoring is without risk.
</p>
<blockquote>
<p>
"to refactor, the essential precondition is [...] solid tests"
</p>
<footer><cite><a href="https://martinfowler.com/">Martin Fowler</a>, <a href="/ref/refactoring">Refactoring</a></cite></footer>
</blockquote>
<p>
I found it safest to first add a few <a href="https://en.wikipedia.org/wiki/Characterization_test">Characterisation Tests</a> to make sure I didn't introduce any errors as I changed the code. It did catch a few copy-paste goofs that I made, so adding tests turned out to be a good idea.
</p>
<p>
Testing NServiceBus message handlers isn't too hard. All the tests I wrote look similar, so one should be enough to give you an idea.
</p>
<p>
<pre>[Theory]
[InlineData(<span style="color:#a31515;">"1337"</span>)]
[InlineData(<span style="color:#a31515;">"baz"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task OrderPlacedAndBilled(<span style="color:blue;">string</span> orderId)
{
    <span style="color:blue;">var</span> sut =
        <span style="color:blue;">new</span> ShippingPolicy
        {
            Data = <span style="color:blue;">new</span> ShippingPolicyData { OrderId = orderId }
        };
    <span style="color:blue;">var</span> ctx = <span style="color:blue;">new</span> TestableMessageHandlerContext();

    <span style="color:blue;">await</span> sut.Handle(<span style="color:blue;">new</span> OrderPlaced { OrderId = orderId }, ctx);
    <span style="color:blue;">await</span> sut.Handle(<span style="color:blue;">new</span> OrderBilled { OrderId = orderId }, ctx);

    Assert.True(sut.Completed);
    <span style="color:blue;">var</span> msg = Assert.Single(ctx.SentMessages.Containing<ShipOrder>());
    Assert.Equal(orderId, msg.Message.OrderId);
}</pre>
</p>
<p>
The tests use <a href="https://xunit.net/">xUnit.net</a> 2.4.2. When I downloaded the <a href="https://docs.particular.net/tutorials/nservicebus-sagas/1-saga-basics/">NServiceBus saga sample code</a> it targeted .NET Framework 4.8, and I didn't bother to change the version.
</p>
<p>
While the NServiceBus framework will automatically hydrate and populate <code>Data</code>, in a unit test you have to remember to explicitly populate it. The <code>TestableMessageHandlerContext</code> class is a <a href="http://xunitpatterns.com/Test%20Spy.html">Test Spy</a> that is part of the <a href="https://docs.particular.net/nservicebus/testing/">NServiceBus testing API</a>.
</p>
<p>
You'd think I was paid by <a href="https://particular.net/">Particular Software</a> to write this article, but I'm not. All this is really just the introduction. You're excused if you've forgotten the topic of this article, but my goal is to show a State pattern example. Only now can we begin in earnest.
</p>
<h3 id="7b1878c85e134ac5a24e9ce13ba064d2">
State pattern implementation <a href="#7b1878c85e134ac5a24e9ce13ba064d2" title="permalink">#</a>
</h3>
<p>
Refactoring to the State pattern, I chose to let the <code>ShippingPolicy</code> class fill the role of the pattern's <code>Context</code>. Instead of a base class with virtual methods, I used an interface to define the <code>State</code> object, as that's more <a href="/2015/08/03/idiomatic-or-idiosyncratic">Idiomatic</a> in C#:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IShippingState</span>
{
    Task OrderPlaced(OrderPlaced message, IMessageHandlerContext context, ShippingPolicy policy);

    Task OrderBilled(OrderBilled message, IMessageHandlerContext context, ShippingPolicy policy);
}</pre>
</p>
<p>
The State pattern, as described in Design Patterns, only shows examples where the <code>State</code> methods take a single argument: The <code>Context</code>. In this case, that's the <code>ShippingPolicy</code>, passed as the <code>policy</code> parameter. Careful! There's also a parameter called <code>context</code>! That's the NServiceBus context, and it's an artefact of the original example. Both <code>message</code> and <code>context</code> are run-time values passed on from the <code>ShippingPolicy</code>'s <code>Handle</code> methods:
</p>
<p>
<pre><span style="color:blue;">public</span> IShippingState State { <span style="color:blue;">get</span>; <span style="color:blue;">internal</span> <span style="color:blue;">set</span>; }

<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task Handle(OrderPlaced message, IMessageHandlerContext context)
{
    log.Info(<span style="color:#a31515;">$"OrderPlaced message received."</span>);
    Hydrate();
    <span style="color:blue;">await</span> State.OrderPlaced(message, context, <span style="color:blue;">this</span>);
    Dehydrate();
}

<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task Handle(OrderBilled message, IMessageHandlerContext context)
{
    log.Info(<span style="color:#a31515;">$"OrderBilled message received."</span>);
    Hydrate();
    <span style="color:blue;">await</span> State.OrderBilled(message, context, <span style="color:blue;">this</span>);
    Dehydrate();
}</pre>
</p>
<p>
The <code>Hydrate</code> method isn't part of the State pattern, but finds an appropriate state based on <code>Data</code>:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">void</span> Hydrate()
{
    <span style="color:blue;">if</span> (!Data.IsOrderPlaced && !Data.IsOrderBilled)
        State = InitialShippingState.Instance;
    <span style="color:blue;">else</span> <span style="color:blue;">if</span> (Data.IsOrderPlaced && !Data.IsOrderBilled)
        State = AwaitingBillingState.Instance;
    <span style="color:blue;">else</span> <span style="color:blue;">if</span> (!Data.IsOrderPlaced && Data.IsOrderBilled)
        State = AwaitingPlacementState.Instance;
    <span style="color:blue;">else</span>
        State = CompletedShippingState.Instance;
}</pre>
</p>
<p>
In more recent versions of C# you'd be able to use more succinct pattern matching, but since this code base is on .NET Framework 4.8 I'm constrained to C# 7.3 and this is as good as I cared to make it. It's not important to the topic of the State pattern, but I'm showing it in case you were wondering. It's typical that you need to translate between data that exists in the 'external world' and your object-oriented, polymorphic code, since <a href="/2011/05/31/AttheBoundaries,ApplicationsareNotObject-Oriented">at the boundaries, applications aren't object-oriented</a>.
</p>
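<p>
As a rough sketch of what such pattern matching could look like, and assuming a project where C# 8 or later is available (which isn't the case for this code base), <code>Hydrate</code> might use a <code>switch</code> expression over a tuple of the two flags. This is an illustration only, not code from the repository:
</p>
<p>
<pre><span style="color:green;">// Sketch only: requires C# 8 or later, which this code base doesn't use.</span>
<span style="color:blue;">private</span> <span style="color:blue;">void</span> Hydrate()
{
    State = (Data.IsOrderPlaced, Data.IsOrderBilled) <span style="color:blue;">switch</span>
    {
        (<span style="color:blue;">false</span>, <span style="color:blue;">false</span>) => InitialShippingState.Instance,
        (<span style="color:blue;">true</span>, <span style="color:blue;">false</span>) => AwaitingBillingState.Instance,
        (<span style="color:blue;">false</span>, <span style="color:blue;">true</span>) => AwaitingPlacementState.Instance,
        (<span style="color:blue;">true</span>, <span style="color:blue;">true</span>) => CompletedShippingState.Instance
    };
}</pre>
</p>
<p>
The four arms cover every combination of the two Boolean flags, so the mapping is the same as in the <code>if</code>/<code>else</code> version.
</p>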
<p>
Likewise, the <code>Dehydrate</code> method translates the other way:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">void</span> Dehydrate()
{
    <span style="color:blue;">if</span> (State <span style="color:blue;">is</span> AwaitingBillingState)
    {
        Data.IsOrderPlaced = <span style="color:blue;">true</span>;
        Data.IsOrderBilled = <span style="color:blue;">false</span>;
        <span style="color:blue;">return</span>;
    }

    <span style="color:blue;">if</span> (State <span style="color:blue;">is</span> AwaitingPlacementState)
    {
        Data.IsOrderPlaced = <span style="color:blue;">false</span>;
        Data.IsOrderBilled = <span style="color:blue;">true</span>;
        <span style="color:blue;">return</span>;
    }

    <span style="color:blue;">if</span> (State <span style="color:blue;">is</span> CompletedShippingState)
    {
        Data.IsOrderPlaced = <span style="color:blue;">true</span>;
        Data.IsOrderBilled = <span style="color:blue;">true</span>;
        <span style="color:blue;">return</span>;
    }

    Data.IsOrderPlaced = <span style="color:blue;">false</span>;
    Data.IsOrderBilled = <span style="color:blue;">false</span>;
}</pre>
</p>
<p>
In any case, <code>Hydrate</code> and <code>Dehydrate</code> are distractions. The important part is that the <code>ShippingPolicy</code> (the State <em>Context</em>) now delegates execution to its <code>State</code>, which performs the actual work and updates the <code>State</code>.
</p>
<h3 id="0ad9f36d329d45cda95fc92630e65eed">
Initial state <a href="#0ad9f36d329d45cda95fc92630e65eed" title="permalink">#</a>
</h3>
<p>
The first time the saga runs, both <code>Data.IsOrderPlaced</code> and <code>Data.IsOrderBilled</code> are <code>false</code>, which means that the <code>State</code> is <code>InitialShippingState</code>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">InitialShippingState</span> : IShippingState
{
    <span style="color:blue;">public</span> <span style="color:blue;">readonly</span> <span style="color:blue;">static</span> InitialShippingState Instance =
        <span style="color:blue;">new</span> InitialShippingState();

    <span style="color:blue;">private</span> <span style="color:#2b91af;">InitialShippingState</span>()
    {
    }

    <span style="color:blue;">public</span> Task OrderPlaced(
        OrderPlaced message,
        IMessageHandlerContext context,
        ShippingPolicy policy)
    {
        policy.State = AwaitingBillingState.Instance;
        <span style="color:blue;">return</span> Task.CompletedTask;
    }

    <span style="color:blue;">public</span> Task OrderBilled(
        OrderBilled message,
        IMessageHandlerContext context,
        ShippingPolicy policy)
    {
        policy.State = AwaitingPlacementState.Instance;
        <span style="color:blue;">return</span> Task.CompletedTask;
    }
}</pre>
</p>
<p>
As the above state transition diagram indicates, the only thing each of the methods does is transition to the next appropriate state: <code>AwaitingBillingState</code> if the first event was <code>OrderPlaced</code>, and <code>AwaitingPlacementState</code> when the event was <code>OrderBilled</code>.
</p>
<blockquote>
<p>
"State objects are often Singletons"
</p>
<footer><cite><a href="/ref/dp">Design Patterns</a></cite></footer>
</blockquote>
<p>
As in the <a href="/2022/09/26/refactoring-the-tcp-state-pattern-example-to-pure-functions">previous example</a>, I've made all the State objects <a href="https://en.wikipedia.org/wiki/Singleton_pattern">Singletons</a>. It's not that important, but since they are all stateless, we might as well. At least, it's in the spirit of the book.
</p>
<h3 id="1a3c6882c82d444fb4dff0cb9ffee2e5">
Awaiting billing <a href="#1a3c6882c82d444fb4dff0cb9ffee2e5" title="permalink">#</a>
</h3>
<p>
<code>AwaitingBillingState</code> is another <code>IShippingState</code> implementation:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">AwaitingBillingState</span> : IShippingState
{
    <span style="color:blue;">public</span> <span style="color:blue;">readonly</span> <span style="color:blue;">static</span> IShippingState Instance =
        <span style="color:blue;">new</span> AwaitingBillingState();

    <span style="color:blue;">private</span> <span style="color:#2b91af;">AwaitingBillingState</span>()
    {
    }

    <span style="color:blue;">public</span> Task OrderPlaced(
        OrderPlaced message,
        IMessageHandlerContext context,
        ShippingPolicy policy)
    {
        <span style="color:blue;">return</span> Task.CompletedTask;
    }

    <span style="color:blue;">public</span> <span style="color:blue;">async</span> Task OrderBilled(
        OrderBilled message,
        IMessageHandlerContext context,
        ShippingPolicy policy)
    {
        <span style="color:blue;">await</span> context.SendLocal(
            <span style="color:blue;">new</span> ShipOrder() { OrderId = policy.Data.OrderId });

        policy.Complete();
        policy.State = CompletedShippingState.Instance;
    }
}</pre>
</p>
<p>
This State doesn't react to <code>OrderPlaced</code> because it assumes that an order has already been placed. It only reacts to an <code>OrderBilled</code> event. When that happens, all requirements have been fulfilled to ship the item, so it sends a <code>ShipOrder</code> Command, marks the saga as completed, and changes the <code>State</code> to <code>CompletedShippingState</code>.
</p>
<p>
The <code>Complete</code> method is a little wrapper method I had to add to the <code>ShippingPolicy</code> class, since <code>MarkAsComplete</code> is a <code>protected</code> method:
</p>
<p>
<pre><span style="color:blue;">internal</span> <span style="color:blue;">void</span> Complete()
{
    MarkAsComplete();
}</pre>
</p>
<p>
The <code>AwaitingPlacementState</code> class is similar to <code>AwaitingBillingState</code>, except that it reacts to <code>OrderPlaced</code> rather than <code>OrderBilled</code>.
</p>
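<p>
Since that class isn't listed here, the following is a hypothetical sketch of what <code>AwaitingPlacementState</code> could look like at this stage of the refactoring. It mirrors <code>AwaitingBillingState</code> with the two methods' roles swapped; the actual class in the code base may differ in details:
</p>
<p>
<pre><span style="color:green;">// Hypothetical sketch, mirroring AwaitingBillingState.</span>
<span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">AwaitingPlacementState</span> : IShippingState
{
    <span style="color:blue;">public</span> <span style="color:blue;">readonly</span> <span style="color:blue;">static</span> IShippingState Instance =
        <span style="color:blue;">new</span> AwaitingPlacementState();

    <span style="color:blue;">private</span> <span style="color:#2b91af;">AwaitingPlacementState</span>()
    {
    }

    <span style="color:blue;">public</span> <span style="color:blue;">async</span> Task OrderPlaced(
        OrderPlaced message,
        IMessageHandlerContext context,
        ShippingPolicy policy)
    {
        <span style="color:green;">// The order has already been billed, so placement completes the saga.</span>
        <span style="color:blue;">await</span> context.SendLocal(
            <span style="color:blue;">new</span> ShipOrder() { OrderId = policy.Data.OrderId });

        policy.Complete();
        policy.State = CompletedShippingState.Instance;
    }

    <span style="color:blue;">public</span> Task OrderBilled(
        OrderBilled message,
        IMessageHandlerContext context,
        ShippingPolicy policy)
    {
        <span style="color:green;">// Already billed: ignore the event and stay in this state.</span>
        <span style="color:blue;">return</span> Task.CompletedTask;
    }
}</pre>
</p>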
<h3 id="503262f1d69a4042bb3f93b16aa04d44">
Terminal state <a href="#503262f1d69a4042bb3f93b16aa04d44" title="permalink">#</a>
</h3>
<p>
The fourth and final state is the <code>CompletedShippingState</code>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">CompletedShippingState</span> : IShippingState
{
    <span style="color:blue;">public</span> <span style="color:blue;">readonly</span> <span style="color:blue;">static</span> IShippingState Instance =
        <span style="color:blue;">new</span> CompletedShippingState();

    <span style="color:blue;">private</span> <span style="color:#2b91af;">CompletedShippingState</span>()
    {
    }

    <span style="color:blue;">public</span> Task OrderPlaced(
        OrderPlaced message,
        IMessageHandlerContext context,
        ShippingPolicy policy)
    {
        <span style="color:blue;">return</span> Task.CompletedTask;
    }

    <span style="color:blue;">public</span> Task OrderBilled(
        OrderBilled message,
        IMessageHandlerContext context,
        ShippingPolicy policy)
    {
        <span style="color:blue;">return</span> Task.CompletedTask;
    }
}</pre>
</p>
<p>
In this state, the saga is completed, so it ignores both events.
</p>
<h3 id="3769c2c7504548c884cdc38d788a8f46">
Move Commands to output <a href="#3769c2c7504548c884cdc38d788a8f46" title="permalink">#</a>
</h3>
<p>
The saga now uses the State pattern to manage state-specific behaviour as well as state transitions. To be clear, this complexity isn't warranted for the simple requirements. This is, after all, an example. All tests still pass, and smoke testing also indicates that everything still works as it's supposed to.
</p>
<p>
The goal of this article is now to refactor the State pattern implementation to <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a>. When the saga runs it has an observable side effect: It eventually sends a <code>ShipOrder</code> Command. During processing it also updates its internal state. Both of these are sources of impurity that we have to <a href="/2016/09/26/decoupling-decisions-from-effects">decouple from the decision logic</a>.
</p>
<p>
I'll do this in several steps. The first impure action I'll address is the externally observable message transmission. A common functional-programming trick is to turn a side effect into a return value. So far, the <code>IShippingState</code> methods don't return anything. (This is not strictly true; they each return <a href="https://docs.microsoft.com/dotnet/api/system.threading.tasks.task">Task</a>, but we can regard <code>Task</code> as 'asynchronous <code>void</code>'.) Thus, return values are still available as a communications channel.
</p>
<p>
Refactor the <code>IShippingState</code> methods to return Commands instead of actually sending them. Each method may send an arbitrary number of Commands, including none, so the return type has to be a collection:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IShippingState</span>
{
    IReadOnlyCollection<ICommand> OrderPlaced(
        OrderPlaced message,
        IMessageHandlerContext context,
        ShippingPolicy policy);

    IReadOnlyCollection<ICommand> OrderBilled(
        OrderBilled message,
        IMessageHandlerContext context,
        ShippingPolicy policy);
}</pre>
</p>
<p>
When you change the interface you also have to change all the implementing classes, including <code>AwaitingBillingState</code>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">AwaitingBillingState</span> : IShippingState
{
    <span style="color:blue;">public</span> <span style="color:blue;">readonly</span> <span style="color:blue;">static</span> IShippingState Instance = <span style="color:blue;">new</span> AwaitingBillingState();

    <span style="color:blue;">private</span> <span style="color:#2b91af;">AwaitingBillingState</span>()
    {
    }

    <span style="color:blue;">public</span> IReadOnlyCollection<ICommand> OrderPlaced(
        OrderPlaced message,
        IMessageHandlerContext context,
        ShippingPolicy policy)
    {
        <span style="color:blue;">return</span> Array.Empty<ICommand>();
    }

    <span style="color:blue;">public</span> IReadOnlyCollection<ICommand> OrderBilled(
        OrderBilled message,
        IMessageHandlerContext context,
        ShippingPolicy policy)
    {
        policy.Complete();
        policy.State = CompletedShippingState.Instance;
        <span style="color:blue;">return</span> <span style="color:blue;">new</span>[] { <span style="color:blue;">new</span> ShipOrder() { OrderId = policy.Data.OrderId } };
    }
}</pre>
</p>
<p>
In order to do nothing a method like <code>OrderPlaced</code> now has to return an empty collection of Commands. In order to 'send' a Command, <code>OrderBilled</code> now returns it instead of using the <code>context</code> to send it. The <code>context</code> is already redundant, but since I prefer to <a href="https://stackoverflow.blog/2022/04/06/use-git-tactically/">move in small steps</a>, I'll remove it in a separate step.
</p>
<p>
It's now the responsibility of the <code>ShippingPolicy</code> class to do something with the Commands returned by the <code>State</code>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> Task Handle(OrderBilled message, IMessageHandlerContext context)
{
    log.Info(<span style="color:#a31515;">$"OrderBilled message received."</span>);
    Hydrate();
    <span style="color:blue;">var</span> result = State.OrderBilled(message, context, <span style="color:blue;">this</span>);
    <span style="color:blue;">await</span> Interpret(result, context);
    Dehydrate();
}

<span style="color:blue;">private</span> <span style="color:blue;">async</span> Task Interpret(
    IReadOnlyCollection<ICommand> commands,
    IMessageHandlerContext context)
{
    <span style="color:blue;">foreach</span> (<span style="color:blue;">var</span> cmd <span style="color:blue;">in</span> commands)
        <span style="color:blue;">await</span> context.SendLocal(cmd);
}</pre>
</p>
<p>
In functional programming, you often run an interpreter over the instructions returned by a pure function. Here the interpreter is just a private helper method.
</p>
<p>
The <code>IShippingState</code> methods are no longer asynchronous. Now they just return collections. I consider that a simplification.
</p>
<h3 id="31ef1f01413143359d91e067da9d9a35">
Remove context parameter <a href="#31ef1f01413143359d91e067da9d9a35" title="permalink">#</a>
</h3>
<p>
The <code>context</code> parameter is now redundant, so remove it from the <code>IShippingState</code> interface:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IShippingState</span>
{
    IReadOnlyCollection<ICommand> OrderPlaced(OrderPlaced message, ShippingPolicy policy);

    IReadOnlyCollection<ICommand> OrderBilled(OrderBilled message, ShippingPolicy policy);
}</pre>
</p>
<p>
I used Visual Studio's built-in refactoring tools to remove the parameter, which automatically removed it from all the call sites and implementations.
</p>
<p>
This takes us part of the way towards implementing the states as pure functions, but there's still work to be done.
</p>
<p>
<pre><span style="color:blue;">public</span> IReadOnlyCollection<ICommand> OrderBilled(OrderBilled message, ShippingPolicy policy)
{
    policy.Complete();
    policy.State = CompletedShippingState.Instance;
    <span style="color:blue;">return</span> <span style="color:blue;">new</span>[] { <span style="color:blue;">new</span> ShipOrder() { OrderId = policy.Data.OrderId } };
}</pre>
</p>
<p>
The above <code>OrderBilled</code> implementation calls <code>policy.Complete</code> to indicate that the saga has completed. That's another state mutation that must be eliminated to make this a pure function.
</p>
<h3 id="51ae5d89697b47fe8611b58b4540163e">
Return complex result <a href="#51ae5d89697b47fe8611b58b4540163e" title="permalink">#</a>
</h3>
<p>
How do you refactor from state mutation to pure function? You turn the mutation statement into an instruction, which is a value that you return. In this case you might want to return a Boolean value: True to complete the saga. False otherwise.
</p>
<p>
There seems to be a problem, though. The <code>IShippingState</code> methods already return data: They return a collection of Commands. How do we get around this conundrum?
</p>
<p>
Introduce a complex object:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ShippingStateResult</span>
{
    <span style="color:blue;">public</span> <span style="color:#2b91af;">ShippingStateResult</span>(
        IReadOnlyCollection<ICommand> commands,
        <span style="color:blue;">bool</span> completeSaga)
    {
        Commands = commands;
        CompleteSaga = completeSaga;
    }

    <span style="color:blue;">public</span> IReadOnlyCollection<ICommand> Commands { <span style="color:blue;">get</span>; }
    <span style="color:blue;">public</span> <span style="color:blue;">bool</span> CompleteSaga { <span style="color:blue;">get</span>; }

    <span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">bool</span> Equals(<span style="color:blue;">object</span> obj)
    {
        <span style="color:blue;">return</span> obj <span style="color:blue;">is</span> ShippingStateResult result &&
               EqualityComparer<IReadOnlyCollection<ICommand>>.Default
                    .Equals(Commands, result.Commands) &&
               CompleteSaga == result.CompleteSaga;
    }

    <span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">int</span> GetHashCode()
    {
        <span style="color:blue;">int</span> hashCode = -1668187231;
        hashCode = hashCode * -1521134295 + EqualityComparer<IReadOnlyCollection<ICommand>>
            .Default.GetHashCode(Commands);
        hashCode = hashCode * -1521134295 + CompleteSaga.GetHashCode();
        <span style="color:blue;">return</span> hashCode;
    }
}</pre>
</p>
<p>
That looks rather horrible, but most of the code is generated by Visual Studio. The only things I wrote myself were the class declaration and the two read-only properties. I then used Visual Studio's <em>Generate constructor</em> and <em>Generate Equals and GetHashCode</em> Quick Actions to produce the rest of the code.
</p>
<p>
With more modern versions of C# I could have used a <a href="https://docs.microsoft.com/dotnet/csharp/language-reference/builtin-types/record">record</a>, but as I've already mentioned, I'm on C# 7.3 here.
</p>
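<p>
For comparison, here's a minimal sketch of what such a record might look like, assuming C# 9 or later and a target framework that supports records. Neither applies to this code base, so treat it as an illustration only:
</p>
<p>
<pre><span style="color:blue;">using</span> System.Collections.Generic;
<span style="color:blue;">using</span> NServiceBus;

<span style="color:green;">// Sketch only: requires C# 9 or later, which this code base doesn't use.</span>
<span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">record</span> <span style="color:#2b91af;">ShippingStateResult</span>(
    IReadOnlyCollection<ICommand> Commands,
    <span style="color:blue;">bool</span> CompleteSaga);</pre>
</p>
<p>
The compiler-generated equality compares each property with its default equality comparer. Like the generated code above, two results are therefore only equal if their <code>Commands</code> properties are equal according to that comparer, which for an array means that they reference the same instance.
</p>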
<p>
The <code>IShippingState</code> interface can now define its methods with this new return type:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IShippingState</span>
{
    ShippingStateResult OrderPlaced(OrderPlaced message, ShippingPolicy policy);

    ShippingStateResult OrderBilled(OrderBilled message, ShippingPolicy policy);
}</pre>
</p>
<p>
This change reminds me of the <a href="https://refactoring.com/catalog/introduceParameterObject.html">Introduce Parameter Object</a> refactoring, but applied to the return value instead of the input.
</p>
<p>
Implementers now have to return values of this new type:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">AwaitingBillingState</span> : IShippingState
{
    <span style="color:blue;">public</span> <span style="color:blue;">readonly</span> <span style="color:blue;">static</span> IShippingState Instance = <span style="color:blue;">new</span> AwaitingBillingState();

    <span style="color:blue;">private</span> <span style="color:#2b91af;">AwaitingBillingState</span>()
    {
    }

    <span style="color:blue;">public</span> ShippingStateResult OrderPlaced(OrderPlaced message, ShippingPolicy policy)
    {
        <span style="color:blue;">return</span> <span style="color:blue;">new</span> ShippingStateResult(Array.Empty<ICommand>(), <span style="color:blue;">false</span>);
    }

    <span style="color:blue;">public</span> ShippingStateResult OrderBilled(OrderBilled message, ShippingPolicy policy)
    {
        policy.State = CompletedShippingState.Instance;
        <span style="color:blue;">return</span> <span style="color:blue;">new</span> ShippingStateResult(
            <span style="color:blue;">new</span>[] { <span style="color:blue;">new</span> ShipOrder() { OrderId = policy.Data.OrderId } },
            <span style="color:blue;">true</span>);
    }
}</pre>
</p>
<p>
Moving a statement to an output value implies that the effect must happen somewhere else. It seems natural to put it in the <code>ShippingPolicy</code> class' <code>Interpret</code> method:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> Task Handle(OrderBilled message, IMessageHandlerContext context)
{
    log.Info(<span style="color:#a31515;">$"OrderBilled message received."</span>);
    Hydrate();
    <span style="color:blue;">var</span> result = State.OrderBilled(message, <span style="color:blue;">this</span>);
    <span style="color:blue;">await</span> Interpret(result, context);
    Dehydrate();
}

<span style="color:blue;">private</span> <span style="color:blue;">async</span> Task Interpret(ShippingStateResult result, IMessageHandlerContext context)
{
    <span style="color:blue;">foreach</span> (<span style="color:blue;">var</span> cmd <span style="color:blue;">in</span> result.Commands)
        <span style="color:blue;">await</span> context.SendLocal(cmd);

    <span style="color:blue;">if</span> (result.CompleteSaga)
        MarkAsComplete();
}</pre>
</p>
<p>
Since <code>Interpret</code> is an instance method on the <code>ShippingPolicy</code> class I can now also delete the internal <code>Complete</code> method, since <code>MarkAsComplete</code> is already callable (it's a <code>protected</code> method defined by the <code>Saga</code> base class).
</p>
<h3 id="ce2f9545c94b4c6ab9536050ed030b8b">
Use message data <a href="#ce2f9545c94b4c6ab9536050ed030b8b" title="permalink">#</a>
</h3>
<p>
Have you noticed an odd thing about the code so far? It doesn't use any of the <code>message</code> data!
</p>
<p>
This is an artefact of the original code example. Refer back to the original <code>ProcessOrder</code> helper method. It uses neither <code>OrderPlaced</code> nor <code>OrderBilled</code> for anything. Instead, it pulls the <code>OrderId</code> from the saga's <code>Data</code> property. It can do that because NServiceBus makes sure that all <code>OrderId</code> values are correlated. It'll only instantiate a saga for which <code>Data.OrderId</code> matches <code>OrderPlaced.OrderId</code> or <code>OrderBilled.OrderId</code>. Thus, these values are guaranteed to be the same, and that's why <code>ProcessOrder</code> can get away with using <code>Data.OrderId</code> instead of the <code>message</code> data.
</p>
<p>
So far, through all refactorings, I've retained this detail, but it seems odd. It also couples the implementation methods to the <code>ShippingPolicy</code> class rather than the message classes. For these reasons, refactor the methods to use the message data instead. Here's the <code>AwaitingBillingState</code> implementation:
</p>
<p>
<pre><span style="color:blue;">public</span> ShippingStateResult OrderBilled(OrderBilled message, ShippingPolicy policy)
{
    policy.State = CompletedShippingState.Instance;
    <span style="color:blue;">return</span> <span style="color:blue;">new</span> ShippingStateResult(
        <span style="color:blue;">new</span>[] { <span style="color:blue;">new</span> ShipOrder() { OrderId = message.OrderId } },
        <span style="color:blue;">true</span>);
}</pre>
</p>
<p>
Compare this version with the previous iteration, where it used <code>policy.Data.OrderId</code> instead of <code>message.OrderId</code>.
</p>
<p>
Now, the only reason to pass <code>ShippingPolicy</code> as a method parameter is to mutate <code>policy.State</code>. We'll get to that in due time, but first, there's another issue I'd like to address.
</p>
<h3 id="800453851c1944e3835a05f8ff2e3272">
Immutable arguments <a href="#800453851c1944e3835a05f8ff2e3272" title="permalink">#</a>
</h3>
<p>
Keep in mind that the overall goal of the exercise is to refactor the state machine to pure functions. For good measure, method parameters should be immutable as well. Consider a method like <code>OrderBilled</code> shown above in its most recent iteration. It mutates <code>policy</code> by setting <code>policy.State</code>. The long-term goal is to get rid of that statement.
</p>
<p>
The method doesn't mutate the other argument, <code>message</code>, but the <code>OrderBilled</code> class is actually mutable:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">OrderBilled</span> : IEvent
{
    <span style="color:blue;">public</span> <span style="color:blue;">string</span> OrderId { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
}</pre>
</p>
<p>
The same is true for the other message type, <code>OrderPlaced</code>.
</p>
<p>
For good measure, pure functions shouldn't take mutable arguments. You could argue that, since none of the implementation methods actually mutate the messages, it doesn't really matter. I am, however, enough of a neat freak that I don't like to leave such a loose strand dangling. I'd like to refactor the <code>IShippingState</code> API so that only immutable message data is passed as arguments.
</p>
<p>
In a situation like this, there are (at least) three options:
</p>
<ul>
<li>
Make the message types immutable. This would mean making <code>OrderBilled</code> and <code>OrderPlaced</code> immutable. These message types are by default mutable <a href="https://en.wikipedia.org/wiki/Data_transfer_object">Data Transfer Objects</a> (DTO), because NServiceBus needs to serialise and deserialise them to transmit them over durable queues. There are ways you can configure NServiceBus to use serialisation mechanisms that enable immutable records as messages, but for an example code base like this, I might be inclined to reach for an easier solution if one presents itself.
</li>
<li>
Add an immutable 'mirror' class. This may often be a good idea if you have a rich domain model that you'd like to represent. You can see an example of that in <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a>, where there's both a mutable <code>ReservationDto</code> class and an immutable <code>Reservation</code> <a href="https://www.martinfowler.com/bliki/ValueObject.html">Value Object</a>. This makes sense if the invariants of the domain model are sufficiently stronger than those of the DTO. That hardly seems to be the case here, since both messages only contain an <code>OrderId</code>.
</li>
<li>
Dissolve the DTO into its constituents and pass each as an argument. This doesn't work if the DTO is complex and nested, but here there's only a single constituent element, and that's the <code>OrderId</code> property.
</li>
</ul>
<p>
The third option seems like the simplest solution, so refactor the <code>IShippingState</code> methods to take an <code>orderId</code> parameter instead of a message:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IShippingState</span>
{
    ShippingStateResult OrderPlaced(<span style="color:blue;">string</span> orderId, ShippingPolicy policy);

    ShippingStateResult OrderBilled(<span style="color:blue;">string</span> orderId, ShippingPolicy policy);
}</pre>
</p>
<p>
While this is the easiest of the three options given above, the refactoring doesn't hinge on this. It would work just as well with one of the two other options.
</p>
<p>
Implementations now look like this:
</p>
<p>
<pre><span style="color:blue;">public</span> ShippingStateResult OrderBilled(<span style="color:blue;">string</span> orderId, ShippingPolicy policy)
{
    policy.State = CompletedShippingState.Instance;
    <span style="color:blue;">return</span> <span style="color:blue;">new</span> ShippingStateResult(
        <span style="color:blue;">new</span>[] { <span style="color:blue;">new</span> ShipOrder() { OrderId = orderId } },
        <span style="color:blue;">true</span>);
}</pre>
</p>
<p>
The only impure action still lingering is the mutation of <code>policy.State</code>. Once we're rid of that, the API consists of pure functions.
</p>
<h3 id="1034a4283bb44e639805778d5ae4504a">
Return state <a href="#1034a4283bb44e639805778d5ae4504a" title="permalink">#</a>
</h3>
<p>
As outlined by the <a href="/2022/09/05/the-state-pattern-and-the-state-monad">parent article</a>, instead of mutating the caller's state, you can return the state as part of a tuple. This means that you no longer need to pass <code>ShippingPolicy</code> as a parameter:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IShippingState</span>
{
    Tuple<ShippingStateResult, IShippingState> OrderPlaced(<span style="color:blue;">string</span> orderId);

    Tuple<ShippingStateResult, IShippingState> OrderBilled(<span style="color:blue;">string</span> orderId);
}</pre>
</p>
<p>
Why not expand the <code>ShippingStateResult</code> class, or conversely, dissolve that class and instead return a triple (a three-tuple)? All of these are possible as alternatives, as they'd be isomorphic to this particular design. The reason I've chosen this particular return type is that it's the idiomatic implementation of the State monad: The result is the first element of a tuple, and the state is the second element. This means that you can use a standard, reusable State monad library to manipulate the values, as you'll see later.
</p>
<p>
An implementation now looks like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">AwaitingBillingState</span> : IShippingState
{
<span style="color:blue;">public</span> <span style="color:blue;">readonly</span> <span style="color:blue;">static</span> IShippingState Instance = <span style="color:blue;">new</span> AwaitingBillingState();
<span style="color:blue;">private</span> <span style="color:#2b91af;">AwaitingBillingState</span>()
{
}
<span style="color:blue;">public</span> Tuple<ShippingStateResult, IShippingState> OrderPlaced(<span style="color:blue;">string</span> orderId)
{
<span style="color:blue;">return</span> Tuple.Create(
<span style="color:blue;">new</span> ShippingStateResult(Array.Empty<ICommand>(), <span style="color:blue;">false</span>),
(IShippingState)<span style="color:blue;">this</span>);
}
<span style="color:blue;">public</span> Tuple<ShippingStateResult, IShippingState> OrderBilled(<span style="color:blue;">string</span> orderId)
{
<span style="color:blue;">return</span> Tuple.Create(
<span style="color:blue;">new</span> ShippingStateResult(
<span style="color:blue;">new</span>[] { <span style="color:blue;">new</span> ShipOrder() { OrderId = orderId } },
<span style="color:blue;">true</span>),
CompletedShippingState.Instance);
}
}</pre>
</p>
<p>
Since the <code>ShippingPolicy</code> class that calls these methods now directly receives the state as part of the output, it no longer needs a mutable <code>State</code> property. Instead, it immediately handles the return value:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">async</span> Task Handle(OrderPlaced message, IMessageHandlerContext context)
{
log.Info(<span style="color:#a31515;">$"OrderPlaced message received."</span>);
<span style="color:blue;">var</span> state = Hydrate();
<span style="color:blue;">var</span> result = state.OrderPlaced(message.OrderId);
<span style="color:blue;">await</span> Interpret(result.Item1, context);
Dehydrate(result.Item2);
}
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task Handle(OrderBilled message, IMessageHandlerContext context)
{
log.Info(<span style="color:#a31515;">$"OrderBilled message received."</span>);
<span style="color:blue;">var</span> state = Hydrate();
<span style="color:blue;">var</span> result = state.OrderBilled(message.OrderId);
<span style="color:blue;">await</span> Interpret(result.Item1, context);
Dehydrate(result.Item2);
}</pre>
</p>
<p>
Each <code>Handle</code> method is now an <a href="/2020/03/02/impureim-sandwich">impureim sandwich</a>.
</p>
<p>
Since the <code>result</code> is now a tuple, the <code>Handle</code> methods now have to pass the first element (<code>result.Item1</code>) to the <code>Interpret</code> helper method, and the second element (<code>result.Item2</code>) - the state - to <code>Dehydrate</code>. It's also possible to pattern match (or <em>destructure</em>) each of the elements directly; you'll see an example of that later.
</p>
<p>
Since the mutable <code>State</code> property is now gone, the <code>Hydrate</code> method returns the hydrated state:
</p>
<p>
<pre><span style="color:blue;">private</span> IShippingState Hydrate()
{
<span style="color:blue;">if</span> (!Data.IsOrderPlaced && !Data.IsOrderBilled)
<span style="color:blue;">return</span> InitialShippingState.Instance;
<span style="color:blue;">else</span> <span style="color:blue;">if</span> (Data.IsOrderPlaced && !Data.IsOrderBilled)
<span style="color:blue;">return</span> AwaitingBillingState.Instance;
<span style="color:blue;">else</span> <span style="color:blue;">if</span> (!Data.IsOrderPlaced && Data.IsOrderBilled)
<span style="color:blue;">return</span> AwaitingPlacementState.Instance;
<span style="color:blue;">else</span>
<span style="color:blue;">return</span> CompletedShippingState.Instance;
}</pre>
</p>
<p>
Likewise, the <code>Dehydrate</code> method takes the new state as an input parameter:
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">void</span> Dehydrate(IShippingState state)
{
<span style="color:blue;">if</span> (state <span style="color:blue;">is</span> AwaitingBillingState)
{
Data.IsOrderPlaced = <span style="color:blue;">true</span>;
Data.IsOrderBilled = <span style="color:blue;">false</span>;
<span style="color:blue;">return</span>;
}
<span style="color:blue;">if</span> (state <span style="color:blue;">is</span> AwaitingPlacementState)
{
Data.IsOrderPlaced = <span style="color:blue;">false</span>;
Data.IsOrderBilled = <span style="color:blue;">true</span>;
<span style="color:blue;">return</span>;
}
<span style="color:blue;">if</span> (state <span style="color:blue;">is</span> CompletedShippingState)
{
Data.IsOrderPlaced = <span style="color:blue;">true</span>;
Data.IsOrderBilled = <span style="color:blue;">true</span>;
<span style="color:blue;">return</span>;
}
Data.IsOrderPlaced = <span style="color:blue;">false</span>;
Data.IsOrderBilled = <span style="color:blue;">false</span>;
}</pre>
</p>
<p>
Since each <code>Handle</code> method only calls a single State-valued method, they don't need the State monad machinery. This only becomes useful when you need to compose multiple State-based operations.
</p>
<p>
This might be useful in unit tests, so let's examine that next.
</p>
<h3 id="8559bfcefc6e49459fa16fce5977cc58">
State monad <a href="#8559bfcefc6e49459fa16fce5977cc58" title="permalink">#</a>
</h3>
<p>
In <a href="/2022/06/20/the-state-monad">previous articles about the State monad</a> you've seen it implemented based on an <code>IState</code> interface. I've also dropped hints here and there that you don't <em>need</em> the interface. Instead, you can implement the monad functions directly on State-valued functions. That's what I'm going to do here:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> Func<S, Tuple<T1, S>> SelectMany<<span style="color:#2b91af;">S</span>, <span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">T1</span>>(
<span style="color:blue;">this</span> Func<S, Tuple<T, S>> source,
Func<T, Func<S, Tuple<T1, S>>> selector)
{
<span style="color:blue;">return</span> s =>
{
<span style="color:blue;">var</span> tuple = source(s);
<span style="color:blue;">var</span> f = selector(tuple.Item1);
<span style="color:blue;">return</span> f(tuple.Item2);
};
}</pre>
</p>
<p>
This <code>SelectMany</code> implementation works directly on another function, <code>source</code>. This function takes a state of type <code>S</code> as input and returns a tuple as a result. The first element is the result of type <code>T</code>, and the second element is the new state, still of type <code>S</code>. Compare that to <a href="/2021/07/19/the-state-functor">the IState interface</a> to convince yourself that these are just two representations of the same idea.
</p>
<p>
The return value is a new function with the same shape, but where the result type is <code>T1</code> rather than <code>T</code>.
</p>
<p>
You can implement the special <code>SelectMany</code> overload that enables query syntax in <a href="/2022/03/28/monads">the standard way</a>.
</p>
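<p>
As a minimal sketch of what that overload might look like for State-valued functions, the following runs <code>source</code>, passes its result to the next function, threads the state through both, and projects the two results. The class name <code>StateQuerySyntax</code>, the parameter names <code>k</code> and <code>s</code>, and the type argument <code>U</code> are assumptions made for illustration; the actual code base may name and implement this differently.
</p>
<p>
<pre>using System;

public static class StateQuerySyntax
{
    // Hypothetical query-syntax overload: run source, feed its result to k,
    // then combine both intermediate results with the projection s.
    public static Func<S, Tuple<T1, S>> SelectMany<S, T, U, T1>(
        this Func<S, Tuple<T, S>> source,
        Func<T, Func<S, Tuple<U, S>>> k,
        Func<T, U, T1> s)
    {
        return state =>
        {
            var first = source(state);
            var second = k(first.Item1)(first.Item2);
            return Tuple.Create(s(first.Item1, second.Item1), second.Item2);
        };
    }
}</pre>
</p>
<p>
The C# compiler translates a query expression with two <code>from</code> clauses into a call to an overload of this shape, which is why it's the one that enables query syntax.
</p>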
<p>
The <em>return</em> function also mirrors the previous interface-based implementation:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> Func<S, Tuple<T, S>> Return<<span style="color:#2b91af;">S</span>, <span style="color:#2b91af;">T</span>>(T x)
{
<span style="color:blue;">return</span> s => Tuple.Create(x, s);
}</pre>
</p>
<p>
You can also implement <a href="/2022/07/04/get-and-put-state">the standard Get, Put, and Modify functions</a>, but we are not going to need them here. Try it as an exercise.
</p>
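<p>
If you'd like something to compare your attempt with, here's one possible sketch. It assumes that <code>System.ValueTuple</code> (the zero-element tuple) is an acceptable stand-in for a unit type, and the class name <code>StateAccessors</code> is hypothetical; neither is taken from the example code base.
</p>
<p>
<pre>using System;

public static class StateAccessors
{
    // Get: expose the current state as the result and leave it unchanged.
    public static Func<S, Tuple<S, S>> Get<S>()
    {
        return s => Tuple.Create(s, s);
    }

    // Put: ignore the incoming state and replace it with newState.
    // ValueTuple is used here as an assumed substitute for a unit type.
    public static Func<S, Tuple<ValueTuple, S>> Put<S>(S newState)
    {
        return _ => Tuple.Create(ValueTuple.Create(), newState);
    }

    // Modify: replace the state with the result of applying f to it.
    public static Func<S, Tuple<ValueTuple, S>> Modify<S>(Func<S, S> f)
    {
        return s => Tuple.Create(ValueTuple.Create(), f(s));
    }
}</pre>
</p>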
<h3 id="4fe0da2976e34c8eaf8490186f7d022a">
State-valued event handlers <a href="#4fe0da2976e34c8eaf8490186f7d022a" title="permalink">#</a>
</h3>
<p>
The <code>IShippingState</code> methods almost look like State values, but the arguments are in the wrong order. A State value is a function that takes state as input and returns a tuple. The methods on <code>IShippingState</code>, however, take <code>orderId</code> as input and return a tuple. The state is also present, but as the instance that exposes the methods. We have to flip the arguments:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> Func<IShippingState, Tuple<ShippingStateResult, IShippingState>> Billed(
<span style="color:blue;">this</span> <span style="color:blue;">string</span> orderId)
{
<span style="color:blue;">return</span> s => s.OrderBilled(orderId);
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> Func<IShippingState, Tuple<ShippingStateResult, IShippingState>> Placed(
<span style="color:blue;">this</span> <span style="color:blue;">string</span> orderId)
{
<span style="color:blue;">return</span> s => s.OrderPlaced(orderId);
}</pre>
</p>
<p>
This is a typical example of how you have to turn things on their heads in functional programming, compared to object-oriented programming. These two methods convert <code>OrderBilled</code> and <code>OrderPlaced</code> to State monad values.
</p>
<h3 id="2aa26c7863bc47f6a37ad50f54d97d24">
Testing state results <a href="#2aa26c7863bc47f6a37ad50f54d97d24" title="permalink">#</a>
</h3>
<p>
A unit test demonstrates how this enables you to compose multiple stateful operations using query syntax:
</p>
<p>
<pre>[Theory]
[InlineData(<span style="color:#a31515;">"90125"</span>)]
[InlineData(<span style="color:#a31515;">"quux"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> StateResultExample(<span style="color:blue;">string</span> orderId)
{
<span style="color:blue;">var</span> sf = <span style="color:blue;">from</span> x <span style="color:blue;">in</span> orderId.Placed()
<span style="color:blue;">from</span> y <span style="color:blue;">in</span> orderId.Billed()
<span style="color:blue;">select</span> <span style="color:blue;">new</span>[] { x, y };
var (results, finalState) = sf(InitialShippingState.Instance);
Assert.Equal(
<span style="color:blue;">new</span>[] { <span style="color:blue;">false</span>, <span style="color:blue;">true</span> },
results.Select(r => r.CompleteSaga));
Assert.Single(
results
.SelectMany(r => r.Commands)
.OfType<ShipOrder>()
.Select(msg => msg.OrderId),
orderId);
Assert.Equal(CompletedShippingState.Instance, finalState);
}</pre>
</p>
<p>
Keep in mind that a State monad value is a function. That's the reason I called the composition <code>sf</code> - for <em>State Function</em>. When you execute it with <code>InitialShippingState</code> as input it returns a tuple that the test immediately pattern matches (destructures) into its constituent elements.
</p>
<p>
The test then asserts that the <code>results</code> and <code>finalState</code> are as expected. The assertions against <code>results</code> are a bit awkward, since C# collections don't have structural equality. These assertions would have been simpler in <a href="https://fsharp.org/">F#</a> or <a href="https://www.haskell.org/">Haskell</a>.
</p>
<h3 id="c899bd7d1b8f4bcea29a5ad946b0f3b8">
Testing with an interpreter <a href="#c899bd7d1b8f4bcea29a5ad946b0f3b8" title="permalink">#</a>
</h3>
<p>
While <a href="/2013/06/24/a-heuristic-for-formatting-code-according-to-the-aaa-pattern">the Arrange and Act phases of the above test</a> are simple, the Assertion phase seems awkward. Another testing strategy is to run a test-specific interpreter over the instructions returned as the State computation result:
</p>
<p>
<pre>[Theory]
[InlineData(<span style="color:#a31515;">"1984"</span>)]
[InlineData(<span style="color:#a31515;">"quuz"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> StateInterpretationExample(<span style="color:blue;">string</span> orderId)
{
<span style="color:blue;">var</span> sf = <span style="color:blue;">from</span> x <span style="color:blue;">in</span> orderId.Placed()
<span style="color:blue;">from</span> y <span style="color:blue;">in</span> orderId.Billed()
<span style="color:blue;">select</span> <span style="color:blue;">new</span>[] { x, y };
var (results, finalState) = sf(InitialShippingState.Instance);
Assert.Equal(CompletedShippingState.Instance, finalState);
<span style="color:blue;">var</span> result = Interpret(results);
Assert.True(result.CompleteSaga);
Assert.Single(
result.Commands.OfType<ShipOrder>().Select(msg => msg.OrderId),
orderId);
}</pre>
</p>
<p>
It helps a little, but the assertions still have to work around the lack of structural equality of <code>result.Commands</code>.
</p>
<h3 id="80405400d4ed4cfd9c89bf539df21b3b">
Monoid <a href="#80405400d4ed4cfd9c89bf539df21b3b" title="permalink">#</a>
</h3>
<p>
The test-specific <code>Interpret</code> helper method is interesting in its own right, though:
</p>
<p>
<pre><span style="color:blue;">private</span> ShippingStateResult Interpret(IEnumerable<ShippingStateResult> results)
{
<span style="color:blue;">var</span> identity = <span style="color:blue;">new</span> ShippingStateResult(Array.Empty<ICommand>(), <span style="color:blue;">false</span>);
ShippingStateResult Combine(ShippingStateResult x, ShippingStateResult y)
{
<span style="color:blue;">return</span> <span style="color:blue;">new</span> ShippingStateResult(
x.Commands.Concat(y.Commands).ToArray(),
x.CompleteSaga || y.CompleteSaga);
}
<span style="color:blue;">return</span> results.Aggregate(identity, Combine);
}</pre>
</p>
<p>
It wasn't until I started implementing this helper method that I realised that <code>ShippingStateResult</code> gives rise to a <a href="/2017/10/06/monoids">monoid</a>! Since <a href="/2017/11/20/monoids-accumulate">monoids accumulate</a>, you can start with the <code>identity</code> and use the binary operation (here called <code>Combine</code>) to <code>Aggregate</code> an arbitrary number of <code>ShippingStateResult</code> values into one.
</p>
<p>
The <code>ShippingStateResult</code> class is composed of two constituent values (a collection and a Boolean value), and since both of these give rise to one or more monoids, a <a href="/2017/10/30/tuple-monoids">tuple of those monoids itself gives rise to one or more monoids</a>. The <code>ShippingStateResult</code> is isomorphic to a tuple, so this result carries over.
</p>
<p>
Should you move the <code>Combine</code> method and the <code>identity</code> value to the <code>ShippingStateResult</code> class itself? After all, putting them in a test-specific helper method smells a bit of <a href="https://wiki.c2.com/?FeatureEnvySmell">Feature Envy</a>.
</p>
<p>
This seems compelling, but it's not clear that arbitrary client code might need this particular monoid. After all, there are four monoids over Boolean values, and at least two over collections. That's at least eight possible combinations. Which one should <code>ShippingStateResult</code> expose as members?
</p>
<p>
The monoid used in <code>Interpret</code> combines the normal <a href="/2017/10/10/strings-lists-and-sequences-as-a-monoid">collection monoid</a> with the <em>any</em> monoid. That seems appropriate in this case, but other clients might rather need the <em>all</em> monoid.
</p>
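<p>
To make the difference concrete, here's a minimal sketch that models the result as a value tuple of commands and a flag, which, as noted above, is isomorphic to <code>ShippingStateResult</code>. The names <code>ResultMonoids</code>, <code>CombineAny</code>, and <code>CombineAll</code>, as well as the use of strings as stand-ins for <code>ICommand</code> objects, are assumptions made for illustration; they aren't part of the example code base.
</p>
<p>
<pre>using System;
using System.Linq;

public static class ResultMonoids
{
    // Collection monoid paired with the 'any' monoid.
    // The identity is (empty array, false).
    public static (string[] Commands, bool CompleteSaga) CombineAny(
        (string[] Commands, bool CompleteSaga) x,
        (string[] Commands, bool CompleteSaga) y)
    {
        return (
            x.Commands.Concat(y.Commands).ToArray(),
            x.CompleteSaga || y.CompleteSaga);
    }

    // Collection monoid paired with the 'all' monoid.
    // The identity is (empty array, true).
    public static (string[] Commands, bool CompleteSaga) CombineAll(
        (string[] Commands, bool CompleteSaga) x,
        (string[] Commands, bool CompleteSaga) y)
    {
        return (
            x.Commands.Concat(y.Commands).ToArray(),
            x.CompleteSaga && y.CompleteSaga);
    }
}</pre>
</p>
<p>
Both operations concatenate the commands, but they disagree about the flag. Aggregating one result that doesn't complete the saga with one that does yields <code>true</code> with <code>CombineAny</code> and <code>false</code> with <code>CombineAll</code>, so the choice of monoid changes what the combined result means.
</p>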
<p>
Without more usage examples, I decided to leave the code as an <code>Interpret</code> implementation detail for now.
</p>
<p>
In any case, I find it worth noting that by decoupling the state logic from the NServiceBus framework, it's possible to test it <a href="/2019/02/11/asynchronous-injection">without running asynchronous workflows</a>.
</p>
<h3 id="a1bc6585723044a9b795aa23099d9d36">
Conclusion <a href="#a1bc6585723044a9b795aa23099d9d36" title="permalink">#</a>
</h3>
<p>
In this article you saw how to implement an asynchronous messaging saga in three different ways: first as a simple ad-hoc solution, second using the State pattern, and third using the State monad. Both the State pattern and State monad implementations are meant exclusively to showcase these two techniques. The first solution using two Boolean flags is by far the simplest solution, and the one I'd use in a production system.
</p>
<p>
The point is that you can use the State monad if you need to write stateful computations. This may include finite state machines, as otherwise addressed by the State design pattern, but could also include other algorithms where you need to keep track of state.
</p>
<p>
<strong>Next:</strong> <a href="/2021/11/29/postels-law-as-a-profunctor">Postel's law as a profunctor</a>.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Some thoughts on the economics of programming
https://blog.ploeh.dk/2022/10/03/some-thoughts-on-the-economics-of-programming
2022-10-03T05:53:00+00:00
Mark Seemann
<div id="post">
<p>
<em>On the net value of process and code quality.</em>
</p>
<p>
Once upon a time there was a software company that had a special way of doing things. No other company had ever done things quite like that before, but the company had much success. In a short time it rose to dominance in the market, outcompeting all serious competition. Some people disliked the company because of its business tactics and sheer size, but others admired it.
</p>
<p>
Even more wanted to be like it.
</p>
<p>
How did the company achieve its indisputable success? It looked as though it was really, really good at making software. How did they write such good software?
</p>
<p>
It turned out that the company had a special software development process.
</p>
<p>
Other software organisations, hoping to be as successful, tried to copy the special process. The company was willing to share. Its employees wrote about the process. They gave conference presentations on their special sauce.
</p>
<p>
Which company do I have in mind, and what was the trick that made it so much better than its competition? Was it <a href="https://en.wikipedia.org/wiki/Microservices">microservices</a>? <a href="https://en.wikipedia.org/wiki/Monorepo">Monorepos</a>? <a href="https://en.wikipedia.org/wiki/Kubernetes">Kubernetes</a>? <a href="https://en.wikipedia.org/wiki/DevOps">DevOps</a>? <a href="https://en.wikipedia.org/wiki/Serverless_computing">Serverless</a>?
</p>
<p>
No, the company was <a href="https://microsoft.com">Microsoft</a> and the development process was called <a href="https://en.wikipedia.org/wiki/Microsoft_Solutions_Framework">Microsoft Solutions Framework</a> (MSF).
</p>
<p>
<em>What?!</em> do you say.
</p>
<p>
You've never heard of MSF?
</p>
<p>
That's hardly surprising. I doubt that MSF was in any way related to Microsoft's success.
</p>
<h3 id="9333d3993dbb4125a7d40faf5221ea02">
Net profits <a href="#9333d3993dbb4125a7d40faf5221ea02" title="permalink">#</a>
</h3>
<p>
These days, many people in technology consider Microsoft an embarrassing dinosaur. While you know that it's still around, does it really <em>matter</em>, these days?
</p>
<p>
You can't deny, however, that Microsoft made a lot of money in the Nineties. They still do.
</p>
<p>
What's the key to making a lot of money? Have a revenue larger than your costs.
</p>
<p>
I'm too lazy to look up the actual numbers, but clearly Microsoft had (and still has) a revenue vastly larger than its costs:
</p>
<p>
<img src="/content/binary/great-net-value-chart.png" alt="Revenue and cost line chart. The revenue is visibly and significantly greater than the cost over the entire time line.">
</p>
<p>
Compared to real, historic numbers, this may be exaggerated, but I'm trying to make a general point - not one that hinges on actual profit numbers of Microsoft, <a href="https://apple.com">Apple</a>, <a href="https://amzn.to/3QLHkly">Amazon</a>, <a href="http://google.com">Google</a>, or any other tremendously profitable company. I'm also aware that real companies have costs that aren't directly related to software development: Marketing, operations, buildings, sales, etcetera. They also make money in other ways than from their software, mainly from investments of the profits.
</p>
<p>
The difference between the revenue and the cost is the profit or net value.
</p>
<p>
If the graph looks like the above, is <em>managing cost</em> the main cause of success? Hardly. The cost is almost a rounding error on the profits.
</p>
<p>
If so, is the technology or process key to such a company's success? Was it MSF that made Microsoft the wealthiest company in the world? Are two-pizza teams the only explanation of Amazon's success? Is Google the dominant search engine because the source code is organised in a monorepo?
</p>
<p>
I'd be surprised were that the case. Rather, I think that these companies were at the right place at the right time. While there were search engines before Google, Google was so much better that users quickly migrated. Google was also better at making money than earlier search engines like <a href="https://en.wikipedia.org/wiki/AltaVista">AltaVista</a> or <a href="https://en.wikipedia.org/wiki/Yahoo!">Yahoo!</a> Likewise, Microsoft made more successful PC operating systems than the competition (which in the early Windows era consisted exclusively of <a href="https://en.wikipedia.org/wiki/OS/2">OS/2</a>) and better professional software (word processor, spreadsheet, etcetera). Amazon made a large-scale international web shop before anyone else. Apple made affordable computers with graphical user interfaces before other companies. Later, they introduced a smartphone at the right time.
</p>
<p>
All of this is simplified. For example, it's not really true that Apple made the first smartphone. When the iPhone was introduced, I already carried a <a href="https://en.wikipedia.org/wiki/Pocket_PC">Pocket PC Phone Edition</a> device that could browse the internet, had email, phone, SMS, and so on. There were other precursors even earlier.
</p>
<p>
I'm not trying to explain away <em>excellence of execution</em>. These companies succeeded for a variety of reasons, including that they were good at what they were doing. Lots of companies, however, are good at what they are doing, and still they fail. Being at the right place at the right time matters. Once in a while, a company finds itself in such favourable circumstances that success is served on a silver platter. While good execution is important, it doesn't explain the magnitude of the success.
</p>
<p>
Bad execution is likely to eliminate you in the long run, but it doesn't follow logically that good execution guarantees success.
</p>
<p>
Perhaps the successful companies succeeded because of circumstances, and <em>despite</em> mediocre execution. As usual, you should be wary not to mistake correlation for causation.
</p>
<h3 id="d1ab697df716465ab7eb8c841d3a1615">
Legacy code <a href="#d1ab697df716465ab7eb8c841d3a1615" title="permalink">#</a>
</h3>
<p>
You should be sceptical of adopting processes or technology just because a <a href="https://en.wikipedia.org/wiki/Big_Tech">Big Tech</a> company uses it. Still, if that was all I had in mind, I could probably have said it more briefly. I have another point to make.
</p>
<p>
I often encounter resistance to ideas about better software development on the grounds that the status quo is good enough. Put bluntly,
</p>
<blockquote>
<p>""legacy," [...] is condescending-engineer-speak for "actually makes money.""</p>
<footer><cite><a href="https://www.lastweekinaws.com/blog/right-sizing-your-instances-is-nonsense">Corey Quinn</a></cite></footer>
</blockquote>
<p>
To be clear, I have nothing against the author or the cited article, which discusses something (right-sizing VMs) that I know nothing about. The phrase, or variations thereof, however, is such a fit <a href="https://en.wikipedia.org/wiki/Meme">meme</a> that it spreads. It strongly indicates that people who discuss code quality are wankers, while 'real programmers' produce code that makes money. I consider that a <a href="https://en.wikipedia.org/wiki/False_dilemma">false dichotomy</a>.
</p>
<p>
Most software organisations aren't in the fortunate situation that revenues are orders of magnitude greater than costs. Most software organisations can make a decent profit if they find a market and execute on a good idea. Perhaps the revenue starts at 'only' double the cost.
</p>
<p>
<img src="/content/binary/decreasing-profit-margin-from-increased-cost.png" alt="Revenue and cost line chart. The revenue starts at about double that of the cost. The cost line, however, grows at a steeper rate and eventually overtakes the revenue.">
</p>
<p>
If you can consistently make double your costs, you'll be in business for a long time. What the above line chart indicates, however, is that if the costs rise faster than the revenue, you'll eventually hit a point where you start losing money.
</p>
<p>
The Big Tech companies aren't likely to run into that situation because their profit margins are so great, but normal companies are very much at risk.
</p>
<p>
The area between the revenue and the cost represents the profit. Thus, looking back, it may be true that a software system has been making money. This doesn't mean, however, that it will keep making money.
</p>
<p>
In the above chart, the cost eventually exceeds the revenue. If this cost is mainly driven by rising software development costs, then the company is in deep trouble.
</p>
<p>
I've worked with such a company. When I started with it, it was a thriving company with many employees, most of them developers or IT professionals. In the previous decade, it had turned a nice profit every year.
</p>
<p>
This all started to change around the time that I arrived. (I will, again, remind the reader that correlation does not imply causation.) One reason I was engaged was that the developers were stuck. Due to external market pressures they had to deliver a tremendous amount of new features, and they were stuck in <a href="https://en.wikipedia.org/wiki/Analysis_paralysis">analysis paralysis</a>.
</p>
<p>
I helped them get unstuck, but as we started working on the new features, we discovered the size of the mess of the legacy code base.
</p>
<p>
I recall a conversation I later had with the CEO. He told me, after having discussed the situation with several key people: <em>"I knew that we had a legacy code base... but I didn't know it was </em>this<em> bad!"</em>
</p>
<p>
Revenue remained constant, but costs kept rising. Today, the company is no longer around.
</p>
<p>
This was a 100% digital service company. All revenue was ultimately based on software. The business idea was good, but the company couldn't keep up with competitors. As far as I can tell, it was undone by its legacy code base.
</p>
<h3 id="33ded378d771431fa00f37c8b3e85bfc">
Conclusion <a href="#33ded378d771431fa00f37c8b3e85bfc" title="permalink">#</a>
</h3>
<p>
Software should provide some kind of value. Usually profits, but sometimes savings, and occasionally wider concerns are in scope. It's reasonable and professional to consider value as you produce software. You should, however, be wary of too myopic a focus on immediate and past value.
</p>
<p>
Finding safety in past value is indulging in complacency. Legacy software can make money from day one, but that doesn't mean that it'll keep making money. The main problem with legacy code is that costs keep rising. When non-technical business stakeholders start to notice this, it may be too late.
</p>
<p>
This is one of many reasons I believe that we, software developers, have a <em>responsibility</em> to combat the mess. I don't think there's anything condescending about that attitude.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Refactoring the TCP State pattern example to pure functions
https://blog.ploeh.dk/2022/09/26/refactoring-the-tcp-state-pattern-example-to-pure-functions
2022-09-26T05:50:00+00:00
Mark Seemann
<div id="post">
<p>
<em>A C# example.</em>
</p>
<p>
This article is one of the examples that I promised in the earlier article <a href="/2022/09/05/the-state-pattern-and-the-state-monad">The State pattern and the State monad</a>. That article examines the relationship between the <a href="https://en.wikipedia.org/wiki/State_pattern">State design pattern</a> and the <a href="/2022/06/20/the-state-monad">State monad</a>. That article is deliberately abstract, so one or more examples are in order.
</p>
<p>
In this article, I show you how to start with the example from <a href="/ref/dp">Design Patterns</a> and refactor it to an immutable solution using <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a>.
</p>
<p>
The code shown here is <a href="https://github.com/ploeh/TCPStateCSharp">available on GitHub</a>.
</p>
<h3 id="f70ca28a361241e5a352399f8cf912d7">
TCP connection <a href="#f70ca28a361241e5a352399f8cf912d7" title="permalink">#</a>
</h3>
<p>
The example is a class that handles <a href="https://en.wikipedia.org/wiki/Transmission_Control_Protocol">TCP</a> connections. The book's example is in C++, while I'll show my C# interpretation.
</p>
<p>
A TCP connection can be in one of several states, so the <code>TcpConnection</code> class keeps an instance of the polymorphic <code>TcpState</code>, which implements the states and the transitions between them.
</p>
<p>
<code>TcpConnection</code> plays the role of the State pattern's <code>Context</code>, and <code>TcpState</code> of the <code>State</code>.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">TcpConnection</span>
{
<span style="color:blue;">public</span> TcpState State { <span style="color:blue;">get</span>; <span style="color:blue;">internal</span> <span style="color:blue;">set</span>; }
<span style="color:blue;">public</span> <span style="color:#2b91af;">TcpConnection</span>()
{
State = TcpClosed.Instance;
}
<span style="color:blue;">public</span> <span style="color:blue;">void</span> ActiveOpen()
{
State.ActiveOpen(<span style="color:blue;">this</span>);
}
<span style="color:blue;">public</span> <span style="color:blue;">void</span> PassiveOpen()
{
State.PassiveOpen(<span style="color:blue;">this</span>);
}
<span style="color:green;">// More members that delegate to State follow...</span></pre>
</p>
<p>
The <code>TcpConnection</code> class' methods delegate to a corresponding method on <code>TcpState</code>, passing itself as an argument. This gives the <code>TcpState</code> implementation an opportunity to change the <code>TcpConnection</code>'s <code>State</code> property, which has an <code>internal</code> setter.
</p>
<h3 id="ffea200499f94145a0686e2481aebf2c">
State <a href="#ffea200499f94145a0686e2481aebf2c" title="permalink">#</a>
</h3>
<p>
This is the <code>TcpState</code> class:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">TcpState</span>
{
<span style="color:blue;">public</span> <span style="color:blue;">virtual</span> <span style="color:blue;">void</span> Transmit(TcpConnection connection, TcpOctetStream stream)
{
}
<span style="color:blue;">public</span> <span style="color:blue;">virtual</span> <span style="color:blue;">void</span> ActiveOpen(TcpConnection connection)
{
}
<span style="color:blue;">public</span> <span style="color:blue;">virtual</span> <span style="color:blue;">void</span> PassiveOpen(TcpConnection connection)
{
}
<span style="color:blue;">public</span> <span style="color:blue;">virtual</span> <span style="color:blue;">void</span> Close(TcpConnection connection)
{
}
<span style="color:blue;">public</span> <span style="color:blue;">virtual</span> <span style="color:blue;">void</span> Synchronize(TcpConnection connection)
{
}
<span style="color:blue;">public</span> <span style="color:blue;">virtual</span> <span style="color:blue;">void</span> Acknowledge(TcpConnection connection)
{
}
<span style="color:blue;">public</span> <span style="color:blue;">virtual</span> <span style="color:blue;">void</span> Send(TcpConnection connection)
{
}
}</pre>
</p>
<p>
I don't consider this entirely <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> C# code, but it seems closer to the book's C++ example. (It's been a couple of decades since I wrote C++, so I could be mistaken.) It doesn't matter in practice, but instead of a concrete class with <a href="https://en.wikipedia.org/wiki/NOP_(code)">no-op</a> <code>virtual</code> methods, I would usually define an interface. I'll do that in the next example article.
</p>
<p>
The methods have the same names as the methods on <code>TcpConnection</code>, but the signatures are different. All the <code>TcpState</code> methods take a <code>TcpConnection</code> parameter, whereas the <code>TcpConnection</code> methods take no arguments.
</p>
<p>
While the <code>TcpState</code> methods don't do anything, various classes can inherit from the class and override some or all of them.
</p>
<h3 id="d68a32e30a314c9ea7f45004482a7d98">
Connection closed <a href="#d68a32e30a314c9ea7f45004482a7d98" title="permalink">#</a>
</h3>
<p>
The book shows implementations of three classes that inherit from <code>TcpState</code>, starting with <code>TcpClosed</code>. Here's my translation to C#:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">TcpClosed</span> : TcpState
{
<span style="color:blue;">public</span> <span style="color:blue;">static</span> TcpState Instance = <span style="color:blue;">new</span> TcpClosed();
<span style="color:blue;">private</span> <span style="color:#2b91af;">TcpClosed</span>()
{
}
<span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">void</span> ActiveOpen(TcpConnection connection)
{
<span style="color:green;">// Send SYN, receive SYN, ACK, etc.</span>
connection.State = TcpEstablished.Instance;
}
<span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">void</span> PassiveOpen(TcpConnection connection)
{
connection.State = TcpListen.Instance;
}
}</pre>
</p>
<p>
This implementation overrides <code>ActiveOpen</code> and <code>PassiveOpen</code>. In both cases, after performing some work, they change <code>connection.State</code>.
</p>
<blockquote>
<p>
"<code>TCPState</code> subclasses maintain no local state, so they can be shared, and only one instance of each is required. The unique instance of each <code>TCPState</code> subclass is obtained by the static <code>Instance</code> operation. [...]
</p>
<p>
"This makes each <code>TCPState</code> subclass a Singleton [...]."
</p>
<footer><cite><a href="/ref/dp">Design Patterns</a></cite></footer>
</blockquote>
<p>
I've maintained that property of each subclass in my C# code, even though it has no impact on the structure of the State pattern.
</p>
<h3 id="8e21a702cffe4ef09455a29c2981ab39">
The other subclasses <a href="#8e21a702cffe4ef09455a29c2981ab39" title="permalink">#</a>
</h3>
<p>
The next subclass, <code>TcpEstablished</code>, is cast in the same mould:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">TcpEstablished</span> : TcpState
{
<span style="color:blue;">public</span> <span style="color:blue;">static</span> TcpState Instance = <span style="color:blue;">new</span> TcpEstablished();
<span style="color:blue;">private</span> <span style="color:#2b91af;">TcpEstablished</span>()
{
}
<span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">void</span> Close(TcpConnection connection)
{
<span style="color:green;">// send FIN, receive ACK of FIN</span>
connection.State = TcpListen.Instance;
}
<span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">void</span> Transmit(
TcpConnection connection,
TcpOctetStream stream)
{
connection.ProcessOctet(stream);
}
}</pre>
</p>
<p>
As is <code>TcpListen</code>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">TcpListen</span> : TcpState
{
<span style="color:blue;">public</span> <span style="color:blue;">static</span> TcpState Instance = <span style="color:blue;">new</span> TcpListen();
<span style="color:blue;">private</span> <span style="color:#2b91af;">TcpListen</span>()
{
}
<span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">void</span> Send(TcpConnection connection)
{
<span style="color:green;">// Send SYN, receive SYN, ACK, etc.</span>
connection.State = TcpEstablished.Instance;
}
}</pre>
</p>
<p>
I admit that I find these examples a bit anaemic, since there's really no logic going on. None of the overrides change state <em>conditionally</em>, which would be possible and make the examples a little more interesting. If you're interested in an example where this happens, see my article <a href="/2021/05/24/tennis-kata-using-the-state-pattern">Tennis kata using the State pattern</a>.
</p>
<h3 id="68a957ca3dcd40e9986b078223f0763e">
Refactor to pure functions <a href="#68a957ca3dcd40e9986b078223f0763e" title="permalink">#</a>
</h3>
<p>
There's only one obvious source of impurity in the example: The literal <code>State</code> mutation of <code>TcpConnection</code>:
</p>
<p>
<pre><span style="color:blue;">public</span> TcpState State { <span style="color:blue;">get</span>; <span style="color:blue;">internal</span> <span style="color:blue;">set</span>; }</pre>
</p>
<p>
While client code can't <code>set</code> the <code>State</code> property, the <code>TcpState</code> subclasses can, and they do. After all, it's how the State pattern works.
</p>
<p>
It's quite a stretch to claim that if we can only get rid of that property setter then all else will be pure. After all, who knows what all those comments actually imply:
</p>
<p>
<pre><span style="color:green;">// Send SYN, receive SYN, ACK, etc.</span></pre>
</p>
<p>
To be honest, we must imagine that <a href="https://en.wikipedia.org/wiki/Input/output">I/O</a> takes place here. This means that even though it's possible to refactor away from mutating the <code>State</code> property, these implementations are not really going to be pure functions.
</p>
<p>
I could try to imagine what that <code>SYN</code> and <code>ACK</code> would look like, but it would be unfounded and hypothetical. I'm not going to do that here. Instead, that's the reason I'm going to publish a second article with a more realistic and complex example. When it comes to the present example, I'm going to proceed with the unreasonable assumption that the comments hide no nondeterministic behaviour or side effects.
</p>
<p>
As outlined in the <a href="/2022/09/05/the-state-pattern-and-the-state-monad">article that compares the State pattern and the State monad</a>, you can refactor state mutation to a pure function by instead returning the new state. Usually, you'd have to return a tuple, because you'd also need to return the 'original' return value. Here, however, the 'return type' of all methods is <code>void</code>, so this isn't necessary.
</p>
<p>
<code>void</code> is <a href="/2018/01/15/unit-isomorphisms">isomorphic to unit</a>, so strictly speaking you could refactor to a return type like <code>Tuple&lt;Unit, TcpConnection&gt;</code>, but that is isomorphic to <code>TcpConnection</code>. (If you need to understand why that is, try writing two functions: One that converts a <code>Tuple&lt;Unit, TcpConnection&gt;</code> to a <code>TcpConnection</code>, and another that converts a <code>TcpConnection</code> to a <code>Tuple&lt;Unit, TcpConnection&gt;</code>.)
</p>
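<p>
If you'd rather not do the exercise yourself, here's a minimal sketch of one way to write those two functions. The <code>Unit</code> class, the empty <code>TcpConnection</code> placeholder, and the <code>UnitIsomorphism</code> name are assumptions made only so that the example compiles on its own; .NET has no built-in unit type.
</p>

```csharp
using System;

// Hypothetical unit type: exactly one value, so it carries no information.
public sealed class Unit
{
    public static readonly Unit Instance = new Unit();

    private Unit()
    {
    }
}

// Empty placeholder standing in for the article's TcpConnection class.
public sealed class TcpConnection
{
}

public static class UnitIsomorphism
{
    // Dropping the Unit component loses nothing.
    public static TcpConnection ToConnection(Tuple<Unit, TcpConnection> tuple)
    {
        return tuple.Item2;
    }

    // The Unit component can always be recreated, since it has only one value.
    public static Tuple<Unit, TcpConnection> ToTuple(TcpConnection connection)
    {
        return Tuple.Create(Unit.Instance, connection);
    }
}
```

<p>
Composing the two functions in either order gives back the value you started with, which is what it means for the two types to be isomorphic.
</p>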
<p>
There's no reason to make things more complicated than they have to be, so I'm going to use the simplest representation: <code>TcpConnection</code>. Thus, you can get rid of the <code>State</code> mutation by instead returning a new <code>TcpConnection</code> from all methods:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">TcpConnection</span>
{
<span style="color:blue;">public</span> TcpState State { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> <span style="color:#2b91af;">TcpConnection</span>()
{
State = TcpClosed.Instance;
}
<span style="color:blue;">private</span> <span style="color:#2b91af;">TcpConnection</span>(TcpState state)
{
State = state;
}
<span style="color:blue;">public</span> TcpConnection ActiveOpen()
{
<span style="color:blue;">return</span> <span style="color:blue;">new</span> TcpConnection(State.ActiveOpen(<span style="color:blue;">this</span>));
}
<span style="color:blue;">public</span> TcpConnection PassiveOpen()
{
<span style="color:blue;">return</span> <span style="color:blue;">new</span> TcpConnection(State.PassiveOpen(<span style="color:blue;">this</span>));
}
<span style="color:green;">// More members that delegate to State follow...</span></pre>
</p>
<p>
The <code>State</code> property no longer has a setter; there's only a public getter. In order to 'change' the state, code must return a new <code>TcpConnection</code> object with the new state. To facilitate that, you'll need to add a constructor overload that takes the new state as an input. Here I made it <code>private</code>, but making it more accessible is not prohibited.
</p>
<p>
This implies, however, that the <code>TcpState</code> methods <em>also</em> return values instead of mutating state. The base class now looks like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">TcpState</span>
{
<span style="color:blue;">public</span> <span style="color:blue;">virtual</span> TcpState Transmit(TcpConnection connection, TcpOctetStream stream)
{
<span style="color:blue;">return</span> <span style="color:blue;">this</span>;
}
<span style="color:blue;">public</span> <span style="color:blue;">virtual</span> TcpState ActiveOpen(TcpConnection connection)
{
<span style="color:blue;">return</span> <span style="color:blue;">this</span>;
}
<span style="color:blue;">public</span> <span style="color:blue;">virtual</span> TcpState PassiveOpen(TcpConnection connection)
{
<span style="color:blue;">return</span> <span style="color:blue;">this</span>;
}
<span style="color:green;">// And so on...</span></pre>
</p>
<p>
Again, all the methods previously 'returned' <code>void</code>, so while, according to the State monad, you should strictly speaking return <code>Tuple&lt;Unit, TcpState&gt;</code>, this simplifies to <code>TcpState</code>.
</p>
<p>
Individual subclasses now do their work and return other <code>TcpState</code> implementations. I'm not going to tire you with all the example subclasses, so here's just <code>TcpEstablished</code>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">TcpEstablished</span> : TcpState
{
<span style="color:blue;">public</span> <span style="color:blue;">static</span> TcpState Instance = <span style="color:blue;">new</span> TcpEstablished();
<span style="color:blue;">private</span> <span style="color:#2b91af;">TcpEstablished</span>()
{
}
<span style="color:blue;">public</span> <span style="color:blue;">override</span> TcpState Close(TcpConnection connection)
{
<span style="color:green;">// send FIN, receive ACK of FIN</span>
<span style="color:blue;">return</span> TcpListen.Instance;
}
<span style="color:blue;">public</span> <span style="color:blue;">override</span> TcpState Transmit(
TcpConnection connection,
TcpOctetStream stream)
{
TcpConnection newConnection = connection.ProcessOctet(stream);
<span style="color:blue;">return</span> newConnection.State;
}
}</pre>
</p>
<p>
The trickiest implementation is <code>Transmit</code>, since <code>ProcessOctet</code> returns a <code>TcpConnection</code> while the <code>Transmit</code> method has to return a <code>TcpState</code>. Fortunately, the <code>Transmit</code> method can achieve that goal by returning <code>newConnection.State</code>. It feels a bit roundabout, but highlights a point I made in the <a href="/2022/09/05/the-state-pattern-and-the-state-monad">previous article</a>: The <code>TcpConnection</code> and <code>TcpState</code> classes are isomorphic - or, they would be if we made the <code>TcpConnection</code> constructor overload public. Thus, the <code>TcpConnection</code> class is redundant and might be deleted.
</p>
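<p>
To see how the refactored pieces fit together, here's a condensed, self-contained sketch. It's an approximation rather than the exact code from the example: it leaves out <code>Transmit</code> and <code>TcpOctetStream</code>, uses expression-bodied members for brevity, and marks the <code>Instance</code> fields <code>readonly</code>, which is my assumption and not something the code above does.
</p>

```csharp
// Base class: every operation defaults to 'stay in the current state'.
public class TcpState
{
    public virtual TcpState ActiveOpen(TcpConnection connection) => this;
    public virtual TcpState PassiveOpen(TcpConnection connection) => this;
    public virtual TcpState Close(TcpConnection connection) => this;
    public virtual TcpState Send(TcpConnection connection) => this;
}

public sealed class TcpClosed : TcpState
{
    public static readonly TcpState Instance = new TcpClosed();
    private TcpClosed() { }

    public override TcpState ActiveOpen(TcpConnection connection) =>
        TcpEstablished.Instance;

    public override TcpState PassiveOpen(TcpConnection connection) =>
        TcpListen.Instance;
}

public sealed class TcpEstablished : TcpState
{
    public static readonly TcpState Instance = new TcpEstablished();
    private TcpEstablished() { }

    public override TcpState Close(TcpConnection connection) =>
        TcpListen.Instance;
}

public sealed class TcpListen : TcpState
{
    public static readonly TcpState Instance = new TcpListen();
    private TcpListen() { }

    public override TcpState Send(TcpConnection connection) =>
        TcpEstablished.Instance;
}

// Immutable: each operation returns a new connection instead of mutating State.
public sealed class TcpConnection
{
    public TcpState State { get; }

    public TcpConnection() : this(TcpClosed.Instance) { }

    private TcpConnection(TcpState state)
    {
        State = state;
    }

    public TcpConnection ActiveOpen() => new TcpConnection(State.ActiveOpen(this));
    public TcpConnection PassiveOpen() => new TcpConnection(State.PassiveOpen(this));
    public TcpConnection Close() => new TcpConnection(State.Close(this));
    public TcpConnection Send() => new TcpConnection(State.Send(this));
}
```

<p>
With this in place, <code>new TcpConnection().ActiveOpen()</code> returns a connection whose <code>State</code> is <code>TcpEstablished.Instance</code>, while the original object still reports <code>TcpClosed.Instance</code>. Nothing is mutated along the way.
</p>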
<h3 id="a94a0e101f6b49d3b4caea30060ae1e1">
Conclusion <a href="#a94a0e101f6b49d3b4caea30060ae1e1" title="permalink">#</a>
</h3>
<p>
This article shows how to refactor the <em>TCP connection</em> sample code from <a href="/ref/dp">Design Patterns</a> to pure functions.
</p>
<p>
If it feels as though something's missing, there's a good reason for that. The example, as given, is degenerate because all methods 'return' <code>void</code>, and we don't really know what the actual implementation code (all that <em>Send SYN, receive SYN, ACK, etc.</em>) looks like. This means that we actually don't have to make use of the State monad, because we can get away with <a href="https://en.wikipedia.org/wiki/Endomorphism">endomorphisms</a>. All methods on <code>TcpConnection</code> are really functions that take <code>TcpConnection</code> as input (the instance itself) and return <code>TcpConnection</code>. If you want to see a more realistic example showcasing that perspective, see my article <a href="/2021/05/31/from-state-tennis-to-endomorphism">From State tennis to endomorphism</a>.
</p>
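<p>
As a rough illustration of the endomorphism view, here's a sketch in which each operation is treated as a <code>Func&lt;TcpConnection, TcpConnection&gt;</code> and a sequence of them is composed into a single function. To keep it self-contained, the <code>TcpConnection</code> below is a deliberately simplified stand-in that reduces the state to a string label instead of using <code>TcpState</code> objects. The <code>Endomorphism.Compose</code> helper is hypothetical; it isn't part of the example code or of any library.
</p>

```csharp
using System;
using System.Linq;

// Simplified stand-in: the state is only a label, not a TcpState object.
public sealed class TcpConnection
{
    public string State { get; }

    public TcpConnection() : this("Closed") { }

    private TcpConnection(string state)
    {
        State = state;
    }

    // Each method takes a TcpConnection (this) and returns a TcpConnection.
    public TcpConnection ActiveOpen() =>
        State == "Closed" ? new TcpConnection("Established") : this;

    public TcpConnection PassiveOpen() =>
        State == "Closed" ? new TcpConnection("Listen") : this;

    public TcpConnection Send() =>
        State == "Listen" ? new TcpConnection("Established") : this;

    public TcpConnection Close() =>
        State == "Established" ? new TcpConnection("Listen") : this;
}

public static class Endomorphism
{
    // Composes the steps from left to right. The identity function is the
    // starting point, so composing zero steps leaves the input unchanged.
    public static Func<T, T> Compose<T>(params Func<T, T>[] steps)
    {
        return steps.Aggregate(
            (Func<T, T>)(x => x),
            (acc, f) => x => f(acc(x)));
    }
}
```

<p>
For instance, <code>Endomorphism.Compose&lt;TcpConnection&gt;(c =&gt; c.PassiveOpen(), c =&gt; c.Send(), c =&gt; c.Close())</code> produces one function that takes a closed connection through <em>Listen</em> and <em>Established</em> and back to <em>Listen</em>. Since every step has the same input and output type, no State monad is required to sequence them.
</p>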
<p>
Even though the example is degenerate, I wanted to show it because otherwise you might wonder how the book's example code fares when exposed to the State monad. To be clear, because of the nature of the example, the State monad never becomes necessary. Thus, we need a second example.
</p>
<p>
<strong>Next:</strong> <a href="/2022/10/10/refactoring-a-saga-from-the-state-pattern-to-the-state-monad">Refactoring a saga from the State pattern to the State monad</a>.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
When to refactor
https://blog.ploeh.dk/2022/09/19/when-to-refactor
2022-09-19T06:36:00+00:00
Mark Seemann
<div id="post">
<p>
<em>FAQ: How do I convince my manager to let me refactor?</em>
</p>
<p>
This question frequently comes up. Developers want to refactor, but are under the impression that managers or other stakeholders will not let them.
</p>
<p>
Sometimes people ask me how to convince their managers to give them permission to refactor. I can't answer that. <a href="/2021/03/22/the-dispassionate-developer">I don't know how to convince other people</a>. That's not my métier.
</p>
<p>
I also believe that professional programmers <a href="/2019/03/18/the-programmer-as-decision-maker">should make their own decisions</a>. You don't ask permission to add three lines to a file, or create a new class. Why do you feel that you have to ask permission to refactor?
</p>
<h3 id="4af4356cd706457ebf8968e92850b140">
Does refactoring take time? <a href="#4af4356cd706457ebf8968e92850b140" title="permalink">#</a>
</h3>
<p>
In <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a> I tell the following story:
</p>
<blockquote>
<p>
"I once led an effort to <a href="/ref/ddd">refactor towards deeper insight</a>. My colleague and I had identified that the key to implementing a new feature would require changing a fundamental class in our code base.
</p>
<p>
"While such an insight rarely arrives at an opportune time, we wanted to make the change, and our manager allowed it.
</p>
<p>
"A week later, our code still didn’t compile.
</p>
<p>
"I’d hoped that I could make the change to the class in question and then <a href="/ref/welc">lean on the compiler</a> to identify the call sites that needed modification. The problem was that there was an abundance of compilation errors, and fixing them wasn’t a simple question of search-and-replace.
</p>
<p>
"My manager finally took me aside to let me know that he wasn’t satisfied with the situation. I could only concur.
</p>
<p>
"After a mild dressing down, he allowed me to continue the work, and a few more days of heroic effort saw the work completed.
</p>
<p>
"That’s a failure I don’t intend to repeat."
</p>
<footer><cite><a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a></cite></footer>
</blockquote>
<p>
There are a couple of points to this story. Yes, I <em>did</em> ask for permission before refactoring. I expected the process to take time, and I felt that making such a choice of prioritisation should involve my manager. While this manager trusted me, I felt a moral obligation to be transparent about the work I was doing. I didn't consider it professional to take a week out of the calendar and work on one thing while the rest of the organisation was expecting me to be working on something else.
</p>
<p>
So I can understand why developers feel that they have to ask permission to refactor. After all, refactoring takes time... Doesn't it?
</p>
<h3 id="ddb8e36130fe475da5ea914ccf06ae45">
Small steps <a href="#ddb8e36130fe475da5ea914ccf06ae45" title="permalink">#</a>
</h3>
<p>
This may unearth the underlying assumption that prevents developers from refactoring: The notion that refactoring takes time.
</p>
<p>
As I wrote in <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a>, that was a failure I didn't intend to repeat. I've never again asked permission to refactor, because I've never since allowed myself to be in a situation where refactoring would take significant time.
</p>
<p>
The reason I tell the story in the book is that I use it to motivate using the <a href="https://martinfowler.com/bliki/StranglerFigApplication.html">Strangler pattern</a> at the code level. The book proceeds to show an example of that.
</p>
<p>
Migrating code to a new API by allowing the old and the new to coexist for a while is only one of many techniques for taking smaller steps. Another is the use of <a href="https://en.wikipedia.org/wiki/Feature_toggle">feature flags</a>, a technique that I also show in the book. <a href="https://martinfowler.com/">Martin Fowler</a>'s <a href="/ref/refactoring">Refactoring</a> is literally an entire book about how to improve code bases in small, controlled steps.
</p>
<p>
Follow the <a href="/2019/10/21/a-red-green-refactor-checklist">red-green-refactor checklist</a> and commit after each <em>green</em> and <em>refactor</em> step. Move in small steps and <a href="https://stackoverflow.blog/2022/04/06/use-git-tactically/">use Git tactically</a>.
</p>
<p>
I'm beginning to realise, though, that <em>moving in small steps</em> is a skill that must be explicitly learned. This may seem obvious once posited, but it may also be helpful to explicitly state it.
</p>
<p>
Whenever I've had a chance to talk to other software professionals and <a href="https://twitter.com/hillelogram/status/1445435617047990273">thought leaders</a>, they agree. As far as I can tell, universities and coding boot camps don't teach this skill, and if (like me) you're an autodidact, you probably haven't learned it either. After all, few people insist that this is an important skill. It may, however, be one of the most important programming skills you can learn.
</p>
<h3 id="11afa6718a3c4a6f8e50aa5066474ac8">
Make it work, then make it right <a href="#11afa6718a3c4a6f8e50aa5066474ac8" title="permalink">#</a>
</h3>
<p>
When should you refactor? As <a href="https://wiki.c2.com/?BoyScoutRule">the boy scout rule</a> suggests: All the time.
</p>
<p>
You can, specifically, do it after implementing a new feature. As <a href="https://wiki.c2.com/?MakeItWorkMakeItRightMakeItFast">Kent Beck perhaps said or wrote</a>: <em>Make it work, then make it right</em>.
</p>
<p>
How long does it take to make it right?
</p>
<p>
Perhaps you think that it takes as much time as it does to make it work.
</p>
<p>
<img src="/content/binary/make-it-work-then-right-50-50.png" alt="A timeline with two sections: 'make it work' and 'make it right'. Each section has the same size.">
</p>
<p>
Perhaps you think that making it right takes even more time.
</p>
<p>
<img src="/content/binary/make-it-work-then-right-20-80.png" alt="A timeline with two sections: 'make it work' and 'make it right'. The 'make it right' section is substantially larger than the 'make it work' section.">
</p>
<p>
If this is how much time making the code right takes, I can understand why you feel that you need to ask your manager. That's what I did, those many years ago. But what if the proportions are more like this?
</p>
<p>
<img src="/content/binary/make-it-work-then-right-80-20.png" alt="A timeline with two sections: 'make it work' and 'make it right'. The 'make it right' section is substantially smaller than the 'make it work' section.">
</p>
<p>
Do you still feel that you need to ask for permission to refactor?
</p>
<p>
Writing code so that the team can keep a sustainable pace is your job. It's not something you should have to ask for permission to do.
</p>
<blockquote>
<p>
"Any fool can write code that a computer can understand. Good programmers write code that humans can understand."
</p>
<footer><cite>Martin Fowler, <a href="/ref/refactoring">Refactoring</a></cite></footer>
</blockquote>
<p>
Making the code right is not always a huge endeavour. It can be, if you've already made a mess of it, but if it's in good condition, keeping it that way doesn't have to take much extra effort. It's part of the ongoing design process that programming is.
</p>
<p>
How do you know what <em>right</em> is? Doesn't this make-it-work-make-it-right mentality lead to <a href="https://wiki.c2.com/?SpeculativeGenerality">speculative generality</a>?
</p>
<p>
No-one expects you to be able to predict the future, so don't try. Making it right means making the code good in the current context. Use good names, remove duplication, get rid of code smells, keep methods small and complexity low. <a href="/2020/04/13/curb-code-rot-with-thresholds">Refactor if you exceed a threshold</a>.
</p>
<h3 id="1a2eb1f23d70440c84e518babf7dc373">
Make code easy to change <a href="#1a2eb1f23d70440c84e518babf7dc373" title="permalink">#</a>
</h3>
<p>
The purpose of keeping code in a good condition is to make future changes as easy as possible. If you can't predict the future, however, then how do you know how to factor the code?
</p>
<p>
Another <a href="https://en.wikipedia.org/wiki/Kent_Beck">Kent Beck</a> aphorism suggests a tactic:
</p>
<blockquote>
<p>
"for each desired change, make the change easy (warning: this may be hard), then make the easy change"
</p>
<footer><cite><a href="https://twitter.com/KentBeck/status/250733358307500032">Kent Beck</a></cite></footer>
</blockquote>
<p>
In other words, when you know what you need to accomplish, first refactor the code so that it becomes easier to achieve the goal, and only then write the code to do that.
</p>
<p>
<img src="/content/binary/refactor-implement-40-60.png" alt="A timeline with two sections: Refactor and Implement. The Implement section is visibly larger than the Refactor section.">
</p>
<p>
Should you ask permission to refactor in such a case? Only if you sincerely believe that you can complete the entire task significantly faster without first improving the code. How likely is that? If the code base is already a mess, how easy is it to make changes? Not easy, and granted: That will also be true for refactoring. The difference between first refactoring and <em>not</em> refactoring, however, is that if you refactor, you leave the code in a better state. If you don't, you leave it in a worse state.
</p>
<p>
These decisions compound.
</p>
<p>
But what if, as Kent Beck implies, refactoring is hard? Then the situation might look like this:
</p>
<p>
<img src="/content/binary/refactor-implement-80-20.png" alt="A timeline with two sections: Refactor and Implement. The Refactor section is significantly larger than the Implement section.">
</p>
<p>
Should you ask for permission to refactor? I don't think so. While refactoring in this diagram is most of the work, it makes the change easy. Thus, once you're done refactoring, you make the easy change. The total amount of time this takes may turn out to be less than if you hadn't refactored (compare this figure to the previous figure: they're to scale). You also leave the code base in a better state so that future changes may be easier.
</p>
<h3 id="ff4ce77459af4e64a9e3df797fc01d8d">
Conclusion <a href="#ff4ce77459af4e64a9e3df797fc01d8d" title="permalink">#</a>
</h3>
<p>
There are lots of opportunities for refactoring. Every time you see something that could be improved, why not improve it? The fact that you're already looking at a piece of code suggests that it's somehow relevant to your current task. If it takes ten, fifteen minutes to improve it, why not do it? What if it takes an hour?
</p>
<p>
Most people think nothing of spending hours in meetings without asking their managers. If this is true, you can also decide to use a couple of hours improving code. They're likely as well spent as the meeting hours.
</p>
<p>
The key, however, is to be able to perform opportunistic refactoring. You can't do that if you can only move in day-long iterations; if hours, or days, go by when you can't compile, or when most tests fail.
</p>
<p>
On the other hand, if you're able to incrementally improve the code base in one-minute, or fifteen-minute, steps, then you can improve the code base every time an occasion arrives.
</p>
<p>
This is a skill that you need to learn. You're not born with the ability to improve in small steps. You'll have to practice - for example by <a href="/2020/01/13/on-doing-katas">doing katas</a>. One customer of mine told me that they found Kent Beck's <a href="https://medium.com/@kentbeck_7670/test-commit-revert-870bbd756864">TCR</a> a great way to teach that skill.
</p>
<p>
You can refactor in small steps. It's part of software engineering. Usually, you don't need to ask for permission.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="4dc52487bcee4459857d6fb6741dbcad">
<div class="comment-author"><a href="https://danielt1263.medium.com">Daniel Tartaglia</a> <a href="#4dc52487bcee4459857d6fb6741dbcad">#</a></div>
<div class="comment-content">
<p>
I've always had a problem with the notion of "red, green, refactor" and "first get it working, then make it right." I think the order is completely wrong.
</p>
<p>
As an explanation, I refer you to the first chapter of the first edition of Martin Fowler's Refactoring book. In that chapter is an example of a working system and we are presented with a change request.
</p>
<p>
In the example, the first thing that Fowler points out and does is the refactoring. And one of the highlighted ideas in the chapter says:
<blockquote>When you find you have to add a feature to a program, and the program's code is not structured in a convenient way to add the feature, first refactor the program to make it easy to add the feature, then add the feature.</blockquote>
</p>
<p>
In other words, the refactoring <i>comes first</i>. You refactor as part of adding the feature, not as a separate thing that is done after you have working code. It may not trip off the tongue as nicely, but the saying should be "refactor, red, green."
</p>
<p>
Once you have working code, you are done, and when you are estimating the time it will take to add the feature, <i>you include the refactoring time</i>. Lastly, you never refactor "just because," you refactor in order to make a new feature easy to add.
</p>
<p>
This mode of working makes much more sense to me. I feel that refactoring with no clear goal in mind ("improve the design" is not a clear goal) just leads to an over-designed/unnecessarily complex system. What do you think of this idea?
</p>
</div>
<div class="comment-date">2022-09-24 02:28 UTC</div>
</div>
<div class="comment" id="f9df3a46ce5f4cb8a386304f64030137">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#f9df3a46ce5f4cb8a386304f64030137">#</a></div>
<div class="comment-content">
<p>
Daniel, thank you for writing. You make some good points.
</p>
<p>
The red-green-refactor cycle is useful as a feedback cycle for new development work. It's not the only way to work. Particularly, as you point out, when you have existing code, first refactoring and then adding new code is a useful order.
</p>
<p>
Typically, though, when you're adding a new feature, you can rarely implement a new feature <em>only</em> by refactoring existing code. Normally you also need to add some new code. I still find the red-green-refactor cycle useful for that kind of work. I don't view it as an <em>either-or</em> proposition, but rather as a <em>both-this-and-that</em> way of working.
</p>
<blockquote>
<p>
"you never refactor "just because," you refactor in order to make a new feature easy to add."
</p>
</blockquote>
<p>
Never say never. I don't agree with that point of view. There is more than one reason for refactoring, and making room for a new feature is certainly one of them. This does not, however, rule out other reasons. I can easily think of a handful of other reasons that I consider warranted, but I don't want to derail the discussion by listing all of them. The list is not going to be complete anyway. I'll just outline one:
</p>
<p>
Sometimes, you read existing code because you need to understand what's going on. If the code is badly structured, it can take significant time and effort to reach such understanding. If, at that point you can see a simpler way to achieve the same behaviour, why not refactor the code? In that way, you make it easier for future readers of the code to understand what's going on. If you've already spent (wasted) significant time understanding something, why let other readers suffer and waste time if you can simplify the code?
</p>
<p>
This is essentially the boy scout rule, but as I claimed, there are other reasons to refactor as well.
</p>
<p>
Finally, thank you for the quote from <em>Refactoring</em>. I've usually been using this Kent Beck quote:
</p>
<blockquote>
<p>
"for each desired change, make the change easy (warning: this may be hard), then make the easy change"
</p>
<footer><cite><a href="https://twitter.com/kentbeck/status/250733358307500032">Kent Beck</a></cite></footer>
</blockquote>
<p>
but that's from 2012, and <em>Refactoring</em> is from 1999. It's such a Kent Beck thing to say, though, and Kent is a coauthor of <em>Refactoring</em>, so who knows who came up with that. I'm happy to know of the earlier quote, though.
</p>
</div>
<div class="comment-date">2022-10-02 18:19 UTC</div>
</div>
<div class="comment" id="d2f99d2cf0d2471ea591103b9eda7e9a">
<div class="comment-author"><a href="https://about.me/tysonwilliams">Tyson Williams</a> <a href="#d2f99d2cf0d2471ea591103b9eda7e9a">#</a></div>
<div class="comment-content">
<blockquote>
I don't view it as an either-or proposition, but rather as a both-this-and-that way of working.
</blockquote>
<p>
I think it is worth elaborating on this. I think I am correct in saying that Mark believes that type-driven development and test-driven development are a both-this-and-that way of working instead of an either-or way of working. He did exactly this in his <a href="https://www.pluralsight.com/courses/fsharp-type-driven-development">Pluralsight course titled Type-Driven Development with F#</a> by first obtaining an implementation using type-driven development and then <i>deleting his implementation</i> but keeping his types and obtaining a second implementation using test-driven development.
</p>
<p>
When implementing a new feature, it is important to derisk as quickly as possible by discovering any surprises (aka unknown unknowns) and analyzing all challenges (aka known unknowns). The reason for this is to make sure the intended approach is feasible. During this phase of work, we are in the "green" step of test-driven development. Anything goes. There are no rules. The code can be horribly ugly or unmaintainable. Just get the failing test to pass.
</p>
<p>
After the test passes, you have proved that the approach is sound. Now you need to share your solution with others. Here is where refactoring first occurs. Just like in Mark's course, I often find it helpful to <i>start over</i>. Now that I know where I am going, I can first refactor the code to make the functional change, which I know will make the test pass. In this way, I know that all my refactors have a clear goal.
</p>
<blockquote>
You refactor as part of adding the feature, not as a separate thing that is done after you have working code.
</blockquote>
<p>
I agree that refactoring should be done as part of the feature, but I disagree that it should (always) be done before you have working code. It is often done after you have working code.
</p>
<blockquote>
Once you have working code, you are done, and when you are estimating the time it will take to add the feature, <i>you include the refactoring time</i>.
</blockquote>
<p>
I agree that estimating should include the refactoring time, but I disagree that you are done when you have working code. When you have working code, you are approximately halfway done. Your code is currently optimized for writing. You still need to optimize it for reading.
</p>
</div>
<div class="comment-date">2022-10-08 17:42 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Coalescing DTOs
https://blog.ploeh.dk/2022/09/12/coalescing-dtos
2022-09-12T07:35:00+00:00
Mark Seemann
<div id="post">
<p>
<em>Refactoring to a universal abstraction.</em>
</p>
<p>
Despite my best efforts, no code base I write is perfect. This is also true for the code base that accompanies <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a>.
</p>
<p>
One wart (among several) that has annoyed me for a long time is this:
</p>
<p>
<pre>[HttpPost(<span style="color:#a31515;">"restaurants/{restaurantId}/reservations"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task<ActionResult> Post(
    <span style="color:blue;">int</span> restaurantId,
    ReservationDto dto)
{
    <span style="color:blue;">if</span> (dto <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="color:blue;">throw</span> <span style="color:blue;">new</span> ArgumentNullException(nameof(dto));
    <span style="color:blue;">var</span> id = dto.ParseId() ?? Guid.NewGuid();
    Reservation? reservation = dto.Validate(id);
    <span style="color:blue;">if</span> (reservation <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="color:blue;">return</span> <span style="color:blue;">new</span> BadRequestResult();
    <span style="color:green;">// More code follows...</span></pre>
</p>
<p>
Passing <code>id</code> to <code>Validate</code> annoys me. Why does <code>Validate</code> need an <code>id</code>?
</p>
<p>
When you see it in context, it <em>may</em> make some sort of sense, but in isolation, it seems arbitrary:
</p>
<p>
<pre><span style="color:blue;">internal</span> Reservation? Validate(Guid id)</pre>
</p>
<p>
Why does the method need an <code>id</code>? Doesn't <code>ReservationDto</code> have an <code>Id</code>?
</p>
<h3 id="90765fa49a8149d0bf54d8d9de19ffee">
Abstraction, broken <a href="#90765fa49a8149d0bf54d8d9de19ffee" title="permalink">#</a>
</h3>
<p>
Yes, indeed, <code>ReservationDto</code> <em>has</em> an <code>Id</code> property:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">string</span>? Id { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }</pre>
</p>
<p>
Then why do callers have to pass an <code>id</code> argument? Doesn't <code>Validate</code> use the <code>Id</code> property? It's almost as though the <code>Validate</code> method <em>begs</em> you to read the implementing code:
</p>
<p>
<pre><span style="color:blue;">internal</span> Reservation? Validate(Guid id)
{
    <span style="color:blue;">if</span> (!DateTime.TryParse(At, <span style="color:blue;">out</span> var d))
        <span style="color:blue;">return</span> <span style="color:blue;">null</span>;
    <span style="color:blue;">if</span> (Email <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="color:blue;">return</span> <span style="color:blue;">null</span>;
    <span style="color:blue;">if</span> (Quantity < 1)
        <span style="color:blue;">return</span> <span style="color:blue;">null</span>;
    <span style="color:blue;">return</span> <span style="color:blue;">new</span> Reservation(
        id,
        d,
        <span style="color:blue;">new</span> Email(Email),
        <span style="color:blue;">new</span> Name(Name ?? <span style="color:#a31515;">""</span>),
        Quantity);
}</pre>
</p>
<p>
Indeed, the method doesn't use the <code>Id</code> property. Reading the code may not be of much help, but at least we learn that <code>id</code> is passed to the <code>Reservation</code> constructor. It's still not clear why the method isn't trying to parse the <code>Id</code> property, like it's doing with <code>At</code>.
</p>
<p>
I'll return to the motivation in a moment, but first I'd like to dwell on the problems of this design.
</p>
<p>
It's a typical example of ad-hoc design. I had a set of behaviours I needed to implement, and in order to avoid code duplication, I came up with a method that seemed to solve the problem.
</p>
<p>
And indeed, the <code>Validate</code> method does solve the problem of code duplication. It also passes all tests. It could be worse.
</p>
<p>
It could also be better.
</p>
<p>
The problem with an ad-hoc design like this is that the motivation is unclear. As a reader, you feel that you're missing the full picture. Perhaps you feel compelled to read the implementation code to gain a better understanding. Perhaps you look for other call sites. Perhaps you search the Git history to find a helpful comment. Perhaps you ask a colleague.
</p>
<p>
It slows you down. Worst of all, it may leave you apprehensive of refactoring. If you feel that there's something you don't fully understand, you may decide to leave the API alone, instead of improving it.
</p>
<p>
It's one of the many ways that code slowly rots.
</p>
<p>
What's missing here is a proper abstraction.
</p>
<h3 id="c6b50930baf541eb9243611e2090b1e6">
Motivation <a href="#c6b50930baf541eb9243611e2090b1e6" title="permalink">#</a>
</h3>
<p>
I recently hit upon a design that I like better. Before I describe it, however, you need to understand the problem I was trying to solve.
</p>
<p>
The code base for the book is a restaurant reservation REST API, and I was evolving the code as I wrote it. I wanted the code base (and its Git history) to be as realistic as possible. In a real-world situation, you don't always know all requirements up front, or even if you do, they may change.
</p>
<p>
At one point I decided that a REST client could supply a <a href="https://en.wikipedia.org/wiki/Universally_unique_identifier">GUID</a> when making a new reservation. On the other hand, I had lots of existing tests (and a deployed system) that accepted reservations <em>without</em> IDs. In order to not break compatibility, I decided to use the ID if it was supplied with the <a href="https://en.wikipedia.org/wiki/Data_transfer_object">DTO</a>, and otherwise create one. (I later explored <a href="/2021/09/20/keep-ids-internal-with-rest">an API without explicit IDs</a>, but that's a different story.)
</p>
<p>
The <code>id</code> is a JSON property, however, so there's no guarantee that it's properly formatted. Thus, the need to first parse it:
</p>
<p>
<pre><span style="color:blue;">var</span> id = dto.ParseId() ?? Guid.NewGuid();</pre>
</p>
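<p>
The <code>ParseId</code> method isn't shown in this article. As a minimal sketch, assuming only a DTO with a nullable string <code>Id</code> property, a helper of that shape could return a nullable <code>Guid</code>, so that <code>??</code> can supply the fallback. The <code>SketchDto</code> class is a hypothetical stand-in and not the book's <code>ReservationDto</code>:
</p>
<p>
<pre>#nullable enable
using System;

var wellFormed = new SketchDto { Id = "B50DF6C7-1D13-4BD3-AF41-0AC0BA9F0F6E" };
var malformed = new SketchDto { Id = "not a GUID" };
var absent = new SketchDto();

// A well-formed ID parses, so the fallback value is not used.
Console.WriteLine(wellFormed.ParseId() ?? Guid.NewGuid());
// Malformed or absent IDs yield null, so ?? supplies a fresh GUID.
Console.WriteLine(malformed.ParseId() ?? Guid.NewGuid());
Console.WriteLine(absent.ParseId() ?? Guid.NewGuid());

// Hypothetical stand-in for the DTO (an assumption for illustration).
public sealed class SketchDto
{
    public string? Id { get; set; }

    // Returns the parsed GUID, or null when Id is missing or malformed.
    public Guid? ParseId()
    {
        if (Guid.TryParse(Id, out var id))
            return id;
        return null;
    }
}</pre>
</p>
<p>
Returning <code>null</code> rather than generating an ID inside the helper keeps the helper itself deterministic; the caller decides what the fallback should be.
</p>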
<p>
To make matters even more complicated, when you <code>PUT</code> a reservation, the ID is actually part of the resource address, which means that even if it's present in the JSON document, that value should be ignored:
</p>
<p>
<pre>[HttpPut(<span style="color:#a31515;">"restaurants/{restaurantId}/reservations/{id}"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task<ActionResult> Put(
    <span style="color:blue;">int</span> restaurantId,
    <span style="color:blue;">string</span> id,
    ReservationDto dto)
{
    <span style="color:blue;">if</span> (dto <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="color:blue;">throw</span> <span style="color:blue;">new</span> ArgumentNullException(nameof(dto));
    <span style="color:blue;">if</span> (!Guid.TryParse(id, <span style="color:blue;">out</span> var rid))
        <span style="color:blue;">return</span> <span style="color:blue;">new</span> NotFoundResult();
    Reservation? reservation = dto.Validate(rid);
    <span style="color:blue;">if</span> (reservation <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="color:blue;">return</span> <span style="color:blue;">new</span> BadRequestResult();
    <span style="color:green;">// More code follows...</span></pre>
</p>
<p>
Notice that this <code>Put</code> implementation exclusively considers the resource address <code>id</code> parameter. Recall that the <code>Validate</code> method ignores the <code>dto</code>'s <code>Id</code> property.
</p>
<p>
This is knowledge about implementation details that leaks through to the calling code. As a client developer, you need to know and keep this knowledge in your head while you write your own code. That's not really code that fits in your head.
</p>
<p>
As I usually put it: If you have to read the code, it implies that encapsulation is broken.
</p>
<p>
At the time, however, I couldn't think of a better alternative, and since the problem is still fairly small and localised, I decided to move on. After all, <a href="https://en.wikipedia.org/wiki/Perfect_is_the_enemy_of_good">perfect is the enemy of good</a>.
</p>
<h3 id="7e10ecc024e94a20817f0054bde56c29">
Why don't you just..? <a href="#7e10ecc024e94a20817f0054bde56c29" title="permalink">#</a>
</h3>
<p>
Is there a better way? Perhaps you think that you've spotted an obvious improvement. Why don't I just try to parse <code>dto.Id</code> and then create a <code>Guid.NewGuid()</code> if parsing fails? Like this:
</p>
<p>
<pre><span style="color:blue;">internal</span> Reservation? <span style="color:#74531f;">Validate</span>()
{
    <span style="color:#8f08c4;">if</span> (!Guid.TryParse(Id, <span style="color:blue;">out</span> var <span style="color:#1f377f;">id</span>))
        id = Guid.NewGuid();
    <span style="color:#8f08c4;">if</span> (!DateTime.TryParse(At, <span style="color:blue;">out</span> var <span style="color:#1f377f;">d</span>))
        <span style="color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
    <span style="color:#8f08c4;">if</span> (Email <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
    <span style="color:#8f08c4;">if</span> (Quantity < 1)
        <span style="color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
    <span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> Reservation(
        id,
        d,
        <span style="color:blue;">new</span> Email(Email),
        <span style="color:blue;">new</span> Name(Name ?? <span style="color:#a31515;">""</span>),
        Quantity);
}</pre>
</p>
<p>
The short answer is: Because it doesn't work.
</p>
<p>
It may work for <code>Post</code>, but then <code>Put</code> doesn't have a way to tell the <code>Validate</code> method which ID to use.
</p>
<p>
Or rather: That's not entirely true, because this is possible:
</p>
<p>
<pre>dto.Id = id;
Reservation? <span style="color:#1f377f;">reservation</span> = dto.Validate();</pre>
</p>
<p>
This does suggest an even better way. Before we go there, however, there's another reason I don't like this particular variation: It makes <code>Validate</code> impure.
</p>
<p>
<em>Why care?</em> you may ask.
</p>
<p>
I always end up regretting making an otherwise potentially <a href="https://en.wikipedia.org/wiki/Pure_function">pure function</a> non-deterministic. Sooner or later, it turns out to have been a bad decision, regardless of how alluring it initially looked. <a href="/2022/05/23/waiting-to-never-happen">I recently gave an example of that</a>.
</p>
<p>
When weighing the advantages and disadvantages, I preferred passing <code>id</code> explicitly rather than relying on <code>Guid.NewGuid()</code> inside <code>Validate</code>.
</p>
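<p>
To illustrate the concern, here's a minimal sketch of the difference. The <code>IdSketch</code> class and its two methods are hypothetical and not part of the book's code base. The first method falls back to <code>Guid.NewGuid()</code> internally, so two calls with the same input give different results. The second takes the fallback as an argument and stays deterministic:
</p>
<p>
<pre>#nullable enable
using System;

var first = IdSketch.ImpureId(null);
var second = IdSketch.ImpureId(null);
// Same input, but two independently generated GUIDs:
// this prints False unless two random GUIDs happen to collide.
Console.WriteLine(first == second);

// The impure action stays with the caller.
var fallback = Guid.NewGuid();
// Same inputs, same output, every time.
Console.WriteLine(
    IdSketch.PureId(null, fallback) == IdSketch.PureId(null, fallback)); // True

// Hypothetical helpers (assumptions for illustration only).
public static class IdSketch
{
    // Impure: hides a call to Guid.NewGuid() behind the signature.
    public static Guid ImpureId(string? candidate)
    {
        if (Guid.TryParse(candidate, out var id))
            return id;
        return Guid.NewGuid();
    }

    // Pure: the caller supplies the fallback value.
    public static Guid PureId(string? candidate, Guid fallback)
    {
        if (Guid.TryParse(candidate, out var id))
            return id;
        return fallback;
    }
}</pre>
</p>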
<h3 id="cb79dc6987854f8b9a897285af649360">
First monoid <a href="#cb79dc6987854f8b9a897285af649360" title="permalink">#</a>
</h3>
<p>
One of the reasons I find <a href="/2017/10/04/from-design-patterns-to-category-theory">universal abstractions</a> beneficial is that you only have to learn them once. As Felienne Hermans writes in <a href="/ref/programmers-brain">The Programmer's Brain</a>, our working memory juggles a combination of ephemeral data and knowledge from our long-term memory. The better you can leverage existing knowledge, the easier it is to read code.
</p>
<p>
Which universal abstraction enables you to choose from a prioritised list of candidates? The <a href="/2018/04/03/maybe-monoids">First monoid</a>!
</p>
<p>
In C# with <a href="https://docs.microsoft.com/dotnet/csharp/nullable-references">nullable reference types</a> the <a href="https://docs.microsoft.com/dotnet/csharp/language-reference/operators/null-coalescing-operator">null-coalescing operator</a> <code>??</code> already implements the desired functionality. (If you're using another language or an older version of C#, you can instead use <a href="/2018/06/04/church-encoded-maybe">Maybe</a>.)
</p>
<p>
Once I got that idea I was able to simplify the API.
</p>
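<p>
As a small illustration that relies on nothing beyond the language itself, the following sketch shows <code>??</code> picking the first non-null value from a prioritised list of candidates. It also shows that grouping doesn't matter (associativity) and that <code>null</code> acts as the identity, which are the properties that make this a monoid. The variable names are arbitrary:
</p>
<p>
<pre>#nullable enable
using System;

string? a = null;
string? b = "second";
string? c = "third";

// The leftmost non-null candidate wins.
Console.WriteLine(a ?? b ?? c);                          // second

// Associativity: grouping doesn't change the result.
Console.WriteLine(((a ?? b) ?? c) == (a ?? (b ?? c)));   // True

// Identity: coalescing with null on either side changes nothing.
string? none = null;
Console.WriteLine((none ?? b) == b);                     // True
Console.WriteLine((b ?? none) == b);                     // True</pre>
</p>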
<h3 id="ad2f0a4d20da4c9c8006e9587b011911">
Parsing and coalescing DTOs <a href="#ad2f0a4d20da4c9c8006e9587b011911" title="permalink">#</a>
</h3>
<p>
Instead of that odd <code>Validate</code> method which isn't quite a validator and not quite a parser, this insight suggests to <a href="https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/">parse, don't validate</a>:
</p>
<p>
<pre><span style="color:blue;">internal</span> Reservation? <span style="color:#74531f;">TryParse</span>()
{
    <span style="color:#8f08c4;">if</span> (!Guid.TryParse(Id, <span style="color:blue;">out</span> var <span style="color:#1f377f;">id</span>))
        <span style="color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
    <span style="color:#8f08c4;">if</span> (!DateTime.TryParse(At, <span style="color:blue;">out</span> var <span style="color:#1f377f;">d</span>))
        <span style="color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
    <span style="color:#8f08c4;">if</span> (Email <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
        <span style="color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
    <span style="color:#8f08c4;">if</span> (Quantity < 1)
        <span style="color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
    <span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> Reservation(
        id,
        d,
        <span style="color:blue;">new</span> Email(Email),
        <span style="color:blue;">new</span> Name(Name ?? <span style="color:#a31515;">""</span>),
        Quantity);
}</pre>
</p>
<p>
This function only returns a parsed <code>Reservation</code> object when the <code>Id</code> is present and well-formed. What about the cases where the <code>Id</code> is absent?
</p>
<p>
The calling <code>ReservationsController</code> can deal with that:
</p>
<p>
<pre>Reservation? <span style="color:#1f377f;">candidate1</span> = dto.TryParse();
dto.Id = Guid.NewGuid().ToString(<span style="color:#a31515;">"N"</span>);
Reservation? <span style="color:#1f377f;">candidate2</span> = dto.TryParse();
Reservation? <span style="color:#1f377f;">reservation</span> = candidate1 ?? candidate2;
<span style="color:#8f08c4;">if</span> (reservation <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
    <span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> BadRequestResult();</pre>
</p>
<p>
First try to parse the <code>dto</code>, then explicitly overwrite its <code>Id</code> property with a new <code>Guid</code>, and then try to parse it again. Finally, pick the first of these that isn't null, using the null-coalescing <code>??</code> operator.
</p>
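<p>
To make that flow concrete, here's a self-contained sketch of the same idea. The <code>BookingDto</code> and <code>Booking</code> types are simplified, hypothetical stand-ins with only an ID and a date; they are not the book's <code>ReservationDto</code> and <code>Reservation</code>. Passing the invariant culture to <code>DateTime.TryParse</code> is my own choice, made so that the sketch behaves the same on every machine:
</p>
<p>
<pre>#nullable enable
using System;
using System.Globalization;

var dto = new BookingDto { At = "2022-09-12T19:00" };   // no Id supplied

Booking? candidate1 = dto.TryParse();       // null, because Id is absent
dto.Id = Guid.NewGuid().ToString("N");
Booking? candidate2 = dto.TryParse();       // parses with the new Id
Booking? booking = candidate1 ?? candidate2;

Console.WriteLine(candidate1 is null);      // True
Console.WriteLine(booking is null);         // False

// If the date is malformed, both candidates are null, and so is the result.
var bad = new BookingDto { At = "not a date" };
Booking? c1 = bad.TryParse();
bad.Id = Guid.NewGuid().ToString("N");
Booking? c2 = bad.TryParse();
Console.WriteLine((c1 ?? c2) is null);      // True

// Hypothetical, simplified domain object.
public sealed record Booking(Guid Id, DateTime At);

// Hypothetical, simplified DTO.
public sealed class BookingDto
{
    public string? Id { get; set; }
    public string? At { get; set; }

    // A parser in the TryParse style: a valid object, or null.
    public Booking? TryParse()
    {
        if (!Guid.TryParse(Id, out var id))
            return null;
        if (!DateTime.TryParse(
            At, CultureInfo.InvariantCulture, DateTimeStyles.None, out var d))
            return null;
        return new Booking(id, d);
    }
}</pre>
</p>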
<p>
This API also works consistently in the <code>Put</code> method:
</p>
<p>
<pre>dto.Id = id;
Reservation? <span style="color:#1f377f;">reservation</span> = dto.TryParse();
<span style="color:#8f08c4;">if</span> (reservation <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
    <span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> BadRequestResult();</pre>
</p>
<p>
Why is this better? I consider it better because the <code>TryParse</code> function should be a familiar abstraction. Once you've seen a couple of those, you know that a well-behaved parser either returns a valid object, or nothing. You don't have to go and read the implementation of <code>TryParse</code> to (correctly) guess that. Thus, encapsulation is maintained.
</p>
<h3 id="72d1eececc3f4166ba426d6a39cd6721">
Where does mutation go? <a href="#72d1eececc3f4166ba426d6a39cd6721" title="permalink">#</a>
</h3>
<p>
The <code>ReservationsController</code> mutates the <code>dto</code> and relies on the impure <code>Guid.NewGuid()</code> method. Why is that okay when it wasn't okay to do this inside of <code>Validate</code>?
</p>
<p>
This is because the code base follows the <a href="https://www.destroyallsoftware.com/screencasts/catalog/functional-core-imperative-shell">functional core, imperative shell</a> architecture. Specifically, Controllers make up the imperative shell, so I consider it appropriate to put impure actions there. After all, they have to go somewhere.
</p>
<p>
This means that the <code>TryParse</code> function remains pure.
</p>
<h3 id="85ce3840fad04a0c8cb106f6d36459d5">
Conclusion <a href="#85ce3840fad04a0c8cb106f6d36459d5" title="permalink">#</a>
</h3>
<p>
Sometimes a good API design can elude you for a long time. When that happens, I move on with the best solution I can think of in the moment. As it often happens, though, ad-hoc abstractions leave me unsatisfied, so I'm always happy to improve such code later, if possible.
</p>
<p>
In this article, you saw an example of an ad-hoc API design that represented the best I could produce at the time. Later, it dawned on me that an implementation based on a universal abstraction would be possible. In this case, the universal abstraction was null coalescing (which is a specialisation of the <em>monoid</em> abstraction).
</p>
<p>
I like universal abstractions because, once you know them, you can trust that they work in well-understood ways. You don't have to waste time reading implementation code in order to learn whether it's safe to call a method in a particular way.
</p>
<p>
This saves time when you have to work with the code, because, after all, we spend more time reading code than writing it.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="abb3e1c91878423a94835a798fc8b32d">
<div class="comment-author"><a href="https://about.me/tysonwilliams">Tyson Williams</a> <a href="#abb3e1c91878423a94835a798fc8b32d">#</a></div>
<div class="comment-content">
<p>
After the refactor in this article, is the entirety of your Post method (including the part you didn't show in this article) an <a href="https://blog.ploeh.dk/2020/03/02/impureim-sandwich/">impureim sandwich</a>?
</p>
</div>
<div class="comment-date">2022-09-17 18:37 UTC</div>
</div>
<div class="comment" id="2352255931b1499184b5ba3782436d61">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#2352255931b1499184b5ba3782436d61">#</a></div>
<div class="comment-content">
<p>
Not yet. There's a lot of (preliminary) interleaving of impure actions and pure functions remaining in the controller, even after this refactoring.
</p>
<p>
A future article will tackle that question. One of the reasons I even started writing about monads, functor relationships, etcetera was to establish the foundations for what this requires. If it can be done without monads and traversals I don't know how.
</p>
<p>
Even though the <code>Post</code> method isn't an impureim sandwich, I still consider the architecture <a href="https://www.destroyallsoftware.com/screencasts/catalog/functional-core-imperative-shell">functional core, imperative shell</a>, since I've kept all impure actions in the controllers.
</p>
<p>
The reason I didn't go all the way to impureim sandwiches with the book's code is didactic. For complex logic, you'll need traversals, monads, sum types, and so on, and none of those things were in scope for the book.
</p>
</div>
<div class="comment-date">2022-09-18 19:28 UTC</div>
</div>
</div>
<hr>
The State pattern and the State monad
https://blog.ploeh.dk/2022/09/05/the-state-pattern-and-the-state-monad
2022-09-05T12:48:00+00:00
Mark Seemann
<div id="post">
<p>
<em>The names are the same. Is there a connection? An article for object-oriented programmers.</em>
</p>
<p>
This article is part of <a href="/2018/03/05/some-design-patterns-as-universal-abstractions">a series of articles about specific design patterns and their category theory counterparts</a>. In this article I compare the <a href="https://en.wikipedia.org/wiki/State_pattern">State design pattern</a> to the <a href="/2022/06/20/the-state-monad">State monad</a>.
</p>
<p>
Since the design pattern and the monad share the name <em>State</em> you'd think that they might be <a href="/2018/01/08/software-design-isomorphisms">isomorphic</a>, but it's not quite that easy. I find it more likely that the name is an example of <a href="https://en.wikipedia.org/wiki/Parallel_evolution">parallel evolution</a>. Monads were discovered by <a href="https://en.wikipedia.org/wiki/Eugenio_Moggi">Eugenio Moggi</a> in the early nineties, and <a href="/ref/dp">Design Patterns</a> is from 1994. That's close enough in time that I find it more likely that whoever came up with the names found them independently. <em>State</em>, after all, is hardly an exotic word.
</p>
<p>
Thus, it's possible that the choice of the same name is coincidental. If this is true (which is only my conjecture), does the State pattern have anything in common with the State monad? I find that the answer is a tentative <em>yes</em>. The State design pattern describes an open polymorphic stateful computation. That kind of computation can also be described with the State monad.
</p>
<p>
This article contains a significant amount of code, and it's all quite abstract. It examines the abstract shape of the pattern, so there's little prior intuition on which to build an understanding. While later articles will show more concrete examples, if you want to follow along, you can use the <a href="https://github.com/ploeh/StatePatternAndMonad">GitHub repository</a>.
</p>
<h3 id="320763be103b49018debc64b45069e3c">
Shape <a href="#320763be103b49018debc64b45069e3c" title="permalink">#</a>
</h3>
<p>
<a href="/ref/dp">Design Patterns</a> is a little vague when it comes to representing the essential form of the pattern. From what one can deduce from the diagram in the <em>Structure</em> section describing the pattern, you have an abstract <code>State</code> class with a <code>Handle</code> method like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">virtual</span> <span style="color:blue;">void</span> <span style="color:#74531f;">Handle</span>(Context <span style="color:#1f377f;">context</span>)
{
}</pre>
</p>
<p>
This, however, doesn't capture all scenarios. What if you need to pass more arguments to the method? What if the method returns a result? What if there's more than one method?
</p>
<p>
Taking into account all those concerns, you might arrive at a more generalised description of the State pattern where an abstract <code>State</code> class might define methods like these:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">abstract</span> Out1 <span style="color:#74531f;">Handle1</span>(Context <span style="color:#1f377f;">context</span>, In1 <span style="color:#1f377f;">in1</span>);
<span style="color:blue;">public</span> <span style="color:blue;">abstract</span> Out2 <span style="color:#74531f;">Handle2</span>(Context <span style="color:#1f377f;">context</span>, In2 <span style="color:#1f377f;">in2</span>);</pre>
</p>
<p>
There might be an arbitrary number of <code>Handle</code> methods, from <code>Handle1</code> to <code>HandleN</code>, each with their own input and return types.
</p>
<p>
The idea behind the State pattern is that clients don't interact directly with <code>State</code> objects. Instead, they interact with a <code>Context</code> object that delegates operations to a <code>State</code> object, passing itself as an argument:
</p>
<p>
<pre><span style="color:blue;">public</span> Out1 <span style="color:#74531f;">Request1</span>(In1 <span style="color:#1f377f;">in1</span>)
{
    <span style="color:#8f08c4;">return</span> State.Handle1(<span style="color:blue;">this</span>, in1);
}
<span style="color:blue;">public</span> Out2 <span style="color:#74531f;">Request2</span>(In2 <span style="color:#1f377f;">in2</span>)
{
    <span style="color:#8f08c4;">return</span> State.Handle2(<span style="color:blue;">this</span>, in2);
}</pre>
</p>
<p>
Classes that derive from the abstract <code>State</code> may then mutate <code>context.State</code>.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">override</span> Out2 <span style="color:#74531f;">Handle2</span>(Context <span style="color:#1f377f;">context</span>, In2 <span style="color:#1f377f;">in2</span>)
{
    <span style="color:#8f08c4;">if</span> (in2 == In2.Epsilon)
        context.State = <span style="color:blue;">new</span> ConcreteStateB();
    <span style="color:#8f08c4;">return</span> Out2.Eta;
}</pre>
</p>
<p>
Clients interact with the <code>Context</code> object and aren't aware of this internal machinery:
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="color:#1f377f;">actual</span> = ctx.Request2(in2);</pre>
</p>
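<p>
Putting these fragments together, a minimal runnable sketch of the mutable shape could look like the following. Only <code>Handle2</code> and <code>Request2</code> are included. <code>ConcreteStateA</code>, the <code>Zeta</code> and <code>Theta</code> enum members, and the behaviour of <code>ConcreteStateB</code> are assumptions made up for illustration:
</p>
<p>
<pre>using System;

var ctx = new Context(new ConcreteStateA());

Console.WriteLine(ctx.Request2(In2.Zeta));     // Eta (state is unchanged)
Console.WriteLine(ctx.State.GetType().Name);   // ConcreteStateA
Console.WriteLine(ctx.Request2(In2.Epsilon));  // Eta (state changes as a side effect)
Console.WriteLine(ctx.State.GetType().Name);   // ConcreteStateB
Console.WriteLine(ctx.Request2(In2.Epsilon));  // Theta (same call, different result)

public enum In2 { Epsilon, Zeta }
public enum Out2 { Eta, Theta }

public abstract class State
{
    public abstract Out2 Handle2(Context context, In2 in2);
}

// Hypothetical initial state: switches to ConcreteStateB on Epsilon.
public sealed class ConcreteStateA : State
{
    public override Out2 Handle2(Context context, In2 in2)
    {
        if (in2 == In2.Epsilon)
            context.State = new ConcreteStateB();   // mutates the Context
        return Out2.Eta;
    }
}

// Hypothetical second state: never changes the Context.
public sealed class ConcreteStateB : State
{
    public override Out2 Handle2(Context context, In2 in2)
    {
        return Out2.Theta;
    }
}

public sealed class Context
{
    public Context(State state)
    {
        State = state;
    }

    public State State { get; set; }

    // Delegates to the current State, passing itself as an argument.
    public Out2 Request2(In2 in2)
    {
        return State.Handle2(this, in2);
    }
}</pre>
</p>
<p>
The last two calls pass the same argument to the same object, yet return different values. That's the observable consequence of the hidden state change.
</p>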
<p>
With such state mutation going on, is it possible to refactor to a design that uses immutable data and <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a>?
</p>
<h3 id="463f29ae1f8240ec8de7ed18cd429b9f">
State pair <a href="#463f29ae1f8240ec8de7ed18cd429b9f" title="permalink">#</a>
</h3>
<p>
When you have a <code>void</code> method that mutates state, you can refactor it to a pure function by leaving the existing state unchanged and instead returning the new state. What do you do, however, when the method in question already returns a value?
</p>
<p>
This is the case with the generalised <code>HandleN</code> methods, above.
</p>
<p>
One way to resolve this problem is to introduce a more complex type to return. To avoid too much duplication or boilerplate code, you could make it a generic type:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">StatePair</span><<span style="color:#2b91af;">T</span>>
{
    <span style="color:blue;">public</span> <span style="color:#2b91af;">StatePair</span>(T <span style="color:#1f377f;">value</span>, State <span style="color:#1f377f;">state</span>)
    {
        Value = value;
        State = state;
    }
    <span style="color:blue;">public</span> T Value { <span style="color:blue;">get</span>; }
    <span style="color:blue;">public</span> State State { <span style="color:blue;">get</span>; }
    <span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">bool</span> <span style="color:#74531f;">Equals</span>(<span style="color:blue;">object</span> <span style="color:#1f377f;">obj</span>)
    {
        <span style="color:#8f08c4;">return</span> obj <span style="color:blue;">is</span> StatePair<T> <span style="color:#1f377f;">result</span> &&
               EqualityComparer<T>.Default.Equals(Value, result.Value) &&
               EqualityComparer<State>.Default.Equals(State, result.State);
    }
    <span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">int</span> <span style="color:#74531f;">GetHashCode</span>()
    {
        <span style="color:#8f08c4;">return</span> HashCode.Combine(Value, State);
    }
}</pre>
</p>
<p>
This enables you to change the signatures of the <code>Handle</code> methods:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">abstract</span> StatePair<Out1> <span style="color:#74531f;">Handle1</span>(Context <span style="color:#1f377f;">context</span>, In1 <span style="color:#1f377f;">in1</span>);
<span style="color:blue;">public</span> <span style="color:blue;">abstract</span> StatePair<Out2> <span style="color:#74531f;">Handle2</span>(Context <span style="color:#1f377f;">context</span>, In2 <span style="color:#1f377f;">in2</span>);</pre>
</p>
<p>
This refactoring is always possible. Even if the original return type of a method was <code>void</code>, you can <a href="/2018/01/15/unit-isomorphisms">use a <em>unit</em> type as a replacement for <em>void</em></a>. While redundant but consistent, a method could return <code>StatePair<Unit></code>.
</p>
<h3 id="e2ca8d320e684cf89407880d06256b64">
Generic pair <a href="#e2ca8d320e684cf89407880d06256b64" title="permalink">#</a>
</h3>
<p>
The above <code>StatePair</code> type is so coupled to a particular <code>State</code> class that it's not reusable. If you had more than one implementation of the State pattern in your code base, you'd have to duplicate that effort. That seems wasteful, so why not make the type generic in the state dimension as well?
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">StatePair</span><<span style="color:#2b91af;">TState</span>, <span style="color:#2b91af;">T</span>>
{
    <span style="color:blue;">public</span> <span style="color:#2b91af;">StatePair</span>(T <span style="color:#1f377f;">value</span>, TState <span style="color:#1f377f;">state</span>)
    {
        Value = value;
        State = state;
    }
    <span style="color:blue;">public</span> T Value { <span style="color:blue;">get</span>; }
    <span style="color:blue;">public</span> TState State { <span style="color:blue;">get</span>; }
    <span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">bool</span> <span style="color:#74531f;">Equals</span>(<span style="color:blue;">object</span> <span style="color:#1f377f;">obj</span>)
    {
        <span style="color:#8f08c4;">return</span> obj <span style="color:blue;">is</span> StatePair<TState, T> <span style="color:#1f377f;">pair</span> &&
               EqualityComparer<T>.Default.Equals(Value, pair.Value) &&
               EqualityComparer<TState>.Default.Equals(State, pair.State);
    }
    <span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">int</span> <span style="color:#74531f;">GetHashCode</span>()
    {
        <span style="color:#8f08c4;">return</span> HashCode.Combine(Value, State);
    }
}</pre>
</p>
<p>
When you do that, you'd clearly also need to modify the <code>Handle</code> methods accordingly:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">abstract</span> StatePair<State, Out1> <span style="color:#74531f;">Handle1</span>(Context <span style="color:#1f377f;">context</span>, In1 <span style="color:#1f377f;">in1</span>);
<span style="color:blue;">public</span> <span style="color:blue;">abstract</span> StatePair<State, Out2> <span style="color:#74531f;">Handle2</span>(Context <span style="color:#1f377f;">context</span>, In2 <span style="color:#1f377f;">in2</span>);</pre>
</p>
<p>
Notice that, as is the case with <a href="/2021/07/19/the-state-functor">the State functor</a>, the <em>type</em> declares its type arguments with <code>TState</code> before <code>T</code>, while the <em>constructor</em> takes <code>T</code> before <code>TState</code>. While odd and potentially confusing, I've done this to stay consistent with my previous articles, which in turn do this to stay consistent with prior art (mainly <a href="https://www.haskell.org/">Haskell</a>).
</p>
<p>
With <code>StatePair</code> you can make the methods pure.
</p>
<h3 id="cacad33b630d4529b4f703337b9b0a61">
Pure functions <a href="#cacad33b630d4529b4f703337b9b0a61" title="permalink">#</a>
</h3>
<p>
Since <code>Handle</code> methods can now return a new state instead of mutating objects, they can be pure functions. Here's an example:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">override</span> StatePair<State, Out2> <span style="color:#74531f;">Handle2</span>(Context <span style="color:#1f377f;">context</span>, In2 <span style="color:#1f377f;">in2</span>)
{
<span style="color:#8f08c4;">if</span> (in2 == In2.Epsilon)
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> StatePair<State, Out2>(Out2.Eta, <span style="color:blue;">new</span> ConcreteStateB());
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> StatePair<State, Out2>(Out2.Eta, <span style="color:blue;">this</span>);
}</pre>
</p>
<p>
The same is true for <code>Context</code>:
</p>
<p>
<pre><span style="color:blue;">public</span> StatePair<Context, Out1> <span style="color:#74531f;">Request1</span>(In1 <span style="color:#1f377f;">in1</span>)
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">pair</span> = State.Handle1(<span style="color:blue;">this</span>, in1);
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> StatePair<Context, Out1>(pair.Value, <span style="color:blue;">new</span> Context(pair.State));
}
<span style="color:blue;">public</span> StatePair<Context, Out2> <span style="color:#74531f;">Request2</span>(In2 <span style="color:#1f377f;">in2</span>)
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">pair</span> = State.Handle2(<span style="color:blue;">this</span>, in2);
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> StatePair<Context, Out2>(pair.Value, <span style="color:blue;">new</span> Context(pair.State));
}</pre>
</p>
<p>
Does this begin to look familiar?
</p>
<h3 id="bf25630a0d324f628f400c73ddaa78fa">
Monad <a href="#bf25630a0d324f628f400c73ddaa78fa" title="permalink">#</a>
</h3>
<p>
The <code>StatePair</code> class is nothing but a glorified tuple. Armed with that knowledge, you can introduce a variation of <a href="/2021/07/19/the-state-functor">the IState interface I used to introduce the State functor</a>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IState</span><<span style="color:#2b91af;">TState</span>, <span style="color:#2b91af;">T</span>>
{
StatePair<TState, T> <span style="color:#74531f;">Run</span>(TState <span style="color:#1f377f;">state</span>);
}</pre>
</p>
<p>
This variation uses the explicit <code>StatePair</code> class as the return type of <code>Run</code>, rather than a more anonymous tuple. These representations are isomorphic. (That might be a good exercise: Write functions that convert from one to the other, and vice versa.)
</p>
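<p>
As a hedged illustration of that isomorphism, here's a minimal sketch in Haskell rather than C#. The <code>StatePair</code> type mirrors the C# class, while <code>toTuple</code> and <code>fromTuple</code> are hypothetical names that aren't part of the article's code base:
</p>
<p>
<pre>-- Mirrors the C# StatePair class: the state type comes first in the type,
-- while the value comes first in the constructor.
data StatePair s a = StatePair a s deriving (Eq, Show)

-- Hypothetical helper: convert a StatePair to an ordinary tuple.
toTuple :: StatePair s a -> (a, s)
toTuple (StatePair a s) = (a, s)

-- Hypothetical helper: convert an ordinary tuple back to a StatePair.
fromTuple :: (a, s) -> StatePair s a
fromTuple (a, s) = StatePair a s</pre>
</p>
<p>
Both <code>toTuple . fromTuple</code> and <code>fromTuple . toTuple</code> behave like the identity function, so no information is lost in either direction.
</p>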
<p>
You can write the usual <code>Select</code> and <code>SelectMany</code> implementations to make <code>IState</code> a functor and monad. Since I have already shown these in previous articles, I'm going to skip them here. (Again, it might be a good exercise to implement them if you're in doubt about how they work.)
</p>
<p>
You can now, for example, use C# query syntax to run the same computation multiple times:
</p>
<p>
<pre>IState<Context, (Out1 a, Out1 b)> <span style="color:#1f377f;">s</span> =
<span style="color:blue;">from</span> a <span style="color:blue;">in</span> in1.Request1()
<span style="color:blue;">from</span> b <span style="color:blue;">in</span> in1.Request1()
<span style="color:blue;">select</span> (a, b);
StatePair<Context, (Out1 a, Out1 b)> <span style="color:#1f377f;">t</span> = s.Run(ctx);</pre>
</p>
<p>
This example calls <code>Request1</code> twice, and collects both return values in a tuple. Running the computation with a <code>Context</code> will produce both a result (the two outputs <code>a</code> and <code>b</code>) as well as the 'current' <code>Context</code> (state).
</p>
<p>
<code>Request1</code> is a State-valued extension method on <code>In1</code>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IState<Context, Out1> <span style="color:#74531f;">Request1</span>(<span style="color:blue;">this</span> In1 <span style="color:#1f377f;">in1</span>)
{
<span style="color:#8f08c4;">return</span>
<span style="color:blue;">from</span> ctx <span style="color:blue;">in</span> Get<Context>()
<span style="color:blue;">let</span> p = ctx.Request1(in1)
<span style="color:blue;">from</span> _ <span style="color:blue;">in</span> Put(p.State)
<span style="color:blue;">select</span> p.Value;
}</pre>
</p>
<p>
Notice the abstraction level in play. This extension method doesn't return a <code>StatePair</code>, but rather an <code>IState</code> computation, defined by using <a href="/2022/07/04/get-and-put-state">the State monad's Get and Put functions</a>. Since the computation is running with a <code>Context</code> state, the computation can <code>Get</code> a <code>ctx</code> object and call its <code>Request1</code> method. This method returns a pair <code>p</code>. The computation can then <code>Put</code> the pair's <code>State</code> (here, a <code>Context</code> object) and return the pair's <code>Value</code>.
</p>
<p>
This stateful computation is composed from the building blocks of the State monad, including query syntax supported by <code>SelectMany</code>, <code>Get</code>, and <code>Put</code>.
</p>
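<p>
For comparison, the same Get-then-Put composition can be sketched in Haskell. This assumes the <code>Control.Monad.State</code> module from the <em>mtl</em> package. The <code>In1</code>, <code>Out1</code>, and <code>Context</code> types, as well as <code>request1Pure</code>, are deliberately simplified, hypothetical stand-ins for the C# types above:
</p>
<p>
<pre>import Control.Monad.State (State, get, put, runState)

-- Hypothetical stand-ins for the article's In1, Out1, and Context types.
data In1 = Alpha | Beta deriving (Eq, Show)
data Out1 = Gamma | Delta deriving (Eq, Show)
newtype Context = Context Int deriving (Eq, Show)

-- Pure counterpart of Context.Request1: returns a value and a new Context.
request1Pure :: Context -> In1 -> (Out1, Context)
request1Pure (Context n) Alpha = (Gamma, Context (n + 1))
request1Pure (Context n) Beta  = (Delta, Context n)

-- Get the current Context, call the pure function, Put the new Context,
-- and return the value.
request1 :: In1 -> State Context Out1
request1 in1 = do
  ctx <- get
  let (out, ctx') = request1Pure ctx in1
  put ctx'
  return out</pre>
</p>
<p>
With these definitions, <code>runState (request1 Alpha) (Context 0)</code> evaluates to <code>(Gamma, Context 1)</code>. The entire <code>do</code> block is equivalent to <code>state (\ctx -> request1Pure ctx in1)</code>, which hints that the Get and Put steps are plumbing rather than essential logic.
</p>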
<p>
This does, however, still feel unsatisfactory. After all, you have to know enough of the details of the State monad to know that <code>ctx.Request1</code> returns a pair of which you must remember to <code>Put</code> the <code>State</code>. Would it be possible to also express the underlying <code>Handle</code> methods as stateful computations?
</p>
<h3 id="f8c4147c58924cd19f1d0bdae8fc05a5">
StatePair bifunctor <a href="#f8c4147c58924cd19f1d0bdae8fc05a5" title="permalink">#</a>
</h3>
<p>
The <code>StatePair</code> class is isomorphic to a <em>pair</em> (a <em>two-tuple</em>), and we know that <a href="/2018/12/31/tuple-bifunctor">a pair gives rise to a bifunctor</a>:
</p>
<p>
<pre><span style="color:blue;">public</span> StatePair<TState1, T1> <span style="color:#74531f;">SelectBoth</span><<span style="color:#2b91af;">TState1</span>, <span style="color:#2b91af;">T1</span>>(
Func<T, T1> <span style="color:#1f377f;">selectValue</span>,
Func<TState, TState1> <span style="color:#1f377f;">selectState</span>)
{
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> StatePair<TState1, T1>(
selectValue(Value),
selectState(State));
}</pre>
</p>
<p>
You can use <code>SelectBoth</code> to implement both <code>Select</code> and <code>SelectState</code>. In the following we're only going to need <code>SelectState</code>:
</p>
<p>
<pre><span style="color:blue;">public</span> StatePair<TState1, T> <span style="color:#74531f;">SelectState</span><<span style="color:#2b91af;">TState1</span>>(Func<TState, TState1> <span style="color:#1f377f;">selectState</span>)
{
<span style="color:#8f08c4;">return</span> SelectBoth(<span style="color:#1f377f;">x</span> => x, selectState);
}</pre>
</p>
<p>
This enables us to slightly simplify the <code>Context</code> methods:
</p>
<p>
<pre><span style="color:blue;">public</span> StatePair<Context, Out1> <span style="color:#74531f;">Request1</span>(In1 <span style="color:#1f377f;">in1</span>)
{
<span style="color:#8f08c4;">return</span> State.Handle1(<span style="color:blue;">this</span>, in1).SelectState(<span style="color:#1f377f;">s</span> => <span style="color:blue;">new</span> Context(s));
}
<span style="color:blue;">public</span> StatePair<Context, Out2> <span style="color:#74531f;">Request2</span>(In2 <span style="color:#1f377f;">in2</span>)
{
<span style="color:#8f08c4;">return</span> State.Handle2(<span style="color:blue;">this</span>, in2).SelectState(<span style="color:#1f377f;">s</span> => <span style="color:blue;">new</span> Context(s));
}</pre>
</p>
<p>
Keep in mind that <code>Handle1</code> returns a <code>StatePair<State, Out1></code>, <code>Handle2</code> returns <code>StatePair<State, Out2></code>, and so on. While <code>Request1</code> calls <code>Handle1</code>, it must return a <code>StatePair<Context, Out1></code> rather than a <code>StatePair<State, Out1></code>. Since <code>StatePair</code> is a bifunctor, the <code>Request1</code> method can use <code>SelectState</code> to map the <code>State</code> to a <code>Context</code>.
</p>
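<p>
In Haskell, the same observation can be expressed with the <code>Bifunctor</code> type class from <code>Data.Bifunctor</code>. The following is only a sketch: <code>StatePair</code> is redefined here to mirror the C# class, and <code>selectState</code> is a hypothetical name chosen to match the C# method:
</p>
<p>
<pre>import Data.Bifunctor (Bifunctor (..))

-- Mirrors the C# StatePair class.
data StatePair s a = StatePair a s deriving (Eq, Show)

-- bimap plays the role of SelectBoth. Unlike the C# method, it takes the
-- state function first, because the state is the first type argument.
instance Bifunctor StatePair where
  bimap f g (StatePair a s) = StatePair (g a) (f s)

-- Counterpart of the C# SelectState method: map only the state.
selectState :: (s -> s1) -> StatePair s a -> StatePair s1 a
selectState = first</pre>
</p>
<p>
For example, <code>selectState length (StatePair 'x' "abc")</code> evaluates to <code>StatePair 'x' 3</code>: the value is untouched, while the state changes type from <code>String</code> to <code>Int</code>.
</p>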
<p>
Unfortunately, this doesn't seem to move us much closer to being able to express the underlying functions as stateful computations. It does, however, set up the code so that the next change is a little easier to follow.
</p>
<h3 id="21f08903569d4c1a898914092f2bf3f9">
State computations <a href="#21f08903569d4c1a898914092f2bf3f9" title="permalink">#</a>
</h3>
<p>
Is it possible to express the <code>Handle</code> methods on <code>State</code> as <code>IState</code> computations? One option is to write another extension method:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IState<State, Out1> <span style="color:#74531f;">Request1S</span>(<span style="color:blue;">this</span> In1 <span style="color:#1f377f;">in1</span>)
{
<span style="color:#8f08c4;">return</span>
<span style="color:blue;">from</span> s <span style="color:blue;">in</span> Get<State>()
<span style="color:blue;">let</span> ctx = <span style="color:blue;">new</span> Context(s)
<span style="color:blue;">let</span> p = s.Handle1(ctx, in1)
<span style="color:blue;">from</span> _ <span style="color:blue;">in</span> Put(p.State)
<span style="color:blue;">select</span> p.Value;
}</pre>
</p>
<p>
I had to add an <code>S</code> suffix to the name, since it only differs from the above <code>Request1</code> extension method on its return type, and C# doesn't allow method overloading on return types.
</p>
<p>
You can add a similar <code>Request2S</code> extension method. It feels like boilerplate code, but enables us to express the <code>Context</code> methods in terms of running stateful computations:
</p>
<p>
<pre><span style="color:blue;">public</span> StatePair<Context, Out1> <span style="color:#74531f;">Request1</span>(In1 <span style="color:#1f377f;">in1</span>)
{
<span style="color:#8f08c4;">return</span> in1.Request1S().Run(State).SelectState(<span style="color:#1f377f;">s</span> => <span style="color:blue;">new</span> Context(s));
}
<span style="color:blue;">public</span> StatePair<Context, Out2> <span style="color:#74531f;">Request2</span>(In2 <span style="color:#1f377f;">in2</span>)
{
<span style="color:#8f08c4;">return</span> in2.Request2S().Run(State).SelectState(<span style="color:#1f377f;">s</span> => <span style="color:blue;">new</span> Context(s));
}</pre>
</p>
<p>
This still isn't entirely satisfactory, since the return types of these <code>Request</code> methods are state pairs, and not <code>IState</code> values. The above <code>Request1S</code> function, however, contains a clue about how to proceed. Notice how it can create a <code>Context</code> object from the underlying <code>State</code>, and convert that <code>Context</code> object back to a <code>State</code> object. That's a generalizable idea.
</p>
<h3 id="6ec1667fe3354e8bb9b5f7b3540b74f9">
Invariant functor <a href="#6ec1667fe3354e8bb9b5f7b3540b74f9" title="permalink">#</a>
</h3>
<p>
While it's possible to map the <code>TState</code> dimension of the state pair, it seems harder to do it on <code><span style="color:#2b91af;">IState</span><<span style="color:#2b91af;">TState</span>, <span style="color:#2b91af;">T</span>></code>. A tuple, after all, is covariant in both dimensions. The State monad, on the other hand, is neither co- nor contravariant in the state dimension. You can deduce this with positional variance analysis (which I've learned from <a href="https://thinkingwithtypes.com/">Thinking with Types</a>). In short, this is because <code>TState</code> appears as both input and output in <code>StatePair<TState, T> <span style="color:#74531f;">Run</span>(TState <span style="color:#1f377f;">state</span>)</code> - it's neither co- nor contravariant, but rather <em>invariant</em>.
</p>
<p>
What little option is left us, then, is to make <code>IState</code> an <a href="/2022/08/01/invariant-functors">invariant functor</a> in the state dimension:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IState<TState1, T> <span style="color:#74531f;">SelectState</span><<span style="color:#2b91af;">TState</span>, <span style="color:#2b91af;">TState1</span>, <span style="color:#2b91af;">T</span>>(
<span style="color:blue;">this</span> IState<TState, T> <span style="color:#1f377f;">state</span>,
Func<TState, TState1> <span style="color:#1f377f;">forward</span>,
Func<TState1, TState> <span style="color:#1f377f;">back</span>)
{
<span style="color:#8f08c4;">return</span>
<span style="color:blue;">from</span> s1 <span style="color:blue;">in</span> Get<TState1>()
<span style="color:blue;">let</span> s = back(s1)
<span style="color:blue;">let</span> p = state.Run(s)
<span style="color:blue;">from</span> _ <span style="color:blue;">in</span> Put(forward(p.State))
<span style="color:blue;">select</span> p.Value;
}</pre>
</p>
<p>
Given an <code>IState<TState, T></code>, the <code>SelectState</code> function enables us to turn it into an <code>IState<TState1, T></code>. This is, however, only possible if you can translate both <code>forward</code> and <code>back</code> between the two representations. When we have two such translations, we can produce a new computation that runs in <code>TState1</code>: First use <code>Get</code> to retrieve a <code>TState1</code> value from the new environment and translate it <code>back</code> to <code>TState</code>, which enables the expression to <code>Run</code> the <code>state</code>. Then translate the resulting <code>p.State</code> <code>forward</code> and <code>Put</code> it. Finally, return the <code>Value</code>.
</p>
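<p>
The steps just described translate almost line by line to Haskell. This sketch assumes <code>Control.Monad.State</code> from the <em>mtl</em> package. The names <code>invmapState</code>, <code>tick</code>, and <code>Counter</code> are hypothetical and only serve as illustration:
</p>
<p>
<pre>import Control.Monad.State (State, get, put, runState)

-- Hypothetical counterpart of the C# SelectState extension method.
invmapState :: (s -> s1) -> (s1 -> s) -> State s a -> State s1 a
invmapState forward back st = do
  s1 <- get                           -- retrieve the new kind of state
  let (a, s) = runState st (back s1)  -- run the original computation
  put (forward s)                     -- translate the resulting state forward
  return a

-- Example computation: return the current count and increment it.
tick :: State Int Int
tick = do
  n <- get
  put (n + 1)
  return n

-- A wrapper that is isomorphic to Int.
newtype Counter = Counter { getCounter :: Int } deriving (Eq, Show)

-- The same computation, now running with Counter as its state.
tickCounter :: State Counter Int
tickCounter = invmapState Counter getCounter tick</pre>
</p>
<p>
Here, <code>runState tickCounter (Counter 41)</code> evaluates to <code>(41, Counter 42)</code>. Since <code>Counter</code> only wraps an <code>Int</code>, the constructor and <code>getCounter</code> supply the two translations.
</p>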
<p>
As <a href="https://reasonablypolymorphic.com/">Sandy Maguire</a> explains:
</p>
<blockquote>
<p>
"... an invariant type <code>T</code> allows you to map from <code>a</code> to <code>b</code> if and only if <code>a</code> and <code>b</code> are isomorphic. [...] an isomorphism between <code>a</code> and <code>b</code> means they're already the same thing to begin with."
</p>
<footer><cite>Sandy Maguire, <a href="https://thinkingwithtypes.com/">Thinking with Types</a></cite></footer>
</blockquote>
<p>
This may seem limiting, but is enough in this case. The <code>Context</code> class is only a wrapper of a <code>State</code> object:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">Context</span>(State <span style="color:#1f377f;">state</span>)
{
State = state;
}
<span style="color:blue;">public</span> State State { <span style="color:blue;">get</span>; }</pre>
</p>
<p>
If you have a <code>State</code> object, you can create a <code>Context</code> object via the <code>Context</code> constructor. On the other hand, if you have a <code>Context</code> object, you can get the wrapped <code>State</code> object by reading the <code>State</code> property.
</p>
<p>
The first improvement this offers is simplification of the <code>Request1</code> extension method:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IState<Context, Out1> <span style="color:#74531f;">Request1</span>(<span style="color:blue;">this</span> In1 <span style="color:#1f377f;">in1</span>)
{
<span style="color:#8f08c4;">return</span> in1.Request1S().SelectState(<span style="color:#1f377f;">s</span> => <span style="color:blue;">new</span> Context(s), <span style="color:#1f377f;">ctx</span> => ctx.State);
}</pre>
</p>
<p>
Recall that <code>Request1S</code> returns an <code>IState<State, Out1></code>. Since a two-way translation between <code>State</code> and <code>Context</code> exists, <code>SelectState</code> can translate <code>IState<State, Out1></code> to <code>IState<Context, Out1></code>.
</p>
<p>
The same applies to the equivalent <code>Request2</code> extension method.
</p>
<p>
This, again, enables us to rewrite the <code>Context</code> methods:
</p>
<p>
<pre><span style="color:blue;">public</span> StatePair<Context, Out1> <span style="color:#74531f;">Request1</span>(In1 <span style="color:#1f377f;">in1</span>)
{
<span style="color:#8f08c4;">return</span> in1.Request1().Run(<span style="color:blue;">this</span>);
}
<span style="color:blue;">public</span> StatePair<Context, Out2> <span style="color:#74531f;">Request2</span>(In2 <span style="color:#1f377f;">in2</span>)
{
<span style="color:#8f08c4;">return</span> in2.Request2().Run(<span style="color:blue;">this</span>);
}</pre>
</p>
<p>
While this may seem like an insignificant change, one result has been gained: This last refactoring pushed the <code>Run</code> call to the right. It's now clear that each expression is a stateful computation, and that the only role that the <code>Request</code> methods play is to <code>Run</code> the computations.
</p>
<p>
This illustrates that the <code>Request</code> methods can be decomposed into two decoupled steps:
<ol>
<li>A stateful computation expression</li>
<li>Running the expression</li>
</ol>
The question now becomes: How useful is the <code>Context</code> wrapper class now?
</p>
<h3 id="4fc39c9d7bd647cb9773344082468051">
Eliminating the Context <a href="#4fc39c9d7bd647cb9773344082468051" title="permalink">#</a>
</h3>
<p>
A reasonable next refactoring might be to remove the <code>context</code> parameter from each of the <code>Handle</code> methods. After all, this parameter is a remnant of the State design pattern. Its original purpose was to enable <code>State</code> implementers to mutate the <code>context</code> by changing its <code>State</code>.
</p>
<p>
After refactoring to immutable functions, the <code>context</code> parameter no longer needs to be there - for that reason. Do we need it for other reasons? Does it carry other information that a <code>State</code> implementer might need?
</p>
<p>
In the form that the code now has, it doesn't. Even if it did, we could consider moving that data to the other input parameter: <code>In1</code>, <code>In2</code>, etcetera.
</p>
<p>
Therefore, it seems sensible to remove the <code>context</code> parameter from the <code>State</code> methods:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">abstract</span> StatePair<State, Out1> <span style="color:#74531f;">Handle1</span>(In1 <span style="color:#1f377f;">in1</span>);
<span style="color:blue;">public</span> <span style="color:blue;">abstract</span> StatePair<State, Out2> <span style="color:#74531f;">Handle2</span>(In2 <span style="color:#1f377f;">in2</span>);</pre>
</p>
<p>
This also means that a function like <code>Request1S</code> becomes simpler:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IState<State, Out1> <span style="color:#74531f;">Request1S</span>(<span style="color:blue;">this</span> In1 <span style="color:#1f377f;">in1</span>)
{
<span style="color:#8f08c4;">return</span>
<span style="color:blue;">from</span> s <span style="color:blue;">in</span> Get<State>()
<span style="color:blue;">let</span> p = s.Handle1(in1)
<span style="color:blue;">from</span> _ <span style="color:blue;">in</span> Put(p.State)
<span style="color:blue;">select</span> p.Value;
}</pre>
</p>
<p>
Since <code>Context</code> and <code>State</code> are isomorphic, you can rewrite all callers of <code>Context</code> to instead use <code>State</code>, like the above example:
</p>
<p>
<pre>IState<State, (Out1 a, Out1 b)> <span style="color:#1f377f;">s</span> =
<span style="color:blue;">from</span> a <span style="color:blue;">in</span> in1.Request1()
<span style="color:blue;">from</span> b <span style="color:blue;">in</span> in1.Request1()
<span style="color:blue;">select</span> (a, b);
<span style="color:blue;">var</span> <span style="color:#1f377f;">t</span> = s.Run(csa);</pre>
</p>
<p>
Do this consistently, and you can eventually delete the <code>Context</code> class.
</p>
<h3 id="c92292ad071c4b468fa223d764d89ab8">
Further possible refactorings <a href="#c92292ad071c4b468fa223d764d89ab8" title="permalink">#</a>
</h3>
<p>
With the <code>Context</code> class gone, you're left with the abstract <code>State</code> class and its implementers:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">abstract</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">State</span>
{
<span style="color:blue;">public</span> <span style="color:blue;">abstract</span> StatePair<State, Out1> <span style="color:#74531f;">Handle1</span>(In1 <span style="color:#1f377f;">in1</span>);
<span style="color:blue;">public</span> <span style="color:blue;">abstract</span> StatePair<State, Out2> <span style="color:#74531f;">Handle2</span>(In2 <span style="color:#1f377f;">in2</span>);
}</pre>
</p>
<p>
One further change worth considering might be to change the abstract base class to an interface.
</p>
<p>
In this article, I've considered the general case where the <code>State</code> class supports an arbitrary number of independent state transitions, symbolised by the methods <code>Handle1</code> and <code>Handle2</code>. With an arbitrary number of such state transitions, you would have additional methods up to <code>HandleN</code> for <em>N</em> independent state transitions.
</p>
<p>
At the other extreme, you may have just a single polymorphic state transition function. My intuition tells me that that's more likely to be the case than one would think at first.
</p>
<h3 id="c53b386a92b04575b675272d2dc85365">
Relationship between pattern and monad <a href="#c53b386a92b04575b675272d2dc85365" title="permalink">#</a>
</h3>
<p>
You can view the State design pattern as a combination of two common practices in object-oriented programming: Mutation and polymorphism.
</p>
<p>
<img src="/content/binary/state-pattern-venn.png" alt="Venn diagram with the two sets mutation and polymorphism. The intersection is labelled state.">
</p>
<p>
The patterns in <em>Design Patterns</em> rely heavily on mutation of object state. Most other 'good' object-oriented code tends to do likewise.
</p>
<p>
Proper object-oriented code also makes good use of polymorphism. Again, refer to <em>Design Patterns</em> or a book like <a href="https://blog.ploeh.dk/ref/refactoring">Refactoring</a> for copious examples.
</p>
<p>
I view the State pattern as the intersection of these two common practices. The problem to solve is this:
</p>
<blockquote>
<p>
"Allow an object to alter its behavior when its internal state changes."
</p>
<footer><cite><a href="/ref/dp">Design Patterns</a></cite></footer>
</blockquote>
<p>
The State pattern achieves that goal by having an inner polymorphic object (<code>State</code>) wrapped by a container object (<code>Context</code>). The <code>State</code> objects can mutate the <code>Context</code>, which enables them to replace themselves with other states.
</p>
<p>
While functional programming also has notions of polymorphism, a pure function can't mutate state. Instead, a pure function must return a new state, leaving the old state unmodified. If there's nothing else to return, you can model such state-changing behaviour as an <a href="https://en.wikipedia.org/wiki/Endomorphism">endomorphism</a>. The article <a href="/2021/05/31/from-state-tennis-to-endomorphism">From State tennis to endomorphism</a> gives a quite literal example of that.
</p>
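<p>
As a small, hedged illustration of state change as an endomorphism, consider this Haskell sketch. The <code>Switch</code> type and the <code>toggle</code> function are hypothetical examples, while <code>Endo</code> comes from <code>Data.Monoid</code> in the <em>base</em> library:
</p>
<p>
<pre>import Data.Monoid (Endo (..))

-- Hypothetical two-state example.
data Switch = Off | On deriving (Eq, Show)

-- A state change with nothing else to return is just a function s -> s.
toggle :: Switch -> Switch
toggle Off = On
toggle On  = Off

-- Endo makes such functions a monoid, so transitions compose with <>.
toggleTwice :: Endo Switch
toggleTwice = Endo toggle <> Endo toggle</pre>
</p>
<p>
With these definitions, <code>appEndo (Endo toggle) Off</code> evaluates to <code>On</code>, and <code>appEndo toggleTwice Off</code> evaluates to <code>Off</code>. Neither expression modifies the original value.
</p>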
<p>
Sometimes, however, an object-oriented method does more than one thing: It both mutates state and returns a value. (This, by the way, violates <a href="https://en.wikipedia.org/wiki/Command%E2%80%93query_separation">the Command Query Separation principle</a>.) The State monad is the functional way of doing that: Return both the result and the new state.
</p>
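<p>
A classic example of an operation that both changes state and returns a value is popping a stack. The following Haskell sketch assumes <code>Control.Monad.State</code> from the <em>mtl</em> package. The <code>pop</code> function is a hypothetical illustration rather than code from any of the linked articles:
</p>
<p>
<pre>import Control.Monad.State (State, runState, state)

-- Hypothetical example: pop both changes the stack and returns the top item.
-- An empty stack yields Nothing and stays empty.
pop :: State [Int] (Maybe Int)
pop = state step
  where
    step []       = (Nothing, [])
    step (x : xs) = (Just x, xs)</pre>
</p>
<p>
Here, <code>runState pop [1, 2, 3]</code> evaluates to <code>(Just 1, [2,3])</code>: both the result and the new state are returned, and the input list is left unmodified.
</p>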
<p>
Essentially, you replace mutation with the State monad.
</p>
<p>
<img src="/content/binary/state-monad-venn.png" alt="Venn diagram with the two sets state monad and polymorphism. The intersection is labelled state.">
</p>
<p>
From a functional perspective, then, we can view the State pattern as the intersection of polymorphism and the State monad.
</p>
<h3 id="d31601e4146d4f48ae30228f697de0a0">
Examples <a href="#d31601e4146d4f48ae30228f697de0a0" title="permalink">#</a>
</h3>
<p>
This article is both long and abstract. Some examples might be helpful, so I'll give a few in separate articles:
<ul>
<li><a href="/2022/09/26/refactoring-the-tcp-state-pattern-example-to-pure-functions">Refactoring the TCP State pattern example to pure functions</a></li>
<li><a href="/2022/10/10/refactoring-a-saga-from-the-state-pattern-to-the-state-monad">Refactoring a saga from the State pattern to the State monad</a></li>
</ul>
The first one uses the example from <em>Design Patterns</em>. That example is, unfortunately, not that illuminating, since none of the book's methods return data. Thus, the endomorphism-based refactoring is enough, and you don't need the State monad. Therefore, another example is warranted.
</p>
<h3 id="1b5dead7f5c84f3896d639837314f17d">
Conclusion <a href="#1b5dead7f5c84f3896d639837314f17d" title="permalink">#</a>
</h3>
<p>
You can view the State design pattern as the intersection of polymorphism and mutation. Both are object-oriented staples. The pattern uses polymorphism to model state, and mutation to change from one polymorphic state to another.
</p>
<p>
In functional programming, pure functions can't mutate state. You can often design around that problem, but if all else fails, the State monad offers a general-purpose alternative that both returns a value and produces a new state. Thus, you can view the functional equivalent of the State pattern as the intersection of polymorphism and the State monad.
</p>
<p>
<strong>Next:</strong> <a href="/2022/09/26/refactoring-the-tcp-state-pattern-example-to-pure-functions">Refactoring the TCP State pattern example to pure functions</a>.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.Natural transformations as invariant functorshttps://blog.ploeh.dk/2022/08/29/natural-transformations-as-invariant-functors2022-08-29T06:12:00+00:00Mark Seemann
<div id="post">
<p>
<em>An article (also) for object-oriented programmers.</em>
</p>
<ins datetime="2022-09-04T06:40Z">
<p>
<strong>Update 2022-09-04:</strong> <em>This article is most likely partially incorrect. What it describes works, but may not be a natural transformation. See <a href="#9336107b26fd48a8971e617d8ebd5159">the below comment</a> for more details.</em>
</p>
</ins>
<p>
This article is part of <a href="/2022/08/01/invariant-functors">a series of articles about invariant functors</a>. An invariant functor is a <a href="/2018/03/22/functors">functor</a> that is neither covariant nor contravariant. See the series introduction for more details. The <a href="/2022/08/08/endomorphism-as-an-invariant-functor">previous article</a> described how you can view an <a href="https://en.wikipedia.org/wiki/Endomorphism">endomorphism</a> as an invariant functor. This article generalises that result.
</p>
<h3 id="f3a2cab08b004fe2bfa8e4b307317d88">
Endomorphism as a natural transformation <a href="#f3a2cab08b004fe2bfa8e4b307317d88" title="permalink">#</a>
</h3>
<p>
An endomorphism is a function whose <a href="https://en.wikipedia.org/wiki/Domain_of_a_function">domain</a> and <a href="https://en.wikipedia.org/wiki/Codomain">codomain</a> are the same. In C# you'd denote the type as <code>Func<T, T></code>, in <a href="https://fsharp.org/">F#</a> as <code>'a -> 'a</code>, and in <a href="https://www.haskell.org/">Haskell</a> as <code>a -> a</code>. <code>T</code>, <code>'a</code>, and <code>a</code> all symbolise generic types - the notation is just different, depending on the language.
</p>
<p>
A 'naked' value is isomorphic to <a href="/2018/09/03/the-identity-functor">the Identity functor</a>. You can wrap a value of the type <code>a</code> in <code>Identity a</code>, and if you have an <code>Identity a</code>, you can extract the <code>a</code> value.
</p>
<p>
An endomorphism is thus isomorphic to a function from Identity to Identity. In C#, you might denote that as <code>Func<Identity<T>, Identity<T>></code>, and in Haskell as <code>Identity a -> Identity a</code>.
</p>
<p>
In fact, you can lift any function to an Identity-valued function:
</p>
<p>
<pre>Prelude Data.Functor.Identity> :t \f -> Identity . f . runIdentity
\f -> Identity . f . runIdentity
:: (b -> a) -> Identity b -> Identity a</pre>
</p>
<p>
While this is a general result that allows <code>a</code> and <code>b</code> to differ, when <code>a ~ b</code> this describes an endomorphism.
</p>
<p>
Since Identity is a functor, a function <code>Identity a -> Identity a</code> is a <a href="/2022/07/18/natural-transformations">natural transformation</a>.
</p>
<p>
The identity function (<code>id</code> in F# and Haskell; <code>x => x</code> in C#) is the only possible entirely general endomorphism. You can use the <a href="https://hackage.haskell.org/package/natural-transformation">natural-transformation</a> package to make it explicit that this is a natural transformation:
</p>
<p>
<pre><span style="color:#2b91af;">idNT</span> :: <span style="color:blue;">Identity</span> :~> <span style="color:blue;">Identity</span>
idNT = NT $ Identity . <span style="color:blue;">id</span> . runIdentity</pre>
</p>
<p>
The point, so far, is that you can view an endomorphism as a natural transformation.
</p>
<p>
Since an endomorphism forms an invariant functor, this suggests a promising line of inquiry.
</p>
<h3 id="88842564359c40978adb687e27cc460d">
Natural transformations as invariant functors <a href="#88842564359c40978adb687e27cc460d" title="permalink">#</a>
</h3>
<p>
Are <em>all</em> natural transformations invariant functors?
</p>
<p>
Yes, they are. In Haskell, you can implement it like this:
</p>
<p>
<pre><span style="color:blue;">instance</span> (<span style="color:blue;">Functor</span> f, <span style="color:blue;">Functor</span> g) <span style="color:blue;">=></span> <span style="color:blue;">Invariant</span> (<span style="color:blue;">NT</span> f g) <span style="color:blue;">where</span>
invmap f g (NT h) = NT $ <span style="color:blue;">fmap</span> f . h . <span style="color:blue;">fmap</span> g</pre>
</p>
<p>
Here, I chose to define <code>NT</code> from scratch, rather than relying on the <em>natural-transformation</em> package.
</p>
<p>
<pre><span style="color:blue;">newtype</span> NT f g a = NT { unNT :: f a -> g a }</pre>
</p>
<p>
Notice how the implementation (<code><span style="color:blue;">fmap</span> f . h . <span style="color:blue;">fmap</span> g</code>) looks like a generalisation of the endomorphism implementation of <code>invmap</code> (<code>f . h . g</code>). Instead of pre-composing with <code>g</code>, the generalisation pre-composes with <code>fmap g</code>, and instead of post-composing with <code>f</code>, it post-composes with <code>fmap f</code>.
</p>
<p>
Using the same kind of diagram as in the previous article, this composition now looks like this:
</p>
<p>
<img src="/content/binary/invariant-natural-transformation-map-diagram.png" alt="Arrow diagram showing the mapping from a natural transformation in a to a natural transformation in b.">
</p>
<p>
I've used thicker arrows to indicate that each one potentially involves 'more work'. Each is a mapping from a functor to a functor. For the <a href="/2022/04/19/the-list-monad">List functor</a>, for example, the arrow implies zero to many values being mapped. Thus, 'more data' moves 'through' each arrow, and for that reason I thought it made sense to depict them as being thicker. This 'more data' view is not always correct. For example, for <a href="/2018/03/26/the-maybe-functor">the Maybe functor</a>, the amount of data transported through each arrow is zero or one, which rather suggests a thinner arrow. For something like <a href="/2021/07/19/the-state-functor">the State functor</a> or <a href="/2021/08/30/the-reader-functor">the Reader functor</a>, there's really no <em>data</em> in the strictest sense moving through the arrows, but rather functions (which are also, however, a kind of data). Thus, don't take this metaphor of the thicker arrows literally. I did, however, wish to highlight that there's something 'more' going on.
</p>
<p>
The diagram shows a natural transformation <code>h</code> from some functor <code>F</code> to another functor <code>G</code>. It transports objects of the type <code>a</code>. If <code>a</code> and <code>b</code> are isomorphic, you can map that natural transformation to one that transports objects of the type <code>b</code>.
</p>
<p>
Compared to endomorphisms, where you need to, say, map <code>b</code> to <code>a</code>, you now need to map <code>F b</code> to <code>F a</code>. If <code>g</code> maps <code>b</code> to <code>a</code>, then <code>fmap g</code> maps <code>F b</code> to <code>F a</code>. The same line of argument applies to <code>fmap f</code>.
</p>
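<p>
As a minimal, self-contained sketch of the instance in use, the following code declares a local <code>Invariant</code> class (an assumption that mirrors the class from the <em>invariant</em> package, so that only <em>base</em> is required) and a hypothetical example transformation, <code>maybeToListNT</code>. It maps that transformation from <code>Int</code> to the isomorphic <code>Sum Int</code>.
</p>
<p>
<pre>import Data.Maybe (maybeToList)
import Data.Monoid (Sum (..))

-- Assumption: a local stand-in for the Invariant class, so that the
-- sketch has no dependencies beyond base.
class Invariant f where
  invmap :: (a -> b) -> (b -> a) -> f a -> f b

newtype NT f g a = NT { unNT :: f a -> g a }

instance (Functor f, Functor g) => Invariant (NT f g) where
  invmap f g (NT h) = NT $ fmap f . h . fmap g

-- Hypothetical example: Maybe to list, fixed at Int.
maybeToListNT :: NT Maybe [] Int
maybeToListNT = NT maybeToList

-- Int and Sum Int are isomorphic via Sum and getSum.
sumNT :: NT Maybe [] (Sum Int)
sumNT = invmap Sum getSum maybeToListNT</pre>
</p>
<p>
Here, <code>fmap getSum</code> works on <code>Maybe</code> while <code>fmap Sum</code> works on lists. With these definitions, <code>unNT sumNT (Just (Sum 42))</code> should produce a single-element list holding <code>Sum 42</code>.
</p>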
<p>
In C# you can implement the same behaviour as follows. Assume that you have a natural transformation <code>H</code> from the functor <code>F</code> to the functor <code>G</code>:
</p>
<p>
<pre><span style="color:blue;">public</span> Func<F<A>, G<A>> H { <span style="color:blue;">get</span>; }</pre>
</p>
<p>
You can now implement a non-standard <code>Select</code> overload (as described in the introductory article) that maps a natural transformation <code>FToG<A></code> to a natural transformation <code>FToG<B></code>:
</p>
<p>
<pre><span style="color:blue;">public</span> FToG<B> <span style="color:#74531f;">Select</span><<span style="color:#2b91af;">B</span>>(Func<A, B> <span style="color:#1f377f;">aToB</span>, Func<B, A> <span style="color:#1f377f;">bToA</span>)
{
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> FToG<B>(<span style="color:#1f377f;">fb</span> => H(fb.Select(bToA)).Select(aToB));
}</pre>
</p>
<p>
The implementation looks more imperative than in Haskell, but the idea is the same. First it uses <code>Select</code> on <code>F</code> in order to translate <code>fb</code> (of the type <code>F<B></code>) to an <code>F<A></code>. It then uses <code>H</code> to transform the <code>F<A></code> to a <code>G<A></code>. Finally, now that it has a <code>G<A></code>, it can use <code>Select</code> on <em>that</em> functor to map to a <code>G<B></code>.
</p>
<p>
Note that there are two different functors (<code>F</code> and <code>G</code>) in play, so the two <code>Select</code> methods are different. This is also true in the Haskell code. <code>fmap g</code> need not be the same as <code>fmap f</code>.
</p>
<h3 id="70c39c1999864a82a73a4ce22f252110">
Identity law <a href="#70c39c1999864a82a73a4ce22f252110" title="permalink">#</a>
</h3>
<p>
As in the previous article, I'll set out to <em>prove</em> the two laws for invariant functors, starting with the identity law. Again, I'll use equational reasoning with <a href="https://bartoszmilewski.com/2015/01/20/functors/">the notation that Bartosz Milewski uses</a>. Here's the proof that the <code>invmap</code> instance obeys the identity law:
</p>
<p>
<pre> invmap id id (NT h)
= { definition of invmap }
NT $ fmap id . h . fmap id
= { first functor law }
NT $ id . h . id
= { eta expansion }
NT $ (\x -> (id . h . id) x)
= { definition of (.) }
NT $ (\x -> id(h(id(x))))
= { definition of id }
NT $ (\x -> h(x))
= { eta reduction }
NT h
= { definition of id }
id (NT h)</pre>
</p>
<p>
I'll leave it here without further comment. The Haskell type system is so expressive and abstract that it makes little sense to try to translate these findings to C# or F# in the abstract. Instead, you'll see some more concrete examples later.
</p>
<h3 id="d54ff79951db44559583a200d20cf6d9">
Composition law <a href="#d54ff79951db44559583a200d20cf6d9" title="permalink">#</a>
</h3>
<p>
As with the identity law, I'll offer a proof for the composition law for the Haskell instance:
</p>
<p>
<pre> invmap f2 f2' $ invmap f1 f1' (NT h)
= { definition of invmap }
invmap f2 f2' $ NT $ fmap f1 . h . fmap f1'
= { definition of ($) }
invmap f2 f2' (NT (fmap f1 . h . fmap f1'))
= { definition of invmap }
NT $ fmap f2 . (fmap f1 . h . fmap f1') . fmap f2'
= { associativity of composition (.) }
NT $ (fmap f2 . fmap f1) . h . (fmap f1' . fmap f2')
= { second functor law }
NT $ fmap (f2 . f1) . h . fmap (f1' . f2')
= { definition of invmap }
invmap (f2 . f1) (f1' . f2') (NT h)</pre>
</p>
<p>
Unless I've made a mistake, these two proofs should demonstrate that all natural transformations can be turned into an invariant functor - in Haskell, at least, but I'll conjecture that that result carries over to other languages like F# and C# as long as one stays within the confines of <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a>.
</p>
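<p>
The proofs are general, but it can still be reassuring to spot-check both laws on concrete values. The following sketch does that for a single hypothetical transformation, <code>safeHeadNT</code>. The locally declared <code>Invariant</code> class and the names <code>identityHolds</code> and <code>compositionHolds</code> are assumptions made for the example. A check on a few inputs is, of course, no substitute for the proofs.
</p>
<p>
<pre>import Data.Maybe (listToMaybe)
import Data.Monoid (Product (..), Sum (..))

-- Assumption: local stand-in for the Invariant class.
class Invariant f where
  invmap :: (a -> b) -> (b -> a) -> f a -> f b

newtype NT f g a = NT { unNT :: f a -> g a }

instance (Functor f, Functor g) => Invariant (NT f g) where
  invmap f g (NT h) = NT $ fmap f . h . fmap g

-- Hypothetical example transformation: safe head, fixed at Int.
safeHeadNT :: NT [] Maybe Int
safeHeadNT = NT listToMaybe

-- Identity law, compared on one input at a time.
identityHolds :: [Int] -> Bool
identityHolds xs =
  unNT (invmap id id safeHeadNT) xs == unNT safeHeadNT xs

-- Composition law, compared on one input at a time.
compositionHolds :: [Product Int] -> Bool
compositionHolds xs =
  unNT (invmap f2 f2' $ invmap f1 f1' safeHeadNT) xs
    == unNT (invmap (f2 . f1) (f1' . f2') safeHeadNT) xs
  where
    f1 = Sum
    f1' = getSum
    f2 = Product . getSum
    f2' = Sum . getProduct</pre>
</p>
<p>
Both functions should return <code>True</code> for any input list, including the empty list.
</p>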
<h3 id="2afc7c35819c4d499f531a7229102404">
The State functor as a natural transformation <a href="#2afc7c35819c4d499f531a7229102404" title="permalink">#</a>
</h3>
<p>
I'll be honest and admit that my motivation for embarking on this exegesis was because I'd come to the realisation that you can think about <a href="/2021/07/19/the-state-functor">the State functor</a> as a natural transformation. Recall that <code>State</code> is usually defined like this:
</p>
<p>
<pre><span style="color:blue;">newtype</span> State s a = State { runState :: s -> (a, s) }</pre>
</p>
<p>
You can easily establish that this definition of <code>State</code> is isomorphic with a natural transformation from <a href="/2018/09/03/the-identity-functor">the Identity functor</a> to <a href="/2018/12/31/tuple-bifunctor">the tuple functor</a>:
</p>
<p>
<pre><span style="color:#2b91af;">stateToNT</span> :: <span style="color:blue;">State</span> s a <span style="color:blue;">-></span> <span style="color:blue;">NT</span> <span style="color:blue;">Identity</span> ((,) a) s
stateToNT (State h) = NT $ h . runIdentity
<span style="color:#2b91af;">ntToState</span> :: <span style="color:blue;">NT</span> <span style="color:blue;">Identity</span> ((,) a) s <span style="color:blue;">-></span> <span style="color:blue;">State</span> s a
ntToState (NT h) = State $ h . Identity</pre>
</p>
<p>
Notice that this is a natural transformation in <code>s</code> - not in <code>a</code>.
</p>
<p>
Since I've already established that natural transformations form invariant functors, this also applies to the State monad.
</p>
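<p>
A small round trip may make the isomorphism more tangible. In this sketch, <code>tick</code> and <code>roundTripped</code> are hypothetical names introduced only for illustration; the two conversion functions are the ones shown above, repeated so that the code compiles on its own.
</p>
<p>
<pre>import Data.Functor.Identity (Identity (..))

newtype State s a = State { runState :: s -> (a, s) }

newtype NT f g a = NT { unNT :: f a -> g a }

stateToNT :: State s a -> NT Identity ((,) a) s
stateToNT (State h) = NT $ h . runIdentity

ntToState :: NT Identity ((,) a) s -> State s a
ntToState (NT h) = State $ h . Identity

-- Hypothetical example: report the current count, then increment it.
tick :: State Int String
tick = State $ \n -> (show n, n + 1)

-- Converting to NT and back should leave the behaviour unchanged.
roundTripped :: State Int String
roundTripped = ntToState (stateToNT tick)</pre>
</p>
<p>
Since <code>runIdentity . Identity</code> is the identity function, <code>runState roundTripped 41</code> should give the same result as <code>runState tick 41</code>, namely <code>("41", 42)</code>.
</p>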
<h3 id="6c0025936a9949eca6191989f4d1b8c6">
State mapping <a href="#6c0025936a9949eca6191989f4d1b8c6" title="permalink">#</a>
</h3>
<p>
My point with all of this isn't really to insist that anyone makes actual use of all this machinery, but rather that this line of reasoning helps to <em>identify a capability</em>. We now know that it's possible to translate a <code>State s a</code> value to a <code>State t a</code> value if <code>s</code> is isomorphic to <code>t</code>.
</p>
<p>
As an example, imagine that you have some State-valued function that attempts to find the maximum value based on various criteria. Such a <code>pickMax</code> function may have the type <code><span style="color:blue;">State</span> (<span style="color:blue;">Max</span> <span style="color:#2b91af;">Integer</span>) <span style="color:#2b91af;">String</span></code> where the state type (<code><span style="color:blue;">Max</span> <span style="color:#2b91af;">Integer</span></code>) is used to keep track of the maximum value found while examining candidates.
</p>
<p>
You could conceivably turn such a function around to instead look for the minimum by mapping the state to a <code>Min</code> value instead:
</p>
<p>
<pre><span style="color:#2b91af;">pickMin</span> :: <span style="color:blue;">State</span> (<span style="color:blue;">Min</span> <span style="color:#2b91af;">Integer</span>) <span style="color:#2b91af;">String</span>
pickMin = ntToState $ invmap (Min . getMax) (Max . getMin) $ stateToNT pickMax</pre>
</p>
<p>
You can use <code>getMax</code> to extract the underlying <code>Integer</code> from the <code><span style="color:blue;">Max</span> <span style="color:#2b91af;">Integer</span></code> and then <code>Min</code> to turn it into a <code><span style="color:blue;">Min</span> <span style="color:#2b91af;">Integer</span></code> value, and vice versa. <code><span style="color:blue;">Max</span> <span style="color:#2b91af;">Integer</span></code> and <code><span style="color:blue;">Min</span> <span style="color:#2b91af;">Integer</span></code> are isomorphic.
</p>
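<p>
To see the mapping run end to end, here's a self-contained sketch. The body of <code>pickMax</code> is a hypothetical stand-in, since none is shown here, and <code>Invariant</code> is declared locally as an assumption instead of being imported from a package.
</p>
<p>
<pre>import Data.Functor.Identity (Identity (..))
import Data.Semigroup (Max (..), Min (..))

-- Assumption: local stand-in for the Invariant class.
class Invariant f where
  invmap :: (a -> b) -> (b -> a) -> f a -> f b

newtype NT f g a = NT { unNT :: f a -> g a }

instance (Functor f, Functor g) => Invariant (NT f g) where
  invmap f g (NT h) = NT $ fmap f . h . fmap g

newtype State s a = State { runState :: s -> (a, s) }

stateToNT :: State s a -> NT Identity ((,) a) s
stateToNT (State h) = NT $ h . runIdentity

ntToState :: NT Identity ((,) a) s -> State s a
ntToState (NT h) = State $ h . Identity

-- Hypothetical stand-in for pickMax: it folds one fixed candidate (10)
-- into the running maximum and reports the result as a label.
pickMax :: State (Max Integer) String
pickMax = State $ \m ->
  let m' = max m (Max 10)
   in ("max so far: " ++ show (getMax m'), m')

pickMin :: State (Min Integer) String
pickMin = ntToState $ invmap (Min . getMax) (Max . getMin) $ stateToNT pickMax</pre>
</p>
<p>
With this stand-in, <code>runState pickMin (Min 3)</code> should return the label paired with <code>Min 10</code>. Note that <code>invmap</code> only re-wraps the state on the way in and on the way out; whatever comparison the hypothetical <code>pickMax</code> performs on the wrapped integer stays the same.
</p>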
<p>
In C#, you can implement a similar method. The code shown here extends the code shown in <a href="/2021/07/19/the-state-functor">The State functor</a>. I chose to call the method <code>SelectState</code> so as to not make things too confusing. The State functor already comes with a <code>Select</code> method that maps <code>T</code> to <code>T1</code> - that's the 'normal', covariant functor implementation. The new method is the invariant functor implementation that maps the state <code>S</code> to <code>S1</code>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IState<S1, T> <span style="color:#74531f;">SelectState</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">S</span>, <span style="color:#2b91af;">S1</span>>(
<span style="color:blue;">this</span> IState<S, T> <span style="color:#1f377f;">state</span>,
Func<S, S1> <span style="color:#1f377f;">sToS1</span>,
Func<S1, S> <span style="color:#1f377f;">s1ToS</span>)
{
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> InvariantStateMapper<T, S, S1>(state, sToS1, s1ToS);
}
<span style="color:blue;">private</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">InvariantStateMapper</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">S</span>, <span style="color:#2b91af;">S1</span>> : IState<S1, T>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IState<S, T> state;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> Func<S, S1> sToS1;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> Func<S1, S> s1ToS;
<span style="color:blue;">public</span> <span style="color:#2b91af;">InvariantStateMapper</span>(
IState<S, T> <span style="color:#1f377f;">state</span>,
Func<S, S1> <span style="color:#1f377f;">sToS1</span>,
Func<S1, S> <span style="color:#1f377f;">s1ToS</span>)
{
<span style="color:blue;">this</span>.state = state;
<span style="color:blue;">this</span>.sToS1 = sToS1;
<span style="color:blue;">this</span>.s1ToS = s1ToS;
}
<span style="color:blue;">public</span> Tuple<T, S1> <span style="color:#74531f;">Run</span>(S1 <span style="color:#1f377f;">s1</span>)
{
<span style="color:#8f08c4;">return</span> state.Run(s1ToS(s1)).Select(sToS1);
}
}</pre>
</p>
<p>
As usual when working in C# with interfaces instead of higher-order functions, <a href="/2019/12/16/zone-of-ceremony">there's some ceremony to be expected</a>. The only interesting line of code is the <code>Run</code> implementation.
</p>
<p>
It starts by calling <code>s1ToS</code> in order to translate the <code>s1</code> parameter into an <code>S</code> value. This enables it to call <code>Run</code> on <code>state</code>. The result is a tuple with the type <code>Tuple<T, S></code>. It's necessary to translate the <code>S</code> to <code>S1</code> with <code>sToS1</code>. You could do that by extracting the value from the tuple, mapping it, and returning a new tuple. Since a tuple gives rise to a functor (<a href="/2018/12/31/tuple-bifunctor">two, actually</a>) I instead used the <code>Select</code> method I'd already defined on it.
</p>
<p>
Notice how similar the implementation is to the implementation of <a href="/2022/08/08/endomorphism-as-an-invariant-functor">the endomorphism invariant functor</a>. The only difference is that when translating back from <code>S</code> to <code>S1</code>, this happens inside a <code>Select</code> mapping. This is as predicted by the general implementation of invariant functors for natural transformations.
</p>
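<p>
For comparison, here's how the same mapping might look in Haskell when written directly against <code>State</code>, without a detour via <code>NT</code>. The names <code>selectState</code>, <code>tick</code>, and <code>tickSum</code> are hypothetical, chosen to echo the C# method.
</p>
<p>
<pre>import Data.Monoid (Sum (..))

newtype State s a = State { runState :: s -> (a, s) }

-- Pre-compose with tToS, run the original computation, and then map the
-- resulting state back with fmap, which targets the tuple's second element.
selectState :: (s -> t) -> (t -> s) -> State s a -> State t a
selectState sToT tToS (State h) = State $ fmap sToT . h . tToS

-- Hypothetical example: report the current count, then increment it.
tick :: State Int String
tick = State $ \n -> (show n, n + 1)

-- The same computation, with the state wrapped in Sum.
tickSum :: State (Sum Int) String
tickSum = selectState Sum getSum tick</pre>
</p>
<p>
The shape is the same as in the C# <code>Run</code> method: translate the incoming state, run, and map the state in the resulting tuple. <code>runState tickSum (Sum 41)</code> should return <code>"41"</code> paired with <code>Sum 42</code>.
</p>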
<p>
In a future article, you'll see an example of <code>SelectState</code> in action.
</p>
<h3 id="5a148244488b4391be257181108072b3">
Other natural transformations <a href="#5a148244488b4391be257181108072b3" title="permalink">#</a>
</h3>
<p>
As the <a href="/2022/07/18/natural-transformations">natural transformations</a> article outlines, there are infinitely many natural transformations. Each one gives rise to an invariant functor.
</p>
<p>
It might be a good exercise to try to implement a few of them as invariant functors. If you want to do it in C#, you could, for example, start with the <em>safe head</em> natural transformation.
</p>
<p>
If you want to stick to interfaces, you could define one like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">ISafeHead</span><<span style="color:#2b91af;">T</span>>
{
Maybe<T> <span style="color:#74531f;">TryFirst</span>(IEnumerable<T> <span style="color:#1f377f;">ts</span>);
}</pre>
</p>
<p>
The exercise is now to define and implement a method like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> ISafeHead<T1> <span style="color:#74531f;">Select</span><<span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">T1</span>>(
<span style="color:blue;">this</span> ISafeHead<T> <span style="color:#1f377f;">source</span>,
Func<T, T1> <span style="color:#1f377f;">tToT1</span>,
Func<T1, T> <span style="color:#1f377f;">t1ToT</span>)
{
<span style="color:green;">// Implementation goes here...</span>
}</pre>
</p>
<p>
The implementation, once you get the hang of it, is entirely automatable. After all, in Haskell it's possible to do it once and for all, as shown above.
</p>
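<p>
If you'd like a reference point before attempting the C# version, here's a sketch of the same exercise in Haskell, specialised to safe head and written without any type class. The type alias <code>SafeHead</code> and the function <code>selectSafeHead</code> are hypothetical names that mirror <code>ISafeHead<T></code> and <code>Select</code>.
</p>
<p>
<pre>import Data.Maybe (listToMaybe)
import Data.Monoid (Any (..))

-- A safe-head function fixed at one element type, like ISafeHead.
type SafeHead a = [a] -> Maybe a

-- Map the input list with bToA, take the head, and map the result with aToB.
selectSafeHead :: (a -> b) -> (b -> a) -> SafeHead a -> SafeHead b
selectSafeHead aToB bToA h = fmap aToB . h . fmap bToA

safeHeadBool :: SafeHead Bool
safeHeadBool = listToMaybe

-- Bool and Any are isomorphic via Any and getAny.
safeHeadAny :: SafeHead Any
safeHeadAny = selectSafeHead Any getAny safeHeadBool</pre>
</p>
<p>
The first <code>fmap</code> maps over a list and the second over <code>Maybe</code>, which corresponds to calling <code>Select</code> on <code>IEnumerable<T1></code> and on <code>Maybe<T></code>, respectively.
</p>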
<h3 id="2c86c2f77e0a404da80858d855898843">
Conclusion <a href="#2c86c2f77e0a404da80858d855898843" title="permalink">#</a>
</h3>
<p>
A natural transformation forms an invariant functor. This may not be the most exciting result ever, because invariant functors are limited in use. They only work when translating between types that are already isomorphic. Still, I did <a href="/2022/09/05/the-state-pattern-and-the-state-monad">find a use for this result</a> when I was working with the relationship between the State design pattern and the <a href="/2022/06/20/the-state-monad">State monad</a>.
</p>
<p>
<strong>Next:</strong> <a href="/2022/12/26/functors-as-invariant-functors">Functors as invariant functors</a>.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="9336107b26fd48a8971e617d8ebd5159">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#9336107b26fd48a8971e617d8ebd5159">#</a></div>
<div class="comment-content">
<p>
Due to feedback that I've received, I have to face evidence that this article may be partially incorrect. While I've added that proviso at the top of the article, I've decided to use a comment to expand on the issue.
</p>
<p>
On Twitter, the user <a href="https://twitter.com/Savlambda">@Savlambda</a> (<em>borar</em>) argued that my <code>newtype</code> isn't a natural transformation:
</p>
<blockquote>
<p>
"The newtype 'NT' in the article is not a natural transformation though. Quantification over 'a' is at the "wrong place": it is not allowed for a client module to instantiate the container element type of a natural transformation."
</p>
<footer><cite><a href="https://twitter.com/Savlambda/status/1564175654845030400">@Savlambda</a></cite></footer>
</blockquote>
<p>
While I engaged with the tweet, I have to admit that it took me a while to understand the core of the criticism. Of course I'm not happy about being wrong, but initially I genuinely didn't understand what the problem was. On the other hand, it's not the first time @Savlambda has provided valuable insights, so I knew it'd behove me to pay attention.
</p>
<p>
After a few tweets back and forth, @Savlambda finally supplied a counter-argument that I understood.
</p>
<blockquote>
<p>
"This is not being overly pedantic. Here is one practical implication:"
</p>
<footer><cite><a href="https://twitter.com/Savlambda/status/1564970422981890048">@Savlambda</a></cite></footer>
</blockquote>
<p>
The practical implication shown in the tweet is a screen shot (in order to get around Twitter's character limitation), but I'll reproduce it as code here in order to <a href="https://meta.stackoverflow.com/q/285551/126014">not show images of code</a>.
</p>
<p>
<pre><span style="color:blue;">type</span> <span style="color:#2b91af;">(~>)</span> f g = forall a. f a -> g a
<span style="color:green;">-- Use the natural transformation twice, for different types
</span><span style="color:#2b91af;">convertLists</span> :: ([] ~> g) <span style="color:blue;">-></span> (g <span style="color:#2b91af;">Int</span>, g <span style="color:#2b91af;">Bool</span>)
convertLists nt = (nt [1,2], nt [True])
<span style="color:blue;">newtype</span> NT f g a = NT (f a -> g a)
<span style="color:green;">-- Does not type check, does not work; not a natural transformation
</span><span style="color:#2b91af;">convertLists2</span> :: <span style="color:blue;">NT</span> [] g a <span style="color:blue;">-></span> (g <span style="color:#2b91af;">Int</span>, g <span style="color:#2b91af;">Bool</span>)
convertLists2 (NT f) = (f [1,2], f [True])</pre>
</p>
<p>
I've moved the code comments to prevent horizontal scrolling, but otherwise tried to stay faithful to @Savlambda's screen shot.
</p>
<p>
This was the example that finally hit the nail on the head for me. A natural transformation is a mapping from one functor (<code>f</code>) to another functor (<code>g</code>). I knew that already, but hadn't realised the implications. In Haskell (and other languages with <a href="https://en.wikipedia.org/wiki/Parametric_polymorphism">parametric polymorphism</a>) a <code>Functor</code> is defined for all <code>a</code>.
</p>
<p>
A natural transformation is a higher level of abstraction, mapping one functor to another. That mapping must be defined for all <code>a</code>, and it must be <em>reusable</em>. The second example provided by @Savlambda demonstrates that the function wrapped by <code>NT</code> isn't reusable for different contained types.
</p>
<p>
If you try to compile that example, GHC emits this compiler error:
</p>
<p>
<pre>* Couldn't match type `a' with `Int'
`a' is a rigid type variable bound by
the type signature for:
convertLists2 :: forall (g :: * -> *) a.
NT [] g a -> (g Int, g Bool)
Expected type: g Int
Actual type: g a
* In the expression: f [1, 2]
In the expression: (f [1, 2], f [True])
In an equation for `convertLists2':
convertLists2 (NT f) = (f [1, 2], f [True])</pre>
</p>
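<p>
One way to relate the two definitions: a polymorphic function can always be specialised to an <code>NT</code> value at any single element type, but not the other way around. The following sketch illustrates the first direction. The function <code>toNT</code> and the values <code>intHead</code> and <code>boolHead</code> are hypothetical names; the code requires the <code>RankNTypes</code> and <code>TypeOperators</code> extensions.
</p>
<p>
<pre>{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE TypeOperators #-}

import Data.Maybe (listToMaybe)

type (~>) f g = forall a. f a -> g a

newtype NT f g a = NT { unNT :: f a -> g a }

-- A polymorphic transformation can be fixed at one element type...
toNT :: (f ~> g) -> NT f g a
toNT h = NT h

-- ...and the same polymorphic function can be fixed more than once.
intHead :: NT [] Maybe Int
intHead = toNT listToMaybe

boolHead :: NT [] Maybe Bool
boolHead = toNT listToMaybe</pre>
</p>
<p>
Each <code>NT</code> value is committed to one choice of <code>a</code>, which is why a single such value can't serve both <code>Int</code> and <code>Bool</code>.
</p>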
<p>
Even though it's never fun to be proven wrong, I want to thank @Savlambda for educating me. One reason I write blog posts like this one is that writing is a way to learn. By writing about topics like these, I educate myself. Occasionally, it turns out that I make a mistake, and <a href="/2018/12/03/set-is-not-a-functor">this isn't the first time that's happened</a>. I also wish to apologise if this article has now left any readers more confused.
</p>
<p>
A remaining question is what practical implications this has. Only rarely do you need a programming construct like <code>convertLists2</code>. On the other hand, had I wanted a function with the type <code>NT [] g Int -> (g Int, g Int)</code>, it would have type-checked just fine.
</p>
<p>
I'm not claiming that this is generally useful either, but I actually wrote this article because I <em>did</em> have use for the result that <code>NT</code> (whatever it is) is an invariant functor. As far as I can tell, that result still holds.
</p>
<p>
I could be wrong about that, too. If you think so, please leave a comment.
</p>
</div>
<div class="comment-date">2022-09-04 7:53 UTC</div>
</div>
</div><hr>
Can types replace validation? https://blog.ploeh.dk/2022/08/22/can-types-replace-validation 2022-08-22T05:57:00+00:00 Mark Seemann
<div id="post">
<p>
<em>With some examples in C#.</em>
</p>
<p>
In a comment to my article on <a href="/2022/08/15/aspnet-validation-revisited">ASP.NET validation revisited</a> Maurice Johnson asks:
</p>
<blockquote>
<p>
"I was just wondering, is it possible to use the type system to do the validation instead ?
</p>
<p>
"What I mean is, for example, to make all the ReservationDto's field a type with validation in the constructor (like a class name, a class email, and so on). Normally, when the framework will build ReservationDto, it will try to construct the fields using the type constructor, and if there is an explicit error thrown during the construction, the framework will send us back the error with the provided message.
</p>
<p>
"Plus, I think types like "email", "name" and "at" are reusable. And I feel like we have more possibilities for validation with that way of doing than with the validation attributes.
</p>
<p>
"What do you think ?"
</p>
<footer><cite><a href="/2022/08/15/aspnet-validation-revisited#f6deac18851d47c3b066f82a8be3847d">Maurice Johnson</a></cite></footer>
</blockquote>
<p>
I started writing a response below the question, but it grew and grew so I decided to turn it into a separate article. I think the question is of general interest.
</p>
<h3 id="2492a444e49b49a09a4f7c03f38b29d5">
The halting problem <a href="#2492a444e49b49a09a4f7c03f38b29d5" title="permalink">#</a>
</h3>
<p>
I'm all in favour of using the type system for encapsulation, but there are limits to what it can do. We know this because it follows from the <a href="https://en.wikipedia.org/wiki/Halting_problem">halting problem</a>.
</p>
<p>
I'm basing my understanding of the halting problem on <a href="https://www.goodreads.com/review/show/1731926050">my reading</a> of <a href="/ref/annotated-turing">The Annotated Turing</a>. In short, given an arbitrary computer program in a Turing-complete language, there's no general algorithm that will determine whether or not the program will finish running.
</p>
<p>
A compiler that performs type-checking is a program, but typical type systems aren't Turing-complete. It's possible to write type checkers that always finish, because the 'programming language' they are running on - the type system - isn't Turing-complete.
</p>
<p>
Normal type systems (like C#'s) aren't Turing-complete. You expect the C# compiler to always arrive at a result (either compiled code or error) in finite time. As a counter-example, consider <a href="https://www.haskell.org/">Haskell</a>'s type system. By default it, too, isn't Turing-complete, but with sufficient language extensions, you <em>can</em> make it Turing-complete. Here's a fun example: <a href="https://aphyr.com/posts/342-typing-the-technical-interview">Typing the technical interview</a> by Kyle Kingsbury (Aphyr). When you make the type system Turing-complete, however, termination is no longer guaranteed. A program may now compile forever or, practically, until it times out or runs out of memory. That's what happened to me when I tried to compile Kyle Kingsbury's code example.
</p>
<p>
How is this relevant?
</p>
<p>
This matters because understanding that a normal type system is <em>not</em> Turing-complete means that there are truths it <em>can't</em> express. Thus, we shouldn't be surprised if we run into rules or policies that we can't express with the type system we're given. What exactly is inexpressible depends on the type system. There are policies you can express in Haskell that are impossible to express in C#, and so on. Let's stick with C#, though. Here are some examples of rules that are practically inexpressible:
</p>
<ul>
<li>An integer must be positive.</li>
<li>A string must be at most 100 characters long.</li>
<li>A maximum value must be greater than a minimum value.</li>
<li>A value must be a valid email address.</li>
</ul>
<p>
<a href="https://www.hillelwayne.com/">Hillel Wayne</a> provides more compelling examples in the article <a href="https://buttondown.email/hillelwayne/archive/making-illegal-states-unrepresentable/">Making Illegal States Unrepresentable</a>.
</p>
<h3 id="34f9d2e5570842adbb79a20972f458a0">
Encapsulation <a href="#34f9d2e5570842adbb79a20972f458a0" title="permalink">#</a>
</h3>
<p>
Depending on how many times you've been around the block, you may find the above list naive. You may, for example, say that it's possible to express that an integer is positive like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">struct</span> <span style="color:#2b91af;">NaturalNumber</span> : IEquatable<NaturalNumber>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">int</span> i;
<span style="color:blue;">public</span> <span style="color:#2b91af;">NaturalNumber</span>(<span style="color:blue;">int</span> candidate)
{
<span style="color:blue;">if</span> (candidate < 1)
<span style="color:blue;">throw</span> <span style="color:blue;">new</span> ArgumentOutOfRangeException(
nameof(candidate),
<span style="color:#a31515;">$"The value must be a positive (non-zero) number, but was: </span>{candidate}<span style="color:#a31515;">."</span>);
<span style="color:blue;">this</span>.i = candidate;
}
<span style="color:green;">// Various other members follow...</span></pre>
</p>
<p>
I like introducing wrapper types like this. To the inexperienced developer this may seem redundant, but using a wrapper like this has several advantages. For one, it makes preconditions explicit. Consider a constructor like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">Reservation</span>(
Guid id,
DateTime at,
Email email,
Name name,
NaturalNumber quantity)</pre>
</p>
<p>
What are the preconditions that you, as a client developer, have to fulfil before you can create a valid <code>Reservation</code> object? First, you must supply five arguments: <code>id</code>, <code>at</code>, <code>email</code>, <code>name</code>, and <code>quantity</code>. There is, however, more information than that.
</p>
<p>
Consider, as an alternative, a constructor like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:#2b91af;">Reservation</span>(
Guid id,
DateTime at,
Email email,
Name name,
<span style="color:blue;">int</span> quantity)</pre>
</p>
<p>
This constructor requires you to supply the same five arguments. There is, however, less explicit information available. If that was the only available constructor, you might be wondering: <em>Can I pass zero as <code>quantity</code>? Can I pass <code>-1</code>?</em>
</p>
<p>
When the only constructor available is the first of these two alternatives, you already have the answer: No, the <code>quantity</code> must be a natural number.
</p>
<p>
Another advantage of creating wrapper types like <code>NaturalNumber</code> is that you centralise run-time checks in one place. Instead of sprinkling defensive code all over the code base, you have it in one place. Any code that receives a <code>NaturalNumber</code> object knows that the check has already been performed.
</p>
<p>
There's a word for this: <a href="/encapsulation-and-solid">Encapsulation</a>.
</p>
<p>
You gather a coherent set of invariants and collect it in a single type, making sure that the type always guarantees its invariants. Note that <a href="/2022/10/24/encapsulation-in-functional-programming">this is an important design technique in functional programming</a> too. While you may not have to worry about state mutation preserving invariants, it's still important to guarantee that all values of a type are <em>valid</em>.
</p>
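<p>
In a language like Haskell, the same kind of encapsulation is often expressed with a smart constructor. The following is only a sketch: the names <code>tryNaturalNumber</code> and <code>getNaturalNumber</code> are hypothetical, and it assumes that the type would live in a module that exports the type, but not its data constructor.
</p>
<p>
<pre>-- Assumption: in a real code base, the enclosing module's export list would
-- hide the NaturalNumber data constructor. The module header is omitted
-- here to keep the sketch in a single file.
newtype NaturalNumber = NaturalNumber Int deriving (Eq, Ord, Show)

-- The only intended way to create a value: the check happens here, once.
tryNaturalNumber :: Int -> Maybe NaturalNumber
tryNaturalNumber candidate
  | candidate >= 1 = Just (NaturalNumber candidate)
  | otherwise = Nothing

-- Read access to the wrapped integer.
getNaturalNumber :: NaturalNumber -> Int
getNaturalNumber (NaturalNumber i) = i</pre>
</p>
<p>
Instead of throwing an exception, this variation returns <code>Nothing</code> for zero and negative numbers. Any function that receives a <code>NaturalNumber</code> can still rely on the check having been performed.
</p>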
<h3 id="8c48a55a5db74686ab7a0c8fc40dd650">
Predicative and constructive data <a href="#8c48a55a5db74686ab7a0c8fc40dd650" title="permalink">#</a>
</h3>
<p>
It's debatable whether the above <code>NaturalNumber</code> struct <em>really</em> uses the type system to model what constitutes valid data. Since it relies on a run-time predicate, it falls in the category of types Hillel Wayne <a href="https://www.hillelwayne.com/post/constructive/">calls <em>predicative</em></a>. Such types are easy to create and compose well, but on the other hand fail to take full advantage of the type system.
</p>
<p>
It's often worthwhile considering if a <em>constructive</em> design is possible and practical. In other words, is it possible to <a href="https://blog.janestreet.com/effective-ml-video/">make illegal states unrepresentable</a> (MISU)?
</p>
<p>
What's wrong with <code>NaturalNumber</code>? Doesn't it do that? No, it doesn't, because this compiles:
</p>
<p>
<pre><span style="color:blue;">new</span> NaturalNumber(-1)</pre>
</p>
<p>
Surely it <em>will</em> fail at run time, but it compiles. Thus, it's <em>representable</em>.
</p>
<p>
<a href="/2011/04/29/Feedbackmechanismsandtradeoffs">The compiler gives you feedback faster than tests</a>. Considering MISU is worthwhile.
</p>
<p>
Can we model natural numbers in a constructive way? Yes, with <a href="https://en.wikipedia.org/wiki/Peano_axioms">Peano numbers</a>. This is even <a href="/2018/05/28/church-encoded-natural-numbers">possible in C#</a>, but I wouldn't consider it practical. On the other hand, while it's possible to represent any natural number, there is <em>no way</em> to express -1 as a Peano number.
</p>
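<p>
For illustration, here's what a Peano encoding might look like in Haskell. The type and function names are hypothetical. This version starts at zero; a strictly positive variation could replace <code>Zero</code> with a <code>One</code> case.
</p>
<p>
<pre>-- Every value is either zero or the successor of another value.
-- No combination of these two cases denotes a negative number.
data Peano = Zero | Succ Peano deriving (Eq, Show)

peanoToInteger :: Peano -> Integer
peanoToInteger Zero = 0
peanoToInteger (Succ n) = 1 + peanoToInteger n

-- Converting from Integer can fail, because Integer includes
-- negative numbers that Peano can't represent.
integerToPeano :: Integer -> Maybe Peano
integerToPeano i
  | i == 0 = Just Zero
  | i > 0 = fmap Succ (integerToPeano (i - 1))
  | otherwise = Nothing</pre>
</p>
<p>
The conversion from <code>Integer</code> is where the run-time check now lives. Once you have a <code>Peano</code> value, no further validation is necessary.
</p>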
<p>
As Hillel Wayne describes, constructive data types are much harder to design and require a considerable measure of creativity. Often, a constructive model can seem impossible until you get a good idea.
</p>
<blockquote>
<p>
"a list can only be of even length. Most languages will not be able to express such a thing in a reasonable way in the data type."
</p>
<footer><cite><a href="https://note89.github.io/state-of-emergency/">Nils Eriksson</a></cite></footer>
</blockquote>
<p>
Such a requirement may look difficult until inspiration hits. Then one day you may realise that it'd be as simple as a list of pairs (two-tuples). In Haskell, it could be as simple as this:
</p>
<p>
<pre>newtype EvenList a = EvenList [(a,a)] deriving (Eq, Show)</pre>
</p>
<p>
With such a constructive data model, lists of odd length are unrepresentable. This is a simple example of the kind of creative thinking you may need to engage in with constructive data modelling.
</p>
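<p>
To make the <code>EvenList</code> idea a little more concrete, here's a sketch of two conversion functions. The names <code>evenToList</code> and <code>evenFromList</code> are hypothetical.
</p>
<p>
<pre>newtype EvenList a = EvenList [(a, a)] deriving (Eq, Show)

-- Flatten to an ordinary list. The result always has even length.
evenToList :: EvenList a -> [a]
evenToList (EvenList pairs) = concatMap (\(x, y) -> [x, y]) pairs

-- Going the other way can fail, so the function returns Maybe.
evenFromList :: [a] -> Maybe (EvenList a)
evenFromList = fmap EvenList . go
  where
    go [] = Just []
    go (x : y : rest) = fmap ((x, y) :) (go rest)
    go [_] = Nothing</pre>
</p>
<p>
Only <code>evenFromList</code> needs to deal with the possibility of an odd number of elements. Code that already has an <code>EvenList</code> value never has to check.
</p>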
<p>
If you feel the need to object that Haskell isn't 'most languages', then here's the same idea expressed in C#:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">EvenCollection</span><<span style="color:#2b91af;">T</span>> : IEnumerable<T>
{
    <span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IEnumerable<Tuple<T, T>> values;

    <span style="color:blue;">public</span> <span style="color:#2b91af;">EvenCollection</span>(IEnumerable<Tuple<T, T>> <span style="color:#1f377f;">values</span>)
    {
        <span style="color:blue;">this</span>.values = values;
    }

    <span style="color:blue;">public</span> IEnumerator<T> <span style="color:#74531f;">GetEnumerator</span>()
    {
        <span style="color:#8f08c4;">foreach</span> (var <span style="color:#1f377f;">x</span> <span style="color:#8f08c4;">in</span> values)
        {
            <span style="color:#8f08c4;">yield</span> <span style="color:#8f08c4;">return</span> x.Item1;
            <span style="color:#8f08c4;">yield</span> <span style="color:#8f08c4;">return</span> x.Item2;
        }
    }

    IEnumerator IEnumerable.<span style="color:#74531f;">GetEnumerator</span>()
    {
        <span style="color:#8f08c4;">return</span> GetEnumerator();
    }
}</pre>
</p>
<p>
You can create such a list like this:
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="color:#1f377f;">list</span> = <span style="color:blue;">new</span> EvenCollection<<span style="color:blue;">string</span>>(<span style="color:blue;">new</span>[]
{
    Tuple.Create(<span style="color:#a31515;">"foo"</span>, <span style="color:#a31515;">"bar"</span>),
    Tuple.Create(<span style="color:#a31515;">"baz"</span>, <span style="color:#a31515;">"qux"</span>)
});</pre>
</p>
<p>
On the other hand, this doesn't compile:
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="color:#1f377f;">list</span> = <span style="color:blue;">new</span> EvenCollection<<span style="color:blue;">string</span>>(<span style="color:blue;">new</span>[]
{
    Tuple.Create(<span style="color:#a31515;">"foo"</span>, <span style="color:#a31515;">"bar"</span>),
    Tuple.Create(<span style="color:#a31515;">"baz"</span>, <span style="color:#a31515;">"qux"</span>, <span style="color:#a31515;">"quux"</span>)
});</pre>
</p>
<p>
Despite this digression, the point remains: Constructive data modelling may be impossible, unimagined, or impractical.
</p>
<p>
Often, in languages like C# we resort to predicative data modelling. That's also what I did in the article <a href="/2022/08/15/aspnet-validation-revisited">ASP.NET validation revisited</a>.
</p>
<h3 id="830f3f7a663b459994cc585ef9604d11">
Validation as functions <a href="#830f3f7a663b459994cc585ef9604d11" title="permalink">#</a>
</h3>
<p>
That was a long rambling detour inspired by a simple question: Is it possible to use types instead of validation?
</p>
<p>
In order to address that question, it's only proper to explicitly state assumptions and definitions. What's the definition of <em>validation?</em>
</p>
<p>
I'm not aware of a ubiquitous definition. While I could draw from <a href="https://en.wikipedia.org/wiki/Data_validation">the Wikipedia article on the topic</a>, at the time of writing it doesn't cite any sources when it sets out to define the term. So I may as well paraphrase. It seems fair, though, to consider the stem of the word: <em>Valid</em>.
</p>
<p>
Validation is the process of examining input to determine whether or not it's valid. I consider this a (mostly) self-contained operation: Given the data, is it well-formed and according to specification? If you have to query a database before making a decision, you're not validating the input. In that case, you're applying a business rule. As a rule of thumb I expect validations to be <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a>.
</p>
<p>
Validation, then, seems to imply a process. Before you execute the process, you don't know if data is valid. After executing the process, you do know.
</p>
<p>
Data types, whether predicative like <code>NaturalNumber</code> or constructive like <code>EvenCollection<T></code>, aren't processes or functions. They are results.
</p>
<p>
<img src="/content/binary/validation-as-a-function-from-data-to-type.png" alt="An arrow labelled 'validation' pointing from a document to the left labelled 'Data' to a box to the right labelled 'Type'.">
</p>
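<p>
The distinction between the function and its result can be sketched in Haskell. This is an illustration under stated assumptions: the function name <code>tryNaturalNumber</code> is hypothetical, and the sketch assumes that the <code>NaturalNumber</code> data constructor isn't exported from its module, so that the function is the only way to produce a value:
</p>
<p>
<pre>-- A predicative type. The data constructor is assumed to stay private to
-- the defining module, leaving tryNaturalNumber as the only way in.
newtype NaturalNumber = NaturalNumber Int deriving (Eq, Show)

-- Validation as a pure function from less-structured data (an Int) to the
-- more-structured type. Nothing indicates that the candidate was invalid.
tryNaturalNumber :: Int -> Maybe NaturalNumber
tryNaturalNumber i
  | i > 0 = Just (NaturalNumber i)
  | otherwise = Nothing</pre>
</p>
<p>
Here, <code>tryNaturalNumber 1</code> evaluates to <code>Just (NaturalNumber 1)</code>, while <code>tryNaturalNumber (-1)</code> evaluates to <code>Nothing</code>. The type is the result; the function is the validation.
</p>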
<p>
Sometimes an algorithm can use a type to <em>infer</em> the validation function. This is common in statically typed languages, from C# over <a href="https://fsharp.org/">F#</a> to Haskell (which are the languages with which I'm most familiar).
</p>
<h3 id="4cc324c1919e45958542bb82c2ca73d4">
Data Transfer Object as a validation DSL <a href="#4cc324c1919e45958542bb82c2ca73d4" title="permalink">#</a>
</h3>
<p>
In a way you can think of the type system as a <a href="https://en.wikipedia.org/wiki/Domain-specific_language">domain-specific language</a> (DSL) for defining validation functions. It's not perfectly suited for that task, but often good enough that many developers reach for it.
</p>
<p>
Consider the <code>ReservationDto</code> class from the <a href="/2022/08/15/aspnet-validation-revisited">ASP.NET validation revisited</a> article where I eventually gave up on it:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ReservationDto</span>
{
    <span style="color:blue;">public</span> LinkDto[]? Links { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
    <span style="color:blue;">public</span> Guid? Id { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
    [Required, NotNull]
    <span style="color:blue;">public</span> DateTime? At { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
    [Required, NotNull]
    <span style="color:blue;">public</span> <span style="color:blue;">string</span>? Email { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
    <span style="color:blue;">public</span> <span style="color:blue;">string</span>? Name { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
    [NaturalNumber]
    <span style="color:blue;">public</span> <span style="color:blue;">int</span> Quantity { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }
}</pre>
</p>
<p>
It actually tries to do what Maurice Johnson suggests. Particularly, it defines <code>At</code> as a <code>DateTime?</code> value.
</p>
<p>
<pre>> <span style="color:blue;">var</span> <span style="color:#1f377f;">json</span> = <span style="color:#a31515;">"{ \"At\": \"2022-10-11T19:30\", \"Email\": \"z@example.com\", \"Quantity\": 1}"</span>;
> JsonSerializer.Deserialize<ReservationDto>(json)
ReservationDto { At=[11.10.2022 19:30:00], Email="z@example.com", Id=null, Name=null, Quantity=1 }</pre>
</p>
<p>
A JSON deserializer like this one uses run-time reflection to examine the type in question and then maps the incoming data onto an instance. Many XML deserializers work the same way.
</p>
<p>
What happens if you supply malformed input?
</p>
<p>
<pre>> <span style="color:blue;">var</span> <span style="color:#1f377f;">json</span> = <span style="color:#a31515;">"{ \"At\": \"foo\", \"Email\": \"z@example.com\", \"Quantity\": 1}"</span>;
> JsonSerializer.Deserialize<ReservationDto>(json)
<span style="color:red">System.Text.Json.JsonException:↩
The JSON value could not be converted to System.Nullable`1[System.DateTime].↩
Path: $.At | LineNumber: 0 | BytePositionInLine: 26.↩
[...]</span></pre>
</p>
<p>
(I've wrapped the result over multiple lines for readability. The <code>↩</code> symbol indicates where I've wrapped the text. I've also omitted a stack trace, indicated by <code>[...]</code>. I'll do that repeatedly throughout this article.)
</p>
<p>
What happens if we try to define <code>ReservationDto.Quantity</code> with <code>NaturalNumber</code>?
</p>
<p>
<pre>> var json = "{ \"At\": \"2022-10-11T19:30\", \"Email\": \"z@example.com\", \"Quantity\": 1}";
> JsonSerializer.Deserialize<ReservationDto>(json)
<span style="color:red">System.Text.Json.JsonException:↩
The JSON value could not be converted to NaturalNumber.↩
Path: $.Quantity | LineNumber: 0 | BytePositionInLine: 67.↩
[...]</span></pre>
</p>
<p>
While <a href="https://docs.microsoft.com/dotnet/api/system.text.json.jsonserializer">JsonSerializer</a> is a sophisticated piece of software, it's not so sophisticated that it can automatically map <code>1</code> to a <code>NaturalNumber</code> value.
</p>
<p>
I'm sure that you can configure the behaviour with one or more <a href="https://docs.microsoft.com/dotnet/api/system.text.json.serialization.jsonconverter">JsonConverter</a> objects, but this is exactly the kind of framework <a href="https://en.wikipedia.org/wiki/Whac-A-Mole">Whack-a-mole</a> that I consider costly. It also suggests a wider problem.
</p>
<h3 id="bdad63c97ad44f1199f4f06e2b2e3fff">
Error handling <a href="#bdad63c97ad44f1199f4f06e2b2e3fff" title="permalink">#</a>
</h3>
<p>
What happens if input to a validation function is malformed? You may want to report the errors to the caller, and you may want to report all errors in one go. Consider the user experience if you don't: A user types in a big form and submits it. The system informs him or her that there's an error in the third field. Okay, correct the error and submit again. Now there's an error in the fifth field, and so on.
</p>
<p>
It's often better to return all errors as one collection.
</p>
<p>
The problem is that type-based validation doesn't <em>compose</em> well. What do I mean by that?
</p>
<p>
It's fairly clear that if you fail to initialize a value of a <em>simple</em> (i.e. non-complex) type like <code>NaturalNumber</code>, it's because the input is at fault:
</p>
<p>
<pre>> <span style="color:blue;">new</span> NaturalNumber(-1)
<span style="color:red">System.ArgumentOutOfRangeException: The value must be a positive (non-zero) number, but was: -1.↩
(Parameter 'candidate')
+ NaturalNumber..ctor(int)</span></pre>
</p>
<p>
The problem is that for complex types (i.e. types made from other types), exceptions short-circuit. As soon as one exception is thrown, further data validation stops. The <a href="/2022/08/15/aspnet-validation-revisited">ASP.NET validation revisited</a> article shows examples of that particular problem.
</p>
<p>
This happens when validation functions have no composable way to communicate errors. When throwing exceptions, you can return an exception message, but exceptions short-circuit rather than compose. The same is true for the <a href="/2022/05/09/an-either-monad">Either monad</a>: It short-circuits. Once you're on <a href="https://fsharpforfunandprofit.com/posts/recipe-part2/">the failure track</a> you stay there and no further processing takes place. Errors don't compose.
</p>
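<p>
The short-circuiting behaviour is easy to reproduce in Haskell. In the following sketch, the validator names and the two-field input are assumptions made for illustration; only the error messages are borrowed from the examples in these articles:
</p>
<p>
<pre>-- Hypothetical field validators that report errors with Either.
validateEmail :: String -> Either String String
validateEmail "" = Left "Email address is missing."
validateEmail s = Right s

validateQuantity :: Int -> Either String Int
validateQuantity q
  | q < 1 = Left ("Quantity must be a positive integer, but was: " ++ show q ++ ".")
  | otherwise = Right q

-- Monadic composition stops at the first Left value. If the email
-- validation fails, validateQuantity is never consulted.
validateBoth :: String -> Int -> Either String (String, Int)
validateBoth email quantity = do
  e <- validateEmail email
  q <- validateQuantity quantity
  return (e, q)</pre>
</p>
<p>
Even though both inputs are invalid, <code>validateBoth "" (-1)</code> only returns <code>Left "Email address is missing."</code>. The problem with the quantity goes unreported until the caller fixes the first error and tries again.
</p>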
<h3 id="7a85a7339fc24eda8e922ba12fdb3d89">
Monoidal versus applicative validation <a href="#7a85a7339fc24eda8e922ba12fdb3d89" title="permalink">#</a>
</h3>
<p>
The naive take on validation is to answer the question: <em>Is that data valid or invalid?</em> Notice the binary nature of the question. It's either-or.
</p>
<p>
This is true for both predicative data and constructive data.
</p>
<p>
For constructive data, the question is: Is a candidate value representable? For example, can you represent <em>-1</em> as a Peano number? The answer is either yes or no; true or false.
</p>
<p>
This is even clearer for predicative data, which is defined by a <em>predicate</em>. (Here's another <a href="/2021/09/09/the-specification-contravariant-functor">example of a natural number specification</a>.) A predicate is a function that returns a Boolean value: True or false.
</p>
<p>
It's possible to compose Boolean values. The composition that we need in this case is Boolean <em>and</em>, which is also known as the <em>all</em> <a href="/2017/10/06/monoids">monoid</a>: If all values are <em>true</em>, the composed value is <em>true</em>; if just one value is <em>false</em>, the composed value is <em>false</em>.
</p>
<p>
The problem is that during composition, we lose information. While a single <em>false</em> value causes the entire aggregated value to be <em>false</em>, we don't know why. And we don't know if there was only a single <em>false</em> value, or if there were more than one. Boolean <em>all</em> short-circuits on the first <em>false</em> value it encounters, and stops processing subsequent predicates.
</p>
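<p>
As an illustration of that information loss, consider this Haskell sketch based on the <code>All</code> wrapper from <code>Data.Monoid</code>. The predicates <code>isPresent</code> and <code>isNatural</code>, as well as the function <code>isValid</code>, are hypothetical names introduced for the example:
</p>
<p>
<pre>import Data.Monoid (All (..))

-- Hypothetical predicates over an email address and a quantity.
isPresent :: String -> Bool
isPresent = not . null

isNatural :: Int -> Bool
isNatural = (> 0)

-- Combine the individual answers with the All monoid (Boolean 'and').
isValid :: String -> Int -> Bool
isValid email quantity =
  getAll (All (isPresent email) <> All (isNatural quantity))</pre>
</p>
<p>
Both <code>isValid "" 1</code> and <code>isValid "" (-1)</code> evaluate to <code>False</code>. From the result alone, a caller can't tell which predicate failed, or how many.
</p>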
<p>
In logic, that's all you need, but in data validation you often want to know <em>what's wrong with the data</em>.
</p>
<p>
Fortunately, this is <a href="/2020/12/14/validation-a-solved-problem">a solved problem</a>. Use <a href="/2018/11/05/applicative-validation">applicative validation</a>, an example of which I supplied in the article <a href="/2022/07/25/an-applicative-reservation-validation-example-in-c">An applicative reservation validation example in C#</a>.
</p>
<p>
This changes the focus of validation. No longer is validation a <em>true/false</em> question. Validation is a function from less-structured data to more-structured data. <a href="https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/">Parse, don't validate</a>.
</p>
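<p>
For readers who prefer Haskell, here's a minimal sketch of the idea. It isn't the code from the linked articles; the hand-rolled <code>Validation</code> type, the simplified two-field <code>Reservation</code> record, and all function names are assumptions made for illustration:
</p>
<p>
<pre>-- A hand-rolled validation type, assumed here for illustration. Libraries
-- offer similar types, but this sketch depends on none of them.
data Validation e a = Failure e | Success a deriving (Eq, Show)

instance Functor (Validation e) where
  fmap _ (Failure e) = Failure e
  fmap f (Success a) = Success (f a)

-- Unlike Either, this Applicative instance accumulates errors with (<>).
instance Semigroup e => Applicative (Validation e) where
  pure = Success
  Failure e1 <*> Failure e2 = Failure (e1 <> e2)
  Failure e1 <*> Success _ = Failure e1
  Success _ <*> Failure e2 = Failure e2
  Success f <*> Success a = Success (f a)

-- A hypothetical, simplified reservation with only two fields.
data Reservation = Reservation
  { email :: String
  , quantity :: Int
  } deriving (Eq, Show)

validateEmail :: String -> Validation [String] String
validateEmail "" = Failure ["Email address is missing."]
validateEmail s = Success s

validateQuantity :: Int -> Validation [String] Int
validateQuantity q
  | q < 1 = Failure ["Quantity must be a positive integer, but was: " ++ show q ++ "."]
  | otherwise = Success q

-- Parse less-structured input into a Reservation, collecting all errors.
parseReservation :: String -> Int -> Validation [String] Reservation
parseReservation e q = Reservation <$> validateEmail e <*> validateQuantity q</pre>
</p>
<p>
With these definitions, <code>parseReservation "" (-1)</code> returns a <code>Failure</code> value that contains both error messages, while valid input produces a <code>Success</code> value that wraps a <code>Reservation</code>. The list of messages is what makes the errors compose: lists form a semigroup, so failures can be concatenated.
</p>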
<h3 id="96d5b6c4df424f79b9eec8b186a2b3de">
Conclusion <a href="#96d5b6c4df424f79b9eec8b186a2b3de" title="permalink">#</a>
</h3>
<p>
Can types replace validation?
</p>
<p>
In some cases they can, but I think that the general answer is <em>no</em>. Granted, this answer is partially based on capabilities of current deserialisers. <a href="https://docs.microsoft.com/dotnet/api/system.text.json.jsonserializer.deserialize">JsonSerializer.Deserialize</a> short-circuits on the first error it encounters, and so does <a href="https://hackage.haskell.org/package/aeson/docs/Data-Aeson.html">aeson</a>'s <a href="https://hackage.haskell.org/package/aeson/docs/Data-Aeson.html#v:eitherDecode">eitherDecode</a>.
</p>
<p>
While that's the current state of affairs, it may not have to stay like that forever. One might be able to derive an applicative parser from a desired destination type, but I haven't seen that done yet.
</p>
<p>
It sounds like a worthwhile research project.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="4e4498fcf878443e96fad490569a8e4f">
<div class="comment-author"><a href="https://www.lloydatkinson.net">Lloyd Atkinson</a> <a href="#4e4498fcf878443e96fad490569a8e4f">#</a></div>
<div class="comment-content">
<p>
This slightly reminds me of <a href="https://github.com/colinhacks/zod">Zod</a> which is described as "TypeScript-first schema validation with static type inference".
</p>
<p>
The library automatically infers a type that matches the validation - in a way it blurs this line between types and validation by making them become one.
</p>
<p>
Of course, once you have that inferred type there is nothing stopping you using it without the library, but that's something code reviews could catch. It's quite interesting though.
</p>
<pre>
<code>
import { z } from 'zod';

const User = z.object({
  username: z.string(),
  age: z.number().positive({
    message: 'Your age must be positive!',
  }),
});

User.parse({ username: 'Ludwig', age: -1 });

// extract the inferred type
type User = z.infer<typeof User>;
// { username: string, age: number }
</code>
</pre>
</div>
<div class="comment-date">2022-08-28 00:53 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
ASP.NET validation revisited https://blog.ploeh.dk/2022/08/15/aspnet-validation-revisited 2022-08-15T05:48:00+00:00 Mark Seemann
<div id="post">
<p>
<em>Is the built-in validation framework better than applicative validation?</em>
</p>
<p>
I recently published an article called <a href="/2022/07/25/an-applicative-reservation-validation-example-in-c">An applicative reservation validation example in C#</a> in which I describe how to use the universal abstractions of <a href="/2018/10/01/applicative-functors">applicative functors</a> and <a href="/2017/11/27/semigroups">semigroups</a> to implement reusable, composable validation.
</p>
<p>
One reader reaction made me stop and think:
</p>
<blockquote>
<p>
"An exercise on how to reject 90% of the framework's existing services (*Validation) only to re implement them more poorly, by renouncing standardization, interoperability and globalization all for the glory of FP."
</p>
<footer><cite><a href="https://twitter.com/PopCatalin/status/1551478523981881349">PopCatalin</a></cite></footer>
</blockquote>
<p>
(At the time of posting, the <a href="https://twitter.com/PopCatalin">PopCatalin Twitter account</a>'s display name was <em>Prime minister of truth™ カタリンポップ🇺🇦</em>, which I find unhelpful. The linked <a href="https://github.com/popcatalin81">GitHub account</a> locates the user in <a href="https://en.wikipedia.org/wiki/Cluj-Napoca">Cluj-Napoca</a>, a city I've <a href="/schedule">repeatedly visited for conferences</a> - the last time as recently as June 2022. I wouldn't be surprised if we've interacted, but if so, I'm sorry to say that I can't connect these accounts with one of the many wonderful people I've met there. In general, I'm getting a strong sarcastic vibe from that account, and I'm not sure whether or not to take <em>Pronouns kucf/fof</em> seriously. As the possibly clueless 51-year-old white male that I am, I will proceed with good intentions and to the best of my abilities.)
</p>
<p>
That reply is an important reminder that I should once in a while check my assumptions. I'm aware that the ASP.NET framework comes with validation features, but many years ago I dismissed them because I found them inadequate. Perhaps, in the meantime, these built-in services have improved to the point that they are to be preferred over <a href="/2018/11/05/applicative-validation">applicative validation</a>.
</p>
<p>
I decided to attempt to refactor the code to take advantage of the built-in ASP.NET validation to be able to compare the two approaches. This article is an experience report.
</p>
<h3 id="d4b2e6bdae494d0397866f683ddd64e9">
Requirements <a href="#d4b2e6bdae494d0397866f683ddd64e9" title="permalink">#</a>
</h3>
<p>
In order to compare the two approaches, the ASP.NET-based validation should support the same validation features as the applicative validation example:
</p>
<ul>
<li>The <code>At</code> property is required and should be a valid date and time. If it isn't, the validation message should report the problem and the offending input.</li>
<li>The <code>Email</code> property should be required. If it's missing, the validation message should state so.</li>
<li>The <code>Quantity</code> property is required and should be a natural number. If it isn't, the validation message should report the problem and the offending input.</li>
</ul>
<p>
The <a href="/2022/07/25/an-applicative-reservation-validation-example-in-c">previous article</a> includes an interaction example that I'll repeat here for convenience:
</p>
<p>
<pre>POST /restaurants/1/reservations?sig=1WiLlS5705bfsffPzaFYLwntrS4FCjE5CLdaeYTHxxg%3D HTTP/1.1
Content-Type: application/json
{ <span style="color:#2e75b6;">"at"</span>: <span style="color:#a31515;">"large"</span>, <span style="color:#2e75b6;">"name"</span>: <span style="color:#a31515;">"Kerry Onn"</span>, <span style="color:#2e75b6;">"quantity"</span>: -1 }
HTTP/1.1 400 Bad Request
Invalid date or time: large.
Email address is missing.
Quantity must be a positive integer, but was: -1.</pre>
</p>
<p>
ASP.NET validation formats the errors differently, as you'll see later in this article. That's not much of a concern, though: <a href="/2014/12/23/exception-messages-are-for-programmers">Error <em>messages</em> are for other developers</a>. They don't really have to be machine-readable or have a strict shape (as opposed to error <em>types</em>, which should be machine-readable).
</p>
<p>
Reporting the offending values, as in <em>"Quantity must be a positive integer, but was: -1."</em> is part of the requirements. A REST API can make no assumptions about its clients. Perhaps one client is an unattended batch job that only logs errors. Logging offending values may be helpful to maintenance developers of such a batch job.
</p>
<h3 id="74d95aed3a5a40eb8f32ace05b0410a0">
Framework API <a href="#74d95aed3a5a40eb8f32ace05b0410a0" title="permalink">#</a>
</h3>
<p>
The first observation to make about the ASP.NET validation API is that it's specific to ASP.NET. It's not a general-purpose API that you can use for other purposes.
</p>
<p>
If, instead, you need to validate input to a console application, a background message handler, a batch job, or a desktop or phone app, you can't use that API.
</p>
<p>
Perhaps each of these styles of software comes with its own validation API, but even if so, that's a different API you'll have to learn. And in cases where there's no built-in validation API, then what do you do?
</p>
<p>
The beauty and practicality of applicative validation is that it's <em>universal</em>. Since it's based on mathematical foundations, it's not tied to a particular framework, platform, or language. These concepts exist independently of technology. Once you understand the concepts, they're always there for you.
</p>
<p>
The code examples in the previous article, as well as here, build upon the code base that accompanies <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a>. An example code base has to be written in <em>some</em> language, and I chose C# because I'm more familiar with it than I am with <a href="https://www.java.com/">Java</a>, <a href="https://isocpp.org/">C++</a>, or <a href="https://www.typescriptlang.org/">TypeScript</a>. While I wanted the code base to be realistic, I tried hard to include only coding techniques and patterns that you could use in more than one language.
</p>
<p>
As I wrote the book, I ran into many interesting problems and solutions that were specific to C# and ASP.NET. While I found them too specific to include in the book, I wrote <a href="/2021/06/14/new-book-code-that-fits-in-your-head">a series of blog posts</a> about them. This article is now becoming one of those.
</p>
<p>
The point about the previous article on <a href="/2022/07/25/an-applicative-reservation-validation-example-in-c">applicative reservation validation in C#</a> was to demonstrate how the general technique works. Not specifically in ASP.NET, or even C#, but in general.
</p>
<p>
It just so happens that this example is situated in a context where an alternative solution presents itself. This is not always the case. Sometimes you have to solve this problem yourself, and when this happens, it's useful to know that <a href="/2020/12/14/validation-a-solved-problem">validation is a solved problem</a>. Even so, while a universal solution exists, it doesn't follow that the universal solution is the best. Perhaps there are specialised solutions that are better, each within their constrained contexts.
</p>
<p>
Perhaps ASP.NET validation is an example of that.
</p>
<h3 id="0b80f4aa36954378be9278761b2f27ba">
Email validation <a href="#0b80f4aa36954378be9278761b2f27ba" title="permalink">#</a>
</h3>
<p>
The following is a report on my experience refactoring validation to use the built-in ASP.NET validation API.
</p>
<p>
I decided to start with the <code>Email</code> property, since the only requirement is that this value should be present. That seemed like an easy way to get started.
</p>
<p>
I added the <a href="https://docs.microsoft.com/dotnet/api/system.componentmodel.dataannotations.requiredattribute">[Required]</a> attribute to the <code>ReservationDto</code> class' <code>Email</code> property. Since this code base also uses <a href="https://docs.microsoft.com/dotnet/csharp/nullable-references">nullable reference types</a>, it was necessary to also annotate the property with the <a href="https://docs.microsoft.com/dotnet/api/system.diagnostics.codeanalysis.notnullattribute">[NotNull]</a> attribute:
</p>
<p>
<pre>[Required, NotNull]
<span style="color:blue;">public</span> <span style="color:blue;">string</span>? Email { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }</pre>
</p>
<p>
That's not too difficult, and seems to be working satisfactorily:
</p>
<p>
<pre>POST /restaurants/1/reservations?sig=1WiLlS5705bfsffPzaFYLwntrS4FCjE5CLdaeYTHxxg%3D HTTP/1.1
content-type: application/json
{
<span style="color:#2e75b6;">"at"</span>: <span style="color:#a31515;">"2022-11-21 19:00"</span>,
<span style="color:#2e75b6;">"name"</span>: <span style="color:#a31515;">"Kerry Onn"</span>,
<span style="color:#2e75b6;">"quantity"</span>: 1
}
HTTP/1.1 400 Bad Request
Content-Type: application/problem+json; charset=utf-8
{
<span style="color:#2e75b6;">"type"</span>: <span style="color:#a31515;">"https://tools.ietf.org/html/rfc7231#section-6.5.1"</span>,
<span style="color:#2e75b6;">"title"</span>: <span style="color:#a31515;">"One or more validation errors occurred."</span>,
<span style="color:#2e75b6;">"status"</span>: 400,
<span style="color:#2e75b6;">"traceId"</span>: <span style="color:#a31515;">"|552ab5ff-494e1d1a9d4c6355."</span>,
<span style="color:#2e75b6;">"errors"</span>: { <span style="color:#2e75b6;">"Email"</span>: [ <span style="color:#a31515;">"The Email field is required."</span> ] }
}</pre>
</p>
<p>
As discussed above, the response body is formatted differently than in the applicative validation example, but I consider that inconsequential for the reasons I gave.
</p>
<p>
So far, so good.
</p>
<h3 id="a10d6da09d064a96bc0022e9657b62e5">
Quantity validation <a href="#a10d6da09d064a96bc0022e9657b62e5" title="permalink">#</a>
</h3>
<p>
The next property I decided to migrate was <code>Quantity</code>. This must be a natural number; that is, an integer greater than zero.
</p>
<p>
Disappointingly, no such built-in validation attribute seems to exist. One <a href="https://stackoverflow.com/a/7419330/126014">highly voted Stack Overflow answer</a> suggested using the <a href="https://docs.microsoft.com/dotnet/api/system.componentmodel.dataannotations.rangeattribute">[Range]</a> attribute, so I tried that:
</p>
<p>
<pre>[Range(1, <span style="color:blue;">int</span>.MaxValue, ErrorMessage = <span style="color:#a31515;">"Quantity must be a natural number."</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">int</span> Quantity { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }</pre>
</p>
<p>
As a <em>declarative</em> approach to validation goes, I don't think this is off to a good start. I like declarative programming, but I'd prefer to be able to declare that <code>Quantity</code> must be a <em>natural number</em>, rather than a number in the range from <code>1</code> to <code>int.MaxValue</code>.
</p>
<p>
Does it work, though?
</p>
<p>
<pre>POST /restaurants/1/reservations?sig=1WiLlS5705bfsffPzaFYLwntrS4FCjE5CLdaeYTHxxg%3D HTTP/1.1
content-type: application/json
{
<span style="color:#2e75b6;">"at"</span>: <span style="color:#a31515;">"2022-11-21 19:00"</span>,
<span style="color:#2e75b6;">"name"</span>: <span style="color:#a31515;">"Kerry Onn"</span>,
<span style="color:#2e75b6;">"quantity"</span>: 0
}
HTTP/1.1 400 Bad Request
Content-Type: application/problem+json; charset=utf-8
{
<span style="color:#2e75b6;">"type"</span>: <span style="color:#a31515;">"https://tools.ietf.org/html/rfc7231#section-6.5.1"</span>,
<span style="color:#2e75b6;">"title"</span>: <span style="color:#a31515;">"One or more validation errors occurred."</span>,
<span style="color:#2e75b6;">"status"</span>: 400,
<span style="color:#2e75b6;">"traceId"</span>: <span style="color:#a31515;">"|d9a6be38-4be82ede7c525913."</span>,
<span style="color:#2e75b6;">"errors"</span>: {
<span style="color:#2e75b6;">"Email"</span>: [ <span style="color:#a31515;">"The Email field is required."</span> ],
<span style="color:#2e75b6;">"Quantity"</span>: [ <span style="color:#a31515;">"Quantity must be a natural number."</span> ]
}
}</pre>
</p>
<p>
While it does capture the intent that <code>Quantity</code> must be one or greater, it fails to echo back the offending value.
</p>
<p>
In order to address that concern, I tried reading the documentation to find a way forward. Instead I found this:
</p>
<blockquote>
<p>
"Internally, the attributes call <a href="https://docs.microsoft.com/en-us/dotnet/api/system.string.format">String.Format</a> with a placeholder for the field name and sometimes additional placeholders. [...]"
</p>
<p>
"To find out which parameters are passed to <code>String.Format</code> for a particular attribute's error message, see the <a href="https://github.com/dotnet/runtime/tree/main/src/libraries/System.ComponentModel.Annotations/src/System/ComponentModel/DataAnnotations">DataAnnotations source code</a>."
</p>
<footer><cite><a href="https://docs.microsoft.com/aspnet/core/mvc/models/validation">ASP.NET validation documentation</a></cite></footer>
</blockquote>
<p>
Really?!
</p>
<p>
If you have to read implementation code, <a href="/encapsulation-and-solid">encapsulation</a> is broken.
</p>
<p>
Hardly impressed, I nonetheless found <a href="https://github.com/dotnet/runtime/blob/main/src/libraries/System.ComponentModel.Annotations/src/System/ComponentModel/DataAnnotations/RangeAttribute.cs">the RangeAttribute source code</a>. Alas, it only passes the property <code>name</code>, <code>Minimum</code>, and <code>Maximum</code> to <code>string.Format</code>, but not the offending value:
</p>
<p>
<pre>return string.Format(CultureInfo.CurrentCulture, ErrorMessageString, name, Minimum, Maximum);</pre>
</p>
<p>
This looked like a dead end, but at least it's possible to extend the ASP.NET validation API:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">NaturalNumberAttribute</span> : ValidationAttribute
{
    <span style="color:blue;">protected</span> <span style="color:blue;">override</span> ValidationResult IsValid(
        <span style="color:blue;">object</span> value,
        ValidationContext validationContext)
    {
        <span style="color:blue;">if</span> (validationContext <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
            <span style="color:blue;">throw</span> <span style="color:blue;">new</span> ArgumentNullException(nameof(validationContext));

        <span style="color:blue;">var</span> i = value <span style="color:blue;">as</span> <span style="color:blue;">int</span>?;
        <span style="color:blue;">if</span> (i.HasValue && 0 < i)
            <span style="color:blue;">return</span> ValidationResult.Success;

        <span style="color:blue;">return</span> <span style="color:blue;">new</span> ValidationResult(
            <span style="color:#a31515;">$"</span>{validationContext.MemberName}<span style="color:#a31515;"> must be a positive integer, but was: </span>{value}<span style="color:#a31515;">."</span>);
    }
}</pre>
</p>
<p>
Adding this <code>NaturalNumberAttribute</code> class enabled me to change the annotation of the <code>Quantity</code> property:
</p>
<p>
<pre>[NaturalNumber]
<span style="color:blue;">public</span> <span style="color:blue;">int</span> Quantity { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }</pre>
</p>
<p>
This seems to get the job done:
</p>
<p>
<pre>POST /restaurants/1/reservations?sig=1WiLlS5705bfsffPzaFYLwntrS4FCjE5CLdaeYTHxxg%3D HTTP/1.1
content-type: application/json
{
<span style="color:#2e75b6;">"at"</span>: <span style="color:#a31515;">"2022-11-21 19:00"</span>,
<span style="color:#2e75b6;">"name"</span>: <span style="color:#a31515;">"Kerry Onn"</span>,
<span style="color:#2e75b6;">"quantity"</span>: 0
}
HTTP/1.1 400 Bad Request
Content-Type: application/problem+json; charset=utf-8
{
<span style="color:#2e75b6;">"type"</span>: <span style="color:#a31515;">"https://tools.ietf.org/html/rfc7231#section-6.5.1"</span>,
<span style="color:#2e75b6;">"title"</span>: <span style="color:#a31515;">"One or more validation errors occurred."</span>,
<span style="color:#2e75b6;">"status"</span>: 400,
<span style="color:#2e75b6;">"traceId"</span>: <span style="color:#a31515;">"|bb45b60d-4bd255194871157d."</span>,
<span style="color:#2e75b6;">"errors"</span>: {
<span style="color:#2e75b6;">"Email"</span>: [ <span style="color:#a31515;">"The Email field is required."</span> ],
<span style="color:#2e75b6;">"Quantity"</span>: [ <span style="color:#a31515;">"Quantity must be a positive integer, but was: 0."</span> ]
}
}</pre>
</p>
<p>
The <code>[NaturalNumber]</code> attribute now correctly reports the offending value together with a useful error message.
</p>
<p>
Compare, however, the above <code>NaturalNumberAttribute</code> class to the <code>TryParseQuantity</code> function, repeated here for convenience:
</p>
<p>
<pre><span style="color:blue;">private</span> Validated<<span style="color:blue;">string</span>, <span style="color:blue;">int</span>> <span style="color:#74531f;">TryParseQuantity</span>()
{
<span style="color:#8f08c4;">if</span> (Quantity < 1)
<span style="color:#8f08c4;">return</span> Validated.Fail<<span style="color:blue;">string</span>, <span style="color:blue;">int</span>>(
<span style="color:#a31515;">$"Quantity must be a positive integer, but was: </span>{Quantity}<span style="color:#a31515;">."</span>);
<span style="color:#8f08c4;">return</span> Validated.Succeed<<span style="color:blue;">string</span>, <span style="color:blue;">int</span>>(Quantity);
}</pre>
</p>
<p>
<code>TryParseQuantity</code> is shorter and has half the <a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">cyclomatic complexity</a> of <code>NaturalNumberAttribute</code>. In isolation, at least, I'd prefer the shorter, simpler alternative.
</p>
<h3 id="3c90c9ded0af42fd9017593c9a387e97">
Date and time validation <a href="#3c90c9ded0af42fd9017593c9a387e97" title="permalink">#</a>
</h3>
<p>
Remaining is validation of the <code>At</code> property. As a first step, I converted the property to a <code>DateTime</code> value and added attributes:
</p>
<p>
<pre>[Required, NotNull]
<span style="color:blue;">public</span> DateTime? At { <span style="color:blue;">get</span>; <span style="color:blue;">set</span>; }</pre>
</p>
<p>
I'd been a little apprehensive doing that, fearing that it'd break a lot of code (particularly tests), but that turned out not to be the case. In fact, it actually simplified a few of the tests.
</p>
<p>
On the other hand, this doesn't really work as required:
</p>
<p>
<pre>POST /restaurants/1/reservations?sig=1WiLlS5705bfsffPzaFYLwntrS4FCjE5CLdaeYTHxxg%3D HTTP/1.1
content-type: application/json
{
<span style="color:#2e75b6;">"at"</span>: <span style="color:#a31515;">"2022-11-21 19:00"</span>,
<span style="color:#2e75b6;">"name"</span>: <span style="color:#a31515;">"Kerry Onn"</span>,
<span style="color:#2e75b6;">"quantity"</span>: 0
}
HTTP/1.1 400 Bad Request
Content-Type: application/problem+json; charset=utf-8
{
<span style="color:#2e75b6;">"type"</span>: <span style="color:#a31515;">"https://tools.ietf.org/html/rfc7231#section-6.5.1"</span>,
<span style="color:#2e75b6;">"title"</span>: <span style="color:#a31515;">"One or more validation errors occurred."</span>,
<span style="color:#2e75b6;">"status"</span>: 400,
<span style="color:#2e75b6;">"traceId"</span>: <span style="color:#a31515;">"|1e1d600e-4098fb36635642f6."</span>,
<span style="color:#2e75b6;">"errors"</span>: {
<span style="color:#2e75b6;">"dto"</span>: [ <span style="color:#a31515;">"The dto field is required."</span> ],
<span style="color:#2e75b6;">"$.at"</span>: [ <span style="color:#a31515;">"The JSON value could not be converted to System.Nullable`1[System.DateTime].↩
Path: $.at | LineNumber: 0 | BytePositionInLine: 26."</span> ]
}
}</pre>
</p>
<p>
(I've wrapped the last error message over two lines for readability. The <code>↩</code> symbol indicates where I've wrapped the text.)
</p>
<p>
There are several problems with this response. First, in addition to complaining about the malformed <code>at</code> property, it should also have reported that there's a problem with the <code>Quantity</code> property and that the <code>Email</code> property is missing. Instead, the response implies that the <code>dto</code> field is missing. That's likely confusing to client developers, because <code>dto</code> is an implementation detail; it's the name of the C# parameter of the method that handles the request. Client developers can't and shouldn't know this. Instead, it looks as though the REST API somehow failed to receive the JSON document that the client posted.
</p>
<p>
Second, the error message exposes other implementation details, here that the <code>at</code> field has the type <code>System.Nullable`1[System.DateTime]</code>. This is, at best, irrelevant. At worst, it could be a security issue, because it reveals to a would-be attacker that the system is implemented on .NET.
</p>
<p>
Third, the framework rejects what looks like a perfectly good date and time: <code>2022-11-21 19:00</code>. This is a breaking change, since the API used to accept such values.
</p>
<p>
What's wrong with <code>2022-11-21 19:00</code>? It's not a valid <a href="https://en.wikipedia.org/wiki/ISO_8601">ISO 8601</a> string. According to the ISO 8601 standard, the date and time must be separated by <code>T</code>:
</p>
<p>
<pre>POST /restaurants/1/reservations?sig=1WiLlS5705bfsffPzaFYLwntrS4FCjE5CLdaeYTHxxg%3D HTTP/1.1
content-type: application/json
{
<span style="color:#2e75b6;">"at"</span>: <span style="color:#a31515;">"2022-11-21T19:00"</span>,
<span style="color:#2e75b6;">"name"</span>: <span style="color:#a31515;">"Kerry Onn"</span>,
<span style="color:#2e75b6;">"quantity"</span>: 0
}
HTTP/1.1 400 Bad Request
Content-Type: application/problem+json; charset=utf-8
{
<span style="color:#2e75b6;">"type"</span>: <span style="color:#a31515;">"https://tools.ietf.org/html/rfc7231#section-6.5.1"</span>,
<span style="color:#2e75b6;">"title"</span>: <span style="color:#a31515;">"One or more validation errors occurred."</span>,
<span style="color:#2e75b6;">"status"</span>: 400,
<span style="color:#2e75b6;">"traceId"</span>: <span style="color:#a31515;">"|1e1d600f-4098fb36635642f6."</span>,
<span style="color:#2e75b6;">"errors"</span>: {
<span style="color:#2e75b6;">"Email"</span>: [ <span style="color:#a31515;">"The Email field is required."</span> ],
<span style="color:#2e75b6;">"Quantity"</span>: [ <span style="color:#a31515;">"Quantity must be a positive integer, but was: 0."</span> ]
}
}</pre>
</p>
<p>
Posting a valid ISO 8601 string does, indeed, enable the client to proceed - only to receive a new set of error messages. After I converted <code>At</code> to <code>DateTime?</code>, the ASP.NET validation framework fails to collect and report all errors. Instead it stops if it can't parse the <code>At</code> property. It doesn't report any other errors that might also be present.
</p>
<p>
That is exactly the requirement that applicative validation so elegantly solves.
</p>
<h3 id="e9986e66f63c476ca529788afaa863e4">
Tolerant Reader <a href="#e9986e66f63c476ca529788afaa863e4" title="permalink">#</a>
</h3>
<p>
While it's true that <code>2022-11-21 19:00</code> isn't valid ISO 8601, it's unambiguous. According to <a href="https://en.wikipedia.org/wiki/Robustness_principle">Postel's law</a> an API should be a <a href="https://martinfowler.com/bliki/TolerantReader.html">Tolerant Reader</a>. It's not.
</p>
<p>
This problem, however, is solvable. First, add the Tolerant Reader:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">DateTimeConverter</span> : JsonConverter<DateTime>
{
<span style="color:blue;">public</span> <span style="color:blue;">override</span> DateTime Read(
<span style="color:blue;">ref</span> Utf8JsonReader reader,
Type typeToConvert,
JsonSerializerOptions options)
{
<span style="color:blue;">return</span> DateTime.Parse(
reader.GetString(),
CultureInfo.InvariantCulture);
}
<span style="color:blue;">public</span> <span style="color:blue;">override</span> <span style="color:blue;">void</span> Write(
Utf8JsonWriter writer,
DateTime value,
JsonSerializerOptions options)
{
<span style="color:blue;">if</span> (writer <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="color:blue;">throw</span> <span style="color:blue;">new</span> ArgumentNullException(nameof(writer));
writer.WriteStringValue(value.ToString(<span style="color:#a31515;">"s"</span>));
}
}</pre>
</p>
<p>
Then add it to the JSON serialiser's <a href="https://docs.microsoft.com/dotnet/api/system.text.json.jsonserializeroptions.converters">Converters</a>:
</p>
<p>
<pre>opts.JsonSerializerOptions.Converters.Add(<span style="color:blue;">new</span> DateTimeConverter());</pre>
</p>
<p>
This, at least, addresses the Tolerant Reader concern:
</p>
<p>
<pre>POST /restaurants/1/reservations?sig=1WiLlS5705bfsffPzaFYLwntrS4FCjE5CLdaeYTHxxg%3D HTTP/1.1
content-type: application/json
{
<span style="color:#2e75b6;">"at"</span>: <span style="color:#a31515;">"2022-11-21 19:00"</span>,
<span style="color:#2e75b6;">"name"</span>: <span style="color:#a31515;">"Kerry Onn"</span>,
<span style="color:#2e75b6;">"quantity"</span>: 0
}
HTTP/1.1 400 Bad Request
Content-Type: application/problem+json; charset=utf-8
{
<span style="color:#2e75b6;">"type"</span>: <span style="color:#a31515;">"https://tools.ietf.org/html/rfc7231#section-6.5.1"</span>,
<span style="color:#2e75b6;">"title"</span>: <span style="color:#a31515;">"One or more validation errors occurred."</span>,
<span style="color:#2e75b6;">"status"</span>: 400,
<span style="color:#2e75b6;">"traceId"</span>: <span style="color:#a31515;">"|11576943-400dafd4b489c282."</span>,
<span style="color:#2e75b6;">"errors"</span>: {
<span style="color:#2e75b6;">"Email"</span>: [ <span style="color:#a31515;">"The Email field is required."</span> ],
<span style="color:#2e75b6;">"Quantity"</span>: [ <span style="color:#a31515;">"Quantity must be a positive integer, but was: 0."</span> ]
}
}</pre>
</p>
<p>
The API now accepts the slightly malformed <code>at</code> field. It also correctly handles the case where the field is entirely missing:
</p>
<p>
<pre>POST /restaurants/1/reservations?sig=1WiLlS5705bfsffPzaFYLwntrS4FCjE5CLdaeYTHxxg%3D HTTP/1.1
content-type: application/json
{
<span style="color:#2e75b6;">"name"</span>: <span style="color:#a31515;">"Kerry Onn"</span>,
<span style="color:#2e75b6;">"quantity"</span>: 0
}
HTTP/1.1 400 Bad Request
Content-Type: application/problem+json; charset=utf-8
{
<span style="color:#2e75b6;">"type"</span>: <span style="color:#a31515;">"https://tools.ietf.org/html/rfc7231#section-6.5.1"</span>,
<span style="color:#2e75b6;">"title"</span>: <span style="color:#a31515;">"One or more validation errors occurred."</span>,
<span style="color:#2e75b6;">"status"</span>: 400,
<span style="color:#2e75b6;">"traceId"</span>: <span style="color:#a31515;">"|11576944-400dafd4b489c282."</span>,
<span style="color:#2e75b6;">"errors"</span>: {
<span style="color:#2e75b6;">"At"</span>: [ <span style="color:#a31515;">"The At field is required."</span> ],
<span style="color:#2e75b6;">"Email"</span>: [ <span style="color:#a31515;">"The Email field is required."</span> ],
<span style="color:#2e75b6;">"Quantity"</span>: [ <span style="color:#a31515;">"Quantity must be a positive integer, but was: 0."</span> ]
}
}</pre>
</p>
<p>
On the other hand, it <em>still</em> doesn't gracefully handle the case when the <code>at</code> field is unrecoverably malformed:
</p>
<p>
<pre>POST /restaurants/1/reservations?sig=1WiLlS5705bfsffPzaFYLwntrS4FCjE5CLdaeYTHxxg%3D HTTP/1.1
content-type: application/json
{
<span style="color:#2e75b6;">"at"</span>: <span style="color:#a31515;">"foo"</span>,
<span style="color:#2e75b6;">"name"</span>: <span style="color:#a31515;">"Kerry Onn"</span>,
<span style="color:#2e75b6;">"quantity"</span>: 0
}
HTTP/1.1 400 Bad Request
Content-Type: application/problem+json; charset=utf-8
{
<span style="color:#2e75b6;">"type"</span>: <span style="color:#a31515;">"https://tools.ietf.org/html/rfc7231#section-6.5.1"</span>,
<span style="color:#2e75b6;">"title"</span>: <span style="color:#a31515;">"One or more validation errors occurred."</span>,
<span style="color:#2e75b6;">"status"</span>: 400,
<span style="color:#2e75b6;">"traceId"</span>: <span style="color:#a31515;">"|11576945-400dafd4b489c282."</span>,
<span style="color:#2e75b6;">"errors"</span>: {
<span style="color:#2e75b6;">""</span>: [ <span style="color:#a31515;">"The supplied value is invalid."</span> ],
<span style="color:#2e75b6;">"dto"</span>: [ <span style="color:#a31515;">"The dto field is required."</span> ]
}
}</pre>
</p>
<p>
<code>The supplied value is invalid.</code> and <code>The dto field is required.</code>? That's not really helpful. And what happened to <code>The Email field is required.</code> and <code>Quantity must be a positive integer, but was: 0.</code>?
</p>
<p>
If there's a way to address this problem, I don't know what it is. I've tried adding another custom attribute, similar to the above <code>NaturalNumberAttribute</code> class, but that doesn't solve it - probably because the model binder (that deserialises the JSON document to a <code>ReservationDto</code> instance) runs before the validation.
</p>
<p>
Perhaps there's a way to address this problem with yet another class that derives from a base class, but I think that I've already played enough <a href="https://en.wikipedia.org/wiki/Whac-A-Mole">Whack-a-mole</a> to arrive at a conclusion.
</p>
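<p>
For contrast, here's a minimal sketch of what collecting every error can look like when each raw value is checked independently of the others. It's <em>not</em> the <code>Validated</code> API from the previous article; the <code>Checks</code> class, the <code>Validate</code> method, and the date and email messages are hypothetical names and texts chosen for illustration. Only the quantity message is taken from the code above.
</p>

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Raw values as they might arrive in the malformed request above.
var messages = Checks.Validate(at: "foo", email: null, quantity: 0);
foreach (var message in messages)
    Console.WriteLine(message);

public static class Checks
{
    // Hypothetical helper: runs every check and accumulates all failures
    // instead of stopping at the first one.
    public static IReadOnlyList<string> Validate(
        string? at,
        string? email,
        int quantity)
    {
        var errors = new List<string>();

        // TryParse reports failure as a value instead of throwing,
        // so the remaining checks still get to run.
        if (!DateTime.TryParse(
                at,
                CultureInfo.InvariantCulture,
                DateTimeStyles.None,
                out _))
            errors.Add($"Invalid date or time: {at}.");

        if (string.IsNullOrWhiteSpace(email))
            errors.Add("Email is required.");

        if (quantity < 1)
            errors.Add(
                $"Quantity must be a positive integer, but was: {quantity}.");

        return errors;
    }
}
```

<p>
With the input from the failing request (<code>foo</code>, no email, a quantity of 0), this sketch reports three errors, including the two that disappeared from the response above.
</p>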
<h3 id="fc3e2119ed1c4a7a8cff0c0992a8b071">
Conclusion <a href="#fc3e2119ed1c4a7a8cff0c0992a8b071" title="permalink">#</a>
</h3>
<p>
Your context may differ from mine, so the conclusion that I arrive at may not apply in your situation. For example, I'm given to understand that one benefit that the ASP.NET validation framework provides is that when used with ASP.NET MVC (instead of as a Web API), (some of) the validation logic can also run in <a href="https://www.javascript.com/">JavaScript</a> in browsers. This, ostensibly, reduces code duplication.
</p>
<blockquote>
<p>
"Yet in the case of validation, a Declarative model is far superior to a FP one. The declarative model allows various environments to implement validation as they need it (IE: Client side validation) while the FP one is strictly limited to the environment executing the code."
</p>
<footer><cite><a href="https://twitter.com/PopCatalin/status/1551478926005911553">PopCatalin</a></cite></footer>
</blockquote>
<p>
On the other hand, using the ASP.NET validation framework requires more code, and more complex code, than with applicative validation. It's a particular set of APIs that you have to learn, and that knowledge doesn't transfer to other frameworks, platforms, or languages.
</p>
<p>
Apart from client-side validation, I fail to see how applicative validation <em>"re implement[s validation] more poorly, by renouncing standardization, interoperability and globalization"</em>.
</p>
<p>
I'm not aware that there's any <em>standard</em> for validation as such, so I think that @PopCatalin has the 'standard' ASP.NET validation API in mind. If so, I consider applicative validation a much more standardised solution than a specialised API.
</p>
<p>
If by <em>interoperability</em> @PopCatalin means the transfer of logic from server side to client side, then it's true that the applicative validation I showed in the previous article runs exclusively on the server. I wonder, however, how much of such custom validation as <code>NaturalNumberAttribute</code> automatically transfers to the client side.
</p>
<p>
When it comes to globalisation, I fail to see how applicative validation is less globalisable than the ASP.NET validation framework. One could easily replace the hard-coded strings in my examples with resource strings.
</p>
<p>
It would seem, again, that <a href="https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule">any sufficiently complicated custom validation framework contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of applicative validation</a>.
</p>
<blockquote>
<p>
"I must admit I really liked the declarative OOP model using annotations when I first saw it in Java (EJB3.0, almost 20yrs ago) until I saw FP way of doing things. FP way is so much simpler and powerful, because it's just function composition, nothing more, no hidden "magic"."
</p>
<footer><cite><a href="https://twitter.com/witoldsz/status/1552429555503493120">Witold Szczerba</a></cite></footer>
</blockquote>
<p>
I still find myself in the same camp as Witold Szczerba. It's easy to get started using validation annotations, but it doesn't follow that it's simpler or better in the long run. As <a href="https://en.wikipedia.org/wiki/Rich_Hickey">Rich Hickey</a> points out in <a href="https://www.infoq.com/presentations/Simple-Made-Easy/">Simple Made Easy</a>, <em>simple</em> and <em>easy</em> aren't the same. If I have to maintain code, I'll usually choose the simple solution over the easy solution. That means choosing applicative validation over a framework-specific validation API.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="f6deac18851d47c3b066f82a8be3847d">
<div class="comment-author">Maurice Johnson <a href="#f6deac18851d47c3b066f82a8be3847d">#</a></div>
<div class="comment-content">
<p>
Hello Mark. I was just wondering, is it possible to use the type system to do the validation instead ?
</p>
<p>
What I mean is, for example, to make all the ReservationDto's field a type with validation in the constructor (like a class name, a class email, and so on). Normally, when the framework will build ReservationDto, it will try to construct the fields using the type constructor, and if there is an explicit error thrown during the construction, the framework will send us back the error with the provided message.
</p>
<p>
Plus, I think types like "email", "name" and "at" are reusable. And I feel like we have more possibilities for validation with that way of doing than with the validation attributes.
</p>
<p>
What do you think ?
</p>
<p>
Regards.
</p>
</div>
<div class="comment-date">2022-08-16 08:30 UTC</div>
</div>
<div class="comment" id="89a9ef6d57f645d7a1e848aaf20b8b72">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#89a9ef6d57f645d7a1e848aaf20b8b72">#</a></div>
<div class="comment-content">
<p>
Maurice, thank you for writing. I started writing a reply, but it grew, so I'm going to turn it into a blog post. I'll post an update here once I've published it, but expect it to take a few weeks.
</p>
</div>
<div class="comment-date">2022-08-18 7:50 UTC</div>
</div>
<div class="comment" id="e932eb2fd0804a049314872d5fe6a358">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#e932eb2fd0804a049314872d5fe6a358">#</a></div>
<div class="comment-content">
<p>
I've published the article: <a href="/2022/08/22/can-types-replace-validation">Can types replace validation?</a>.
</p>
</div>
<div class="comment-date">2022-08-22 6:00 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
Endomorphism as an invariant functor
https://blog.ploeh.dk/2022/08/08/endomorphism-as-an-invariant-functor
2022-08-08T04:43:00+00:00
Mark Seemann
<div id="post">
<p>
<em>An article (also) for object-oriented programmers.</em>
</p>
<p>
This article is part of <a href="/2022/08/01/invariant-functors">a series of articles about invariant functors</a>. An invariant functor is a <a href="/2018/03/22/functors">functor</a> that is neither covariant nor contravariant. See the series introduction for more details.
</p>
<p>
An <a href="https://en.wikipedia.org/wiki/Endomorphism">endomorphism</a> is a function where the return type is the same as the input type.
</p>
<p>
In <a href="https://www.haskell.org/">Haskell</a> we denote an endomorphism as <code>a -> a</code>, in <a href="http://fsharp.org/">F#</a> we have to add an apostrophe: <code>'a -> 'a</code>, while in C# such a function corresponds to the delegate <code>Func<T, T></code> or, alternatively, to a method that has the same return type as input type.
</p>
<p>
In Haskell you can treat an endomorphism like a <a href="/2017/10/06/monoids">monoid</a> by wrapping it in a <a href="https://bartoszmilewski.com/2014/01/14/functors-are-containers">container</a> called <a href="https://hackage.haskell.org/package/base/docs/Data-Monoid.html#t:Endo">Endo</a>: <code>Endo a</code>. In C#, we <a href="/2021/05/31/from-state-tennis-to-endomorphism">might model it as an interface</a> called <code><span style="color:#2b91af;">IEndomorphism</span><<span style="color:#2b91af;">T</span>></code>.
</p>
<p>
That looks enough like a functor that you might wonder if it is one, but it turns out that it's neither co- nor contravariant. You can deduce this with positional variance analysis (which I've learned from <a href="https://thinkingwithtypes.com/">Thinking with Types</a>). In short, this is because <code>T</code> appears as both input and output - it's neither co- nor contravariant, but rather <em>invariant</em>.
</p>
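<p>
C#'s own variance annotations give a rough feel for that analysis. The following is only a sketch, and C# generic variance is a narrower notion (it's about subtyping of reference types) than functor variance, but the positional reasoning is the same: <code>Func&lt;in T, out TResult&gt;</code> is contravariant in its input and covariant in its output. The variable names are made up for the example.
</p>

```csharp
using System;

// Input position is contravariant, output position is covariant,
// so a Func<object, string> can stand in for a Func<string, object>.
Func<object, string> describe = o => $"<{o}>";
Func<string, object> widened = describe;
Console.WriteLine(widened("foo")); // <foo>

// In an endomorphism the same T sits in both positions,
// so neither direction of conversion is allowed.
Func<string, string> shout = s => s.ToUpperInvariant();
// Func<object, object> a = shout; // does not compile: input would have to widen
Func<object, object> echo = o => o;
// Func<string, string> b = echo;  // does not compile: output would have to narrow
Console.WriteLine(shout("foo"));   // FOO
```

<p>
For the same reason, the compiler should reject both <code>in T</code> and <code>out T</code> on an interface that uses <code>T</code> as both parameter and return type.
</p>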
<h3 id="39fa705060f143aba4c18dd2c8ff7f2d">
Explicit endomorphism interface in C# <a href="#39fa705060f143aba4c18dd2c8ff7f2d" title="permalink">#</a>
</h3>
<p>
Consider an <code><span style="color:#2b91af;">IEndomorphism</span><<span style="color:#2b91af;">T</span>></code> interface in C#:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IEndomorphism</span><<span style="color:#2b91af;">T</span>>
{
T <span style="color:#74531f;">Run</span>(T <span style="color:#1f377f;">x</span>);
}</pre>
</p>
<p>
I've borrowed this interface from the article <a href="/2021/05/31/from-state-tennis-to-endomorphism">From State tennis to endomorphism</a>. In that article I explain that I only introduce this interface for educational reasons. I don't expect you to use something like this in production code bases. On the other hand, everything that applies to <code><span style="color:#2b91af;">IEndomorphism</span><<span style="color:#2b91af;">T</span>></code> also applies to 'naked' functions, as you'll see later in the article.
</p>
<p>
As outlined in the introduction, you can make a container an invariant functor by implementing a non-standard version of <code>Select</code>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IEndomorphism<B> <span style="color:#74531f;">Select</span><<span style="color:#2b91af;">A</span>, <span style="color:#2b91af;">B</span>>(
<span style="color:blue;">this</span> IEndomorphism<A> <span style="color:#1f377f;">endomorphism</span>,
Func<A, B> <span style="color:#1f377f;">aToB</span>,
Func<B, A> <span style="color:#1f377f;">bToA</span>)
{
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> SelectEndomorphism<A, B>(endomorphism, aToB, bToA);
}
<span style="color:blue;">private</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">SelectEndomorphism</span><<span style="color:#2b91af;">A</span>, <span style="color:#2b91af;">B</span>> : IEndomorphism<B>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IEndomorphism<A> endomorphism;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> Func<A, B> aToB;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> Func<B, A> bToA;
<span style="color:blue;">public</span> <span style="color:#2b91af;">SelectEndomorphism</span>(
IEndomorphism<A> <span style="color:#1f377f;">endomorphism</span>,
Func<A, B> <span style="color:#1f377f;">aToB</span>,
Func<B, A> <span style="color:#1f377f;">bToA</span>)
{
<span style="color:blue;">this</span>.endomorphism = endomorphism;
<span style="color:blue;">this</span>.aToB = aToB;
<span style="color:blue;">this</span>.bToA = bToA;
}
<span style="color:blue;">public</span> B <span style="color:#74531f;">Run</span>(B <span style="color:#1f377f;">x</span>)
{
<span style="color:#8f08c4;">return</span> aToB(endomorphism.Run(bToA(x)));
}
}</pre>
</p>
<p>
Since the <code>Select</code> method has to return an <code>IEndomorphism<B></code> implementation, one option is to use a private, nested class. Most of this is <a href="/2019/12/16/zone-of-ceremony">ceremony</a> required because it's working with interfaces. The interesting part is the nested class' <code>Run</code> implementation.
</p>
<p>
In order to translate an <code>IEndomorphism<A></code> to an <code>IEndomorphism<B></code>, the <code>Run</code> method first uses <code>bToA</code> to translate <code>x</code> to an <code>A</code> value. Once it has the <code>A</code> value, it can <code>Run</code> the <code>endomorphism</code>, which returns another <code>A</code> value. Finally, the method can use <code>aToB</code> to convert the returned <code>A</code> value to a <code>B</code> value that it can return.
</p>
<p>
Here's a simple example. Imagine that you have an endomorphism like this one:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Incrementer</span> : IEndomorphism<BigInteger>
{
<span style="color:blue;">public</span> BigInteger <span style="color:#74531f;">Run</span>(BigInteger <span style="color:#1f377f;">x</span>)
{
<span style="color:#8f08c4;">return</span> x + 1;
}
}</pre>
</p>
<p>
This one simply increments a <a href="https://docs.microsoft.com/dotnet/api/system.numerics.biginteger">BigInteger</a> value. Since <code>BigInteger</code> is isomorphic to a byte array, it's possible to transform this <code>BigInteger</code> endomorphism to a byte array endomorphism:
</p>
<p>
<pre>[Theory]
[InlineData(<span style="color:blue;">new</span> <span style="color:blue;">byte</span>[0], <span style="color:blue;">new</span> <span style="color:blue;">byte</span>[] { 1 })]
[InlineData(<span style="color:blue;">new</span> <span style="color:blue;">byte</span>[] { 1 }, <span style="color:blue;">new</span> <span style="color:blue;">byte</span>[] { 2 })]
[InlineData(<span style="color:blue;">new</span> <span style="color:blue;">byte</span>[] { 255, 0 }, <span style="color:blue;">new</span> <span style="color:blue;">byte</span>[] { 0, 1 })]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">InvariantSelection</span>(<span style="color:blue;">byte</span>[] <span style="color:#1f377f;">bs</span>, <span style="color:blue;">byte</span>[] <span style="color:#1f377f;">expected</span>)
{
IEndomorphism<BigInteger> <span style="color:#1f377f;">source</span> = <span style="color:blue;">new</span> Incrementer();
IEndomorphism<<span style="color:blue;">byte</span>[]> <span style="color:#1f377f;">destination</span> =
source.Select(<span style="color:#1f377f;">bi</span> => bi.ToByteArray(), <span style="color:#1f377f;">arr</span> => <span style="color:blue;">new</span> BigInteger(arr));
Assert.Equal(expected, destination.Run(bs));
}</pre>
</p>
<p>
You can convert a <code>BigInteger</code> to a byte array with the <code>ToByteArray</code> method, and convert such a byte array back to a <code>BigInteger</code> using one of its constructor overloads. Since this is possible, the example test can convert this <code>IEndomorphism<BigInteger></code> to an <code>IEndomorphism<<span style="color:blue;">byte</span>[]></code> and later <code>Run</code> it.
</p>
<h3 id="f9afa75c0b014df68d29ddf1244f329f">
Mapping functions in F# <a href="#f9afa75c0b014df68d29ddf1244f329f" title="permalink">#</a>
</h3>
<p>
You don't need an interface in order to turn an endomorphism into an invariant functor. An endomorphism is just a function that has the same input and output type. In C# such a function has the type <code>Func<T, T></code>, while in F# it's written <code>'a <span style="color:blue;">-></span> 'a</code>.
</p>
<p>
You could write an F# module that defines an <code>invmap</code> function, which would be equivalent to the above <code>Select</code> method:
</p>
<p>
<pre><span style="color:blue;">module</span> Endo =
<span style="color:green;">// ('a -> 'b) -> ('b -> 'a) -> ('a -> 'a) -> ('b -> 'b)</span>
<span style="color:blue;">let</span> invmap (f : 'a <span style="color:blue;">-></span> 'b) (g : 'b <span style="color:blue;">-></span> 'a) (h : 'a <span style="color:blue;">-></span> 'a) = g >> h >> f</pre>
</p>
<p>
Since this function doesn't have to deal with the ceremony of interfaces, the implementation is simple function composition: For any input, first apply the <code>g</code> function to it, then apply the <code>h</code> function to that output, and finally apply the <code>f</code> function to the output of <code>h</code>.
</p>
<p>
Here's the same example as above:
</p>
<p>
<pre><span style="color:blue;">let</span> increment (bi : BigInteger) = bi + BigInteger.One
<span style="color:green;">// byte [] -> byte []</span>
<span style="color:blue;">let</span> bArrInc =
Endo.invmap (<span style="color:blue;">fun</span> (bi : BigInteger) <span style="color:blue;">-></span> bi.ToByteArray ()) BigInteger increment</pre>
</p>
<p>
Here's a simple sanity check of the <code>bArrInc</code> function executed in F# Interactive:
</p>
<p>
<pre>> let bArr = bArrInc [| 255uy; 255uy; 0uy |];;
val bArr : byte [] = [|0uy; 0uy; 1uy|]</pre>
</p>
<p>
If you are wondering about that particular output value, I'll refer you to <a href="https://docs.microsoft.com/dotnet/api/system.numerics.biginteger">the BigInteger documentation</a>.
</p>
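<p>
As a small aid, here's a C# sketch of the arithmetic behind that output. It relies only on the documented behaviour that the <code>BigInteger</code> byte-array constructor and <code>ToByteArray</code> use little-endian order; the variable names are made up for the example.
</p>

```csharp
using System;
using System.Numerics;

// Little-endian: the least significant byte comes first, so
// { 255, 255, 0 } is 255 + 255 * 256 + 0 * 65536 = 65535.
// The trailing zero byte keeps the sign bit clear, making the number positive.
var before = new BigInteger(new byte[] { 255, 255, 0 });
var after = before + BigInteger.One;

Console.WriteLine(before);                                 // 65535
Console.WriteLine(after);                                  // 65536
Console.WriteLine(string.Join(", ", after.ToByteArray())); // 0, 0, 1
```

<p>
Incrementing 65535 carries into the third byte, which is why the two <code>255uy</code> values become <code>0uy</code> and the last byte becomes <code>1uy</code>.
</p>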
<h3 id="76c32ad7688b41fc9e083a51584e4a6a">
Function composition <a href="#76c32ad7688b41fc9e083a51584e4a6a" title="permalink">#</a>
</h3>
<p>
The F# implementation of <code>invmap</code> (<code>g >> h >> f</code>) makes it apparent that an endomorphism is an invariant functor via function composition. In F#, though, that fact almost disappears in all the type declaration ceremony. In the Haskell instance from the <a href="https://hackage.haskell.org/package/invariant">invariant</a> package it's even clearer:
</p>
<p>
<pre><span style="color:blue;">instance</span> <span style="color:blue;">Invariant</span> <span style="color:blue;">Endo</span> <span style="color:blue;">where</span>
invmap f g (Endo h) = Endo (f . h . g)</pre>
</p>
<p>
Perhaps a diagram is helpful:
</p>
<p>
<img src="/content/binary/invariant-map-diagram.png" alt="Arrow diagram showing the mapping from an endomorphism in a to an endomorphism in b.">
</p>
<p>
If you have a function <code>h</code> from the type <code>a</code> to <code>a</code> and you need a function <code>b -> b</code>, you can produce it by putting <code>g</code> in front of <code>h</code>, and <code>f</code> after. That's also what the above C# implementation does. In F#, you can express such a composition as <code>g >> h >> f</code>, which seems natural to most westerners, since it goes from left to right. In Haskell, function composition instead reads from right to left, so it becomes: <code>f . h . g</code>. In any case, the result is the desired function that takes a <code>b</code> value as input and returns a <code>b</code> value as output. That composed function is indicated by a dashed arrow in the above diagram.
</p>
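<p>
The same composition can be sketched in C# directly on delegates, without the <code>IEndomorphism&lt;T&gt;</code> interface. This is an illustration only; the <code>FuncEndomorphism</code> class and the <code>InvMap</code> method are hypothetical names, not part of the code base discussed here.
</p>

```csharp
using System;
using System.Numerics;

// Usage: lift an endomorphism on BigInteger to one on byte arrays.
Func<BigInteger, BigInteger> increment = x => x + 1;
Func<byte[], byte[]> incrementBytes =
    increment.InvMap(bi => bi.ToByteArray(), arr => new BigInteger(arr));

Console.WriteLine(
    string.Join(", ", incrementBytes(new byte[] { 255, 0 }))); // 0, 1

public static class FuncEndomorphism
{
    // Hypothetical counterpart to the Select method, but for plain delegates.
    public static Func<B, B> InvMap<A, B>(
        this Func<A, A> h,
        Func<A, B> aToB,
        Func<B, A> bToA)
    {
        // First bToA, then h, then aToB:
        // g >> h >> f in F#, or f . h . g in Haskell.
        return b => aToB(h(bToA(b)));
    }
}
```

<p>
The lambda expression in <code>InvMap</code> is the dashed arrow in the diagram: it goes from <code>B</code> down to <code>A</code>, runs the original endomorphism, and comes back up to <code>B</code>.
</p>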
<h3 id="3934e941d51f47c698d8b803b7adb97c">
Identity law <a href="#3934e941d51f47c698d8b803b7adb97c" title="permalink">#</a>
</h3>
<p>
Contrary to my usual habit, I'm going to <em>prove</em> that both invariant functor laws hold for this implementation. I'll use equational reasoning with <a href="https://bartoszmilewski.com/2015/01/20/functors/">the notation that Bartosz Milewski uses</a>. Here's the proof that the <code>invmap</code> instance obeys the identity law:
</p>
<p>
<pre> invmap id id (Endo h)
= { definition of invmap }
Endo (id . h . id)
= { eta expansion }
Endo (\x -> (id . h . id) x)
= { definition of composition (.) }
Endo (\x -> id (h (id x)))
= { definition of id }
Endo (\x -> h x)
= { eta reduction }
Endo h
= { definition of id }
id (Endo h)</pre>
</p>
<p>
While I'm not going to comment further on that, I can show you what the identity law looks like in C#:
</p>
<p>
<pre>[Theory]
[InlineData(0)]
[InlineData(1)]
[InlineData(9)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">IdentityLaw</span>(<span style="color:blue;">long</span> <span style="color:#1f377f;">l</span>)
{
IEndomorphism<BigInteger> <span style="color:#1f377f;">e</span> = <span style="color:blue;">new</span> Incrementer();
IEndomorphism<BigInteger> <span style="color:#1f377f;">actual</span> = e.Select(<span style="color:#1f377f;">x</span> => x, <span style="color:#1f377f;">x</span> => x);
Assert.Equal(e.Run(l), actual.Run(l));
}</pre>
</p>
<p>
In C#, you typically write the identity function (<code>id</code> in F# and Haskell) as the lambda expression <code><span style="color:#1f377f;">x</span> => x</code>, since the identity function isn't 'built in' for C# like it is for F# and Haskell. (You can define it yourself, but it's not as <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a>.)
</p>
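<p>
The C# test can also be mirrored in Haskell. Since functions can't be compared for equality, the following sketch only compares results for sample inputs, so it's a spot check rather than a proof. The name <code>identityLawHolds</code> is my own invention for this illustration, and the class is again declared locally:
</p>
<p>
<pre>import Data.Monoid (Endo (..))

-- Local copy of the class, so that the sketch needs no extra packages.
class Invariant f where
  invmap :: (a -> b) -> (b -> a) -> f a -> f b

instance Invariant Endo where
  invmap f g (Endo h) = Endo (f . h . g)

-- Hypothetical helper: checks invmap id id e against e on sample inputs.
identityLawHolds :: Endo Integer -> [Integer] -> Bool
identityLawHolds e =
  all (\x -> appEndo (invmap id id e) x == appEndo e x)</pre>
</p>
<p>
An expression like <code>identityLawHolds (Endo (+ 1)) [0, 1, 9]</code> corresponds to the three test cases above.
</p>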
<h3 id="e47f375aafe441949e057787a5d35178">
Composition law <a href="#e47f375aafe441949e057787a5d35178" title="permalink">#</a>
</h3>
<p>
As with the identity law, I'll start by suggesting a proof for the composition law for the Haskell instance:
</p>
<p>
<pre> invmap f2 f2' $ invmap f1 f1' (Endo h)
= { definition of invmap }
invmap f2 f2' $ Endo (f1 . h . f1')
= { definition of ($) }
invmap f2 f2' (Endo (f1 . h . f1'))
= { definition of invmap }
Endo (f2 . (f1 . h . f1') . f2')
= { associativity of composition (.) }
Endo ((f2 . f1) . h . (f1' . f2'))
= { definition of invmap }
invmap (f2 . f1) (f1' . f2') (Endo h)</pre>
</p>
<p>
As above, a C# example may also help. First, assume that you have some endomorphism like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">SecondIncrementer</span> : IEndomorphism<TimeSpan>
{
<span style="color:blue;">public</span> TimeSpan <span style="color:#74531f;">Run</span>(TimeSpan <span style="color:#1f377f;">x</span>)
{
<span style="color:#8f08c4;">return</span> x + TimeSpan.FromSeconds(1);
}
}</pre>
</p>
<p>
A test then demonstrates the composition law in action:
</p>
<p>
<pre>[Theory]
[InlineData(-3)]
[InlineData(0)]
[InlineData(11)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">CompositionLaw</span>(<span style="color:blue;">long</span> <span style="color:#1f377f;">x</span>)
{
IEndomorphism<TimeSpan> <span style="color:#1f377f;">i</span> = <span style="color:blue;">new</span> SecondIncrementer();
Func<TimeSpan, <span style="color:blue;">long</span>> <span style="color:#1f377f;">f1</span> = <span style="color:#1f377f;">ts</span> => ts.Ticks;
Func<<span style="color:blue;">long</span>, TimeSpan> <span style="color:#1f377f;">f1p</span> = <span style="color:#1f377f;">l</span> => <span style="color:blue;">new</span> TimeSpan(l);
Func<<span style="color:blue;">long</span>, IntPtr> <span style="color:#1f377f;">f2</span> = <span style="color:#1f377f;">l</span> => <span style="color:blue;">new</span> IntPtr(l);
Func<IntPtr, <span style="color:blue;">long</span>> <span style="color:#1f377f;">f2p</span> = <span style="color:#1f377f;">ip</span> => ip.ToInt64();
IEndomorphism<IntPtr> <span style="color:#1f377f;">left</span> = i.Select(f1, f1p).Select(f2, f2p);
IEndomorphism<IntPtr> <span style="color:#1f377f;">right</span> = i.Select(<span style="color:#1f377f;">ts</span> => f2(f1(ts)), <span style="color:#1f377f;">ip</span> => f1p(f2p(ip)));
Assert.Equal(left.Run(<span style="color:blue;">new</span> IntPtr(x)), right.Run(<span style="color:blue;">new</span> IntPtr(x)));
}</pre>
</p>
<p>
Don't try to make any sense of this. As outlined in the introductory article, in order to use an invariant functor, you're going to need an isomorphism. In order to demonstrate the composition law, you need <em>three</em> types that are isomorphic. Since you can convert back and forth between <code>TimeSpan</code> and <code>IntPtr</code> via <code>long</code>, this requirement is formally fulfilled. It doesn't make any sense to add a second to a value and then turn it into a function that changes a pointer. It sounds more like a security problem waiting to happen... Don't try this at home, kids.
</p>
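<p>
For comparison, a similar spot check can be sketched in Haskell. Here I assume three isomorphic types, <code>Integer</code>, <code>Sum Integer</code>, and <code>Product Integer</code>, chosen only because the wrappers are lossless. The function name <code>compositionLawHolds</code> is hypothetical, and the check only covers the supplied sample values:
</p>
<p>
<pre>import Data.Monoid (Endo (..), Product (..), Sum (..))

-- Local copy of the class, so that the sketch needs no extra packages.
class Invariant f where
  invmap :: (a -> b) -> (b -> a) -> f a -> f b

instance Invariant Endo where
  invmap f g (Endo h) = Endo (f . h . g)

-- Hypothetical helper: compares both sides of the law on sample inputs.
compositionLawHolds :: [Integer] -> Bool
compositionLawHolds =
  all (\x -> appEndo left (Product x) == appEndo right (Product x))
  where
    h = Endo (+ 1) :: Endo Integer
    -- First isomorphism: Integer and Sum Integer.
    f1 = Sum
    f1' = getSum
    -- Second isomorphism: Sum Integer and Product Integer.
    f2 = Product . getSum
    f2' = Sum . getProduct
    left = invmap f2 f2' (invmap f1 f1' h)
    right = invmap (f2 . f1) (f1' . f2') h</pre>
</p>
<p>
As in the C# test, <code>left</code> maps twice, while <code>right</code> maps once with the composed functions.
</p>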
<h3 id="ac1e8d3a75b54f9691c616755750ff77">
Conclusion <a href="#ac1e8d3a75b54f9691c616755750ff77" title="permalink">#</a>
</h3>
<p>
Since an endomorphism can be modelled as a 'generic type', it may look like a candidate for a functor or <a href="/2021/09/02/contravariant-functors">contravariant functor</a>, but alas, neither is possible. The best we can get (apart from <a href="/2017/11/13/endomorphism-monoid">a monoid instance</a>) is an invariant functor.
</p>
<p>
The invariant functor instance for an endomorphism turns out to be simple function composition. That's not how all invariant functors work, though.
</p>
<p>
<strong>Next:</strong> <a href="/2022/08/29/natural-transformations-as-invariant-functors">Natural transformations as invariant functors</a>.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.Invariant functorshttps://blog.ploeh.dk/2022/08/01/invariant-functors2022-08-01T05:49:00+00:00Mark Seemann
<div id="post">
<p>
<em>Containers that support mapping isomorphic values.</em>
</p>
<p>
This article series is part of <a href="/2018/03/19/functors-applicatives-and-friends">a larger series of articles about functors, applicatives, and other mappable containers</a>. So far, you've seen examples of both co- and <a href="/2021/09/02/contravariant-functors">contravariant functors</a>, including <a href="/2021/11/01/profunctors">profunctors</a>. You've also seen a few examples of <a href="/2020/10/19/monomorphic-functors">monomorphic functors</a> - mappable containers where there's no variance at all.
</p>
<p>
What happens, on the other hand, if you have a container of (generic) values, but it's neither co- nor contravariant? An <a href="https://en.wikipedia.org/wiki/Endomorphism">endomorphism</a> is an example - it's neither co- nor contravariant. You'll see a treatment of that in a later article.
</p>
<p>
Even if neither co- nor contravariant mappings exist for a container, all may not be lost. It may still be an <em>invariant functor</em>.
</p>
<h3 id="40d4c4a6deed4af593b3cf002563f085">
Invariance <a href="#40d4c4a6deed4af593b3cf002563f085" title="permalink">#</a>
</h3>
<p>
Consider a <a href="https://bartoszmilewski.com/2014/01/14/functors-are-containers">container</a> <code>f</code> (for <em>functor</em>). Depending on its variance, we call it <em>covariant</em>, <em>contravariant</em>, or <em>invariant</em>:
</p>
<ul>
<li><em>Covariance</em> means that any function <code>a -> b</code> can be lifted into a function <code>f a -> f b</code>.</li>
<li><em>Contravariance</em> means that any function <code>a -> b</code> can be lifted into a function <code>f b -> f a</code>.</li>
<li><em>Invariance</em> means that in general, no function <code>a -> b</code> can be lifted into a function over <code>f a</code>.</li>
</ul>
<p>
<em>In general</em>, that is. A limited escape hatch exists:
</p>
<blockquote>
<p>
"an invariant type [...] allows you to map from <code>a</code> to <code>b</code> if and only if <code>a</code> and <code>b</code> are isomorphic. In a very real sense, this isn't an interesting property - an isomorphism between <code>a</code> and <code>b</code> means they're already the same thing to begin with."
</p>
<footer><cite>Sandy Maguire, <a href="https://thinkingwithtypes.com/">Thinking with Types</a></cite></footer>
</blockquote>
<p>
In <a href="https://www.haskell.org/">Haskell</a> we may define an invariant functor (AKA <a href="http://comonad.com/reader/2008/rotten-bananas/">exponential functor</a>) as in the <a href="https://hackage.haskell.org/package/invariant">invariant</a> package:
</p>
<p>
<pre><span style="color:blue;">class</span> <span style="color:#2b91af;">Invariant</span> f <span style="color:blue;">where</span>
<span style="color:#2b91af;">invmap</span> :: (a <span style="color:blue;">-></span> b) <span style="color:blue;">-></span> (b <span style="color:blue;">-></span> a) <span style="color:blue;">-></span> f a <span style="color:blue;">-></span> f b</pre>
</p>
<p>
This means that an <em>invariant functor</em> <code>f</code> is a container of values where a translation from <code>f a</code> to <code>f b</code> exists if it's possible to translate contained values both ways: From <code>a</code> to <code>b</code>, and from <code>b</code> to <code>a</code>. Callers of the <code>invmap</code> function must supply translations that go both ways.
</p>
<h3 id="36e114d8871c46c9addb2e38936a2872">
Invariant functor in C# <a href="#36e114d8871c46c9addb2e38936a2872" title="permalink">#</a>
</h3>
<p>
It's possible to translate the concept to a language like C#. Since C# doesn't have higher-kinded types, we have to examine the abstraction as a set of patterns or templates. For <a href="/2018/03/22/functors">functors</a> and <a href="/2022/03/28/monads">monads</a>, the C# compiler can perform 'compile-time duck typing' to recognise these motifs to enable query syntax. For more advanced or exotic universal abstractions, such as <a href="/2018/12/24/bifunctors">bifunctors</a>, <a href="/2021/11/01/profunctors">profunctors</a>, or invariant functors, we have to use a concrete container type as a stand-in for 'any' functor. In this article, I'll call it <code><span style="color:#2b91af;">Invariant</span><<span style="color:#2b91af;">A</span>></code>.
</p>
<p>
Such a generic class must have a mapping function that corresponds to the above <code>invmap</code>. In C# it has this signature:
</p>
<p>
<pre><span style="color:blue;">public</span> Invariant<B> <span style="color:#74531f;">InvMap</span><<span style="color:#2b91af;">B</span>>(Func<A, B> <span style="color:#1f377f;">aToB</span>, Func<B, A> <span style="color:#1f377f;">bToA</span>)</pre>
</p>
<p>
In this example, <code>InvMap</code> is an instance method on <code><span style="color:#2b91af;">Invariant</span><<span style="color:#2b91af;">A</span>></code>. You may use it like this:
</p>
<p>
<pre>Invariant<<span style="color:blue;">long</span>> <span style="color:#1f377f;">il</span> = createInvariant();
Invariant<TimeSpan> <span style="color:#1f377f;">its</span> = il.InvMap(<span style="color:#1f377f;">l</span> => <span style="color:blue;">new</span> TimeSpan(l), <span style="color:#1f377f;">ts</span> => ts.Ticks);</pre>
</p>
<p>
It's not that easy to find good examples of truly isomorphic primitives, but <a href="https://docs.microsoft.com/dotnet/api/system.timespan">TimeSpan</a> is just a useful wrapper of <code>long</code>, so it's possible to translate back and forth without loss of information. To create a <code>TimeSpan</code> from a <code>long</code>, you can use the suitable constructor overload. To get a <code>long</code> from a <code>TimeSpan</code>, you can read the <a href="https://docs.microsoft.com/dotnet/api/system.timespan.ticks">Ticks</a> property.
</p>
<p>
Perhaps you find a method name like <code>InvMap</code> non-idiomatic in C#. Perhaps a more <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> name might be <code>Select</code>? That's not a problem:
</p>
<p>
<pre><span style="color:blue;">public</span> Invariant<B> <span style="color:#74531f;">Select</span><<span style="color:#2b91af;">B</span>>(Func<A, B> <span style="color:#1f377f;">aToB</span>, Func<B, A> <span style="color:#1f377f;">bToA</span>)
{
<span style="color:#8f08c4;">return</span> InvMap(aToB, bToA);
}</pre>
</p>
<p>
In that case, usage would look like this:
</p>
<p>
<pre>Invariant<<span style="color:blue;">long</span>> <span style="color:#1f377f;">il</span> = createInvariant();
Invariant<TimeSpan> <span style="color:#1f377f;">its</span> = il.Select(<span style="color:#1f377f;">l</span> => <span style="color:blue;">new</span> TimeSpan(l), <span style="color:#1f377f;">ts</span> => ts.Ticks);</pre>
</p>
<p>
In this article, I'll use <code>Select</code> in order to be consistent with C# naming conventions. Using that name, however, will not make query syntax light up. While the name is fine, the signature is not one that the C# compiler will recognise as enabling special syntax. The name does, however, suggest a kinship with a normal functor, where the mapping in C# is called <code>Select</code>.
</p>
<h3 id="4a5c0a55258d4e398931d2880df059fb">
Laws <a href="#4a5c0a55258d4e398931d2880df059fb" title="permalink">#</a>
</h3>
<p>
As is usual with these kinds of universal abstractions, an invariant functor must satisfy a few laws.
</p>
<p>
The first one we might call the <em>identity law:</em>
</p>
<p>
<pre>invmap id id = id</pre>
</p>
<p>
This law corresponds to the first functor law. When performing the mapping operation, if the values in the invariant functor are mapped to themselves, the result will be an unmodified functor.
</p>
<p>
In C# such a mapping might look like this:
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="color:#1f377f;">actual</span> = i.Select(<span style="color:#1f377f;">x</span> => x, <span style="color:#1f377f;">x</span> => x);</pre>
</p>
<p>
The law then says that <code>actual</code> should be equal to <code>i</code>.
</p>
<p>
The second law we might call the <em>composition law:</em>
</p>
<p>
<pre>invmap f2 f2' . invmap f1 f1' = invmap (f2 . f1) (f1' . f2')</pre>
</p>
<p>
Granted, this looks more complicated, but it also directly corresponds to the second functor law. If two mapping operations are performed one after the other, the result should be the same as a single mapping operation where the functions are composed.
</p>
<p>
In C# the left-hand side might look like this:
</p>
<p>
<pre>Invariant<IntPtr> <span style="color:#1f377f;">left</span> = i.Select(f1, f1p).Select(f2, f2p);</pre>
</p>
<p>
In C# you can't name functions or variables with a quotation mark (like the Haskell code's <code>f1'</code> and <code>f2'</code>), so instead I named them <code>f1p</code> and <code>f2p</code> (with a <em>p</em> for <em>prime</em>).
</p>
<p>
Likewise, the right-hand side might look like this:
</p>
<p>
<pre>Invariant<IntPtr> <span style="color:#1f377f;">right</span> = i.Select(<span style="color:#1f377f;">ts</span> => f2(f1(ts)), <span style="color:#1f377f;">ip</span> => f1p(f2p(ip)));</pre>
</p>
<p>
The composition law says that the <code>left</code> and <code>right</code> values must be equal.
</p>
<p>
You'll see some more detailed examples in later articles.
</p>
<h3 id="14c58dc57f9744dfbc955015d03b871e">
Examples <a href="#14c58dc57f9744dfbc955015d03b871e" title="permalink">#</a>
</h3>
<p>
This is all too abstract to seem useful in itself, so examples are warranted. You'll be able to peruse examples of specific invariant functors in separate articles:
</p>
<ul>
<li><a href="/2022/08/08/endomorphism-as-an-invariant-functor">Endomorphism as an invariant functor</a></li>
<li><a href="/2022/08/29/natural-transformations-as-invariant-functors">Natural transformations as invariant functors</a></li>
<li><a href="/2022/12/26/functors-as-invariant-functors">Functors as invariant functors</a></li>
<li><a href="/2023/02/06/contravariant-functors-as-invariant-functors">Contravariant functors as invariant functors</a></li>
</ul>
<p>
As two of the titles suggest, all functors are also invariant functors, and the same goes for contravariant functors:
</p>
<p>
<img src="/content/binary/invariant-functor-set-diagram.png" alt="Set diagram. The biggest set labelled invariant functors contains two other sets labelled functors and contravariant functors.">
</p>
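<p>
The idea behind that diagram can be sketched in a few lines of Haskell. This is only an illustration, using names I've chosen for the occasion: a covariant functor can ignore the function that goes 'backwards', and a contravariant functor can ignore the one that goes 'forwards'.
</p>
<p>
<pre>import Data.Functor.Contravariant (Contravariant (..), Predicate (..))
import Data.Monoid (Sum (..))

-- Any Functor supports an invmap-like function by ignoring b -> a.
invmapFunctor :: Functor f => (a -> b) -> (b -> a) -> f a -> f b
invmapFunctor f _ = fmap f

-- Any Contravariant functor supports one by ignoring a -> b.
invmapContravariant :: Contravariant f => (a -> b) -> (b -> a) -> f a -> f b
invmapContravariant _ g = contramap g

-- Hypothetical usage with a list (covariant)...
sums :: [Sum Integer]
sums = invmapFunctor Sum getSum [1, 2, 3]

-- ...and with a Predicate (contravariant).
evenSum :: Predicate (Sum Integer)
evenSum = invmapContravariant Sum getSum (Predicate even)</pre>
</p>
<p>
Both functions have the same shape as <code>invmap</code>, which is why every functor and every contravariant functor also fits inside the larger set.
</p>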
<p>
To be honest, invariant functors are exotic, and you're unlikely to need them except in the rarest cases. Still, I <em>did</em> run into a scenario where I needed an invariant functor instance to be able to perform <a href="/2022/09/05/the-state-pattern-and-the-state-monad">a particular sleight of hand</a>. The rabbit holes we sometimes fall into...
</p>
<h3 id="ffa0e597f09c40e7bcd1cabe808aa0d5">
Conclusion <a href="#ffa0e597f09c40e7bcd1cabe808aa0d5" title="permalink">#</a>
</h3>
<p>
Invariant functors form a set that contains both co- and contravariant functors, as well as some data structures that are neither. This is an exotic abstraction that you may never need. It did, however, get me out of a bind at one time.
</p>
<strong>Next:</strong> <a href="/2022/08/08/endomorphism-as-an-invariant-functor">Endomorphism as an invariant functor</a>.
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="faae4a8cbe294be8ac7596488d675483">
<div class="comment-author"><a href="https://about.me/tysonwilliams">Tyson Williams</a> <a href="#faae4a8cbe294be8ac7596488d675483">#</a></div>
<div class="comment-content">
<blockquote>
For <a href="/2018/03/22/functors">functors</a> and <a href="/2022/03/28/monads">monads</a>, the C# compiler can perform 'compile-time duck typing' to recognise these motifs to enable query syntax.
</blockquote>
<p>
Instead of 'compile-time duck typing', I think a better phrase to describe this is <a href="https://en.wikipedia.org/wiki/Structural_type_system">structural typing</a>.
</p>
</div>
<div class="comment-date">2022-09-17 16:20 UTC</div>
</div>
<div class="comment" id="98be00b677484b11a51d8e583c25baa5">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#98be00b677484b11a51d8e583c25baa5">#</a></div>
<div class="comment-content">
<p>
Tyson, thank you for writing. I wasn't aware of the term <em>structural typing</em>, so thank you for the link. I've now read that Wikipedia article, but all I know is what's there. Based on it, though, it looks as though <a href="https://fsharp.org">F#</a>'s <a href="https://learn.microsoft.com/dotnet/fsharp/language-reference/generics/statically-resolved-type-parameters">Statically Resolved Type Parameters</a> are another example of structural typing, in addition to the <a href="https://ocaml.org/">OCaml</a> example given in the article.
</p>
<p>
IIRC, <a href="https://www.purescript.org/">PureScript</a>'s <em>row polymorphism</em> may be another example, but it's been many years since <a href="/2017/06/06/fractal-trees-with-purescript">I played with it</a>. In other words, I could be mistaken.
</p>
<p>
Based on the Wikipedia article, it looks as though structural typing is more concerned with polymorphism, but granted, so is duck typing. Given how wrong 'compile-time duck typing' actually is in the above context, 'structural typing' seems more correct.
</p>
<p>
I may still stick with 'compile-time duck typing' as a loose metaphor, though, because most people know what duck typing is, whereas I'm not sure as many people know of structural typing. The purpose of the metaphor is, after all, to be helpful.
</p>
</div>
<div class="comment-date">2022-09-19 14:46 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.An applicative reservation validation example in C#https://blog.ploeh.dk/2022/07/25/an-applicative-reservation-validation-example-in-c2022-07-25T06:56:00+00:00Mark Seemann
<div id="post">
<p>
<em>How to return all relevant error messages in a composable way.</em>
</p>
<p>
I've previously suggested that <a href="/2020/12/14/validation-a-solved-problem">I consider validation a solved problem</a>. I still do, until someone disproves me with a counterexample. Here's a fairly straightforward <a href="/2018/11/05/applicative-validation">applicative validation</a> example in C#.
</p>
<p>
After corresponding and speaking with readers of <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a> I've learned that some readers have objections to the following lines of code:
</p>
<p>
<pre>Reservation? <span style="color:#1f377f;">reservation</span> = dto.Validate(id);
<span style="color:#8f08c4;">if</span> (reservation <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> BadRequestResult();</pre>
</p>
<p>
This code snippet demonstrates how to <a href="https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/">parse, not validate</a>, an incoming Data Transfer Object (DTO). This code base uses C#'s <a href="https://docs.microsoft.com/dotnet/csharp/nullable-references">nullable reference types</a> feature to distinguish between null and non-null objects. Other languages (and earlier versions of C#) can instead use <a href="/2022/04/25/the-maybe-monad">the Maybe monad</a>. Nothing in this article or the book hinges on the <em>nullable reference types</em> feature.
</p>
<p>
If the <code>Validate</code> method (which I really should have called <code>TryParse</code> instead) returns a null value, the Controller from which this code snippet is taken returns a <code>400 Bad Request</code> response.
</p>
<p>
The <code>Validate</code> method is an instance method on the DTO class:
</p>
<p>
<pre><span style="color:blue;">internal</span> Reservation? <span style="color:#74531f;">Validate</span>(Guid <span style="color:#1f377f;">id</span>)
{
<span style="color:#8f08c4;">if</span> (!DateTime.TryParse(At, <span style="color:blue;">out</span> var <span style="color:#1f377f;">d</span>))
<span style="color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
<span style="color:#8f08c4;">if</span> (Email <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
<span style="color:#8f08c4;">if</span> (Quantity < 1)
<span style="color:#8f08c4;">return</span> <span style="color:blue;">null</span>;
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> Reservation(
id,
d,
<span style="color:blue;">new</span> Email(Email),
<span style="color:blue;">new</span> Name(Name ?? <span style="color:#a31515;">""</span>),
Quantity);
}</pre>
</p>
<p>
What irks some readers is the loss of information. While <code>Validate</code> 'knows' why it's rejecting a candidate, that information is lost and no error message is communicated to unfortunate HTTP clients.
</p>
<p>
One email from a reader went on about this for quite some time and I got the impression that the sender considered this such a grave flaw that it invalidates the entire book.
</p>
<p>
That's not the case.
</p>
<h3 id="26803ee560dd42be825ec266694d0211">
Rabbit hole, evaded <a href="#26803ee560dd42be825ec266694d0211" title="permalink">#</a>
</h3>
<p>
When I wrote code like the above, I was fully aware of trade-offs and priorities. I understood that this particular design would mean that clients get no information about <em>why</em> a particular reservation JSON document is rejected - only that it is.
</p>
<p>
This was a simplification that I explicitly decided to make for educational reasons.
</p>
<p>
The above design is based on something as simple as a null check. I expect all my readers to be able to follow that code. As hinted above, you could also model a method like <code>Validate</code> with the Maybe monad, but while Maybe preserves success cases, it throws away all information about errors. In a production system, this is rarely acceptable, but I found it acceptable for the example code in the book, since this isn't the main topic.
</p>
<p>
Instead of basing the design on nullable reference types or the Maybe monad, you can instead base parsing on applicative validation. In order to explain that, I'd first need to explain <a href="/2018/03/22/functors">functors</a>, <a href="/2018/10/01/applicative-functors">applicative functors</a>, and applicative validation. It might also prove helpful to the reader to explain <a href="/2018/05/22/church-encoding">Church encodings</a>, <a href="/2018/12/24/bifunctors">bifunctors</a>, and <a href="/2017/11/27/semigroups">semigroups</a>. That's quite a rabbit hole to fall into, and I felt that it would be such a big digression from the themes of the book that I decided not to go there.
</p>
<p>
On this blog, however, I have all the space and time I'd like. I can digress as much as I'd like. Most of that digression has already happened. Those articles are already on the blog. I'm going to assume that you've read all of the articles I just linked, or that you understand these concepts.
</p>
<p>
In this article, I'm going to rewrite the DTO parser to also return error messages. It's an entirely local change that breaks no existing tests.
</p>
<h3 id="5f6a7326fb484c1d8b4402b24e49d13b">
Validated <a href="#5f6a7326fb484c1d8b4402b24e49d13b" title="permalink">#</a>
</h3>
<p>
Most functional programmers are already aware of the <a href="/2022/05/09/an-either-monad">Either monad</a>. They often reach for it when they need to expand the Maybe monad with <a href="https://fsharpforfunandprofit.com/posts/recipe-part2/">an error track</a>.
</p>
<p>
The problem with the Either monad is, however, that it short-circuits error handling. It's like throwing exceptions. As soon as an Either composition hits the first error, it stops processing the rest of the data. As a caller, you only get one error message, even if there's more than one thing wrong with your input value.
</p>
<p>
In a distributed system where a client posts a document to a service, you'd like to respond with a collection of errors.
</p>
<p>
You can do this with a data type that's isomorphic with Either, but behaves differently as an applicative functor. Instead of short-circuiting on the first error, it collects them. This, however, turns out to be incompatible with the Either monad's short-circuiting behaviour, so this data structure is usually not given monadic features.
</p>
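<p>
Before turning to C#, a minimal Haskell sketch may illustrate the difference. It's not the code used in this article, only an approximation of the idea under the assumption that failures are combined with a <code>Semigroup</code>. The two validation functions and their error messages are hypothetical:
</p>
<p>
<pre>-- Isomorphic to Either, but with a different Applicative instance.
data Validation e a = Failure e | Success a
  deriving (Eq, Show)

instance Functor (Validation e) where
  fmap _ (Failure e) = Failure e
  fmap f (Success a) = Success (f a)

instance Semigroup e => Applicative (Validation e) where
  pure = Success
  -- When both sides fail, keep both failures instead of only the first.
  Failure e1 <*> Failure e2 = Failure (e1 <> e2)
  Failure e1 <*> Success _ = Failure e1
  Success _ <*> Failure e2 = Failure e2
  Success f <*> Success a = Success (f a)

-- Hypothetical validation functions with made-up messages.
validateEmail :: Maybe String -> Validation [String] String
validateEmail = maybe (Failure ["Email is required."]) Success

validateQuantity :: Int -> Validation [String] Int
validateQuantity q
  | q < 1 = Failure ["Quantity must be a positive integer."]
  | otherwise = Success q

-- Combines both validations and reports every failure.
validateBoth :: Maybe String -> Int -> Validation [String] (String, Int)
validateBoth email q = (,) <$> validateEmail email <*> validateQuantity q</pre>
</p>
<p>
With this instance, <code>validateBoth Nothing 0</code> produces a <code>Failure</code> that contains both messages, whereas the corresponding Either-based composition would stop at the first one.
</p>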
<p>
This data type is usually called <code>Validation</code>, but when I translated that to C# various static code analysis rules lit up, claiming that there was already a referenced namespace called <code>Validation</code>. Instead, I decided to call the type <code><span style="color:#2b91af;">Validated</span><<span style="color:#2b91af;">F</span>, <span style="color:#2b91af;">S</span>></code>, which I like better anyway.
</p>
<p>
The type arguments are <code>F</code> for <em>failure</em> and <code>S</code> for <em>success</em>. I've put <code>F</code> before <code>S</code> because by convention that's how Either works.
</p>
<p>
I'm using an encapsulated variation of a Church encoding and a series of <code>Apply</code> overloads as described in the article <a href="/2018/10/15/an-applicative-password-list">An applicative password list</a>. There's quite a bit of boilerplate, so I'll just dump the entire contents of the file here instead of tiring you with a detailed walk-through:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Validated</span><<span style="color:#2b91af;">F</span>, <span style="color:#2b91af;">S</span>>
{
<span style="color:blue;">private</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IValidation</span>
{
T <span style="color:#74531f;">Match</span><<span style="color:#2b91af;">T</span>>(Func<F, T> <span style="color:#1f377f;">onFailure</span>, Func<S, T> <span style="color:#1f377f;">onSuccess</span>);
}
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IValidation imp;
<span style="color:blue;">private</span> <span style="color:#2b91af;">Validated</span>(IValidation <span style="color:#1f377f;">imp</span>)
{
<span style="color:blue;">this</span>.imp = imp;
}
<span style="color:blue;">internal</span> <span style="color:blue;">static</span> Validated<F, S> <span style="color:#74531f;">Succeed</span>(S <span style="color:#1f377f;">success</span>)
{
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> Validated<F, S>(<span style="color:blue;">new</span> Success(success));
}
<span style="color:blue;">internal</span> <span style="color:blue;">static</span> Validated<F, S> <span style="color:#74531f;">Fail</span>(F <span style="color:#1f377f;">failure</span>)
{
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> Validated<F, S>(<span style="color:blue;">new</span> Failure(failure));
}
<span style="color:blue;">public</span> T <span style="color:#74531f;">Match</span><<span style="color:#2b91af;">T</span>>(Func<F, T> <span style="color:#1f377f;">onFailure</span>, Func<S, T> <span style="color:#1f377f;">onSuccess</span>)
{
<span style="color:#8f08c4;">return</span> imp.Match(onFailure, onSuccess);
}
<span style="color:blue;">public</span> Validated<F1, S1> <span style="color:#74531f;">SelectBoth</span><<span style="color:#2b91af;">F1</span>, <span style="color:#2b91af;">S1</span>>(
Func<F, F1> <span style="color:#1f377f;">selectFailure</span>,
Func<S, S1> <span style="color:#1f377f;">selectSuccess</span>)
{
<span style="color:#8f08c4;">return</span> Match(
<span style="color:#1f377f;">f</span> => Validated.Fail<F1, S1>(selectFailure(f)),
<span style="color:#1f377f;">s</span> => Validated.Succeed<F1, S1>(selectSuccess(s)));
}
<span style="color:blue;">public</span> Validated<F1, S> <span style="color:#74531f;">SelectFailure</span><<span style="color:#2b91af;">F1</span>>(
Func<F, F1> <span style="color:#1f377f;">selectFailure</span>)
{
<span style="color:#8f08c4;">return</span> SelectBoth(selectFailure, <span style="color:#1f377f;">s</span> => s);
}
<span style="color:blue;">public</span> Validated<F, S1> <span style="color:#74531f;">SelectSuccess</span><<span style="color:#2b91af;">S1</span>>(
Func<S, S1> <span style="color:#1f377f;">selectSuccess</span>)
{
<span style="color:#8f08c4;">return</span> SelectBoth(<span style="color:#1f377f;">f</span> => f, selectSuccess);
}
<span style="color:blue;">public</span> Validated<F, S1> <span style="color:#74531f;">Select</span><<span style="color:#2b91af;">S1</span>>(
Func<S, S1> <span style="color:#1f377f;">selector</span>)
{
<span style="color:#8f08c4;">return</span> SelectSuccess(selector);
}
<span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Success</span> : IValidation
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> S success;
<span style="color:blue;">public</span> <span style="color:#2b91af;">Success</span>(S <span style="color:#1f377f;">success</span>)
{
<span style="color:blue;">this</span>.success = success;
}
<span style="color:blue;">public</span> T <span style="color:#74531f;">Match</span><<span style="color:#2b91af;">T</span>>(
Func<F, T> <span style="color:#1f377f;">onFailure</span>,
Func<S, T> <span style="color:#1f377f;">onSuccess</span>)
{
<span style="color:#8f08c4;">return</span> onSuccess(success);
}
}
<span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Failure</span> : IValidation
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> F failure;
<span style="color:blue;">public</span> <span style="color:#2b91af;">Failure</span>(F <span style="color:#1f377f;">failure</span>)
{
<span style="color:blue;">this</span>.failure = failure;
}
<span style="color:blue;">public</span> T <span style="color:#74531f;">Match</span><<span style="color:#2b91af;">T</span>>(
Func<F, T> <span style="color:#1f377f;">onFailure</span>,
Func<S, T> <span style="color:#1f377f;">onSuccess</span>)
{
<span style="color:#8f08c4;">return</span> onFailure(failure);
}
}
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Validated</span>
{
<span style="color:blue;">public</span> <span style="color:blue;">static</span> Validated<F, S> <span style="color:#74531f;">Succeed</span><<span style="color:#2b91af;">F</span>, <span style="color:#2b91af;">S</span>>(
S <span style="color:#1f377f;">success</span>)
{
<span style="color:#8f08c4;">return</span> Validated<F, S>.Succeed(success);
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> Validated<F, S> <span style="color:#74531f;">Fail</span><<span style="color:#2b91af;">F</span>, <span style="color:#2b91af;">S</span>>(
F <span style="color:#1f377f;">failure</span>)
{
<span style="color:#8f08c4;">return</span> Validated<F, S>.Fail(failure);
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> Validated<F, S> <span style="color:#74531f;">Apply</span><<span style="color:#2b91af;">F</span>, <span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">S</span>>(
<span style="color:blue;">this</span> Validated<F, Func<T, S>> <span style="color:#1f377f;">selector</span>,
Validated<F, T> <span style="color:#1f377f;">source</span>,
Func<F, F, F> <span style="color:#1f377f;">combine</span>)
{
<span style="color:#8f08c4;">if</span> (selector <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="color:#8f08c4;">throw</span> <span style="color:blue;">new</span> ArgumentNullException(nameof(selector));
<span style="color:#8f08c4;">return</span> selector.Match(
<span style="color:#1f377f;">f1</span> => source.Match(
<span style="color:#1f377f;">f2</span> => Fail<F, S>(combine(f1, f2)),
<span style="color:#1f377f;">_</span> => Fail<F, S>(f1)),
<span style="color:#1f377f;">map</span> => source.Match(
<span style="color:#1f377f;">f2</span> => Fail<F, S>(f2),
<span style="color:#1f377f;">x</span> => Succeed<F, S>(map(x))));
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> Validated<F, Func<T2, S>> <span style="color:#74531f;">Apply</span><<span style="color:#2b91af;">F</span>, <span style="color:#2b91af;">T1</span>, <span style="color:#2b91af;">T2</span>, <span style="color:#2b91af;">S</span>>(
<span style="color:blue;">this</span> Validated<F, Func<T1, T2, S>> <span style="color:#1f377f;">selector</span>,
Validated<F, T1> <span style="color:#1f377f;">source</span>,
Func<F, F, F> <span style="color:#1f377f;">combine</span>)
{
<span style="color:#8f08c4;">if</span> (selector <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="color:#8f08c4;">throw</span> <span style="color:blue;">new</span> ArgumentNullException(nameof(selector));
<span style="color:#8f08c4;">return</span> selector.Match(
<span style="color:#1f377f;">f1</span> => source.Match(
<span style="color:#1f377f;">f2</span> => Fail<F, Func<T2, S>>(combine(f1, f2)),
<span style="color:#1f377f;">_</span> => Fail<F, Func<T2, S>>(f1)),
<span style="color:#1f377f;">map</span> => source.Match(
<span style="color:#1f377f;">f2</span> => Fail<F, Func<T2, S>>(f2),
<span style="color:#1f377f;">x</span> => Succeed<F, Func<T2, S>>(<span style="color:#1f377f;">y</span> => map(x, y))));
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> Validated<F, Func<T2, T3, S>> <span style="color:#74531f;">Apply</span><<span style="color:#2b91af;">F</span>, <span style="color:#2b91af;">T1</span>, <span style="color:#2b91af;">T2</span>, <span style="color:#2b91af;">T3</span>, <span style="color:#2b91af;">S</span>>(
<span style="color:blue;">this</span> Validated<F, Func<T1, T2, T3, S>> <span style="color:#1f377f;">selector</span>,
Validated<F, T1> <span style="color:#1f377f;">source</span>,
Func<F, F, F> <span style="color:#1f377f;">combine</span>)
{
<span style="color:#8f08c4;">if</span> (selector <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="color:#8f08c4;">throw</span> <span style="color:blue;">new</span> ArgumentNullException(nameof(selector));
<span style="color:#8f08c4;">return</span> selector.Match(
<span style="color:#1f377f;">f1</span> => source.Match(
<span style="color:#1f377f;">f2</span> => Fail<F, Func<T2, T3, S>>(combine(f1, f2)),
<span style="color:#1f377f;">_</span> => Fail<F, Func<T2, T3, S>>(f1)),
<span style="color:#1f377f;">map</span> => source.Match(
<span style="color:#1f377f;">f2</span> => Fail<F, Func<T2, T3, S>>(f2),
<span style="color:#1f377f;">x</span> => Succeed<F, Func<T2, T3, S>>((<span style="color:#1f377f;">y</span>, <span style="color:#1f377f;">z</span>) => map(x, y, z))));
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> Validated<F, Func<T2, T3, S>> <span style="color:#74531f;">Apply</span><<span style="color:#2b91af;">F</span>, <span style="color:#2b91af;">T1</span>, <span style="color:#2b91af;">T2</span>, <span style="color:#2b91af;">T3</span>, <span style="color:#2b91af;">S</span>>(
<span style="color:blue;">this</span> Func<T1, T2, T3, S> <span style="color:#1f377f;">map</span>,
Validated<F, T1> <span style="color:#1f377f;">source</span>,
Func<F, F, F> <span style="color:#1f377f;">combine</span>)
{
<span style="color:#8f08c4;">return</span> Apply(
Succeed<F, Func<T1, T2, T3, S>>((<span style="color:#1f377f;">x</span>, <span style="color:#1f377f;">y</span>, <span style="color:#1f377f;">z</span>) => map(x, y, z)),
source,
combine);
}
}</pre>
</p>
<p>
I only added the <code>Apply</code> overloads that I needed for the following demo code. As stated above, I'm not going to launch into a detailed walk-through, since the code follows the concepts outlined in the various articles I've already mentioned. If there's something that you'd like me to explain, then please <a href="https://github.com/ploeh/ploeh.github.com#comments">leave a comment</a>.
</p>
<p>
Notice that <code><span style="color:#2b91af;">Validated</span><<span style="color:#2b91af;">F</span>, <span style="color:#2b91af;">S</span>></code> has no <code>SelectMany</code> method. It's deliberately not a <a href="/2022/03/28/monads">monad</a>, because monadic <em>bind</em> (<code>SelectMany</code>) would conflict with the applicative functor implementation.
</p>
<h3 id="533e728019834b22b323ffab10f4cae8">
Individual parsers <a href="#533e728019834b22b323ffab10f4cae8" title="permalink">#</a>
</h3>
<p>
An essential quality of applicative validation is that it's composable. This means that you can compose a larger, more complex parser from smaller ones. Parsing a <code><span style="color:#2b91af;">ReservationDto</span></code> object, for example, involves parsing the date and time of the reservation, the email address, and the quantity. Here's how to parse the date and time:
</p>
<p>
<pre><span style="color:blue;">private</span> Validated<<span style="color:blue;">string</span>, DateTime> <span style="color:#74531f;">TryParseAt</span>()
{
<span style="color:#8f08c4;">if</span> (!DateTime.TryParse(At, <span style="color:blue;">out</span> var <span style="color:#1f377f;">d</span>))
<span style="color:#8f08c4;">return</span> Validated.Fail<<span style="color:blue;">string</span>, DateTime>(<span style="color:#a31515;">$"Invalid date or time: </span>{At}<span style="color:#a31515;">."</span>);
<span style="color:#8f08c4;">return</span> Validated.Succeed<<span style="color:blue;">string</span>, DateTime>(d);
}</pre>
</p>
<p>
In order to keep things simple I'm going to use strings for error messages. You could instead decide to encode error conditions as a <a href="https://en.wikipedia.org/wiki/Tagged_union">sum type</a> or other polymorphic type. This would be appropriate if you also need to be able to make programmatic decisions based on individual error conditions, or if you need to translate the error messages to more than one language.
</p>
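<p>
As a minimal sketch of the sum-type alternative, error conditions could be modelled as a small hierarchy of records and only translated to text at the boundary. The names <code>ReservationError</code>, <code>InvalidDate</code>, <code>MissingEmail</code>, <code>NonPositiveQuantity</code>, and <code>ToMessage</code> are hypothetical and not part of the code base, and the sketch assumes C# 9 or later:
</p>
<p>
<pre>using System;

// Translate an error to English only at the boundary.
Console.WriteLine(
    ReservationErrors.ToMessage(new ReservationError.NonPositiveQuantity(-1)));

// Hypothetical sum type; one case per error condition.
public abstract record ReservationError
{
    public sealed record InvalidDate(string Candidate) : ReservationError;
    public sealed record MissingEmail : ReservationError;
    public sealed record NonPositiveQuantity(int Quantity) : ReservationError;
}

public static class ReservationErrors
{
    public static string ToMessage(ReservationError error) => error switch
    {
        ReservationError.InvalidDate e =>
            $"Invalid date or time: {e.Candidate}.",
        ReservationError.MissingEmail =>
            "Email address is missing.",
        ReservationError.NonPositiveQuantity e =>
            $"Quantity must be a positive integer, but was: {e.Quantity}.",
        // Required because the compiler can't prove the hierarchy is closed.
        _ => throw new ArgumentOutOfRangeException(nameof(error))
    };
}</pre>
</p>
<p>
With such a type, client code can branch on the case instead of inspecting strings, and a second translation function could produce messages in another language.
</p>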
<p>
The <code><span style="color:#74531f;">TryParseAt</span></code> function only attempts to parse the <code>At</code> property to a <code>DateTime</code> value. If parsing fails, it returns a <code><span style="color:#2b91af;">Failure</span></code> value with a helpful error message; otherwise, it wraps the parsed date and time in a <code><span style="color:#2b91af;">Success</span></code> value.
</p>
<p>
Parsing the email address is similar:
</p>
<p>
<pre><span style="color:blue;">private</span> Validated<<span style="color:blue;">string</span>, Email> <span style="color:#74531f;">TryParseEmail</span>()
{
<span style="color:#8f08c4;">if</span> (Email <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="color:#8f08c4;">return</span> Validated.Fail<<span style="color:blue;">string</span>, Email>(<span style="color:#a31515;">$"Email address is missing."</span>);
<span style="color:#8f08c4;">return</span> Validated.Succeed<<span style="color:blue;">string</span>, Email>(<span style="color:blue;">new</span> Email(Email));
}</pre>
</p>
<p>
As is parsing the quantity:
</p>
<p>
<pre><span style="color:blue;">private</span> Validated<<span style="color:blue;">string</span>, <span style="color:blue;">int</span>> <span style="color:#74531f;">TryParseQuantity</span>()
{
<span style="color:#8f08c4;">if</span> (Quantity < 1)
<span style="color:#8f08c4;">return</span> Validated.Fail<<span style="color:blue;">string</span>, <span style="color:blue;">int</span>>(
<span style="color:#a31515;">$"Quantity must be a positive integer, but was: </span>{Quantity}<span style="color:#a31515;">."</span>);
<span style="color:#8f08c4;">return</span> Validated.Succeed<<span style="color:blue;">string</span>, <span style="color:blue;">int</span>>(Quantity);
}</pre>
</p>
<p>
There's no reason to create a parser for the reservation name, because if the name doesn't exist, the code instead uses the empty string. That operation can't fail.
</p>
<h3 id="48f0650702f1490392838bf3c5c88cd1">
Composition <a href="#48f0650702f1490392838bf3c5c88cd1" title="permalink">#</a>
</h3>
<p>
You can now use applicative composition to reuse those individual parsers in a more complex parser:
</p>
<p>
<pre><span style="color:blue;">internal</span> Validated<<span style="color:blue;">string</span>, Reservation> <span style="color:#74531f;">TryParse</span>(Guid <span style="color:#1f377f;">id</span>)
{
Func<DateTime, Email, <span style="color:blue;">int</span>, Reservation> <span style="color:#1f377f;">createReservation</span> =
(<span style="color:#1f377f;">at</span>, <span style="color:#1f377f;">email</span>, <span style="color:#1f377f;">quantity</span>) =>
<span style="color:blue;">new</span> Reservation(id, at, email, <span style="color:blue;">new</span> Name(Name ?? <span style="color:#a31515;">""</span>), quantity);
Func<<span style="color:blue;">string</span>, <span style="color:blue;">string</span>, <span style="color:blue;">string</span>> <span style="color:#1f377f;">combine</span> =
(<span style="color:#1f377f;">x</span>, <span style="color:#1f377f;">y</span>) => <span style="color:blue;">string</span>.Join(Environment.NewLine, x, y);
<span style="color:#8f08c4;">return</span> createReservation
.Apply(TryParseAt(), combine)
.Apply(TryParseEmail(), combine)
.Apply(TryParseQuantity(), combine);
}</pre>
</p>
<p>
<code><span style="color:#1f377f;">createReservation</span></code> is a local function that closes over <code>id</code> and <code>Name</code>. Specifically, it uses the null coalescing operator (<code>??</code>) to turn a null name into the empty string. On the other hand, it takes <code>at</code>, <code>email</code>, and <code>quantity</code> as inputs, since these are the values that must first be parsed.
</p>
<p>
A type like <code><span style="color:#2b91af;">Validated</span><<span style="color:#2b91af;">F</span>, <span style="color:#2b91af;">S</span>></code> is only an applicative functor when the failure dimension (<code>F</code>) gives rise to a semigroup. The way I've modelled it here is as a binary operation that you need to pass as a parameter to each <code>Apply</code> overload. This seems awkward, but is good enough for a proof of concept.
</p>
<p>
The <code><span style="color:#1f377f;">combine</span></code> function joins two strings together, separated by a line break.
</p>
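<p>
The semigroup requirement boils down to associativity: it mustn't matter how the error messages are grouped when they're combined. The following sketch checks this for a function like <code>combine</code>, using three sample messages chosen for illustration. It's a spot check under those assumptions, not a proof:
</p>
<p>
<pre>using System;

Func<string, string, string> combine =
    (x, y) => string.Join(Environment.NewLine, x, y);

var a = "Invalid date or time: large.";
var b = "Email address is missing.";
var c = "Quantity must be a positive integer, but was: -1.";

// Associativity: both groupings produce the same three-line string.
var left = combine(combine(a, b), c);
var right = combine(a, combine(b, c));

Console.WriteLine(left == right); // True</pre>
</p>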
<p>
The <code><span style="color:#74531f;">TryParse</span></code> function composes <code><span style="color:#1f377f;">createReservation</span></code> with <code>TryParseAt</code>, <code>TryParseEmail</code>, and <code>TryParseQuantity</code> using the various <code>Apply</code> overloads. The combination is a <code>Validated</code> value that's either a failure string or a properly encapsulated <code>Reservation</code> object.
</p>
<ins datetime="2023-06-25T20:06Z">
<p>
One thing that I still don't like about this function is that it takes an <code>id</code> parameter. For an article about why that is a problem, and what to do about it, see <a href="/2022/09/12/coalescing-dtos">Coalescing DTOs</a>.
</p>
</ins>
<h3 id="04091776345e4f4b8ba4d446d8d80376">
Using the parser <a href="#04091776345e4f4b8ba4d446d8d80376" title="permalink">#</a>
</h3>
<p>
Client code can now invoke the <code>TryParse</code> function on the DTO. Here is the code inside the <code><span style="color:#74531f;">Post</span></code> method on the <code><span style="color:#2b91af;">ReservationsController</span></code> class:
</p>
<p>
<pre>[HttpPost(<span style="color:#a31515;">"restaurants/{restaurantId}/reservations"</span>)]
<span style="color:blue;">public</span> Task<ActionResult> <span style="color:#74531f;">Post</span>(<span style="color:blue;">int</span> <span style="color:#1f377f;">restaurantId</span>, ReservationDto <span style="color:#1f377f;">dto</span>)
{
<span style="color:#8f08c4;">if</span> (dto <span style="color:blue;">is</span> <span style="color:blue;">null</span>)
<span style="color:#8f08c4;">throw</span> <span style="color:blue;">new</span> ArgumentNullException(nameof(dto));
<span style="color:blue;">var</span> <span style="color:#1f377f;">id</span> = dto.ParseId() ?? Guid.NewGuid();
<span style="color:blue;">var</span> <span style="color:#1f377f;">parseResult</span> = dto.TryParse(id);
<span style="color:#8f08c4;">return</span> parseResult.Match(
<span style="color:#1f377f;">msgs</span> => Task.FromResult<ActionResult>(<span style="color:blue;">new</span> BadRequestObjectResult(msgs)),
<span style="color:#1f377f;">reservation</span> => TryCreate(restaurantId, reservation));
}</pre>
</p>
<p>
When the <code>parseResult</code> matches a failure, it returns a <code><span style="color:blue;">new</span> BadRequestObjectResult</code> with all collected error messages. When, on the other hand, it matches a success, it invokes the <code><span style="color:#74531f;">TryCreate</span></code> helper method with the parsed <code>reservation</code>.
</p>
<h3 id="d260c38d1dd444e291868e377732fd24">
HTTP request and response <a href="#d260c38d1dd444e291868e377732fd24" title="permalink">#</a>
</h3>
<p>
A client will now receive all relevant error messages if it posts a malformed reservation:
</p>
<p>
<pre>POST /restaurants/1/reservations?sig=1WiLlS5705bfsffPzaFYLwntrS4FCjE5CLdaeYTHxxg%3D HTTP/1.1
Content-Type: application/json
{ <span style="color:#2e75b6;">"at"</span>: <span style="color:#a31515;">"large"</span>, <span style="color:#2e75b6;">"name"</span>: <span style="color:#a31515;">"Kerry Onn"</span>, <span style="color:#2e75b6;">"quantity"</span>: -1 }
HTTP/1.1 400 Bad Request
Invalid date or time: large.
Email address is missing.
Quantity must be a positive integer, but was: -1.</pre>
</p>
<p>
Of course, if only a single element is wrong, only that error message will appear.
</p>
<h3 id="fada4ef8e64648b5a2d90e47082acfde">
Conclusion <a href="#fada4ef8e64648b5a2d90e47082acfde" title="permalink">#</a>
</h3>
<p>
The changes described in this article were entirely local to the two involved types: <code><span style="color:#2b91af;">ReservationsController</span></code> and <code><span style="color:#2b91af;">ReservationDto</span></code>. Once I'd expanded <code><span style="color:#2b91af;">ReservationDto</span></code> with the <code><span style="color:#74531f;">TryParse</span></code> function and its helper functions, and changed <code><span style="color:#2b91af;">ReservationsController</span></code> accordingly, the rest of the code base compiled and all tests passed. The point is that this isn't a big change, and that's why I believe that the original design (returning null or non-null) doesn't invalidate anything else I had to say in the book.
</p>
<p>
The change did, however, take quite a bit of boilerplate code, as witnessed by the <code>Validated</code> code dump. That API is, on the other hand, completely reusable, and you can find packages on the internet that already implement this functionality. It's not much of a burden in terms of extra code, but it would have taken a couple of extra chapters to explain in the book. It could easily have been double the size if I had to include material about functors, applicative functors, semigroups, Church encoding, etcetera.
</p>
<p>
To fix two lines of code, I didn't think that was warranted. After all, it's not a major blocker. On the contrary, validation is a solved problem.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="22b961d6ecb84e6f8af7232b09a445fc">
<div class="comment-author">Dan Carter <a href="#22b961d6ecb84e6f8af7232b09a445fc">#</a></div>
<div class="comment-content">
<blockquote>you can find packages on the internet that already implement this functionality</blockquote>
<p>
Do you have any recommendations for a library that implements the <code>Validated<F, S></code> type?
</p>
</div>
<div class="comment-date">2022-08-15 11:15 UTC</div>
</div>
<div class="comment" id="03273eecd6e749188106e8caef68d82e">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#03273eecd6e749188106e8caef68d82e">#</a></div>
<div class="comment-content">
<p>
Dan, thank you for writing. The following is not a recommendation, but the most comprehensive C# library for functional programming currently seems to be <a href="https://github.com/louthy/language-ext">LanguageExt</a>, which includes a <a href="https://louthy.github.io/language-ext/LanguageExt.Core/Monads/Alternative%20Value%20Monads/Validation/index.html">Validation</a> functor.
</p>
<p>
I'm neither recommending nor arguing against LanguageExt.
</p>
<ul>
<li>I've never used it in a real-world code base.</li>
<li>I've been answering questions about it on <a href="https://stackoverflow.com">Stack Overflow</a>. In general, it seems to stump C# developers, since it's very Haskellish and quite advanced.</li>
<li>Today is just a point in time. Libraries come and go.</li>
</ul>
<p>
Since all the ideas presented in these articles are universal abstractions, you can safely and easily implement them yourself, instead of taking a dependency on a third-party library. If you stick with lawful implementations, the only variation possible is with naming. Do you call a functor like this one <code>Validation</code>, <code>Validated</code>, or something else? Do you call monadic <em>bind</em> <code>SelectMany</code> or <code>Bind</code>? Will you have a <code>Flatten</code> or a <code>Join</code> function?
</p>
<p>
When working with teams that are new to these things, I usually start by adding these concepts as source code as they become useful. If a type like <code>Maybe</code> or <code>Validated</code> starts to proliferate, sooner or later you'll need to move it to a shared library so that multiple in-house libraries can use the type to communicate results across library boundaries. Eventually, you may decide to move such a dependency to a NuGet package. You can, at such time, decide to use an existing library instead of your own.
</p>
<p>
The maintenance burden for these kinds of libraries is low, since the APIs and behaviour are defined and locked in advance by mathematics.
</p>
</div>
<div class="comment-date">2022-08-16 5:54 UTC</div>
</div>
<div class="comment" id="fa7fc6662aee4889bcc9f84bc5db1b39">
<div class="comment-author"><a href="https://about.me/tysonwilliams">Tyson Williams</a> <a href="#fa7fc6662aee4889bcc9f84bc5db1b39">#</a></div>
<div class="comment-content">
<blockquote>
If you stick with lawful implementations, the only variation possible is with naming.
</blockquote>
<p>
There are also language-specific choices that can vary.
</p>
<p>
One example involves applicative functors in C#. The "standard" API for applicative functors works well in <a href="https://blog.ploeh.dk/2018/10/01/applicative-functors/#6173a96aedaa4e97ad868c159e6d06fa">Haskell</a> and <a href="https://blog.ploeh.dk/2018/10/01/applicative-functors/#6f9b2f13951e475d8c4fc9682ad96a94">F#</a> because it is designed to be used with curried functions, and both of those languages curry their functions by default. In contrast, <a href="https://blog.ploeh.dk/2018/10/01/applicative-functors/#cef395ee19644f30bfd1ad7a84b6f912:~:text=Applicative%20functors%20push%20the%20limits%20of%20what%20you%20can%20express%20in%20C%23">applicative functors push the limits of what you can express in C#</a>. I am impressed with <a href="https://github.com/louthy/language-ext/blob/ba16d97d4909067222c8b134a80bfd6b7e54c424/LanguageExt.Tests/ValidationTests.cs#L277">the design that Language Ext uses for applicative functors</a>, which is an extension method on a (value) tuple of applicative functor instances that accepts a lambda expression that is given all the "unwrapped" values "inside" the applicative functors.
</p>
<p>
Another example involves monads in TypeScript. To avoid the <a href="https://en.wikipedia.org/wiki/Pyramid_of_doom_(programming)">Pyramid of doom</a> when performing a sequence of monadic operations, Haskell has <a href="https://en.wikibooks.org/wiki/Haskell/do_notation">do notation</a> and F# has <a href="https://learn.microsoft.com/en-us/dotnet/fsharp/language-reference/computation-expressions">computation expressions</a>. There is no equivalent language feature in TypeScript, but it has <a href="https://en.wikipedia.org/wiki/Row_polymorphism">row polymorphism</a>, which <a href="https://gcanti.github.io/fp-ts/guides/do-notation.html">fp-ts uses to effectively implement do notation</a>.
</p>
<p>
A related dimension is how to approximate higher-kinded types in a language that lacks them. Language Ext passes in the monad as a type parameter as well as the "lower-kinded" type parameter and then constrains the monad type parameter to implement a monad interface parameterized by the lower type parameter as well as being a struct. I find that second constraint very interesting. Since the type parameter has a struct constraint, it has a default constructor that can be used to get an instance, which then implements methods according to the interface constraint. For more information, see <a href="https://github.com/louthy/language-ext/wiki/Does-C%23-Dream-Of-Electric-Monads%3F">this wiki article</a> for a gentle introduction and <a href="https://github.com/louthy/language-ext/blob/ba16d97d4909067222c8b134a80bfd6b7e54c424/LanguageExt.Core/Class%20Instances/Trans/Trans.cs#L74-L80">Trans.cs</a> for how Language Ext uses this approach to only implement traverse once. Similarly, F#+ has a feature called <a href="https://fsprojects.github.io/FSharpPlus/generic-doc.html">generic functions</a> that enables one to write F# like <code>map aFoo</code> instead of the typical <code>Foo.map aFoo</code>.
</p>
</div>
<div class="comment-date">2022-09-20 02:00 UTC</div>
</div>
<div class="comment" id="5619079f7c994fc0bd0dbbdbfeec9e4a">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#5619079f7c994fc0bd0dbbdbfeec9e4a">#</a></div>
<div class="comment-content">
<p>
Tyson, thank you for writing. I agree that details differ. Clearly, this is true across languages, where, say, <a href="https://www.haskell.org/">Haskell</a>'s <code>fmap</code> has a name different from C#'s <code>Select</code>. To state the obvious, the syntax is also different.
</p>
<p>
Even within the same language, you can have variations. Functor mapping in Haskell is generally called <code>fmap</code>, but you can also use <code>map</code> explicitly for lists. The same could be true in C#. I've seen functor and monad implementations in C# that use method names like <code>Map</code> and <code>Bind</code> rather than <code>Select</code> and <code>SelectMany</code>.
</p>
<p>
To expand on this idea, one may also observe that what one language calls <em>Option</em>, another language calls <em>Maybe</em>. The same goes for <code>Result</code> versus <code>Either</code>.
</p>
<p>
As you know, the names <code>Select</code> and <code>SelectMany</code> are special because they enable C# query syntax. While methods named <code>Map</code> and <code>Bind</code> are 'the same' functions, they don't light up that language feature. Another way to enable syntactic sugar for monads in C# is via <code>async</code> and <code>await</code>, as <a href="https://eiriktsarpalis.wordpress.com/2020/07/20/effect-programming-in-csharp/">shown by Eirik Tsarpalis and Nick Palladinos</a>.
</p>
<p>
I do agree with you that there are various options available to an implementer. The point I was trying to make is that while implementation details differ, the concepts are the same. Thus, as a <em>user</em> of one of these APIs (monads, monoids, etc.) you only have to learn the mental model once. You still have to learn the implementation details.
</p>
<p>
I recently heard a professor at <a href="https://en.wikipedia.org/wiki/UCPH_Department_of_Computer_Science">DIKU</a> state that once you know one programming language, you should be able to learn another one in a week. That's the same general idea.
</p>
<p>
(I do, however, have issues with that statement about programming languages as a universal assertion, but I agree that it tends to hold for mainstream languages. When I read <a href="/ref/mazes-for-programmers">Mazes for Programmers</a> I'd never programmed in <a href="https://www.ruby-lang.org/en/">Ruby</a> before, but I had little trouble picking it up for the exercises. On the other hand, most people don't learn Haskell in a week.)
</p>
</div>
<div class="comment-date">2022-09-20 17:42 UTC</div>
</div>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.Natural transformationshttps://blog.ploeh.dk/2022/07/18/natural-transformations2022-07-18T08:12:00+00:00Mark Seemann
<div id="post">
<p>
<em>Mappings between functors, with some examples in C#.</em>
</p>
<p>
This article is part of <a href="/2022/07/11/functor-relationships">a series of articles about functor relationships</a>. In this one you'll learn about natural transformations, which are simply mappings between two <a href="/2018/03/22/functors">functors</a>. It's probably the easiest relationship to understand. In fact, it may be so obvious that your reaction is: <em>Is that it?</em>
</p>
<p>
In programming, a natural transformation is just a function from one functor to another. A common example is a function that tries to extract a value from a collection. You'll see specific examples a little later in this article.
</p>
<h3 id="05a07e6563da4ba39c83bd92250d3056">
Laws <a href="#05a07e6563da4ba39c83bd92250d3056" title="permalink">#</a>
</h3>
<p>
In this, the dreaded section on <em>laws</em>, I have a nice surprise for you: There aren't any (that we need worry about)!
</p>
<p>
In the broader context of <a href="https://en.wikipedia.org/wiki/Category_theory">category theory</a> there are, in fact, rules that a natural transformation must follow.
</p>
<blockquote>
<p>
"Haskell's parametric polymorphism has an unexpected consequence: any polymorphic function of the type:
</p>
<p>
<pre>alpha :: F a -> G a</pre>
</p>
<p>
"where <code>F</code> and <code>G</code> are functors, automatically satisfies the naturality condition."
</p>
<footer><cite><a href="https://bartoszmilewski.com/2015/04/07/natural-transformations/">Natural Transformations</a>, Bartosz Milewski</cite></footer>
</blockquote>
<p>
While C# isn't <a href="https://www.haskell.org">Haskell</a>, .NET generics are similar enough to Haskell <a href="https://en.wikipedia.org/wiki/Parametric_polymorphism">parametric polymorphism</a> that the result, as far as I can tell, carries over. (Again, however, we have to keep in mind that C# doesn't distinguish between <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a> and impure actions. The knowledge that I infer translates for pure functions. For impure actions, there are no guarantees.)
</p>
<p>
The C# equivalent of the above <code>alpha</code> function would be a method like this:
</p>
<p>
<pre>G<T> <span style="color:#74531f;">Alpha</span><<span style="color:#2b91af;">T</span>>(F<T> <span style="color:#1f377f;">f</span>)</pre>
</p>
<p>
where both <code>F</code> and <code>G</code> are functors.
</p>
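<p>
Informally, the naturality condition says that it doesn't matter whether you map before or after the transformation. As a minimal sketch that only relies on LINQ, take <code>Take(1)</code> as a hypothetical <code>Alpha</code> from the list functor to itself (a choice made purely for illustration; the function <code>f</code> and the sample values are arbitrary as well):
</p>
<p>
<pre>using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Alpha: IEnumerable<T> -> IEnumerable<T>.
static IEnumerable<T> Alpha<T>(IEnumerable<T> source) => source.Take(1);

Func<int, string> f = i => (i * 2).ToString();
var xs = new[] { 3, 9, 5 };

// Map first, then transform...
var mapThenTransform = Alpha(xs.Select(f));
// ...or transform first, then map.
var transformThenMap = Alpha(xs).Select(f);

Console.WriteLine(mapThenTransform.SequenceEqual(transformThenMap)); // True</pre>
</p>
<p>
Both expressions produce a sequence containing the single string <code>"6"</code>. This only holds as long as <code>f</code> is a pure function.
</p>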
<h3 id="643a53f5758f485f8a7000f9e7825c5c">
Safe head <a href="#643a53f5758f485f8a7000f9e7825c5c" title="permalink">#</a>
</h3>
<p>
Natural transformations easily occur in normal programming. You've probably written some yourself, without being aware of it. Here are some examples.
</p>
<p>
It's common to attempt to get the first element of a collection. Collections, however, may be empty, so this is not always possible. In Haskell, you'd model that as a function that takes a list as input and returns a <code>Maybe</code> as output:
</p>
<p>
<pre>Prelude Data.Maybe> :t listToMaybe
listToMaybe :: [a] -> Maybe a
Prelude Data.Maybe> listToMaybe []
Nothing
Prelude Data.Maybe> listToMaybe [7]
Just 7
Prelude Data.Maybe> listToMaybe [3,9]
Just 3
Prelude Data.Maybe> listToMaybe [5,9,2,4,4]
Just 5</pre>
</p>
<p>
In many tutorials such a function is called <code>safeHead</code>, because it returns the <em>head</em> of a list (i.e. the first item) in a safe manner. It returns <code>Nothing</code> if the list is empty. In <a href="https://fsharp.org">F#</a> this function is called <a href="https://fsharp.github.io/fsharp-core-docs/reference/fsharp-collections-seqmodule.html#tryHead">tryHead</a>.
</p>
<p>
In C# you could write a similar function like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> Maybe<T> <span style="color:#74531f;">TryFirst</span><<span style="color:#2b91af;">T</span>>(<span style="color:blue;">this</span> IEnumerable<T> <span style="color:#1f377f;">source</span>)
{
<span style="color:#8f08c4;">if</span> (source.Any())
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> Maybe<T>(source.First());
<span style="color:#8f08c4;">else</span>
<span style="color:#8f08c4;">return</span> Maybe.Empty<T>();
}</pre>
</p>
<p>
This extension method (which is really a pure function) is a natural transformation between two functors. The source functor is the <a href="/2022/04/19/the-list-monad">list functor</a> and the destination is <a href="/2018/03/26/the-maybe-functor">the Maybe functor</a>.
</p>
<p>
Here are some unit tests that demonstrate how it works:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">TryFirstWhenEmpty</span>()
{
Maybe<Guid> <span style="color:#1f377f;">actual</span> = Enumerable.Empty<Guid>().TryFirst();
Assert.Equal(Maybe.Empty<Guid>(), actual);
}
[Theory]
[InlineData(<span style="color:blue;">new</span>[] { <span style="color:#a31515;">"foo"</span> }, <span style="color:#a31515;">"foo"</span>)]
[InlineData(<span style="color:blue;">new</span>[] { <span style="color:#a31515;">"bar"</span>, <span style="color:#a31515;">"baz"</span> }, <span style="color:#a31515;">"bar"</span>)]
[InlineData(<span style="color:blue;">new</span>[] { <span style="color:#a31515;">"qux"</span>, <span style="color:#a31515;">"quux"</span>, <span style="color:#a31515;">"quuz"</span>, <span style="color:#a31515;">"corge"</span>, <span style="color:#a31515;">"corge"</span> }, <span style="color:#a31515;">"qux"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">TryFirstWhenNotEmpty</span>(<span style="color:blue;">string</span>[] <span style="color:#1f377f;">arr</span>, <span style="color:blue;">string</span> <span style="color:#1f377f;">expected</span>)
{
Maybe<<span style="color:blue;">string</span>> <span style="color:#1f377f;">actual</span> = arr.TryFirst();
Assert.Equal(<span style="color:blue;">new</span> Maybe<<span style="color:blue;">string</span>>(expected), actual);
}</pre>
</p>
<p>
All these tests pass.
</p>
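<p>
To make the word <em>natural</em> a little more concrete, here's a minimal Haskell sketch of the same function. The names <code>safeHead</code> and <code>commutes</code> are only chosen for illustration (in <em>base</em>, the function already exists as <code>listToMaybe</code>). The <code>commutes</code> helper checks, for one function and one list, that mapping before or after the transformation yields the same result:
</p>
<p>
<pre>import Data.Maybe (listToMaybe)

-- Safe head: the first element, if there is one.
safeHead :: [a] -> Maybe a
safeHead []    = Nothing
safeHead (x:_) = Just x

-- Naturality for one function and one list:
-- map-then-transform equals transform-then-map.
commutes :: Eq b => (a -> b) -> [a] -> Bool
commutes f xs = safeHead (fmap f xs) == fmap f (safeHead xs)

main :: IO ()
main = do
  print (safeHead ([] :: [String]))                -- Nothing
  print (safeHead ["bar", "baz"])                  -- Just "bar"
  print (commutes length ["qux", "quux", "quuz"])  -- True
  print (safeHead "foo" == listToMaybe "foo")      -- True</pre>
</p>
<p>
A spot check like this is not a proof, of course. In Haskell, parametricity guarantees the property for any function of the type <code>[a] -> Maybe a</code>.
</p>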
<h3 id="6d899e8f2a7948ed898eaaefc3064dc6">
Safe index <a href="#6d899e8f2a7948ed898eaaefc3064dc6" title="permalink">#</a>
</h3>
<p>
The above <em>safe head</em> natural transformation is just one example. Even for a particular combination of functors like <em>List to Maybe</em> many natural transformations may exist. For this particular combination, there are infinitely many natural transformations.
</p>
<p>
You can view the <em>safe head</em> example as a special case of a more general family of <em>safe indexing</em> functions. With a collection of values, you can attempt to retrieve the value at a particular index. Since a collection can contain an arbitrary number of elements, however, there's no guarantee that there's an element at the requested index.
</p>
<p>
In order to avoid exceptions, then, you can try to retrieve the value at an index, getting a Maybe value as a result.
</p>
<p>
The F# <code>Seq</code> module defines a function called <a href="https://fsharp.github.io/fsharp-core-docs/reference/fsharp-collections-seqmodule.html#tryItem">tryItem</a>. This function takes an index and a sequence (<code>IEnumerable<T></code>) and returns an <code>option</code> (F#'s name for Maybe):
</p>
<p>
<pre>> Seq.tryItem 2 [2;5;3;5];;
val it : int option = Some 3</pre>
</p>
<p>
The <code>tryItem</code> function itself is <em>not</em> a natural transformation, but because of currying, it's a function that <em>returns</em> a natural transformation. When you partially apply it with an index, it becomes a natural transformation: <code>Seq.tryItem 3</code> is a natural transformation <code>seq<'a> -> 'a option</code>, as is <code>Seq.tryItem 4</code>, <code>Seq.tryItem 5</code>, and so on ad infinitum. Thus, there are infinitely many natural transformations from the List functor to the Maybe functor, and <em>safe head</em> is simply <code>Seq.tryItem 0</code>.
</p>
<p>
In C# you can use the various <code>Func</code> delegates to implement currying, but if you want something that looks a little more object-oriented, you could write code like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Index</span>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">int</span> index;
<span style="color:blue;">public</span> <span style="color:#2b91af;">Index</span>(<span style="color:blue;">int</span> <span style="color:#1f377f;">index</span>)
{
<span style="color:blue;">this</span>.index = index;
}
<span style="color:blue;">public</span> Maybe<T> <span style="color:#74531f;">TryItem</span><<span style="color:#2b91af;">T</span>>(IEnumerable<T> <span style="color:#1f377f;">values</span>)
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">candidate</span> = values.Skip(index).Take(1);
<span style="color:#8f08c4;">if</span> (candidate.Any())
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> Maybe<T>(candidate.First());
<span style="color:#8f08c4;">else</span>
<span style="color:#8f08c4;">return</span> Maybe.Empty<T>();
}
}</pre>
</p>
<p>
This <code>Index</code> class captures an index value for potential use against any <code>IEnumerable<T></code>. Thus, the <code>TryItem</code> method is a natural transformation from the List functor to the Maybe functor. Here are some examples:
</p>
<p>
<pre>[Theory]
[InlineData(0, <span style="color:blue;">new</span> <span style="color:blue;">string</span>[0])]
[InlineData(1, <span style="color:blue;">new</span>[] { <span style="color:#a31515;">"bee"</span> })]
[InlineData(2, <span style="color:blue;">new</span>[] { <span style="color:#a31515;">"nig"</span>, <span style="color:#a31515;">"fev"</span> })]
[InlineData(4, <span style="color:blue;">new</span>[] { <span style="color:#a31515;">"sta"</span>, <span style="color:#a31515;">"ali"</span> })]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">MissItem</span>(<span style="color:blue;">int</span> <span style="color:#1f377f;">i</span>, <span style="color:blue;">string</span>[] <span style="color:#1f377f;">xs</span>)
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">idx</span> = <span style="color:blue;">new</span> Index(i);
Maybe<<span style="color:blue;">string</span>> <span style="color:#1f377f;">actual</span> = idx.TryItem(xs);
Assert.Equal(Maybe.Empty<<span style="color:blue;">string</span>>(), actual);
}
[Theory]
[InlineData(0, <span style="color:blue;">new</span>[] { <span style="color:#a31515;">"foo"</span> }, <span style="color:#a31515;">"foo"</span>)]
[InlineData(1, <span style="color:blue;">new</span>[] { <span style="color:#a31515;">"bar"</span>, <span style="color:#a31515;">"baz"</span> }, <span style="color:#a31515;">"baz"</span>)]
[InlineData(1, <span style="color:blue;">new</span>[] { <span style="color:#a31515;">"qux"</span>, <span style="color:#a31515;">"quux"</span>, <span style="color:#a31515;">"quuz"</span> }, <span style="color:#a31515;">"quux"</span>)]
[InlineData(2, <span style="color:blue;">new</span>[] { <span style="color:#a31515;">"corge"</span>, <span style="color:#a31515;">"grault"</span>, <span style="color:#a31515;">"fred"</span>, <span style="color:#a31515;">"garply"</span> }, <span style="color:#a31515;">"fred"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">FindItem</span>(<span style="color:blue;">int</span> <span style="color:#1f377f;">i</span>, <span style="color:blue;">string</span>[] <span style="color:#1f377f;">xs</span>, <span style="color:blue;">string</span> <span style="color:#1f377f;">expected</span>)
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">idx</span> = <span style="color:blue;">new</span> Index(i);
Maybe<<span style="color:blue;">string</span>> <span style="color:#1f377f;">actual</span> = idx.TryItem(xs);
Assert.Equal(<span style="color:blue;">new</span> Maybe<<span style="color:blue;">string</span>>(expected), actual);
}</pre>
</p>
<p>
Since there are infinitely many integers, there are infinitely many such natural transformations. (Strictly speaking, this isn't true for the above code, since there are only finitely many 32-bit integers. Exercise: Is it possible to rewrite the above <code>Index</code> class to instead work with <a href="https://docs.microsoft.com/dotnet/api/system.numerics.biginteger">BigInteger</a>?)
</p>
<p>
The Haskell <a href="https://hackage.haskell.org/package/natural-transformation">natural-transformation</a> package offers an even more explicit way to present the same example:
</p>
<p>
<pre><span style="color:blue;">import</span> Control.Natural
<span style="color:#2b91af;">tryItem</span> :: (<span style="color:blue;">Eq</span> a, <span style="color:blue;">Num</span> a, <span style="color:blue;">Enum</span> a) <span style="color:blue;">=></span> a <span style="color:blue;">-></span> [] :~> <span style="color:#2b91af;">Maybe</span>
tryItem i = NT $ <span style="color:blue;">lookup</span> i . <span style="color:blue;">zip</span> [0..]</pre>
</p>
<p>
You can view this <code>tryItem</code> function as a function that takes a number and returns a particular natural transformation. For example you can define a value called <code>tryThird</code>, which is a natural transformation from <code>[]</code> to <code>Maybe</code>:
</p>
<p>
<pre>λ tryThird = tryItem 2
λ :t tryThird
tryThird :: [] :~> Maybe</pre>
</p>
<p>
Here are some usage examples:
</p>
<p>
<pre>λ tryThird # []
Nothing
λ tryThird # [1]
Nothing
λ tryThird # [2,3]
Nothing
λ tryThird # [4,5,6]
Just 6
λ tryThird # [7,8,9,10]
Just 9</pre>
</p>
<p>
In all three languages (F#, C#, Haskell), <em>safe head</em> is really just a special case of <em>safe index</em>: <code>Seq.tryItem 0</code> in F#, <code><span style="color:blue;">new</span> Index(0)</code> in C#, and <code>tryItem 0</code> in Haskell.
</p>
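<p>
If you'd rather not take a dependency on the <em>natural-transformation</em> package, the same relationship can be sketched with <em>base</em> alone. The following is only an illustration: this <code>tryItem</code> is a plain function with an <code>Int</code> index (an assumption that differs from the more general signature above), and it treats negative indices as misses:
</p>
<p>
<pre>import Data.Maybe (listToMaybe)

-- Safe index as a plain function. Partially applying it to an index
-- yields a function of the type [a] -> Maybe a.
tryItem :: Int -> [a] -> Maybe a
tryItem i xs
  | i >= 0    = listToMaybe (drop i xs)
  | otherwise = Nothing

main :: IO ()
main = do
  print (tryItem 2 [4, 5, 6 :: Int])  -- Just 6
  print (tryItem 2 [2, 3 :: Int])     -- Nothing
  -- Safe head is the special case where the index is 0:
  print (map (tryItem 0) ["", "foo"] == map listToMaybe ["", "foo"])  -- True</pre>
</p>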
<h3 id="a8e50cd940974b88b5564c2939900702">
Maybe to List <a href="#a8e50cd940974b88b5564c2939900702" title="permalink">#</a>
</h3>
<p>
You can also move in the opposite direction: From Maybe to List. In F#, I can't find a function that translates from <code>'a option</code> to <code>'a seq</code> (<code>IEnumerable<T></code>), but there are both <a href="https://fsharp.github.io/fsharp-core-docs/reference/fsharp-core-optionmodule.html#toArray">Option.toArray</a> and <a href="https://fsharp.github.io/fsharp-core-docs/reference/fsharp-core-optionmodule.html#toList">Option.toList</a>. I'll use <code>Option.toList</code> for a few examples:
</p>
<p>
<pre>> Option.toList (None : string option);;
val it : string list = []
> Option.toList (Some "foo");;
val it : string list = ["foo"]</pre>
</p>
<p>
Contrary to translating from List to Maybe, going the other way there aren't a lot of options: <code>None</code> translates to an empty list, and <code>Some</code> translates to a singleton list.
</p>
<p>
Using <a href="/2018/06/25/visitor-as-a-sum-type">a Visitor-based</a> Maybe in C#, you can implement the natural transformation like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IEnumerable<T> <span style="color:#74531f;">ToList</span><<span style="color:#2b91af;">T</span>>(<span style="color:blue;">this</span> IMaybe<T> <span style="color:#1f377f;">source</span>)
{
<span style="color:#8f08c4;">return</span> source.Accept(<span style="color:blue;">new</span> ToListVisitor<T>());
}
<span style="color:blue;">private</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ToListVisitor</span><<span style="color:#2b91af;">T</span>> : IMaybeVisitor<T, IEnumerable<T>>
{
<span style="color:blue;">public</span> IEnumerable<T> VisitNothing
{
<span style="color:blue;">get</span> { <span style="color:#8f08c4;">return</span> Enumerable.Empty<T>(); }
}
<span style="color:blue;">public</span> IEnumerable<T> <span style="color:#74531f;">VisitJust</span>(T <span style="color:#1f377f;">just</span>)
{
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span>[] { just };
}
}</pre>
</p>
<p>
Here are some examples:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">NothingToList</span>()
{
IMaybe<<span style="color:blue;">double</span>> <span style="color:#1f377f;">maybe</span> = <span style="color:blue;">new</span> Nothing<<span style="color:blue;">double</span>>();
IEnumerable<<span style="color:blue;">double</span>> <span style="color:#1f377f;">actual</span> = maybe.ToList();
Assert.Empty(actual);
}
[Theory]
[InlineData(-1)]
[InlineData( 0)]
[InlineData(15)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">JustToList</span>(<span style="color:blue;">double</span> <span style="color:#1f377f;">d</span>)
{
IMaybe<<span style="color:blue;">double</span>> <span style="color:#1f377f;">maybe</span> = <span style="color:blue;">new</span> Just<<span style="color:blue;">double</span>>(d);
IEnumerable<<span style="color:blue;">double</span>> <span style="color:#1f377f;">actual</span> = maybe.ToList();
Assert.Single(actual, d);
}</pre>
</p>
<p>
In Haskell this natural transformation is called <a href="https://hackage.haskell.org/package/base/docs/Data-Maybe.html#v:maybeToList">maybeToList</a> - just when you think that Haskell names are always <a href="/2021/06/07/abstruse-nomenclature">abstruse</a>, you learn that some are very explicit and self-explanatory.
</p>
<p>
If we wanted, we could use the <em>natural-transformation</em> package to demonstrate that this is, indeed, a natural transformation:
</p>
<p>
<pre>λ :t NT maybeToList
NT maybeToList :: Maybe :~> []</pre>
</p>
<p>
There would be little point in doing so, since we'd need to unwrap it again to use it. Using the function directly, on the other hand, looks like this:
</p>
<p>
<pre>λ maybeToList Nothing
[]
λ maybeToList $ Just 2
[2]
λ maybeToList $ Just "fon"
["fon"]</pre>
</p>
<p>
A <code>Nothing</code> value is always translated to the empty list, and a <code>Just</code> value to a singleton list, exactly as in the other languages.
</p>
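<p>
As a small sketch of what makes <code>maybeToList</code> a natural transformation, you can check that it commutes with <code>fmap</code>. The helper name <code>commutes</code> is hypothetical, and a handful of examples is a spot check rather than a proof:
</p>
<p>
<pre>import Data.Maybe (listToMaybe, maybeToList)

-- Naturality for one function and one Maybe value:
-- map-then-transform equals transform-then-map.
commutes :: Eq b => (a -> b) -> Maybe a -> Bool
commutes f m = maybeToList (fmap f m) == fmap f (maybeToList m)

main :: IO ()
main = do
  print (commutes show (Just (2 :: Int)))      -- True
  print (commutes show (Nothing :: Maybe Int)) -- True
  -- Going to a list and back with safe head loses nothing:
  print (listToMaybe (maybeToList (Just "fon")) == Just "fon")  -- True</pre>
</p>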
<p>
Exercise: Is this the only possible natural transformation from Maybe to List?
</p>
<h3 id="5dfc02dd10514c2288721ace174c7229">
Maybe-Either relationships <a href="#5dfc02dd10514c2288721ace174c7229" title="permalink">#</a>
</h3>
<p>
The Maybe functor is isomorphic to <a href="/2019/01/14/an-either-functor">Either</a> where the <em>left</em> (or <em>error</em>) dimension is <a href="/2018/01/15/unit-isomorphisms">unit</a>. Here are the two natural transformations in F#:
</p>
<p>
<pre><span style="color:blue;">module</span> Option =
<span style="color:green;">// 'a option -> Result<'a,unit></span>
<span style="color:blue;">let</span> toResult = <span style="color:blue;">function</span>
| Some x <span style="color:blue;">-></span> Ok x
| None <span style="color:blue;">-></span> Error ()
<span style="color:green;">// Result<'a,unit> -> 'a option</span>
<span style="color:blue;">let</span> ofResult = <span style="color:blue;">function</span>
| Ok x <span style="color:blue;">-></span> Some x
| Error () <span style="color:blue;">-></span> None</pre>
</p>
<p>
In F#, Maybe is called <code>option</code> and Either is called <code>Result</code>. Be aware that the F# <code>Result</code> discriminated union puts the <code>Error</code> dimension to the right of the <code>Ok</code>, which is the opposite of Either, where <em>left</em> is usually used for errors, and <em>right</em> for successes (because what is correct is right).
</p>
<p>
Here are some examples:
</p>
<p>
<pre>> Some "epi" |> Option.toResult;;
val it : Result<string,unit> = Ok "epi"
> Ok "epi" |> Option.ofResult;;
val it : string option = Some "epi"</pre>
</p>
<p>
Notice that the natural transformation from <code>Result</code> to <code>Option</code> is only defined for <code>Result</code> values where the <code>Error</code> type is <code>unit</code>. You could also define a natural transformation from <em>any</em> <code>Result</code> to <code>option</code>:
</p>
<p>
<pre><span style="color:green;">// Result<'a,'b> -> 'a option</span>
<span style="color:blue;">let</span> ignoreErrorValue = <span style="color:blue;">function</span>
| Ok x <span style="color:blue;">-></span> Some x
| Error _ <span style="color:blue;">-></span> None</pre>
</p>
<p>
That's still a natural transformation, but no longer part of an isomorphism due to the loss of information:
</p>
<p>
<pre>> (Error "Catastrophic failure" |> ignoreErrorValue : int option);;
val it : int option = None</pre>
</p>
<p>
Just like above, when examining the infinitely many natural transformations from List to Maybe, we can use the Haskell <em>natural-transformation</em> package to make this more explicit:
</p>
<p>
<pre><span style="color:#2b91af;">ignoreLeft</span> :: <span style="color:#2b91af;">Either</span> b :~> <span style="color:#2b91af;">Maybe</span>
ignoreLeft = NT $ either (<span style="color:blue;">const</span> Nothing) Just</pre>
</p>
<p>
<code>ignoreLeft</code> is a natural transformation from the <code>Either b</code> functor to the <code>Maybe</code> functor.
</p>
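<p>
For comparison, here's a sketch of the isomorphism itself in plain Haskell, without the <em>natural-transformation</em> package. The names <code>toEither</code>, <code>fromEither</code>, and <code>dropLeft</code> are assumptions made for this example. The first two round-trip without loss, while <code>dropLeft</code> maps every <code>Left</code> value to <code>Nothing</code>:
</p>
<p>
<pre>-- Maybe a -> Either () a: Nothing becomes the only possible Left value.
toEither :: Maybe a -> Either () a
toEither = maybe (Left ()) Right

-- Either () a -> Maybe a: the inverse of toEither.
fromEither :: Either () a -> Maybe a
fromEither = either (const Nothing) Just

-- Either b a -> Maybe a: works for any Left type, but discards its value.
dropLeft :: Either b a -> Maybe a
dropLeft = either (const Nothing) Just

main :: IO ()
main = do
  print (fromEither (toEither (Just "epi")))                -- Just "epi"
  print (toEither (fromEither (Left () :: Either () Int)))  -- Left ()
  -- Two different errors collapse to the same result:
  print (dropLeft (Left "OMG!" :: Either String Int))       -- Nothing
  print (dropLeft (Left "Catastrophic failure" :: Either String Int))  -- Nothing</pre>
</p>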
<p>
Using a Visitor-based Either implementation (refactored from <a href="/2018/06/11/church-encoded-either">Church-encoded Either</a>), you can implement an equivalent <code>IgnoreLeft</code> natural transformation in C#:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IMaybe<R> <span style="color:#74531f;">IgnoreLeft</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R</span>>(<span style="color:blue;">this</span> IEither<L, R> <span style="color:#1f377f;">source</span>)
{
<span style="color:#8f08c4;">return</span> source.Accept(<span style="color:blue;">new</span> IgnoreLeftVisitor<L, R>());
}
<span style="color:blue;">private</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">IgnoreLeftVisitor</span><<span style="color:#2b91af;">L</span>, <span style="color:#2b91af;">R</span>> : IEitherVisitor<L, R, IMaybe<R>>
{
<span style="color:blue;">public</span> IMaybe<R> <span style="color:#74531f;">VisitLeft</span>(L <span style="color:#1f377f;">left</span>)
{
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> Nothing<R>();
}
<span style="color:blue;">public</span> IMaybe<R> <span style="color:#74531f;">VisitRight</span>(R <span style="color:#1f377f;">right</span>)
{
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> Just<R>(right);
}
}</pre>
</p>
<p>
Here are some examples:
</p>
<p>
<pre>[Theory]
[InlineData(<span style="color:#a31515;">"OMG!"</span>)]
[InlineData(<span style="color:#a31515;">"Catastrophic failure"</span>)]
[InlineData(<span style="color:#a31515;">"Important information!"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">IgnoreLeftOfLeft</span>(<span style="color:blue;">string</span> <span style="color:#1f377f;">msg</span>)
{
IEither<<span style="color:blue;">string</span>, <span style="color:blue;">int</span>> <span style="color:#1f377f;">e</span> = <span style="color:blue;">new</span> Left<<span style="color:blue;">string</span>, <span style="color:blue;">int</span>>(msg);
IMaybe<<span style="color:blue;">int</span>> <span style="color:#1f377f;">actual</span> = e.IgnoreLeft();
Assert.Equal(<span style="color:blue;">new</span> Nothing<<span style="color:blue;">int</span>>(), actual);
}
[Theory]
[InlineData(0)]
[InlineData(1)]
[InlineData(2)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">IgnoreLeftOfRight</span>(<span style="color:blue;">int</span> <span style="color:#1f377f;">i</span>)
{
IEither<<span style="color:blue;">string</span>, <span style="color:blue;">int</span>> <span style="color:#1f377f;">e</span> = <span style="color:blue;">new</span> Right<<span style="color:blue;">string</span>, <span style="color:blue;">int</span>>(i);
IMaybe<<span style="color:blue;">int</span>> <span style="color:#1f377f;">actual</span> = e.IgnoreLeft();
Assert.Equal(<span style="color:blue;">new</span> Just<<span style="color:blue;">int</span>>(i), actual);
}</pre>
</p>
<p>
I'm not insisting that this natural transformation is always useful, but I've occasionally found myself in situations where it came in handy.
</p>
<h3 id="cc17fdc2940948978ff8bea222503965">
Natural transformations to or from Identity <a href="#cc17fdc2940948978ff8bea222503965" title="permalink">#</a>
</h3>
<p>
Some natural transformations are a little less obvious. If you have a <code><span style="color:#2b91af;">NotEmptyCollection</span><<span style="color:#2b91af;">T</span>></code> class as shown in my article <a href="/2017/12/11/semigroups-accumulate">Semigroups accumulate</a>, you could consider the <code>Head</code> property a natural transformation. It translates a <code><span style="color:#2b91af;">NotEmptyCollection</span><<span style="color:#2b91af;">T</span>></code> object to a <code>T</code> object.
</p>
<p>
This function also exists in Haskell, where it's simply called <a href="https://hackage.haskell.org/package/base/docs/Data-List-NonEmpty.html#v:head">head</a>.
</p>
<p>
The input type (<code><span style="color:#2b91af;">NotEmptyCollection</span><<span style="color:#2b91af;">T</span>></code> in C#, <code>NonEmpty a</code> in Haskell) is a functor, but the return type is a 'naked' value. That doesn't look like a functor.
</p>
<p>
True, a naked value isn't a functor, but it's isomorphic to <a href="/2018/09/03/the-identity-functor">the Identity functor</a>. In Haskell, you can make that relationship quite explicit:
</p>
<p>
<pre><span style="color:#2b91af;">headNT</span> :: <span style="color:blue;">NonEmpty</span> :~> <span style="color:blue;">Identity</span>
headNT = NT $ Identity . NonEmpty.<span style="color:blue;">head</span></pre>
</p>
<p>
While not particularly useful in itself, this demonstrates that it's possible to think of the <code>head</code> function as a natural transformation from <code>NonEmpty</code> to <code>Identity</code>.
</p>
<p>
Can you go the other way, too?
</p>
<p>
Yes, indeed. Consider <a href="/2022/03/28/monads">monadic return</a>. This is a function that takes a 'naked' value and wraps it in a particular monad (which is also, always, a functor). Again, you may consider the 'naked' value as isomorphic with the Identity functor, and thus <em>return</em> as a natural transformation:
</p>
<p>
<pre><span style="color:#2b91af;">returnNT</span> :: <span style="color:blue;">Monad</span> m <span style="color:blue;">=></span> <span style="color:blue;">Identity</span> :~> m
returnNT = NT $ <span style="color:blue;">return</span> . runIdentity</pre>
</p>
<p>
We might even consider if a function <code>a -> a</code> (in Haskell syntax) or <code>Func<T, T></code> (in C# syntax) might actually be a natural transformation from Identity to Identity... (It is, but only one such function exists.)
</p>
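<p>
If you want to experiment with these ideas without installing the <em>natural-transformation</em> package, a type alias is enough. The following sketch assumes the <code>RankNTypes</code> extension, and the names <code>Nat</code>, <code>headNat</code>, and <code>returnNat</code> are made up for the occasion:
</p>
<p>
<pre>{-# LANGUAGE RankNTypes #-}
import Data.Functor.Identity (Identity (..))
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NonEmpty

-- Hypothetical alias: a natural transformation as a polymorphic function.
type Nat f g = forall a. f a -> g a

-- From NonEmpty to Identity: wrap the head.
headNat :: Nat NonEmpty Identity
headNat = Identity . NonEmpty.head

-- From Identity to any monad: unwrap, then return.
returnNat :: Monad m => Nat Identity m
returnNat = return . runIdentity

main :: IO ()
main = do
  print (headNat ('a' :| "bc"))                 -- Identity 'a'
  print (returnNat (Identity 42) :: Maybe Int)  -- Just 42
  print (returnNat (Identity 42) :: [Int])      -- [42]</pre>
</p>
<p>
With this alias, a natural transformation is just a function that works for every element type, which is why it can't inspect or change the values it carries.
</p>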
<h3 id="335f008f8f65439e8253bf96968af169">
Not all natural transformations are useful <a href="#335f008f8f65439e8253bf96968af169" title="permalink">#</a>
</h3>
<p>
Are all functor combinations possible as natural transformations? Can you take any two functors and define one or more natural transformations? I'm not sure, but it seems clear that even if it is so, not all natural transformations are useful.
</p>
<p>
Famously, for example, you can't <a href="/2019/02/04/how-to-get-the-value-out-of-the-monad">get the value out of</a> the <a href="/2020/06/22/the-io-functor">IO functor</a>. Thus, at first glance it seems impossible to define a natural transformation <em>from</em> <code>IO</code> to some other functor. After all, how would you implement a natural transformation from <code>IO</code> to, say, the Identity functor? That seems impossible.
</p>
<p>
On the other hand, <em>this</em> is possible:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IEnumerable<T> <span style="color:#74531f;">Collapse</span><<span style="color:#2b91af;">T</span>>(<span style="color:blue;">this</span> IO<T> <span style="color:#1f377f;">source</span>)
{
<span style="color:#8f08c4;">yield</span> <span style="color:#8f08c4;">break</span>;
}</pre>
</p>
<p>
That's a natural transformation from <code>IO<T></code> to <code>IEnumerable<T></code>. It's possible to ignore the input value and <em>always</em> return an empty sequence. This natural transformation collapses all values to a single return value.
</p>
<p>
You can repeat this exercise with the Haskell <em>natural-transformation</em> package:
</p>
<p>
<pre><span style="color:#2b91af;">collapse</span> :: f :~> []
collapse = NT $ <span style="color:blue;">const</span> <span style="color:blue;">[]</span></pre>
</p>
<p>
This one collapses <em>any</em> <a href="https://bartoszmilewski.com/2014/01/14/functors-are-containers/">container</a> <code>f</code> to a List (<code>[]</code>), including <code>IO</code>:
</p>
<p>
<pre>λ collapse # (return 10 :: IO Integer)
[]
λ collapse # putStrLn "ploeh"
[]</pre>
</p>
<p>
Notice that in the second example, the <code>IO</code> action is <code>putStrLn "ploeh"</code>, which ought to produce the side effect of writing to the console. This is effectively prevented - instead the <code>collapse</code> natural transformation simply produces the empty list as output.
</p>
<p>
You can define a similar natural transformation from any functor (including <code>IO</code>) to Maybe. Try it as an exercise, in either C#, Haskell, or another language. If you want a Haskell-specific exercise, also define a natural transformation of this type: <code><span style="color:blue;">Alternative</span> g <span style="color:blue;">=></span> f :~> g</code>.
</p>
<p>
These natural transformations are possible, but hardly useful.
</p>
<h3 id="103675f9845b4aa1b32f59a510280a53">
Conclusion <a href="#103675f9845b4aa1b32f59a510280a53" title="permalink">#</a>
</h3>
<p>
A natural transformation is a function that translates one functor into another. Useful examples are safe or total collection indexing, including retrieving the first element from a collection. These natural transformations return a populated Maybe value if the element exists, and an empty Maybe value otherwise.
</p>
<p>
Other examples include translating Maybe values into Either values or Lists.
</p>
<p>
A natural transformation can easily involve loss of information. Even if you're able to retrieve the first element in a collection, the return value includes only that value, and not the rest of the collection.
</p>
<p>
A few natural transformations may be true isomorphisms, but in general, being able to go in both directions isn't required. In degenerate cases, a natural transformation may throw away all information and map to a general empty value like the empty List or an empty Maybe value.
</p>
<p>
<strong>Next:</strong> <a href="/2024/09/16/functor-products">Functor products</a>.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.Functor relationshipshttps://blog.ploeh.dk/2022/07/11/functor-relationships2022-07-11T08:09:00+00:00Mark Seemann
<div id="post">
<p>
<em>Sometimes you need to use more than one functor together.</em>
</p>
<p>
This article series is part of <a href="/2018/03/19/functors-applicatives-and-friends">a larger series of articles about functors, applicatives, and other mappable containers</a>. Particularly, you've seen examples of both <a href="/2018/03/22/functors">functors</a> and <a href="/2018/10/01/applicative-functors">applicative functors</a>.
</p>
<p>
There are situations where you can get by with a single functor. Many languages come with list comprehensions or other features to work with collections of values (C#, for instance, has <em>language-integrated query</em>, or: LINQ). The <a href="/2022/04/19/the-list-monad">list functor (and monad)</a> gives you a comprehensive API to manipulate multiple values. Likewise, you may write some parsing (<a href="https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate">or validation</a>) that exclusively uses the <a href="/2019/01/14/an-either-functor">Either functor</a>.
</p>
<p>
At other times, however, you may find yourself having to juggle more than one functor at once. Perhaps you are working with Either values, but one existing API returns <a href="/2018/03/26/the-maybe-functor">Maybe</a> values instead. Or perhaps you need to deal with Either values, but you're already working within <a href="/2018/09/24/asynchronous-functors">an asynchronous functor</a>.
</p>
<p>
There are several standard ways you can combine or transform combinations of functors.
</p>
<h3 id="70383da35af346f6b2dc3095a1bf7273">
A partial catalogue <a href="#70383da35af346f6b2dc3095a1bf7273" title="permalink">#</a>
</h3>
<p>
The following relationships often come in handy - particularly those that top this list:
</p>
<ul>
<li><a href="/2022/07/18/natural-transformations">Natural transformations</a></li>
<li><a href="/2024/09/16/functor-products">Functor products</a></li>
<li><a href="/2024/10/14/functor-sums">Functor sums</a></li>
<li><a href="/2024/10/28/functor-compositions">Functor compositions</a></li>
<li><a href="/2024/11/11/traversals">Traversals</a></li>
<li><a href="/2024/11/25/nested-monads">Nested monads</a></li>
</ul>
<p>
This list is hardly complete, and I may add to it in the future. Compared to some of the other subtopics of <a href="/2017/10/04/from-design-patterns-to-category-theory">the larger article series on universal abstractions</a>, this catalogue is more heterogeneous. It collects various ways that functors can relate to each other, but uses disparate concepts and abstractions, rather than a single general idea (like a <a href="/2018/12/24/bifunctors">bifunctor</a>, <a href="/2017/10/06/monoids">monoid</a>, or <a href="/2019/04/29/catamorphisms">catamorphism</a>).
</p>
<p>
Keep in mind when reading these articles that all <a href="/2022/03/28/monads">monads</a> are also functors and applicative functors, so what applies to functors also applies to monads.
</p>
<h3 id="1a4f0850f8974b59ab44eed352b0daf7">
Conclusion <a href="#1a4f0850f8974b59ab44eed352b0daf7" title="permalink">#</a>
</h3>
<p>
You can use a single functor in isolation, or you can combine more than one. Most of the relationships described in this article series work for all (lawful) functors, but traversals require applicative functors and functors that are 'foldable' (i.e. a catamorphism exists).
</p>
<p>
<strong>Next:</strong> <a href="/2022/07/18/natural-transformations">Natural transformations</a>.
</p>
</div><hr>
Get and Put Statehttps://blog.ploeh.dk/2022/07/04/get-and-put-state2022-07-04T09:15:00+00:00Mark Seemann
<div id="post">
<p>
<em>A pair of standard helper functions for the State monad. An article for object-oriented programmers.</em>
</p>
<p>
The <a href="/2022/06/20/the-state-monad">State monad</a> is completely defined by <a href="/2022/03/28/monads">its two defining functions</a> (<code>SelectMany</code> and <code>Return</code>). While you can get by without them, two additional helper functions (<em>get</em> and <em>put</em>) are so convenient that they're typically included. To be clear, they're not part of the State <em>monad</em> - rather, you can consider them part of what we may term a <em>standard State API</em>.
</p>
<p>
In short, <em>get</em> is a function that, as the name implies, gets the state while inside the State monad, and <em>put</em> replaces the state with a new value.
</p>
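<p>
If you find Haskell easier to read than prose, the two functions can be sketched as follows. To keep the example dependent on <em>base</em> only, it defines its own minimal <code>State</code> type instead of importing one. It's an illustration, not the C# implementation used in the rest of this article, and <code>tick</code> is a hypothetical example:
</p>
<p>
<pre>-- A minimal State type, defined locally for this sketch.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State run) = State $ \s -> let (a, s') = run s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State runF <*> State runA = State $ \s ->
    let (f, s')  = runF s
        (a, s'') = runA s'
    in (f a, s'')

instance Monad (State s) where
  State run >>= k = State $ \s ->
    let (a, s') = run s in runState (k a) s'

-- get: return the current state as the value, leaving the state unchanged.
get :: State s s
get = State $ \s -> (s, s)

-- put: replace the state with a new value, returning unit.
put :: s -> State s ()
put s = State $ \_ -> ((), s)

-- Example: return the current counter, then increment it.
tick :: State Int Int
tick = do
  n <- get
  put (n + 1)
  return n

main :: IO ()
main = print (runState (tick >> tick) 10)  -- (11,12)</pre>
</p>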
<p>
Later in this article, I'll show how to implement these two functions, as well as a usage example. Before we get to that, however, I want to show a motivating example. In other words, an example that doesn't use <em>get</em> and <em>put</em>.
</p>
<p>
The code shown in this article uses the C# State implementation from <a href="/2022/06/20/the-state-monad">the State monad article</a>.
</p>
<h3 id="62936af6492e474cb1f6afd919bc2cde">
Aggregator <a href="#62936af6492e474cb1f6afd919bc2cde" title="permalink">#</a>
</h3>
<p>
Imagine that you have to implement a simple <a href="https://www.enterpriseintegrationpatterns.com/Aggregator.html">Aggregator</a>.
</p>
<blockquote>
<p>
"How do we combine the results of individual but related messages so that they can be processed as a whole?"
</p>
<p>
[...] "Use a stateful filter, an <em>Aggregator</em>, to collect and store individual messages until it receives a complete set of related messages. Then, the <em>Aggregator</em> publishes a single message distilled from the individual messages."
</p>
<footer><cite><a href="/ref/eip">Enterprise Integration Patterns</a></cite></footer>
</blockquote>
<p>
The example that I'll give here is simplified and mostly focuses on how to use the State monad to implement the desired behaviour. The book Enterprise Integration Patterns starts with a simple example where messages arrive with a <a href="https://www.enterpriseintegrationpatterns.com/CorrelationIdentifier.html">correlation ID</a> as an integer. The message payload is also an integer, just to keep things simple. The Aggregator should only publish an aggregated message once it has received three correlated messages.
</p>
<p>
Using the State monad, you could implement an Aggregator like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Aggregator</span> :
IState<IReadOnlyDictionary<<span style="color:blue;">int</span>, IReadOnlyCollection<<span style="color:blue;">int</span>>>, Maybe<Tuple<<span style="color:blue;">int</span>, <span style="color:blue;">int</span>, <span style="color:blue;">int</span>>>>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">int</span> correlationId;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">int</span> value;
<span style="color:blue;">public</span> <span style="color:#2b91af;">Aggregator</span>(<span style="color:blue;">int</span> <span style="color:#1f377f;">correlationId</span>, <span style="color:blue;">int</span> <span style="color:#1f377f;">value</span>)
{
<span style="color:blue;">this</span>.correlationId = correlationId;
<span style="color:blue;">this</span>.value = value;
}
<span style="color:blue;">public</span> Tuple<Maybe<Tuple<<span style="color:blue;">int</span>, <span style="color:blue;">int</span>, <span style="color:blue;">int</span>>>, IReadOnlyDictionary<<span style="color:blue;">int</span>, IReadOnlyCollection<<span style="color:blue;">int</span>>>> <span style="color:#74531f;">Run</span>(
IReadOnlyDictionary<<span style="color:blue;">int</span>, IReadOnlyCollection<<span style="color:blue;">int</span>>> <span style="color:#1f377f;">state</span>)
{
<span style="color:#8f08c4;">if</span> (state.TryGetValue(correlationId, <span style="color:blue;">out</span> var <span style="color:#1f377f;">coll</span>))
{
<span style="color:#8f08c4;">if</span> (coll.Count == 2)
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">retVal</span> =
Tuple.Create(coll.ElementAt(0), coll.ElementAt(1), value);
<span style="color:blue;">var</span> <span style="color:#1f377f;">newState</span> = state.Remove(correlationId);
<span style="color:#8f08c4;">return</span> Tuple.Create(retVal.ToMaybe(), newState);
}
<span style="color:#8f08c4;">else</span>
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">newColl</span> = coll.Append(value);
<span style="color:blue;">var</span> <span style="color:#1f377f;">newState</span> = state.Replace(correlationId, newColl);
<span style="color:#8f08c4;">return</span> Tuple.Create(<span style="color:blue;">new</span> Maybe<Tuple<<span style="color:blue;">int</span>, <span style="color:blue;">int</span>, <span style="color:blue;">int</span>>>(), newState);
}
}
<span style="color:#8f08c4;">else</span>
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">newColl</span> = <span style="color:blue;">new</span>[] { value };
<span style="color:blue;">var</span> <span style="color:#1f377f;">newState</span> = state.Add(correlationId, newColl);
<span style="color:#8f08c4;">return</span> Tuple.Create(<span style="color:blue;">new</span> Maybe<Tuple<<span style="color:blue;">int</span>, <span style="color:blue;">int</span>, <span style="color:blue;">int</span>>>(), newState);
}
}
}</pre>
</p>
<p>
The <code>Aggregator</code> class implements the <code>IState<S, T></code> interface. The full generic type is something of a mouthful, though.
</p>
<p>
The state type (<code>S</code>) is <code>IReadOnlyDictionary<<span style="color:blue;">int</span>, IReadOnlyCollection<<span style="color:blue;">int</span>>></code> - in other words, a dictionary of collections. Each entry in the dictionary is keyed by a correlation ID. Each value is a collection of messages that belong to that ID. Keep in mind that, in order to keep the example simple, each message is just a number (an <code>int</code>).
</p>
<p>
The value to produce (<code>T</code>) is <code>Maybe<Tuple<<span style="color:blue;">int</span>, <span style="color:blue;">int</span>, <span style="color:blue;">int</span>>></code>. This code example uses <a href="/2022/04/25/the-maybe-monad">this implementation of the Maybe monad</a>. The value produced may or may not be empty, depending on whether the Aggregator has received all three required messages in order to produce an aggregated message. Again, for simplicity, the aggregated message is just a triple (a three-tuple).
</p>
<p>
The <code>Run</code> method starts by querying the <code>state</code> dictionary for an entry that corresponds to the <code>correlationId</code>. This entry may or may not be present. If the message is the first in a series of three, there will be no entry, but if it's the second or third message, the entry will be present.
</p>
<p>
If the entry is present, the <code>Run</code> method checks the <code>Count</code> of the collection. If the <code>Count</code> is <code>2</code>, it means that two other messages with the same <code>correlationId</code> were already received. This means that the <code>Aggregator</code> is now handling the third and final message. Thus, it creates the <code>retVal</code> tuple, removes the entry from the dictionary to create the <code>newState</code>, and returns both.
</p>
<p>
If the <code>state</code> contains an entry for the <code>correlationId</code>, but the <code>Count</code> isn't <code>2</code>, the <code>Run</code> method appends the <code>value</code> to the entry, replaces the entry to create the <code>newState</code>, and returns that together with an empty Maybe value.
</p>
<p>
Finally, if there is no entry for the <code>correlationId</code>, the <code>Run</code> method creates a new collection containing only the <code>value</code>, adds it to the <code>state</code> dictionary, and returns the <code>newState</code> together with an empty Maybe value.
</p>
<h3 id="f8ed5ee8f1eb4f1499e283f2383962e9">
Message handler <a href="#f8ed5ee8f1eb4f1499e283f2383962e9" title="permalink">#</a>
</h3>
<p>
A message handler could be a background service that receives messages from a durable queue, a REST endpoint, or based on some other technology.
</p>
<p>
After it receives a message, a message handler would create a new instance of the <code>Aggregator</code>:
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="color:#1f377f;">a</span> = <span style="color:blue;">new</span> Aggregator(msg.CorrelationId, msg.Value);</pre>
</p>
<p>
Since <code>Aggregator</code> implements the <code>IState<S, T></code> interface, the object <code>a</code> represents a stateful computation. A message handler might keep the current state in memory, or rehydrate it from some persistent storage system. Keep in mind that the state must be of the type <code>IReadOnlyDictionary<<span style="color:blue;">int</span>, IReadOnlyCollection<<span style="color:blue;">int</span>>></code>. Wherever it comes from, assume that this state is a variable called <code>s</code> (for <em>state</em>).
</p>
<p>
The message handler can now <code>Run</code> the stateful computation by supplying <code>s</code>:
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="color:#1f377f;">t</span> = a.Run(s);</pre>
</p>
<p>
The result is a tuple where the first item is a Maybe value, and the second item is the new state.
</p>
<p>
The message handler can now publish the triple if the Maybe value is populated. In any case, it can update the 'current' state with the new state. That's a nice little <a href="/2020/03/02/impureim-sandwich">impureim sandwich</a>.
</p>
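<p>
To make the sandwich concrete, here's a minimal Haskell sketch of such a message handler. It's only an illustration under stated assumptions: the names <code>Store</code>, <code>aggregate</code>, <code>onMessage</code>, and <code>publish</code> are hypothetical, the 'current' state is assumed to live in an in-memory <code>IORef</code>, and publishing is represented by a callback. The pure <code>aggregate</code> function follows the same rules as the <code>Aggregator</code> class.
</p>

```haskell
import Control.Monad.State (State, runState, state)
import Data.IORef
import qualified Data.Map.Strict as Map

-- Hypothetical alias: correlation ID -> messages received so far.
type Store = Map.Map Int [Int]

-- Pure stateful computation, following the same rules as the Aggregator class.
aggregate :: Int -> Int -> State Store (Maybe (Int, Int, Int))
aggregate correlationId value = state $ \s ->
  case Map.lookup correlationId s of
    Just [x, y] -> (Just (x, y, value), Map.delete correlationId s)
    Just xs     -> (Nothing, Map.insert correlationId (xs ++ [value]) s)
    Nothing     -> (Nothing, Map.insert correlationId [value] s)

-- Impure shell (assumption: state is kept in an IORef, publish is a callback).
onMessage :: IORef Store -> ((Int, Int, Int) -> IO ()) -> Int -> Int -> IO ()
onMessage ref publish correlationId value = do
  s <- readIORef ref                                   -- impure: load state
  let (result, newState) =
        runState (aggregate correlationId value) s     -- pure: decide
  mapM_ publish result                                 -- impure: publish, if any
  writeIORef ref newState                              -- impure: save state
```

<p>
Only the middle step contains logic, and it's a pure function of the message and the state. A real handler would likely need to guard against concurrent updates; this sketch deliberately ignores that.
</p>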
<p>
Notice how this design is different from a typical object-oriented solution. In object-oriented programming, you'd typically have an object that contains the state and then receives the run-time value as input to a method that might then mutate the state. <em>Data with behaviour</em>, as it's sometimes characterised.
</p>
<p>
The State-based computation turns such a design on its head. The computation closes over the run-time values, and the state is supplied as an argument to the <code>Run</code> method. This is an example of the shift of perspective often required to think functionally, rather than object-oriented. <em>That's</em> why it takes time learning Functional Programming (FP); it's not about syntax. It's a different way to think.
</p>
<p>
An object like the above <code>a</code> seems almost frivolous, since it's going to have a short lifetime. Calling code will create it only to call its <code>Run</code> method and then let it go out of scope to be garbage-collected.
</p>
<p>
Of course, in a language more attuned to FP like <a href="https://www.haskell.org">Haskell</a>, it's a different story:
</p>
<p>
<pre><span style="color:blue;">let</span> h = handle (corrId msg) (val msg)</pre>
</p>
<p>
Instead of creating an object using a constructor, you only pass the message values to a function called <code>handle</code>. The return value <code>h</code> is a State value that an overall message handler can then later run with a state <code>s</code>:
</p>
<p>
<pre><span style="color:blue;">let</span> (m, ns) = runState h s</pre>
</p>
<p>
The return value is a tuple where <code>m</code> is the Maybe value that may or may not contain the aggregated message; <code>ns</code> is the new state.
</p>
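<p>
As a worked example, the following sketch runs four messages through <code>handle</code> in sequence, starting from an empty state. The <code>handle</code> function is the one shown later in this article, repeated here so that the example is self-contained. The message values and the names <code>results</code> and <code>finalState</code> are made up for illustration, and the code assumes the <code>mtl</code> and <code>containers</code> packages.
</p>

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Control.Monad.State (MonadState, get, put, runState)
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

handle :: (Ord k, MonadState (Map k [a]) m) => k -> a -> m (Maybe (a, a, a))
handle correlationId value = do
  m <- get
  let (retVal, newState) =
        case Map.lookup correlationId m of
          Just [x, y] -> (Just (x, y, value), Map.delete correlationId m)
          Just _      -> (Nothing, Map.adjust (++ [value]) correlationId m)
          Nothing     -> (Nothing, Map.insert correlationId [value] m)
  put newState
  return retVal

-- Hypothetical messages as (correlation ID, value) pairs.
-- mapM threads the state from one message to the next.
results :: [Maybe (Int, Int, Int)]
finalState :: Map Int [Int]
(results, finalState) = runState (mapM (uncurry handle) messages) Map.empty
  where
    messages = [(1, 10), (2, 50), (1, 20), (1, 30)]

-- results    is [Nothing, Nothing, Nothing, Just (10, 20, 30)]
-- finalState is Map.fromList [(2, [50])]
```

<p>
Only the third message with correlation ID <code>1</code> produces a triple. The lone message with ID <code>2</code> stays in the state, waiting for its siblings.
</p>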
<h3 id="6904a24cf4a242c98b6732b5fcb8a3d1">
Is this better? <a href="#6904a24cf4a242c98b6732b5fcb8a3d1" title="permalink">#</a>
</h3>
<p>
Is this approach to state mutation better than the default kind of state mutation possible with most languages (including C#)? Why make things so complicated?
</p>
<p>
There's more than one answer. First, in a language like Haskell, state mutation is in general not possible. While you <em>can</em> do state mutation with <a href="/2020/06/08/the-io-container">the IO container</a> in Haskell, this sets you completely free. You don't want to be free, because with freedom comes innumerable ways to shoot yourself in the foot. <a href="https://www.dotnetrocks.com/?show=1542">Constraints liberate</a>.
</p>
<p>
While the IO monad allows uncontrolled state mutation (together with all sorts of other impure actions), the State monad constrains itself and callers to only one type of apparent mutation. The type of the state being 'mutated' is visible in the type system, and that's the only type of value you can 'mutate' (in Haskell, that is).
</p>
<p>
The State monad uses the type system to clearly communicate what the type of state is. Given a language like Haskell, or otherwise given sufficient programming discipline, you can tell from an object's type exactly what to expect.
</p>
<p>
This also goes a long way to explain why monads are such an important concept in Functional Programming. When discussing FP, a common question is: <em>How do you perform side effects?</em> The answer, as may be already implied by this article, is that you use <a href="/2022/03/28/monads">monads</a>. The State monad for local state mutation, and the IO monad for 'global' side effects.
</p>
<h3 id="598094a7efc64e2685cf6196415bca01">
Get <a href="#598094a7efc64e2685cf6196415bca01" title="permalink">#</a>
</h3>
<p>
Clearly you can write an implementation of <code>IState<S, T></code> like the above <code>Aggregator</code> class. Must we always write a class that implements the interface in order to work within the State monad?
</p>
<p>
Monads are all about composition. Usually, you can compose even complex behaviour from smaller building blocks. Just consider the <a href="/2022/04/19/the-list-monad">list monad</a>, which in C# is epitomised by the <code>IEnumerable<T></code> interface. You can write quite complex logic using the building blocks of <a href="https://docs.microsoft.com/dotnet/api/system.linq.enumerable.where">Where</a>, <a href="https://docs.microsoft.com/dotnet/api/system.linq.enumerable.select">Select</a>, <a href="https://docs.microsoft.com/dotnet/api/system.linq.enumerable.aggregate">Aggregate</a>, <a href="https://docs.microsoft.com/dotnet/api/system.linq.enumerable.zip">Zip</a>, etcetera.
</p>
<p>
Likewise, we should expect that to be the case with the State monad, and it is so. The useful extra combinators are <em>get</em> and <em>put</em>.
</p>
<p>
The <em>get</em> function enables a composition to retrieve the current state. Given the <code>IState<S, T></code> interface, you can implement it like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IState<S, S> <span style="color:#74531f;">Get</span><<span style="color:#2b91af;">S</span>>()
{
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> GetState<S>();
}
<span style="color:blue;">private</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">GetState</span><<span style="color:#2b91af;">S</span>> : IState<S, S>
{
<span style="color:blue;">public</span> Tuple<S, S> <span style="color:#74531f;">Run</span>(S <span style="color:#1f377f;">state</span>)
{
<span style="color:#8f08c4;">return</span> Tuple.Create(state, state);
}
}</pre>
</p>
<p>
The <code>Get</code> function represents a stateful computation that copies the <code>state</code> over to the 'value' dimension, so to speak. Notice that the return type is <code>IState<S, S></code>. Copying the state over to the position of the <code>T</code> generic type means that it becomes accessible to the expressions that run inside of <code>Select</code> and <code>SelectMany</code>.
</p>
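<p>
For comparison, here's how the same idea might look in Haskell. The <code>mtl</code> package already exports <code>get</code>, so the primed name <code>get'</code> is a hypothetical re-implementation for illustration only, as is the <code>describe</code> example.
</p>

```haskell
import Control.Monad.State (State, runState, state)

-- Hypothetical re-implementation of get: copy the state to the value position.
get' :: State s s
get' = state (\s -> (s, s))

-- Once copied, the state is an ordinary value in the rest of the do block.
describe :: State Int String
describe = do
  n <- get'
  return ("The state is " ++ show n)

-- runState describe 42 evaluates to ("The state is 42", 42)
```

<p>
The state itself passes through unchanged; only the value position differs.
</p>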
<p>
You'll see an example once I rewrite the above <code>Aggregator</code> to be entirely based on composition, but in order to do that, I also need the <em>put</em> function.
</p>
<h3 id="224229b040ea417097a9de2ed1247953">
Put <a href="#224229b040ea417097a9de2ed1247953" title="permalink">#</a>
</h3>
<p>
The <em>put</em> function enables you to write a new state value to the underlying state dimension. The implementation in the current code base looks like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IState<S, Unit> <span style="color:#74531f;">Put</span><<span style="color:#2b91af;">S</span>>(S <span style="color:#1f377f;">s</span>)
{
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> PutState<S>(s);
}
<span style="color:blue;">private</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">PutState</span><<span style="color:#2b91af;">S</span>> : IState<S, Unit>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> S s;
<span style="color:blue;">public</span> <span style="color:#2b91af;">PutState</span>(S <span style="color:#1f377f;">s</span>)
{
<span style="color:blue;">this</span>.s = s;
}
<span style="color:blue;">public</span> Tuple<Unit, S> <span style="color:#74531f;">Run</span>(S <span style="color:#1f377f;">state</span>)
{
<span style="color:#8f08c4;">return</span> Tuple.Create(Unit.Default, s);
}
}</pre>
</p>
<p>
This implementation uses a <code>Unit</code> value to represent <code>void</code>. As usual, we have the problem in C-based languages that <code>void</code> isn't a value, but fortunately, <a href="/2018/01/15/unit-isomorphisms">unit is isomorphic to void</a>.
</p>
<p>
Notice that the <code>Run</code> method ignores the current <code>state</code> and instead replaces it with the new state <code>s</code>.
</p>
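<p>
A corresponding Haskell sketch could look like the following. Again, <code>put'</code> is a hypothetical re-implementation (the <code>mtl</code> package exports the real <code>put</code>), and <code>tick</code> is just an illustrative name.
</p>

```haskell
import Control.Monad.State (State, get, runState, state)

-- Hypothetical re-implementation of put: ignore the old state, install the new.
put' :: s -> State s ()
put' newState = state (\_ -> ((), newState))

-- Return the current counter and store its successor.
tick :: State Int Int
tick = do
  n <- get
  put' (n + 1)
  return n

-- runState tick 5 evaluates to (5, 6)
```

<p>
The unit value <code>()</code> plays the same role as <code>Unit.Default</code> in the C# code.
</p>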
<h3 id="c347a208b9524a1b85756a66cf53999c">
Look, no classes! <a href="#c347a208b9524a1b85756a66cf53999c" title="permalink">#</a>
</h3>
<p>
The <code>Get</code> and <code>Put</code> functions are enough that we can now rewrite the functionality currently locked up in the <code>Aggregator</code> class. Instead of having to define a new <code>class</code> for that purpose, it's possible to compose our way to the same functionality by writing a function:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IState<IReadOnlyDictionary<<span style="color:blue;">int</span>, IReadOnlyCollection<<span style="color:blue;">int</span>>>, Maybe<Tuple<<span style="color:blue;">int</span>, <span style="color:blue;">int</span>, <span style="color:blue;">int</span>>>>
<span style="color:#74531f;">Aggregate</span>(<span style="color:blue;">int</span> <span style="color:#1f377f;">correlationId</span>, <span style="color:blue;">int</span> <span style="color:#1f377f;">value</span>)
{
<span style="color:#8f08c4;">return</span>
<span style="color:blue;">from</span> state <span style="color:blue;">in</span> State.Get<IReadOnlyDictionary<<span style="color:blue;">int</span>, IReadOnlyCollection<<span style="color:blue;">int</span>>>>()
<span style="color:blue;">let</span> mcoll = state.TryGetValue(correlationId)
<span style="color:blue;">let</span> retVal = <span style="color:blue;">from</span> coll <span style="color:blue;">in</span> mcoll.Where(<span style="color:#1f377f;">c</span> => c.Count == 2)
<span style="color:blue;">select</span> Tuple.Create(coll.ElementAt(0), coll.ElementAt(1), value)
<span style="color:blue;">let</span> newState = retVal
.Select(<span style="color:#1f377f;">_</span> => state.Remove(correlationId))
.GetValueOrFallback(
state.Replace(
correlationId,
mcoll
.Select(<span style="color:#1f377f;">coll</span> => coll.Append(value))
.GetValueOrFallback(<span style="color:blue;">new</span>[] { value })))
<span style="color:blue;">from</span> _ <span style="color:blue;">in</span> State.Put(newState)
<span style="color:blue;">select</span> retVal;
}</pre>
</p>
<p>
Okay, I admit that there's a hint of <a href="https://en.wikipedia.org/wiki/Code_golf">code golf</a> about this. It's certainly not <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> C#. To be clear, I'm not endorsing this style of C#; I'm only showing it to explain the abstraction offered by the State monad. <a href="/2019/03/18/the-programmer-as-decision-maker">Adopt such code at your own peril</a>.
</p>
<p>
The first observation to be made about this code example is that it's written entirely in query syntax. There's a good reason for that. Query syntax is syntactic sugar on top of <code>SelectMany</code>, so you could, conceivably, also write the above expression using method call syntax. However, in order to make early values available to later expressions, you'd have to pass a lot of tuples around. For example, the above expression makes repeated use of <code>mcoll</code>, so had you been using method call syntax instead of query syntax, you would have had to pass that value on to subsequent computations as one item in a tuple. Not impossible, but awkward. With query syntax, all values remain in scope so that you can refer to them later.
</p>
<p>
The expression starts by using <code>Get</code> to get the current state. The <code>state</code> variable is now available in the rest of the expression.
</p>
<p>
The <code>state</code> is a dictionary, so the next step is to query it for an entry that corresponds to the <code>correlationId</code>. I've used an overload of <code>TryGetValue</code> that returns a Maybe value, which also explains (I hope) the <code>m</code> prefix of <code>mcoll</code>.
</p>
<p>
Next, the expression filters <code>mcoll</code> and creates a triple if the <code>coll</code> has a <code>Count</code> of two. Notice that the nested query syntax expression (<code>from...select</code>) isn't running in the State monad, but rather in the <a href="/2022/04/25/the-maybe-monad">Maybe monad</a>. The result, <code>retVal</code>, is another Maybe value.
</p>
<p>
That takes care of the 'return value', but we also need to calculate the new state. This happens in a somewhat roundabout way. The reason that it's not more straightforward is that C# query syntax doesn't allow branching (apart from the ternary <code>?..:</code> operator) and (this version of the language, at least) has weak pattern-matching abilities.
</p>
<p>
Instead, it uses <code>retVal</code> and <code>mcoll</code> as indicators of how to update the state. If <code>retVal</code> is populated, it means that the <code>Aggregate</code> computation will return a triple, in which case it must <code>Remove</code> the entry from the state dictionary. On the other hand, if that's not the case, it must update the entry. Again, there are two options. If <code>mcoll</code> was already populated, it should be updated by appending the <code>value</code>. If not, a new entry containing only the <code>value</code> should be added.
</p>
<p>
Finally, the expression uses <code>Put</code> to save the new state, after which it returns <code>retVal</code>.
</p>
<p>
While this is far from idiomatic C# code, the point is that you can <em>compose</em> your way to the desired behaviour. You don't have to write a new class. Not that this is necessarily an improvement in C#. I'm mostly stating this to highlight a difference in philosophy.
</p>
<p>
Of course, this is all much more elegant in Haskell, where the same functionality is as terse as this:
</p>
<p>
<pre><span style="color:#2b91af;">handle</span> :: (<span style="color:blue;">Ord</span> k, <span style="color:blue;">MonadState</span> (<span style="color:blue;">Map</span> k [a]) m) <span style="color:blue;">=></span> k <span style="color:blue;">-></span> a <span style="color:blue;">-></span> m (<span style="color:#2b91af;">Maybe</span> (a, a, a))
handle correlationId value = <span style="color:blue;">do</span>
m <- get
<span style="color:blue;">let</span> (retVal, newState) =
<span style="color:blue;">case</span> Map.<span style="color:blue;">lookup</span> correlationId m <span style="color:blue;">of</span>
Just [x, y] -> (Just (x, y, value), Map.delete correlationId m)
Just _ -> (Nothing, Map.adjust (++ [value]) correlationId m)
Nothing -> (Nothing, Map.insert correlationId [value] m)
put newState
<span style="color:blue;">return</span> retVal</pre>
</p>
<p>
Notice that this implementation also makes use of <code>get</code> and <code>put</code>.
</p>
<h3 id="2263fd914af84e0082c77a5ad2b9afe6">
Modify <a href="#2263fd914af84e0082c77a5ad2b9afe6" title="permalink">#</a>
</h3>
<p>
The <code>Get</code> and <code>Put</code> functions are basic functions based on the State monad abstraction. These two functions, again, can be used to define some secondary helper functions, whereof <code>Modify</code> is one:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IState<S, Unit> <span style="color:#74531f;">Modify</span><<span style="color:#2b91af;">S</span>>(Func<S, S> <span style="color:#1f377f;">modify</span>)
{
<span style="color:#8f08c4;">return</span> Get<S>().SelectMany(<span style="color:#1f377f;">s</span> => Put(modify(s)));
}</pre>
</p>
<p>
It wasn't required for the above <code>Aggregate</code> function, but here's a basic unit test that demonstrates how it works:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">ModifyExample</span>()
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">x</span> = State.Modify((<span style="color:blue;">int</span> <span style="color:#1f377f;">i</span>) => i + 1);
<span style="color:blue;">var</span> <span style="color:#1f377f;">actual</span> = x.Run(1);
Assert.Equal(2, actual.Item2);
}</pre>
</p>
<p>
It can be useful if you need to perform an 'atomic' state modification. For a realistic Haskell example, you may want to refer to my article <a href="/2019/03/11/an-example-of-state-based-testing-in-haskell">An example of state-based testing in Haskell</a>.
</p>
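<p>
The same composition is easy to sketch in Haskell. The <code>mtl</code> package already ships <code>modify</code>, so <code>modifyState</code> and <code>push</code> below are hypothetical names used only to show how the function falls out of <code>get</code> and <code>put</code>.
</p>

```haskell
import Control.Monad.State (State, execState, get, put)

-- Hypothetical re-implementation of modify in terms of get and put.
modifyState :: (s -> s) -> State s ()
modifyState f = get >>= \s -> put (f s)

-- Illustrative use: push an element onto a list kept as state.
push :: a -> State [a] ()
push x = modifyState (x :)

-- execState (modifyState (+ 1)) 1 evaluates to 2
-- execState (mapM_ push [1, 2, 3]) [] evaluates to [3, 2, 1]
```

<p>
The first example mirrors the C# unit test: incrementing a state of <code>1</code> yields a new state of <code>2</code>.
</p>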
<h3 id="b1e1ba440a4f4b6aad12bc6721ba6f8b">
Gets <a href="#b1e1ba440a4f4b6aad12bc6721ba6f8b" title="permalink">#</a>
</h3>
<p>
Another occasionally useful second-order helper function is <code>Gets</code>:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IState<S, T> <span style="color:#74531f;">Gets</span><<span style="color:#2b91af;">S</span>, <span style="color:#2b91af;">T</span>>(Func<S, T> <span style="color:#1f377f;">selector</span>)
{
<span style="color:#8f08c4;">return</span> Get<S>().Select(selector);
}</pre>
</p>
<p>
This function can be useful as a combinator if you need just a projection of the current state, instead of the whole state.
</p>
<p>
Here's another basic unit test as an example:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">GetsExample</span>()
{
IState<<span style="color:blue;">string</span>, <span style="color:blue;">int</span>> <span style="color:#1f377f;">x</span> = State.Gets((<span style="color:blue;">string</span> <span style="color:#1f377f;">s</span>) => s.Length);
Tuple<<span style="color:blue;">int</span>, <span style="color:blue;">string</span>> <span style="color:#1f377f;">actual</span> = x.Run(<span style="color:#a31515;">"bar"</span>);
Assert.Equal(Tuple.Create(3, <span style="color:#a31515;">"bar"</span>), actual);
}</pre>
</p>
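<p>
In Haskell, the equivalent is a one-liner because <code>State</code> is a functor. The <code>mtl</code> package exports <code>gets</code>; the name <code>getsState</code> below is a hypothetical re-implementation for illustration.
</p>

```haskell
import Control.Monad.State (State, get, runState)

-- Hypothetical re-implementation of gets: project the state, leave it intact.
getsState :: (s -> a) -> State s a
getsState selector = fmap selector get

-- runState (getsState length) "bar" evaluates to (3, "bar")
```

<p>
As in the C# test, the projection ends up in the value position while the state is returned untouched.
</p>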
<p>
While the above Aggregator example didn't require <code>Modify</code> or <code>Gets</code>, I wanted to include them here for completeness' sake.
</p>
<h3 id="05a2046a71e94b85a3f5445df9669800">
F# <a href="#05a2046a71e94b85a3f5445df9669800" title="permalink">#</a>
</h3>
<p>
Most of the code shown in this article has been C#, with the occasional Haskell snippet. You can also implement the State monad, as well as the helper methods, in <a href="https://fsharp.org">F#</a>, where it'd feel more natural to dispense with interfaces and instead work directly with functions. To make things a little clearer, you may want to define a type alias:
</p>
<p>
<pre><span style="color:blue;">type</span> State<'a, 's> = ('s <span style="color:blue;">-></span> 'a * 's)</pre>
</p>
<p>
You can now define a <code>State</code> module that works directly with that kind of function:
</p>
<p>
<pre><span style="color:blue;">module</span> State =
<span style="color:blue;">let</span> run state (f : State<_, _>) = f state
<span style="color:blue;">let</span> lift x state = x, state
<span style="color:blue;">let</span> map f x state =
<span style="color:blue;">let</span> x', newState = run state x
f x', newState
<span style="color:blue;">let</span> bind (f : 'a <span style="color:blue;">-></span> State<'b, 's>) (x : State<'a, 's>) state =
<span style="color:blue;">let</span> x', newState = run state x
run newState (f x')
<span style="color:blue;">let</span> get state = state, state
<span style="color:blue;">let</span> put newState _ = (), newState
<span style="color:blue;">let</span> modify f = get |> map f |> bind put</pre>
</p>
<p>
This is code I originally wrote for <a href="https://codereview.stackexchange.com/a/139652/3878">a Code Review answer</a>. You can go there to see all the details, as well as a motivating example.
</p>
<p>
I see that I never got around to adding a <code>gets</code> function... I'll leave that as an exercise.
</p>
<p>
In C#, I've based the example on an interface (<code>IState<S, T></code>), but it would also be possible to implement the State monad as extension methods on <code>Func<S, Tuple<T, S>></code>. Try it! It might be another good exercise.
</p>
<h3 id="994529b38af241b9ba6090f5d670bcc8">
Conclusion <a href="#994529b38af241b9ba6090f5d670bcc8" title="permalink">#</a>
</h3>
<p>
The State monad usually comes with a few helper functions: <em>get</em>, <em>put</em>, <em>modify</em>, and <em>gets</em>. They can be useful as combinators you can use to compose a stateful computation from smaller building blocks, just like you can use LINQ to compose complex queries over data.
</p>
</div>
<hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.Test Double clockshttps://blog.ploeh.dk/2022/06/27/test-double-clocks2022-06-27T05:44:00+00:00Mark Seemann
<div id="post">
<p>
<em>A short exploration of replacing the system clock with Test Doubles.</em>
</p>
<p>
In a comment to my article <a href="/2022/05/23/waiting-to-never-happen">Waiting to never happen</a>, <a href="https://github.com/ladeak">Laszlo</a> asks:
</p>
<blockquote>
<p>
"Why have you decided to make the date of the reservation relative to the SystemClock, and not the other way around? Would it be more deterministic to use a faked system clock instead?"
</p>
<footer><cite><a href="/2022/05/23/waiting-to-never-happen#c0b2e0bd555b5d5c5555c60bf11bff69">Laszlo</a></cite></footer>
</blockquote>
<p>
The short answer is that I hadn't thought of the alternative. Not in this context, at least.
</p>
<p>
It's a question worth exploring, which I will now proceed to do.
</p>
<h3 id="d7283861d9494c478b2bd8bb90218208">
Why IClock? <a href="#d7283861d9494c478b2bd8bb90218208" title="permalink">#</a>
</h3>
<p>
The article in question discusses a unit test, which ultimately arrives at this:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task <span style="color:#74531f;">ChangeDateToSoldOutDate</span>()
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">r1</span> =
Some.Reservation.WithDate(DateTime.Now.AddDays(8).At(20, 15));
<span style="color:blue;">var</span> <span style="color:#1f377f;">r2</span> = r1
.WithId(Guid.NewGuid())
.TheDayAfter()
.WithQuantity(10);
<span style="color:blue;">var</span> <span style="color:#1f377f;">db</span> = <span style="color:blue;">new</span> FakeDatabase();
db.Grandfather.Add(r1);
db.Grandfather.Add(r2);
<span style="color:blue;">var</span> <span style="color:#1f377f;">sut</span> = <span style="color:blue;">new</span> ReservationsController(
<span style="color:blue;">new</span> SystemClock(),
<span style="color:blue;">new</span> InMemoryRestaurantDatabase(Grandfather.Restaurant),
db);
<span style="color:blue;">var</span> <span style="color:#1f377f;">dto</span> = r1.WithDate(r2.At).ToDto();
<span style="color:blue;">var</span> <span style="color:#1f377f;">actual</span> = <span style="color:blue;">await</span> sut.Put(r1.Id.ToString(<span style="color:#a31515;">"N"</span>), dto);
<span style="color:blue;">var</span> <span style="color:#1f377f;">oRes</span> = Assert.IsAssignableFrom<ObjectResult>(actual);
Assert.Equal(
StatusCodes.Status500InternalServerError,
oRes.StatusCode);
}</pre>
</p>
<p>
The keen reader may notice that the test passes a <code><span style="color:blue;">new</span> SystemClock()</code> to the <code>sut</code>. In case you're wondering what that is, here's the definition:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">SystemClock</span> : IClock
{
<span style="color:blue;">public</span> DateTime <span style="color:#74531f;">GetCurrentDateTime</span>()
{
<span style="color:#8f08c4;">return</span> DateTime.Now;
}
}</pre>
</p>
<p>
While it should be possible to extrapolate the <code>IClock</code> interface from this code snippet, here it is for the sake of completeness:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">interface</span> <span style="color:#2b91af;">IClock</span>
{
DateTime <span style="color:#74531f;">GetCurrentDateTime</span>();
}</pre>
</p>
<p>
Since such an interface exists, why not use it in unit tests?
</p>
<p>
That's possible, but I think it's worth highlighting what motivated this interface in the first place. If you're used to a certain style of test-driven development (TDD), you may think that interfaces exist in order to support TDD. They may. That's how I did TDD 15 years ago, but <a href="/2019/02/18/from-interaction-based-to-state-based-testing">not how I do it today</a>.
</p>
<p>
The motivation for the <code>IClock</code> interface is a different one. It's there because the system clock is a source of impurity, just like random number generators, database queries, and web service invocations. In order to support <a href="/2020/03/23/repeatable-execution">repeatable execution</a>, it's useful to log the inputs and outputs of impure actions. This includes the system clock.
</p>
<p>
The <code>IClock</code> interface doesn't exist in order to support unit testing, but in order to enable logging via the <a href="https://en.wikipedia.org/wiki/Decorator_pattern">Decorator</a> pattern:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">LoggingClock</span> : IClock
{
<span style="color:blue;">public</span> <span style="color:#2b91af;">LoggingClock</span>(ILogger<LoggingClock> <span style="color:#1f377f;">logger</span>, IClock <span style="color:#1f377f;">inner</span>)
{
Logger = logger;
Inner = inner;
}
<span style="color:blue;">public</span> ILogger<LoggingClock> Logger { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> IClock Inner { <span style="color:blue;">get</span>; }
<span style="color:blue;">public</span> DateTime <span style="color:#74531f;">GetCurrentDateTime</span>()
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">output</span> = Inner.GetCurrentDateTime();
Logger.LogInformation(
<span style="color:#a31515;">"{method}() => {output}"</span>,
nameof(GetCurrentDateTime),
output);
<span style="color:#8f08c4;">return</span> output;
}
}</pre>
</p>
<p>
All code in this article originates from the code base that accompanies <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a>.
</p>
<p>
The web application is configured to decorate the <code>SystemClock</code> with the <code>LoggingClock</code>:
</p>
<p>
<pre>services.AddSingleton<IClock>(<span style="color:#1f377f;">sp</span> =>
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">logger</span> = sp.GetService<ILogger<LoggingClock>>();
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> LoggingClock(logger, <span style="color:blue;">new</span> SystemClock());
});</pre>
</p>
<p>
While the motivation for the <code>IClock</code> interface wasn't to support testing, now that it exists, would it be useful for unit testing as well?
</p>
<h3 id="3bf52db0c892409dbc372a83293a8992">
A Stub clock <a href="#3bf52db0c892409dbc372a83293a8992" title="permalink">#</a>
</h3>
<p>
As a first effort, we might try to add a <a href="http://xunitpatterns.com/Test%20Stub.html">Stub</a> clock:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ConstantClock</span> : IClock
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> DateTime dateTime;
<span style="color:blue;">public</span> <span style="color:#2b91af;">ConstantClock</span>(DateTime <span style="color:#1f377f;">dateTime</span>)
{
<span style="color:blue;">this</span>.dateTime = dateTime;
}
<span style="color:green;">// This default value is more or less arbitrary. I chose it as the date</span>
<span style="color:green;">// and time I wrote these lines of code, which also has the implication</span>
<span style="color:green;">// that it was immediately a time in the past. The actual value is,</span>
<span style="color:green;">// however, irrelevant.</span>
<span style="color:blue;">public</span> <span style="color:blue;">readonly</span> <span style="color:blue;">static</span> IClock Default =
<span style="color:blue;">new</span> ConstantClock(<span style="color:blue;">new</span> DateTime(2022, 6, 19, 9, 25, 0));
<span style="color:blue;">public</span> DateTime <span style="color:#74531f;">GetCurrentDateTime</span>()
{
<span style="color:#8f08c4;">return</span> dateTime;
}
}</pre>
</p>
<p>
This implementation always returns the same date and time. I called it <code>ConstantClock</code> for that reason.
</p>
<p>
It's trivial to replace the <code>SystemClock</code> with a <code>ConstantClock</code> in the above test:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task <span style="color:#74531f;">ChangeDateToSoldOutDate</span>()
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">clock</span> = ConstantClock.Default;
<span style="color:blue;">var</span> <span style="color:#1f377f;">r1</span> = Some.Reservation.WithDate(
clock.GetCurrentDateTime().AddDays(8).At(20, 15));
<span style="color:blue;">var</span> <span style="color:#1f377f;">r2</span> = r1
.WithId(Guid.NewGuid())
.TheDayAfter()
.WithQuantity(10);
<span style="color:blue;">var</span> <span style="color:#1f377f;">db</span> = <span style="color:blue;">new</span> FakeDatabase();
db.Grandfather.Add(r1);
db.Grandfather.Add(r2);
<span style="color:blue;">var</span> <span style="color:#1f377f;">sut</span> = <span style="color:blue;">new</span> ReservationsController(
clock,
<span style="color:blue;">new</span> InMemoryRestaurantDatabase(Grandfather.Restaurant),
db);
<span style="color:blue;">var</span> <span style="color:#1f377f;">dto</span> = r1.WithDate(r2.At).ToDto();
<span style="color:blue;">var</span> <span style="color:#1f377f;">actual</span> = <span style="color:blue;">await</span> sut.Put(r1.Id.ToString(<span style="color:#a31515;">"N"</span>), dto);
<span style="color:blue;">var</span> <span style="color:#1f377f;">oRes</span> = Assert.IsAssignableFrom<ObjectResult>(actual);
Assert.Equal(
StatusCodes.Status500InternalServerError,
oRes.StatusCode);
}</pre>
</p>
<p>
As you can see, however, it doesn't seem to be enabling any simplification of the test. It still needs to establish that <code>r1</code> and <code>r2</code> relate to each other as required by the test case, as well as establish that they are valid reservations in the future.
</p>
<p>
You may protest that this is a straw man argument, and that it would make the test both simpler and more readable if it would, instead, use explicit, hard-coded values. That's a fair criticism, so I'll get back to that later.
</p>
<h3 id="58fe168e8266401990d62641020db9ac">
Fragility <a href="#58fe168e8266401990d62641020db9ac" title="permalink">#</a>
</h3>
<p>
Before examining the above criticism, there's something more fundamental that I want to get out of the way. I find a Stub clock icky.
</p>
<p>
It works in this case, but may lead to fragile tests. What happens, for example, if another programmer comes by and adds code like this to the System Under Test (SUT)?
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="color:#1f377f;">now</span> = Clock.GetCurrentDateTime();
<span style="color:green;">// Sabotage:</span>
<span style="color:#8f08c4;">while</span> (Clock.GetCurrentDateTime() - now < TimeSpan.FromMilliseconds(1))
{ }</pre>
</p>
<p>
As the comment suggests, in this case it's pure sabotage. I don't think that anyone would deliberately do something like this. This code snippet even sits in an asynchronous method, and in .NET 'everyone' knows that if you want to suspend execution in an asynchronous method, you should use <a href="https://docs.microsoft.com/dotnet/api/system.threading.tasks.task.delay">Task.Delay</a>. I rather intend this code snippet to indicate that keeping time constant, as <code>ConstantClock</code> does, can be fatal.
</p>
<p>
If someone comes by and attempts to implement any kind of time-sensitive logic based on an injected <code>IClock</code>, the consequences could be dire. With the above sabotage, for example, the test hangs forever.
</p>
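<p>
To make the hang concrete without actually hanging anything, here's a minimal sketch that bounds the number of polls. The <code>ClockProbe</code> class and its <code>Advances</code> method are hypothetical helpers invented for this illustration; they aren't part of the code base. <code>IClock</code> and a trimmed-down <code>ConstantClock</code> are repeated so that the snippet stands on its own:
</p>
<p>
<pre>using System;

public interface IClock
{
    DateTime GetCurrentDateTime();
}

public sealed class ConstantClock : IClock
{
    private readonly DateTime dateTime;

    public ConstantClock(DateTime dateTime)
    {
        this.dateTime = dateTime;
    }

    public DateTime GetCurrentDateTime()
    {
        return dateTime;
    }
}

public static class ClockProbe
{
    // Hypothetical helper: polls the clock up to maxPolls times and reports
    // whether any reading was later than the first one.
    public static bool Advances(IClock clock, int maxPolls)
    {
        var start = clock.GetCurrentDateTime();
        for (var i = 0; i < maxPolls; i++)
        {
            if (start < clock.GetCurrentDateTime())
                return true;
        }
        return false;
    }
}</pre>
</p>
<p>
With a <code>ConstantClock</code>, <code>Advances</code> returns <code>false</code> no matter how large <code>maxPolls</code> is, since every reading is identical. The sabotage loop is the same check without the upper bound, which is why it never terminates.
</p>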
<p>
When I originally <a href="/2021/01/11/waiting-to-happen">refactored time-sensitive tests</a>, it was because I didn't appreciate having such ticking bombs lying around. A <code>ConstantClock</code> isn't <em>ticking</em> (that's the problem), but it still seems like a booby trap.
</p>
<h3 id="d58309147ef6450589ed0b5c69b33b30">
Offset clock <a href="#d58309147ef6450589ed0b5c69b33b30" title="permalink">#</a>
</h3>
<p>
It seems intuitive that a clock that doesn't go isn't very useful. Perhaps we can address that problem by setting the clock back. Not just a few hours, but days or years:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">OffsetClock</span> : IClock
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> TimeSpan offset;
<span style="color:blue;">private</span> <span style="color:#2b91af;">OffsetClock</span>(DateTime <span style="color:#1f377f;">origin</span>)
{
offset = DateTime.Now - origin;
}
<span style="color:blue;">public</span> <span style="color:blue;">static</span> IClock <span style="color:#74531f;">Start</span>(DateTime <span style="color:#1f377f;">at</span>)
{
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> OffsetClock(at);
}
<span style="color:green;">// This default value is more or less arbitrary. I just picked the same</span>
<span style="color:green;">// date and time as ConstantClock (which see).</span>
<span style="color:blue;">public</span> <span style="color:blue;">readonly</span> <span style="color:blue;">static</span> IClock Default =
Start(at: <span style="color:blue;">new</span> DateTime(2022, 6, 19, 9, 25, 0));
<span style="color:blue;">public</span> DateTime <span style="color:#74531f;">GetCurrentDateTime</span>()
{
<span style="color:#8f08c4;">return</span> DateTime.Now - offset;
}
}</pre>
</p>
<p>
An <code>OffsetClock</code> object starts ticking as soon as it's created, but it ticks at the same pace as the system clock. Time still passes. Rather than a Stub, I think that this implementation qualifies as a <a href="http://xunitpatterns.com/Fake%20Object.html">Fake</a>.
</p>
<p>
Using it in a test is as easy as using the <code>ConstantClock</code>:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task <span style="color:#74531f;">ChangeDateToSoldOutDate</span>()
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">clock</span> = OffsetClock.Default;
<span style="color:blue;">var</span> <span style="color:#1f377f;">r1</span> = Some.Reservation.WithDate(
clock.GetCurrentDateTime().AddDays(8).At(20, 15));
<span style="color:blue;">var</span> <span style="color:#1f377f;">r2</span> = r1
.WithId(Guid.NewGuid())
.TheDayAfter()
.WithQuantity(10);
<span style="color:blue;">var</span> <span style="color:#1f377f;">db</span> = <span style="color:blue;">new</span> FakeDatabase();
db.Grandfather.Add(r1);
db.Grandfather.Add(r2);
<span style="color:blue;">var</span> <span style="color:#1f377f;">sut</span> = <span style="color:blue;">new</span> ReservationsController(
clock,
<span style="color:blue;">new</span> InMemoryRestaurantDatabase(Grandfather.Restaurant),
db);
<span style="color:blue;">var</span> <span style="color:#1f377f;">dto</span> = r1.WithDate(r2.At).ToDto();
<span style="color:blue;">var</span> <span style="color:#1f377f;">actual</span> = <span style="color:blue;">await</span> sut.Put(r1.Id.ToString(<span style="color:#a31515;">"N"</span>), dto);
<span style="color:blue;">var</span> <span style="color:#1f377f;">oRes</span> = Assert.IsAssignableFrom<ObjectResult>(actual);
Assert.Equal(
StatusCodes.Status500InternalServerError,
oRes.StatusCode);
}</pre>
</p>
<p>
The only change from the version that uses <code>ConstantClock</code> is the definition of the <code>clock</code> variable.
</p>
<p>
This test can withstand the above sabotage, because time still passes at normal pace.
</p>
<h3 id="06f0ace3806e41139e30106aeb77e0e1">
Explicit dates <a href="#06f0ace3806e41139e30106aeb77e0e1" title="permalink">#</a>
</h3>
<p>
Above, I promised to return to the criticism that the test is overly abstract. Now that it's possible to directly control time, perhaps it'd simplify the test if we could use hard-coded dates and times, instead of all that relative-time machinery:
</p>
<p>
<pre>[Fact]
<span style="color:blue;">public</span> <span style="color:blue;">async</span> Task <span style="color:#74531f;">ChangeDateToSoldOutDate</span>()
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">r1</span> = Some.Reservation.WithDate(
<span style="color:blue;">new</span> DateTime(2022, 6, 27, 20, 15, 0));
<span style="color:blue;">var</span> <span style="color:#1f377f;">r2</span> = r1
.WithId(Guid.NewGuid())
.WithDate(<span style="color:blue;">new</span> DateTime(2022, 6, 28, 20, 15, 0))
.WithQuantity(10);
<span style="color:blue;">var</span> <span style="color:#1f377f;">db</span> = <span style="color:blue;">new</span> FakeDatabase();
db.Grandfather.Add(r1);
db.Grandfather.Add(r2);
<span style="color:blue;">var</span> <span style="color:#1f377f;">sut</span> = <span style="color:blue;">new</span> ReservationsController(
OffsetClock.Start(at: <span style="color:blue;">new</span> DateTime(2022, 6, 19, 13, 43, 0)),
<span style="color:blue;">new</span> InMemoryRestaurantDatabase(Grandfather.Restaurant),
db);
<span style="color:blue;">var</span> <span style="color:#1f377f;">dto</span> = r1.WithDate(r2.At).ToDto();
<span style="color:blue;">var</span> <span style="color:#1f377f;">actual</span> = <span style="color:blue;">await</span> sut.Put(r1.Id.ToString(<span style="color:#a31515;">"N"</span>), dto);
<span style="color:blue;">var</span> <span style="color:#1f377f;">oRes</span> = Assert.IsAssignableFrom<ObjectResult>(actual);
Assert.Equal(
StatusCodes.Status500InternalServerError,
oRes.StatusCode);
}</pre>
</p>
<p>
Yeah, not really. This isn't worse, but neither is it better. It's the same size of code, and while the dates are now explicit (<a href="https://peps.python.org/pep-0020/">which, ostensibly, is better</a>), the reader now has to deduce the relationship between the clock offset, <code>r1</code>, and <code>r2</code>. I'm not convinced that this is an improvement.
</p>
<h3 id="62ae3d8257b64496ae7e82930438daa5">
Determinism <a href="#62ae3d8257b64496ae7e82930438daa5" title="permalink">#</a>
</h3>
<p>
In the original comment, Laszlo asked if it would be more deterministic to use a Fake system clock instead. This seems to imply that using the system clock is nondeterministic. Granted, it is when not used with care.
</p>
<p>
On the other hand, when used as shown in the initial test, it's <em>almost</em> deterministic. What time-related circumstances would have to come around for the test to fail?
</p>
<p>
The important precondition is that both reservations are in the future. The test picks a date eight days in the future. How might that precondition fail?
</p>
<p>
The only failure condition I can think of is if test execution somehow gets suspended <em>after</em> <code>r1</code> and <code>r2</code> are initialised, but <em>before</em> calling <code>sut.Put</code>. If you run the test on a laptop and put it to sleep for more than eight days, you may be so extremely lucky (or unlucky, depending on how you look at it) that this turns out to be the case. When execution resumes, the reservations are now in the past, and <code>sut.Put</code> will fail because of that.
</p>
<p>
I'm not convinced that this is at all likely, and it's not a scenario that I'm inclined to take into account.
</p>
<p>
And in any case, the test variation that uses <code>OffsetClock</code> is as 'vulnerable' to that scenario as the <code>SystemClock</code>. The only <a href="https://martinfowler.com/bliki/TestDouble.html">Test Double</a> not susceptible to such a scenario is <code>ConstantClock</code>, but as you have seen, this has more immediate problems.
</p>
<h3 id="0d09375b62d94fc287cb70568e0b7f1a">
Conclusion <a href="#0d09375b62d94fc287cb70568e0b7f1a" title="permalink">#</a>
</h3>
<p>
If you've read or seen a sufficient amount of time-travel science fiction, you know that it's not a good idea to try to change time. This also seems to be the case here. At least, I can see a few disadvantages to using Test Double clocks, but no clear advantages.
</p>
<p>
The above is, of course, only one example, but the concern of how to control the passing of time in unit testing isn't new to me. This is something that has been an issue on and off since I started with TDD in 2003. I keep coming back to the notion that the simplest solution is to use as many <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a> as possible, combined with a few impure actions that may require explicit use of dates and times relative to the system clock, as shown in previous articles.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="1d4fb68f3b5542e9897c82ad5c342a8a">
<div class="comment-author"><a href="https://github.com/ladeak">Laszlo</a> <a href="#1d4fb68f3b5542e9897c82ad5c342a8a">#</a></div>
<div class="comment-content">
<p>
I agree with most of what is described in this post. However, I still find StubClock as my 'default' approach. I summarized my reasons in this <a href="https://gist.github.com/ladeak/4f6ece31e941e28bafa1fca844d9fe3b">gist reply</a>.
</p>
</div>
<div class="comment-date">2022-06-30 7:43 UTC</div>
</div>
<div class="comment" id="5aa51b5af530409290c0944b07ce9b10">
<div class="comment-author"><a href="https://github.com/c-vetter">C. Vetter</a> <a href="#5aa51b5af530409290c0944b07ce9b10">#</a></div>
<div class="comment-content">
<figure class="quote">
<blockquote>
Yeah, not really. This isn't worse, but neither is it better. It's the same size of code, and while the dates are now explicit (<a href="https://peps.python.org/pep-0020/">which, ostensibly, is better</a>), the reader now has to deduce the relationship between the clock offset, <code>r1</code>, and <code>r2</code>. I'm not convinced that this is an improvement.
</blockquote>
</figure>
<p>I think, you overlook an important fact here: It depends™.</p>
<p>
As <a href="https://youtu.be/pSOBeD1GC_Y">Obi Wan taught us</a>, the point of view is often quite important.
In this case, yes it's true, the changed code is more explicit, from a certain point of view, because <em>the dates are now explicit</em>.
But: in the previous version, the <em>relationships were explicit</em>, whereas they have been rendered implicit now.
Which is better depends on the context, and in this context, I think the change is for the worse.
</p>
<p>
In this context, I think we care neither about the specific dates nor the relationship between both reservation dates.
All we care about is their relationship to the present and that they are different from each other.
With that in mind, I'd suggest to extend your <code>Some</code> container with more datetimes, in addition to <code>Now</code>,
like <code>FutureDate</code> and <code>OtherFutureDate</code>.
</p>
<p>
How those are constructed is generally of no relevance to the current test.
After all, if we wanted to be 100% sure about every piece,
we'd basically have to re-write our entire runtime for each test case, which would just be madness.
And yes, I'd just construct them based on the current system time.
</p>
<p>
Regarding the overall argument, I'll say that dealing with time issues is generally a pain,
but most of the time, we don't really need to deal with what happens at specific times.
In those rare cases, yes, it makes sense to fix the test's time, but I'd leave that as a rare exception.
Partly because such tests tend to require some kind of wide-ranging mock that messes with a lot of things.
</p>
<p>
If we're talking about stuff like Y2k-proofing (if you're too young to remember, look it up, kids),
it bears thinking about actually creating a whole test machine (virtual or physical)
with an appropriate system time and just running your test suite on there.
In times of docker, I'll bet that that will be less pain in many cases than adding time-fixing mock stuff.
</p>
<p>
If passage of time is important, that's another bag of pain right there,
but I'd look into segregating that as much as possible from everything else.
If, for example, you need things to happen after 100 minutes have passed,
I'd prefer having a single time-based event system that
all other code can subscribe to and be called back when the interesting time arrives.
That way, I can test the consumers without actually travelling through time,
while testing the timer service will be reduced to making sure that events are fired at the appropriate times.
<small>The latter could even happen on a persistent test machine that just keeps running,
giving you insight on long-time behavior (just an idea, not a prescription 😉).</small>
</p>
</div>
<div class="comment-date">2024-05-08 11:13 UTC</div>
</div>
<div class="comment" id="eddb383b90f8416aaecbee862b808df9">
<div class="comment-author"><a href="/">Mark Seemann</a> <a href="#eddb383b90f8416aaecbee862b808df9">#</a></div>
<div class="comment-content">
<p>
Thank you for writing. It's a good point that the two alternatives that I compare really only represent different perspectives. As one part becomes more explicit, the other becomes more implicit, and vice versa. I hadn't thought of that, so thank you for pointing that out.
</p>
<p>
Perhaps, as you suggest, a better API might be in order. I'm sure this isn't my last round around that block. I don't, however, want to add <code>Now</code>, <code>FutureDate</code>, etc. to the <code>Some</code> API. This module contains a collection of <a href="https://en.wikipedia.org/wiki/Equivalence_class">representative values of various equivalence classes</a>, and in order to ensure test repeatability, <a href="/2017/09/11/test-data-without-builders">they should be immutable</a> and deterministic. This rules out hiding a call to <code>DateTime.Now</code> behind such an API.
</p>
<p>
That doesn't, however, rule out other types of APIs. If you move to <a href="/2017/09/18/the-test-data-generator-functor">test data generators</a> instead, it might make sense to define a 'future date' generator.
</p>
<p>
All that said, I agree that the best way to test time-sensitive code is to model it in such a way that it's deterministic. I've <a href="/2020/02/24/discerning-and-maintaining-purity">touched on this topic before</a>, and most of the tests in the sample code base that accompanies <a href="/ctfiyh">Code That Fits in Your Head</a> take that approach.
</p>
<p>
The test discussed in this article, however, sits <a href="/2023/07/31/test-driving-the-pyramids-top">higher in the Test Pyramid</a>, and for such <a href="/2012/06/27/FacadeTest">Facade Tests</a>, I'd like to exercise them in as realistic a context as possible. That's why I run them on the real system clock.
</p>
</div>
<div class="comment-date">2024-05-18 8:45 UTC</div>
</div>
</div>
<hr>
The State monadhttps://blog.ploeh.dk/2022/06/20/the-state-monad2022-06-20T21:52:00+00:00Mark Seemann
<div id="post">
<p>
<em>Stateful computations as a monad. An example for object-oriented programmers.</em>
</p>
<p>
This article is an instalment in <a href="/2022/03/28/monads">an article series about monads</a>. A previous article described <a href="/2021/07/19/the-state-functor">the State functor</a>. As is the case with many (but not all) <a href="/2018/03/22/functors">functors</a>, this one also forms a monad.
</p>
<p>
This article continues where the State functor article stopped. It uses the same code base.
</p>
<h3 id="0309f41b7a434781a1640f18ac7cea30">
SelectMany <a href="#0309f41b7a434781a1640f18ac7cea30" title="permalink">#</a>
</h3>
<p>
A monad must define either a <em>bind</em> or <em>join</em> function. In C#, monadic bind is called <code>SelectMany</code>. Given the <code>IState<S, T></code> interface defined in the State functor article, you can implement <code>SelectMany</code> like this:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IState<S, T1> <span style="color:#74531f;">SelectMany</span><<span style="color:#2b91af;">S</span>, <span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">T1</span>>(
<span style="color:blue;">this</span> IState<S, T> <span style="color:#1f377f;">source</span>,
Func<T, IState<S, T1>> <span style="color:#1f377f;">selector</span>)
{
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> SelectManyState<S, T, T1>(source, selector);
}
<span style="color:blue;">private</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">SelectManyState</span><<span style="color:#2b91af;">S</span>, <span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">T1</span>> : IState<S, T1>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> IState<S, T> source;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> Func<T, IState<S, T1>> selector;
<span style="color:blue;">public</span> <span style="color:#2b91af;">SelectManyState</span>(
IState<S, T> <span style="color:#1f377f;">source</span>,
Func<T, IState<S, T1>> <span style="color:#1f377f;">selector</span>)
{
<span style="color:blue;">this</span>.source = source;
<span style="color:blue;">this</span>.selector = selector;
}
<span style="color:blue;">public</span> Tuple<T1, S> <span style="color:#74531f;">Run</span>(S <span style="color:#1f377f;">state</span>)
{
Tuple<T, S> <span style="color:#1f377f;">tuple</span> = source.Run(state);
IState<S, T1> <span style="color:#1f377f;">projection</span> = selector(tuple.Item1);
<span style="color:#8f08c4;">return</span> projection.Run(tuple.Item2);
}
}</pre>
</p>
<p>
As <code>SelectMany</code> implementations go, this is easily the most complex so far in this article series. While it looks complex, it really isn't. It's only complicated.
</p>
<p>
The three lines of code in the <code>Run</code> method do most of the work. The rest is essentially <a href="/2019/12/16/zone-of-ceremony">ceremony</a> required because C# doesn't have language features like object expressions.
</p>
<p>
To be fair, part of the boilerplate is also caused by using an interface instead of functions. In <a href="https://fsharp.org">F#</a> you could get by with as little as this:
</p>
<p>
<pre><span style="color:blue;">let</span> bind (f : 'a <span style="color:blue;">-></span> State<'b, 's>) (x : State<'a, 's>) state =
<span style="color:blue;">let</span> x', newState = run state x
run newState (f x')</pre>
</p>
<p>
I found an F# State implementation on my hard drive that turned out to originate from <a href="https://codereview.stackexchange.com/a/139652/3878">this Code Review answer</a>. You can go there to see it in context.
</p>
<p>
The <code>SelectMany</code> method first runs the <code>source</code> with the supplied <code>state</code>. This produces a tuple with a value and a new state. The value is <code>tuple.Item1</code>, which has the type <code>T</code>. The method proceeds to use that value to call the <code>selector</code>, which produces a new State value. Finally, the method runs the <code>projection</code> with the new state (<code>tuple.Item2</code>).
</p>
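<p>
If it helps to trace those three steps with actual numbers, here's a minimal sketch that swaps the <code>IState&lt;S, T&gt;</code> interface for plain delegates. That substitution is an assumption made for brevity, not how the code base is organised, and <code>StateSketch</code>, <code>Bind</code>, <code>Tick</code>, and <code>Example</code> are hypothetical names:
</p>
<p>
<pre>using System;

public static class StateSketch
{
    // Stand-in for IState<S, T>: a function from a state to a value and a
    // new state. Bind mirrors the three lines of the Run method.
    public static Func<S, Tuple<T1, S>> Bind<S, T, T1>(
        Func<S, Tuple<T, S>> source,
        Func<T, Func<S, Tuple<T1, S>>> selector)
    {
        return state =>
        {
            var tuple = source(state);               // Run the source.
            var projection = selector(tuple.Item1);  // Use the value.
            return projection(tuple.Item2);          // Run with the new state.
        };
    }

    // Returns the current counter as the value, and increments the state.
    public static Tuple<int, int> Tick(int state)
    {
        return Tuple.Create(state, state + 1);
    }

    public static Tuple<string, int> Example()
    {
        var composed = Bind<int, int, string>(
            Tick,
            first => s => Tuple.Create($"{first},{s}", s + 1));
        return composed(5);
    }
}</pre>
</p>
<p>
Calling <code>Example</code> runs <code>Tick</code> with the state <code>5</code>, which produces the value <code>5</code> and the new state <code>6</code>. The selector then receives <code>5</code>, and the function it returns runs with the state <code>6</code>, so the final result is the tuple <code>("5,6", 7)</code>. The second computation sees the state that the first one left behind, which is the whole point of threading state through bind.
</p>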
<p>
Monadic bind becomes useful when you have more than one function that returns a monadic value. Consider a code snippet like this:
</p>
<p>
<pre>IState<<span style="color:blue;">int</span>, <span style="color:blue;">string</span>> <span style="color:#1f377f;">s</span> =
<span style="color:blue;">new</span> Switch(<span style="color:#a31515;">"foo"</span>, <span style="color:#a31515;">"bar"</span>).SelectMany(<span style="color:#1f377f;">txt</span> => <span style="color:blue;">new</span> VowelExpander(txt));</pre>
</p>
<p>
This uses the silly <code>VowelExpander</code> class from <a href="/2021/07/19/the-state-functor">the State functor article</a>, as well as this new frivolous State implementation:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">Switch</span> : IState<<span style="color:blue;">int</span>, <span style="color:blue;">string</span>>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">string</span> option1;
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">string</span> option2;
<span style="color:blue;">public</span> <span style="color:#2b91af;">Switch</span>(<span style="color:blue;">string</span> <span style="color:#1f377f;">option1</span>, <span style="color:blue;">string</span> <span style="color:#1f377f;">option2</span>)
{
<span style="color:blue;">this</span>.option1 = option1;
<span style="color:blue;">this</span>.option2 = option2;
}
<span style="color:blue;">public</span> Tuple<<span style="color:blue;">string</span>, <span style="color:blue;">int</span>> <span style="color:#74531f;">Run</span>(<span style="color:blue;">int</span> <span style="color:#1f377f;">state</span>)
{
<span style="color:#8f08c4;">if</span> (0 <= state)
<span style="color:#8f08c4;">return</span> Tuple.Create(option1, state);
<span style="color:blue;">var</span> <span style="color:#1f377f;">newState</span> = 0;
<span style="color:#8f08c4;">return</span> Tuple.Create(option2, newState);
}
}</pre>
</p>
<p>
Both <code>Switch</code> and <code>VowelExpander</code> are State objects. If <code>SelectMany</code> didn't flatten as it goes, composition would have resulted in a nested State value. You'll see an example later in this article.
</p>
<h3 id="edc7d11f7fbd4b72b6b16858617fbbab">
Query syntax <a href="#edc7d11f7fbd4b72b6b16858617fbbab" title="permalink">#</a>
</h3>
<p>
Monads also enable query syntax in C# (just like they enable other kinds of syntactic sugar in languages like F# and <a href="https://www.haskell.org">Haskell</a>). As outlined in the <a href="/2022/03/28/monads">monad introduction</a>, however, you must add a special <code>SelectMany</code> overload:
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IState<S, T1> <span style="color:#74531f;">SelectMany</span><<span style="color:#2b91af;">S</span>, <span style="color:#2b91af;">T</span>, <span style="color:#2b91af;">U</span>, <span style="color:#2b91af;">T1</span>>(
<span style="color:blue;">this</span> IState<S, T> <span style="color:#1f377f;">source</span>,
Func<T, IState<S, U>> <span style="color:#1f377f;">k</span>,
Func<T, U, T1> <span style="color:#1f377f;">s</span>)
{
<span style="color:#8f08c4;">return</span> source.SelectMany(<span style="color:#1f377f;">x</span> => k(x).Select(<span style="color:#1f377f;">y</span> => s(x, y)));
}</pre>
</p>
<p>
As already predicted in the <a href="/2022/03/28/monads">monad introduction</a>, this boilerplate overload is always implemented in the same way. Only the signature changes. With it, you could instead write the above composition of <code>Switch</code> and <code>VowelExpander</code> like this:
</p>
<p>
<pre>IState<<span style="color:blue;">int</span>, <span style="color:blue;">string</span>> <span style="color:#1f377f;">s</span> = <span style="color:blue;">from</span> txt <span style="color:blue;">in</span> <span style="color:blue;">new</span> Switch(<span style="color:#a31515;">"foo"</span>, <span style="color:#a31515;">"bar"</span>)
<span style="color:blue;">from</span> txt1 <span style="color:blue;">in</span> <span style="color:blue;">new</span> VowelExpander(txt)
<span style="color:blue;">select</span> txt1;</pre>
</p>
<p>
That example requires a new variable (<code>txt1</code>). Given that it's often difficult to come up with good variable names, this doesn't look like much of an improvement. Still, it's possible.
</p>
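<p>
The reason the extra overload is required is that the C# compiler translates a query expression with two <code>from</code> clauses followed by <code>select</code> into a single call to the three-parameter <code>SelectMany</code>. Here's a self-contained sketch of that equivalence. It assumes a delegate-backed <code>State&lt;S, T&gt;</code> class instead of the <code>IState&lt;S, T&gt;</code> interface, and <code>StateExtensions</code>, <code>QueryDemo</code>, and <code>Next</code> are hypothetical names made up for the illustration:
</p>
<p>
<pre>using System;

// Assumption: a delegate-backed stand-in for the IState<S, T> interface.
public sealed class State<S, T>
{
    private readonly Func<S, Tuple<T, S>> run;

    public State(Func<S, Tuple<T, S>> run)
    {
        this.run = run;
    }

    public Tuple<T, S> Run(S state)
    {
        return run(state);
    }
}

public static class StateExtensions
{
    public static State<S, T1> Select<S, T, T1>(
        this State<S, T> source,
        Func<T, T1> selector)
    {
        return new State<S, T1>(state =>
        {
            var tuple = source.Run(state);
            return Tuple.Create(selector(tuple.Item1), tuple.Item2);
        });
    }

    public static State<S, T1> SelectMany<S, T, T1>(
        this State<S, T> source,
        Func<T, State<S, T1>> selector)
    {
        return new State<S, T1>(state =>
        {
            var tuple = source.Run(state);
            return selector(tuple.Item1).Run(tuple.Item2);
        });
    }

    // The overload that query syntax binds to.
    public static State<S, T1> SelectMany<S, T, U, T1>(
        this State<S, T> source,
        Func<T, State<S, U>> k,
        Func<T, U, T1> s)
    {
        return source.SelectMany(x => k(x).Select(y => s(x, y)));
    }
}

public static class QueryDemo
{
    // Yields the current counter as the value, and increments the state.
    public static readonly State<int, int> Next =
        new State<int, int>(n => Tuple.Create(n, n + 1));

    public static State<int, string> WithQuery()
    {
        return from a in Next
               from b in Next
               select a + "-" + b;
    }

    // What the compiler turns the above query expression into.
    public static State<int, string> Desugared()
    {
        return Next.SelectMany(a => Next, (a, b) => a + "-" + b);
    }
}</pre>
</p>
<p>
Running either <code>WithQuery()</code> or <code>Desugared()</code> with the initial state <code>0</code> returns the tuple <code>("0-1", 2)</code>. The two <code>Next</code> computations run in sequence, and the second one observes the state incremented by the first.
</p>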
<h3 id="048b4a03e28a4f7baca996319ca54627">
Join <a href="#048b4a03e28a4f7baca996319ca54627" title="permalink">#</a>
</h3>
<p>
In <a href="/2022/03/28/monads">the introduction</a> you learned that if you have a <code>Flatten</code> or <code>Join</code> function, you can implement <code>SelectMany</code>, and the other way around. Since we've already defined <code>SelectMany</code> for <code>IState<S, T></code>, we can use that to implement <code>Join</code>. In this article I use the name <code>Join</code> rather than <code>Flatten</code>. This is an arbitrary choice that doesn't impact behaviour. Perhaps you find it confusing that I'm inconsistent, but I do it in order to demonstrate that the behaviour is the same even if the name is different.
</p>
<p>
The concept of a monad is universal, but the names used to describe its components differ from language to language. What C# calls <code>SelectMany</code>, Scala calls <code>flatMap</code>, and what Haskell calls <code>join</code>, other languages may call <code>Flatten</code>.
</p>
<p>
You can always implement <code>Join</code> by using <code>SelectMany</code> with the identity function.
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IState<S, T> <span style="color:#74531f;">Join</span><<span style="color:#2b91af;">S</span>, <span style="color:#2b91af;">T</span>>(<span style="color:blue;">this</span> IState<S, IState<S, T>> <span style="color:#1f377f;">source</span>)
{
<span style="color:#8f08c4;">return</span> source.SelectMany(<span style="color:#1f377f;">x</span> => x);
}</pre>
</p>
<p>
Here's a way you can use it:
</p>
<p>
<pre>IState<<span style="color:blue;">int</span>, IState<<span style="color:blue;">int</span>, <span style="color:blue;">string</span>>> <span style="color:#1f377f;">nested</span> =
<span style="color:blue;">new</span> Switch(<span style="color:#a31515;">"foo"</span>, <span style="color:#a31515;">"bar"</span>).Select(<span style="color:#1f377f;">txt</span> => (IState<<span style="color:blue;">int</span>, <span style="color:blue;">string</span>>)<span style="color:blue;">new</span> VowelExpander(txt));
IState<<span style="color:blue;">int</span>, <span style="color:blue;">string</span>> <span style="color:#1f377f;">flattened</span> = nested.Join();</pre>
</p>
<p>
Of the three examples involving <code>Switch</code> and <code>VowelExpander</code>, this one most clearly emphasises the idea that a monad is a functor you can flatten. Using <code>Select</code> (instead of <code>SelectMany</code>) creates a nested State value when you try to compose the two together. With <code>Join</code> you can flatten them.
</p>
<p>
Not that doing it this way is better in any way. In practice, you'll mostly use either <code>SelectMany</code> or query syntax. It's a rare case when I use something like <code>Join</code>.
</p>
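<p>
The opposite direction can be sketched as well. The following is only an illustration, not code from the article's code base: to keep it short it assumes a delegate-based State representation, <code>Func&lt;S, Tuple&lt;T, S&gt;&gt;</code>, instead of the <code>IState&lt;S, T&gt;</code> interface, and the class name <code>FuncState</code> is made up. Here <code>Join</code> is written directly, and <code>SelectMany</code> is derived from <code>Select</code> and <code>Join</code>.
</p>

```csharp
using System;

// Hypothetical sketch: State as a plain delegate, Func<S, Tuple<T, S>>.
public static class FuncState
{
    // Functor map: transform the value and pass the new state along unchanged.
    public static Func<S, Tuple<T1, S>> Select<S, T, T1>(
        this Func<S, Tuple<T, S>> source,
        Func<T, T1> selector)
    {
        return s =>
        {
            var t = source(s);
            return Tuple.Create(selector(t.Item1), t.Item2);
        };
    }

    // Join written directly: run the outer computation, then run the inner
    // computation it produced, using the state the outer one left behind.
    public static Func<S, Tuple<T, S>> Join<S, T>(
        this Func<S, Tuple<Func<S, Tuple<T, S>>, S>> source)
    {
        return s =>
        {
            var t = source(s);
            return t.Item1(t.Item2);
        };
    }

    // SelectMany derived from Select and Join: map first, then flatten.
    public static Func<S, Tuple<T1, S>> SelectMany<S, T, T1>(
        this Func<S, Tuple<T, S>> source,
        Func<T, Func<S, Tuple<T1, S>>> selector)
    {
        return source.Select(selector).Join();
    }
}
```

<p>
Whichever of the two functions you write by hand, the other one follows mechanically.
</p>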
<h3 id="57fff60c94da4624965da2eb87e46f17">
Return <a href="#57fff60c94da4624965da2eb87e46f17" title="permalink">#</a>
</h3>
<p>
Apart from monadic bind, a monad must also define a way to put a normal value into the monad. Conceptually, I call this function <em>return</em> (because that's the name that Haskell uses):
</p>
<p>
<pre><span style="color:blue;">public</span> <span style="color:blue;">static</span> IState<S, T> <span style="color:#74531f;">Return</span><<span style="color:#2b91af;">S</span>, <span style="color:#2b91af;">T</span>>(T <span style="color:#1f377f;">x</span>)
{
<span style="color:#8f08c4;">return</span> <span style="color:blue;">new</span> ReturnState<S, T>(x);
}
<span style="color:blue;">private</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">ReturnState</span><<span style="color:#2b91af;">S</span>, <span style="color:#2b91af;">T</span>> : IState<S, T>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> T x;
<span style="color:blue;">public</span> <span style="color:#2b91af;">ReturnState</span>(T <span style="color:#1f377f;">x</span>)
{
<span style="color:blue;">this</span>.x = x;
}
<span style="color:blue;">public</span> Tuple<T, S> <span style="color:#74531f;">Run</span>(S <span style="color:#1f377f;">state</span>)
{
<span style="color:#8f08c4;">return</span> Tuple.Create(x, state);
}
}</pre>
</p>
<p>
Like the above <code>SelectMany</code> implementation, this is easily the most complicated <code>Return</code> implementation so far shown in this article series. Again, however, most of it is just boilerplate necessitated by C#'s lack of certain language features (most notably object expressions). And again, this is also somewhat unfair because I could have chosen to demonstrate the State monad using <code>Func<S, Tuple<T, S>></code> instead of an interface. (This would actually be a good exercise; try it!)
</p>
<p>
If you strip away all the boilerplate, the implementation is a trivial one-liner (the <code>Run</code> method), as also witnessed by this equivalent F# function that just returns a tuple:
</p>
<p>
<pre><span style="color:blue;">let</span> lift x state = x, state</pre>
</p>
<p>
When partially applied (<code>State.lift x</code>) that function returns a State value (i.e. a <code>'s <span style="color:blue;">-></span> 'a * 's</code> function).
</p>
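<p>
For comparison, here is how close C# can get to that one-liner if you assume the delegate representation <code>Func&lt;S, Tuple&lt;T, S&gt;&gt;</code> mentioned above. This is a sketch under that assumption; the class name <code>FuncStateReturn</code> is invented for the occasion.
</p>

```csharp
using System;

// Hypothetical sketch: Return for a delegate-based State representation.
public static class FuncStateReturn
{
    // Wrap a plain value; the state passes through untouched.
    public static Func<S, Tuple<T, S>> Return<S, T>(T x)
    {
        return state => Tuple.Create(x, state);
    }
}
```

<p>
Calling <code>FuncStateReturn.Return&lt;int, string&gt;("foo")</code> gives you a function that still waits for the state, just like the partially applied F# function.
</p>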
<p>
Again, you can see that F# code in context in <a href="https://codereview.stackexchange.com/a/139652/3878">this Code Review answer</a>.
</p>
<h3 id="7cff668e401e40a4b7ee005683e341f6">
Left identity <a href="#7cff668e401e40a4b7ee005683e341f6" title="permalink">#</a>
</h3>
<p>
We need to identify the <em>return</em> function in order to examine <a href="/2022/04/11/monad-laws">the monad laws</a>. Now that this is done, let's see what the laws look like for the State monad, starting with the left identity law.
</p>
<p>
<pre>[Theory]
[InlineData(DayOfWeek.Monday, 2)]
[InlineData(DayOfWeek.Tuesday, 0)]
[InlineData(DayOfWeek.Wednesday, 19)]
[InlineData(DayOfWeek.Thursday, 42)]
[InlineData(DayOfWeek.Friday, 2112)]
[InlineData(DayOfWeek.Saturday, 90)]
[InlineData(DayOfWeek.Sunday, 210)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">LeftIdentity</span>(DayOfWeek <span style="color:#1f377f;">a</span>, <span style="color:blue;">int</span> <span style="color:#1f377f;">state</span>)
{
Func<DayOfWeek, IState<<span style="color:blue;">int</span>, DayOfWeek>> <span style="color:#1f377f;">@return</span> = State.Return<<span style="color:blue;">int</span>, DayOfWeek>;
Func<DayOfWeek, IState<<span style="color:blue;">int</span>, <span style="color:blue;">string</span>>> <span style="color:#1f377f;">h</span> = <span style="color:#1f377f;">dow</span> => <span style="color:blue;">new</span> VowelExpander(dow.ToString());
Assert.Equal(@return(a).SelectMany(h).Run(state), h(a).Run(state));
}</pre>
</p>
<p>
In order to compare the two State values, the test has to <code>Run</code> them and then compare the return values.
</p>
<h3 id="e47c76d943214e0399bf1ab4ad1e843c">
Right identity <a href="#e47c76d943214e0399bf1ab4ad1e843c" title="permalink">#</a>
</h3>
<p>
In a similar manner, we can showcase the right identity law as a test.
</p>
<p>
<pre>[Theory]
[InlineData( <span style="color:blue;">true</span>, 0)]
[InlineData( <span style="color:blue;">true</span>, 1)]
[InlineData( <span style="color:blue;">true</span>, 8)]
[InlineData(<span style="color:blue;">false</span>, 0)]
[InlineData(<span style="color:blue;">false</span>, 2)]
[InlineData(<span style="color:blue;">false</span>, 7)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">RightIdentity</span>(<span style="color:blue;">bool</span> <span style="color:#1f377f;">a</span>, <span style="color:blue;">int</span> <span style="color:#1f377f;">state</span>)
{
Func<<span style="color:blue;">bool</span>, IState<<span style="color:blue;">int</span>, <span style="color:blue;">string</span>>> <span style="color:#1f377f;">f</span> = <span style="color:#1f377f;">b</span> => <span style="color:blue;">new</span> VowelExpander(b.ToString());
Func<<span style="color:blue;">string</span>, IState<<span style="color:blue;">int</span>, <span style="color:blue;">string</span>>> <span style="color:#1f377f;">@return</span> = State.Return<<span style="color:blue;">int</span>, <span style="color:blue;">string</span>>;
IState<<span style="color:blue;">int</span>, <span style="color:blue;">string</span>> <span style="color:#1f377f;">m</span> = f(a);
Assert.Equal(m.SelectMany(@return).Run(state), m.Run(state));
}</pre>
</p>
<p>
As always, even a parametrised test constitutes no <em>proof</em> that the law holds. I show the tests to illustrate what the laws look like in 'real' code.
</p>
<h3 id="4a3a29ab66304776b7bc6676d1675762">
Associativity <a href="#4a3a29ab66304776b7bc6676d1675762" title="permalink">#</a>
</h3>
<p>
The last monad law is the associativity law that describes how (at least) three functions compose. We're going to need three functions. For the purpose of demonstrating the law, any three pure functions will do. While the following functions are silly and not at all 'realistic', they have the virtue of being as simple as they can be (while still providing a bit of variety). They don't 'mean' anything, so don't worry too much about their behaviour. It is, as far as I can tell, nonsensical. Later articles will show some more realistic examples of the State monad in action.
</p>
<p>
<pre><span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">F</span> : IState<DateTime, <span style="color:blue;">int</span>>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">string</span> s;
<span style="color:blue;">public</span> <span style="color:#2b91af;">F</span>(<span style="color:blue;">string</span> <span style="color:#1f377f;">s</span>)
{
<span style="color:blue;">this</span>.s = s;
}
<span style="color:blue;">public</span> Tuple<<span style="color:blue;">int</span>, DateTime> <span style="color:#74531f;">Run</span>(DateTime <span style="color:#1f377f;">state</span>)
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">i</span> = s.Length;
<span style="color:blue;">var</span> <span style="color:#1f377f;">newState</span> = state.AddDays(i);
<span style="color:blue;">var</span> <span style="color:#1f377f;">newValue</span> = i + state.Month;
<span style="color:#8f08c4;">return</span> Tuple.Create(newValue, newState);
}
}
<span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">G</span> : IState<DateTime, TimeSpan>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> <span style="color:blue;">int</span> i;
<span style="color:blue;">public</span> <span style="color:#2b91af;">G</span>(<span style="color:blue;">int</span> <span style="color:#1f377f;">i</span>)
{
<span style="color:blue;">this</span>.i = i;
}
<span style="color:blue;">public</span> Tuple<TimeSpan, DateTime> <span style="color:#74531f;">Run</span>(DateTime <span style="color:#1f377f;">state</span>)
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">newState</span> = state.AddYears(i - state.Year);
<span style="color:blue;">var</span> <span style="color:#1f377f;">newValue</span> = TimeSpan.FromMinutes(i);
<span style="color:#8f08c4;">return</span> Tuple.Create(newValue, newState);
}
}
<span style="color:blue;">private</span> <span style="color:blue;">sealed</span> <span style="color:blue;">class</span> <span style="color:#2b91af;">H</span> : IState<DateTime, <span style="color:blue;">bool</span>>
{
<span style="color:blue;">private</span> <span style="color:blue;">readonly</span> TimeSpan duration;
<span style="color:blue;">public</span> <span style="color:#2b91af;">H</span>(TimeSpan <span style="color:#1f377f;">duration</span>)
{
<span style="color:blue;">this</span>.duration = duration;
}
<span style="color:blue;">public</span> Tuple<<span style="color:blue;">bool</span>, DateTime> <span style="color:#74531f;">Run</span>(DateTime <span style="color:#1f377f;">state</span>)
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">newState</span> = state - duration;
<span style="color:blue;">bool</span> <span style="color:#1f377f;">newValue</span> =
newState.DayOfWeek == DayOfWeek.Saturday || newState.DayOfWeek == DayOfWeek.Sunday;
<span style="color:#8f08c4;">return</span> Tuple.Create(newValue, newState);
}
}</pre>
</p>
<p>
Armed with these three classes, we can now demonstrate the Associativity law:
</p>
<p>
<pre>[Theory]
[InlineData(<span style="color:#a31515;">"foo"</span>, <span style="color:#a31515;">"2022-03-23"</span>)]
[InlineData(<span style="color:#a31515;">"bar"</span>, <span style="color:#a31515;">"2021-12-23T18:05"</span>)]
[InlineData(<span style="color:#a31515;">"baz"</span>, <span style="color:#a31515;">"1984-01-06T00:33"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">Associativity</span>(<span style="color:blue;">string</span> <span style="color:#1f377f;">a</span>, DateTime <span style="color:#1f377f;">state</span>)
{
Func<<span style="color:blue;">string</span>, IState<DateTime, <span style="color:blue;">int</span>>> <span style="color:#1f377f;">f</span> = <span style="color:#1f377f;">s</span> => <span style="color:blue;">new</span> F(s);
Func<<span style="color:blue;">int</span>, IState<DateTime, TimeSpan>> <span style="color:#1f377f;">g</span> = <span style="color:#1f377f;">i</span> => <span style="color:blue;">new</span> G(i);
Func<TimeSpan, IState<DateTime, <span style="color:blue;">bool</span>>> <span style="color:#1f377f;">h</span> = <span style="color:#1f377f;">ts</span> => <span style="color:blue;">new</span> H(ts);
IState<DateTime, <span style="color:blue;">int</span>> <span style="color:#1f377f;">m</span> = f(a);
Assert.Equal(
m.SelectMany(g).SelectMany(h).Run(state),
m.SelectMany(<span style="color:#1f377f;">x</span> => g(x).SelectMany(h)).Run(state));
}</pre>
</p>
<p>
The version of <a href="https://xunit.net">xUnit.net</a> I'm using for these examples (xUnit.net 2.2.0 on .NET Framework 4.6.1 - I may already have hinted that this is an old code base I had lying around) comes with a converter between <code>string</code> and <code>DateTime</code>, which explains why the <code>[InlineData]</code> can supply <code>DateTime</code> values as <code>string</code>s.
</p>
<h3 id="bf812a647a5c412da53f52f63989ca1d">
Conclusion <a href="#bf812a647a5c412da53f52f63989ca1d" title="permalink">#</a>
</h3>
<p>
For people coming from an imperative or object-oriented background, it can often be difficult to learn how to think 'functionally'. It took me years before I felt that I was on firm ground, and even so, I'm still learning new techniques today. As an imperative programmer, one often thinks in terms of state mutation.
</p>
<p>
In Functional Programming, there are often other ways to solve problems than in object-oriented programming, but if you can't think of a way, you can often reach for the fairly blunt hammer that the State monad is. It enables you to implement ostensibly state-based algorithms in a functional way.
</p>
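<p>
As a small taste of what that can look like, consider handing out sequential ids, a task that an imperative programmer would typically solve with a mutable counter. The sketch below is not from the article's code base. It assumes a delegate-based State representation, <code>Func&lt;S, Tuple&lt;T, S&gt;&gt;</code>, and all the names (<code>CounterSketch</code>, <code>NextId</code>, <code>Label</code>, <code>LabelPair</code>) are hypothetical.
</p>

```csharp
using System;

// Hypothetical sketch: a 'mutable counter' expressed with pure functions.
public static class CounterSketch
{
    // Return: wrap a value; the state passes through untouched.
    public static Func<S, Tuple<T, S>> Return<S, T>(T x)
    {
        return s => Tuple.Create(x, s);
    }

    // Monadic bind: run the first computation, then feed its value and the
    // updated state to the next one.
    public static Func<S, Tuple<T1, S>> SelectMany<S, T, T1>(
        this Func<S, Tuple<T, S>> source,
        Func<T, Func<S, Tuple<T1, S>>> selector)
    {
        return s =>
        {
            var t = source(s);
            return selector(t.Item1)(t.Item2);
        };
    }

    // The 'mutation': hand out the current counter value and increment it.
    public static Func<int, Tuple<int, int>> NextId()
    {
        return n => Tuple.Create(n, n + 1);
    }

    // Tag a name with a fresh id.
    public static Func<int, Tuple<string, int>> Label(string name)
    {
        return NextId().SelectMany(id => Return<int, string>(id + ":" + name));
    }

    // Label two names in sequence; the counter is threaded implicitly.
    public static Func<int, Tuple<Tuple<string, string>, int>> LabelPair(
        string a, string b)
    {
        return Label(a).SelectMany(x =>
            Label(b).SelectMany(y =>
                Return<int, Tuple<string, string>>(Tuple.Create(x, y))));
    }
}
```

<p>
Running <code>CounterSketch.LabelPair("foo", "bar")(7)</code> produces the pair <code>("7:foo", "8:bar")</code> together with the final state <code>9</code>. No variable is ever reassigned; the counter only exists as the state that <code>SelectMany</code> passes from one step to the next.
</p>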
<p>
This article was abstract, because I wanted to focus on the monad nature itself, rather than on practical applications. Future articles will provide more useful examples.
</p>
<p>
<strong>Next:</strong> <a href="/2022/11/14/the-reader-monad">The Reader monad</a>.
</p>
</div><hr>
This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>.
<h2>
Some thoughts on naming tests
</h2>
<p>
<a href="https://blog.ploeh.dk/2022/06/13/some-thoughts-on-naming-tests">https://blog.ploeh.dk/2022/06/13/some-thoughts-on-naming-tests</a><br>
2022-06-13T07:51:00+00:00, Mark Seemann
</p>
<div id="post">
<p>
<em>What is the purpose of a test name?</em>
</p>
<p>
Years ago I was participating in a coding event where we <a href="/2020/01/13/on-doing-katas">did katas</a>. My pairing partner and I were doing <em>silent ping pong</em>. Ping-pong style pair programming is when one programmer writes a test and passes the keyboard to the partner, who writes enough code to pass the test. He or she then writes a new test and passes control back to the first person. In the <em>silent</em> variety, you're not allowed to talk. This is an exercise in communicating via code.
</p>
<p>
My partner wrote a test and I made it pass. After the exercise was over, we were allowed to talk to evaluate how it went, and my partner remarked that he'd been surprised that I'd implemented the opposite behaviour of what he'd intended. (It was something where there was a fork in the logic depending on a number being less than or greater than zero; I don't recall the exact details.)
</p>
<p>
We looked at the test that he had written, and sure enough: He'd named the test by clearly indicating one behaviour, but then he'd written an assertion that looked for the opposite behaviour.
</p>
<p>
I hadn't even noticed.
</p>
<p>
I didn't read the test name. I only considered the test body, because that's the executable specification.
</p>
<h3 id="211621375fca4b7590abdfe87a27542b">
How tests are named <a href="#211621375fca4b7590abdfe87a27542b" title="permalink">#</a>
</h3>
<p>
I've been thinking about test names ever since. What is the role of a test name?
</p>
<p>
In some languages, you write unit tests as methods or functions. That's how you do it in C#, Java, and many other languages:
</p>
<p>
<pre>[Theory]
[InlineData(<span style="color:#a31515;">"Home"</span>)]
[InlineData(<span style="color:#a31515;">"Calendar"</span>)]
[InlineData(<span style="color:#a31515;">"Reservations"</span>)]
<span style="color:blue;">public</span> <span style="color:blue;">void</span> <span style="color:#74531f;">WithControllerHandlesSuffix</span>(<span style="color:blue;">string</span> <span style="color:#1f377f;">name</span>)
{
<span style="color:blue;">var</span> <span style="color:#1f377f;">sut</span> = <span style="color:blue;">new</span> UrlBuilder();
<span style="color:blue;">var</span> <span style="color:#1f377f;">actual</span> = sut.WithController(name + <span style="color:#a31515;">"Controller"</span>);
<span style="color:blue;">var</span> <span style="color:#1f377f;">expected</span> = sut.WithController(name);
Assert.Equal(expected, actual);
}</pre>
</p>
<p>
Usually, when we define new class methods, we've learned that naming is important. Surely, this applies to test methods, too?
</p>
<p>
Yet, other languages don't use class methods to define tests. The most common JavaScript frameworks don't, and <a href="/2018/05/07/inlined-hunit-test-lists">neither does Haskell HUnit</a>. Instead, tests are simply values with labels.
</p>
<p>
This hints at something that may be important.
</p>
<h3 id="03845212a1284263a6cdf452e7fd0bea">
The role of test names <a href="#03845212a1284263a6cdf452e7fd0bea" title="permalink">#</a>
</h3>
<p>
If tests aren't necessarily class methods, then what role do names play?
</p>
<p>
Usually, when considering method names, it's important to provide a descriptive name in order to help client developers. A client developer writing calling code must figure out which methods to call on an object. Good names help with that.
</p>
<p>
Automated tests, on the other hand, have no explicit callers. There's no client developer to communicate with. Instead, a test framework such as <a href="https://xunit.net">xUnit.net</a> scans the public API of a test suite and automatically finds the test methods to execute.
</p>
<p>
The most prominent motivation for writing good method names doesn't apply here. We must reevaluate the role of test names, also keeping in mind that with some frameworks, in some languages, tests aren't even methods.
</p>
<h3 id="06e15a7c7d90430685afd0182b878546">
Mere anarchy is loosed upon the world <a href="#06e15a7c7d90430685afd0182b878546" title="permalink">#</a>
</h3>
<p>
The story that introduces this article has a point. When considering a test, I tend to go straight to the test body. I only read the test name if I find the test body unclear.
</p>
<p>
Does this mean that the test name is irrelevant? Should we simply number the tests: <code>Test1</code>, <code>Test212</code>, and so on?
</p>
<p>
That hardly seems like a good idea - not even to a person like me who considers the test name secondary to the test definition.
</p>
<p>
This begs the question, though: If <code>Test42</code> isn't a good name, then what does a good test name look like?
</p>
<h3 id="bf895999bc034985ab10e839a69be5d4">
Naming schemes <a href="#bf895999bc034985ab10e839a69be5d4" title="permalink">#</a>
</h3>
<p>
Various people suggest naming schemes. In the .NET world many people like <a href="https://osherove.com/blog/2005/4/3/naming-standards-for-unit-tests.html">Roy Osherove's naming standard for unit tests</a>: <code>[UnitOfWork_StateUnderTest_ExpectedBehavior]</code>. I find it too verbose for my tastes, but my point isn't to attack this particular naming scheme. In my <a href="/2016/02/10/types-properties-software">Types + Properties = Software</a> article series, I experimented with using a poor man's version of <em>Given When Then:</em>
</p>
<p>
<pre>[<<span style="color:#4ec9b0;">Property</span>>]
<span style="color:blue;">let</span> <span style="color:navy;">``Given deuce when player wins then score is correct``</span>
(winner : <span style="color:#4ec9b0;">Player</span>) =
<span style="color:blue;">let</span> actual : <span style="color:#4ec9b0;">Score</span> = <span style="color:navy;">scoreWhenDeuce</span> winner
<span style="color:blue;">let</span> expected = <span style="color:navy;">Advantage</span> winner
expected =! actual</pre>
</p>
<p>
It was a worthwhile experiment, but I don't think I ever used that style again. After all, <em>Given When Then</em> is just another way of saying <em>Arrange Act Assert</em>, and I already <a href="/2013/06/24/a-heuristic-for-formatting-code-according-to-the-aaa-pattern">organise my test code according to the AAA pattern</a>.
</p>
<p>
These days, I don't follow any particular naming scheme, but I do keep a guiding principle in mind.
</p>
<h3 id="4db32b0268064abda82e32fb5724d7b6">
Information channel <a href="#4db32b0268064abda82e32fb5724d7b6" title="permalink">#</a>
</h3>
<p>
A test name, whether it's a method name or a label, is an opportunity to communicate with the reader of the code. You can communicate via code, via names, via comments, and so on. A test name is more like a mandatory comment than a normal method name.
</p>
<p>
Books like <a href="/ref/clean-code">Clean Code</a> make a compelling case that comments should be secondary to good names. The point isn't that all comments are bad, but that some are:
</p>
<p>
<pre><span style="color:blue;">var</span> <span style="color:#1f377f;">z</span> = x + y; <span style="color:green;">// Add x and y</span></pre>
</p>
<p>
It's rarely a good idea to add a comment that describes what the code <em>does</em>. This should already be clear from the code itself.
</p>
<p>
A comment can still provide important information that code can't easily convey. It may explain the <em>purpose</em> of the code. I try to take this into account when naming tests: Don't repeat what the code does, but offer a hint about its raison d'être.
</p>
<p>
I try to strike a balance between <code>Test2112</code> and <code>Given deuce when player wins then score is correct</code>. I view the task of naming tests as equivalent to producing section headings in an article like this one. They offer a hint at the kind of information that might be available in the section (<em>The role of test names</em>, <em>How tests are named</em>, or <em>Information channel</em>), but sometimes they're more tongue-in-cheek than helpful (<em>Mere anarchy is loosed upon the world</em>). I tend to name tests with a similar degree of precision (or lack thereof): <code>HomeReturnsJson</code>, <code>NoHackingOfUrlsAllowed</code>, <code>GetPreviousYear</code>, etcetera.
</p>
<p>
These names, in isolation, hardly tell you what the tests are about. I'm okay with that, because I don't think that they have to.
</p>
<h3 id="cd5ccd57ea8a41dbb78f8b5c32985311">
What do you use test names for? <a href="#cd5ccd57ea8a41dbb78f8b5c32985311" title="permalink">#</a>
</h3>
<p>
I occasionally discuss this question with other people. It seems to me that it's one of the topics where Socratic questioning breaks down:
</p>
<p>
Them: <em>How do you name tests?</em>
</p>
<p>
Me: <em>I try to strike a balance between information and not repeating myself.</em>
</p>
<p>
Them: <em>How do you like this particular naming scheme?</em>
</p>
<p>
Me: <em>It looks verbose to me. It seems to be repeating what's already in the test code.</em>
</p>
<p>
Them: <em>I like to read the test name to see what the test does.</em>
</p>
<p>
Me: <em>If the name and test code disagree, which one is right?</em>
</p>
<p>
Them: <em>The test name should follow the naming scheme.</em>
</p>
<p>
Me: <em>Why do you find that important?</em>
</p>
<p>
Them: <em>It's got... electrolytes.</em>
</p>
<p>
Okay, I admit that <a href="https://en.wikipedia.org/wiki/Satire">I'm being uncharitable</a>, but the point that I'm after is that test names are <em>different</em>, yet most people seem to reflect little on this.
</p>
<p>
When do you read test names?
</p>
<p>
Personally, I rarely read or otherwise <em>use</em> test names. When I'm writing a test, I also write the name, but at that point I don't really <em>need</em> the name. Sometimes I start with a placeholder name (<code>Foo</code>), write the test, and change the name once I understand what the test does.
</p>
<p>
Once a test is written, ideally it should just be sitting there as a regression test. <a href="/2013/04/02/why-trust-tests">The less you touch it, the better you can trust it</a>.
</p>
<p>
You may have hundreds or thousands of tests. When you run your test suite, you care about the outcome. Did it pass or fail? The outcome is the result of a Boolean <em>and</em> operation. The test suite only passes when all tests pass, but you don't have to look at each test result. The aggregate result is enough as long as the test suite passes.
</p>
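<p>
That aggregation can be sketched in a few lines. The model below is purely illustrative and isn't how xUnit.net or any other framework is implemented: it assumes that a test is nothing but a label paired with a predicate, and the names <code>SuiteSketch</code>, <code>Passes</code>, and <code>Failures</code> are made up.
</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model: a test is just a label paired with a predicate.
public static class SuiteSketch
{
    // The suite passes only when every test passes (a Boolean 'and').
    public static bool Passes(IEnumerable<Tuple<string, Func<bool>>> tests)
    {
        return tests.All(t => t.Item2());
    }

    // The labels only become interesting for the tests that fail.
    public static IReadOnlyList<string> Failures(
        IEnumerable<Tuple<string, Func<bool>>> tests)
    {
        return tests.Where(t => !t.Item2()).Select(t => t.Item1).ToList();
    }
}
```

<p>
In this model, the labels play no role at all in computing the aggregate result. They only show up when you ask which tests failed.
</p>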
<p>
You only need to look at a test when it fails. When this happens, most tools enable you to go straight to the failing test by clicking on it. (And if this isn't possible, I usually find it easier to navigate to the failing test either by line number or by copying the test name and navigating to it by pasting the name into my editor's navigation UI.) You don't really need the name to find a failing test. If the test was named <code>Test1337</code> it would be as easy to find as if it was named <code>Given deuce when player wins then score is correct</code>.
</p>
<p>
Once I look at a failing test, I start by looking at the test code and comparing that to the assertion message.
</p>
<p>
Usually, when a test fails, it breaks for a reason. A code change caused the test to fail. Often, the offending change was one you did ten seconds earlier. Armed with an assertion message and the test code, I usually understand the problem right away.
</p>
<p>
In rare cases the test is one that I've never seen before, and I'm confused about its purpose. This is when I read the test name. At that point, I appreciate if the name is helpful.
</p>
<h3 id="62f1324f399243c393c77a3cbb46f173">
Conclusion <a href="#62f1324f399243c393c77a3cbb46f173" title="permalink">#</a>
</h3>
<p>
I'm puzzled that people are so <a href="/2021/03/22/the-dispassionate-developer">passionate</a> about test names. I consider them the least important part of a test. A name isn't irrelevant, but I find the test code more important. The code is an executable specification. It expresses the desired truth about a system.
</p>
<p>
Test code is code that has the same lifetime as the production code. It pays to structure it as well as the production code. If a test is well-written, you should be able to understand it without reading its name.
</p>
<p>
That's an ideal, and in reality we are fallible. Thus, providing a helpful name gives the reader a second chance to understand a test. The name shouldn't, however, be your first priority.
</p>
</div>
<div id="comments">
<hr>
<h2 id="comments-header">
Comments
</h2>
<div class="comment" id="8e8c4bda169848e988926028933189d0">
<div class="comment-author"><a href="http://github.com/neongraal">Struan Judd</a> <a href="#8e8c4bda169848e988926028933189d0">#</a></div>
<div class="comment-content">
<p>
I often want to run selected tests from the command line and thus use the test runner's ability to filter all available tests. Where the set of tests I want to run is all the tests below some point in the hierarchy of tests I can filter by the common prefix, or the test class name.
</p>
<p>
But I also often find myself wanting to run a set of tests that meet some functional criteria, e.g. Validation approval tests, or all the tests for a particular feature across all the levels of the code base. In this case, if the tests follow a naming convention where such test attributes are included in the test name, either via the method or class name, then such test filtering is possible.
</p>
</div>
<div class="comment-date">2022-06-13 10:21 UTC</div>
</div>
<div class="comment" id="d1a000d5eb8349a586e19dbb231ce744">
<div class="comment-author"><a href="https://github.com/flakey-bit">Eddie Stanley</a> <a href="#d1a000d5eb8349a586e19dbb231ce744">#</a></div>
<div class="comment-content">
<p>
Mark, are you a <a href="https://www.thoughtworks.com/insights/blog/mockists-are-dead-long-live-classicists" target="_blank">Classicist or a Mockist</a>? I'm going to go out on a limb here and say you're probably a classicist.
Test code written in a classicist style probably conveys the intent well already. I think code written in a Mockist style may not convey the intent as well, hence the test name (or a comment) becomes more useful to convey that information.
</p>
</div>
<div class="comment-date">2022-06-13 23:40 UTC</div>
</div>
<div class="comment" id="63217ba97aed47cb877c9dd56e53ae31">
<div class="comment-author"><a href="https://www.chriskrycho.com">Chris Krycho</a> <a href="#63217ba97aed47cb877c9dd56e53ae31">#</a></div>
<div class="comment-content">
<p>
There are (at least) two ways of using test names (as well as test <em>module</em> names, as suggested by Struan Judd) that we make extensive use of in the LinkedIn code base and which I have used in every code base I have ever written tests for:
</p>
<ul>
<li>
<p>
<strong>To indicate the <em>intent</em> of the test.</strong> It is well and good to say that the assertions should convey the conditions, but often it is not clear <em>why</em> a condition is intended to hold. Test names (and descriptive strings on the assertions) can go a very long way, especially when working in a large and/or unfamiliar code base, to understand whether the assertion remains relevant, or <em>how</em> it is relevant.
</p>
<p>
Now, granted: it is quite possible for those to get out of date, much as comments do. However, just as good comments remain valuable even though there is a risk of stale comments, good test names can be valuable even though they can also become stale.
</p>
<p>
The key, for me, is exactly the same as good comments—and you could argue that comments therefore obviate the need for test names. If we only cared about tests from the POV of reading the code, I would actually agree! However, because we often read the tests a