ploeh blog https://blog.ploeh.dk danish software design en-us Mark Seemann Mon, 30 Jun 2025 05:54:41 UTC Mon, 30 Jun 2025 05:54:41 UTC Song recommendations from Haskell combinators https://blog.ploeh.dk/2025/06/30/song-recommendations-from-haskell-combinators/ Mon, 30 Jun 2025 05:54:00 UTC <div id="post"> <p> <em>Traversing lists of IO. A refactoring.</em> </p> <p> This article is part of a series named <a href="/2025/04/07/alternative-ways-to-design-with-functional-programming">Alternative ways to design with functional programming</a>. In the <a href="/2025/06/23/song-recommendations-from-f-combinators">previous article</a>, you saw how to refactor the example code base to a composition of standard <a href="https://fsharp.org/">F#</a> combinators. It's a pragmatic solution to the problem of dealing with lots of data in a piecemeal fashion, but although it uses concepts and programming constructs from functional programming, I don't consider it a proper <a href="/2018/11/19/functional-architecture-a-definition">functional architecture</a>. </p> <p> You'd expect the <a href="https://www.haskell.org/">Haskell</a> version to be the most idiomatic of the three language variations, but ironically, I had more trouble making the code in this article look nice than I had with the F# variation. You'll see what the problem is later, but it boils down to a combination of Haskell's right-to-left default composition order, and precedence rules of some of the operators. </p> <p> Please consult the previous articles for context about the example code base. The code shown in this article is from the <em>combinators</em> Git branch. It refactors the code shown in the article <a href="/2025/04/21/porting-song-recommendations-to-haskell">Porting song recommendations to Haskell</a>. </p> <p> The goal is to extract <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a> from the overall recommendations algorithm and compose them using standard combinators, such as <code>=&lt;&lt;</code>, <code>&lt;$&gt;</code>, and <code>traverse</code>. </p> <h3 id="1a8e6b1d3799463d9c70332949317048"> Getting rid of local mutation <a href="#1a8e6b1d3799463d9c70332949317048">#</a> </h3> <p> My first goal was to get rid of the <code>IORef</code>-based local mutation shown in the <a href="/2025/04/21/porting-song-recommendations-to-haskell">'baseline' code base</a>. That wasn't too difficult. If you're interested in the <a href="https://www.industriallogic.com/blog/whats-this-about-micro-commits/">micro-commits</a> I made to get to that milestone, you can consult the Git repository. The interim result looked like this: </p> <p> <pre>getRecommendations&nbsp;srvc&nbsp;un&nbsp;=&nbsp;<span style="color:blue;">do</span> &nbsp;&nbsp;<span style="color:green;">--&nbsp;1.&nbsp;Get&nbsp;user&#39;s&nbsp;own&nbsp;top&nbsp;scrobbles </span>&nbsp;&nbsp;<span style="color:green;">--&nbsp;2.&nbsp;Get&nbsp;other&nbsp;users&nbsp;who&nbsp;listened&nbsp;to&nbsp;the&nbsp;same&nbsp;songs </span>&nbsp;&nbsp;<span style="color:green;">--&nbsp;3.&nbsp;Get&nbsp;top&nbsp;scrobbles&nbsp;of&nbsp;those&nbsp;users </span>&nbsp;&nbsp;<span style="color:green;">--&nbsp;4.&nbsp;Aggregate&nbsp;the&nbsp;songs&nbsp;into&nbsp;recommendations </span> &nbsp;&nbsp;<span style="color:green;">--&nbsp;Impure </span>&nbsp;&nbsp;scrobbles&nbsp;&lt;-&nbsp;getTopScrobbles&nbsp;srvc&nbsp;un &nbsp;&nbsp;<span style="color:green;">--&nbsp;Pure </span>&nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;scrobblesSnapshot&nbsp;=&nbsp;<span style="color:blue;">take</span>&nbsp;100&nbsp;$&nbsp;sortOn&nbsp;(Down&nbsp;.&nbsp;scrobbleCount)&nbsp;scrobbles &nbsp;&nbsp;<span style="color:green;">--&nbsp;Impure </span>&nbsp;&nbsp;recommendations&nbsp;&lt;- &nbsp;&nbsp;&nbsp;&nbsp;join&nbsp;&lt;$&gt; &nbsp;&nbsp;&nbsp;&nbsp;traverse&nbsp;(\scrobble&nbsp;-&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">fmap</span>&nbsp;join&nbsp;$ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;traverse&nbsp;(\otherListener&nbsp;-&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">fmap</span>&nbsp;scrobbledSong&nbsp;. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">take</span>&nbsp;10&nbsp;. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;sortOn&nbsp;(Down&nbsp;.&nbsp;songRating&nbsp;.&nbsp;scrobbledSong)&nbsp;. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">filter</span>&nbsp;(songHasVerifiedArtist&nbsp;.&nbsp;scrobbledSong)&nbsp;&lt;$&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;getTopScrobbles&nbsp;srvc&nbsp;(userName&nbsp;otherListener))&nbsp;. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">take</span>&nbsp;20&nbsp;&lt;$&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;sortOn&nbsp;(Down&nbsp;.&nbsp;userScrobbleCount)&nbsp;. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">filter</span>&nbsp;((10_000&nbsp;&lt;=)&nbsp;.&nbsp;userScrobbleCount)&nbsp;=&lt;&lt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;getTopListeners&nbsp;srvc&nbsp;(songId&nbsp;$&nbsp;scrobbledSong&nbsp;scrobble)) &nbsp;&nbsp;&nbsp;&nbsp;scrobblesSnapshot &nbsp;&nbsp;<span style="color:green;">--&nbsp;Pure </span>&nbsp;&nbsp;<span style="color:blue;">return</span>&nbsp;$&nbsp;<span style="color:blue;">take</span>&nbsp;200&nbsp;$&nbsp;sortOn&nbsp;(Down&nbsp;.&nbsp;songRating)&nbsp;recommendations</pre> </p> <p> Granted, it's not the most readable way to present the algorithm, but it <em>is</em>, after all, only an intermediate step. As usual, I'll remind the reader that Haskell code should, by default, be read from right to left. When split over multiple lines, this also means that an expression should be read from the bottom to the top. Armed with that knowledge (and general knowledge of Haskell), combined with some helpful indentation, it's not altogether unreadable, but not something I'd like to come back to after half a year. And definitely not something I would foist upon (hypothetical) colleagues. </p> <p> The careful reader may notice that I've decided to use the reverse <em>bind</em> operator <code>=&lt;&lt;</code>, rather than the standard <code>&gt;&gt;=</code> operator. I usually do that with Haskell, because most of Haskell is composed from right to left, and <code>=&lt;&lt;</code> is consistent with that direction. The standard <code>&gt;&gt;=</code> operator, on the other hand, composes monadic actions from left to right. You could argue that that's more natural (to Western audiences), but since everything else stays right-to-left biased, using <code>&gt;&gt;=</code> confuses the reading direction. </p> <p> As a Westerner, I prefer left-to-right reading order, but in general I've found it hard to fight Haskell's bias in the other direction. </p> <p> As the <code>-- Pure</code> and <code>-- Impure</code> comments indicate, interleaving the pure functions with impure actions makes the entire expression impure. The more I do that, the less pure code remains. </p> <h3 id="faf4d26cfb2b42f89449be601e91e582"> Single expression <a href="#faf4d26cfb2b42f89449be601e91e582">#</a> </h3> <p> Going from from the above snapshot to a single impure expression doesn't require many more steps. </p> <p> <pre>getRecommendations&nbsp;srvc&nbsp;un&nbsp;= &nbsp;&nbsp;<span style="color:green;">--&nbsp;1.&nbsp;Get&nbsp;user&#39;s&nbsp;own&nbsp;top&nbsp;scrobbles </span>&nbsp;&nbsp;<span style="color:green;">--&nbsp;2.&nbsp;Get&nbsp;other&nbsp;users&nbsp;who&nbsp;listened&nbsp;to&nbsp;the&nbsp;same&nbsp;songs </span>&nbsp;&nbsp;<span style="color:green;">--&nbsp;3.&nbsp;Get&nbsp;top&nbsp;scrobbles&nbsp;of&nbsp;those&nbsp;users </span>&nbsp;&nbsp;<span style="color:green;">--&nbsp;4.&nbsp;Aggregate&nbsp;the&nbsp;songs&nbsp;into&nbsp;recommendations </span> &nbsp;&nbsp;<span style="color:blue;">take</span>&nbsp;200&nbsp;.&nbsp;sortOn&nbsp;(Down&nbsp;.&nbsp;songRating)&nbsp;&lt;$&gt; &nbsp;&nbsp;((\scrobbles&nbsp;-&gt; &nbsp;&nbsp;&nbsp;&nbsp;join&nbsp;&lt;$&gt; &nbsp;&nbsp;&nbsp;&nbsp;traverse&nbsp;(\scrobble&nbsp;-&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">fmap</span>&nbsp;join&nbsp;$ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;traverse&nbsp;(\otherListener&nbsp;-&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">fmap</span>&nbsp;scrobbledSong&nbsp;. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">take</span>&nbsp;10&nbsp;. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;sortOn&nbsp;(Down&nbsp;.&nbsp;songRating&nbsp;.&nbsp;scrobbledSong)&nbsp;. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">filter</span>&nbsp;(songHasVerifiedArtist&nbsp;.&nbsp;scrobbledSong)&nbsp;&lt;$&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;getTopScrobbles&nbsp;srvc&nbsp;(userName&nbsp;otherListener))&nbsp;. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">take</span>&nbsp;20&nbsp;&lt;$&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;sortOn&nbsp;(Down&nbsp;.&nbsp;userScrobbleCount)&nbsp;. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">filter</span>&nbsp;((10_000&nbsp;&lt;=)&nbsp;.&nbsp;userScrobbleCount)&nbsp;=&lt;&lt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;getTopListeners&nbsp;srvc&nbsp;(songId&nbsp;$&nbsp;scrobbledSong&nbsp;scrobble)) &nbsp;&nbsp;&nbsp;&nbsp;(<span style="color:blue;">take</span>&nbsp;100&nbsp;$&nbsp;sortOn&nbsp;(Down&nbsp;.&nbsp;scrobbleCount)&nbsp;scrobbles))&nbsp;=&lt;&lt; &nbsp;&nbsp;getTopScrobbles&nbsp;srvc&nbsp;un)</pre> </p> <p> Neither did it improve readability. </p> <h3 id="91851237845f49a8976aeaaa5a24ebf6"> Helper functions <a href="#91851237845f49a8976aeaaa5a24ebf6">#</a> </h3> <p> As in previous incarnations of this exercise, it helps if you extract some well-named helper functions, like this one: </p> <p> <pre><span style="color:#2b91af;">getUsersOwnTopScrobbles</span>&nbsp;<span style="color:blue;">::</span>&nbsp;[<span style="color:blue;">Scrobble</span>]&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;[<span style="color:blue;">Scrobble</span>] getUsersOwnTopScrobbles&nbsp;=&nbsp;<span style="color:blue;">take</span>&nbsp;100&nbsp;.&nbsp;sortOn&nbsp;(Down&nbsp;.&nbsp;scrobbleCount)</pre> </p> <p> As a one-liner, that one perhaps isn't that impressive, but none of them are particularly complicated. The biggest function is this: </p> <p> <pre><span style="color:#2b91af;">getTopScrobblesOfOtherUsers</span>&nbsp;<span style="color:blue;">::</span>&nbsp;[<span style="color:blue;">Scrobble</span>]&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;[<span style="color:blue;">Song</span>] getTopScrobblesOfOtherUsers&nbsp;= &nbsp;&nbsp;<span style="color:blue;">fmap</span>&nbsp;scrobbledSong&nbsp;. &nbsp;&nbsp;<span style="color:blue;">take</span>&nbsp;10&nbsp;. &nbsp;&nbsp;sortOn&nbsp;(Down&nbsp;.&nbsp;songRating&nbsp;.&nbsp;scrobbledSong)&nbsp;. &nbsp;&nbsp;<span style="color:blue;">filter</span>&nbsp;(songHasVerifiedArtist&nbsp;.&nbsp;scrobbledSong)</pre> </p> <p> You can see the rest in the Git repository. None of them are exported by the module, which makes them implementation details that you may decide to change or remove at a later date. </p> <p> You can now compose the overall action. </p> <p> <pre>getRecommendations&nbsp;srvc&nbsp;un&nbsp;= &nbsp;&nbsp;<span style="color:green;">--&nbsp;1.&nbsp;Get&nbsp;user&#39;s&nbsp;own&nbsp;top&nbsp;scrobbles </span>&nbsp;&nbsp;<span style="color:green;">--&nbsp;2.&nbsp;Get&nbsp;other&nbsp;users&nbsp;who&nbsp;listened&nbsp;to&nbsp;the&nbsp;same&nbsp;songs </span>&nbsp;&nbsp;<span style="color:green;">--&nbsp;3.&nbsp;Get&nbsp;top&nbsp;scrobbles&nbsp;of&nbsp;those&nbsp;users </span>&nbsp;&nbsp;<span style="color:green;">--&nbsp;4.&nbsp;Aggregate&nbsp;the&nbsp;songs&nbsp;into&nbsp;recommendations </span> &nbsp;&nbsp;(aggregateTheSongsIntoRecommendations&nbsp;.&nbsp;getTopScrobblesOfOtherUsers)&nbsp;.&nbsp;join&nbsp;&lt;$&gt; &nbsp;&nbsp;((traverse&nbsp;(getTopScrobbles&nbsp;srvc&nbsp;.&nbsp;userName)&nbsp;. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;getOtherUsersWhoListenedToTheSameSongs)&nbsp;.&nbsp;join&nbsp;=&lt;&lt; &nbsp;&nbsp;&nbsp;&nbsp;(traverse&nbsp;(getTopListeners&nbsp;srvc&nbsp;.&nbsp;(songId&nbsp;.&nbsp;scrobbledSong))&nbsp;. &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;getUsersOwnTopScrobbles&nbsp;=&lt;&lt;&nbsp;getTopScrobbles&nbsp;srvc&nbsp;un))</pre> </p> <p> Some of the parentheses break over multiple lines in a non-conventional way. This is my best effort to format the code in a way that emphasises the four steps comprising the algorithm, while still staying within the bounds of the language, and keeping <a href="https://hackage.haskell.org/package/hlint">hlint</a> silent. </p> <p> I could try to argue that if you squint a bit, the operators and other glue like <code>join</code> should <a href="/2018/07/02/terse-operators-make-business-code-more-readable">fade into the background</a>, but in this case, I don't even buy that argument myself. </p> <p> It bothers me that it's so hard to compose the code in a way that approaches being self-documenting. I find that the F# composition in the <a href="/2025/06/23/song-recommendations-from-f-combinators">previous article</a> does a better job of that. </p> <h3 id="f36d4eb691fe4434b89595c5d313f147"> Syntactic sugar <a href="#f36d4eb691fe4434b89595c5d313f147">#</a> </h3> <p> The stated goal in this article is to demonstrate how it's possible to use standard combinators to glue the algorithm together. I've been complaining throughout this article that, while possible, it leaves the code less readable than desired. </p> <p> That one reader who actually knows Haskell is likely frustrated with me. After all, the language does offer a way out. Using the syntactic sugar of <code>do</code> notation, you can instead write the composition like this: </p> <p> <pre>getRecommendations&nbsp;srvc&nbsp;un&nbsp;=&nbsp;<span style="color:blue;">do</span> &nbsp;&nbsp;<span style="color:green;">--&nbsp;1.&nbsp;Get&nbsp;user&#39;s&nbsp;own&nbsp;top&nbsp;scrobbles </span>&nbsp;&nbsp;<span style="color:green;">--&nbsp;2.&nbsp;Get&nbsp;other&nbsp;users&nbsp;who&nbsp;listened&nbsp;to&nbsp;the&nbsp;same&nbsp;songs </span>&nbsp;&nbsp;<span style="color:green;">--&nbsp;3.&nbsp;Get&nbsp;top&nbsp;scrobbles&nbsp;of&nbsp;those&nbsp;users </span>&nbsp;&nbsp;<span style="color:green;">--&nbsp;4.&nbsp;Aggregate&nbsp;the&nbsp;songs&nbsp;into&nbsp;recommendations </span> &nbsp;&nbsp;userTops&nbsp;&lt;-&nbsp;getTopScrobbles&nbsp;srvc&nbsp;un&nbsp;&lt;&amp;&gt;&nbsp;getUsersOwnTopScrobbles &nbsp;&nbsp;otherListeners&nbsp;&lt;-&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;traverse&nbsp;(getTopListeners&nbsp;srvc&nbsp;.&nbsp;(songId&nbsp;.&nbsp;scrobbledSong))&nbsp;userTops&nbsp;&lt;&amp;&gt; &nbsp;&nbsp;&nbsp;&nbsp;getOtherUsersWhoListenedToTheSameSongs&nbsp;.&nbsp;join &nbsp;&nbsp;songs&nbsp;&lt;- &nbsp;&nbsp;&nbsp;&nbsp;traverse&nbsp;(getTopScrobbles&nbsp;srvc&nbsp;.&nbsp;userName)&nbsp;otherListeners&nbsp;&lt;&amp;&gt; &nbsp;&nbsp;&nbsp;&nbsp;getTopScrobblesOfOtherUsers&nbsp;.&nbsp;join &nbsp;&nbsp;<span style="color:blue;">return</span>&nbsp;$&nbsp;aggregateTheSongsIntoRecommendations&nbsp;songs</pre> </p> <p> By splitting the process up into steps with named variables, you can achieve the much-yearned-for top-to-bottom reading order. Taking advantage of the <code>&lt;&amp;&gt;</code> operator from <a href="https://hackage.haskell.org/package/base/docs/Data-Functor.html">Data.Functor</a> we also get left-to-right reading order on each line. </p> <p> That's the best I've been able to achieve under the constraint that the <code>IO</code>-bound operations stay interleaved with pure functions. </p> <h3 id="49db6a22b7924c9cb37813b51f1334e6"> Conclusion <a href="#49db6a22b7924c9cb37813b51f1334e6">#</a> </h3> <p> Mixing pure functions with impure actions like this is necessary when composing whole programs (usually at the entry point; i.e. <code>main</code>), but shouldn't be considered good functional-programming style in general. The entire <code>getRecommendations</code> action is impure, being non-deterministic. </p> <p> Still, even Haskell code eventually needs to compose code in this way. Therefore, it's relevant covering how this may be done. Even so, alternative architectures exist. </p> <p> <strong>Next:</strong> Song recommendations with pipes and filters. </p> </div><hr> This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>. Mark Seemann https://blog.ploeh.dk/2025/06/30/song-recommendations-from-haskell-combinators Song recommendations from F# combinators https://blog.ploeh.dk/2025/06/23/song-recommendations-from-f-combinators/ Mon, 23 Jun 2025 05:49:00 UTC <div id="post"> <p> <em>Traversing sequences of tasks. A refactoring.</em> </p> <p> This article is part of a series named <a href="/2025/04/07/alternative-ways-to-design-with-functional-programming">Alternative ways to design with functional programming</a>. In the <a href="/2025/06/16/song-recommendations-from-c-combinators">previous article</a>, you saw how to refactor the example code base to a composition of standard combinators. It's a pragmatic solution to the problem of dealing with lots of data in a piecemeal fashion, but although it uses concepts and programming constructs from functional programming, I don't consider it a proper <a href="/2018/11/19/functional-architecture-a-definition">functional architecture</a>. </p> <p> Porting the C# code to <a href="https://fsharp.org/">F#</a> doesn't change that part, but most F# developers will probably agree that this style of programming is more <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> in F# than in C#. </p> <p> Please consult the previous articles for context about the example code base. The code shown in this article is from the <em>fsharp-combinators</em> Git branch. It refactors the code shown in the article <a href="/2025/04/14/porting-song-recommendations-to-f">Porting song recommendations to F#</a>. </p> <p> The goal is to extract <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a> from the overall recommendations algorithm and compose them using standard combinators, such as <code>bind</code>, <code>map</code>, and <code>traverse</code>. </p> <h3 id="a83f974b9e474c4a83845e284fb361cc"> Composition from combinators <a href="#a83f974b9e474c4a83845e284fb361cc">#</a> </h3> <p> Let's start with the completed composition, and subsequently look at the most interesting parts. </p> <p> <pre><span style="color:blue;">type</span>&nbsp;<span style="color:#2b91af;">RecommendationsProvider</span>&nbsp;(<span style="font-weight:bold;color:#1f377f;">songService</span>&nbsp;:&nbsp;<span style="color:#2b91af;">SongService</span>)&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">member</span>&nbsp;_.GetRecommendationsAsync&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;1.&nbsp;Get&nbsp;user&#39;s&nbsp;own&nbsp;top&nbsp;scrobbles</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;2.&nbsp;Get&nbsp;other&nbsp;users&nbsp;who&nbsp;listened&nbsp;to&nbsp;the&nbsp;same&nbsp;songs</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;3.&nbsp;Get&nbsp;top&nbsp;scrobbles&nbsp;of&nbsp;those&nbsp;users</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;4.&nbsp;Aggregate&nbsp;the&nbsp;songs&nbsp;into&nbsp;recommendations</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">songService</span>.<span style="font-weight:bold;color:#74531f;">GetTopScrobblesAsync</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&gt;&gt;&nbsp;<span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">bind</span>&nbsp;( &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#74531f;">getOwnTopScrobbles</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&gt;&gt;&nbsp;<span style="color:#2b91af;">TaskSeq</span>.<span style="color:#74531f;">traverse</span>&nbsp;( &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;_.Song.Id &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&gt;&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">songService</span>.<span style="font-weight:bold;color:#74531f;">GetTopListenersAsync</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&gt;&gt;&nbsp;<span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">bind</span>&nbsp;( &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#74531f;">getTopScrobbles</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&gt;&gt;&nbsp;<span style="color:#2b91af;">TaskSeq</span>.<span style="color:#74531f;">traverse</span>&nbsp;( &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;_.UserName &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&gt;&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">songService</span>.<span style="font-weight:bold;color:#74531f;">GetTopScrobblesAsync</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&gt;&gt;&nbsp;<span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">map</span>&nbsp;<span style="color:#74531f;">aggregateRecommendations</span>))) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&gt;&gt;&nbsp;<span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">map</span>&nbsp;(<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">flatten</span>&nbsp;&gt;&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">flatten</span>&nbsp;&gt;&gt;&nbsp;<span style="color:#74531f;">takeTopRecommendations</span>))</pre> </p> <p> This is a single expression with nested subexpressions, and you may notice that it's completely <a href="https://en.wikipedia.org/wiki/Tacit_programming">point-free</a>. This may be a little hardcore even for most F# programmers, since F# idiomatically favours explicit lambda expressions and the pipeline operator <code>|&gt;</code>. </p> <p> Although I'm personally fascinated by point-free programming, I might consider a more <code>fun</code> alternative if working in a team. You can see such a variation in the Git repository in an intermediary commit. The reason that I've pulled so heavily in this direction here is that it more clearly demonstrate why we call such functions <em>combinators</em>: They provide the glue that enable us to compose functions together. </p> <p> If you're wondering, <code>&gt;&gt;</code> is also a combinator. In <a href="https://www.haskell.org/">Haskell</a> it's more common with 'unpronounceable' operators such as <code>&gt;&gt;=</code>, <code>.</code>, <code>&amp;</code>, etc., and I'd argue that such <a href="/2018/07/02/terse-operators-make-business-code-more-readable">terse operators can make code more readable</a>. </p> <p> The functions <code>getOwnTopScrobbles</code>, <code>getTopScrobbles</code>, <code>aggregateRecommendations</code>, and <code>takeTopRecommendations</code> are helper functions. Here's one of them: </p> <p> <pre><span style="color:blue;">let</span>&nbsp;<span style="color:blue;">private</span>&nbsp;<span style="color:#74531f;">getOwnTopScrobbles</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">scrobbles</span>&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">scrobbles</span>&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">sortByDescending</span>&nbsp;(<span style="color:blue;">fun</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>.ScrobbleCount)&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">truncate</span>&nbsp;100</pre> </p> <p> The other helpers are also simple, single-expression functions like this one. </p> <p> As <a href="https://tyrrrz.me/blog/pure-impure-segregation-principle">Oleksii Holub implies</a>, you could make each of these small functions public if you wished to test them individually. </p> <p> Let's now look at the various building blocks that enable this composition. </p> <h3 id="596f7212bd144ebaa674cbc4e019d892"> Combinators <a href="#596f7212bd144ebaa674cbc4e019d892">#</a> </h3> <p> The F# base library comes with more standard combinators than are generally available for C#, not only for lists, but also <code>Option</code> and <code>Result</code> values. On the other hand, when it comes to <a href="/2022/06/06/asynchronous-monads">asynchronous monads</a>, the F# base library offers <code>task</code> and <code>async</code> <a href="https://learn.microsoft.com/dotnet/fsharp/language-reference/computation-expressions">computation expressions</a>, but no <code>Task</code> module. You'll need to add <code>Task.bind</code> and <code>Task.map</code> yourself, or import a library that exports those combinators. The article <a href="/2022/06/06/asynchronous-monads">Asynchronous monads</a> shows the implementation used here. </p> <p> The <a href="/2024/11/11/traversals">traverse</a> implementation shouldn't be too surprising, either, but here I implemented it directly, instead of via <code>sequence</code>. </p> <p> <pre><span style="color:blue;">let</span>&nbsp;<span style="color:#74531f;">traverse</span>&nbsp;<span style="color:#74531f;">f</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">xs</span>&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;<span style="color:#74531f;">go</span>&nbsp;<span style="color:#1f377f;">acc</span>&nbsp;<span style="color:#1f377f;">x</span>&nbsp;=&nbsp;<span style="color:blue;">task</span>&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let!</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">x&#39;</span>&nbsp;=&nbsp;<span style="color:#1f377f;">x</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let!</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">acc&#39;</span>&nbsp;=&nbsp;<span style="color:#1f377f;">acc</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">return</span>&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">append</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">acc&#39;</span>&nbsp;[<span style="font-weight:bold;color:#1f377f;">x&#39;</span>]&nbsp;} &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">xs</span>&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">map</span>&nbsp;<span style="color:#74531f;">f</span>&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">fold</span>&nbsp;<span style="color:#74531f;">go</span>&nbsp;(<span style="color:blue;">task</span>&nbsp;{&nbsp;<span style="color:blue;">return</span>&nbsp;[]&nbsp;})</pre> </p> <p> Finally, the <code>flatten</code> function is the <a href="/2022/03/28/monads">standard implementation</a> that goes via monadic <em>bind</em>. In F#'s <code>Seq</code> module, <em>bind</em> is called <code>collect</code>. </p> <p> <pre><span style="color:blue;">let</span>&nbsp;<span style="color:#74531f;">flatten</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">xs</span>&nbsp;=&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">collect</span>&nbsp;<span style="color:#74531f;">id</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">xs</span></pre> </p> <p> That all there is to it. </p> <h3 id="2ecbad404afd47a1a4b12e0e68913543"> Conclusion <a href="#2ecbad404afd47a1a4b12e0e68913543">#</a> </h3> <p> In this article, you saw how to port the C# code from the <a href="/2025/06/16/song-recommendations-from-c-combinators">previous article</a> to F#. Since this style of programming is more idiomatic in F#, more building blocks and language features are available, and hence this kind of refactoring is better suited to F#. </p> <p> I still don't consider this proper <a href="/2018/11/19/functional-architecture-a-definition">functional architecture</a>, but it's pragmatic and I could see myself writing code like this in a professional setting. </p> <p> <strong>Next:</strong> <a href="/2025/06/30/song-recommendations-from-haskell-combinators">Song recommendations from Haskell combinators</a>. </p> </div><hr> This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>. Mark Seemann https://blog.ploeh.dk/2025/06/23/song-recommendations-from-f-combinators Song recommendations from C# combinators https://blog.ploeh.dk/2025/06/16/song-recommendations-from-c-combinators/ Mon, 16 Jun 2025 07:41:00 UTC <div id="post"> <p> <em>LINQ-style composition, including SelectMany and Traverse.</em> </p> <p> This article is part of a larger series titled <a href="/2025/04/07/alternative-ways-to-design-with-functional-programming">Alternative ways to design with functional programming</a>. In the <a href="/2025/06/09/song-recommendations-from-combinators">previous article</a>, I described, in general terms, a pragmatic small-scale architecture that may <em>look</em> functional, although it really isn't. </p> <p> Please consult the previous articles for context about the example code base. The code shown in this article is from the <em>combinators</em> Git branch. </p> <p> The goal is to extract <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a> from the overall recommendations algorithm and compose them using standard combinators, such as <code>SelectMany</code> (<a href="/2022/03/28/monads">monadic <em>bind</em></a>), <code>Select</code>, and <code>Traverse</code>. </p> <h3 id="a667e64d78a346eba94571361ec22fc4"> Composition from combinators <a href="#a667e64d78a346eba94571361ec22fc4">#</a> </h3> <p> Let's start with the completed composition, and subsequently look at the most interesting parts. </p> <p> <pre><span style="color:blue;">public</span>&nbsp;<span style="color:#2b91af;">Task</span>&lt;<span style="color:#2b91af;">IReadOnlyList</span>&lt;<span style="color:#2b91af;">Song</span>&gt;&gt;&nbsp;<span style="font-weight:bold;color:#74531f;">GetRecommendationsAsync</span>(<span style="color:blue;">string</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">userName</span>) { &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;1.&nbsp;Get&nbsp;user&#39;s&nbsp;own&nbsp;top&nbsp;scrobbles</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;2.&nbsp;Get&nbsp;other&nbsp;users&nbsp;who&nbsp;listened&nbsp;to&nbsp;the&nbsp;same&nbsp;songs</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;3.&nbsp;Get&nbsp;top&nbsp;scrobbles&nbsp;of&nbsp;those&nbsp;users</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;4.&nbsp;Aggregate&nbsp;the&nbsp;songs&nbsp;into&nbsp;recommendations</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">return</span>&nbsp;_songService.<span style="font-weight:bold;color:#74531f;">GetTopScrobblesAsync</span>(<span style="font-weight:bold;color:#1f377f;">userName</span>) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">SelectMany</span>(<span style="font-weight:bold;color:#1f377f;">scrobbles</span>&nbsp;=&gt;&nbsp;<span style="color:#74531f;">UserTopScrobbles</span>(<span style="font-weight:bold;color:#1f377f;">scrobbles</span>) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">Traverse</span>(<span style="font-weight:bold;color:#1f377f;">scrobble</span>&nbsp;=&gt;&nbsp;_songService &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">GetTopListenersAsync</span>(<span style="font-weight:bold;color:#1f377f;">scrobble</span>.Song.Id) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="color:#74531f;">TopListeners</span>) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">SelectMany</span>(<span style="font-weight:bold;color:#1f377f;">users</span>&nbsp;=&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">users</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">Traverse</span>(<span style="font-weight:bold;color:#1f377f;">user</span>&nbsp;=&gt;&nbsp;_songService &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">GetTopScrobblesAsync</span>(<span style="font-weight:bold;color:#1f377f;">user</span>.UserName) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="color:#74531f;">TopScrobbles</span>)) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="color:#74531f;">Songs</span>))) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="color:#74531f;">TakeTopRecommendations</span>)); }</pre> </p> <p> This is a single expression with nested subexpressions. </p> <p> The functions <code>UserTopScrobbles</code>, <code>TopListeners</code>, <code>TopScrobbles</code>, <code>Songs</code>, and <code>TakeTopRecommendations</code> are <code>private</code> helper functions. Here's one of them: </p> <p> <pre><span style="color:blue;">private</span>&nbsp;<span style="color:blue;">static</span>&nbsp;<span style="color:#2b91af;">IEnumerable</span>&lt;<span style="color:#2b91af;">Scrobble</span>&gt;&nbsp;<span style="color:#74531f;">UserTopScrobbles</span>(<span style="color:#2b91af;">IEnumerable</span>&lt;<span style="color:#2b91af;">Scrobble</span>&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">scrobbles</span>) { &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">return</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">scrobbles</span>.<span style="font-weight:bold;color:#74531f;">OrderByDescending</span>(<span style="font-weight:bold;color:#1f377f;">scrobble</span>&nbsp;=&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">scrobble</span>.ScrobbleCount).<span style="font-weight:bold;color:#74531f;">Take</span>(100); }</pre> </p> <p> The other helpers are also simple, single-expression functions like this one. </p> <p> As <a href="https://tyrrrz.me/blog/pure-impure-segregation-principle">Oleksii Holub implies</a>, you could make each of these small functions <code>public</code> if you wished to test them individually. </p> <p> Let's now look at the various building blocks that enable this composition. </p> <h3 id="2d6d18db84d84ada8084fd9a035918d6"> Asynchronous monad <a href="#2d6d18db84d84ada8084fd9a035918d6">#</a> </h3> <p> C# (or .NET) in general only comes with standard combinators for <a href="https://learn.microsoft.com/dotnet/api/system.collections.generic.ienumerable-1">IEnumerable&lt;T&gt;</a>, so whenever you need them for other <a href="/2022/03/28/monads">monads</a>, you have to define them yourself (or pull in a reusable library that defines them). For the above composition, you'll need <code>SelectMany</code> and <code>Select</code> for <code>Task</code> computations. You can see implementations in the article <a href="/2022/06/06/asynchronous-monads">Asynchronous monads</a>, so I'll not repeat the code here. </p> <p> One exception is this extension method, which is a variant monadic <em>return</em>, which I'm not sure if I've published before: </p> <p> <pre><span style="color:blue;">internal</span>&nbsp;<span style="color:blue;">static</span>&nbsp;<span style="color:#2b91af;">Task</span>&lt;<span style="color:#2b91af;">T</span>&gt;&nbsp;<span style="color:#74531f;">AsTask</span>&lt;<span style="color:#2b91af;">T</span>&gt;(<span style="color:blue;">this</span>&nbsp;<span style="color:#2b91af;">T</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">source</span>) { &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">return</span>&nbsp;<span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">FromResult</span>(<span style="font-weight:bold;color:#1f377f;">source</span>); }</pre> </p> <p> Nothing much is going on here, since it's just a wrapper of <a href="https://learn.microsoft.com/dotnet/api/system.threading.tasks.task.fromresult">Task.FromResult</a>. The <code>this</code> keyword, however, makes <code>AsTask</code> an extension method, which makes usage marginally prettier. It's not used in the above composition, but, as you'll see below, in the implementation of <code>Traverse</code>. </p> <h3 id="f71260c45e2b45df8a778db7182f5519"> Traversal <a href="#f71260c45e2b45df8a778db7182f5519">#</a> </h3> <p> The <a href="/2024/11/11/traversals">traversal</a> could be implemented from a hypothetical <code>Sequence</code> action, but you can also implement it directly, which is what I chose to do here. </p> <p> <pre><span style="color:blue;">internal</span>&nbsp;<span style="color:blue;">static</span>&nbsp;<span style="color:#2b91af;">Task</span>&lt;<span style="color:#2b91af;">IEnumerable</span>&lt;<span style="color:#2b91af;">TResult</span>&gt;&gt;&nbsp;<span style="color:#74531f;">Traverse</span>&lt;<span style="color:#2b91af;">T</span>,&nbsp;<span style="color:#2b91af;">TResult</span>&gt;( &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">this</span>&nbsp;<span style="color:#2b91af;">IEnumerable</span>&lt;<span style="color:#2b91af;">T</span>&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">source</span>, &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">Func</span>&lt;<span style="color:#2b91af;">T</span>,&nbsp;<span style="color:#2b91af;">Task</span>&lt;<span style="color:#2b91af;">TResult</span>&gt;&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">selector</span>) { &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">return</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">source</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">selector</span>) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">Aggregate</span>( &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">Enumerable</span>.<span style="color:#74531f;">Empty</span>&lt;<span style="color:#2b91af;">TResult</span>&gt;().<span style="font-weight:bold;color:#74531f;">AsTask</span>(), &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">async</span>&nbsp;(<span style="font-weight:bold;color:#1f377f;">acc</span>,&nbsp;<span style="font-weight:bold;color:#1f377f;">x</span>)&nbsp;=&gt;&nbsp;(<span style="color:blue;">await</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">acc</span>).<span style="font-weight:bold;color:#74531f;">Append</span>(<span style="color:blue;">await</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">x</span>)); }</pre> </p> <p> Mapping <code>selector</code> over <code>source</code> produces a sequence of tasks. The <code>Aggregate</code> expression subsequently inverts the <a href="https://bartoszmilewski.com/2014/01/14/functors-are-containers/">containers</a> to a single task that contains a sequence of result values. </p> <p> That's really all there is to it. </p> <h3 id="b717fff9dd2b4bdb826dc62a3d3a9050"> Conclusion <a href="#b717fff9dd2b4bdb826dc62a3d3a9050">#</a> </h3> <p> In the <a href="/2025/06/09/song-recommendations-from-combinators">previous article</a>, I made no secret of my position on this refactoring. For the example at hand, the benefit is at best marginal. The purpose of this article isn't to insist that you must write code like this. Rather, it's a demonstration of what's possible. </p> <p> If you have a problem which is similar, but more complicated, refactoring to standard combinators <em>may</em> be a good idea. After all, a standard combinator like <code>SelectMany</code>, <code>Traverse</code>, etc. is well-understood and lawful. You should expect combinators to be defect-free, so using them instead of ad-hoc code constructs like nested loops with conditionals could help eliminate some trivial bugs. </p> <p> Additionally, if you're working with a team comfortable with these few abstractions, code assembled from standard combinators may actually turn out to be <em>more</em> readable that code buried in ad-hoc imperative <a href="https://en.wikipedia.org/wiki/Control_flow">control flow</a>. And if not everyone on the team is on board with this style, perhaps it's <a href="/2015/08/03/idiomatic-or-idiosyncratic">an opportunity to push the envelope a bit</a>. </p> <p> Of course, if you use a language where such constructs are already idiomatic, colleagues should already be used to this style of programming. </p> <p> <strong>Next:</strong> <a href="/2025/06/23/song-recommendations-from-f-combinators">Song recommendations from F# combinators</a>. </p> </div><hr> This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>. Mark Seemann https://blog.ploeh.dk/2025/06/16/song-recommendations-from-c-combinators Song recommendations from combinators https://blog.ploeh.dk/2025/06/09/song-recommendations-from-combinators/ Mon, 09 Jun 2025 14:02:00 UTC <div id="post"> <p> <em>Interleaving impure actions with pure functions. Not really functional programming.</em> </p> <p> This article is part of a larger article series about <a href="/2025/04/07/alternative-ways-to-design-with-functional-programming">alternative ways to design with functional programming</a>, particularly when faced with massive data loads. In the previous few articles, you saw <a href="/2018/11/19/functional-architecture-a-definition">functional architecture</a> at its apparent limit. With sufficiently large data sizes, the <a href="/2020/03/02/impureim-sandwich">Impureim Sandwich</a> pattern starts to buckle. That's really not an indictment of that pattern; only an observation that no design pattern applies universally. </p> <p> In this and the next few articles, we'll instead look at a more pragmatic option. In this article I'll discuss the general idea, and follow up in other articles with examples in three different languages. </p> <p> In this overall article series, I'm using <a href="https://tyrrrz.me/">Oleksii Holub</a>'s inspiring article <a href="https://tyrrrz.me/blog/pure-impure-segregation-principle">Pure-Impure Segregation Principle</a> as an outset for the code example. Previous articles in this article series have already covered the basics, but the gist of it is a song recommendation service that uses past play information ('scrobbles') to suggest new songs to a user. </p> <h3 id="ae576d7ff83c4cf8b5ce96b861d3cad0"> Separating pure functions from impure composition <a href="#ae576d7ff83c4cf8b5ce96b861d3cad0">#</a> </h3> <p> In the original article, Oleksii Holub suggests a way to separate <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a> from impure actions: We may extract as much pure code from the overall algorithm as possible, but we're still left with pure functions and impure actions mixed together. </p> <p> Here's my reproduction of that suggestion, with trivial modifications: </p> <p> <pre><span style="color:green;">//&nbsp;Pure</span> <span style="color:blue;">public</span>&nbsp;<span style="color:blue;">static</span>&nbsp;<span style="color:#2b91af;">IReadOnlyList</span>&lt;<span style="color:blue;">int</span>&gt;&nbsp;<span style="color:#74531f;">HandleOwnScrobbles</span>(<span style="color:#2b91af;">IReadOnlyCollection</span>&lt;<span style="color:#2b91af;">Scrobble</span>&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">scrobbles</span>)&nbsp;=&gt; &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">scrobbles</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">OrderByDescending</span>(<span style="font-weight:bold;color:#1f377f;">s</span>&nbsp;=&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>.ScrobbleCount) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">Take</span>(100) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">s</span>&nbsp;=&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>.Song.Id) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">ToArray</span>(); <span style="color:green;">//&nbsp;Pure</span> <span style="color:blue;">public</span>&nbsp;<span style="color:blue;">static</span>&nbsp;<span style="color:#2b91af;">IReadOnlyList</span>&lt;<span style="color:blue;">string</span>&gt;&nbsp;<span style="color:#74531f;">HandleOtherListeners</span>(<span style="color:#2b91af;">IReadOnlyCollection</span>&lt;<span style="color:#2b91af;">User</span>&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">users</span>)&nbsp;=&gt; &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">users</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">Where</span>(<span style="font-weight:bold;color:#1f377f;">u</span>&nbsp;=&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">u</span>.TotalScrobbleCount&nbsp;&gt;=&nbsp;10_000) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">OrderByDescending</span>(<span style="font-weight:bold;color:#1f377f;">u</span>&nbsp;=&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">u</span>.TotalScrobbleCount) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">Take</span>(20) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">u</span>&nbsp;=&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">u</span>.UserName) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">ToArray</span>(); <span style="color:green;">//&nbsp;Pure</span> <span style="color:blue;">public</span>&nbsp;<span style="color:blue;">static</span>&nbsp;<span style="color:#2b91af;">IReadOnlyList</span>&lt;<span style="color:#2b91af;">Song</span>&gt;&nbsp;<span style="color:#74531f;">HandleOtherScrobbles</span>(<span style="color:#2b91af;">IReadOnlyCollection</span>&lt;<span style="color:#2b91af;">Scrobble</span>&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">scrobbles</span>)&nbsp;=&gt; &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">scrobbles</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">Where</span>(<span style="font-weight:bold;color:#1f377f;">s</span>&nbsp;=&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>.Song.IsVerifiedArtist) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">OrderByDescending</span>(<span style="font-weight:bold;color:#1f377f;">s</span>&nbsp;=&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>.Song.Rating) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">Take</span>(10) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="font-weight:bold;color:#1f377f;">s</span>&nbsp;=&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>.Song) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">ToArray</span>(); <span style="color:green;">//&nbsp;Pure</span> <span style="color:blue;">public</span>&nbsp;<span style="color:blue;">static</span>&nbsp;<span style="color:#2b91af;">IReadOnlyList</span>&lt;<span style="color:#2b91af;">Song</span>&gt;&nbsp;<span style="color:#74531f;">FinalizeRecommendations</span>(<span style="color:#2b91af;">IReadOnlyList</span>&lt;<span style="color:#2b91af;">Song</span>&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">songs</span>)&nbsp;=&gt; &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">songs</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">OrderByDescending</span>(<span style="font-weight:bold;color:#1f377f;">s</span>&nbsp;=&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>.Rating) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">Take</span>(200) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">ToArray</span>(); <span style="color:blue;">public</span>&nbsp;<span style="color:blue;">async</span>&nbsp;<span style="color:#2b91af;">Task</span>&lt;<span style="color:#2b91af;">IReadOnlyList</span>&lt;<span style="color:#2b91af;">Song</span>&gt;&gt;&nbsp;<span style="font-weight:bold;color:#74531f;">GetRecommendationsAsync</span>(<span style="color:blue;">string</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">userName</span>) { &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;Impure</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">scrobbles</span>&nbsp;=&nbsp;<span style="color:blue;">await</span>&nbsp;_songService.<span style="font-weight:bold;color:#74531f;">GetTopScrobblesAsync</span>(<span style="font-weight:bold;color:#1f377f;">userName</span>); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;Pure</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">songIds</span>&nbsp;=&nbsp;<span style="color:#74531f;">HandleOwnScrobbles</span>(<span style="font-weight:bold;color:#1f377f;">scrobbles</span>); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">recommendationCandidates</span>&nbsp;=&nbsp;<span style="color:blue;">new</span>&nbsp;<span style="color:#2b91af;">List</span>&lt;<span style="color:#2b91af;">Song</span>&gt;(); &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">foreach</span>&nbsp;(<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">songId</span>&nbsp;<span style="font-weight:bold;color:#8f08c4;">in</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">songIds</span>) &nbsp;&nbsp;&nbsp;&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;Impure</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">otherListeners</span>&nbsp;=&nbsp;<span style="color:blue;">await</span>&nbsp;_songService &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">GetTopListenersAsync</span>(<span style="font-weight:bold;color:#1f377f;">songId</span>); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;Pure</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">otherUserNames</span>&nbsp;=&nbsp;<span style="color:#74531f;">HandleOtherListeners</span>(<span style="font-weight:bold;color:#1f377f;">otherListeners</span>); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">foreach</span>&nbsp;(<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">otherUserName</span>&nbsp;<span style="font-weight:bold;color:#8f08c4;">in</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">otherUserNames</span>) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;Impure</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">otherScrobbles</span>&nbsp;=&nbsp;<span style="color:blue;">await</span>&nbsp;_songService &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">GetTopScrobblesAsync</span>(<span style="font-weight:bold;color:#1f377f;">otherUserName</span>); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;Pure</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">songsToRecommend</span>&nbsp;=&nbsp;<span style="color:#74531f;">HandleOtherScrobbles</span>(<span style="font-weight:bold;color:#1f377f;">otherScrobbles</span>); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">recommendationCandidates</span>.<span style="font-weight:bold;color:#74531f;">AddRange</span>(<span style="font-weight:bold;color:#1f377f;">songsToRecommend</span>); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;} &nbsp;&nbsp;&nbsp;&nbsp;} &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;Pure</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">return</span>&nbsp;<span style="color:#74531f;">FinalizeRecommendations</span>(<span style="font-weight:bold;color:#1f377f;">recommendationCandidates</span>); }</pre> </p> <p> As Oleksii Holub writes, </p> <blockquote> <p> "However, instead of having one cohesive element to reason about, we ended up with multiple fragments, each having no meaning or value of their own. While unit testing of individual parts may have become easier, the benefit is very questionable, as it provides no confidence in the correctness of the algorithm as a whole." </p> </blockquote> <p> I agree with that assessment, but still find it warranted to pursue the idea a little further. After all, my goal with this overall article series isn't to be prescriptive, but rather descriptive. By presenting and comparing alternatives, we become aware of more options. This, hopefully, helps us choose the 'right tool for the job'. </p> <h3 id="eba5c36720814300b4a4127359569b46"> Triple-decker sandwich? <a href="#eba5c36720814300b4a4127359569b46">#</a> </h3> <p> If we look closer at this alternative, however, we find that we only need to deal with three impure actions. We might, then, postulate that this is an <a href="/2023/10/09/whats-a-sandwich">expanded sandwich</a> - a triple-decker sandwich, if you will. </p> <p> To be clear, I don't find this a reasonable argument. Even if you accept expanding the sandwich metaphor to add a pure validation step, the number of layers, and the structure of the sandwich would still be known at compile time. You may start at the impure boundary, then phase into pure validation, return to another impure step to gather data, call your 'main' pure function, and finally write the result to some kind of output. To borrow a figure from the <a href="/2023/10/09/whats-a-sandwich">What's a sandwich?</a> article: </p> <p> <img src="/content/binary/pure-impure-pure-impure-box.png" alt="A box with green, red, green, and red horizontal tiers."> </p> <p> On the other hand, this isn't what the above code suggestion does. The problem with the song recommendation algorithm is that the impure actions cascade. While we start with a single impure out-of-process query, we then use the result of that to loop over, and perform <em>n</em> more queries. This, in fact, happens again, nested in the outer loop, so in terms of network calls, we're looking at an <em>O(n<sup>2</sup>)</em> algorithm. </p> <p> We can actually be more precise than that, because the 'outer' queries actually limit their result sets. The first query only considers the top 100 results, so we know that <code>GetTopListenersAsync</code> is going to be called at most 100 times. The result of this call is again limited to the top 20, so that the inner calls to <code>GetTopScrobblesAsync</code> run at most 20 * 100 = 2,000 times. In all, the upper limit is 1 + 100 + 2,000 = 2,101 network calls. (Okay, so really, this is just an <em>O(1)</em> algorithm, although <em>1 ~ 2,101</em>.) </p> <p> Not that that isn't going to take a bit of time. </p> <p> All that said, it's not execution time that concerns me in this context. Assume that the algorithm is already as optimal as possible, and that those 2,101 network calls are necessary. What rather concerns me here is how to organize the code in a way that's as maintainable as possible. As usual, when that's the main concern, I'll remind the reader to consider the example problem as a stand-in for a more complicated problem. Even Oleksii Holub's original code example is only some fifty-odd lines of code, which in itself hardly warrants all the hand-wringing we're currently subjecting it to. </p> <p> Rather, what I'd like to address is the dynamic back-and-forth between pure function and impure action. Each of these thousands of out-of-process calls are non-deterministic. If you're tasked with maintaining or editing this algorithm, your brain will be taxed by all that unpredictable behaviour. Many subtle bugs lurk there. </p> <p> The more we can pull the code towards pure functions the better, because <a href="/2021/07/28/referential-transparency-fits-in-your-head">referential transparency fits in your head</a>. </p> <p> So, to be explicit, I don't consider this kind of composition as an expanded Impureim Sandwich. </p> <h3 id="4903416bdf924a29a6f6043b178da4df"> Standard combinators <a href="#4903416bdf924a29a6f6043b178da4df">#</a> </h3> <p> Is it possible to somehow improve, even just a little, on the above suggestion? Can we somehow make it look a little 'more functional'? </p> <p> We could use some standard combinators, like <a href="/2022/03/28/monads">monadic <em>bind</em></a>, <a href="/2024/11/11/traversals">traversals</a>, and so on. </p> <p> To be honest, for the specific song-recommendations example, the benefit is marginal at best, but doing it would still demonstrate a particular technique. We'd be able to get rid of the local mutation of <code>recommendationCandidates</code>, but that's about it. </p> <p> Even so, refactoring to self-contained expressions makes other refactoring easier. As a counter-example, imagine that you'd like to extract the inner <code>foreach</code> loop in the above code example to a helper method. </p> <p> <pre><span style="color:blue;">private</span>&nbsp;<span style="color:blue;">async</span>&nbsp;<span style="color:#2b91af;">Task</span>&nbsp;<span style="font-weight:bold;color:#74531f;">CollectOtherUserTopScrobbles</span>( &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">List</span>&lt;<span style="color:#2b91af;">Song</span>&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">recommendationCandidates</span>, &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">IReadOnlyList</span>&lt;<span style="color:blue;">string</span>&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">otherUserNames</span>) { &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">foreach</span>&nbsp;(<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">otherUserName</span>&nbsp;<span style="font-weight:bold;color:#8f08c4;">in</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">otherUserNames</span>) &nbsp;&nbsp;&nbsp;&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;Impure</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">otherScrobbles</span>&nbsp;=&nbsp;<span style="color:blue;">await</span>&nbsp;_songService &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">GetTopScrobblesAsync</span>(<span style="font-weight:bold;color:#1f377f;">otherUserName</span>); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;Pure</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">songsToRecommend</span>&nbsp;=&nbsp;<span style="color:#74531f;">HandleOtherScrobbles</span>(<span style="font-weight:bold;color:#1f377f;">otherScrobbles</span>); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">recommendationCandidates</span>.<span style="font-weight:bold;color:#74531f;">AddRange</span>(<span style="font-weight:bold;color:#1f377f;">songsToRecommend</span>); &nbsp;&nbsp;&nbsp;&nbsp;} }</pre> </p> <p> The call site would then look like this: </p> <p> <pre><span style="color:green;">//&nbsp;Pure</span> <span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">otherUserNames</span>&nbsp;=&nbsp;<span style="color:#74531f;">HandleOtherListeners</span>(<span style="font-weight:bold;color:#1f377f;">otherListeners</span>); <span style="color:green;">//&nbsp;Impure</span> <span style="color:blue;">await</span>&nbsp;<span style="font-weight:bold;color:#74531f;">CollectOtherUserTopScrobbles</span>(<span style="font-weight:bold;color:#1f377f;">recommendationCandidates</span>,&nbsp;<span style="font-weight:bold;color:#1f377f;">otherUserNames</span>);</pre> </p> <p> In this specific example, such a refactoring isn't too difficult, but it's more complicated than it could be. Because of state mutation, we have to pass the object to be modified, in this case <code>recommendationCandidates</code>, along as a method argument. Here, there's only one, but if you have code where you change the state of two objects, you'd have to pass two extra parameters, and so on. </p> <p> You've most likely worked in a real code base where you have tried to extract a helper method, only to discover that it's so incredibly tangled with the objects that it modifies that you need a long parameter list. What should have been a simplification is in danger of making everything worse. </p> <p> On the other hand, self-contained expressions, even if, as in this case, they're non-deterministic, don't mutate state. In general, this tends to make it easier to extract subexpressions as helper methods, if only because they are less coupled to the rest of the code. They may required inputs as parameters, but at least you don't have to pass around objects to be modified. </p> <p> Thus, the reason I find it worthwhile to include articles about this kind of refactoring is that, since it demonstrates how to refactor to a more expression-based style, you may be able to extrapolate to your own context. And who knows, you may encounter a context where more substantial improvements can be made by moving in this direction. </p> <p> As usual in this article series, you'll see how to apply this technique in three different languages. </p> <ul> <li><a href="/2025/06/16/song-recommendations-from-c-combinators">Song recommendations from C# combinators</a></li> <li>Song recommendations from F# combinators</li> <li>Song recommendations from Haskell combinators</li> </ul> <p> All that said, it's important to underscore that I don't consider this proper <a href="/2018/11/19/functional-architecture-a-definition">functional architecture</a>. Even the Haskell example is too non-deterministic to my tastes. </p> <h3 id="d470d8b3fd2344788be43691aac8a29c"> Conclusion <a href="#d470d8b3fd2344788be43691aac8a29c">#</a> </h3> <p> Perhaps the most pragmatic approach to a problem like the song-recommendations example is to allow the impure actions and pure functions to interleave. I don't mean to insist that functional programming is the only way to make code maintainable. You can organize code according to other principles, and some of them may also leave you with a code base that can serve its mission well, now and in the future. </p> <p> Another factor to take into account is the skill level of the team tasked with maintaining a code base. What are they comfortable with? </p> <p> Not that I think you should settle for status quo. Progress can only be made if you <a href="/2015/08/03/idiomatic-or-idiosyncratic">push the envelop a little</a>, but you can also come up with a code base so alien to your colleagues that they can't work with it at all. </p> <p> I could easily imagine a team where the solution in the next three articles is the only style they'd be able to maintain. </p> <p> <strong>Next:</strong> <a href="/2025/06/16/song-recommendations-from-c-combinators">Song recommendations from C# combinators</a>. </p> </div><hr> This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>. Mark Seemann https://blog.ploeh.dk/2025/06/09/song-recommendations-from-combinators Testing races with a slow Decorator https://blog.ploeh.dk/2025/06/02/testing-races-with-a-slow-decorator/ Mon, 02 Jun 2025 08:03:00 UTC <div id="post"> <p> <em>Delaying database interactions for test purposes.</em> </p> <p> In chapter 12 in <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>, I cover a typical <a href="https://en.wikipedia.org/wiki/Race_condition">race condition</a> and how to test for it. The book comes with a pedagogical explanation of the problem, including a diagram in the style of <a href="/ref/ddia">Designing Data-Intensive Applications</a>. In short, the problem occurs when two or more clients are competing for the last remaining seats in a particular time slot. </p> <p> In my two-day workshop based on the book, I also cover this scenario. The goal is to show how to write automated tests for this kind of non-deterministic behaviour. In the book, and in the workshop, my approach is to rely on the <a href="https://en.wikipedia.org/wiki/Law_of_large_numbers">law of large numbers</a>. An automated test attempts to trigger the race condition by trying 'enough' times. A timeout on the test assumes that if the test does not trigger the condition in the allotted time window, then the bug is addressed. </p> <p> At one of my workshops, one participant told me of a more efficient and elegant way to test for this. I wish I could remember exactly at which workshop it was, and who the gentleman was, but alas, it escapes me. </p> <h3 id="05f24d2c08e34c5fbbd7e91dc8667969"> Reproducing the condition <a href="#05f24d2c08e34c5fbbd7e91dc8667969">#</a> </h3> <p> How do you deterministically reproduce non-deterministic behaviour? The default answer is almost tautological. You can't, since it's non-deterministic. </p> <p> The irony, however, is that in the workshop, I deterministically demonstrate the problem. The problem, in short, is that in order to decide whether or not to accept a reservation request, the system first reads data from its database, runs a fairly complex piece of decision logic, and finally writes the reservation to the database - if it decides to accept it, based on what it read. When competing processes vie for the last remaining seats, a race may occur where both (or all) base their decision on the same data, so they all come to the conclusion that they still have enough remaining capacity. Again, refer to the book, and its accompanying code base, for the details. </p> <p> How do I demonstrate this condition in the workshop? I go into the Controller code and insert a temporary, human-scale delay after reading from the database, but before making the decision: </p> <p> <pre><span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">reservations</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;Repository.<span style="font-weight:bold;color:#74531f;">ReadReservations</span>(<span style="font-weight:bold;color:#1f377f;">r</span>.At); <span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;<span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">Delay</span>(<span style="color:#2b91af;">TimeSpan</span>.<span style="color:#74531f;">FromSeconds</span>(10)); <span style="font-weight:bold;color:#8f08c4;">if</span>&nbsp;(!MaitreD.<span style="font-weight:bold;color:#74531f;">WillAccept</span>(<span style="color:#2b91af;">DateTime</span>.Now,&nbsp;<span style="font-weight:bold;color:#1f377f;">reservations</span>,&nbsp;<span style="font-weight:bold;color:#1f377f;">r</span>)) &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">return</span>&nbsp;<span style="color:#74531f;">NoTables500InternalServerError</span>(); <span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;Repository.<span style="font-weight:bold;color:#74531f;">Create</span>(<span style="font-weight:bold;color:#1f377f;">restaurant</span>.Id,&nbsp;<span style="font-weight:bold;color:#1f377f;">reservation</span>);</pre> </p> <p> Then I open two windows, from which I, within a couple of seconds of each other, try to make competing reservations. When the bug is present, both reservations are accepted, although, according to business rules, only one should be. </p> <p> So that's how to deterministically demonstrate the problem. Just insert a long enough delay. </p> <p> We can't, however, leave such delays in the production code, so I never even considered that this simple technique could be used for automated testing. </p> <h3 id="d908cad4489642babb34a58a37c678c7"> Slowing things down with a Decorator <a href="#d908cad4489642babb34a58a37c678c7">#</a> </h3> <p> That's until my workshop participant told me his trick: Why don't you slow down the database interactions for test-purposes only? At first, I thought he had in mind some nasty compiler pragmas or environment hacks, but no. Why don't you use a <a href="https://en.wikipedia.org/wiki/Decorator_pattern">Decorator</a> to slow things down? </p> <p> Indeed, why not? </p> <p> Fortunately, all database interaction already takes place behind an <code>IReservationsRepository</code> interface. Adding a test-only, delaying Decorator is straightforward. </p> <p> <pre><span style="color:blue;">public</span>&nbsp;<span style="color:blue;">sealed</span>&nbsp;<span style="color:blue;">class</span>&nbsp;<span style="color:#2b91af;">SlowReservationsRepository</span>&nbsp;:&nbsp;<span style="color:#2b91af;">IReservationsRepository</span> { &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">private</span>&nbsp;<span style="color:blue;">readonly</span>&nbsp;<span style="color:#2b91af;">TimeSpan</span>&nbsp;halfDelay; &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">public</span>&nbsp;<span style="color:#2b91af;">SlowReservationsRepository</span>( &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">TimeSpan</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">delay</span>, &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">IReservationsRepository</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">inner</span>) &nbsp;&nbsp;&nbsp;&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Delay&nbsp;=&nbsp;<span style="font-weight:bold;color:#1f377f;">delay</span>; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;halfDelay&nbsp;=&nbsp;<span style="font-weight:bold;color:#1f377f;">delay</span>&nbsp;<span style="font-weight:bold;color:#74531f;">/</span>&nbsp;2; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Inner&nbsp;=&nbsp;<span style="font-weight:bold;color:#1f377f;">inner</span>; &nbsp;&nbsp;&nbsp;&nbsp;} &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">public</span>&nbsp;<span style="color:#2b91af;">TimeSpan</span>&nbsp;Delay&nbsp;{&nbsp;<span style="color:blue;">get</span>;&nbsp;} &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">public</span>&nbsp;<span style="color:#2b91af;">IReservationsRepository</span>&nbsp;Inner&nbsp;{&nbsp;<span style="color:blue;">get</span>;&nbsp;} &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">public</span>&nbsp;<span style="color:blue;">async</span>&nbsp;<span style="color:#2b91af;">Task</span>&nbsp;<span style="font-weight:bold;color:#74531f;">Create</span>(<span style="color:blue;">int</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">restaurantId</span>,&nbsp;<span style="color:#2b91af;">Reservation</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">reservation</span>) &nbsp;&nbsp;&nbsp;&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;<span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">Delay</span>(halfDelay); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;Inner.<span style="font-weight:bold;color:#74531f;">Create</span>(<span style="font-weight:bold;color:#1f377f;">restaurantId</span>,&nbsp;<span style="font-weight:bold;color:#1f377f;">reservation</span>); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;<span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">Delay</span>(halfDelay); &nbsp;&nbsp;&nbsp;&nbsp;} &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">public</span>&nbsp;<span style="color:blue;">async</span>&nbsp;<span style="color:#2b91af;">Task</span>&nbsp;<span style="font-weight:bold;color:#74531f;">Delete</span>(<span style="color:blue;">int</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">restaurantId</span>,&nbsp;<span style="color:#2b91af;">Guid</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">id</span>) &nbsp;&nbsp;&nbsp;&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;<span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">Delay</span>(halfDelay); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;Inner.<span style="font-weight:bold;color:#74531f;">Delete</span>(<span style="font-weight:bold;color:#1f377f;">restaurantId</span>,&nbsp;<span style="font-weight:bold;color:#1f377f;">id</span>); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;<span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">Delay</span>(halfDelay); &nbsp;&nbsp;&nbsp;&nbsp;} &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">public</span>&nbsp;<span style="color:blue;">async</span>&nbsp;<span style="color:#2b91af;">Task</span>&lt;<span style="color:#2b91af;">Reservation</span>?&gt;&nbsp;<span style="font-weight:bold;color:#74531f;">ReadReservation</span>( &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">int</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">restaurantId</span>, &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">Guid</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">id</span>) &nbsp;&nbsp;&nbsp;&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;<span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">Delay</span>(halfDelay); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">result</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;Inner.<span style="font-weight:bold;color:#74531f;">ReadReservation</span>(<span style="font-weight:bold;color:#1f377f;">restaurantId</span>,&nbsp;<span style="font-weight:bold;color:#1f377f;">id</span>); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;<span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">Delay</span>(halfDelay); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">return</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">result</span>; &nbsp;&nbsp;&nbsp;&nbsp;} &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">public</span>&nbsp;<span style="color:blue;">async</span>&nbsp;<span style="color:#2b91af;">Task</span>&lt;<span style="color:#2b91af;">IReadOnlyCollection</span>&lt;<span style="color:#2b91af;">Reservation</span>&gt;&gt;&nbsp;<span style="font-weight:bold;color:#74531f;">ReadReservations</span>( &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">int</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">restaurantId</span>, &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">DateTime</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">min</span>, &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">DateTime</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">max</span>) &nbsp;&nbsp;&nbsp;&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;<span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">Delay</span>(halfDelay); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">result</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;Inner.<span style="font-weight:bold;color:#74531f;">ReadReservations</span>(<span style="font-weight:bold;color:#1f377f;">restaurantId</span>,&nbsp;<span style="font-weight:bold;color:#1f377f;">min</span>,&nbsp;<span style="font-weight:bold;color:#1f377f;">max</span>); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;<span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">Delay</span>(halfDelay); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">return</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">result</span>; &nbsp;&nbsp;&nbsp;&nbsp;} &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">public</span>&nbsp;<span style="color:blue;">async</span>&nbsp;<span style="color:#2b91af;">Task</span>&nbsp;<span style="font-weight:bold;color:#74531f;">Update</span>(<span style="color:blue;">int</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">restaurantId</span>,&nbsp;<span style="color:#2b91af;">Reservation</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">reservation</span>) &nbsp;&nbsp;&nbsp;&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;<span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">Delay</span>(halfDelay); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;Inner.<span style="font-weight:bold;color:#74531f;">Update</span>(<span style="font-weight:bold;color:#1f377f;">restaurantId</span>,&nbsp;<span style="font-weight:bold;color:#1f377f;">reservation</span>); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;<span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">Delay</span>(halfDelay); &nbsp;&nbsp;&nbsp;&nbsp;} }</pre> </p> <p> This one uniformly slows down all operations. I arbitrarily decided to split the <code>Delay</code> in half, in order to apply half of it before each action, and the other half after. Honestly, I didn't mull this over too much; I just thought that if I did it that way, I wouldn't have to speculate whether it would make a difference if the delay happened before or after the action in question. </p> <h3 id="a0e3b21e6def42b781e47a98b19f2294"> Slowing down tests <a href="#a0e3b21e6def42b781e47a98b19f2294">#</a> </h3> <p> I added a few helper methods to the <code>RestaurantService</code> class that inherits from <a href="https://learn.microsoft.com/dotnet/api/microsoft.aspnetcore.mvc.testing.webapplicationfactory-1"><span style="color:#2b91af;">WebApplicationFactory</span>&lt;<span style="color:#2b91af;">Startup</span>&gt;</a>, mainly to enable decoration of the injected Repository. With those changes, I could now rewrite my test like this: </p> <p> <pre>[<span style="color:#2b91af;">Fact</span>] <span style="color:blue;">public</span>&nbsp;<span style="color:blue;">async</span>&nbsp;<span style="color:#2b91af;">Task</span>&nbsp;<span style="font-weight:bold;color:#74531f;">NoOverbookingRace</span>() { &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">date</span>&nbsp;=&nbsp;<span style="color:#2b91af;">DateTime</span>.Now.Date.<span style="font-weight:bold;color:#74531f;">AddDays</span>(1).<span style="font-weight:bold;color:#74531f;">AddHours</span>(18.5); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">using</span>&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">service</span>&nbsp;=&nbsp;<span style="color:#2b91af;">RestaurantService</span>.<span style="color:#74531f;">CreateWith</span>(<span style="font-weight:bold;color:#1f377f;">repo</span>&nbsp;=&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">new</span>&nbsp;<span style="color:#2b91af;">SlowReservationsRepository</span>( &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">TimeSpan</span>.<span style="color:#74531f;">FromMilliseconds</span>(100),&nbsp;<span style="font-weight:bold;color:#1f377f;">repo</span>)); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">task1</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#1f377f;">service</span>.<span style="font-weight:bold;color:#74531f;">PostReservation</span>(<span style="color:blue;">new</span>&nbsp;<span style="color:#2b91af;">ReservationDtoBuilder</span>() &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">WithDate</span>(<span style="font-weight:bold;color:#1f377f;">date</span>) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">WithQuantity</span>(10) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">Build</span>()); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">task2</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#1f377f;">service</span>.<span style="font-weight:bold;color:#74531f;">PostReservation</span>(<span style="color:blue;">new</span>&nbsp;<span style="color:#2b91af;">ReservationDtoBuilder</span>() &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">WithDate</span>(<span style="font-weight:bold;color:#1f377f;">date</span>) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">WithQuantity</span>(10) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">Build</span>()); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">actual</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;<span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">WhenAll</span>(<span style="font-weight:bold;color:#1f377f;">task1</span>,&nbsp;<span style="font-weight:bold;color:#1f377f;">task2</span>); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Single</span>( &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">actual</span>, &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">msg</span>&nbsp;=&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">msg</span>.StatusCode&nbsp;==&nbsp;<span style="color:#2b91af;">HttpStatusCode</span>.InternalServerError); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">ok</span>&nbsp;=&nbsp;<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Single</span>(<span style="font-weight:bold;color:#1f377f;">actual</span>,&nbsp;<span style="font-weight:bold;color:#1f377f;">msg</span>&nbsp;=&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">msg</span>.IsSuccessStatusCode); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;Check&nbsp;that&nbsp;the&nbsp;reservation&nbsp;was&nbsp;actually&nbsp;created:</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">resp</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">service</span>.<span style="font-weight:bold;color:#74531f;">GetReservation</span>(<span style="font-weight:bold;color:#1f377f;">ok</span>.Headers.Location); &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">resp</span>.<span style="font-weight:bold;color:#74531f;">EnsureSuccessStatusCode</span>(); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">reservation</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">resp</span>.<span style="font-weight:bold;color:#74531f;">ParseJsonContent</span>&lt;<span style="color:#2b91af;">ReservationDto</span>&gt;(); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Equal</span>(10,&nbsp;<span style="font-weight:bold;color:#1f377f;">reservation</span>.Quantity); }</pre> </p> <p> The restaurant being tested has a maximum capacity of ten guests, so while it can accommodate either of the two requests, it can't make room for both. </p> <p> For this example, I arbitrarily chose to configure the Decorator with a 100-millisecond delay. Every interaction with the database caused by that test gets a built-in 100-millisecond delay. 50 ms before each action, and 50 ms after. </p> <p> The test starts both tasks, <code>task1</code> and <code>task2</code>, without awaiting them. This allows them to run concurrently. After starting both tasks, the test awaits both of them with <a href="https://learn.microsoft.com/dotnet/api/system.threading.tasks.task.whenall">Task.WhenAll</a>. </p> <p> The <a href="/2013/06/24/a-heuristic-for-formatting-code-according-to-the-aaa-pattern">assertion phase</a> of the test is more involved than you may be used to see. The reason is that it deals with more than one possible failure scenario. </p> <p> The first two assertions (<code>Assert.Single</code>) deal with the complete absence of transaction control in the application. In that case, both <code>POST</code> requests succeed, which they aren't supposed to. If the system works properly, it should accept one request and reject the other. </p> <p> The rest of the assertions check that the successful reservation was actually created. That's another failure scenario. </p> <p> The way I chose to deal with the race condition is standard in .NET. I used a <a href="https://learn.microsoft.com/dotnet/api/system.transactions.transactionscope">TransactionScope</a>. This is peculiar and, in my opinion, questionable API that enables you to start a transaction <em>anywhere</em> in your code, and then complete when you you're done. In the code base that accompanies <a href="/code-that-fits-in-your-head">Code That Fits in Your Head</a>, it looks like this: </p> <p> <pre><span style="color:blue;">private</span>&nbsp;<span style="color:blue;">async</span>&nbsp;<span style="color:#2b91af;">Task</span>&lt;<span style="color:#2b91af;">ActionResult</span>&gt;&nbsp;<span style="font-weight:bold;color:#74531f;">TryCreate</span>(<span style="color:#2b91af;">Restaurant</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">restaurant</span>,&nbsp;<span style="color:#2b91af;">Reservation</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">reservation</span>) { &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">using</span>&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">scope</span>&nbsp;=&nbsp;<span style="color:blue;">new</span>&nbsp;<span style="color:#2b91af;">TransactionScope</span>(<span style="color:#2b91af;">TransactionScopeAsyncFlowOption</span>.Enabled); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">reservations</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;Repository &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">ReadReservations</span>(<span style="font-weight:bold;color:#1f377f;">restaurant</span>.Id,&nbsp;<span style="font-weight:bold;color:#1f377f;">reservation</span>.At) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">now</span>&nbsp;=&nbsp;Clock.<span style="font-weight:bold;color:#74531f;">GetCurrentDateTime</span>(); &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">if</span>&nbsp;(!<span style="font-weight:bold;color:#1f377f;">restaurant</span>.MaitreD.<span style="font-weight:bold;color:#74531f;">WillAccept</span>(<span style="font-weight:bold;color:#1f377f;">now</span>,&nbsp;<span style="font-weight:bold;color:#1f377f;">reservations</span>,&nbsp;<span style="font-weight:bold;color:#1f377f;">reservation</span>)) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">return</span>&nbsp;<span style="color:#74531f;">NoTables500InternalServerError</span>(); &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;Repository.<span style="font-weight:bold;color:#74531f;">Create</span>(<span style="font-weight:bold;color:#1f377f;">restaurant</span>.Id,&nbsp;<span style="font-weight:bold;color:#1f377f;">reservation</span>).<span style="font-weight:bold;color:#74531f;">ConfigureAwait</span>(<span style="color:blue;">false</span>); &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">scope</span>.<span style="font-weight:bold;color:#74531f;">Complete</span>(); &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">return</span>&nbsp;<span style="color:#74531f;">Reservation201Created</span>(<span style="font-weight:bold;color:#1f377f;">restaurant</span>.Id,&nbsp;<span style="font-weight:bold;color:#1f377f;">reservation</span>); }</pre> </p> <p> Notice the <code>scope.Complete()</code> statement towards the end. </p> <p> What happens if someone forgets to call <code>scope.Complete()</code>? </p> <p> In that case, the thread that wins the race returns <code>201 Created</code>, but when the <code>scope</code> goes out of scope, it's disposed of. If <code>Complete()</code> wasn't called, the transaction is rolled back, but the HTTP response code remains <code>201</code>. Thus, the two assertions that inspect the response codes aren't enough to catch this particular kind of defect. </p> <p> Instead, the test subsequently queries the System Under Test to verify that the resource was, indeed, created. </p> <h3 id="064b9e4646544b60b47b25e2cc4cc8a5"> Wait time <a href="#064b9e4646544b60b47b25e2cc4cc8a5">#</a> </h3> <p> The original test shown in the book times out after 30 seconds if it can't produce the race condition. Compared to that, the refactored test shown here is <em>fast</em>. Even so, we may fear that it spends too much time doing nothing. How much time might that be? </p> <p> The <code>TryCreate</code> helper method shown above is the only part of a <code>POST</code> request that interacts with the Repository. As you can see, it calls it twice: Once to read, and once to write, if it decides to do that. With a 100 ms delay, that's 200 ms. </p> <p> And while the test issues two <code>POST</code> requests, they run in parallel. That's the whole point. It means that they'll still run in approximately 200 ms. </p> <p> The test then issues a <code>GET</code> request to verify that the resource was created. That triggers another database read, which takes another 100 ms. </p> <p> That's 300 ms in all. Given that these tests are part of a second-level test suite, and not your default developer test suite, that may be good enough. </p> <p> Still, that's the <code>POST</code> scenario. I also wrote a test that checks for a race condition when doing <code>PUT</code> requests, and it performs more work. </p> <p> <pre>[<span style="color:#2b91af;">Fact</span>] <span style="color:blue;">public</span>&nbsp;<span style="color:blue;">async</span>&nbsp;<span style="color:#2b91af;">Task</span>&nbsp;<span style="font-weight:bold;color:#74531f;">NoOverbookingPutRace</span>() { &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">date</span>&nbsp;=&nbsp;<span style="color:#2b91af;">DateTime</span>.Now.Date.<span style="font-weight:bold;color:#74531f;">AddDays</span>(1).<span style="font-weight:bold;color:#74531f;">AddHours</span>(18.5); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">using</span>&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">service</span>&nbsp;=&nbsp;<span style="color:#2b91af;">RestaurantService</span>.<span style="color:#74531f;">CreateWith</span>(<span style="font-weight:bold;color:#1f377f;">repo</span>&nbsp;=&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">new</span>&nbsp;<span style="color:#2b91af;">SlowReservationsRepository</span>( &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">TimeSpan</span>.<span style="color:#74531f;">FromMilliseconds</span>(100),&nbsp;<span style="font-weight:bold;color:#1f377f;">repo</span>)); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;(<span style="font-weight:bold;color:#1f377f;">address1</span>,&nbsp;<span style="font-weight:bold;color:#1f377f;">dto1</span>)&nbsp;=&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">service</span>.<span style="font-weight:bold;color:#74531f;">PostReservation</span>(<span style="font-weight:bold;color:#1f377f;">date</span>,&nbsp;4); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;(<span style="font-weight:bold;color:#1f377f;">address2</span>,&nbsp;<span style="font-weight:bold;color:#1f377f;">dto2</span>)&nbsp;=&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">service</span>.<span style="font-weight:bold;color:#74531f;">PostReservation</span>(<span style="font-weight:bold;color:#1f377f;">date</span>,&nbsp;4); &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">dto1</span>.Quantity&nbsp;+=&nbsp;2; &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">dto2</span>.Quantity&nbsp;+=&nbsp;2; &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">task1</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#1f377f;">service</span>.<span style="font-weight:bold;color:#74531f;">PutReservation</span>(<span style="font-weight:bold;color:#1f377f;">address1</span>,&nbsp;<span style="font-weight:bold;color:#1f377f;">dto1</span>); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">task2</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#1f377f;">service</span>.<span style="font-weight:bold;color:#74531f;">PutReservation</span>(<span style="font-weight:bold;color:#1f377f;">address2</span>,&nbsp;<span style="font-weight:bold;color:#1f377f;">dto2</span>); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">actual</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;<span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">WhenAll</span>(<span style="font-weight:bold;color:#1f377f;">task1</span>,&nbsp;<span style="font-weight:bold;color:#1f377f;">task2</span>); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Single</span>( &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">actual</span>, &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">msg</span>&nbsp;=&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">msg</span>.StatusCode&nbsp;==&nbsp;<span style="color:#2b91af;">HttpStatusCode</span>.InternalServerError); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">ok</span>&nbsp;=&nbsp;<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Single</span>(<span style="font-weight:bold;color:#1f377f;">actual</span>,&nbsp;<span style="font-weight:bold;color:#1f377f;">msg</span>&nbsp;=&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">msg</span>.IsSuccessStatusCode); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;Check&nbsp;that&nbsp;the&nbsp;reservations&nbsp;now&nbsp;have&nbsp;consistent&nbsp;values:</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">client</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#1f377f;">service</span>.<span style="font-weight:bold;color:#74531f;">CreateClient</span>(); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">resp1</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">client</span>.<span style="font-weight:bold;color:#74531f;">GetAsync</span>(<span style="font-weight:bold;color:#1f377f;">address1</span>); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">resp2</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">client</span>.<span style="font-weight:bold;color:#74531f;">GetAsync</span>(<span style="font-weight:bold;color:#1f377f;">address2</span>); &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">resp1</span>.<span style="font-weight:bold;color:#74531f;">EnsureSuccessStatusCode</span>(); &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">resp2</span>.<span style="font-weight:bold;color:#74531f;">EnsureSuccessStatusCode</span>(); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">body1</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">resp1</span>.<span style="font-weight:bold;color:#74531f;">ParseJsonContent</span>&lt;<span style="color:#2b91af;">ReservationDto</span>&gt;(); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">body2</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">resp2</span>.<span style="font-weight:bold;color:#74531f;">ParseJsonContent</span>&lt;<span style="color:#2b91af;">ReservationDto</span>&gt;(); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Single</span>(<span style="color:blue;">new</span>[]&nbsp;{&nbsp;<span style="font-weight:bold;color:#1f377f;">body1</span>.Quantity,&nbsp;<span style="font-weight:bold;color:#1f377f;">body2</span>.Quantity&nbsp;},&nbsp;6); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Single</span>(<span style="color:blue;">new</span>[]&nbsp;{&nbsp;<span style="font-weight:bold;color:#1f377f;">body1</span>.Quantity,&nbsp;<span style="font-weight:bold;color:#1f377f;">body2</span>.Quantity&nbsp;},&nbsp;4); }</pre> </p> <p> This test first has to create two reservations in a nice, sequential manner. Then it attempts to perform two concurrent updates, and finally it tests that all is as it should be: That both reservations still exist, but only one had its <code>Quantity</code> increased to <code>6</code>. </p> <p> This test first makes two <code>POST</code> requests, nicely serialized so as to avoid a race condition. That's 400 ms. </p> <p> Each <code>PUT</code> request triggers three Repository actions, for a total of 300 ms (since they run in parallel). </p> <p> Finally, the test issues two <code>GET</code> requests for verification, for another 2 times 100 ms. Now that I'm writing this, I realize that I could also have parallelized these two calls, but as you read on, you'll see why that's not necessary. </p> <p> In all, this test waits for 900 ms. That's almost a second. </p> <p> Can we improve on that? </p> <h3 id="b5c26b0e8a80487db636d47cd408cf3a"> Decreasing unnecessary wait time <a href="#b5c26b0e8a80487db636d47cd408cf3a">#</a> </h3> <p> In the latter example, the 300 ms wait time for the parallel <code>PUT</code> requests are necessary to trigger the race condition, but the rest of the test's actions don't need slowing down. We can remove the unwarranted wait time by setting up two services: One slow, and one normal. </p> <p> To be honest, I could have modelled this by just instantiating two service objects, but why do something as pedestrian as that when you can turn <code>RestaurantService</code> into a <a href="/2020/10/19/monomorphic-functors">monomorphic functor</a>? </p> <p> <pre><span style="color:blue;">internal</span>&nbsp;<span style="color:#2b91af;">RestaurantService</span>&nbsp;<span style="font-weight:bold;color:#74531f;">Select</span>(<span style="color:#2b91af;">Func</span>&lt;<span style="color:#2b91af;">IReservationsRepository</span>,&nbsp;<span style="color:#2b91af;">IReservationsRepository</span>&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">selector</span>) { &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">if</span>&nbsp;(<span style="font-weight:bold;color:#1f377f;">selector</span>&nbsp;<span style="color:blue;">is</span>&nbsp;<span style="color:blue;">null</span>) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">throw</span>&nbsp;<span style="color:blue;">new</span>&nbsp;<span style="color:#2b91af;">ArgumentNullException</span>(<span style="color:blue;">nameof</span>(<span style="font-weight:bold;color:#1f377f;">selector</span>)); &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#8f08c4;">return</span>&nbsp;<span style="color:blue;">new</span>&nbsp;<span style="color:#2b91af;">RestaurantService</span>(<span style="font-weight:bold;color:#1f377f;">selector</span>(repository)); }</pre> </p> <p> Granted, this is verging on the frivolous, but when writing code for a blog post, I think I'm allowed a little fun. </p> <p> In any case, this now enables me to rewrite the test like this: </p> <p> <pre>[<span style="color:#2b91af;">Fact</span>] <span style="color:blue;">public</span>&nbsp;<span style="color:blue;">async</span>&nbsp;<span style="color:#2b91af;">Task</span>&nbsp;<span style="font-weight:bold;color:#74531f;">NoOverbookingRace</span>() { &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">date</span>&nbsp;=&nbsp;<span style="color:#2b91af;">DateTime</span>.Now.Date.<span style="font-weight:bold;color:#74531f;">AddDays</span>(1).<span style="font-weight:bold;color:#74531f;">AddHours</span>(18.5); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">using</span>&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">service</span>&nbsp;=&nbsp;<span style="color:blue;">new</span>&nbsp;<span style="color:#2b91af;">RestaurantService</span>(); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">using</span>&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">slowService</span>&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">from</span>&nbsp;repo&nbsp;<span style="color:blue;">in</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">service</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">select</span>&nbsp;<span style="color:blue;">new</span>&nbsp;<span style="color:#2b91af;">SlowReservationsRepository</span>(<span style="color:#2b91af;">TimeSpan</span>.<span style="color:#74531f;">FromMilliseconds</span>(100),&nbsp;repo); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">task1</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#1f377f;">slowService</span>.<span style="font-weight:bold;color:#74531f;">PostReservation</span>(<span style="color:blue;">new</span>&nbsp;<span style="color:#2b91af;">ReservationDtoBuilder</span>() &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">WithDate</span>(<span style="font-weight:bold;color:#1f377f;">date</span>) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">WithQuantity</span>(10) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">Build</span>()); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">task2</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#1f377f;">slowService</span>.<span style="font-weight:bold;color:#74531f;">PostReservation</span>(<span style="color:blue;">new</span>&nbsp;<span style="color:#2b91af;">ReservationDtoBuilder</span>() &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">WithDate</span>(<span style="font-weight:bold;color:#1f377f;">date</span>) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">WithQuantity</span>(10) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.<span style="font-weight:bold;color:#74531f;">Build</span>()); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">actual</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;<span style="color:#2b91af;">Task</span>.<span style="color:#74531f;">WhenAll</span>(<span style="font-weight:bold;color:#1f377f;">task1</span>,&nbsp;<span style="font-weight:bold;color:#1f377f;">task2</span>); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Single</span>( &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">actual</span>, &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">msg</span>&nbsp;=&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">msg</span>.StatusCode&nbsp;==&nbsp;<span style="color:#2b91af;">HttpStatusCode</span>.InternalServerError); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">ok</span>&nbsp;=&nbsp;<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Single</span>(<span style="font-weight:bold;color:#1f377f;">actual</span>,&nbsp;<span style="font-weight:bold;color:#1f377f;">msg</span>&nbsp;=&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">msg</span>.IsSuccessStatusCode); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;Check&nbsp;that&nbsp;the&nbsp;reservation&nbsp;was&nbsp;actually&nbsp;created:</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">resp</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">service</span>.<span style="font-weight:bold;color:#74531f;">GetReservation</span>(<span style="font-weight:bold;color:#1f377f;">ok</span>.Headers.Location); &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">resp</span>.<span style="font-weight:bold;color:#74531f;">EnsureSuccessStatusCode</span>(); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">reservation</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#8f08c4;">await</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">resp</span>.<span style="font-weight:bold;color:#74531f;">ParseJsonContent</span>&lt;<span style="color:#2b91af;">ReservationDto</span>&gt;(); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">Assert</span>.<span style="color:#74531f;">Equal</span>(10,&nbsp;<span style="font-weight:bold;color:#1f377f;">reservation</span>.Quantity); }</pre> </p> <p> Notice how only the parallel execution of <code>task1</code> and <code>task2</code> run on the slow system. The rest runs as fast as it can. It's as if the client was hitting two different servers that just happen to connect to the same database. Now the test only waits for the 200 ms described above. The <code>PUT</code> test, likewise, only idles for 300 ms instead of 900 ms. </p> <h3 id="90a4897a10b24d2c98d09129884f7e13"> Near-deterministic tests <a href="#90a4897a10b24d2c98d09129884f7e13">#</a> </h3> <p> Does this deterministically reproduce the race condition? In practice, it may move us close enough, but theoretically the race is still on. With the increased wait time, it's now much more unlikely that the race condition does <em>not</em> happen, but it still could. </p> <p> Imagine that <code>task1</code> queries the Repository. Just as it's received a response, but before <code>task2</code> starts its query, execution is paused, perhaps because of garbage collection. Once the program resumes, <code>task1</code> runs to completion before <code>task2</code> reads from the database. In that case, <code>task2</code> ends up making the right decision, rejecting the reservation. Even if no transaction control were in place. </p> <p> This may not be a particularly realistic scenario, but I suppose it could happen if the computer is stressed in general. Even so, you might decide to make such <a href="https://en.wikipedia.org/wiki/False_positives_and_false_negatives">false-negative</a> scenarios even more unlikely by increasing the delay time. Of course, the downside is that tests take even longer to run. </p> <p> Another potential problem is that there's no <em>guarantee</em> that <code>task1</code> and <code>task2</code> run in parallel. Even if the test doesn't <code>await</code> any of the tasks, both start executing immediately. There's an (unlikely) chance that <code>task1</code> completes before <code>task2</code> starts. Again, I don't consider this likely, but I suppose it could happen because of thread starvation, generation 2 garbage collection, the disk running full, etc. The point is that the test shown here is still playing the odds, even if the odds are really good. </p> <h3 id="335493859fd0494488353e60de46c9c3"> Conclusion <a href="#335493859fd0494488353e60de46c9c3">#</a> </h3> <p> Instead of running a scenario 'enough' times that reproducing a race condition is likely, you can increase the odds to near-certainty by slowing down the race. In this example, the race involves a database, but you might also encounter race conditions internally in multi-threaded code. I'm not insisting that the technique described in this article applies universally, but if you can slow down certain interactions in the right way, you may be able reproduce problems as automated tests. </p> <p> If you've ever troubleshot a race condition, you've probably tried inserting sleeps into the code in various places to understand the problem. As described above, a single, strategically-placed <code>Task.Delay</code> may be all you need to reproduce a problem. What escaped me for a long time, however, was that I didn't realize that I could cleanly insert such pauses into production code. Until my workshop participant suggested using a Decorator. </p> <p> A delaying Decorator slows interactions with the database down sufficiently to reproduce the race condition as an automated test. </p> </div> <div id="comments"> <hr> <h2 id="comments-header"> Comments </h2> <div class="comment" id="165739b939044dbc930c3f6185c40378"> <div class="comment-author"><a href="https://medium.com/@nickkell">Nick</a> <a href="#165739b939044dbc930c3f6185c40378">#</a></div> <div class="comment-content"> <p> Unfortunately I haven't read your book, so perhaps this question is already answered there: how would a transaction prevent a race condition? I would have expected to solve it using optimistic concurrency control. </p> <p> My other point is regarding artificial delays. How is that deterministic? If you're injecting decorators to take advantage of seams in the code, and you know the number of operations, then you don't need delays at all. To highlight the race condition you need to ensure two requests have read the same data concurrently. After that you can allow the writes, expecting one to succeed and the other to fail. You could use a synchronization primitive, such as CountdownEvent. Initialize it with a count of 2 to represent your concurrent requests. Repository.ReadReservations() would call Signal() to decrement it after fetching from the database. Repository.Create() would call Wait() to ensure both reads had been made before writing to the database. </p> </div> <div class="comment-date">2025-06-15 17:58 UTC</div> </div> <div class="comment" id="b820ddd2766744f7be9ef61e3eb18714"> <div class="comment-author"><a href="/">Mark Seemann</a> <a href="#b820ddd2766744f7be9ef61e3eb18714">#</a></div> <div class="comment-content"> <p> Nick, thank you for writing. Regarding the transaction, as the above <code>TryCreate</code> snippet shows, it wraps both the read and the write operation, and since the default <a href="https://learn.microsoft.com/dotnet/api/system.transactions.isolationlevel">isolation level</a> is <code>Serializable</code>, </p> <blockquote> "no new data can be added during the transaction." </blockquote> <p> As to the second point, I don't think I've claimed that inducing an artificial delay as a Decorator is deterministic. To the contrary, I <a href="#90a4897a10b24d2c98d09129884f7e13">explicitly discussed how this is only near-deterministic</a>. </p> <p> That said, you're correct that there's an even better way, and I have another article queued up that takes an approach similar to the one that you describe. </p> </div> <div class="comment-date">2025-06-18 07:01 UTC</div> </div> </div><hr> This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>. Mark Seemann https://blog.ploeh.dk/2025/06/02/testing-races-with-a-slow-decorator Song recommendations as a Haskell Impureim Sandwich https://blog.ploeh.dk/2025/05/26/song-recommendations-as-a-haskell-impureim-sandwich/ Mon, 26 May 2025 07:15:00 UTC <div id="post"> <p> <em>A pure function on potentially massive data.</em> </p> <p> This article is part of a larger article series called <a href="/2025/04/07/alternative-ways-to-design-with-functional-programming">Alternative ways to design with functional programming</a>. As the title suggests, these articles discuss various ways to apply functional-programming principles to a particular problem. All the articles engage with the same problem. In short, the task is to calculate song recommendations for a user, based on massive data sets. Earlier articles in this series give you detailed explanation of the problem. </p> <p> In the <a href="/2025/05/19/song-recommendations-as-an-f-impureim-sandwich">previous article</a>, you saw how to refactor the 'base' <a href="https://fsharp.org/">F#</a> code base to a <a href="https://en.wikipedia.org/wiki/Pure_function">pure function</a>. In this article, you'll see the same refactoring applied to the 'base' <a href="https://www.haskell.org/">Haskell</a> code base shown in <a href="/2025/04/21/porting-song-recommendations-to-haskell">Porting song recommendations to Haskell</a>. </p> <p> The Git branch discussed in this article is the <em>pure-function</em> branch in the Haskell Git repository. </p> <h3 id="0ed27ab7abea4417918789c4e30e1204"> Collecting all the data <a href="#0ed27ab7abea4417918789c4e30e1204">#</a> </h3> <p> Like in the previous articles, we may start by adding two more methods to the <code>SongService</code> type class, which will enable us to enumerate all songs and all users. The full type class, with all four methods, then looks like this: </p> <p> <pre><span style="color:blue;">class</span>&nbsp;<span style="color:#2b91af;">SongService</span>&nbsp;a&nbsp;<span style="color:blue;">where</span> &nbsp;&nbsp;<span style="color:#2b91af;">getAllSongs</span>&nbsp;<span style="color:blue;">::</span>&nbsp;a&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:#2b91af;">IO</span>&nbsp;[<span style="color:blue;">Song</span>] &nbsp;&nbsp;<span style="color:#2b91af;">getAllUsers</span>&nbsp;<span style="color:blue;">::</span>&nbsp;a&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:#2b91af;">IO</span>&nbsp;[<span style="color:blue;">User</span>] &nbsp;&nbsp;<span style="color:#2b91af;">getTopListeners</span>&nbsp;<span style="color:blue;">::</span>&nbsp;a&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:#2b91af;">Int</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:#2b91af;">IO</span>&nbsp;[<span style="color:blue;">User</span>] &nbsp;&nbsp;<span style="color:#2b91af;">getTopScrobbles</span>&nbsp;<span style="color:blue;">::</span>&nbsp;a&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:#2b91af;">String</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:#2b91af;">IO</span>&nbsp;[<span style="color:blue;">Scrobble</span>]</pre> </p> <p> If you compare with the type class definition shown in the article <a href="/2025/04/21/porting-song-recommendations-to-haskell">Porting song recommendations to Haskell</a>, you'll see that <code>getAllSongs</code> and <code>getAllUsers</code> are the new methods. </p> <p> They enable you to collect all top listeners, and all top scrobbles, even though it may take some time. To gather all the top listeners, we may add this <code>collectAllTopListeners</code> action: </p> <p> <pre>collectAllTopListeners&nbsp;srvc&nbsp;=&nbsp;<span style="color:blue;">do</span> &nbsp;&nbsp;songs&nbsp;&lt;-&nbsp;getAllSongs&nbsp;srvc &nbsp;&nbsp;listeners&nbsp;&lt;-&nbsp;newIORef&nbsp;Map.empty &nbsp;&nbsp;forM_&nbsp;songs&nbsp;$&nbsp;\song&nbsp;-&gt;&nbsp;<span style="color:blue;">do</span> &nbsp;&nbsp;&nbsp;&nbsp;topListeners&nbsp;&lt;-&nbsp;getTopListeners&nbsp;srvc&nbsp;$&nbsp;songId&nbsp;song &nbsp;&nbsp;&nbsp;&nbsp;modifyIORef&nbsp;listeners&nbsp;(Map.insert&nbsp;(songId&nbsp;song)&nbsp;topListeners) &nbsp;&nbsp;readIORef&nbsp;listeners</pre> </p> <p> Likewise, you can amass all the top scrobbles with a similar action: </p> <p> <pre>collectAllTopScrobbles&nbsp;srvc&nbsp;=&nbsp;<span style="color:blue;">do</span> &nbsp;&nbsp;users&nbsp;&lt;-&nbsp;getAllUsers&nbsp;srvc &nbsp;&nbsp;scrobbles&nbsp;&lt;-&nbsp;newIORef&nbsp;Map.empty &nbsp;&nbsp;forM_&nbsp;users&nbsp;$&nbsp;\user&nbsp;-&gt;&nbsp;<span style="color:blue;">do</span> &nbsp;&nbsp;&nbsp;&nbsp;topScrobbles&nbsp;&lt;-&nbsp;getTopScrobbles&nbsp;srvc&nbsp;$&nbsp;userName&nbsp;user &nbsp;&nbsp;&nbsp;&nbsp;modifyIORef&nbsp;scrobbles&nbsp;(Map.insert&nbsp;(userName&nbsp;user)&nbsp;topScrobbles) &nbsp;&nbsp;readIORef&nbsp;scrobbles</pre> </p> <p> As you may have noticed, they're so similar that, had there been <a href="https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming)">more than two</a>, we might consider extracting the similar parts to a reusable operation. </p> <p> In both cases, we start with the action that enables us to enumerate all the resources (songs or scrobbles) that we're interested in. For each of these, we then invoke the action to get the 'top' resources for that song or scrobble. There's a massive <em>n+1</em> problem here, but you could conceivably parallelize all these queries, as they're independent. Still, it's bound to take much time, possibly hours. </p> <p> You may be wondering why I chose to use <code>IORef</code> values for both actions, instead of more <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatic</a> combinator-based expressions. Indeed, that is what I would usually do, but in this case, these two actions are heavily IO-bound already, and it makes the Haskell code more similar to the F# code. Not that that is normally a goal, but here I thought it might help you, the reader, to compare the different code bases. </p> <p> All the data is kept in a <a href="https://hackage.haskell.org/package/containers/docs/Data-Map-Strict.html">Map</a> per action, so two massive maps in all. Once these two actions return, we're done with the <em>read</em> phase of the <a href="/2025/01/13/recawr-sandwich">Recawr Sandwich</a>. </p> <h3 id="5378c29512584cab95d484b1bdd97fd9"> Referentially transparent function with local mutation <a href="#5378c29512584cab95d484b1bdd97fd9">#</a> </h3> <p> As a first step, we may wish to turn the <code>getRecommendations</code> action into a <a href="https://en.wikipedia.org/wiki/Referential_transparency">referentially transparent</a> function. If you look through the commits in the Git repository, you can see that I actually did this through a series of <a href="https://www.industriallogic.com/blog/whats-this-about-micro-commits/">micro-commits</a>, but here I only present a more coarse-grained version of the changes I made. </p> <p> In this version, I've removed the <code>srvc</code> (<code>SongService</code>) parameter, and instead added the two maps <code>topScrobbles</code> and <code>topListeners</code>. </p> <p> <pre><span style="color:#2b91af;">getRecommendations</span>&nbsp;<span style="color:blue;">::</span>&nbsp;<span style="color:blue;">Map</span>&nbsp;<span style="color:#2b91af;">String</span>&nbsp;[<span style="color:blue;">Scrobble</span>]&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:blue;">Map</span>&nbsp;<span style="color:#2b91af;">Int</span>&nbsp;[<span style="color:blue;">User</span>]&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:#2b91af;">String</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:#2b91af;">IO</span>&nbsp;[<span style="color:blue;">Song</span>] getRecommendations&nbsp;topScrobbles&nbsp;topListeners&nbsp;un&nbsp;=&nbsp;<span style="color:blue;">do</span> &nbsp;&nbsp;<span style="color:green;">--&nbsp;1.&nbsp;Get&nbsp;user&#39;s&nbsp;own&nbsp;top&nbsp;scrobbles </span>&nbsp;&nbsp;<span style="color:green;">--&nbsp;2.&nbsp;Get&nbsp;other&nbsp;users&nbsp;who&nbsp;listened&nbsp;to&nbsp;the&nbsp;same&nbsp;songs </span>&nbsp;&nbsp;<span style="color:green;">--&nbsp;3.&nbsp;Get&nbsp;top&nbsp;scrobbles&nbsp;of&nbsp;those&nbsp;users </span>&nbsp;&nbsp;<span style="color:green;">--&nbsp;4.&nbsp;Aggregate&nbsp;the&nbsp;songs&nbsp;into&nbsp;recommendations </span> &nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;scrobbles&nbsp;=&nbsp;Map.findWithDefault&nbsp;<span style="color:blue;">[]</span>&nbsp;un&nbsp;topScrobbles &nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;scrobblesSnapshot&nbsp;=&nbsp;<span style="color:blue;">take</span>&nbsp;100&nbsp;$&nbsp;sortOn&nbsp;(Down&nbsp;.&nbsp;scrobbleCount)&nbsp;scrobbles &nbsp;&nbsp;recommendationCandidates&nbsp;&lt;-&nbsp;newIORef&nbsp;<span style="color:blue;">[]</span> &nbsp;&nbsp;forM_&nbsp;scrobblesSnapshot&nbsp;$&nbsp;\scrobble&nbsp;-&gt;&nbsp;<span style="color:blue;">do</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;otherListeners&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Map.findWithDefault&nbsp;<span style="color:blue;">[]</span>&nbsp;(songId&nbsp;$&nbsp;scrobbledSong&nbsp;scrobble)&nbsp;topListeners &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;otherListenersSnapshot&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">take</span>&nbsp;20&nbsp;$ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;sortOn&nbsp;(Down&nbsp;.&nbsp;userScrobbleCount)&nbsp;$ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">filter</span>&nbsp;((10_000&nbsp;&lt;=)&nbsp;.&nbsp;userScrobbleCount)&nbsp;otherListeners &nbsp;&nbsp;&nbsp;&nbsp;forM_&nbsp;otherListenersSnapshot&nbsp;$&nbsp;\otherListener&nbsp;-&gt;&nbsp;<span style="color:blue;">do</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;otherScrobbles&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Map.findWithDefault&nbsp;<span style="color:blue;">[]</span>&nbsp;(userName&nbsp;otherListener)&nbsp;topScrobbles &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;otherScrobblesSnapshot&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">take</span>&nbsp;10&nbsp;$ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;sortOn&nbsp;(Down&nbsp;.&nbsp;songRating&nbsp;.&nbsp;scrobbledSong)&nbsp;$ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">filter</span>&nbsp;(songHasVerifiedArtist&nbsp;.&nbsp;scrobbledSong)&nbsp;otherScrobbles &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;forM_&nbsp;otherScrobblesSnapshot&nbsp;$&nbsp;\otherScrobble&nbsp;-&gt;&nbsp;<span style="color:blue;">do</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;song&nbsp;=&nbsp;scrobbledSong&nbsp;otherScrobble &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;modifyIORef&nbsp;recommendationCandidates&nbsp;(song&nbsp;:) &nbsp;&nbsp;recommendations&nbsp;&lt;-&nbsp;readIORef&nbsp;recommendationCandidates &nbsp;&nbsp;<span style="color:blue;">return</span>&nbsp;$&nbsp;<span style="color:blue;">take</span>&nbsp;200&nbsp;$&nbsp;sortOn&nbsp;(Down&nbsp;.&nbsp;songRating)&nbsp;recommendations</pre> </p> <p> You've probably noticed that this action still looks impure, since it returns <code>IO [Song]</code>. Even so, it's referentially transparent, since it's fully deterministic and without side effects. </p> <p> The way I refactored the action, this order of changes was what made most sense to me. Getting rid of the <code>SongService</code> parameter was more important to me than getting rid of the <code>IO</code> wrapper. </p> <p> In any case, this is only an interim state towards a more idiomatic pure Haskell function. </p> <h3 id="5eba5b02d8724403b8ea6f26054fc9af"> A single expression <a href="#5eba5b02d8724403b8ea6f26054fc9af">#</a> </h3> <p> A curious property of expression-based languages is that you can conceivably write functions in 'one line of code'. Granted, it would often be a terribly wide line, not at all readable, a beast to maintain, and often with poor performance, so not something you'd want to alway do. </p> <p> In this case, however, we <em>can</em> do that, although in order to stay within an <a href="/2019/11/04/the-80-24-rule">80x24 box</a>, we break the expression over multiple lines. </p> <p> <pre><span style="color:#2b91af;">getRecommendations</span>&nbsp;<span style="color:blue;">::</span>&nbsp;<span style="color:blue;">Map</span>&nbsp;<span style="color:#2b91af;">String</span>&nbsp;[<span style="color:blue;">Scrobble</span>]&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:blue;">Map</span>&nbsp;<span style="color:#2b91af;">Int</span>&nbsp;[<span style="color:blue;">User</span>]&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:#2b91af;">String</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;[<span style="color:blue;">Song</span>] getRecommendations&nbsp;topScrobbles&nbsp;topListeners&nbsp;un&nbsp;= &nbsp;&nbsp;<span style="color:green;">--&nbsp;1.&nbsp;Get&nbsp;user&#39;s&nbsp;own&nbsp;top&nbsp;scrobbles </span>&nbsp;&nbsp;<span style="color:green;">--&nbsp;2.&nbsp;Get&nbsp;other&nbsp;users&nbsp;who&nbsp;listened&nbsp;to&nbsp;the&nbsp;same&nbsp;songs </span>&nbsp;&nbsp;<span style="color:green;">--&nbsp;3.&nbsp;Get&nbsp;top&nbsp;scrobbles&nbsp;of&nbsp;those&nbsp;users </span>&nbsp;&nbsp;<span style="color:green;">--&nbsp;4.&nbsp;Aggregate&nbsp;the&nbsp;songs&nbsp;into&nbsp;recommendations </span> &nbsp;&nbsp;<span style="color:blue;">take</span>&nbsp;200&nbsp;$ &nbsp;&nbsp;sortOn&nbsp;(Down&nbsp;.&nbsp;songRating)&nbsp;$ &nbsp;&nbsp;<span style="color:blue;">fmap</span>&nbsp;scrobbledSong&nbsp;$ &nbsp;&nbsp;(\otherListener&nbsp;-&gt;&nbsp;<span style="color:blue;">take</span>&nbsp;10&nbsp;$ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;sortOn&nbsp;(Down&nbsp;.&nbsp;songRating&nbsp;.&nbsp;scrobbledSong)&nbsp;$ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">filter</span>&nbsp;(songHasVerifiedArtist&nbsp;.&nbsp;scrobbledSong)&nbsp;$ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Map.findWithDefault&nbsp;<span style="color:blue;">[]</span>&nbsp;(userName&nbsp;otherListener)&nbsp;topScrobbles)&nbsp;=&lt;&lt; &nbsp;&nbsp;(\scrobble&nbsp;-&gt;&nbsp;<span style="color:blue;">take</span>&nbsp;20&nbsp;$ &nbsp;&nbsp;&nbsp;&nbsp;sortOn&nbsp;(Down&nbsp;.&nbsp;userScrobbleCount)&nbsp;$ &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">filter</span>&nbsp;((10_000&nbsp;&lt;=)&nbsp;.&nbsp;userScrobbleCount)&nbsp;$ &nbsp;&nbsp;&nbsp;&nbsp;Map.findWithDefault&nbsp;<span style="color:blue;">[]</span>&nbsp;(songId&nbsp;$&nbsp;scrobbledSong&nbsp;scrobble)&nbsp;topListeners)&nbsp;=&lt;&lt; &nbsp;&nbsp;<span style="color:blue;">take</span>&nbsp;100 &nbsp;&nbsp;(sortOn&nbsp;(Down&nbsp;.&nbsp;scrobbleCount)&nbsp;$&nbsp;Map.findWithDefault&nbsp;<span style="color:blue;">[]</span>&nbsp;un&nbsp;topScrobbles)</pre> </p> <p> This snapshot also got rid of the <code>IORef</code> value, and <code>IO</code> in general. The function is still referentially transparent, but now Haskell can also see that. </p> <p> Even so, this looks dense and confusing. It doesn't help that Haskell should usually be read right-to-left, and bottom-to-top. Is it possible to improve the readability of this function? </p> <h3 id="c621ff8635bd4eabaa085b7b7d4a8c75"> Composition from smaller functions <a href="#c621ff8635bd4eabaa085b7b7d4a8c75">#</a> </h3> <p> To improve readability and maintainability, we may now extract helper functions. The first one easily suggests itself. </p> <p> <pre><span style="color:#2b91af;">getUsersOwnTopScrobbles</span>&nbsp;<span style="color:blue;">::</span>&nbsp;<span style="color:blue;">Ord</span>&nbsp;k&nbsp;<span style="color:blue;">=&gt;</span>&nbsp;<span style="color:blue;">Map</span>&nbsp;k&nbsp;[<span style="color:blue;">Scrobble</span>]&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;k&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;[<span style="color:blue;">Scrobble</span>] getUsersOwnTopScrobbles&nbsp;topScrobbles&nbsp;un&nbsp;= &nbsp;&nbsp;<span style="color:blue;">take</span>&nbsp;100&nbsp;$ &nbsp;&nbsp;sortOn&nbsp;(Down&nbsp;.&nbsp;scrobbleCount)&nbsp;$&nbsp;Map.findWithDefault&nbsp;<span style="color:blue;">[]</span>&nbsp;un&nbsp;topScrobbles</pre> </p> <p> Each of the subexpressions in the above code listing are candidates for the same kind of treatment, like this one: </p> <p> <pre><span style="color:#2b91af;">getOtherUsersWhoListenedToTheSameSongs</span>&nbsp;<span style="color:blue;">::</span>&nbsp;<span style="color:blue;">Map</span>&nbsp;<span style="color:#2b91af;">Int</span>&nbsp;[<span style="color:blue;">User</span>]&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:blue;">Scrobble</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;[<span style="color:blue;">User</span>] getOtherUsersWhoListenedToTheSameSongs&nbsp;topListeners&nbsp;scrobble&nbsp;= &nbsp;&nbsp;<span style="color:blue;">take</span>&nbsp;20&nbsp;$ &nbsp;&nbsp;sortOn&nbsp;(Down&nbsp;.&nbsp;userScrobbleCount)&nbsp;$ &nbsp;&nbsp;<span style="color:blue;">filter</span>&nbsp;((10_000&nbsp;&lt;=)&nbsp;.&nbsp;userScrobbleCount)&nbsp;$ &nbsp;&nbsp;Map.findWithDefault&nbsp;<span style="color:blue;">[]</span>&nbsp;(songId&nbsp;$&nbsp;scrobbledSong&nbsp;scrobble)&nbsp;topListeners</pre> </p> <p> You can't see it from the code listings themselves, but the module doesn't export these functions. They remain implementation details. </p> <p> With a few more helper functions, you can now implement the <code>getRecommendations</code> function by composing the helpers. </p> <p> <pre><span style="color:#2b91af;">getRecommendations</span>&nbsp;<span style="color:blue;">::</span>&nbsp;<span style="color:blue;">Map</span>&nbsp;<span style="color:#2b91af;">String</span>&nbsp;[<span style="color:blue;">Scrobble</span>]&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:blue;">Map</span>&nbsp;<span style="color:#2b91af;">Int</span>&nbsp;[<span style="color:blue;">User</span>]&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:#2b91af;">String</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;[<span style="color:blue;">Song</span>] getRecommendations&nbsp;topScrobbles&nbsp;topListeners&nbsp;un&nbsp;= &nbsp;&nbsp;<span style="color:green;">--&nbsp;1.&nbsp;Get&nbsp;user&#39;s&nbsp;own&nbsp;top&nbsp;scrobbles </span>&nbsp;&nbsp;<span style="color:green;">--&nbsp;2.&nbsp;Get&nbsp;other&nbsp;users&nbsp;who&nbsp;listened&nbsp;to&nbsp;the&nbsp;same&nbsp;songs </span>&nbsp;&nbsp;<span style="color:green;">--&nbsp;3.&nbsp;Get&nbsp;top&nbsp;scrobbles&nbsp;of&nbsp;those&nbsp;users </span>&nbsp;&nbsp;<span style="color:green;">--&nbsp;4.&nbsp;Aggregate&nbsp;the&nbsp;songs&nbsp;into&nbsp;recommendations </span> &nbsp;&nbsp;aggregateSongsIntoRecommendations&nbsp;$ &nbsp;&nbsp;getTopSongsOfOtherUser&nbsp;topScrobbles&nbsp;=&lt;&lt; &nbsp;&nbsp;getOtherUsersWhoListenedToTheSameSongs&nbsp;topListeners&nbsp;=&lt;&lt; &nbsp;&nbsp;getUsersOwnTopScrobbles&nbsp;topScrobbles&nbsp;un</pre> </p> <p> Since Haskell by default composes from right to left, when you break such a composition over multiple lines (in order to stay within a <a href="/2019/11/04/the-80-24-rule">80x24</a> box), it should be read bottom-up. </p> <p> You can remedy this situation by importing the <code>&</code> operator from <a href="https://hackage.haskell.org/package/base-4.21.0.0/docs/Data-Function.html">Data.Function</a>: </p> <p> <pre><span style="color:#2b91af;">getRecommendations</span>&nbsp;<span style="color:blue;">::</span>&nbsp;<span style="color:blue;">Map</span>&nbsp;<span style="color:#2b91af;">String</span>&nbsp;[<span style="color:blue;">Scrobble</span>]&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:blue;">Map</span>&nbsp;<span style="color:#2b91af;">Int</span>&nbsp;[<span style="color:blue;">User</span>]&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:#2b91af;">String</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;[<span style="color:blue;">Song</span>] getRecommendations&nbsp;topScrobbles&nbsp;topListeners&nbsp;un&nbsp;=&nbsp; &nbsp;&nbsp;getUsersOwnTopScrobbles&nbsp;topScrobbles&nbsp;un&nbsp;&gt;&gt;= &nbsp;&nbsp;getOtherUsersWhoListenedToTheSameSongs&nbsp;topListeners&nbsp;&gt;&gt;= &nbsp;&nbsp;getTopSongsOfOtherUser&nbsp;topScrobbles&nbsp;&amp; &nbsp;&nbsp;aggregateSongsIntoRecommendations</pre> </p> <p> Notice that I've named each of the helper functions after the code comments that accompanied the previous incarnations of this function. If we consider <a href="http://butunclebob.com/ArticleS.TimOttinger.ApologizeIncode">code comments apologies for not properly organizing the code</a>, we've now managed to structure it in such a way that those apologies are no longer required. </p> <h3 id="b934bca9ccf5403491d86a418d0ca8a3"> Conclusion <a href="#b934bca9ccf5403491d86a418d0ca8a3">#</a> </h3> <p> If you accept the (perhaps preposterous) assumption that it's possible to fit the required data in <a href="https://en.wikipedia.org/wiki/Persistent_data_structure">persistent data structures</a>, refactoring the recommendation algorithm to a pure function isn't that difficult. That's the pure part of a Recawr Sandwich. While I haven't shown the actual sandwich here, it's quite straightforward. You can find it in the tests in the Haskell Git repository. Also, once you've moved all the data collection to the boundary of the application, you may no longer need the <code>SongService</code> type class. </p> <p> I find the final incarnation of the code shown here to be quite attractive. While I've kept the helper functions private to the module, it's always an option to export them if you find that warranted. This could improve testability of the overall code base, albeit at the risk of increasing the surface area of the API that you have to maintain and secure. </p> <p> There are always trade-offs to be considered. Even if you, eventually, find that for this particular example, the input data size is just <em>too</em> big to make this alternative viable, there are, in my experience, many other situations when this kind of architecture is a good solution. Even if the input size is a decent amount of megabytes, the simplification offered by an <a href="/2020/03/02/impureim-sandwich">Impureim Sandwich</a> may trump the larger memory footprint. As always, if you're concerned about performance, <a href="https://ericlippert.com/2012/12/17/performance-rant/">measure it</a>. </p> <p> This article concludes the overview of using an Recawr Sandwich to address the problem. Since it's, admittedly, a bit of a stretch to imagine running a program that uses terabytes (or more) of memory, we now turn to alternative architectures. </p> <p> <strong>Next:</strong> <a href="/2025/06/09/song-recommendations-from-combinators">Song recommendations from combinators</a>. </p> </div><hr> This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>. Mark Seemann https://blog.ploeh.dk/2025/05/26/song-recommendations-as-a-haskell-impureim-sandwich Song recommendations as an F# Impureim Sandwich https://blog.ploeh.dk/2025/05/19/song-recommendations-as-an-f-impureim-sandwich/ Mon, 19 May 2025 08:06:00 UTC <div id="post"> <p> <em>A pure function on potentially massive data.</em> </p> <p> This article is part of a larger article series titled <a href="/2025/04/07/alternative-ways-to-design-with-functional-programming">Alternative ways to design with functional programming</a>. In the <a href="/2025/05/05/song-recommendations-as-a-c-impureim-sandwich">previous article</a>, you saw an example of applying the <a href="/2020/03/02/impureim-sandwich">Impureim Sandwich</a> pattern to the problem at hand: A song recommendation engine that sifts through much historical data. </p> <p> As already covered in <a href="/2025/04/28/song-recommendations-as-an-impureim-sandwich">Song recommendations as an Impureim Sandwich</a>, the drawback, if you will, of a <a href="/2025/01/13/recawr-sandwich">Recawr Sandwich</a> is that you need to collect all data from impure sources before you can pass it to a <a href="https://en.wikipedia.org/wiki/Pure_function">pure function</a>. It may happen that you need so much data that this strategy becomes untenable. <a href="/2025/05/12/song-recommendations-proof-of-concept-memory-measurements">This may be the case here</a>, but surprisingly often, what strikes us humans as being vast amounts are peanuts for computers. </p> <p> So even if you don't find this particular example realistic, I'll forge ahead and show how to apply the Recawr Sandwich pattern to this problem. This is essentially a port to <a href="https://fsharp.org/">F#</a> of the C# code from the previous article. If you rather want to see some more realistic architectures to deal with the overall problem, you can always go back to the table of contents in the <a href="/2025/04/07/alternative-ways-to-design-with-functional-programming">first article of the series</a>. </p> <p> In this article, I'm working with the <em>fsharp-pure-function</em> branch of the Git repository. </p> <h3 id="090279ba52c044c1b9a38d6d29ce26f1"> Collecting all the data <a href="#090279ba52c044c1b9a38d6d29ce26f1">#</a> </h3> <p> Like in the previous article, we may start by adding two more members to the <code>SongService</code> interface, which will enable us to enumerate all songs and all users. The full interface, with all four methods, then looks like this: </p> <p> <pre><span style="color:blue;">type</span>&nbsp;<span style="color:#2b91af;">SongService</span>&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">abstract</span>&nbsp;<span style="font-weight:bold;color:#74531f;">GetAllSongs</span>&nbsp;:&nbsp;<span style="color:#2b91af;">unit</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:#2b91af;">Task</span>&lt;<span style="color:#2b91af;">IEnumerable</span>&lt;<span style="color:#2b91af;">Song</span>&gt;&gt; &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">abstract</span>&nbsp;<span style="font-weight:bold;color:#74531f;">GetAllUsers</span>&nbsp;:&nbsp;<span style="color:#2b91af;">unit</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:#2b91af;">Task</span>&lt;<span style="color:#2b91af;">IEnumerable</span>&lt;<span style="color:#2b91af;">User</span>&gt;&gt; &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">abstract</span>&nbsp;<span style="font-weight:bold;color:#74531f;">GetTopListenersAsync</span>&nbsp;:&nbsp;songId&nbsp;:&nbsp;<span style="color:#2b91af;">int</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:#2b91af;">Task</span>&lt;<span style="color:#2b91af;">IReadOnlyCollection</span>&lt;<span style="color:#2b91af;">User</span>&gt;&gt; &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">abstract</span>&nbsp;<span style="font-weight:bold;color:#74531f;">GetTopScrobblesAsync</span>&nbsp;:&nbsp;userName&nbsp;:&nbsp;<span style="color:#2b91af;">string</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:#2b91af;">Task</span>&lt;<span style="color:#2b91af;">IReadOnlyCollection</span>&lt;<span style="color:#2b91af;">Scrobble</span>&gt;&gt;</pre> </p> <p> If you compare with the interface definition shown in the article <a href="/2025/04/14/porting-song-recommendations-to-f">Porting song recommendations to F#</a>, you'll see that <code>GetAllSongs</code> and <code>GetAllUsers</code> are the new methods. </p> <p> They enable you to collect all top listeners, and all top scrobbles, even though it may take some time. To gather all the top listeners, we may add this <code>collectAllTopListeners</code> action: </p> <p> <pre><span style="color:blue;">let</span>&nbsp;<span style="color:#74531f;">collectAllTopListeners</span>&nbsp;(<span style="font-weight:bold;color:#1f377f;">songService</span>&nbsp;:&nbsp;<span style="color:#2b91af;">SongService</span>)&nbsp;=&nbsp;<span style="color:blue;">task</span>&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">d</span>&nbsp;=&nbsp;<span style="color:#2b91af;">Dictionary</span>&lt;<span style="color:#2b91af;">int</span>,&nbsp;<span style="color:#2b91af;">IReadOnlyCollection</span>&lt;<span style="color:#2b91af;">User</span>&gt;&gt;&nbsp;() &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let!</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">songs</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#1f377f;">songService</span>.<span style="font-weight:bold;color:#74531f;">GetAllSongs</span>&nbsp;() &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">do!</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">songs</span>&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">TaskSeq</span>.<span style="color:#74531f;">iter</span>&nbsp;(<span style="color:blue;">fun</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:blue;">task</span>&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let!</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">topListeners</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#1f377f;">songService</span>.<span style="font-weight:bold;color:#74531f;">GetTopListenersAsync</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>.Id &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">d</span>.<span style="font-weight:bold;color:#74531f;">Add</span>&nbsp;(<span style="font-weight:bold;color:#1f377f;">s</span>.Id,&nbsp;<span style="font-weight:bold;color:#1f377f;">topListeners</span>)&nbsp;}&nbsp;) &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">return</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">d</span>&nbsp;:&gt;&nbsp;<span style="color:#2b91af;">IReadOnlyDictionary</span>&lt;_,&nbsp;_&gt;&nbsp;}</pre> </p> <p> Likewise, you can amass all the top scrobbles with a similar action: </p> <p> <pre><span style="color:blue;">let</span>&nbsp;<span style="color:#74531f;">collectAllTopScrobbles</span>&nbsp;(<span style="font-weight:bold;color:#1f377f;">songService</span>&nbsp;:&nbsp;<span style="color:#2b91af;">SongService</span>)&nbsp;=&nbsp;<span style="color:blue;">task</span>&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">d</span>&nbsp;=&nbsp;<span style="color:#2b91af;">Dictionary</span>&lt;<span style="color:#2b91af;">string</span>,&nbsp;<span style="color:#2b91af;">IReadOnlyCollection</span>&lt;<span style="color:#2b91af;">Scrobble</span>&gt;&gt;&nbsp;() &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let!</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">users</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#1f377f;">songService</span>.<span style="font-weight:bold;color:#74531f;">GetAllUsers</span>&nbsp;() &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">do!</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">users</span>&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">TaskSeq</span>.<span style="color:#74531f;">iter</span>&nbsp;(<span style="color:blue;">fun</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">u</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:blue;">task</span>&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let!</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">topScrobbles</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#1f377f;">songService</span>.<span style="font-weight:bold;color:#74531f;">GetTopScrobblesAsync</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">u</span>.UserName &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">d</span>.<span style="font-weight:bold;color:#74531f;">Add</span>&nbsp;(<span style="font-weight:bold;color:#1f377f;">u</span>.UserName,&nbsp;<span style="font-weight:bold;color:#1f377f;">topScrobbles</span>)&nbsp;}&nbsp;) &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">return</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">d</span>&nbsp;:&gt;&nbsp;<span style="color:#2b91af;">IReadOnlyDictionary</span>&lt;_,&nbsp;_&gt;&nbsp;}</pre> </p> <p> As you may have noticed, they're so similar that, had there been <a href="https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming)">more than two</a>, we might consider extracting the similar parts to a reusable operation. </p> <p> In both cases, we start with the action that enables us to enumerate all the resources (songs or scrobbles) that we're interested in. For each of these, we then invoke the action to get the 'top' resources for that song or scrobble. There's a massive <em>n+1</em> problem here, but you could conceivably parallelize all these queries, as they're independent. Still, it's bound to take much time, possibly hours. </p> <p> All the data is kept in a dictionary per action, so two massive dictionaries in all. Once these two actions return, we're done with the <em>read</em> phase of the Recawr Sandwich. </p> <h3 id="5194a0ec946046068d42c086f7fd2697"> Traversals <a href="#5194a0ec946046068d42c086f7fd2697">#</a> </h3> <p> You may have wondered about the above <code>TaskSeq.iter</code> action. That's not part of the standard library. What is it, and where does it come from? </p> <p> It's a specialized <a href="/2024/11/11/traversals">traversal</a>, designed to make asynchronous <a href="https://en.wikipedia.org/wiki/Command%E2%80%93query_separation">Commands</a> more streamlined. </p> <p> <pre><span style="color:blue;">let</span>&nbsp;<span style="color:#74531f;">iter</span>&nbsp;<span style="color:#74531f;">f</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">xs</span>&nbsp;=&nbsp;<span style="color:blue;">task</span>&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let!</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">units</span>&nbsp;=&nbsp;<span style="color:#74531f;">traverse</span>&nbsp;<span style="color:#74531f;">f</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">xs</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">iter</span>&nbsp;<span style="color:#74531f;">id</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">units</span>&nbsp;}</pre> </p> <p> If you've ever wondered why the <em>identity function</em> (<code>id</code>) is useful, here's an example. In the first line of code, <code>units</code> is a <code>unit seq</code> value; i.e. a sequence of <code>unit</code> values. To make <code>TaskSeq.iter</code> as easy to use as possible, it should turn that multitude of <code>unit</code> values into a single <code>unit</code> value. There's more than one way to do that, but I found that using <code>Seq.iter</code> was about the most succinct option I could think of. Be that as it may, <code>Seq.iter</code> requires as an argument a function that returns <code>unit</code>, and since we already have <code>unit</code> values, <code>id</code> does the job. </p> <p> The <code>iter</code> action uses the <code>TaskSeq</code> module's <code>traverse</code> function, which is defined like this: </p> <p> <pre><span style="color:blue;">let</span>&nbsp;<span style="color:#74531f;">traverse</span>&nbsp;<span style="color:#74531f;">f</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">xs</span>&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;<span style="color:#74531f;">go</span>&nbsp;<span style="color:#1f377f;">acc</span>&nbsp;<span style="color:#1f377f;">x</span>&nbsp;=&nbsp;<span style="color:blue;">task</span>&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let!</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">x&#39;</span>&nbsp;=&nbsp;<span style="color:#1f377f;">x</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let!</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">acc&#39;</span>&nbsp;=&nbsp;<span style="color:#1f377f;">acc</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">return</span>&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">append</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">acc&#39;</span>&nbsp;[<span style="font-weight:bold;color:#1f377f;">x&#39;</span>]&nbsp;} &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">xs</span>&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">map</span>&nbsp;<span style="color:#74531f;">f</span>&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">fold</span>&nbsp;<span style="color:#74531f;">go</span>&nbsp;(<span style="color:blue;">task</span>&nbsp;{&nbsp;<span style="color:blue;">return</span>&nbsp;[]&nbsp;})</pre> </p> <p> The type of <code>traverse</code> is <code>(&#39;a&nbsp;-&gt;&nbsp;#Task&lt;&#39;c&gt;)&nbsp;-&gt;&nbsp;&#39;a&nbsp;seq&nbsp;-&gt;&nbsp;Task&lt;&#39;c&nbsp;seq&gt;</code>; that is, it applies an asynchronous action to each of a sequence of <code>'a</code> values, and returns an asynchronous workflow that contains a sequence of <code>'c</code> values. </p> <h3 id="16c034285a8d41b39923e27c6b81788e"> Dictionary lookups <a href="#16c034285a8d41b39923e27c6b81788e">#</a> </h3> <p> In .NET, queries that may fail are <a href="/2015/08/03/idiomatic-or-idiosyncratic">idiomatically</a> modelled with methods that take <code>out</code> parameters. This is also true for dictionary lookups. Since that kind of design doesn't compose well, it's useful to add a little helper function that instead may return an empty value. While you'd <a href="/2019/07/15/tester-doer-isomorphisms">generally do that by returning an option value</a>, in this case, an empty collection is more appropriate. </p> <p> <pre><span style="color:blue;">let</span>&nbsp;<span style="color:#74531f;">findOrEmpty</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">key</span>&nbsp;(<span style="font-weight:bold;color:#1f377f;">d</span>&nbsp;:&nbsp;<span style="color:#2b91af;">IReadOnlyDictionary</span>&lt;_,&nbsp;<span style="color:#2b91af;">IReadOnlyCollection</span>&lt;_&gt;&gt;)&nbsp;= &nbsp;&nbsp;&nbsp;<span style="color:blue;">match</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">d</span>.<span style="font-weight:bold;color:#74531f;">TryGetValue</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">key</span>&nbsp;<span style="color:blue;">with</span> &nbsp;&nbsp;&nbsp;|&nbsp;<span style="color:blue;">true</span>,&nbsp;<span style="font-weight:bold;color:#1f377f;">v</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">v</span> &nbsp;&nbsp;&nbsp;|&nbsp;_&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="color:#2b91af;">List</span>.empty</pre> </p> <p> You may have noticed that I also added a similar helper function in <a href="/2025/05/05/song-recommendations-as-a-c-impureim-sandwich">the C# example</a>, although there I called it <code>GetOrEmpty</code>. </p> <h3 id="cddd7cc72cf748b995fc0b58b0971c78"> Pure function with local mutation <a href="#cddd7cc72cf748b995fc0b58b0971c78">#</a> </h3> <p> As a first step, we may wish to turn the <code>GetRecommendationsAsync</code> method into a pure function. If you look through the commits in the Git repository, you can see that I actually did this through a series of <a href="https://www.industriallogic.com/blog/whats-this-about-micro-commits/">micro-commits</a>, but here I only present a more coarse-grained version of the changes I made. </p> <p> Instead of a method on a class, we now have a self-contained function that takes, as arguments, two dictionaries, but no <code>SongService</code> dependency. </p> <p> <pre><span style="color:blue;">let</span>&nbsp;<span style="color:#74531f;">getRecommendations</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">topScrobbles</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">topListeners</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">userName</span>&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;1.&nbsp;Get&nbsp;user&#39;s&nbsp;own&nbsp;top&nbsp;scrobbles</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;2.&nbsp;Get&nbsp;other&nbsp;users&nbsp;who&nbsp;listened&nbsp;to&nbsp;the&nbsp;same&nbsp;songs</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;3.&nbsp;Get&nbsp;top&nbsp;scrobbles&nbsp;of&nbsp;those&nbsp;users</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;4.&nbsp;Aggregate&nbsp;the&nbsp;songs&nbsp;into&nbsp;recommendations</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">scrobbles</span>&nbsp;=&nbsp;<span style="font-weight:bold;color:#1f377f;">topScrobbles</span>&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Dict</span>.<span style="color:#74531f;">findOrEmpty</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">userName</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">scrobblesSnapshot</span>&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">scrobbles</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">sortByDescending</span>&nbsp;(<span style="color:blue;">fun</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>.ScrobbleCount) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">truncate</span>&nbsp;100 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">toList</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">recommendationCandidates</span>&nbsp;=&nbsp;<span style="color:#2b91af;">ResizeArray</span>&nbsp;() &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">for</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">scrobble</span>&nbsp;<span style="color:blue;">in</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">scrobblesSnapshot</span>&nbsp;<span style="color:blue;">do</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">otherListeners</span>&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">topListeners</span>&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Dict</span>.<span style="color:#74531f;">findOrEmpty</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">scrobble</span>.Song.Id &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">otherListenersSnapshot</span>&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">otherListeners</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">filter</span>&nbsp;(<span style="color:blue;">fun</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">u</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">u</span>.TotalScrobbleCount&nbsp;&gt;=&nbsp;10_000) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">sortByDescending</span>&nbsp;(<span style="color:blue;">fun</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">u</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">u</span>.TotalScrobbleCount) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">truncate</span>&nbsp;20 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">toList</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">for</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">otherListener</span>&nbsp;<span style="color:blue;">in</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">otherListenersSnapshot</span>&nbsp;<span style="color:blue;">do</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">otherScrobbles</span>&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">topScrobbles</span>&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Dict</span>.<span style="color:#74531f;">findOrEmpty</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">otherListener</span>.UserName &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">otherScrobblesSnapshot</span>&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">otherScrobbles</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">filter</span>&nbsp;(<span style="color:blue;">fun</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>.Song.IsVerifiedArtist) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">sortByDescending</span>&nbsp;(<span style="color:blue;">fun</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>.Song.Rating) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">truncate</span>&nbsp;10 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">toList</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">otherScrobblesSnapshot</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">List</span>.<span style="color:#74531f;">map</span>&nbsp;(<span style="color:blue;">fun</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>.Song) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">recommendationCandidates</span>.<span style="font-weight:bold;color:#74531f;">AddRange</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">recommendations</span>&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">recommendationCandidates</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">sortByDescending</span>&nbsp;(<span style="color:blue;">fun</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>.Rating) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">truncate</span>&nbsp;200 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">toList</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;:&gt;&nbsp;<span style="color:#2b91af;">IReadOnlyCollection</span>&lt;_&gt; &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">recommendations</span></pre> </p> <p> Since this is now a pure function, there's no need to run as an asynchronous workflow. The function no longer returns a <code>Task</code>, and I've also dispensed with the <em>Async</em> suffix. </p> <p> The implementation still has imperative remnants. It initializes an empty <code>ResizeArray</code> (AKA <code>List&lt;T&gt;</code>), and loops through nested loops to repeatedly call <a href="https://learn.microsoft.com/dotnet/api/system.collections.generic.list-1.addrange">AddRange</a>. </p> <p> Even though the function contains local state mutation, none of it escapes the function's scope. The function is <a href="https://en.wikipedia.org/wiki/Referential_transparency">referentially transparent</a> because it always returns the same result when given the same input, and it has no side effects. </p> <p> You might still wish that it was 'more functional', which is certainly possible. </p> <h3 id="dda1233fb2ff4bc3b21d49d910fabf6f"> A single expression <a href="#dda1233fb2ff4bc3b21d49d910fabf6f">#</a> </h3> <p> A curious property of expression-based languages is that you can conceivably write functions in 'one line of code'. Granted, it would often be a terribly wide line, not at all readable, a beast to maintain, and often with poor performance, so not something you'd want to alway do. </p> <p> In this case, however, we <em>can</em> do that, although in order to stay within an <a href="/2019/11/04/the-80-24-rule">80x24 box</a>, we break the expression over multiple lines. </p> <p> <pre><span style="color:blue;">let</span>&nbsp;<span style="color:#74531f;">getRecommendations</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">topScrobbles</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">topListeners</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">userName</span>&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;1.&nbsp;Get&nbsp;user&#39;s&nbsp;own&nbsp;top&nbsp;scrobbles</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;2.&nbsp;Get&nbsp;other&nbsp;users&nbsp;who&nbsp;listened&nbsp;to&nbsp;the&nbsp;same&nbsp;songs</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;3.&nbsp;Get&nbsp;top&nbsp;scrobbles&nbsp;of&nbsp;those&nbsp;users</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;4.&nbsp;Aggregate&nbsp;the&nbsp;songs&nbsp;into&nbsp;recommendations</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">topScrobbles</span> &nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Dict</span>.<span style="color:#74531f;">findOrEmpty</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">userName</span> &nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">sortByDescending</span>&nbsp;(<span style="color:blue;">fun</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>.ScrobbleCount) &nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">truncate</span>&nbsp;100 &nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">collect</span>&nbsp;(<span style="color:blue;">fun</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">scrobble</span>&nbsp;<span style="color:blue;">-&gt;</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">topListeners</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Dict</span>.<span style="color:#74531f;">findOrEmpty</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">scrobble</span>.Song.Id &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">filter</span>&nbsp;(<span style="color:blue;">fun</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">u</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">u</span>.TotalScrobbleCount&nbsp;&gt;=&nbsp;10_000) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">sortByDescending</span>&nbsp;(<span style="color:blue;">fun</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">u</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">u</span>.TotalScrobbleCount) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">truncate</span>&nbsp;20 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">collect</span>&nbsp;(<span style="color:blue;">fun</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">otherListener</span>&nbsp;<span style="color:blue;">-&gt;</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">topScrobbles</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Dict</span>.<span style="color:#74531f;">findOrEmpty</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">otherListener</span>.UserName &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">filter</span>&nbsp;(<span style="color:blue;">fun</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>.Song.IsVerifiedArtist) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">sortByDescending</span>&nbsp;(<span style="color:blue;">fun</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>.Song.Rating) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">truncate</span>&nbsp;10 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">map</span>&nbsp;(<span style="color:blue;">fun</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>.Song))) &nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">sortByDescending</span>&nbsp;(<span style="color:blue;">fun</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>.Rating) &nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">truncate</span>&nbsp;200 &nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">toList</span> &nbsp;&nbsp;&nbsp;&nbsp;:&gt;&nbsp;<span style="color:#2b91af;">IReadOnlyCollection</span>&lt;_&gt;</pre> </p> <p> To be honest, the four lines of comments push the function definition over the edge of 24 lines of code, but without them, this variation actually does fit an 80x24 box. Even so, I'm not arguing that this is the best possible way to organize and lay out this function. </p> <p> You may rightly complain that it's too dense. Perhaps you're also concerned about the <a href="https://wiki.c2.com/?ArrowAntiPattern">arrow code</a> tendency. </p> <p> I'm not disagreeing, but at least this represents a milestone where the function is not only referentially transparent, but also implemented without local mutation. Not that that really should be the most important criterion, but once you have an entirely expression-based implementation, it's usually easier to break it up into smaller building blocks. </p> <h3 id="7d81ce38521b406fa306d4bfa79a44bb"> Composition from smaller functions <a href="#7d81ce38521b406fa306d4bfa79a44bb">#</a> </h3> <p> To improve readability and maintainability, we may now extract helper functions. The first one easily suggests itself. </p> <p> <pre><span style="color:blue;">let</span>&nbsp;<span style="color:blue;">private</span>&nbsp;<span style="color:#74531f;">getUsersOwnTopScrobbles</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">topScrobbles</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">userName</span>&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">topScrobbles</span> &nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Dict</span>.<span style="color:#74531f;">findOrEmpty</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">userName</span> &nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">sortByDescending</span>&nbsp;(<span style="color:blue;">fun</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>&nbsp;<span style="color:blue;">-&gt;</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">s</span>.ScrobbleCount) &nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">truncate</span>&nbsp;100</pre> </p> <p> Each of the subexpressions in the above code listing are candidates for the same kind of treatment, like this one: </p> <p> <pre><span style="color:blue;">let</span>&nbsp;<span style="color:blue;">private</span>&nbsp;<span style="color:#74531f;">getOtherUsersWhoListenedToTheSameSongs</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">topListeners</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">scrobble</span>&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;<span style="font-weight:bold;color:#1f377f;">topListeners</span> &nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Dict</span>.<span style="color:#74531f;">findOrEmpty</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">scrobble</span>.Song.Id &nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">filter</span>&nbsp;(<span style="color:blue;">fun</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">u</span>&nbsp;-&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">u</span>.TotalScrobbleCount&nbsp;&gt;=&nbsp;10_000) &nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">sortByDescending</span>&nbsp;(<span style="color:blue;">fun</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">u</span>&nbsp;-&gt;&nbsp;<span style="font-weight:bold;color:#1f377f;">u</span>.TotalScrobbleCount) &nbsp;&nbsp;&nbsp;&nbsp;|&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">truncate</span>&nbsp;20</pre> </p> <p> Notice that these helper methods are marked <code>private</code> so that they remain implementation details within the module that exports the <code>getRecommendations</code> function. </p> <p> With a few more helper functions, you can now implement the <code>getRecommendations</code> function by composing the helpers. </p> <p> <pre><span style="color:blue;">let</span>&nbsp;<span style="color:#74531f;">getRecommendations</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">topScrobbles</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">topListeners</span>&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#74531f;">getUsersOwnTopScrobbles</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">topScrobbles</span> &nbsp;&nbsp;&nbsp;&nbsp;&gt;&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">collect</span>&nbsp;( &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:#74531f;">getOtherUsersWhoListenedToTheSameSongs</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">topListeners</span> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&gt;&gt;&nbsp;<span style="color:#2b91af;">Seq</span>.<span style="color:#74531f;">collect</span>&nbsp;(<span style="color:#74531f;">getTopSongsOfOtherUser</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">topScrobbles</span>)) &nbsp;&nbsp;&nbsp;&nbsp;&gt;&gt;&nbsp;<span style="color:#74531f;">aggregateSongsIntoRecommendations</span></pre> </p> <p> Notice that I've named each of the helper functions after the code comments that accompanied the previous incarnations of this function. If we consider <a href="http://butunclebob.com/ArticleS.TimOttinger.ApologizeIncode">code comments apologies for not properly organizing the code</a>, we've now managed to structure it in such a way that those apologies are no longer required. </p> <h3 id="7467fbb8e803446ea3dff035e7388301"> Conclusion <a href="#7467fbb8e803446ea3dff035e7388301">#</a> </h3> <p> If you accept the (perhaps preposterous) assumption that it's possible to fit the required data in <a href="https://en.wikipedia.org/wiki/Persistent_data_structure">persistent data structures</a>, refactoring the recommendation algorithm to a pure function isn't that difficult. That's the pure part of a Recawr Sandwich. While I haven't shown the actual sandwich here, it's identical to the example shown in <a href="/2025/05/05/song-recommendations-as-a-c-impureim-sandwich">Song recommendations as a C# Impureim Sandwich</a>. </p> <p> I find the final incarnation of the code shown here to be quite attractive. While I've kept the helper functions <code>private</code>, it's always an option to promote them to <code>public</code> functions if you find that warranted. This could improve testability of the overall code base, albeit at the risk of increasing the surface area of the API that you have to maintain and secure. </p> <p> There are always trade-offs to be considered. Even if you, eventually, find that for this particular example, the input data size is just <em>too</em> big to make this alternative viable, there are, in my experience, many other situations when this kind of architecture is a good solution. Even if the input size is a decent amount of megabytes, the simplification offered by an Impureim Sandwich may trump the larger memory footprint. As always, if you're concerned about performance, <a href="https://ericlippert.com/2012/12/17/performance-rant/">measure it</a>. </p> <p> Before we turn to alternative architectures, we'll survey how this variation looks in <a href="https://www.haskell.org/">Haskell</a>. As is generally the case in this article series, if you don't care about Haskell, you can always go back to the table of contents in the <a href="/2025/04/07/alternative-ways-to-design-with-functional-programming">first article in the series</a> and instead navigate to the next article that interests you. </p> <p> <strong>Next:</strong> <a href="/2025/05/26/song-recommendations-as-a-haskell-impureim-sandwich">Song recommendations as a Haskell Impureim Sandwich</a>. </p> </div><hr> This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>. Mark Seemann https://blog.ploeh.dk/2025/05/19/song-recommendations-as-an-f-impureim-sandwich Song recommendations proof-of-concept memory measurements https://blog.ploeh.dk/2025/05/12/song-recommendations-proof-of-concept-memory-measurements/ Mon, 12 May 2025 07:52:00 UTC <div id="post"> <p> <em>An attempt at measurement, and some results.</em> </p> <p> This is an article in a <a href="/2025/04/07/alternative-ways-to-design-with-functional-programming">larger series about functional programming design alternatives</a>, and a direct continuation of the <a href="/2025/05/05/song-recommendations-as-a-c-impureim-sandwich">previous article</a>. The question lingering after the <a href="/2020/03/02/impureim-sandwich">Impureim Sandwich</a> proof of concept is: What are the memory requirements of front-loading all users, songs, and scrobbles? </p> <p> One can guess, as I've <a href="/2025/04/28/song-recommendations-as-an-impureim-sandwich">already done</a>, but it's safer to measure. In this article, you'll find a description of the experiment, as well as some results. </p> <h3 id="6f88f7533af54936a2a923903171c831"> Test program <a href="#6f88f7533af54936a2a923903171c831">#</a> </h3> <p> Since I don't measure application memory profiles that often, I searched the web to learn how, and found <a href="https://stackoverflow.com/a/28514434/126014">this answer</a> by <a href="https://stackoverflow.com/users/22656/jon-skeet">Jon Skeet</a>. That's a reputable source, so I'm assuming that the described approach is appropriate. </p> <p> I added a new command-line executable to the source code and made this the entry point: </p> <p> <pre><span style="color:blue;">const</span>&nbsp;<span style="color:blue;">int</span>&nbsp;size&nbsp;=&nbsp;100_000; <span style="color:blue;">static</span>&nbsp;<span style="color:blue;">async</span>&nbsp;Task&nbsp;Main() { &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;before&nbsp;=&nbsp;GC.GetTotalMemory(<span style="color:blue;">true</span>); &nbsp;&nbsp;&nbsp;&nbsp;var&nbsp;(listeners,&nbsp;scrobbles)&nbsp;=&nbsp;<span style="color:blue;">await</span>&nbsp;Populate(); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;after&nbsp;=&nbsp;GC.GetTotalMemory(<span style="color:blue;">true</span>); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;diff&nbsp;=&nbsp;after&nbsp;-&nbsp;before; &nbsp;&nbsp;&nbsp;&nbsp;Console.WriteLine(<span style="color:#a31515;">&quot;Total&nbsp;memory:&nbsp;{0:N0}B.&quot;</span>,&nbsp;diff); &nbsp;&nbsp;&nbsp;&nbsp;GC.KeepAlive(listeners); &nbsp;&nbsp;&nbsp;&nbsp;GC.KeepAlive(scrobbles); }</pre> </p> <p> <code>listeners</code> and <code>scrobbles</code> are two dictionaries of data, as described in the <a href="/2025/05/05/song-recommendations-as-a-c-impureim-sandwich">previous article</a>. Together, they contain the data that we measure. Both are populated by this method: </p> <p> <pre><span style="color:blue;">private</span>&nbsp;<span style="color:blue;">static</span>&nbsp;<span style="color:blue;">async</span>&nbsp;Task&lt;( &nbsp;&nbsp;&nbsp;&nbsp;IReadOnlyDictionary&lt;<span style="color:blue;">int</span>,&nbsp;IReadOnlyCollection&lt;User&gt;&gt;, &nbsp;&nbsp;&nbsp;&nbsp;IReadOnlyDictionary&lt;<span style="color:blue;">string</span>,&nbsp;IReadOnlyCollection&lt;Scrobble&gt;&gt;)&gt;&nbsp;Populate() { &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;service&nbsp;=&nbsp;PopulateService(); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;listeners&nbsp;=&nbsp;<span style="color:blue;">await</span>&nbsp;service.CollectAllTopListeners(); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;scrobbles&nbsp;=&nbsp;<span style="color:blue;">await</span>&nbsp;service.CollectAllTopScrobbles(); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">return</span>&nbsp;(listeners,&nbsp;scrobbles); }</pre> </p> <p> The <code>service</code> variable is a <code>FakeSongService</code> object populated with randomly generated data. The <code>CollectAllTopListeners</code> and <code>CollectAllTopScrobbles</code> methods are the same as described in the previous article. When the method returns the two dictionaries, the <code>service</code> object goes out of scope and can be garbage-collected. When the program measures the memory load, it measures the size of the two dictionaries, but not <code>service</code>. </p> <p> I've reused the <a href="https://fscheck.github.io/FsCheck/">FsCheck</a> generators for random data generation: </p> <p> <pre><span style="color:blue;">private</span>&nbsp;<span style="color:blue;">static</span>&nbsp;SongService&nbsp;PopulateService() { &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;users&nbsp;=&nbsp;RecommendationsProviderTests.Gen.UserName.Sample(size); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;songs&nbsp;=&nbsp;RecommendationsProviderTests.Gen.Song.Sample(size); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;scrobbleGen&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">from</span>&nbsp;user&nbsp;<span style="color:blue;">in</span>&nbsp;Gen.Elements(users) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">from</span>&nbsp;song&nbsp;<span style="color:blue;">in</span>&nbsp;Gen.Elements(songs) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">from</span>&nbsp;scrobbleCount&nbsp;<span style="color:blue;">in</span>&nbsp;Gen.Choose(1,&nbsp;10) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">select</span>&nbsp;(user,&nbsp;song,&nbsp;scrobbleCount); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;service&nbsp;=&nbsp;<span style="color:blue;">new</span>&nbsp;FakeSongService(); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">foreach</span>&nbsp;(var&nbsp;(user,&nbsp;song,&nbsp;scrobbleCount)&nbsp;<span style="color:blue;">in</span>&nbsp;scrobbleGen.Sample(size)) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;service.Scrobble(user,&nbsp;song,&nbsp;scrobbleCount); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">return</span>&nbsp;service; }</pre> </p> <p> A <code>Gen&lt;T&gt;</code> object comes with a <code>Sample</code> method you can use to request a specified number of randomly generated values. </p> <p> In order to keep the code simple, I used the <code>size</code> value for both the number of songs, number of users, and number of scrobbles. This probably creates too few scrobbles; a topic that requires further discussion later. </p> <h3 id="2759c45eda984c02871d99ff21d9936d"> Measurements <a href="#2759c45eda984c02871d99ff21d9936d">#</a> </h3> <p> I ran the above program with various <code>size</code> values; <em>100,000</em> up to <em>1,000,000</em> in <em>100,000</em> increments, and from there up to <em>1,000,000</em> (one million) in <em>500,000</em> increments. At the higher values, it took a good ten minutes to run the program. </p> <p> <img src="/content/binary/song-recommendations-memory-costs.png" alt="Song recommendations memory cost line chart."> </p> <p> As the chart indicates, I ran the program with various data representations (more on that below). While there are four distinct data series, they overlap pairwise so perfectly that the graph doesn't show the difference. The <em>record</em> and <em>struct record</em> data series are so identical that you can't visibly see the difference. The same is true for the <em>bitmasked class</em> and the <em>bitmasked struct</em> data series, which only go to <code>size</code> <em>500,000</em>. </p> <p> There are small, but noticeable jumps from <em>4,500,000</em> to <em>5,000,000</em> and again from <em>8,500,000</em> to <em>9,000,000</em>, but the overall impression is that the relationship is linear. It seems safe to conclude that the solution scales linearly with the data size. </p> <p> The number of bytes per size is almost constant and averages to 178 bytes. How does that compare to <a href="/2025/04/28/song-recommendations-as-an-impureim-sandwich">my previous memory size estimates</a>? There, I estimated a song and a scrobble to require 8 bytes each, and a user less than 32 bytes. The way the above simulation runs, it generates one song, one user, and one scrobble per size unit. Therefore, I'd expect the average memory cost per experiment size to be around <em>8 + 8 + 32 = 48</em>, plus some overhead from the dictionaries. </p> <p> Given that the number I measure is 178, that's 130 bytes of overhead. Honestly, that's more than I expected. I expect a dictionary to maintain an array of keys, perhaps hashed with a bucket per hash value. Perhaps, had I picked another data structure than a plain old <a href="https://learn.microsoft.com/dotnet/api/system.collections.generic.dictionary-2">Dictionary</a>, it's possible that the overhead would be different. Or perhaps I just don't understand .NET's memory model, when push comes to shove. </p> <p> I then tried to split the single <code>size</code> parameter into three that would control the number of users, songs, and scrobbles independently. Setting both the number of users and songs to ten million, I then ran a series of simulations with increasing scrobble counts. </p> <p> <img src="/content/binary/song-recommendations-scrobble-memory-costs.png" alt="Scrobble memory cost line chart."> </p> <p> The relationship still looks linear, and at a hundred million scrobbles (and ten million users and ten million songs), the simulation uses 8.3 GB of memory. </p> <p> I admit that I'm still a bit puzzled by the measurements, compared to my previous estimates. I would have expected those sizes to require about 1,2 GB, plus overhead, so the actual measurements are off by a factor of 7. Not quite an order of magnitude, but close. </p> <h3 id="b1a0539a274f49ef937b15fc07090990"> Realism <a href="#b1a0539a274f49ef937b15fc07090990">#</a> </h3> <p> How useful are these measurements? How realistic are the experiments' parameters? Most streaming audio services report having catalogues with around 100 million songs, which is ten times more than what I've measured here. Such services may also have significantly more users than ten million, but what is going to make or break this architecture option (keeping all data in memory) is how many scrobbles users have, and how many times they listen to each song. </p> <p> Even if we naively still believe that a scrobble only takes up 8 bytes, it doesn't follow automatically that 100 scrobbles take up 800 bytes. It depends on how many repeats there are. Recall how we may model a scrobble: </p> <p> <pre><span style="color:blue;">public</span>&nbsp;<span style="color:blue;">sealed</span>&nbsp;<span style="color:blue;">record</span>&nbsp;<span style="color:#2b91af;">Scrobble</span>(Song&nbsp;<span style="font-weight:bold;color:#1f377f;">Song</span>,&nbsp;<span style="color:blue;">int</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">ScrobbleCount</span>);</pre> </p> <p> If a user listens to the same song ten times, we don't have to create ten <code>Scrobble</code> objects; we can create one and set the <code>ScrobbleCount</code> to <code>10</code>. </p> <p> The memory requirement to store users' scrobbles depend on the average listening pattern. Even with millions of users, we may be able to store scrobbles in memory if users listen to relatively few songs. On the other hand, if they only listen to each song once, it's probably not going to fit in memory. </p> <p> Still, we're dangerously close to the edge of what we can fit in memory. Shouldn't I just declare bankruptcy on that idea and move on? </p> <p> The purpose of this <a href="/2025/04/07/alternative-ways-to-design-with-functional-programming">overall article series</a> is to demonstrate <em>alternatives</em> to the <a href="/2020/03/02/impureim-sandwich">Impureim Sandwich</a> pattern, so I'm ultimately going to do exactly that: Move on. </p> <p> But not yet. </p> <h3 id="a5fce68a31c7414386c451ea2fe219be"> Sharding <a href="#a5fce68a31c7414386c451ea2fe219be">#</a> </h3> <p> Some applications are truly global in nature, and when that's the case, keeping everything in memory may not be 'web scale'. </p> <p> Still, I've seen more than one international company treat geographic areas as separate entities. This may be for legal reasons, or other business concerns that are unrelated to technology constraints. </p> <p> As a programmer, you may think that a song recommendations service ought to be truly global. After all, more data produces more accurate results, right? </p> <p> Your business owners may not think so. They may be concerned that regional music tastes may 'bleed over' market boundaries, and that this could ultimately scare customers away. </p> <p> Even if you can technically prove that this isn't a relevant concern, because you can write an algorithm that takes this into account, you may get a direct order that, say, Southeast Asian scrobbles may not be used in North America, or vice verse. </p> <p> It's worth investigating whether such business or legal constraints are already in place, because if they are, this may mean that you can shard the data, and that each shard still fits in memory. </p> <p> You may still think that I'm trying to salvage a bad idea, but that's not my agenda. I discuss these topics because in my experience, many programmers don't consider them. Understanding the wider context of a problem may suggest solutions that you might otherwise dismiss. </p> <p> But what if the business constraints change in the future? Shouldn't we be ready for that? </p> <p> Yes and no. You should consider how such changes would impact the architecture. Then you discuss the advantages and disadvantages with other stakeholders. </p> <p> Keep in mind that the reason to consider an Impureim Sandwich is because it's <em>simple</em> and easy to implement and maintain. Other alternatives may be more 'scalable', but also riskier to implement. You should involve other stakeholders in such decisions. </p> <h3 id="2dee0af02fbb4ac9b9176685ecb2cc3a"> Song representations <a href="#2dee0af02fbb4ac9b9176685ecb2cc3a">#</a> </h3> <p> The first of the above charts draws graphs for four data series: </p> <ul> <li>struct record</li> <li>record</li> <li>bitmasked struct</li> <li>bitmasked class</li> </ul> <p> These measure four different ways to model data; here more specifically a song. </p> <p> My initial model of a song was a <code>record</code>: </p> <p> <pre><span style="color:blue;">public</span>&nbsp;<span style="color:blue;">sealed</span>&nbsp;<span style="color:blue;">record</span>&nbsp;<span style="color:#2b91af;">Song</span>(<span style="color:blue;">int</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">Id</span>,&nbsp;<span style="color:blue;">bool</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">IsVerifiedArtist</span>,&nbsp;<span style="color:blue;">byte</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">Rating</span>);</pre> </p> <p> Then I thought that perhaps, since the type only contains <a href="https://learn.microsoft.com/dotnet/csharp/language-reference/builtin-types/value-types">value types</a>, it might be better to turn the above <code>record</code> into a <code>record struct</code>: </p> <p> <pre><span style="color:blue;">public</span>&nbsp;<span style="color:blue;">record</span>&nbsp;<span style="color:blue;">struct</span>&nbsp;<span style="color:#2b91af;">Song</span>(<span style="color:blue;">int</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">Id</span>,&nbsp;<span style="color:blue;">bool</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">IsVerifiedArtist</span>,&nbsp;<span style="color:blue;">byte</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">Rating</span>);</pre> </p> <p> It turns out that it makes no visible difference. In the chart, the two data series are so close to each other that you can't distinguish them. </p> <p> Then I thought that instead of an <code>int</code>, a <code>bool</code>, and a <code>byte</code>, I could use a single bitmask to model all three values. </p> <p> After all, I was only guessing when it came to data types anyway. It's likely that <code>Rating</code> is only a five-point or ten-point scale, but I still used a <code>byte</code> to model it. This suggests that I'm not using 96% of the data type's range. Perhaps I could use one of those redundant bits for <code>IsVerifiedArtist</code>, instead of an entire <code>bool</code>. </p> <p> Taking this further, modelling the <code>Id</code> as an <code>int</code> suggests that you may have 4,294,967,295 unique songs. That's 4.3 <em>billion</em> songs - at least an order of magnitude more than those 100 million songs that we hear about. In reality though, most systems that use <code>int</code> for IDs only do so because <code>int</code> is CLS-compliant, and <a href="https://learn.microsoft.com/dotnet/api/system.uint32">uint is not</a>. In other words, most systems that use <code>int</code> for IDs most likely only use the positive half, which means there are 16 bytes to use for other purposes. Enter the <em>bitmasked song:</em> </p> <p> <pre><span style="color:blue;">public</span>&nbsp;<span style="color:blue;">readonly</span>&nbsp;<span style="color:blue;">struct</span>&nbsp;<span style="color:#2b91af;">Song</span> { &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">private</span>&nbsp;<span style="color:blue;">const</span>&nbsp;<span style="color:blue;">uint</span>&nbsp;idMask&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;0b0000_0111_1111_1111_1111_1111_1111_1111; &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">private</span>&nbsp;<span style="color:blue;">const</span>&nbsp;<span style="color:blue;">uint</span>&nbsp;isVerifiedArtistMask&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;0b1000_0000_0000_0000_0000_0000_0000_0000; &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">private</span>&nbsp;<span style="color:blue;">const</span>&nbsp;<span style="color:blue;">uint</span>&nbsp;ratingMask&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;0b0111_1000_0000_0000_0000_0000_0000_0000; &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">private</span>&nbsp;<span style="color:blue;">readonly</span>&nbsp;<span style="color:blue;">uint</span>&nbsp;bits; &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">public</span>&nbsp;<span style="color:#2b91af;">Song</span>(<span style="color:blue;">int</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">id</span>,&nbsp;<span style="color:blue;">bool</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">isVerifiedArtist</span>,&nbsp;<span style="color:blue;">byte</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">rating</span>) &nbsp;&nbsp;&nbsp;&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">idBits</span>&nbsp;=&nbsp;(<span style="color:blue;">uint</span>)id&nbsp;&amp;&nbsp;idMask; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">isVerifiedArtistBits</span>&nbsp;=&nbsp;isVerifiedArtist&nbsp;?&nbsp;isVerifiedArtistMask&nbsp;:&nbsp;0u; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;<span style="font-weight:bold;color:#1f377f;">ratingBits</span>&nbsp;=&nbsp;((<span style="color:blue;">uint</span>)rating&nbsp;&lt;&lt;&nbsp;27)&nbsp;&amp;&nbsp;ratingMask; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;bits&nbsp;=&nbsp;idBits&nbsp;|&nbsp;isVerifiedArtistBits&nbsp;|&nbsp;ratingBits; &nbsp;&nbsp;&nbsp;&nbsp;} &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">public</span>&nbsp;<span style="color:blue;">int</span>&nbsp;Id&nbsp;=&gt;&nbsp;(<span style="color:blue;">int</span>)(bits&nbsp;&amp;&nbsp;idMask); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">public</span>&nbsp;<span style="color:blue;">bool</span>&nbsp;IsVerifiedArtist&nbsp;=&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(bits&nbsp;&amp;&nbsp;isVerifiedArtistMask)&nbsp;==&nbsp;isVerifiedArtistMask; &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">public</span>&nbsp;<span style="color:blue;">byte</span>&nbsp;Rating&nbsp;=&gt;&nbsp;(<span style="color:blue;">byte</span>)((bits&nbsp;&amp;&nbsp;ratingMask)&nbsp;&gt;&gt;&nbsp;27); }</pre> </p> <p> In this representation, I've set aside the lower 27 bits for the ID, enabling IDs to range between 0 and 134,217,727. The top bit is used for <code>IsVerifiedArtist</code>, and the remaining four bits for <code>Rating</code>. </p> <p> This data structure only holds a single <code>uint</code>, and since I made it a <code>struct</code>, I thought it'd have minimal overhead. </p> <p> As you can see in the above chart, that's not the case. When I run the experiment, this representation requires <em>more</em> memory. </p> <p> Just to make sure I wasn't missing anything obvious, I tried making the bitmasked <code>Song</code> a <code>class</code> instead. No visible difference. </p> <p> If you're wondering why the bitmasked data series only go to 500,000, it's because this change made the experiments glacial. It took somewhere between 12 and 24 hours to run the experiment with a <code>size</code> of 500,000. </p> <p> For what it's worth, I don't think the slowdown is directly related to the data representation, but rather to the change I had to make to the <a href="https://fscheck.github.io/FsCheck/">FsCheck</a>-based data generator: </p> <p> <pre><span style="color:blue;">let</span>&nbsp;songParams&nbsp;=&nbsp;gen&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;maxId&nbsp;=&nbsp;0b0111_1111_1111_1111_1111_1111_1111 &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let!</span>&nbsp;songId&nbsp;=&nbsp;Gen.choose&nbsp;(1,&nbsp;maxId) &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let!</span>&nbsp;isVerified&nbsp;=&nbsp;ArbMap.generate&nbsp;ArbMap.defaults &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let!</span>&nbsp;rating&nbsp;=&nbsp;Gen.choose&nbsp;(0,&nbsp;10)&nbsp;|&gt;&nbsp;Gen.map&nbsp;byte &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">return</span>&nbsp;songId,&nbsp;isVerified,&nbsp;rating&nbsp;} [&lt;CompiledName&nbsp;<span style="color:#a31515;">&quot;Song&quot;</span>&gt;] <span style="color:blue;">let</span>&nbsp;song&nbsp;=&nbsp;gen&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let!</span>&nbsp;(id,&nbsp;isVerifiedArtist,&nbsp;rating)&nbsp;=&nbsp;songParams &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">return</span>&nbsp;Song&nbsp;(id,&nbsp;isVerifiedArtist,&nbsp;rating)&nbsp;}</pre> </p> <p> I can't explain why the bitmasked representation requires more memory, but I'm increasingly having a nagging feeling that I've made a mistake somewhere. If you can spot a mistake, please let me know by leaving a comment. </p> <h3 id="90e12771e0214d239cc2bc1c6a540c0a"> Other data representations <a href="#90e12771e0214d239cc2bc1c6a540c0a">#</a> </h3> <p> I also considered whether it'd make sense to represent the entire data set as a huge matrix. One could, for example, let rows represent users, and columns songs, and let each element represent the number of times a user has listened to a particular song: </p> <table> <tr> <td>User</td> <td>Song 1</td> <td>Song 2</td> <td>Song 3</td> <td>...</td> </tr> <tr> <td>123</td> <td>0</td> <td>0</td> <td>4</td> <td>...</td> </tr> <tr> <td>456</td> <td>2</td> <td>0</td> <td>4</td> <td>...</td> </tr> <tr> <td colspan="5">...</td> </tr> </table> <p> Let's say that you may expect some users to listen to a song more than 255 times, but probably not more than 65,535 times. Thus, you could store each play count as a <a href="https://learn.microsoft.com/dotnet/api/system.uint16">ushort</a>. Still, you would need <em>users x songs</em> values, so if you have 100 million songs and 10 million users, that implies 2 PB of memory. That doesn't sound useful. </p> <p> On the other hand, most of those elements are going to be <em>0</em>, so perhaps one could use an <a href="https://en.wikipedia.org/wiki/Adjacency_list">adjacency list</a> instead. That is, however, essentially what an <code>IReadOnlyDictionary&lt;<span style="color:blue;">string</span>,&nbsp;IReadOnlyCollection&lt;Scrobble&gt;&gt;</code> is, so we're now back where we started. </p> <h3 id="80eaf0d1dd9848cb875f74b94cb64054"> Conclusion <a href="#80eaf0d1dd9848cb875f74b94cb64054">#</a> </h3> <p> This article reports on some measurements I've made of memory requirements, assuming that we keep all scrobble data in memory. While I suspect that I've made a mistake, it still seems reasonable to conclude that the <em>song recommendations</em> scenario is on the edge of what's possible with the Impureim Sandwich pattern. </p> <p> That's okay. I'm interested in understanding the limitations of solutions. </p> <p> I do think, however, that it's worth taking note of just how massive amounts of data are required before the Impureim Sandwich pattern becomes untenable. </p> <p> When I describe the pattern, the most common reaction is that it doesn't scale. And, as this article has explored, it's true that it doesn't scale. But it scales much, much better than you think. You may have billions of entities in your system, and they may still fit in a few gigabytes. Don't dismiss the Impureim Sandwich before you've made a real effort to understand the memory implications of it. Your intuition is likely to work against you. </p> <p> I'll round off this part of the article series by showing how the Impureim Sandwich looks in other, more functional languages. </p> <p> <strong>Next:</strong> <a href="/2025/05/19/song-recommendations-as-an-f-impureim-sandwich">Song recommendations as an F# Impureim Sandwich</a>. </p> </div><hr> This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>. Mark Seemann https://blog.ploeh.dk/2025/05/12/song-recommendations-proof-of-concept-memory-measurements Song recommendations as a C# Impureim Sandwich https://blog.ploeh.dk/2025/05/05/song-recommendations-as-a-c-impureim-sandwich/ Mon, 05 May 2025 06:23:00 UTC <div id="post"> <p> <em>A refactoring example.</em> </p> <p> This is an article in a <a href="/2025/04/07/alternative-ways-to-design-with-functional-programming">larger series about functional programming design alternatives</a>. I'm assuming that you've read the previous articles, but briefly told, I'm considering an example presented by <a href="https://tyrrrz.me/">Oleksii Holub</a> in the article <a href="https://tyrrrz.me/blog/pure-impure-segregation-principle">Pure-Impure Segregation Principle</a>. The example gathers song recommendations for a user in <a href="https://twitter.com/Tyrrrz/status/1493369905869213700">a long-running process</a>. </p> <p> In <a href="/2025/04/28/song-recommendations-as-an-impureim-sandwich">the previous article</a> I argued that while the memory requirements for this problem seem so vast that an <a href="/2020/03/02/impureim-sandwich">Impureim Sandwich</a> appears out of the question, it's nonetheless worthwhile to at least try it out. The refactoring isn't that difficult, and it turns out that it does simplify the code. </p> <h3 id="1fa2f097e260439fbc0bafb67a35a394"> Enumeration API <a href="#1fa2f097e260439fbc0bafb67a35a394">#</a> </h3> <p> The data access API is a web service: </p> <blockquote> <p> "I don't own the database, those are requests to an external API service (think Spotify API) that provides the data." </p> <footer><cite><a href="https://twitter.com/Tyrrrz/status/1495465374179119105">Tweet</a></cite>, Oleksii Holub, 2022</footer> </blockquote> <p> In order to read all data, we'll have to assume that there's a way to enumerate all songs and all users. With that assumption, I add the <code>GetAllSongs</code> and <code>GetAllUsers</code> methods to the <code>SongService</code> interface: </p> <p> <pre><span style="color:blue;">public</span>&nbsp;<span style="color:blue;">interface</span>&nbsp;<span style="color:#2b91af;">SongService</span> { &nbsp;&nbsp;&nbsp;&nbsp;Task&lt;IEnumerable&lt;Song&gt;&gt;&nbsp;GetAllSongs(); &nbsp;&nbsp;&nbsp;&nbsp;Task&lt;IEnumerable&lt;User&gt;&gt;&nbsp;GetAllUsers(); &nbsp;&nbsp;&nbsp;&nbsp;Task&lt;IReadOnlyCollection&lt;User&gt;&gt;&nbsp;GetTopListenersAsync(<span style="color:blue;">int</span>&nbsp;songId); &nbsp;&nbsp;&nbsp;&nbsp;Task&lt;IReadOnlyCollection&lt;Scrobble&gt;&gt;&nbsp;GetTopScrobblesAsync(<span style="color:blue;">string</span>&nbsp;userName); }</pre> </p> <p> It is, of course, a crucial assumption, and it's possible that no such API exists. On the other hand, a REST API could expose such functionality as a paged feed. Leafing through potentially hundreds (or thousands) such pages is bound to take some time, so it's good to know that this is a background process. As I briefly mentioned in <a href="/2025/04/28/song-recommendations-as-an-impureim-sandwich">the previous article</a>, we could imagine that we have a dedicated indexing server for this kind purpose. While we may rightly expect the initial data load to take some time (hours, even), once it's in memory, we should be able to reuse it to calculate song recommendations for all users, instead of just one user. </p> <p> In the previous article I estimated that it should be possible to keep all songs in memory with less than a gigabyte. Users, without scrobbles, also take up surprisingly little space, to the degree that a million users fit in a few dozen megabytes. Even if, eventually, we may be concerned about memory, we don't have to be concerned about this part. </p> <p> In any case, the addition of these two new methods doesn't break the existing example code, although I did have to implement the method in the <code>FakeSongService</code> class that I introduced in the article <a href="/2025/04/10/characterising-song-recommendations">Characterising song recommendations</a>: </p> <p> <pre><span style="color:blue;">public</span>&nbsp;Task&lt;IEnumerable&lt;Song&gt;&gt;&nbsp;GetAllSongs() { &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">return</span>&nbsp;Task.FromResult&lt;IEnumerable&lt;Song&gt;&gt;(songs.Values); } <span style="color:blue;">public</span>&nbsp;Task&lt;IEnumerable&lt;User&gt;&gt;&nbsp;GetAllUsers() { &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">return</span>&nbsp;Task.FromResult(users.Select(kvp&nbsp;=&gt;&nbsp;<span style="color:blue;">new</span>&nbsp;User(kvp.Key,&nbsp;kvp.Value.Values.Sum()))); }</pre> </p> <p> With those additions, we can load all data as the first layer (<em>phase</em>, really) of the sandwich. </p> <h3 id="9e69e4d3be8f4562b0cf9369610f6ebb"> Front-loading the data <a href="#9e69e4d3be8f4562b0cf9369610f6ebb">#</a> </h3> <p> Loading all the data is the responsibility of this <code>DataCollector</code> module: </p> <p> <pre><span style="color:blue;">public</span>&nbsp;<span style="color:blue;">static</span>&nbsp;<span style="color:blue;">class</span>&nbsp;<span style="color:#2b91af;">DataCollector</span> { &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">public</span>&nbsp;<span style="color:blue;">static</span>&nbsp;<span style="color:blue;">async</span>&nbsp;Task&lt;IReadOnlyDictionary&lt;<span style="color:blue;">int</span>,&nbsp;IReadOnlyCollection&lt;User&gt;&gt;&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;CollectAllTopListeners(<span style="color:blue;">this</span>&nbsp;SongService&nbsp;songService) &nbsp;&nbsp;&nbsp;&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;dict&nbsp;=&nbsp;<span style="color:blue;">new</span>&nbsp;Dictionary&lt;<span style="color:blue;">int</span>,&nbsp;IReadOnlyCollection&lt;User&gt;&gt;(); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">foreach</span>&nbsp;(var&nbsp;song&nbsp;<span style="color:blue;">in</span>&nbsp;<span style="color:blue;">await</span>&nbsp;songService.GetAllSongs()) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;topListeners&nbsp;=&nbsp;<span style="color:blue;">await</span>&nbsp;songService.GetTopListenersAsync(song.Id); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;dict.Add(song.Id,&nbsp;topListeners); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;} &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">return</span>&nbsp;dict; &nbsp;&nbsp;&nbsp;&nbsp;} &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">public</span>&nbsp;<span style="color:blue;">static</span>&nbsp;<span style="color:blue;">async</span>&nbsp;Task&lt;IReadOnlyDictionary&lt;<span style="color:blue;">string</span>,&nbsp;IReadOnlyCollection&lt;Scrobble&gt;&gt;&gt; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;CollectAllTopScrobbles(<span style="color:blue;">this</span>&nbsp;SongService&nbsp;songService) &nbsp;&nbsp;&nbsp;&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;dict&nbsp;=&nbsp;<span style="color:blue;">new</span>&nbsp;Dictionary&lt;<span style="color:blue;">string</span>,&nbsp;IReadOnlyCollection&lt;Scrobble&gt;&gt;(); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">foreach</span>&nbsp;(var&nbsp;user&nbsp;<span style="color:blue;">in</span>&nbsp;<span style="color:blue;">await</span>&nbsp;songService.GetAllUsers()) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;topScrobbles&nbsp;=&nbsp;<span style="color:blue;">await</span>&nbsp;songService.GetTopScrobblesAsync(user.UserName); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;dict.Add(user.UserName,&nbsp;topScrobbles); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;} &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">return</span>&nbsp;dict; &nbsp;&nbsp;&nbsp;&nbsp;} }</pre> </p> <p> These two methods work with any <code>SongService</code> implementation, so while the code base will work with <code>FakeSongService</code>, real 'production code' might as well use an HTTP-based implementation that pages through the implied web API. </p> <p> The dictionaries returned by the methods are likely to be huge. That's a major point of this exercise. Once the change is implemented and <a href="https://en.wikipedia.org/wiki/Characterization_test">Characterisation Tests</a> show that it still works, it makes sense to generate data to get a sense of the memory footprint. </p> <h3 id="cad69cb800a146e9ac2556c7deaac655"> Table-driven methods <a href="#cad69cb800a146e9ac2556c7deaac655">#</a> </h3> <p> Perhaps you wonder why the above <code>CollectAllTopListeners</code> and <code>CollectAllTopScrobbles</code> methods return dictionaries of exactly that shape. </p> <p> <a href="http://amzn.to/1dLYr0r">Code Complete</a> describes a programming technique called <em>table-driven methods</em>. The idea is to replace branching instructions such as <code>if</code>, <code>else</code>, and <code>switch</code> with a lookup table. The overall point, however, is that you can replace function calls with table lookups. </p> <p> Consider the <code>GetTopListenersAsync</code> method. It takes an <code>int</code> as input, and returns a <code>Task&lt;IReadOnlyCollection&lt;User&gt;&gt; </code> as output. If you ignore the <code>Task</code>, that's an <code>IReadOnlyCollection&lt;User&gt;</code>. In other words, you can exchange an <code>int</code> for an <code>IReadOnlyCollection&lt;User&gt;</code>. </p> <p> If you have an <code>IReadOnlyDictionary&lt;<span style="color:blue;">int</span>,&nbsp;IReadOnlyCollection&lt;User&gt;&gt;</code> you can <em>also</em> exchange an <code>int</code> for an <code>IReadOnlyCollection&lt;User&gt;</code>. These two APIs are functionally equivalent - although, of course, they have very different memory and run-time profiles. </p> <p> The same goes for the <code>GetTopScrobblesAsync</code> method: It takes a <code>string</code> as input and returns an <code>IReadOnlyCollection&lt;Scrobble&gt;</code> as output (if you ignore the <code>Task</code>). An <code>IReadOnlyDictionary&lt;<span style="color:blue;">string</span>,&nbsp;IReadOnlyCollection&lt;Scrobble&gt;&gt;</code> is equivalent. </p> <p> To make it practical, it turns out that we also need a little helper method to deal with the case where the dictionary has no entry for a given key: </p> <p> <pre><span style="color:blue;">internal</span>&nbsp;<span style="color:blue;">static</span>&nbsp;IReadOnlyCollection&lt;T&gt;&nbsp;GetOrEmpty&lt;<span style="color:#2b91af;">T</span>,&nbsp;<span style="color:#2b91af;">TKey</span>&gt;( &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">this</span>&nbsp;IReadOnlyDictionary&lt;TKey,&nbsp;IReadOnlyCollection&lt;T&gt;&gt;&nbsp;dict, &nbsp;&nbsp;&nbsp;&nbsp;TKey&nbsp;key) { &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">if</span>&nbsp;(dict.TryGetValue(key,&nbsp;<span style="color:blue;">out</span>&nbsp;var&nbsp;result)) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">return</span>&nbsp;result; &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">return</span>&nbsp;Array.Empty&lt;T&gt;(); }</pre> </p> <p> If there's no entry for a key, this function instead returns an empty array. </p> <p> That should make it as easy as possible to replace calls to <code>GetTopListenersAsync</code> and <code>GetTopScrobblesAsync</code> with dictionary lookups. </p> <h3 id="43b6cc924db64320a41f0a4093c65e29"> Adding method parameters <a href="#43b6cc924db64320a41f0a4093c65e29">#</a> </h3> <p> When refactoring, it's a good idea to proceed in small, controlled steps. You can see each of my <a href="https://www.industriallogic.com/blog/whats-this-about-micro-commits/">micro-commits</a> in the Git repository's <em>refactor-to-function</em> branch. Here, I'll give an overview. </p> <p> First, I added two dictionaries as parameters to the <code>GetRecommendationsAsync</code> method. You may recall that the method used to look like this: </p> <p> <pre><span style="color:blue;">public</span>&nbsp;<span style="color:blue;">async</span>&nbsp;Task&lt;IReadOnlyList&lt;Song&gt;&gt;&nbsp;GetRecommendationsAsync(<span style="color:blue;">string</span>&nbsp;userName)</pre> </p> <p> After I added the two dictionaries, it looks like this: </p> <p> <pre><span style="color:blue;">public</span>&nbsp;<span style="color:blue;">async</span>&nbsp;Task&lt;IReadOnlyList&lt;Song&gt;&gt;&nbsp;GetRecommendationsAsync( &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">string</span>&nbsp;userName, &nbsp;&nbsp;&nbsp;&nbsp;IReadOnlyDictionary&lt;<span style="color:blue;">string</span>,&nbsp;IReadOnlyCollection&lt;Scrobble&gt;&gt;&nbsp;topScrobbles, &nbsp;&nbsp;&nbsp;&nbsp;IReadOnlyDictionary&lt;<span style="color:blue;">int</span>,&nbsp;IReadOnlyCollection&lt;User&gt;&gt;&nbsp;topListeners)</pre> </p> <p> At this point, the <code>GetRecommendationsAsync</code> method uses neither the <code>topScrobbles</code> nor the <code>topListeners</code> parameter. Still, I consider this a <a href="https://stackoverflow.blog/2022/12/19/use-git-tactically/">distinct checkpoint that I commit to Git</a>. As I've outlined in my book <a href="/2021/06/14/new-book-code-that-fits-in-your-head">Code That Fits in Your Head</a>, it's safest to either refactor production code while keeping test code untouched, or refactor test code without editing the production code. An API change like the current is an example of a situation where that separation is impossible. This is the reason I want to keep it small and isolated. While the change does touch both production code and test code, I'm not editing the behaviour of the System Under Test. </p> <p> Tests now look like this: </p> <p> <pre>[&lt;Property&gt;] <span style="color:blue;">let</span>&nbsp;``No&nbsp;data``&nbsp;()&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;Gen.userName&nbsp;|&gt;&nbsp;Arb.fromGen&nbsp;|&gt;&nbsp;Prop.forAll&nbsp;&lt;|&nbsp;<span style="color:blue;">fun</span>&nbsp;userName&nbsp;<span style="color:blue;">-&gt;</span> &nbsp;&nbsp;&nbsp;&nbsp;task&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;srvc&nbsp;=&nbsp;FakeSongService&nbsp;() &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;sut&nbsp;=&nbsp;RecommendationsProvider&nbsp;srvc &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let!</span>&nbsp;topScrobbles&nbsp;=&nbsp;DataCollector.CollectAllTopScrobbles&nbsp;srvc &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let!</span>&nbsp;topListeners&nbsp;=&nbsp;DataCollector.CollectAllTopListeners&nbsp;srvc &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let!</span>&nbsp;actual&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;sut.GetRecommendationsAsync&nbsp;(userName,&nbsp;topScrobbles,&nbsp;topListeners) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Assert.Empty&nbsp;actual&nbsp;}&nbsp;:&gt;&nbsp;Task</pre> </p> <p> The test now uses the <code>DataCollector</code> to front-load data into dictionaries that it then passes to <code>GetRecommendationsAsync</code>. Keep in mind that <code>GetRecommendationsAsync</code> doesn't yet use that data, but it's available to it once I make a change to that effect. </p> <p> You may wish to compare this version of the <code>No data</code> test with the previous version shown in the article <a href="/2025/04/10/characterising-song-recommendations">Characterising song recommendations</a>. </p> <h3 id="02c3e9706d044528ad1d5acec6865385"> Refactoring to function <a href="#02c3e9706d044528ad1d5acec6865385">#</a> </h3> <p> The code is now ready for refactoring <a href="/2017/01/27/from-dependency-injection-to-dependency-rejection">from dependency injection to dependency rejection</a>. It's even possible to do it one method call at a time, because the data in the <code>FakeSongService</code> is the same as the data in the two dictionaries. It's just two different representations of the same data. </p> <p> Since, as described above, the dictionaries are equivalent to the <code>SongService</code> queries, each is easily replaced with the other. The first impure action in <code>GetRecommendationsAsync</code>, for example, is this one: </p> <p> <pre><span style="color:blue;">var</span>&nbsp;scrobbles&nbsp;=&nbsp;<span style="color:blue;">await</span>&nbsp;_songService.GetTopScrobblesAsync(userName);</pre> </p> <p> The equivalent dictionary lookup enables us to change that line of code to this: </p> <p> <pre><span style="color:blue;">var</span>&nbsp;scrobbles&nbsp;=&nbsp;topScrobbles.GetOrEmpty(userName);</pre> </p> <p> Notice that the dictionary lookup is a <a href="https://en.wikipedia.org/wiki/Pure_function">pure function</a> that the method need not <code>await</code>. </p> <p> Even though the rest of <code>GetRecommendationsAsync</code> still queries the injected <code>SongService</code>, all tests pass, and I can commit this small, isolated change to Git. </p> <p> Proceeding in a similar fashion enables us to eliminate the <code>SongService</code> queries one by one. There are only three method calls, so this can be done in three controlled steps. Once the last impure query has been replaced, the C# compiler complains about the <code>async</code> keyword in the declaration of the <code>GetRecommendationsAsync</code> method. </p> <p> Not only is the <code>async</code> keyword no longer required, the method is no longer asynchronous. There's no reason to return a <code>Task</code>, and the <code>Async</code> method name suffix is also misleading. </p> <p> <pre><span style="color:blue;">public</span>&nbsp;IReadOnlyList&lt;Song&gt;&nbsp;GetRecommendations( &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">string</span>&nbsp;userName, &nbsp;&nbsp;&nbsp;&nbsp;IReadOnlyDictionary&lt;<span style="color:blue;">string</span>,&nbsp;IReadOnlyCollection&lt;Scrobble&gt;&gt;&nbsp;topScrobbles, &nbsp;&nbsp;&nbsp;&nbsp;IReadOnlyDictionary&lt;<span style="color:blue;">int</span>,&nbsp;IReadOnlyCollection&lt;User&gt;&gt;&nbsp;topListeners)</pre> </p> <p> The <code>GetRecommendations</code> method no longer uses the injected <code>SongService</code>, and since it's is the only method of the <code>RecommendationsProvider</code> class, we can now (r)eject the dependency. </p> <p> This furthermore means that the class no longer has any class fields; we might as well make it (and the <code>GetRecommendations</code> function) <code>static</code>. Here's the final function in its entirety: </p> <p> <pre><span style="color:blue;">public</span>&nbsp;<span style="color:blue;">static</span>&nbsp;IReadOnlyList&lt;Song&gt;&nbsp;GetRecommendations( &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">string</span>&nbsp;userName, &nbsp;&nbsp;&nbsp;&nbsp;IReadOnlyDictionary&lt;<span style="color:blue;">string</span>,&nbsp;IReadOnlyCollection&lt;Scrobble&gt;&gt;&nbsp;topScrobbles, &nbsp;&nbsp;&nbsp;&nbsp;IReadOnlyDictionary&lt;<span style="color:blue;">int</span>,&nbsp;IReadOnlyCollection&lt;User&gt;&gt;&nbsp;topListeners) { &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;1.&nbsp;Get&nbsp;user&#39;s&nbsp;own&nbsp;top&nbsp;scrobbles</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;2.&nbsp;Get&nbsp;other&nbsp;users&nbsp;who&nbsp;listened&nbsp;to&nbsp;the&nbsp;same&nbsp;songs</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;3.&nbsp;Get&nbsp;top&nbsp;scrobbles&nbsp;of&nbsp;those&nbsp;users</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:green;">//&nbsp;4.&nbsp;Aggregate&nbsp;the&nbsp;songs&nbsp;into&nbsp;recommendations</span> &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;scrobbles&nbsp;=&nbsp;topScrobbles.GetOrEmpty(userName); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;scrobblesSnapshot&nbsp;=&nbsp;scrobbles &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.OrderByDescending(s&nbsp;=&gt;&nbsp;s.ScrobbleCount) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.Take(100) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.ToArray(); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;recommendationCandidates&nbsp;=&nbsp;<span style="color:blue;">new</span>&nbsp;List&lt;Song&gt;(); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">foreach</span>&nbsp;(var&nbsp;scrobble&nbsp;<span style="color:blue;">in</span>&nbsp;scrobblesSnapshot) &nbsp;&nbsp;&nbsp;&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;otherListeners&nbsp;=&nbsp;topListeners.GetOrEmpty(scrobble.Song.Id); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;otherListenersSnapshot&nbsp;=&nbsp;otherListeners &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.Where(u&nbsp;=&gt;&nbsp;u.TotalScrobbleCount&nbsp;&gt;=&nbsp;10_000) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.OrderByDescending(u&nbsp;=&gt;&nbsp;u.TotalScrobbleCount) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.Take(20) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.ToArray(); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">foreach</span>&nbsp;(var&nbsp;otherListener&nbsp;<span style="color:blue;">in</span>&nbsp;otherListenersSnapshot) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;otherScrobbles&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;topScrobbles.GetOrEmpty(otherListener.UserName); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;otherScrobblesSnapshot&nbsp;=&nbsp;otherScrobbles &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.Where(s&nbsp;=&gt;&nbsp;s.Song.IsVerifiedArtist) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.OrderByDescending(s&nbsp;=&gt;&nbsp;s.Song.Rating) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.Take(10) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.ToArray(); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;recommendationCandidates.AddRange( &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;otherScrobblesSnapshot.Select(s&nbsp;=&gt;&nbsp;s.Song) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;); &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;} &nbsp;&nbsp;&nbsp;&nbsp;} &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">var</span>&nbsp;recommendations&nbsp;=&nbsp;recommendationCandidates &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.OrderByDescending(s&nbsp;=&gt;&nbsp;s.Rating) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.Take(200) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.ToArray(); &nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">return</span>&nbsp;recommendations; }</pre> </p> <p> The overall structure is similar to <a href="https://tyrrrz.me/blog/pure-impure-segregation-principle">the original version</a>. Now that the code is simpler (because there's no longer any asynchrony) you could keep refactoring. With this C# code example, I'm going to stop here, but when I port it to <a href="https://fsharp.org/">F#</a> I'm going to refactor more aggressively. </p> <h3 id="b421654127ca4580995e008d4c01e3ab"> Sandwich <a href="#b421654127ca4580995e008d4c01e3ab">#</a> </h3> <p> One point of the whole exercise is to demonstrate how to refactor to an Impureim Sandwich. The <code>GetRecommendations</code> method shown above constitutes the pure filling of the sandwich, but what does the entire sandwich look like? </p> <p> In this code base, the sandwiches only exist as unit tests, the simplest of which is still the <code>No data</code> test: </p> <p> <pre>[&lt;Property&gt;] <span style="color:blue;">let</span>&nbsp;``No&nbsp;data``&nbsp;()&nbsp;= &nbsp;&nbsp;&nbsp;&nbsp;Gen.userName&nbsp;|&gt;&nbsp;Arb.fromGen&nbsp;|&gt;&nbsp;Prop.forAll&nbsp;&lt;|&nbsp;<span style="color:blue;">fun</span>&nbsp;user&nbsp;<span style="color:blue;">-&gt;</span> &nbsp;&nbsp;&nbsp;&nbsp;task&nbsp;{ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;srvc&nbsp;=&nbsp;FakeSongService&nbsp;() <span style="background-color: lightsalmon;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let!</span>&nbsp;topScrobbles&nbsp;=&nbsp;DataCollector.CollectAllTopScrobbles&nbsp;srvc &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let!</span>&nbsp;topListeners&nbsp;=&nbsp;DataCollector.CollectAllTopListeners&nbsp;srvc</span> <span style="background-color: palegreen;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span style="color:blue;">let</span>&nbsp;actual&nbsp;=&nbsp;RecommendationsProvider.GetRecommendations&nbsp;(user,&nbsp;topScrobbles,&nbsp;topListeners)</span> <span style="background-color: lightsalmon;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Assert.Empty&nbsp;actual</span>&nbsp;}&nbsp;:&gt;&nbsp;Task</pre> </p> <p> In the above code snippet, I've coloured in the relevant part of the test. I admit that it's a stretch to colour the last line red, since <code>Assert.Empty</code> is, at least, deterministic. One could argue that since it throws an exception on failure, it's not strictly free of side effects, but that's really a weak argument. It would be easy to refactor <a href="/2022/11/07/applicative-assertions">assertions to pure functions</a>. </p> <p> Instead, you may consider the bottom layer of the sandwich as a placeholder where something impure might happen. The background service that updates the song recommendations may, for example, save the result as a (CQRS-style) materialised view. </p> <p> The above test snippet, then, is more of a sketch of how the Impureim Sandwich may look: First, front-load data using the <code>DataCollector</code> methods; second, call <code>GetRecommendations</code>; third, do something with the result. </p> <h3 id="26a74728792d494e814b8d86f2ad8531"> Conclusion <a href="#26a74728792d494e814b8d86f2ad8531">#</a> </h3> <p> The changes demonstrated in this article serve two purposes. One is to show how to refactor an impure action to a pure function, pursuing the notion of an Impureim Sandwich. The second is to evaluate a proof-of-concept: If we do, indeed, front-load all of the data, is it realistic that all <a href="https://yourdatafitsinram.net/">data fits in RAM</a>? </p> <p> We have yet to address that question, but since the present article is already substantial, I'll address that in a separate article. </p> <strong>Next:</strong> <a href="/2025/05/12/song-recommendations-proof-of-concept-memory-measurements">Song recommendations proof-of-concept memory measurements</a>. </div><hr> This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>. Mark Seemann https://blog.ploeh.dk/2025/05/05/song-recommendations-as-a-c-impureim-sandwich Song recommendations as an Impureim Sandwich https://blog.ploeh.dk/2025/04/28/song-recommendations-as-an-impureim-sandwich/ Mon, 28 Apr 2025 07:16:00 UTC <div id="post"> <p> <em>Does your data fit in RAM?</em> </p> <p> This article is part of <a href="/2025/04/07/alternative-ways-to-design-with-functional-programming">a series on functional programming design alternatives</a>. In a <a href="/2025/04/10/characterising-song-recommendations">previous article</a> you saw how to add enough <a href="https://en.wikipedia.org/wiki/Characterization_test">Characterisation Tests</a> to capture the intended behaviour of the example song recommendations system originally presented by <a href="https://tyrrrz.me/">Oleksii Holub</a> in the article <a href="https://tyrrrz.me/blog/pure-impure-segregation-principle">Pure-Impure Segregation Principle</a>. </p> <h3 id="572693e6fb56455a958ca0d62aa13319"> Problem statement <a href="#572693e6fb56455a958ca0d62aa13319">#</a> </h3> <p> After showing how one problem can be refactored to <a href="https://en.wikipedia.org/wiki/Pure_function">pure functions</a>, Oleksii Holub writes: </p> <blockquote> <p> "Although very useful, the type of "lossless" refactoring shown earlier only works if the data required by the function can be easily encapsulated within its input parameters. Unfortunately, this is not always the case. </p> <p> "Often a function may need to dynamically resolve data from an external API or a database, with no way of knowing about it beforehand. This typically results in an implementation where pure and impure concerns are interleaved with each other, creating a tightly coupled cohesive structure." </p> <footer><cite>Oleksii Holub, <a href="https://tyrrrz.me/blog/pure-impure-segregation-principle">Pure-Impure Segregation Principle</a></cite></footer> </blockquote> <p> The article then proceeds to present the <em>song recommendations</em> example. It's a single C# method that queries a data store or third-party service to recommend songs. I'm imagining that it queries a third-party web service that contains usages data for a system like <a href="https://en.wikipedia.org/wiki/Spotify">Spotify</a>. </p> <blockquote> <p> "The above algorithm works by retrieving the user's most listened songs, finding other people who have also listened to the same titles, and then extracting their top songs as well. Those songs are then aggregated into a list of recommendations and returned to the caller. </p> <p> "It's quite clear that this function would benefit greatly from being pure, seeing how much business logic is encapsulated within it. Unfortunately, the technique we relied upon earlier won't work here. </p> <p> "In order to fully isolate <code>GetRecommendationsAsync(...)</code> from its impure dependencies, we would have to somehow supply the function with an entire list of songs, users, and their scrobbles upfront. If we assume that we're dealing with data on millions of users, it's obvious that this would be completely impractical and likely even impossible." </p> <footer><cite>Oleksii Holub, <a href="https://tyrrrz.me/blog/pure-impure-segregation-principle">Pure-Impure Segregation Principle</a></cite></footer> </blockquote> <p> It does, indeed, sound impractical. </p> <h3 id="ca5fdd1711554b3eb1ae7d516c003959"> Data sizes <a href="#ca5fdd1711554b3eb1ae7d516c003959">#</a> </h3> <p> Can you, however, trust your intuition? Research suggests that the human brain is ill-equipped to think about randomness and probabilities, and I've observed something similar when it comes to data sizes. </p> <p> <img src="/content/binary/dr-evil-one-million.jpg" alt="Dr. Evil: One million."> </p> <p> In the real world, a million of anything countable is an almost incomprehensible amount, so it's no wonder if our intuition fails us. <em>A million records</em> sounds like a lot, but if it's only a few integers, is it really that bad? </p> <p> Many systems use 32-bit integers for various IDs. A million IDs, then, is 32 million bits, or approximately 4 MB. As I'm writing this, the smallest Azure instance (<em>Free F1</em>) has 1 GB of memory, and while the OS takes a big bite out of that, 4 MB is nothing. </p> <p> The <em>song recommendations</em> problem implies larger memory pressure. It may not fit on every machine, but it's worth considering if, after all, it doesn't fit in RAM. </p> <h3 id="0579cae8c684409f8a43eaf2baa1d5b0"> My real-life experience with developing streaming services <a href="#0579cae8c684409f8a43eaf2baa1d5b0">#</a> </h3> <p> It just so happens that I have professional experience developing REST APIs for a white-label audio streaming service. Back in the early 2010s I helped design and implement the company's online music catalogue, user system, and a few other services. The catalogue is particularly interesting in this regard, since it only changed nightly, and we were planning on relying on HTTP for caching. </p> <p> I vividly recall a meeting we had with the IT operations specialist responsible for the new catalogue service. We explained that we'd set HTTP cache timeouts to 6 hours, and asked if he'd be able to set up a <a href="https://en.wikipedia.org/wiki/Reverse_proxy">reverse proxy</a> so that we didn't have to implement caching in our code base. </p> <p> He asked how much cache space we needed. </p> <p> We knew the size of a typical HTTP response, and the number of tracks, artists, and albums in the system, so after a back-of-the-envelope calculation, we told him: 18 GB. </p> <p> He just shrugged and said <em>"OK"</em>. </p> <p> In 2012 I though that 18 GB was a fair amount of data (I actually still think that). Even so, the operations team had plenty of servers with that capacity. </p> <p> Later, I did more work for that company, but most of it is less relevant to the <em>song recommendations</em> example. What does turn out to be relevant to the topic is something I learned the first few days of my engagement. </p> <p> Early on, before I was involved, the company needed a recommendation system, but hadn't been able to find any off-the-shelf component. This was in the early 2000s and before <a href="https://en.wikipedia.org/wiki/Apache_Solr">Solr</a>, but after <a href="https://en.wikipedia.org/wiki/Apache_Lucene">Lucene</a>. I'm not aware of all the forces that affected my then future client, but in the end, they decided to write their own search and recommendations engine. </p> <p> Essentially, during the night a beefy server would read all relevant data from the database, crunch it, create data structures, and keep all data in memory. Like the reverse proxy, it required a server with more RAM than a normal <a href="https://en.wikipedia.org/wiki/Pizza_box_form_factor">pizza box</a>, but not prohibitively so. </p> <h3 id="8fa70bcec92b4b8681af7ca819e82a22"> Costs <a href="#8fa70bcec92b4b8681af7ca819e82a22">#</a> </h3> <p> Consider the cost of hardware, compared to developer time. A few specialised servers may set your organisation back a few thousand of dollars/pounds/euros. That's an amount you can easily burn through in salary if the code is too complicated, or has too many bugs. </p> <p> You may argue that if you already have programmers on staff, they don't cost extra, but a too-complicated code base is still going to slow them down. Thus, the wrong software design could incur an <a href="https://en.wikipedia.org/wiki/Opportunity_cost">opportunity cost</a> greater than the cost of a server. </p> <p> One of many reasons I'm interested in functional programming (FP) is its potential to make code bases simpler. The <a href="/2020/03/02/impureim-sandwich">Impureim Sandwich</a> is a wonderfully simple design, so it's worth pursuing; not only for some FP ideal, but because of its simplifying potential. </p> <p> Intuition may tell us that the <em>song recommendations</em> scenario is prohibitively big, and therefore, an Impureim Sandwich is out of the question. As this overall article series explores, it's not the only alternative, but given its advantages, its worth giving it a second chance. </p> <h3 id="ac5d135ca06a472e88971ad6a8960aac"> Analysis <a href="#ac5d135ca06a472e88971ad6a8960aac">#</a> </h3> <p> The <code>GetRecommendationsAsync</code> method from <a href="https://tyrrrz.me/blog/pure-impure-segregation-principle#interleaved-impurities">the example</a> makes a lot of external calls, with its nested loops. The method uses the first call to <code>GetTopScrobblesAsync</code> to produce the <code>scrobblesSnapshot</code> variable, which is capped at 100 objects. If we assume that this method call returns at least 100 objects, the outer <code>foreach</code> loop will make 100 calls to <code>GetTopListenersAsync</code>. </p> <p> If we again assume that each of these return enough data, the inner <code>foreach</code> loop will make 20 calls to <code>GetTopScrobblesAsync</code>, for each object in the outer loop. That's 2,000 external calls, plus the 100 calls in the outer loop, plus the initial call to <code>GetTopScrobblesAsync</code>, for a total of 2,101. </p> <p> When I first saw the example, I didn't know much about the overall context. I didn't know if these impure actions were database queries or web service calls, so I asked Oleksii Holub. </p> <p> It turns out that it's all web service calls, and as I interpret the response, <code>GetRecommendationsAsync</code> is being invoked from a background maintenance process. </p> <blockquote> <p> "It takes around 10 min in total while maintaining it." </p> <footer><cite><a href="https://twitter.com/Tyrrrz/status/1493369905869213700">Tweet</a>, Oleksii Holub, 2022</cite></footer> </blockquote> <p> That's good to know, because if we're going to consider an Impureim Sandwich, it implies reading gigabytes of data in the first phase. That's going to take some time, but if this is a background process, we <em>do</em> have time. </p> <h3 id="2160fbaf60434edd9554ff585de5df2b"> Memory estimates <a href="#2160fbaf60434edd9554ff585de5df2b">#</a> </h3> <p> One thing is to load an entire song catalogue into memory. That's what required 18 GB in 2012. Another thing is to load all scrobbles; i.e. statistics about plays. Fortunately, in order to produce song recommendations, we only need IDs. Consider again the data structures from the <a href="/2025/04/10/characterising-song-recommendations">previous article</a>: </p> <p> <pre><span style="color:blue;">public</span>&nbsp;<span style="color:blue;">sealed</span>&nbsp;<span style="color:blue;">record</span>&nbsp;<span style="color:#2b91af;">Song</span>(<span style="color:blue;">int</span>&nbsp;Id,&nbsp;<span style="color:blue;">bool</span>&nbsp;IsVerifiedArtist,&nbsp;<span style="color:blue;">byte</span>&nbsp;Rating); <span style="color:blue;">public</span>&nbsp;<span style="color:blue;">sealed</span>&nbsp;<span style="color:blue;">record</span>&nbsp;<span style="color:#2b91af;">Scrobble</span>(Song&nbsp;Song,&nbsp;<span style="color:blue;">int</span>&nbsp;ScrobbleCount); <span style="color:blue;">public</span>&nbsp;<span style="color:blue;">sealed</span>&nbsp;<span style="color:blue;">record</span>&nbsp;<span style="color:#2b91af;">User</span>(<span style="color:blue;">string</span>&nbsp;UserName,&nbsp;<span style="color:blue;">int</span>&nbsp;TotalScrobbleCount);</pre> </p> <p> Apart from the <code>UserName</code> all values are small predictable values: <code>int</code>, <code>byte</code>, and <code>bool</code>, and while a <code>string</code> may be arbitrarily long, we can make a guess at the average size of a user name. In the <a href="/2025/04/10/characterising-song-recommendations">previous article</a>, I assumed that the user name would be an alphanumeric string between one and twenty characters. </p> <p> How many songs might a system contain? Some numbers thrown around for a system like Spotify suggest a number on the order of 100 million. With an <code>int</code>, a <code>bool</code>, and a <code>byte</code>, we can estimate that a song requires 6 bytes, plus some overhead. Let's guess 8 bytes. A 100 million songs would then require 800 million bytes, or around 800 MB. That eliminates the smallest cloud instances, but is in itself easily within reach for all modern computers. Your phone has more memory than that. </p> <p> How about scrobbles? While I don't use Spotify, <a href="https://www.last.fm/user/ploeh">I do scrobble plays to Last.fm</a>. At the moment I have around 114,000 scrobbles, and while I don't listen to music as much as I used to when I was younger, I have, on the other hand, been at it for a long time: Since 2007. If we assume that each user has 200,000 scrobbles, and a scrobble requires 8 bytes, that's 1,600,000 bytes, or 1.6 MB. Practically nothing. </p> <p> The size of a <code>User</code> object depends on how long the user name is, but will probably, on average, be less than 32 bytes. Compared to the user's scrobbles, we can ignore the memory pressure of the user object itself. </p> <p> As the number of users grow, it will dominate the memory requirements for the catalogue. How many users should we assume? </p> <p> A million is probably too few, but for a frame of reference, that would require 1,6 TB. This is where it starts to sound unfeasible to keep all data in RAM. Even though servers with that much RAM exist, they're so expensive (still) that the above cost consideration no longer applies. </p> <p> Still, there are some naive assumptions above. Instead of storing each scrobble in a separate <code>Scrobble</code> object, you could store repeated plays as a single object with the appropriate <code>ScrobbleCount</code> value. If you've listened to the same song 50 times, it doesn't require 400 bytes of storage, but only 8 bytes. That is, after all, orders of magnitude less. </p> <p> In the end, back-of-the-envelope calculations are fine, but measurements are better. It might be worthwhile to develop a proof of concept and measure how much memory it requires. </p> <p> In three articles, I'll explore how a <em>song recommendations</em> Impureim Sandwich looks in various constellations: </p> <ul> <li><a href="/2025/05/05/song-recommendations-as-a-c-impureim-sandwich">Song recommendations as a C# Impureim Sandwich</a></li> <li>Song recommendations as an F# Impureim Sandwich</li> <li>Song recommendations as a Haskell Impureim Sandwich</li> </ul> <p> In the end, it may turn out that for this particular system, an Impureim Sandwich truly is unfeasible. Keep in mind, though, that the purpose of this article series is to demonstrate alternative designs. The <em>song recommendations</em> problem is just a placeholder example. Perhaps you have another system where, intuitively, an Impureim Sandwich sounds impossible, but once you run the numbers, it might actually be not only possible, but desirable. </p> <h3 id="d38416a3535743159ba26abb4a78948b"> Conclusion <a href="#d38416a3535743159ba26abb4a78948b">#</a> </h3> <p> Modern computers have so vast storage capacities that intuition often fails us. We may think that billions of data points sounds like something that can't possibly fit in RAM. When you run the numbers, however, it may turn out that the required data fits on a normal server. </p> <p> If so, an Impureim Sandwich may still be an option. Load data into memory, pass it as argument to a pure function, and handle the return value. </p> <p> The <em>song recommendations</em> scenario is interesting because an Impureim Sandwich seems to be pushing the envelope. It probably <em>is</em> impractical, but still worth a proof of concept. On the other hand, if it's impractical, it's worthwhile to also explore alternatives. Later articles will do that, but first, if you're interested, the next articles look at the proofs of concept in three languages. </p> <p> <strong>Next:</strong> <a href="/2025/05/05/song-recommendations-as-a-c-impureim-sandwich">Song recommendations as a C# Impureim Sandwich</a>. </p> </div><hr> This blog is totally free, but if you like it, please consider <a href="https://blog.ploeh.dk/support">supporting it</a>. Mark Seemann https://blog.ploeh.dk/2025/04/28/song-recommendations-as-an-impureim-sandwich