Song recommendations with C# free monads
Just because it's possible. Don't do this at work.
This is the last article in a series named Alternative ways to design with functional programming. In it, you've seen a range of suggestions on how to model a non-trivial problem with various kinds of functional-programming patterns. The previous two articles showed how to use free monads to model the problem in Haskell and F#. In this article, I'll repeat the exercise in C#. It's not going to be pretty, but it's possible.
What is the point of this exercise? Mostly to supply parity in demo code. If you're more familiar with C# than F# or Haskell, this may give you a better sense of what a free monad is, and how it works. That's all. I don't consider the following practical code, and I don't endorse it.
The code shown here is based on the free
branch of the DotNet Git repository available with this article series.
Functor #
A free monad enables you to define a monad from any functor. In this context, you start with the functor that models an instruction set for a domain-specific language (DSL). The goal is to replace the SongService
interface with this instruction set.
As a reminder, this is the interface:
public interface SongService
{
    Task<IReadOnlyCollection<User>> GetTopListenersAsync(int songId);
    Task<IReadOnlyCollection<Scrobble>> GetTopScrobblesAsync(string userName);
}
I'll remind the reader that the code is my attempt to reconstruct the sample code shown in Pure-Impure Segregation Principle. I don't know if SongService
is a base class or an interface. Based on the lack of the idiomatic C# I
prefix for interfaces, one may suppose that it was really intended to be a base class, but since I find interfaces easier to handle than base classes, here we are.
In any case, the goal is to replace it with an instruction set, starting with a sum type. Since C# comes with no native support for sum types, we'll need to model it with either Church encoding or a Visitor. I've been over the Visitor territory enough times already, so here I'll go with the Church option, since it's a tad simpler.
I start by creating a new class called SongInstruction<T>
with this method:
public TResult Match<TResult>(
    Func<(int, Func<IReadOnlyCollection<User>, T>), TResult> getTopListeners,
    Func<(string, Func<IReadOnlyCollection<Scrobble>, T>), TResult> getTopScrobbles)
{
    return imp.Match(getTopListeners, getTopScrobbles);
}
If you squint sufficiently hard, you may be able to see how the getTopListeners
parameter corresponds to the interface's GetTopListenersAsync
method, and likewise for the other parameter.
I'm not going to give you a very detailed walkthrough, as I've already done that earlier in a different context. Likewise, I'll skip some of the implementation details. All the code is available in the Git repository.
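One of the skipped details is the imp field that Match delegates to. As an aid to the reader, the following is a sketch of what that underlying Church encoding might look like. It's an assumption for illustration only, not the repository's actual code, and it omits the Match, Select, and factory members discussed elsewhere in this article.

public sealed class SongInstruction<T>
{
    // Sketch only (assumed structure, not the repository's code): the 'imp'
    // field holds one of two case objects, and the public Match method shown
    // above simply forwards to it.
    private readonly IInstruction imp;

    internal SongInstruction(IInstruction imp)
    {
        this.imp = imp;
    }

    internal interface IInstruction
    {
        TResult Match<TResult>(
            Func<(int, Func<IReadOnlyCollection<User>, T>), TResult> getTopListeners,
            Func<(string, Func<IReadOnlyCollection<Scrobble>, T>), TResult> getTopScrobbles);
    }

    internal sealed class GetTopListenersCase : IInstruction
    {
        private readonly (int, Func<IReadOnlyCollection<User>, T>) data;

        internal GetTopListenersCase(int songId, Func<IReadOnlyCollection<User>, T> continuation)
        {
            data = (songId, continuation);
        }

        public TResult Match<TResult>(
            Func<(int, Func<IReadOnlyCollection<User>, T>), TResult> getTopListeners,
            Func<(string, Func<IReadOnlyCollection<Scrobble>, T>), TResult> getTopScrobbles)
        {
            // Only the continuation that corresponds to this case is invoked.
            return getTopListeners(data);
        }
    }

    // A GetTopScrobblesCase class is analogous, except that its Match invokes
    // getTopScrobbles(data). Static factory methods wrap these cases in
    // SongInstruction<T> values; those, too, are sketched here only by name.
}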
To be a functor, the class needs a Select
method.
public SongInstruction<TResult> Select<TResult>(Func<T, TResult> selector)
{
    return Match(
        t => SongInstruction.GetTopListeners(
            t.Item1, // songId
            users => selector(t.Item2(users))),
        t => SongInstruction.GetTopScrobbles(
            t.Item1, // userName
            songs => selector(t.Item2(songs))));
}
The name t
is, in both cases, short for tuple. It's possible that more recent versions of C# finally allow pattern matching of tuples in lambda expressions, but the version this code is based on doesn't. In both cases Item2
is the 'continuation' function.
Monad #
The next step is to wrap the functor in a data structure that enables you to sequence the above instructions. That has to be another sum type, this time called SongProgram<T>
, characterized by this Match
method:
public TResult Match<TResult>(
    Func<SongInstruction<SongProgram<T>>, TResult> free,
    Func<T, TResult> pure)
{
    return imp.Match(free, pure);
}
A SelectMany
method is required to make this type a proper monad:
public SongProgram<TResult> SelectMany<TResult>(
    Func<T, SongProgram<TResult>> selector)
{
    return Match(
        i => SongProgram.Free(i.Select(p => p.SelectMany(selector))),
        selector);
}
The code is already verbose enough as it is, so I've used i
for instruction and p
for program.
Lifted helpers #
Although not strictly required, I often find it useful to add a helper method for each case of the instruction type:
public static SongProgram<IReadOnlyCollection<User>> GetTopListeners(int songId)
{
    return Free(SongInstruction.GetTopListeners(songId, Pure));
}

public static SongProgram<IReadOnlyCollection<Scrobble>> GetTopScrobbles(string userName)
{
    return Free(SongInstruction.GetTopScrobbles(userName, Pure));
}
This just makes the user code look a bit cleaner.
Song recommendations as a DSL #
Using the composition from Song recommendations from C# combinators as a starting point, it doesn't take that many changes to turn it into a SongProgram
-valued function.
public static SongProgram<IReadOnlyList<Song>> GetRecommendations(string userName)
{
    // 1. Get user's own top scrobbles
    // 2. Get other users who listened to the same songs
    // 3. Get top scrobbles of those users
    // 4. Aggregate the songs into recommendations
    return SongProgram.GetTopScrobbles(userName)
        .SelectMany(scrobbles => UserTopScrobbles(scrobbles)
            .Traverse(scrobble => SongProgram
                .GetTopListeners(scrobble.Song.Id)
                .Select(TopListeners)
                .SelectMany(users => users
                    .Traverse(user => SongProgram
                        .GetTopScrobbles(user.UserName)
                        .Select(TopScrobbles))
                    .Select(Songs)))
            .Select(TakeTopRecommendations));
}
Notice how much it resembles the original GetRecommendationsAsync
method on the RecommendationsProvider
class. Instead of songService.GetTopScrobblesAsync
it just has SongProgram.GetTopScrobbles
, and instead of songService.GetTopListenersAsync
it has SongProgram.GetTopListeners
.
To support this composition, however, a traversal is required.
Traverse #
The traversal needs a Select
function on SongProgram
, so we'll start with that.
public SongProgram<TResult> Select<TResult>(Func<T, TResult> selector)
{
    return SelectMany(x => SongProgram.Pure(selector(x)));
}
This is the standard implementation of a functor from a monad.
It turns out that it's also useful to define a function to concatenate two sequence-valued programs.
private static SongProgram<IEnumerable<T>> Concat<T>(
    this SongProgram<IEnumerable<T>> xs,
    SongProgram<IEnumerable<T>> ys)
{
    return xs.SelectMany(x => ys.Select(y => x.Concat(y)));
}
This could, perhaps, be a public
function, but in this situation, I only need it to implement Traverse
, so I kept it private
. The Traverse
function, on the other hand, is public
.
public static SongProgram<IEnumerable<TResult>> Traverse<T, TResult>(
    this IEnumerable<T> source,
    Func<T, SongProgram<TResult>> selector)
{
    return source.Aggregate(
        Pure(Enumerable.Empty<TResult>()),
        (acc, x) => acc.Concat(selector(x).Select(xr => new[] {xr}.AsEnumerable())));
}
Given a sequence of values, Traverse
applies selector
to each, and collects all resulting programs into a single sequence-valued program. You see it in use in the above GetRecommendations
composition.
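To make the resulting type concrete, here's a small, hypothetical usage example. It's not part of the code base, and it assumes the above Pure and Traverse definitions are in scope.

// Hypothetical usage example: a single program that, once interpreted, yields
// the top listeners of each of three songs. Nothing impure happens until an
// interpreter evaluates the program.
IEnumerable<int> songIds = new[] { 1, 2, 3 };
SongProgram<IEnumerable<IReadOnlyCollection<User>>> program =
    songIds.Traverse(songId => SongProgram.GetTopListeners(songId));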
Interpreter #
That last missing piece is an interpreter that can evaluate a program. Since I already have a class called FakeSongService
, adding an Interpret
method was the easiest implementation strategy.
public T Interpret<T>(SongProgram<T> program)
{
    return program.Match(
        i => i.Match(
            t => Interpret(t.Item2(GetTopListernes(t.Item1))),
            t => Interpret(t.Item2(GetTopScrobbles(t.Item1)))),
        x => x);
}
Here, GetTopListernes
and GetTopScrobbles
are two private
helper functions:
private IReadOnlyCollection<User> GetTopListernes(int songId)
{
    var listeners = from kvp in users
                    where kvp.Value.ContainsKey(songId)
                    select new User(kvp.Key, kvp.Value.Values.Sum());
    return listeners.ToList();
}

private IReadOnlyCollection<Scrobble> GetTopScrobbles(string userName)
{
    var scrobbles = users
        .GetOrAdd(userName, new ConcurrentDictionary<int, int>())
        .Select(kvp => new Scrobble(songs[kvp.Key], kvp.Value));
    return scrobbles.ToList();
}
The implementation closely mirrors the original Fake interface implementation, where users
and songs
are class fields on FakeSongService
. This class was first shown in Characterising song recommendations.
It's now possible to rewrite all the tests.
Refactoring the tests #
Since the original GetRecommendationsAsync
method was task-based, all tests had to run in task workflows. This is no longer necessary, as this simplified FsCheck property demonstrates:
[<Property>]
let ``One user, some songs`` () =
    gen {
        let! user = Gen.userName
        let! songs = Gen.arrayOf Gen.song
        let! scrobbleCounts = Gen.choose (1, 100) |> Gen.arrayOfLength songs.Length
        return (user, Array.zip songs scrobbleCounts) }
    |> Arb.fromGen
    |> Prop.forAll <| fun (user, scrobbles) ->
        let srvc = FakeSongService ()
        scrobbles |> Array.iter (fun (s, c) -> srvc.Scrobble (user, s, c))

        let actual = RecommendationsProvider.GetRecommendations user |> srvc.Interpret

        Assert.Empty actual
Originally, this test had to be defined in terms of the task
computation expression, but now it's a pure function. In the act phase the test calls RecommendationsProvider.GetRecommendations user
and pipes the returned program to srvc.Interpret
. The result, actual
, is a plain IReadOnlyCollection<Song>
value.
Similarly, I was able to migrate all the example-based tests over, too.
[<Fact>]
let ``One verified recommendation`` () =
    let srvc = FakeSongService ()
    srvc.Scrobble ("cat", Song (1, false, 6uy), 10)
    srvc.Scrobble ("ana", Song (1, false, 5uy), 10)
    srvc.Scrobble ("ana", Song (2, true, 5uy), 9_9990)

    let actual = RecommendationsProvider.GetRecommendations "cat" |> srvc.Interpret

    Assert.Equal<Song> ([ Song (2, true, 5uy) ], actual)
Once all tests were migrated over to the new GetRecommendations
function, I deleted the old RecommendationsProvider
class as well as the SongService
interface, since neither of them was required any longer.
Conclusion #
The lack of proper syntactic sugar, similar to do
notation in Haskell, or computation expressions in F#, means that free monads aren't a useful design option in C#. Still, perhaps the last three articles help a reader or two understand what a free monad is.
Song recommendations with F# free monads
With an extensive computation expression.
This article is part of a series called Alternative ways to design with functional programming. As the title suggests, it examines alternative functional-programming architectures. It does so by looking at the same overall example problem: Calculating song recommendations from a large data set of 'scrobbles'; records of playback data for many users. In the previous article, you saw how to implement the desired functionality using a free monad in Haskell. In this article, you'll see how to use a free monad in F#.
If you are following along in the accompanying Git repository, the code shown here is from the fsharp-free branch. The starting point is fsharp-port.
Functor #
A free monad enables you to define a monad from any functor. In this context, you start with the functor that models an instruction set for a domain-specific language (DSL). The goal is to replace the SongService
interface with this instruction set.
As a reminder, this is the interface:
type SongService =
    abstract GetTopListenersAsync : songId : int -> Task<IReadOnlyCollection<User>>
    abstract GetTopScrobblesAsync : userName : string -> Task<IReadOnlyCollection<Scrobble>>
I'm following the F# free monad recipe, which, I'm happy to report, turned out to be easy to do.
First, the sum type:
type SongInstruction<'a> =
    | GetTopListeners of songId : int * (IReadOnlyCollection<User> -> 'a)
    | GetTopScrobbles of userName : string * (IReadOnlyCollection<Scrobble> -> 'a)
If you check with the F# free monad recipe and compare the SongService
interface methods with the SongInstruction
cases, you should be able to spot the similarity, and how to get from interface to discriminated union.
As shown in the previous article, in Haskell you can declaratively state that a type should be a Functor
instance. In F# you have to implement it yourself.
module SongInstruction =
    let map f = function
        | GetTopListeners ( songId, next) -> GetTopListeners ( songId, next >> f)
        | GetTopScrobbles (userName, next) -> GetTopScrobbles (userName, next >> f)
Here I've put it in its own little module, in order to make it clear which kind of data the map
function handles.
Monad #
The next step is to wrap the functor in a data structure that enables you to sequence the above instructions.
type SongProgram<'a> =
    | Free of SongInstruction<SongProgram<'a>>
    | Pure of 'a
A bind
function is required to make this type a proper monad:
module SongProgram =
    let rec bind f = function
        | Free inst -> inst |> SongInstruction.map (bind f) |> Free
        | Pure inst -> f inst
Again, I'm slavishly following the F# free monad recipe; please refer to it for more details.
Computation expression #
Technically, that's all it takes to make it possible to write programs in the little DSL. Doing it exclusively with the above bind
function would, however, be quite cumbersome, so you'll also appreciate some syntactic sugar in the form of a computation expression.
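To see why, here's a small, hypothetical program (not from the Git repository) written with nothing but the constructors and bind. Even two sequential instructions require explicit wrapping and a nested continuation; with more steps, the nesting only deepens.

// Hypothetical illustration: sequence two instructions with bind alone.
let listenersOfFirstScrobble userName =
    Free (GetTopScrobbles (userName, Pure))
    |> SongProgram.bind (fun scrobbles ->
        match Seq.tryHead scrobbles with
        | Some s -> Free (GetTopListeners (s.Song.Id, Pure))
        | None -> Pure (Array.empty<User> :> IReadOnlyCollection<User>))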
As you will see shortly, I needed to (temporarily) support some imperative language constructs, so I had to make it more involved than usually required.
type SongProgramBuilder () =
    member _.Bind (x, f) = SongProgram.bind f x
    member _.Return x = Pure x
    member _.ReturnFrom x = x
    member _.Zero () = Pure ()
    member _.Delay f = f
    member _.Run f = f ()
    member this.While (guard, body) =
        if not (guard ()) then this.Zero ()
        else this.Bind (body (), fun () -> this.While (guard, body))
    member this.TryWith (body, handler) =
        try this.ReturnFrom (body ())
        with e -> handler e
    member this.TryFinally (body, compensation) =
        try this.ReturnFrom (body ())
        finally compensation ()
    member this.Using (disposable : #System.IDisposable, body) =
        let body' = fun () -> body disposable
        this.TryFinally (body', fun () ->
            match disposable with
            | null -> ()
            | disp -> disp.Dispose ())
    member this.For (sequence : seq<_>, body) =
        this.Using (sequence.GetEnumerator (), fun enum ->
            this.While (enum.MoveNext, this.Delay (fun () -> body enum.Current)))
    member _.Combine (x, y) = x |> SongProgram.bind (fun () -> y ())
I had to do quite a bit of yak shaving before I got the For
method right. Not surprisingly, Scott Wlaschin's series on computation expressions proved invaluable.
As you always do with computation expressions, you leave a builder object somewhere the rest of your code can see it:
let songProgram = SongProgramBuilder ()
You'll soon see an example of using a songProgram
computation expression.
Lifted helpers #
Although not strictly required, I often find it useful to add a helper function for each case of the instruction type:
let getTopListeners songId = Free (GetTopListeners (songId, Pure))

let getTopScrobbles userName = Free (GetTopScrobbles (userName, Pure))
This just makes the user code look a bit cleaner.
Imperative-looking program #
With all that machinery in place, you can now write a referentially transparent function that implements the same algorithm as the original class method.
let getRecommendations userName = songProgram {
    // 1. Get user's own top scrobbles
    // 2. Get other users who listened to the same songs
    // 3. Get top scrobbles of those users
    // 4. Aggregate the songs into recommendations
    let! scrobbles = getTopScrobbles userName

    let scrobblesSnapshot =
        scrobbles
        |> Seq.sortByDescending (fun s -> s.ScrobbleCount)
        |> Seq.truncate 100
        |> Seq.toList

    let recommendationCandidates = ResizeArray ()
    for scrobble in scrobblesSnapshot do
        let! otherListeners = getTopListeners scrobble.Song.Id

        let otherListenersSnapshot =
            otherListeners
            |> Seq.filter (fun u -> u.TotalScrobbleCount >= 10_000)
            |> Seq.sortByDescending (fun u -> u.TotalScrobbleCount)
            |> Seq.truncate 20
            |> Seq.toList

        for otherListener in otherListenersSnapshot do
            // Impure
            let! otherScrobbles = getTopScrobbles otherListener.UserName

            // Pure
            let otherScrobblesSnapshot =
                otherScrobbles
                |> Seq.filter (fun s -> s.Song.IsVerifiedArtist)
                |> Seq.sortByDescending (fun s -> s.Song.Rating)
                |> Seq.truncate 10
                |> Seq.toList
            otherScrobblesSnapshot |> List.map (fun s -> s.Song) |> recommendationCandidates.AddRange

    let recommendations =
        recommendationCandidates
        |> Seq.sortByDescending (fun s -> s.Rating)
        |> Seq.truncate 200
        |> Seq.toList :> IReadOnlyCollection<_>
    return recommendations }
Notice how much it resembles the original GetRecommendationsAsync
method on the RecommendationsProvider
class. Instead of songService.GetTopScrobblesAsync
it just has getTopScrobbles
, and instead of songService.GetTopListenersAsync
it has getTopListeners
. The whole computation is wrapped in songProgram
, and the return type is SongProgram<IReadOnlyCollection<Song>>
.
Perhaps you're wondering how the local state mutation (of recommendationCandidates
) squares with my claim that this function is referentially transparent, but consider what referential transparency means. It means that you could replace a particular function call (say, getRecommendations "cat"
) with the value it returns. And you can. That the function used local mutation to arrive at that return value is of no concern to the caller.
Even so, later in this article, we'll refactor the code to something that looks more like a pure function. For now, however, we'll first see how to evaluate programs like the one returned by getRecommendations
.
Interpreter #
Again, I follow the F# free monad recipe. Since I already have a class called FakeSongService
, adding an Interpret
method was the easiest implementation strategy. A private recursive function takes care of the implementation:
let rec interpret = function
    | Pure x -> x
    | Free (GetTopListeners (songId, next)) ->
        users
        |> Seq.filter (fun kvp -> kvp.Value.ContainsKey songId)
        |> Seq.map (fun kvp -> user kvp.Key (Seq.sum kvp.Value.Values))
        |> Seq.toList
        |> next
        |> interpret
    | Free (GetTopScrobbles (userName, next)) ->
        users.GetOrAdd(userName, ConcurrentDictionary<_, _> ())
        |> Seq.map (fun kvp -> scrobble songs[kvp.Key] kvp.Value)
        |> Seq.toList
        |> next
        |> interpret
The implementation closely mirrors the original Fake interface implementation, where users
and songs
are class fields on FakeSongService
. This class was first shown in Characterising song recommendations.
Since I added the interpret
function in the class, we need a method that enables client code to call it:
member _.Interpret program = interpret program
It's now possible to rewrite all the tests.
Refactoring the tests #
Since the original GetRecommendationsAsync
method was task-based, all tests had to run in task workflows. This is no longer necessary, as this simplified FsCheck property demonstrates:
[<Property>]
let ``One user, some songs`` () =
    gen {
        let! user = Gen.userName
        let! songs = Gen.arrayOf Gen.song
        let! scrobbleCounts = Gen.choose (1, 100) |> Gen.arrayOfLength songs.Length
        return (user, Array.zip songs scrobbleCounts) }
    |> Arb.fromGen
    |> Prop.forAll <| fun (user, scrobbles) ->
        let srvc = FakeSongService ()
        scrobbles |> Array.iter (fun (s, c) -> srvc.Scrobble (user, s, c))

        let actual = getRecommendations user |> srvc.Interpret

        Assert.Empty actual
Originally, this test had to be defined in terms of the task
computation expression, but now it's a pure function. In the act phase the test calls getRecommendations user
and pipes the returned program to srvc.Interpret
. The result, actual
, is a plain IReadOnlyCollection<Song>
value.
Similarly, I was able to migrate all the example-based tests over, too.
[<Fact>]
let ``One verified recommendation`` () =
    let srvc = FakeSongService ()
    srvc.Scrobble ("cat", song 1 false 6uy, 10)
    srvc.Scrobble ("ana", song 1 false 5uy, 10)
    srvc.Scrobble ("ana", song 2 true 5uy, 9_9990)

    let actual = getRecommendations "cat" |> srvc.Interpret

    Assert.Equal<Song> ([ song 2 true 5uy ], actual)
Once all tests were migrated over to the new getRecommendations
function, I deleted the old RecommendationsProvider
class as well as the SongService
interface, since neither of them was required any longer. Notice how I managed to move in smaller steps than in the previous article. As described in Code That Fits in Your Head, I used the Strangler pattern to move incrementally. I should also have done that with the Haskell code base, but fortunately I didn't run into trouble.
With all the tests migrated to pure functions, it's time to go back to getRecommendations
to give it some refactoring love.
Traverse #
The local mutation in the above incarnation of getRecommendations
is located within a doubly-nested loop. You typically need a traversal if you want to refactor such loops to expressions.
The traversal needs a map
function, so we'll start with that.
let map f = bind (f >> Pure)
This function is included in the SongProgram
module shown above. That's the reason it can call bind
without a prefix. The same is true for the traverse
function.
let traverse f xs =
    let concat xs ys = xs |> bind (fun x -> ys |> map (Seq.append x))
    Seq.fold
        (fun acc x -> concat acc (f x |> map Seq.singleton))
        (Pure (Seq.empty))
        xs
The local concat
function concatenates two SongProgram
values that contain sequences to a single SongProgram
value containing the concatenated sequence. The traversal uses this helper function to fold
over the xs
.
Refactored program #
You can now go through all the usual motions of replacing the nested loops and the local mutation with traversal, extracting helper functions and so on. If you're interested in the small steps I took, you may consult the Git repository. Here, I only present the getRecommendations
function in the state I decided to leave it.
let getRecommendations userName = songProgram {
    // 1. Get user's own top scrobbles
    // 2. Get other users who listened to the same songs
    // 3. Get top scrobbles of those users
    // 4. Aggregate the songs into recommendations
    let! scrobbles = getTopScrobbles userName

    let! otherlListeners =
        getUsersOwnTopScrobbles scrobbles
        |> SongProgram.traverse (fun s -> getTopListeners s.Song.Id)

    let! otherScrobbles =
        otherlListeners
        |> Seq.collect getOtherUsersWhoListenedToTheSameSongs
        |> SongProgram.traverse (fun u -> getTopScrobbles u.UserName)

    return
        otherScrobbles
        |> Seq.collect getTopScrobblesOfUsers
        |> aggregateTheSongsIntoRecommendations }
Notice the two applications of traverse
, which replace the twice-nested loop.
The type of getRecommendations
is unchanged, and all tests still pass.
Conclusion #
As expected, it required more boilerplate code to refactor the GetRecommendationsAsync
method to a design based on a free monad, than the corresponding refactoring did in Haskell. Even so, following the F# free monad recipe made it smooth sailing. I did, however, hit a snag, mostly of my own manufacture, when implementing support for for
loops in the computation expression builder, but that problem turned out to be unrelated to the free monad.
Again, you may wonder what the point is. First, I'll refer you to the decision flowchart in the F# free monad recipe, as well as remind you that these articles are all intended to be descriptive rather than prescriptive. They only present what is possible. It's up to you to decide which option to choose.
Second, a SongProgram<'a>
may look similar to an interface, but consider how much more restricted it is. A method that uses dependency injection is inherently impure; in such a method, anything may happen, including every kind of bug.
The F# type system does not check whether or not unconstrained side effects or non-determinism takes place, but still, a type like SongProgram<'a>
strongly suggests that only SongProgram
-related actions take place.
I think that free monads still have a place in F#, although mostly as a niche use case. In C#, on the other hand, free monads aren't useful because the language lacks features to make this kind of programming workable. Still, for demonstration purposes only, it can be done.
Song recommendations with Haskell free monads
A surprisingly easy refactoring.
This article is part of a larger series titled Alternative ways to design with functional programming. In short, it uses Haskell, F#, and C# to present various internal architectures to deal with an example problem. Please refer to the table of contents included with the first article to get a sense of what has already been covered.
In this article, you'll see how a free monad may be used to address the problem that when data size is sufficiently large, you may need to load it piecemeal, based on the results of previous steps in an algorithm. In short, the problem being addressed is to calculate a list of song recommendations based on so-called 'scrobble' data from a multitude of other users' music playback history.
If you want to follow along with the Git repository, the code presented here is from the Haskell repository's free branch.
Starting point #
Instead of starting from scratch from the code base presented in Porting song recommendations to Haskell, I'll start at what was an interim refactoring step in Song recommendations from Haskell combinators:
getRecommendations :: SongService a => a -> String -> IO [Song]
getRecommendations srvc un = do
  -- 1. Get user's own top scrobbles
  -- 2. Get other users who listened to the same songs
  -- 3. Get top scrobbles of those users
  -- 4. Aggregate the songs into recommendations

  -- Impure
  scrobbles <- getTopScrobbles srvc un

  -- Pure
  let scrobblesSnapshot = take 100 $ sortOn (Down . scrobbleCount) scrobbles

  -- Impure
  recommendations <- join <$> traverse (\scrobble ->
    fmap join $ traverse (\otherListener ->
        fmap scrobbledSong
        . take 10
        . sortOn (Down . songRating . scrobbledSong)
        . filter (songHasVerifiedArtist . scrobbledSong)
        <$> getTopScrobbles srvc (userName otherListener))
      . take 20
      <$> sortOn (Down . userScrobbleCount)
        . filter ((10_000 <=) . userScrobbleCount)
      =<< getTopListeners srvc (songId $ scrobbledSong scrobble))
    scrobblesSnapshot

  -- Pure
  return $ take 200 $ sortOn (Down . songRating) recommendations
The reason I wanted to start here is that with the IORef
out of the way, the only IO
-bound code remaining involves the SongService
methods.
The plan is to replace the SongService
type class with a free monad. Once that's done, there's no more IO
left; it will have been replaced by the free monad. That's why it's easiest to first get rid of everything else involving IO
. If I didn't, the refactoring wouldn't be as easy. As Kent Beck wrote,
"for each desired change, make the change easy (warning: this may be hard), then make the easy change"
Starting from the above incarnation of the code makes the change easy.
Functor #
As a reminder, this type class is the one I'm getting rid of:
class SongService a where
  getTopListeners :: a -> Int -> IO [User]
  getTopScrobbles :: a -> String -> IO [Scrobble]
Converting such an 'interface' to a sum type of instructions is such a well-defined process that I think it could be automated (if it were worth the effort). You take each method of the type class and make it a case in the sum type.
data SongInstruction a =
    GetTopListeners Int ([User] -> a)
  | GetTopScrobbles String ([Scrobble] -> a)
  deriving (Functor)
I usually think of such a type as representing possible instructions in a little domain-specific language (DSL). In this case, the DSL allows only two instructions: GetTopListeners
and GetTopScrobbles
.
If you've never seen a free monad before, you may be wondering: What's the a
? It's what we may call the carrier type. If you haven't worked with, say, F-Algebras, this takes some time getting used to, but the carrier type represents a 'potential'. As presented here, it may be anything, but when you want to evaluate or 'run' the program, the a
turns out to be the return type of the evaluation.
Since SongInstruction
is a Functor
, it's possible to map a
to b
, or one concrete type to another, so that the a
type parameter can take on more than one concrete type during a small 'program' written in the DSL.
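As a small, hypothetical illustration (not from the code base), mapping over an instruction changes only its carrier type:

-- The instruction itself is unchanged; fmap only composes a function onto the
-- continuation, so the carrier changes from [User] to Int. The song ID 42 is
-- arbitrary.
countListeners :: SongInstruction Int
countListeners = fmap length (GetTopListeners 42 id)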
In any case, at this point, SongInstruction a
is only a Functor
, and not yet a Monad
.
Free monad #
You may, if you want to, introduce a type alias that makes SongInstruction a
(or, rather, its Free
wrapper) a Monad
.
type SongProgram = Free SongInstruction
The Free
wrapper comes from Control.Monad.Free.
In the rest of the code listings in this article, I make no use of this type alias, so I only present it here because it's the type the compiler will infer when running the test. In other words, while never explicitly stated, this is going to be the de-facto free monad in this article.
From action to function #
Changing the getRecommendations
action to a function that returns (effectively) Free (SongInstruction [Song])
turned out to be easier than I had expected.
Since I've decided to omit type declarations whenever possible, refactoring is easier because the type inferencing system can allow changes to function and action types to ripple through to the ultimate callers. (To be clear, however, I here show some of the code with type annotations. I've only added those in the process of writing this article, as an aid to you, the reader. You will not find those type annotations in the Git repository.)
First, I added a few helper methods to create 'program instructions':
getTopListeners sid = liftF $ GetTopListeners sid id

getTopScrobbles un = liftF $ GetTopScrobbles un id
The liftF
function wraps a Functor
(here, SongInstruction
) in a free monad. This makes it easier to write programs in that DSL.
The only change now required to getRecommendations
is to remove srvc
in four places:
getRecommendations :: MonadFree SongInstruction m => String -> m [Song]
getRecommendations un = do
  -- 1. Get user's own top scrobbles
  -- 2. Get other users who listened to the same songs
  -- 3. Get top scrobbles of those users
  -- 4. Aggregate the songs into recommendations

  -- Impure
  scrobbles <- getTopScrobbles un

  -- Pure
  let scrobblesSnapshot = take 100 $ sortOn (Down . scrobbleCount) scrobbles

  -- Impure
  recommendations <- join <$> traverse (\scrobble ->
    fmap join $ traverse (\otherListener ->
        fmap scrobbledSong
        . take 10
        . sortOn (Down . songRating . scrobbledSong)
        . filter (songHasVerifiedArtist . scrobbledSong)
        <$> getTopScrobbles (userName otherListener))
      . take 20
      <$> sortOn (Down . userScrobbleCount)
        . filter ((10_000 <=) . userScrobbleCount)
      =<< getTopListeners (songId $ scrobbledSong scrobble))
    scrobblesSnapshot

  -- Pure
  return $ take 200 $ sortOn (Down . songRating) recommendations
It's almost the same code as before. The only difference is that srvc
is no longer a parameter to either getRecommendations
, getTopListeners
, or getTopScrobbles
.
Again, I'll point out that I've only added the type annotation for the benefit of you, the reader. We see now, however, that the return type is m [Song]
, where m
is any MonadFree SongInstruction
.
Interpreter #
As described in Porting song recommendations to Haskell I use a FakeSongService
as a Test Double. SongService
is now gone, but I can repurpose the instance implementation as an interpreter of SongInstruction
programs.
interpret :: FakeSongService -> Free SongInstruction a -> a
interpret (FakeSongService ss us) = iter eval
  where
    eval (GetTopListeners sid next) =
      next $ uncurry User <$> Map.toList (sum <$> Map.filter (Map.member sid) us)
    eval (GetTopScrobbles un next) =
      next $ fmap (\(sid, c) -> Scrobble (ss ! sid) c) $
             Map.toList $ Map.findWithDefault Map.empty un us
Compare that code to the previous SongService
instance shown in Porting song recommendations to Haskell. The functions are nearly identical, only return
is replaced by next
, and there are a few other, minute changes.
Test changes #
The hardest part of this refactoring was to adjust the tests. As I've described in Code That Fits in Your Head, I don't like when I have to change the System Under Test and the test code in the same commit, but in this case I lacked the skill to do it more incrementally.
The issue is that since SongService
methods were IO
-bound, the tests ran in IO
. Particularly for the QuickCheck properties, I had to remove the ioProperty
and monadicIO
combinators. This also meant adjusting some of the assertions. The "One user, some songs"
test now looks like this:
testProperty "One user, some songs" $ \ (ValidUserName user) (fmap getSong -> songs) -> do scrobbleCounts <- vectorOf (length songs) $ choose (1, 100) let scrobbles = zip songs scrobbleCounts let srvc = foldr (uncurry (scrobble user)) emptyService scrobbles let actual = interpret srvc $ getRecommendations user return $ counterexample "Should be empty" (null actual)
Notice the interpreter in action, to the left of the application of getRecommendations
. Since interpret
reduces any Free SongInstruction a
to a
, it evaluates Free SongInstruction [Song]
to [Song]
; that's the type of actual
.
The change represents a net benefit in my view. All tests are now pure, instead of running in IO
. This makes them simpler.
Final refactoring #
The above version of the code was only a good starting point for making the changeover to a free monad. Extracting the usual helper functions, you can arrive at this variation of the getRecommendations
function:
getRecommendations un = do
  -- 1. Get user's own top scrobbles
  -- 2. Get other users who listened to the same songs
  -- 3. Get top scrobbles of those users
  -- 4. Aggregate the songs into recommendations
  userTops <- getTopScrobbles un <&> getUsersOwnTopScrobbles
  otherListeners <-
    traverse (getTopListeners . (songId . scrobbledSong)) userTops
    <&> getOtherUsersWhoListenedToTheSameSongs . join
  songs <-
    traverse (getTopScrobbles . userName) otherListeners
    <&> getTopScrobblesOfOtherUsers . join
  return $ aggregateTheSongsIntoRecommendations songs
Again, it looks a lot like one of the variations shown in Porting song recommendations to Haskell, just without the srvc
parameter.
Conclusion #
Refactoring from a type class, which in other languages could be an interface, is so easy that you may rightfully ask what the point is. To me, the most important benefit is that you restrict your options. As I discussed in a podcast episode some years ago, constraints liberate. Here, we move from allowing anything (IO
) to only allowing a limited set of instructions.
You could argue that a 'real' interpreter (rather than a test-specific interpreter) might run in IO
, such as the interpreter shown here. Still, I would argue that most interpreters are smaller, and more stable, than the programs you may write with free monads. Thus, the amount of code you have to review that's allowed to do anything is smaller than it otherwise would have been.
In any case, refactoring to a free monad wasn't too difficult in Haskell. How easy is it in F#?
Song recommendations with free monads
A Golden Hammer.
This article is part of a larger article series about alternative ways to design with functional programming, particularly when faced with massive data loads. In previous articles in the series, you've seen various alternatives that may or may not enable you to solve the example problem using functional programming. Each of the previous alternatives, applying the Recawr Sandwich pattern, employing functional combinators, or using pipes and filters, came with some trade-offs. Either there were limits to the practicality of the architecture, or it wasn't really functional after all.
In this, and the following three articles, we finally reach for a universal solution: Free monads.
By universal I mean: It's always possible to do this (if your language supports it). It doesn't follow that it's always a good idea.
So there are trade-offs here, too.
As is usually the case, although sometimes I forget to write it, this and the following articles are descriptive, not prescriptive. In no way do I insist that things must be done this way. I only present examples as options.
The last resort #
If using a free monad is a universal possibility, then why is that not the default option? Why have I waited this long in the article series before covering it?
For more than a single reason.
It's an 'advanced' technique, and I predict that to many programmers, it looks like black magic. If you're working with a team unready for free monads, foisting it upon them is unlikely to lead to anything good.
The following story may illustrate the problem. It's not about free monads, but rather about Dependency Injection (DI) Containers. A specific one, actually. I was consulting with a team, working as a temporary lead developer for about half a year. I made a number of technical decisions, some of them good and others not so much. Among the latter was the decision to use the Castle Windsor DI Container.
Not that Castle Windsor was a bad library. After having done the research and written a book, I considered it the best DI Container available.
The problem, rather, was that the rest of the team considered it 'automagical'. Not that they weren't sophisticated programmers, but they had no interest in learning the Castle Windsor API. They didn't consider it part of their core work, and they didn't think it provided enough benefit compared to the maintenance burden it imposed.
They humoured me as long as I stayed on, but also explicitly told me that they'd rip it out as soon as I was gone. In the meantime, I became the Castle Windsor bottleneck. Everything related to that piece of technology had to go through me, since I was the only person who understood how it worked.
A few years later, I returned to that team and that code base on a new project, and they'd kept their word. They'd removed Castle Windsor in favour of Pure DI, bless them.
This experience was crucial in making me realize that, despite their intellectual attraction, DI Containers are rarely the correct choice.
Free monads look to me as though they belong to the same category. While I don't have a similar story to tell, I'm wary of recommending them unless other options are played out. I'll refer you to the decision flowchart in the article F# free monad recipe.
Language support #
Another concern about free monads is whether your language of choice supports them. They fit perfectly in Haskell, which is also the reason I start this sub-series of articles with a Haskell example.
- Song recommendations with Haskell free monads
- Song recommendations with F# free monads
- Song recommendations with C# free monads
In F# you'll have to do some more legwork, but once you've added the required infrastructure, the 'user code' based on free monads is perfectly fine.
C#, on the other hand, doesn't have all the language features that will make free monads a good experience. I wouldn't suggest using a free monad in C#. I include the article with the C# free monad for demonstration purposes only.
These articles will not introduce free monads to novice readers. For a gentler introduction, see the article series Pure interactions.
Conclusion #
When all else fails, and you just must write pure functions, reach for free monads. They are not for everyone, or every language, but they do offer a functional solution to problems with large data sets or rich interaction with users or the environment.
Testing races with a synchronizing Decorator
Synchronized database reads for testing purposes.
In a previous article, you saw how to use a slow Decorator to test for race conditions. Towards the end, I discussed how that solution is only near-deterministic. In this article, I discuss a technique which is, I think, properly deterministic, but unfortunately less elegant.
In short, it works by letting a Decorator synchronize reads.
The problem #
In the previous article, I used words to describe the problem, but really, I should be showing, not telling. Here's a variation on the previous test that exemplifies the problem.
[Fact]
public async Task NoOverbookingRace()
{
    var date = DateTime.Now.Date.AddDays(1).AddHours(18.5);
    using var service = new RestaurantService();
    using var slowService = from repo in service
                            select new SlowReservationsRepository(
                                TimeSpan.FromMilliseconds(100),
                                repo);

    var task1 = slowService.PostReservation(new ReservationDtoBuilder()
        .WithDate(date)
        .WithQuantity(10)
        .Build());
    await Task.Delay(TimeSpan.FromSeconds(1));
    var task2 = slowService.PostReservation(new ReservationDtoBuilder()
        .WithDate(date)
        .WithQuantity(10)
        .Build());
    var actual = await Task.WhenAll(task1, task2);

    Assert.Single(
        actual,
        msg => msg.StatusCode == HttpStatusCode.InternalServerError);
    var ok = Assert.Single(actual, msg => msg.IsSuccessStatusCode);

    // Check that the reservation was actually created:
    var resp = await service.GetReservation(ok.Headers.Location);
    resp.EnsureSuccessStatusCode();
    var reservation = await resp.ParseJsonContent<ReservationDto>();
    Assert.Equal(10, reservation.Quantity);
}
Apart from a single new line of code, this is identical to the test shown in the previous article. The added line is the Task.Delay
between task1
and task2
.
What's the point of adding this delay there? Only to demonstrate a problem. That's one of the situations I described in the previous article: Even though the test starts both tasks without awaiting them, they aren't guaranteed to run in parallel. Both start as soon as they're created, so task2
is going to be ever so slightly behind task1
. What happens if there's a delay between the creation of these two tasks?
Here I've explicitly introduced such a delay for demonstration purposes, but such a delay could happen on a real system for a number of reasons, including garbage collection, thread starvation, the OS running a higher-priority task, etc.
Why might that pause matter?
Because it may produce false negatives. Imagine a situation where there's no transaction control; where there's no TransactionScope around the database interactions. If the pause is long enough, the tasks effectively run in sequence (instead of in parallel), in which case the system correctly rejects the second attempt.
This is even when using the SlowReservationsRepository
Decorator.
How long does the pause need to be before this happens?
As described in the previous article, with a configured delay of 100 ms for the SlowReservationsRepository
, creating a new reservation is delayed by 300 ms. This bears out. Experimenting on my own machine, if I change that explicit, artificial delay to 300 ms, and remove the transaction control, the test sometimes fails, and sometimes passes. With the above one-second delay, the test always passes, even when transaction control is missing.
You could decide that a 300 ms pause at just the worst possible time is so unlikely that you're willing to simply accept those odds. I would probably be, too. Still, what to test, and what not to test is a function of context. You may find yourself in a context where that's not good enough. What other options are there?
Synchronizing Decorator #
What you really need to reproduce the race condition is to synchronize the database reads. If you could make sure that the Repository only returns data when enough reads have been performed, you can deterministically reproduce the problem.
Again, start with a Decorator. This time, build into it a way to synchronize reads.
internal sealed class SynchronizedReaderRepository : IReservationsRepository
{
    private readonly CountdownEvent countdownEvent = new CountdownEvent(2);

    public SynchronizedReaderRepository(IReservationsRepository inner)
    {
        Inner = inner;
    }

    public IReservationsRepository Inner { get; }
Here I've used a CountdownEvent object to ensure that reads only progress when the countdown reaches zero. It's possible that more appropriate threading APIs exist, but this serves well as a proof of concept.
The method you need to synchronize is ReadReservations
, so you can leave all the other methods to delegate to Inner
. Only ReadReservations
is special.
public async Task<IReadOnlyCollection<Reservation>> ReadReservations(
    int restaurantId, DateTime min, DateTime max)
{
    var result = await Inner
        .ReadReservations(restaurantId, min, max);
    countdownEvent.Signal();
    countdownEvent.Wait();
    return result;
}
This implementation also starts by delegating to Inner
, but before it returns the result
, it signals the countdownEvent
and blocks the thread by waiting on the countdownEvent
. Only when both threads have signalled it does the counter reach zero, and the methods may proceed.
If we assume that, while the test is running, no other calls to ReadReservations
are made, this guarantees that both threads receive the same answer. Both competing threads will therefore conclude that they can accept the reservation. If no transaction control is in place, the system will overbook the requested time slot.
Testing with the synchronizing Repository #
The test that uses SynchronizedReaderRepository
is almost identical to the previous test.
[Fact]
public async Task NoOverbookingRace()
{
    var date = DateTime.Now.Date.AddDays(1).AddHours(18.5);
    using var service = new RestaurantService();
    using var syncedService =
        service.Select(repo => new SynchronizedReaderRepository(repo));

    var task1 = syncedService.PostReservation(new ReservationDtoBuilder()
        .WithDate(date)
        .WithQuantity(10)
        .Build());
    var task2 = syncedService.PostReservation(new ReservationDtoBuilder()
        .WithDate(date)
        .WithQuantity(10)
        .Build());
    var actual = await Task.WhenAll(task1, task2);

    Assert.Single(
        actual,
        msg => msg.StatusCode == HttpStatusCode.InternalServerError);
    var ok = Assert.Single(actual, msg => msg.IsSuccessStatusCode);

    // Check that the reservation was actually created:
    var resp = await service.GetReservation(ok.Headers.Location);
    resp.EnsureSuccessStatusCode();
    var reservation = await resp.ParseJsonContent<ReservationDto>();
    Assert.Equal(10, reservation.Quantity);
}
Contrary to using the slow Repository, this test doesn't allow false negatives. If transaction control is missing from the System Under Test (SUT), this test fails. And it passes when transaction control is in place.
Disadvantages #
That sounds great, so why not just do this, instead of using a delaying Decorator? Because, as usual, there are trade-offs involved. This kind of solution comes with some disadvantages that are worth taking into account.
In short, this could make the test more fragile. As shown above, SynchronizedReaderRepository
makes a specific assumption. It assumes that it needs to synchronize exactly two parallel readers. One problem with this is that this may be coupled to exactly one test. If you had other tests, you'd need to write a new Decorator, or generalize this one in some way.
Another problem is that this makes the test sensitive to changes in the SUT. What if a code change introduces a new call to ReadReservations
? If so, the countdownEvent
may unblock the threads too soon. One such change may be that the SUT decides to also query for surrounding time slots. You might be able to make SynchronizedReaderRepository
robust against such changes by keeping a dictionary of synchronization objects (such as the above CountdownEvent
) per argument set, but that clearly complicates the implementation.
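A hedged sketch of that idea might look like the following. This is an assumption for illustration, not code from the example repository, and it presupposes the usual using directives for System.Threading and System.Collections.Concurrent.

// Sketch: one CountdownEvent per distinct argument set, so that only readers
// asking the same question synchronize with each other.
private readonly ConcurrentDictionary<(int, DateTime, DateTime), CountdownEvent>
    countdownEvents =
        new ConcurrentDictionary<(int, DateTime, DateTime), CountdownEvent>();

public async Task<IReadOnlyCollection<Reservation>> ReadReservations(
    int restaurantId, DateTime min, DateTime max)
{
    var result = await Inner.ReadReservations(restaurantId, min, max);
    var countdownEvent = countdownEvents.GetOrAdd(
        (restaurantId, min, max),
        _ => new CountdownEvent(2));
    countdownEvent.Signal();
    countdownEvent.Wait();
    return result;
}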
And even so, it doesn't protect against identical 'double reads', even though these may be less likely to happen.
This Decorator is also vulnerable to caching. If you have a read-through cache that wraps around SynchronizedReaderRepository
, only the first query may get to it, which would then cause it to block forever. Perhaps, again, you could fix this with the Wait overload that takes a timeout value.
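For illustration, here's a hedged sketch of that option; it isn't code from the example repository, and the ten-second timeout is an arbitrary choice.

public async Task<IReadOnlyCollection<Reservation>> ReadReservations(
    int restaurantId, DateTime min, DateTime max)
{
    var result = await Inner.ReadReservations(restaurantId, min, max);
    countdownEvent.Signal();
    // Wait returns false if the timeout elapses before the count reaches
    // zero; in that case, give up on synchronizing rather than block forever.
    countdownEvent.Wait(TimeSpan.FromSeconds(10));
    return result;
}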
That said, if you cache reads, the pessimistic locking that TransactionScope uses isn't going to work. You could, perhaps, address that concern with optimistic concurrency, but that comes with its own problems.
Conclusion #
You can address race conditions in various ways, but synchronization has been around for a long time. Not only can you use synchronization primitives and APIs to make your code thread-safe, you can also use them to deterministically reproduce race conditions, or to test that such a bug is no longer present in the system.
I don't want to claim that this is universally possible, but if you run into such problems, it's at least worth considering if you could take advantage of synchronization to reproduce a problem.
Of course, the implication is that you understand what the problem is. This is often the hardest part of dealing with race conditions, and the ideas described in this article don't help with that.
Song recommendations with F# agents
MailboxProcessors as small Recawr Sandwiches.
This article is part of a series named Alternative ways to design with functional programming. As the title implies, over a multitude of articles, I present various alternatives for applying functional programming to a particular problem. When I present the Impureim Sandwich design pattern, the most common reaction is: What if you need to make additional, impure reads in the middle of an algorithm?
This article series looks at alternatives when this (in my experience rare) requirement seems inescapable. A previous article outlined a general alternative: Use some kind of pipes-and-filters architecture to turn an undisciplined Transaction Script into a composition of 'filters', where each filter is a self-contained Recawr Sandwich.
Depending on the specific technology choices you make, you may encounter various terminology related to this kind of architecture. Filters may also be called actors, message handlers, or, as is the case in this article, agents. For a consistent pattern language, see Enterprise Integration Patterns.
The code shown here is taken from the fsharp-agents branch of the example code Git repository.
Async instead of Task #
The F# base library comes with a class called MailboxProcessor, often called an agent. It's an in-memory message handler that can run in the background of other tasks, pulling messages off an internal queue one at a time.
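If you've never seen an agent before, the following minimal example may help. It's hypothetical and unrelated to the song-recommendation code.

// A minimal agent: it keeps a running total of the integers posted to it, and
// replies with the current total when asked.
type CounterMessage =
    | Add of int
    | GetTotal of AsyncReplyChannel<int>

let counter = MailboxProcessor.Start (fun inbox ->
    let rec loop acc = async {
        let! message = inbox.Receive ()
        match message with
        | Add i -> return! loop (acc + i)
        | GetTotal reply ->
            reply.Reply acc
            return! loop acc }
    loop 0)

counter.Post (Add 40)
counter.Post (Add 2)
printfn "%i" (counter.PostAndReply GetTotal) // Prints 42.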
The MailboxProcessor class has been around for such a long time that its API is based on Async<T>, which precedes the now-ubiquitous Task<TResult>. While conversions exist, I thought it'd make the example code simpler if I first redefined the SongService
interface to return Async
-based results.
type SongService =
    abstract GetTopListenersAsync : songId : int -> Async<IReadOnlyCollection<User>>
    abstract GetTopScrobblesAsync : userName : string -> Async<IReadOnlyCollection<Scrobble>>
Keep in mind that, despite lack of the idiomatic I
prefix, this is an interface, not an abstract class. (It would have had to have the [<AbstractClass>]
attribute to be an abstract class.)
This is a minor change that only affected a few lines of code where I had to change from task expressions to async expressions.
Gather own scrobbles #
As was also the case in the previous article, we may as well start at the beginning of the algorithm. Given a user name, we'd like to find that user's top scrobbles. Once we've found them, we'd like to post them to an agent. Since F# agents are message-based, we must define an appropriate message type.
type private ScrobbleMessage = Scrobble of Scrobble | EndOfStream
If you recall the previous article, with Reactive Extensions for .NET, you can use the OnCompleted method to signal the end of a stream. Such a method isn't available for an F# agent, because an agent is a consumer of messages, rather than a stream of values. For that reason, a ScrobbleMessage
may either be a Scrobble
or an EndOfStream
.
With the message definition in place, you can define the first step of the algorithm like this:
// string -> SongService -> MailboxProcessor<ScrobbleMessage> -> Async<unit>
let private gatherOwnScrobbles
    userName
    (songService : SongService)
    (channel : MailboxProcessor<_>) = async {
    // Impure
    let! scrobbles = songService.GetTopScrobblesAsync userName

    // Pure
    let scrobblesSnapshot =
        scrobbles
        |> Seq.sortByDescending _.ScrobbleCount
        |> Seq.truncate 100
        |> Seq.map Scrobble

    // Impure
    Seq.iter channel.Post scrobblesSnapshot
    channel.Post EndOfStream }
The gatherOwnScrobbles
action isn't itself an agent. Rather, it takes one as input, to which it posts messages. Notice that the operation returns an Async<unit>
value. In other words, it doesn't really return anything, but rather posts messages to the injected channel
.
Like in the previous article, once all scrobbles are posted, gatherOwnScrobbles
indicates the end of the stream by posting a final EndOfStream
message. This still works in this implementation, since F# agents (as far as I've been able to ascertain) handle messages in order. If you're using a distributed messaging framework based on a message bus, and possibly handlers running on multiple machines, you can't always assume this to be the case. As I wrote in Song recommendations with pipes and filters, you'll need to extrapolate from both this and the previous article in such cases. This is where a pattern language as presented in Enterprise Integration Patterns may come in handy. Perhaps you need a Message Sequence in this case.
Be that as it may, the gatherOwnScrobbles
action is a small Recawr Sandwich with clearly delineated steps.
The natural next step is to implement a MailboxProcessor that can receive those scrobble messages.
Gather other listeners #
To handle the messages posted by gatherOwnScrobbles
we'll create an agent. This one, however, isn't going to complete the algorithm. Rather, it's going to publish even more messages that yet another agent may deal with. For that, we need another message type:
type private UserMessage = User of User | EndOfStream
We see what starts to look like a pattern: a 'payload' case, and a case to indicate that no more messages will be coming.
The following action creates an agent that handles scrobble messages and publishes user messages:
// SongService -> MailboxProcessor<UserMessage> -> MailboxProcessor<ScrobbleMessage>
let private gatherOtherListeners
    (songService : SongService)
    (channel : MailboxProcessor<_>) =
    MailboxProcessor.Start <| fun inbox ->
        let rec loop () = async {
            let! message = inbox.Receive ()
            match message with
            | Scrobble scrobble ->
                // Impure
                let! otherListeners =
                    songService.GetTopListenersAsync scrobble.Song.Id

                // Pure
                let otherListenersSnapshot =
                    otherListeners
                    |> Seq.filter (fun u -> u.TotalScrobbleCount >= 10_000)
                    |> Seq.sortByDescending _.TotalScrobbleCount
                    |> Seq.truncate 20
                    |> Seq.map User

                // Impure
                Seq.iter channel.Post otherListenersSnapshot
                return! loop ()
            | ScrobbleMessage.EndOfStream -> channel.Post EndOfStream }
        loop ()
If you can look past some of the infrastructure required to initialize and implement the agent (MailboxProcessor
), the main message handler is, once again, a small Recawr Sandwich. The other case in the match
expression maps one EndOfStream
case to another EndOfStream
case. Notice that this case does not recursively call loop
. This means that once the agent receives an EndOfStream
message, it stops all further message processing.
You may have noticed that the loop
starts with an 'unmarked' impure step to receive a message. Once a message arrives, it matches on the message. You may argue that there seems to be more than one impure step in the sandwich, but as I've previously outlined, sometimes a sandwich has more than three layers.
I could have compressed the code that receives and dispatches the message to a single line of code:
match! inbox.Receive () with
I felt, however, that for readers who aren't familiar with F# agents, it would help to instead make things more explicit by having a named message
value in the code. It's an example of using an explicit variable for readability purposes.
A third agent, created by gatherOtherScrobbles
, handles the messages published by gatherOtherListeners
by publishing even more song messages. We'll get back to that message type in a moment, but the agent looks similar to the one shown above. You may consult the Git repository if you're curious about the details.
Collecting the recommendations #
The final agent is a bit different, because it needs to do two things:
- Handle song messages
- Return the recommendations once they're ready
Because of that extra responsibility, the message type isn't a two-way discriminated union. Instead, it has a third case that we haven't yet seen.
type private SongMessage =
    | Song of Song
    | EndOfStream
    | Fetch of AsyncReplyChannel<Option<IReadOnlyCollection<Song>>>
The Song
and EndOfStream
cases are similar to what we've already seen. These are the two messages that the gatherOtherScrobbles
agent publishes.
What does the third case, Fetch
, do? It looks odd, with its AsyncReplyChannel
payload. In a moment, you'll see how it's used, but essentially, this is how F# agents support the Request-Reply pattern. Let's see it all in action:
// unit -> MailboxProcessor<SongMessage>
let private collectRecommendations () = MailboxProcessor.Start <| fun input ->
    let rec loop recommendations isDone = async {
        let! message = input.Receive ()
        match message with
        | Song song -> return! loop (song :: recommendations) false
        | SongMessage.EndOfStream ->
            let recommendations =
                recommendations
                |> List.sortByDescending _.Rating
                |> List.truncate 200
            return! loop recommendations true
        | Fetch replyChannel ->
            if isDone
            then replyChannel.Reply (Some recommendations)
            else replyChannel.Reply None
            return! loop recommendations isDone }
    loop [] false
The purpose of this agent is to collect all the songs that the previous agent recommended. Once it receives an EndOfStream
message, it sorts the songs and keeps only the top 200.
Note that the recursive loop
action takes two parameters, recommendations
and isDone
. The loop starts with an empty song list and the flag set to false
. When a new Song
arrives, the loop prepends the song onto the song list and recurses. Notice that in that case, the flag remains false
.
Only when an EndOfStream
message arrives does the agent calculate the final recommendations. Afterwards, it recursively calls loop
with the flag raised (set to true
). Notice, however, that the agent doesn't stop handling messages, like the other agents do when encountering an EndOfStream
message.
At any time during execution, a Fetch
message may arrive. This is a request to return the recommendations, if they're ready. In that case, the recommendations
are wrapped in a Some
case and returned. If the recommendations are not yet ready, None
is returned instead.
This enables the overall, blocking method to poll for the recommendations until they are ready. You'll see how this works in a moment.
Polling for results #
The MailboxProcessor
class defines a PostAndAsyncReply
method that does, indeed, fit the Fetch
case of the above SongMessage
type. This enables us to implement a polling mechanism like this:
let rec private poll (agent : MailboxProcessor<_>) = task { match! agent.PostAndAsyncReply Fetch with | Some result -> return result | None -> return! poll agent }
This recursive action uses PostAndAsyncReply
to keep polling its agent until it receives a useful reply. Since this code is mostly meant for illustration purposes, I've allowed myself a few shortcuts.
First, this effectively implements a busy loop. Whenever it receives a None
reply, it immediately recurses to try again. A more reasonable implementation may put a small delay there, but I think that finding the optimal delay time may be a matter of experimentation. After all, if you're concerned with performance, race your horses. Given that this is demo code, I don't have any real horses to race, so I'm not going to try. From observation, however, it doesn't seem as though the tests, at least, run any slower despite the tight loop.
Secondly, the poll
loop keeps going until it receives a useful response. What if that never happens? A more robust implementation should implement some kind of timeout or ceiling that enables it to give up if it's been running for too long.
Apart from all that, how does the poll
action even type-check? On a MailboxProcessor<'Msg>
object, the PostAndAsyncReply
method has this type:
(AsyncReplyChannel<'Reply> -> 'Msg) -> Async<'Reply>
ignoring an optional timeout parameter.
The above Fetch
case constructor fits the type of PostAndAsyncReply
, since it has the type
AsyncReplyChannel<Option<IReadOnlyCollection<Song>>> -> SongMessage
This means that we can infer 'Reply
to be Option<IReadOnlyCollection<Song>>
. It also means that 'Msg
must be SongMessage
, and again we can infer that the agent
parameter has the type MailboxProcessor<SongMessage>
.
Composition #
With all components ready, we can now compose them as a blocking method. Notice that, in the following, the GetRecommendationsAsync
method hasn't changed type or observable behaviour. The change from Task
to Async
(described above) required some trivial changes to FakeSongService
, but apart from that, I had to change no test code to make this refactoring.
As a first attempt, we may compose the agents using the idiomatic left-to-right pipeline operator, like this:
type RecommendationsProvider (songService : SongService) =
    member _.GetRecommendationsAsync userName =
        let collect = collectRecommendations ()
        task {
            do! collect
                |> gatherOtherScrobbles songService
                |> gatherOtherListeners songService
                |> gatherOwnScrobbles userName songService
            return! poll collect }
First, the task
expression starts all the agents, and then proceeds to poll
the collect
agent until it arrives at a result.
This passes all tests, but has at least two problems. One problem is that the composition seems backwards. It looks as though the process starts with collect
, then proceeds to gatherOtherScrobbles
, and so on. In reality, it's the other way around. You should really understand the composition as being defined 'bottom-up', or right-to-left, if we put it on a single line. We'll return to the other problem in a moment, but let's first see if we can do something about this one.
My first attempt to fix this problem was to try to use the reverse pipeline operator <|
, but due to precedence rules, it didn't work without parentheses. And if we need parentheses anyway, there's no reason to use the reverse pipeline operator.
type RecommendationsProvider (songService : SongService) =
    member _.GetRecommendationsAsync userName =
        let collect = collectRecommendations ()
        task {
            do! gatherOwnScrobbles userName songService (
                gatherOtherListeners songService (
                gatherOtherScrobbles songService (
                collect)))
            return! poll collect }
This composition uses a slightly unorthodox code formatting style. Since collect
is really nested inside of gatherOtherScrobbles
, it should really have been indented to the right of it. Likewise, gatherOtherScrobbles
is nested inside of gatherOtherListeners
, and so on. A more idiomatic formatting of the code might be something like this:
type RecommendationsProvider (songService : SongService) =
    member _.GetRecommendationsAsync userName =
        let collect = collectRecommendations ()
        task {
            do! gatherOwnScrobbles userName songService (
                    gatherOtherListeners songService (
                        gatherOtherScrobbles songService collect))
            return! poll collect }
This, however, blurs the semantics of the composition in favour of the mechanics of it. I don't consider it an improvement.
All of this, however, turns out to be moot because of the other problem. The MailboxProcessor
class implements IDisposable, and to be good citizens, we ought to dispose of the objects once we're done with them. This is possible, but we're now back to the backwards order of composition.
type RecommendationsProvider (songService : SongService) =
    member _.GetRecommendationsAsync userName = task {
        use collect = collectRecommendations ()
        use otherScrobbles = gatherOtherScrobbles songService collect
        use otherListeners = gatherOtherListeners songService otherScrobbles
        do! gatherOwnScrobbles userName songService otherListeners
        return! poll collect }
This may not be an entirely unsolvable problem, but this is where I'm no longer interested in pursuing this line of inquiry much further. Instead of those 'factory actions' that create and return agents, you could refactor each agent into a separate object that, when disposed of, also disposes of any inner agents it may contain. If so, you could again compose these objects as shown above, and only dispose of the outer object.
Evaluation #
These various attempts to make the agents compose nicely, in a way that also works as self-documenting code, reveal that F# agents aren't as composable as ReactiveX. To be fair, the MailboxProcessor
class also predates Reactive Extensions, so we shouldn't blame it for not being as good as a more recent technology.
One major problem is that agents don't compose naturally, like IObservable<T> does. I briefly considered whether it'd be possible to make MailboxProcessor<'Msg>
a monad, but armed with the knowledge of variance imparted by Thinking with Types, I quickly realized that the type is invariant. This is easily seen because one method, Post
, is contravariant in 'Msg
, whereas most other methods are covariant. I'm using deliberately vague language, since there's no reason to calculate the kind of variance for all methods when you've already found two incompatible members.
Another fly in the ointment is that the collectRecommendations
action looks messy. As presented, it's not a pretty Impureim Sandwich. Most of the 'middle' message handling could be extracted to a pure function, were it not for the Fetch
case. Calling replyChannel.Reply
has a side effect, and while I know of refactorings that move side effects around, I'd need access to the replyChannel
in order to impart that effect from somewhere else. This would still be possible if I returned an action to be invoked, but I don't see much point in that. In general, that request-reply API doesn't strike me as particularly functional.
Based on all that griping, you may be wondering whether this kind of architecture is worth the trouble. Keep in mind, though, that some of the issues I just outlined are the result of the particular MailboxProcessor
API. It's not a given that if you use some other message-based framework, you'll run into the same issues.
You may also regard the notion of posting messages to a chain of agents, only to poll one of them for the result, as carrying coals to Newcastle. Keep in mind, however, that the code presented here refactors a blocking method call that apparently takes about ten minutes to run to completion. It's possible that I read too much into the situation, but I'm guessing that the 'real' code base that was the inspiration for the example code doesn't actually block for ten minutes in order to return a result. Rather, I still speculate, it's probably a background batch job that produces persisted views; e.g. as JSON files. If so, you'd really just want to trigger a new batch job and let it run to completion in the background. In such a scenario, I'd find an asynchronous, message-based architecture suitable for the job. In that case, you'd need no polling loop. Rather, you serve a persisted view whenever anyone asks for it, and once in a while, that persisted view has been updated by the background process.
Conclusion #
Compared to the previous article, which used Reactive Extensions to compose self-contained Recawr Sandwiches, using F# agents is a move towards a more standard kind of message-based architecture. Hopefully, it does a good enough job of illustrating how you can refactor an impure action into a composition of individual sandwiches, even if some of the details here are particular to F# agents.
It's not necessarily always the best solution to the underlying problem being addressed in this article series, but it seems appropriate if the problem of large data sets is combined with long running time. If you can convert the overall problem to a fire-and-forget architecture, a message-based system may be suitable.
If, on the other hand, you need to maintain the blocking nature of the operation, you may need to reach for the big, universal hammer.
Song recommendations with C# Reactive Extensions
Observables as small Recawr Sandwiches.
This article is part of a series titled Alternative ways to design with functional programming. In the previous article in the series, you read some general reflections on a pipes-and-filters architecture. This article gives an example in C#.
The code shown here is from the rx branch of the example code Git repository. As the name implies, it uses ReactiveX, also known as Reactive Extensions for .NET.
To be honest, when refactoring an algorithm that's originally based on sequences of data, there's nothing particularly reactive going on here. You may, therefore, argue that it's a poor fit for this kind of architecture. Be that as it may, keep in mind that the example code in reality runs for about ten minutes, so moving towards an architecture that supports progress reporting or cancellation may not be entirely inappropriate.
The advantage of using Reactive Extensions for this particular example is that, compared to full message-bus-based frameworks, it offers a lightweight developer experience, which enables us to focus on the essentials of the architecture.
Handle own scrobbles #
We'll start with the part of the process that finds the user's own top scrobbles. Please consult with Oleksii Holub's original article, or my article on characterizing the implementation, if you need a refresher.
When using Reactive Extensions, should we model this part as IObservable<T> or IObserver<T>?
Once you recall that IObservable<T>
, being a monad, is eminently composable, the choice is clear. The IObserver<T>
interface, on the other hand, gives rise to a contravariant functor, but since that abstraction has weaker language support, we should go with the monad.
We can start by declaring the type and its initializer:
public sealed class HandleOwnScrobblesObservable : IObservable<Scrobble>
{
    private readonly string userName;
    private readonly SongService _songService;

    public HandleOwnScrobblesObservable(string userName, SongService songService)
    {
        this.userName = userName;
        _songService = songService;
    }

    // Implementation goes here...
Given a userName
we want to produce a (finite) stream of scrobbles, so we declare that the class implements IObservable<Scrobble>
. An instance of this class is also going to need the songService
, so that its implementation can query the out-of-process system that holds the data.
What's that, you say? Why does _songService
have a leading underscore, while the userName
field does not? Because Oleksii Holub used that naming convention for that service, but I don't feel obliged to stay consistent with that.
Given that we already have working code, the implementation is relatively straightforward.
public IDisposable Subscribe(IObserver<Scrobble> observer)
{
    return Observable.Create<Scrobble>(Produce).Subscribe(observer);
}

private async Task Produce(IObserver<Scrobble> obs)
{
    // Impure
    var scrobbles = await _songService.GetTopScrobblesAsync(userName);

    // Pure
    var scrobblesSnapshot = scrobbles
        .OrderByDescending(s => s.ScrobbleCount)
        .Take(100);

    // Impure
    foreach (var scrobble in scrobblesSnapshot)
        obs.OnNext(scrobble);
    obs.OnCompleted();
}
If you are unused to Reactive Extensions, the hardest part may be figuring out how to implement Subscribe
without getting bogged down with having to write too much of the implementation. Rx is, after all, a reusable library, so it should come with some building blocks for that, and it does.
It seems that the simplest way to implement Subscribe
is to delegate to Observable.Create
, which takes an expression as input. You can write the implementation inline as a lambda expression, but here I've used a private
helper method to slightly decouple the implementation from the library requirements.
The first impure step is the same as we've already seen in the 'reference implementation', and the pure step should be familiar too. In the final impure step, the Produce
method publishes the scrobbles to any subscribers that may be listening.
This is the step where you'll need to extrapolate if you want to implement this kind of architecture on another framework than Reactive Extensions. If you're using a distributed message-based framework, you may have a message bus on which you publish messages, instead of obs
. So obs.OnNext
may, instead, be bus.Publish
, or something to that effect. You may also need to package each scrobble in an explicit message object, and add correlation identifier and such.
In many message-based frameworks (NServiceBus, for example), you're expected to implement some kind of message handler where messages arrive at a Handle
method, typically on a stateless, long-lived object. This enables you to set up robust, distributed systems, but also comes with some overhead that requires you to coordinate or correlate messages.
In this code example, userName
is just a class field, and once the object is done producing messages, it signals so with obs.OnCompleted()
, after which the stream has ended.
The Rx implementation is simpler than some message-based systems I just outlined. That's the reason I chose it for this article. It doesn't mean that it's better, because that simplicity comes at the expense of missing capabilities. This system has no persistence, and while I'm no expert in this field, I don't think it easily expands to a distributed system. And again, to perhaps belabour the obvious, I'm not insisting that any of those capabilities are always needed. I'm only trying to outline some of the trade-offs you should be aware of.
Handle other listeners #
HandleOwnScrobblesObservable
objects publish Scrobble
objects. What does the next 'filter' look like? It's another observable stream, implemented by a class called HandleOtherListenersObservable
. It implements IObservable<User>
, and its class declaration, constructor, and Subscribe
implementation look a lot like what's already on display above. The main difference is the Produce
method.
private async Task Produce(IObserver<User> obs)
{
    // Impure
    var otherListeners = await _songService
        .GetTopListenersAsync(scrobble.Song.Id);

    // Pure
    var otherListenersSnapshot = otherListeners
        .Where(u => u.TotalScrobbleCount >= 10_000)
        .OrderByDescending(u => u.TotalScrobbleCount)
        .Take(20);

    // Impure
    foreach (var otherListener in otherListenersSnapshot)
        obs.OnNext(otherListener);
    obs.OnCompleted();
}
Compared to the reference architecture, this implementation is hardly surprising. The most important point to make is that, as was the goal all along, this is another small Recawr Sandwich.
A third observable, HandleOtherScrobblesObservable
, handles the third step of the algorithm, and looks much like HandleOtherListenersObservable
. You can see it in the Git repository.
Composition #
The three observable streams constitute most of the building blocks required to implement the song recommendations algorithm. Notice the 'fan-out' nature of the observables. The first step starts with a single userName
and produces up to 100 scrobbles. To handle each scrobble, a new instance of HandleOtherListenersObservable
is required, and each of those produces up to twenty User
notifications, and so on.
In the abstract, we may view the HandleOwnScrobblesObservable
constructor as a function from string
to IObservable<Scrobble>
. Likewise, we may view the HandleOtherListenersObservable
constructor as a function that takes a single Scrobble
as input, and gives an IObservable<User>
as return value. And finally, HandleOtherScrobblesObservable
takes a single User
as input to return IObservable<Song>
as output.
Quick, what does that look like, and how do you compose them?
Indeed, those are Kleisli arrows, but in practice, we use monadic bind to compose them. In C# this usually means SelectMany
.
public async Task<IReadOnlyList<Song>> GetRecommendationsAsync(string userName)
{
    // 1. Get user's own top scrobbles
    // 2. Get other users who listened to the same songs
    // 3. Get top scrobbles of those users
    // 4. Aggregate the songs into recommendations
    var songs = await new HandleOwnScrobblesObservable(userName, _songService)
        .SelectMany(s => new HandleOtherListenersObservable(s, _songService))
        .SelectMany(u => new HandleOtherScrobblesObservable(u, _songService))
        .ToList();

    return songs
        .OrderByDescending(s => s.Rating)
        .Take(200)
        .ToArray();
}
The fourth step of the algorithm, you may notice, isn't implemented as an observable, but rather as a standard LINQ pipeline. This is because sorting is required, and if there's a way to sort an observable, I'm not aware of it. After all, given that observable objects may represent infinite data streams, I readily accept that there's no OrderByDescending
method on IObservable<T>
. (But then, the System.Reactive library defines Min
and Max
operations, and exactly how those work when faced with infinite streams, I haven't investigated.)
While I could have created a helper function for that small OrderByDescending
/Take
/ToArray
pipeline, I consider it under the Fairbairn threshold.
Query syntax #
You can also compose the algorithm using query syntax, which I personally find prettier.
IObservable<Song> obs = from scr in new HandleOwnScrobblesObservable(userName, _songService) from usr in new HandleOtherListenersObservable(scr, _songService) from sng in new HandleOtherScrobblesObservable(usr, _songService) select sng; IList<Song> songs = await obs.ToList(); return songs .OrderByDescending(s => s.Rating) .Take(200) .ToArray();
In this code snippet I've used explicit variable type declarations (instead of using the var
keyword) for the sole purpose of making it easier to see which types are involved.
Conclusion #
This article shows an implementation example of refactoring the song recommendations problem to a pipes-and-filters architecture. It uses Reactive Extensions for .NET, since this showcases the (de)composition in the simplest possible way. Hopefully, you can extrapolate from this to a more elaborate, distributed asynchronous message-based system, if you need something like that.
The next example makes a small move in that direction.
Song recommendations with pipes and filters
Composing small Recawr Sandwiches.
This article is part of a larger series that outlines various alternative ways to design with functional programming. That (first) article contains a table of contents, as well as outlines the overall programme and the running example. In short, the example is a song recommendation engine that works on large data sets.
Previous articles in this series have explored refactoring to a pure function and composing impure actions with combinators. In the next few articles, we'll look at how to use message-based architectures to decouple the algorithm into smaller Recawr Sandwiches. As an overall concept, we may term such an architecture pipes and filters. The book Enterprise Integration Patterns is a great resource for this kind of (de)composition. Don't be misled by the title: In essence, it's neither about enterprise programming nor integration.
Workflows of small sandwiches #
As speculated in Song recommendations as an Impureim Sandwich, the data sets involved when finding song recommendations for a user may be so large that an Impureim Sandwich is impractical.
On the other hand, a message-based system essentially consists of many small message handlers that each may be implemented as Impureim Sandwiches. I've already briefly discussed this kind of decomposition in Pure interactions, where each message handler or 'actor' is a small process equivalent to a web request handler (Controller) or similar.
When it comes to the song recommendation problem, we may consider a small pipes-and-filters architecture that uses small 'filters' to start even more work, handled by other filters, and so on.
Depending on the exact implementation details, you may call this pipes and filters, reactive functional programming, the actor model, map/reduce, or something else. What I consider crucial in this context is that each 'filter' is a small Recawr Sandwich. It queries the out-of-process music data service, applies a pure function to that data, and finally sends more messages over some impure channel. In the following articles, you'll see code examples.
As I'm writing this, I don't plan to supply a Haskell example. The main reason is that I've no experience with writing asynchronous message-based systems with Haskell, and while a quick web search indicates that there are plenty of libraries for reactive functional programming, I can't find much when it comes to message-bus-based asynchronous messaging.
Perhaps it'd be more idiomatic to supply an Erlang example, but it's been too many years since I tried teaching myself Erlang, and I never became proficient with it.
Modelling #
It turns out that when you decompose the song recommendation problem into smaller 'filters', each of which is a Recawr Sandwich, you arrive at a decomposition that looks suspiciously close to what we saw in Song recommendations from combinators. And as Oleksii Holub wrote,
"However, instead of having one cohesive element to reason about, we ended up with multiple fragments, each having no meaning or value of their own. While unit testing of individual parts may have become easier, the benefit is very questionable, as it provides no confidence in the correctness of the algorithm as a whole."
This remains true here. Why even bother, then?
The purpose of this article series is to present alternatives, just like Scott Wlaschin's excellent article Thirteen ways of looking at a turtle. It may be that a given design alternative isn't the absolute best fit for the song recommendation problem, but it may, on the other hand, be a good fit for some other problem that you run into. If so, you should be able to extrapolate from the articles in this series.
Conclusion #
Given that we know that the real code that the song recommendation example is based off runs for about ten minutes, some kind of asynchronous process that may support progress indication or cancellation may be worth considering. This easily leads us in the direction of some kind of pipes-and-filters architecture.
You can implement such an architecture with in-memory message streams, as the next article does, or you can go with a full-fledged messaging system based on persistent message buses and distributed, restartable message handlers. The details vary, but it's essentially the same kind of architecture.
A parser and interpreter for a very small language
A single Haskell script file.
I recently took the final exam in a course on programming language design. One of the questions was about a tiny language, and since this was a take-home exam running over many days, I had time to spare. Although it wasn't part of any questions, I decided to write an interpreter to back up some claims I made in my answers.
This article documents my prototype parser and interpreter.
Language description #
To be clear, the exam question was not to implement an interpreter, but rather some questions about attributes of the language. The description here is reprinted with kind permission from Torben Ægidius Mogensen.
Consider a functional language where values can be Booleans and pairs. A syntax for the language is given below:
Program → Function+
Function → Fid Pattern+ = Exp
Pattern → Vid | true | false | (Pattern, Pattern)
Exp → Vid | true | false | Fid Exp+ | (Exp)
where Fid denotes function identifiers (which are lower case) and Vid denotes variable identifiers (which are upper case). There can be multiple rules for each function, but rules must have disjoint patterns. All function calls must be fully applied (no partial applications, so no higher-order functions). A program is executed by calling any function with any argument constructed by pairs and Booleans. An example program is
and true X = X
and false X = false
alltrue true = true
alltrue false = false
alltrue (X, Y) = and (alltrue X) (alltrue Y)
Calling alltrue (true, (false, true)) will return false, but alltrue ((true, true), (true, true)) will return true.
The exam goes on to ask some questions about termination as a property of the language, and whether or not it's Turing complete, but that's not the scope of this article. Rather, I'd like to describe a prototype parser and interpreter I wrote as a single throwaway script file in Haskell.
Declarations and imports #
The code is a single Haskell module that I interacted with via GHCi (the GHC REPL). It starts with a single pragma, a module declaration, and imports.
{-# LANGUAGE FlexibleContexts #-}
module Bopa where

import Control.Monad.Identity (Identity)
import Data.Bifunctor (first)
import Data.Foldable (find)
import Data.List.NonEmpty (NonEmpty((:|)))
import qualified Data.List.NonEmpty as NE
import Data.Set (Set)
import qualified Data.Set as Set
import Text.Parsec
The parsec API requires the FlexibleContexts
language pragma. The name Bopa I simply derived from Bools and Pairs, although I'm aware that this little combination of letters has quite an alternative connotation for many Danes, including myself.
Apart from the base
library, the packages parsec and containers are required. I didn't use an explicit build system, but if they're not already present on your system, you can ask GHCi to load them.
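If you'd rather declare the dependencies next to the code, a cabal script header along the following lines should also do the trick. It's only a sketch of an alternative I didn't use for this prototype:

{- cabal:
build-depends: base, parsec, containers
-}

A reasonably recent cabal-install can then supply the packages when you run or load the file.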
AST #
The above language description is a context-free grammar, which translates easily into Haskell type declarations.
type Program = NonEmpty Function

data Function = Function { fid :: String, fpats :: NonEmpty Pattern, fbody :: Exp }
                deriving (Eq, Show)

data Pattern = VarPat String | TPat | FPat | PairPat Pattern Pattern deriving (Eq, Show)

data Exp = VarExp String | TExp | FExp | CallExp String (NonEmpty Exp) deriving (Eq, Show)
The description doesn't explicitly state how to interpret the superscript +, but I've interpreted it as meaning one or more. Therefore, a Program
is a NonEmpty list of Function
values. The same line of reasoning applies for the other places where the + sign appears.
Notice that there's more than one representation of Boolean values; TPat
is the true
pattern, while TExp
is the true
expression, and likewise for false
.
These types describe the entire language, and you can, in principle, create programs directly using this API. While I didn't do that (because I wrote a parser instead), here's what the above and
function looks like as an abstract syntax tree (AST):
Function "and" (TPat :| [VarPat "X"]) (VarExp "X") :| [Function "and" (FPat :| [VarPat "X"]) FExp]
Recall that the :|
operator is the NonEmpty
data constructor.
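If NonEmpty is new to you, a quick throwaway probe in GHCi (with the module loaded) should show the constructor at work, producing something like this:

ghci> NE.toList (TPat :| [VarPat "X"])
[TPat,VarPat "X"]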
Source code parsers #
I wanted to be able to write programs directly in the Bopa language, and not just as ASTs, so the next step was writing parsers for each of the data types defined above. As strongly implied by the above imports, I used the parsec package for that.
The Program
type is only an alias, and once I have a parser for Function
, that one should be straightforward. The parser of Function
values is, however, more involved.
functionParser :: Stream s m Char => ParsecT s () m Function
functionParser = do
  fnid <- many1 lower
  let pp = many1 (char ' ') >> patternParser
  -- Next line based on https://stackoverflow.com/a/65570028/126014
  patterns <- (:|) <$> pp <*> pp `manyTill` try (many1 (char ' ') >> char '=')
  skipMany1 (char ' ')
  Function fnid patterns <$> expParser
I readily admit that I don't have much experience with parsec, so it's possible that this could be done more elegantly. As the comment indicates, I struggled somewhat with a detail or two. I had trouble making it consume patterns until it meets the '='
character.
The functionParser
depends on another parser named patternParser
, which again is composed from smaller parsers that handle each case of the Pattern
sum type.
varPatParser :: Stream s m Char => ParsecT s () m Pattern
varPatParser = VarPat <$> many1 upper

tPatParser :: Stream s m Char => ParsecT s () m Pattern
tPatParser = string "true" >> return TPat

fPatParser :: Stream s m Char => ParsecT s () m Pattern
fPatParser = string "false" >> return FPat

pairPatParser :: Stream s m Char => ParsecT s () m Pattern
pairPatParser = do
  _ <- char '('
  p1 <- patternParser
  _ <- char ','
  _ <- skipMany $ char ' '
  p2 <- patternParser
  _ <- char ')'
  return $ PairPat p1 p2

patternParser :: Stream s m Char => ParsecT s () m Pattern
patternParser = try varPatParser <|> try tPatParser <|> try fPatParser <|> try pairPatParser
You might argue that the first three are so simple that they may not really qualify for the status of top-level values, but being a parsec newbie, I found that it helped me to structure the code that way. The only one of those values more complicated than a one-liner is, obviously, pairPatParser
. I later discovered between and sepBy1, so it's possible I could also have defined pairPatParser
as a composition of such combinators. I didn't, however, try, since this is, after all, throwaway prototype code, and what's there already works as intended.
As an aside, I would usually keenly attempt such refactorings, but I was working without automated tests. Yes, shocking, I know, but setting up unit tests for Haskell is, unfortunately, a bit of a hassle, and given the nature of the work, I considered doing without tests a reasonable trade-off.
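For what it's worth, such a between-based variant might look something like the following sketch. I haven't tested it, and it's not in the repository; the pairPatParser' name is only there to distinguish it from the real parser.

pairPatParser' :: Stream s m Char => ParsecT s () m Pattern
pairPatParser' = between (char '(') (char ')') $ do
  p1 <- patternParser
  _ <- char ','
  _ <- skipMany (char ' ')
  PairPat p1 <$> patternParser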
This takes care of parsing Pattern
values, but notice that functionParser
also depends on expParser
, which, not surprisingly, parses Exp
values. Like patternParser
it does that by defining a helper parser for each sum type case, and then combining them into one larger parser.
varExpParser :: Stream s m Char => ParsecT s () m Exp
varExpParser = VarExp <$> many1 upper

tExpParser :: Stream s m Char => ParsecT s () m Exp
tExpParser = string "true" >> return TExp

fExpParser :: Stream s m Char => ParsecT s () m Exp
fExpParser = string "false" >> return FExp

callExpParser :: Stream s m Char => ParsecT s () m Exp
callExpParser = do
  fnid <- many1 lower
  skipMany1 (char ' ')
  exps <- NE.fromList <$> expParser `sepBy1` many1 (char ' ')
  return $ CallExp fnid exps

expParser :: Stream s m Char => ParsecT s () m Exp
expParser =
  try varExpParser <|>
  try tExpParser <|>
  try fExpParser <|>
  try callExpParser <|>
  between (char '(') (char ')') expParser
Even though I generally favoured implementing each sum type case in a separate, named parser, I inlined parsing of the parenthesized expression; partly because it's so simple, and partly because I didn't know what to call it.
You can see that at this point, I'd discovered the between
and sepBy1
combinators.
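If you want to convince yourself that the inlined parenthesized case works, you can exercise expParser directly with parsec's parse function. A quick, illustrative check should produce something like this:

ghci> parse expParser "" "(and X Y)"
Right (CallExp "and" (VarExp "X" :| [VarExp "Y"]))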
Finally, it's possible to compose all these smaller parsers together to a parser of Bopa programs.
programParser :: Stream s m Char => ParsecT s () m (NonEmpty Function) programParser = NE.fromList <$> functionParser `sepEndBy1` many1 endOfLine
This, however, is a parser. How do you run it?
Here's a way:
parseProgram :: Stream s Identity Char => s -> Either ParseError (NonEmpty Function) parseProgram = parse programParser ""
You may, for example, try to parse the above and
function:
ghci> parseProgram "and true X = X\nand false X = false"
Right (Function {fid = "and", fpats = TPat :| [VarPat "X"], fbody = VarExp "X"} :| [Function {fid = "and", fpats = FPat :| [VarPat "X"], fbody = FExp}])
(Output manually formatted to improve readability.)
In practice, however, I didn't much do that. Instead, I created source code files and loaded them with the basic file-reading APIs included in the base
package. You'll see examples of this later.
Arguments #
As described, running a program requires construction of a Boolean value, or pairs of Boolean values, something the language itself does not allow. That's the reason I haven't yet modelled it.
data Arg = TArg | FArg | PairArg Arg Arg deriving (Eq, Ord, Show)
Notice that true
and false
gets yet another representation as either TArg
or FArg
.
If I want to be able to run programs by typing alltrue (true, (false, true))
, instead of painstakingly creating ASTs, I need a parser for this data type as well. That's not going to be a source code parser, but rather part of a command-line parser.
tArgParser :: Stream s m Char => ParsecT s () m Arg
tArgParser = string "true" >> return TArg

fArgParser :: Stream s m Char => ParsecT s () m Arg
fArgParser = string "false" >> return FArg

pairArgParser :: Stream s m Char => ParsecT s () m Arg
pairArgParser = do
  _ <- char '('
  p1 <- argParser
  _ <- char ','
  _ <- skipMany $ char ' '
  p2 <- argParser
  _ <- char ')'
  return $ PairArg p1 p2

argParser :: Stream s m Char => ParsecT s () m Arg
argParser = tArgParser <|> fArgParser <|> pairArgParser
To be honest, I think that I just copied and pasted pairPatParser
and changed a few things. It looks that way, doesn't it?
Entry points #
In order to execute a program, you need more than arguments. You need to define which function to call. I decided that this was close enough to defining a program entry point that it gave name to the next type.
data Entry = Entry String (NonEmpty Arg) deriving (Eq, Show)
The String
value identifies the desired function by name, and the NonEmpty
list supplies the arguments.
Since I wish to be able to run a program by writing e.g. alltrue ((true, true), (true, true))
, I need a parser for that, too.
entryParser :: Stream s m Char => ParsecT s () m Entry
entryParser = do
  fnid <- many1 lower
  skipMany1 (char ' ')
  args <- NE.fromList <$> argParser `sepBy1` many1 (char ' ')
  return $ Entry fnid args
This, again, is a parser; it's convenient to also define a function to run it against input.
parseEntry :: Stream s Identity Char => s -> Either ParseError Entry parseEntry = parse entryParser ""
Let's see if it works:
ghci> parseEntry "alltrue ((true, true), (true, true))"
Right (Entry "alltrue" (PairArg (PairArg TArg TArg) (PairArg TArg TArg) :| []))
That seems promising.
Parameter binding #
Armed with the ability to parse programs as well as entry points, 'all' that remains is to execute the program. To that end, I wrote an interpreter. It works with a few helper functions, the first of which attempts to bind patterns to arguments.
For example, if we have a variable-name pattern such as X
and an argument such as (true, false)
, we can bind X
to that value. Some examples will help, but I'll show the function first, and then talk you through it.
-- Attempt pattern matching and, if possible, bind variables to arguments.
-- Returns an association list of bound variables (an 'environment'), if any.
-- Returns Left with an error message if no match.
tryBind :: NonEmpty Pattern -> NonEmpty Arg -> Either String [(String, Arg)]
tryBind (VarPat p :| []) (arg :| []) = Right [(p, arg)]
tryBind (TPat :| []) (TArg :| []) = Right []
tryBind (FPat :| []) (FArg :| []) = Right []
tryBind (PairPat p1 p2 :| []) ((PairArg a1 a2) :| []) =
  let b1 = tryBind (NE.singleton p1) (NE.singleton a1)
      b2 = tryBind (NE.singleton p2) (NE.singleton a2)
  in (++) <$> b1 <*> b2
tryBind (pat :| (p:ps)) (arg :| (a:as)) =
  let b = tryBind (NE.singleton pat) (NE.singleton arg)
      bs = tryBind (p :| ps) (a :| as)
  in (++) <$> b <*> bs
tryBind _ args = Left ("Could not match " ++ show args ++ ".")
Notice the type declaration: The function takes a NonEmpty
list of Pattern
values, and another NonEmpty
list of Arg
values. The first precondition in order to achieve a successful result is that these two lists need to have the same length. If we have more arguments than patterns, we run out of patterns. If we have more patterns than arguments, we can't bind all the parameters in the patterns, and partial application is not allowed.
The first four rules of the tryBind
function attempt to match a single Pattern
value to a single Arg
value; notice the use of the :|
NonEmpty
data constructor: In all four cases, the tail of the NonEmpty
lists only matches the empty list []
.
The first rule, for example, has a single variable pattern, where p
is the variable name, and a single argument arg
, so that pattern matching succeeds and the variable name is bound to the argument. Here's an example:
ghci> tryBind (VarPat "X" :| []) (PairArg TArg FArg :| [])
Right [("X",PairArg TArg FArg)]
The result is a variable environment in which the variable name X
is bound to the value PairArg TArg FArg
(that is, (true, false)
).
Sometimes, when matching literals, no variables are bound, in which case the environment is empty:
ghci> tryBind (TPat :| []) (TArg :| [])
Right []
While the environment itself is empty, the result is still a Right
case, indicating that the pattern matched the argument. This, of course, need not be the case:
ghci> tryBind (TPat :| []) (FArg :| [])
Left "Could not match FArg :| []."
The rule that attempts to match a pair with a pair argument recursively calls tryBind
for the left and the right element, and then uses the Applicative
nature of Either
to compose those two results.
ghci> tryBind (PairPat TPat (VarPat "Y") :| []) (PairArg TArg FArg :| [])
Right [("Y",FArg)]
In this example, you see how a pair pattern composed of (true, Y)
matches the argument (true, false)
, resulting in the variable environment where Y
is bound to false
.
The final Right
-valued match is when there's more than a single pattern, and more than a single argument. In that case, the function recursively calls itself with the heads of each NonEmpty
list, as well as the tails of each NonEmpty
list.
ghci> tryBind (PairPat TPat (VarPat "Y") :| [VarPat "Z"]) (PairArg TArg FArg :| [PairArg FArg TArg])
Right [("Y",FArg),("Z",PairArg FArg TArg)]
In this example, we try to bind the variables in the patterns (true, Y) Z
with the arguments (true, false) (false, true)
, producing the variable environment where Y
is bound to false
, and Z
is bound to (false, true)
.
This exhausts all the legal bindings, so the final, wildcard pattern in tryBind
returns a Left
value indicating the failure. You've already seen an example of that, above.
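Another way to hit that wildcard is a length mismatch between patterns and arguments, as discussed above. An illustrative, throwaway check should produce something along these lines:

ghci> tryBind (VarPat "X" :| [VarPat "Y"]) (TArg :| [])
Left "Could not match TArg :| []."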
That function is a bit of a mouthful, but fortunately, we've now covered a major part of the interpreter.
Pattern matching #
The tryBind
function attempts to bind a single list of patterns to a list of arguments. A function may, however, list several (non-overlapping) rules, so if the first pattern list doesn't match, the interpreter must try the second, the third, and so on, until there are no more patterns to try. While tryBind
does the heavy lifting, another function goes through the list of rules.
-- Goes through one or more function rules, looking for a match.
-- All the functions in the function list are assumed to have the same name, so
-- that they are all rules of the same function.
-- This precondition is not checked here, but handled by the caller. This isn't
-- the best implementation decision, but this is, after all, a prototype.
tryMatch :: NonEmpty Function -> NonEmpty Arg -> Either [Char] ([(String, Arg)], Exp)
tryMatch (Function _ pats body :| []) args = (, body) <$> tryBind pats args
tryMatch (Function _ pats body :| (f : fs)) args =
  case tryBind pats args of
    Right b -> Right (b, body)
    Left _ -> tryMatch (f :| fs) args
There are two (Haskell) rules for tryMatch
: One where there's only one Function
rule, and one where there's more than one.
In the first case, tryMatch
delegates to tryBind
, but if the binding attempt succeeds, also returns the body.
ghci> tryMatch (Function "and" (FPat :| [VarPat "X"]) FExp :| []) (FArg :| [TArg])
Right ([("X",TArg)],FExp)
This example attempts to bind the second rule of the above and
function. Compare the input to the AST for and
shown above. The result is a tuple where the first, or left, element is the variable environment, and the second, or right, element is the expression that matched.
It's important to return the matching expression, since tryMatch
doesn't in itself evaluate the body
. In case of multiple rules, the interpreter needs to know which body is associated with the matching pattern.
ghci> tryMatch (Function "and" (TPat :| [VarPat "X"]) (VarExp "X") :| [Function "and" (FPat :| [VarPat "X"]) FExp]) (TArg :| [TArg])
Right ([("X",TArg)],VarExp "X")
ghci> tryMatch (Function "and" (TPat :| [VarPat "X"]) (VarExp "X") :| [Function "and" (FPat :| [VarPat "X"]) FExp]) (FArg :| [TArg])
Right ([("X",TArg)],FExp)
(Inputs manually formatted for improved readability.)
These two examples try to pattern match the above and
function. In the first example, the input is true true
, which matches the first rule and true X = X
. Therefore, the return value is Right ([("X",TArg)],VarExp "X")
, indicating a new variable environment in which X
is bound to true
, and the matching body
is VarExp "X"
, indicating that the variable X
is returned.
In the second example, the input is false true
, which now matches the second rule and false X = false
. The returned tuple now indicates that X
is still bound to true
, but the returned body
is now FExp
, indicating the constant return value false
.
In both cases, tryMatch
starts in the second (Haskell) rule, since the list contains two (Bopa) rules. In the first example, the first call to tryBind
immediately returns a Right
result, which is then returned. In the second example, on the other hand, the first call to tryBind
returns a Left
-value result, which causes tryMatch
to recurse back on itself with the remaining (Bopa) rules.
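If no rule matches at all, the recursion eventually bottoms out in the single-rule case, and the Left value from tryBind propagates. As an illustration (not one of the original examples), calling and where the first argument is a pair rather than a Boolean should produce something like this:

ghci> tryMatch (Function "and" (TPat :| [VarPat "X"]) (VarExp "X") :| [Function "and" (FPat :| [VarPat "X"]) FExp]) (PairArg TArg TArg :| [TArg])
Left "Could not match PairArg TArg TArg :| []."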
Evaluation #
Given a variable environment and an expression, it's now possible to evaluate the expression to a value.
-- Evaluate an expression, given a program (AST) and an environment.
-- Also required as input is a set used for cycle detection. Set elements are
-- tuples, each consisting of a function identifier (name) and arguments to that
-- function. If the evaluator recursively sees that tuple again, it has detected
-- a cycle, and stops further evaluation.
eval :: Foldable t
     => Set (String, NonEmpty Arg)
     -> t (NonEmpty Function)
     -> [(String, Arg)]
     -> Exp
     -> Either String Arg
eval _ _ env (VarExp name) =
  maybe (Left ("Could not find variable " ++ name ++ ".")) Right $ lookup name env
eval _ _ _ TExp = Right TArg
eval _ _ _ FExp = Right FArg
eval observedCalls prog env (CallExp fnid exps) = do
  rules <- maybe (Left ("Could not find function " ++ fnid ++ ".")) Right $
           find ((fnid ==) . fid . NE.head) prog
  args <- traverse (eval observedCalls prog env) exps
  (env', body) <- tryMatch rules args
  if Set.member (fnid, args) observedCalls
    then Left "Cycle detected."
    else eval (Set.insert (fnid, args) observedCalls) prog env' body
This looks like quite a mouthful, but notice that almost half of this code listing is a comment and a type declaration.
As the comment indicates, this function includes cycle detection, which was prompted by the exam questions related to the property of termination. You'll see an example of this later.
The eval
function pattern matches the four different cases of the Exp
sum type. In the first case, if the expression is a variable expression, it tries to lookup
the variable in the environment. If found, it's returned; otherwise, an error message is returned.
The next two (Haskell) rules simply translate the Boolean expression representations to argument values.
Finally, if the expression is a function call, more work needs to be done. First, eval
tries to find
the function in the program. The eval
function expects the program prog
to be grouped in function rules. For example, it'd expect the above and
function to be a NonEmpty
list of Function
values, and it'd expect, say, alltrue
to be another NonEmpty
list containing three Function
values.
If eval
finds the named function, it proceeds to evaluate all the expressions (exps
) that make up the arguments. It traverses exps
and calls itself recursively for each argument.
Armed with both rules
and args
it calls tryMatch
to get a new variable environment and the body
that matched. If it gets past the cycle detection, it proceeds to call itself recursively with the new environment and the body
that matched.
Supplying a direct example of calling this function is becoming awkward, as it requires balancing quite a few parentheses, but it can be done.
ghci> eval Set.empty [Function "and" (TPat :| [VarPat "X"]) (VarExp "X") :| [Function "and" (FPat :| [VarPat "X"]) FExp]] [("X",TArg)] TExp
Right TArg
(Input manually formatted for improved readability.)
This example starts with an empty cycle-detection set, the rules group for and
, a variable environment in which X
is already bound to true
, and evaluates the expression TExp
(i.e. true
). The result is TArg
(i.e. true
) wrapped in Right
, indicating that evaluation was successful.
Interpretation #
All building blocks for an interpreter are now in place.
-- Interpret a program (AST), given an entry point and its arguments.
interpret :: Foldable f => f Function -> Entry -> Either String Arg
interpret prog (Entry fnid args) = do
  let functions = NE.groupWith fid prog -- Group function rules together
  -- The rules that make up `fnid`:
  rules <- maybe (Left ("Could not find function " ++ fnid ++ ".")) Right $
           find ((fnid ==) . fid . NE.head) functions
  (env, body) <- tryMatch rules args
  eval Set.empty functions env body
This function expects that the program (prog
) supplied to it is the raw result of parsing a program. The parser doesn't group identically-named function rules together, so that's the first thing that interpret
does.
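To see the grouping at work, a quick throwaway probe that only lists the name of each group should produce something like this:

ghci> fmap (fid . NE.head) . NE.groupWith fid <$> parseProgram "and true X = X\nand false X = false\nnot true = false"
Right ["and","not"]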
It then proceeds to look through functions
to find
the function indicated by the entry point. If it succeeds, it calls tryMatch
to identify the environment and the body to be evaluated. Finally, it calls eval
with these values.
ghci> interpret [Function "and" (TPat :| [VarPat "X"]) (VarExp "X"), Function "and" (FPat :| [VarPat "X"]) FExp] (Entry "and" (TArg :| [TArg]))
Right TArg
(Input manually formatted for improved readability.)
Like all the above examples, this example processes the and
function, calling it with the input values true true
, which returns a value representing true
, just as we'd expect.
The interpreter seems to be working as intended, but it works on the AST. It's time to connect the parsers with the interpreter.
Formatting results #
It'd be more convenient if we could feed some source code and a function call into a function and have it spit out the result. In order to make the result prettier, I first added a little formatter for Arg
:
formatArg :: Arg -> String
formatArg TArg = "true"
formatArg FArg = "false"
formatArg (PairArg a1 a2) = "(" ++ formatArg a1 ++ ", " ++ formatArg a2 ++ ")"
Not surprisingly, formatArg
calls itself recursively in order to deal with pairs, and nested pairs.
ghci> formatArg (PairArg TArg (PairArg FArg TArg))
"(true, (false, true))"
It's not really required in order to parse and run a program, but I think that such a function should produce output that looks like the input fed into it.
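In fact, parsing a formatted value should take you back to where you started. A quick round-trip check, purely for illustration, should produce something like this:

ghci> formatArg <$> parse argParser "" "(true, (false, true))"
Right "(true, (false, true))"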
Running programs #
All building blocks are now in place to compose a function that parses and runs a program.
-- Run a given program source and a command that identifies entry point and
-- arguments.
-- Despite the generalized type, it can be called as
-- String -> String -> Either String String
run :: (Stream s1 Identity Char, Stream s2 Identity Char)
    => s1 -> s2 -> Either String String
run source cmd = do
  prog <- first show $ parseProgram source
  exec <- first show $ parseEntry cmd
  formatArg <$> interpret prog exec
As the comment suggests, you can call it by feeding it two string literals:
ghci> run "and true X = X\nand false X = false" "and true true" Right "true"
Having to supply entire programs from the REPL gets old fast, however, so instead you can save source code as files. I saved the original examples (containing and
and alltrue
) in a file named ex.bopa
. This enabled me to load the file and call functions in it:
ghci> run <$> readFile "ex.bopa" <*> pure "alltrue (true, (false, true))" Right "false" ghci> run <$> readFile "ex.bopa" <*> pure "alltrue ((true, true), (true, true))" Right "true"
Those are the two examples originally included in the exam set, and fortunately the results are correct.
A few more examples #
I wanted to subject my code to a bit more testing, so I wrote a few more example programs. This one I saved in a file called evenodd.bopa
:
and true X = X
and false X = false
or true X = true
or false X = X
not true = false
not false = true
odd true = true
odd false = true
odd (X, Y) = or (and (odd X) (even Y)) (and (even X) (odd Y))
even X = not (odd X)
The idea with odd
is that it indicates whether the input contains an odd number of Boolean values; of course, even
is then the negation of odd
.
ghci> run <$> readFile "evenodd.bopa" <*> pure "odd true" Right "true" ghci> run <$> readFile "evenodd.bopa" <*> pure "even true" Right "false" ghci> run <$> readFile "evenodd.bopa" <*> pure "odd (true, false)" Right "false" ghci> run <$> readFile "evenodd.bopa" <*> pure "even (true, false)" Right "true" ghci> run <$> readFile "evenodd.bopa" <*> pure "odd (true, (false, true))" Right "true"
Ad hoc tests like these gave me confidence that things aren't completely wrong.
Cycle detection #
Finally, you may be curious to see whether the cycle detection works. The simplest example I could come up with was this:
ghci> run "forever X = forever X" "forever false" Left "Cycle detected."
Even so, I also wanted to test that it works for a small cycle that involves more than one function, so I saved the following in a file called tictactoe.bopa
:
tic X = tac X
tac X = toe X
toe X = tic X
foo (false, Y) = Y
foo (true, Y) = tic Y
These functions may cause an infinite cycle, depending on input.
ghci> run <$> readFile "tictactoe.bopa" <*> pure "foo (false, (true, false))" Right "(true, false)" ghci> run <$> readFile "tictactoe.bopa" <*> pure "foo (true, (true, false))" Left "Cycle detected."
The run
function implements an algorithm that is always able to determine, in finite time, whether a program terminates or not. Thus, in case you're wondering: The language isn't Turing complete.
Conclusion #
Implementing a parser and interpreter for the Bopa language wasn't part of the exam question, but I had some time to spare, and also found that I had trouble describing, in unambiguous terms, how to detect termination. I decided to write the interpreter to show a code example, and then took on the parser as an extra exercise.
It took me a long day of intense coding to produce the prototype shown here, including the various example Bopa programs. No AI was involved. It was fun.
Song recommendations from Haskell combinators
Traversing lists of IO. A refactoring.
This article is part of a series named Alternative ways to design with functional programming. In the previous article, you saw how to refactor the example code base to a composition of standard F# combinators. It's a pragmatic solution to the problem of dealing with lots of data in a piecemeal fashion, but although it uses concepts and programming constructs from functional programming, I don't consider it a proper functional architecture.
You'd expect the Haskell version to be the most idiomatic of the three language variations, but ironically, I had more trouble making the code in this article look nice than I had with the F# variation. You'll see what the problem is later, but it boils down to a combination of Haskell's right-to-left default composition order, and precedence rules of some of the operators.
Please consult the previous articles for context about the example code base. The code shown in this article is from the combinators Git branch. It refactors the code shown in the article Porting song recommendations to Haskell.
The goal is to extract pure functions from the overall recommendations algorithm and compose them using standard combinators, such as =<<
, <$>
, and traverse
.
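In case those combinators are unfamiliar, here's a small reference sketch with their types specialised to IO and lists, which is how they appear below. The bindIO, mapIO, and traverseIO names are only for illustration; they appear nowhere in the code base.

bindIO :: (a -> IO b) -> IO a -> IO b
bindIO = (=<<)

mapIO :: (a -> b) -> IO a -> IO b
mapIO = (<$>)

traverseIO :: (a -> IO b) -> [a] -> IO [b]
traverseIO = traverse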
Getting rid of local mutation #
My first goal was to get rid of the IORef
-based local mutation shown in the 'baseline' code base. That wasn't too difficult. If you're interested in the micro-commits I made to get to that milestone, you can consult the Git repository. The interim result looked like this:
getRecommendations srvc un = do
  -- 1. Get user's own top scrobbles
  -- 2. Get other users who listened to the same songs
  -- 3. Get top scrobbles of those users
  -- 4. Aggregate the songs into recommendations

  -- Impure
  scrobbles <- getTopScrobbles srvc un

  -- Pure
  let scrobblesSnapshot = take 100 $ sortOn (Down . scrobbleCount) scrobbles

  -- Impure
  recommendations <- join <$> traverse (\scrobble ->
    fmap join $ traverse (\otherListener ->
      fmap scrobbledSong .
      take 10 .
      sortOn (Down . songRating . scrobbledSong) .
      filter (songHasVerifiedArtist . scrobbledSong)
      <$> getTopScrobbles srvc (userName otherListener)) .
    take 20
    <$> sortOn (Down . userScrobbleCount) . filter ((10_000 <=) . userScrobbleCount)
    =<< getTopListeners srvc (songId $ scrobbledSong scrobble))
    scrobblesSnapshot

  -- Pure
  return $ take 200 $ sortOn (Down . songRating) recommendations
Granted, it's not the most readable way to present the algorithm, but it is, after all, only an intermediate step. As usual, I'll remind the reader that Haskell code should, by default, be read from right to left. When split over multiple lines, this also means that an expression should be read from the bottom to the top. Armed with that knowledge (and general knowledge of Haskell), combined with some helpful indentation, it's not altogether unreadable, but not something I'd like to come back to after half a year. And definitely not something I would foist upon (hypothetical) colleagues.
The careful reader may notice that I've decided to use the reverse bind operator =<<
, rather than the standard >>=
operator. I usually do that with Haskell, because most of Haskell is composed from right to left, and =<<
is consistent with that direction. The standard >>=
operator, on the other hand, composes monadic actions from left to right. You could argue that that's more natural (to Western audiences), but since everything else stays right-to-left biased, using >>=
confuses the reading direction.
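If the difference seems abstract, this little Maybe-based illustration (not from the code base) shows that the two operators only differ in the order of their arguments; both expressions evaluate to Just 42.

example1 :: Maybe Int
example1 = (\x -> Just (x + 1)) =<< Just 41

example2 :: Maybe Int
example2 = Just 41 >>= \x -> Just (x + 1)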
As a Westerner, I prefer left-to-right reading order, but in general I've found it hard to fight Haskell's bias in the other direction.
As the -- Pure
and -- Impure
comments indicate, interleaving the pure functions with impure actions makes the entire expression impure. The more I do that, the less pure code remains.
Single expression #
Going from the above snapshot to a single impure expression doesn't require many more steps.
getRecommendations srvc un =
  -- 1. Get user's own top scrobbles
  -- 2. Get other users who listened to the same songs
  -- 3. Get top scrobbles of those users
  -- 4. Aggregate the songs into recommendations
  take 200 . sortOn (Down . songRating) <$>
  ((\scrobbles ->
      join <$> traverse (\scrobble ->
        fmap join $ traverse (\otherListener ->
            fmap scrobbledSong .
            take 10 .
            sortOn (Down . songRating . scrobbledSong) .
            filter (songHasVerifiedArtist . scrobbledSong) <$>
            getTopScrobbles srvc (userName otherListener)) .
          take 20 <$>
          sortOn (Down . userScrobbleCount) .
          filter ((10_000 <=) . userScrobbleCount) =<<
          getTopListeners srvc (songId $ scrobbledSong scrobble))
        (take 100 $ sortOn (Down . scrobbleCount) scrobbles)) =<<
   getTopScrobbles srvc un)
Neither did it improve readability.
Helper functions #
As in previous incarnations of this exercise, it helps if you extract some well-named helper functions, like this one:
getUsersOwnTopScrobbles :: [Scrobble] -> [Scrobble]
getUsersOwnTopScrobbles = take 100 . sortOn (Down . scrobbleCount)
As a one-liner, it's perhaps not that impressive, but none of the helpers are particularly complicated. The biggest of them is this:
getTopScrobblesOfOtherUsers :: [Scrobble] -> [Song]
getTopScrobblesOfOtherUsers =
  fmap scrobbledSong .
  take 10 .
  sortOn (Down . songRating . scrobbledSong) .
  filter (songHasVerifiedArtist . scrobbledSong)
You can see the rest in the Git repository. None of them are exported by the module, which makes them implementation details that you may decide to change or remove at a later date.
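That said, the remaining two helpers are easy to reconstruct from the interim code shown above. The following is my sketch of what they might look like; the Git repository remains the authoritative source.

getOtherUsersWhoListenedToTheSameSongs :: [User] -> [User]
getOtherUsersWhoListenedToTheSameSongs =
  take 20 .
  sortOn (Down . userScrobbleCount) .
  filter ((10_000 <=) . userScrobbleCount)

aggregateTheSongsIntoRecommendations :: [Song] -> [Song]
aggregateTheSongsIntoRecommendations = take 200 . sortOn (Down . songRating)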
You can now compose the overall action.
getRecommendations srvc un =
  -- 1. Get user's own top scrobbles
  -- 2. Get other users who listened to the same songs
  -- 3. Get top scrobbles of those users
  -- 4. Aggregate the songs into recommendations
  (aggregateTheSongsIntoRecommendations .
    getTopScrobblesOfOtherUsers) . join <$>
  ((traverse (getTopScrobbles srvc . userName) .
      getOtherUsersWhoListenedToTheSameSongs) . join =<<
    (traverse (getTopListeners srvc . (songId . scrobbledSong)) .
      getUsersOwnTopScrobbles =<<
      getTopScrobbles srvc un))
Some of the parentheses break over multiple lines in a non-conventional way. This is my best effort to format the code in a way that emphasises the four steps comprising the algorithm, while still staying within the bounds of the language, and keeping hlint silent.
I could try to argue that if you squint a bit, the operators and other glue like join should fade into the background, but in this case, I don't even buy that argument myself.
It bothers me that it's so hard to compose the code in a way that approaches being self-documenting. I find that the F# composition in the previous article does a better job of that.
Syntactic sugar #
The stated goal in this article is to demonstrate how it's possible to use standard combinators to glue the algorithm together. I've been complaining throughout this article that, while possible, it leaves the code less readable than desired.
That one reader who actually knows Haskell is likely frustrated with me. After all, the language does offer a way out. Using the syntactic sugar of do notation, you can instead write the composition like this:
getRecommendations srvc un = do
  -- 1. Get user's own top scrobbles
  -- 2. Get other users who listened to the same songs
  -- 3. Get top scrobbles of those users
  -- 4. Aggregate the songs into recommendations
  userTops <- getTopScrobbles srvc un <&> getUsersOwnTopScrobbles
  otherListeners <-
    traverse (getTopListeners srvc . (songId . scrobbledSong)) userTops <&>
    getOtherUsersWhoListenedToTheSameSongs . join
  songs <-
    traverse (getTopScrobbles srvc . userName) otherListeners <&>
    getTopScrobblesOfOtherUsers . join
  return $ aggregateTheSongsIntoRecommendations songs
By splitting the process up into steps with named variables, you can achieve the much-yearned-for top-to-bottom reading order. Taking advantage of the <&> operator from Data.Functor, we also get left-to-right reading order on each line.
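If you haven't encountered <&> before, it's simply <$> with its arguments flipped. The first step of the do block, for example, is equivalent to the right-to-left variant:

-- These two expressions denote the same IO action:
getTopScrobbles srvc un <&> getUsersOwnTopScrobbles
getUsersOwnTopScrobbles <$> getTopScrobbles srvc un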
That's the best I've been able to achieve under the constraint that the IO-bound operations stay interleaved with pure functions.
Conclusion #
Mixing pure functions with impure actions like this is necessary when composing whole programs (usually at the entry point; i.e. main), but shouldn't be considered good functional-programming style in general. The entire getRecommendations action is impure, being non-deterministic.
Still, even Haskell code eventually needs to compose pure and impure code in this way, so it's relevant to cover how it may be done. Even so, alternative architectures exist.
Comments
A more general solution would be to use the parallel random testing described in the talk by John Hughes here.