Song recommendations with Haskell free monads by Mark Seemann
A surprisingly easy refactoring.
This article is part of a larger series titled Alternative ways to design with functional programming. In short, it uses Haskell, F#, and C# to present various internal architectures to deal with an example problem. Please refer to the table of contents included with the first article to get a sense of what has already been covered.
In this article, you'll see how a free monad may be used to address the problem that when data size is sufficiently large, you may need to load it piecemeal, based on the results of previous steps in an algorithm. In short, the problem being addressed is to calculate a list of song recommendations based on so-called 'scrobble' data from a multitude of other users' music playback history.
If you want to follow along with the Git repository, the code presented here is from the Haskell repository's free branch.
Starting point #
Instead of starting from scratch from the code base presented in Porting song recommendations to Haskell, I'll start at what was an interim refactoring step in Song recommendations from Haskell combinators:
getRecommendations :: SongService a => a -> String -> IO [Song] getRecommendations srvc un = do -- 1. Get user's own top scrobbles -- 2. Get other users who listened to the same songs -- 3. Get top scrobbles of those users -- 4. Aggregate the songs into recommendations -- Impure scrobbles <- getTopScrobbles srvc un -- Pure let scrobblesSnapshot = take 100 $ sortOn (Down . scrobbleCount) scrobbles -- Impure recommendations <- join <$> traverse (\scrobble -> fmap join $ traverse (\otherListener -> fmap scrobbledSong . take 10 . sortOn (Down . songRating . scrobbledSong) . filter (songHasVerifiedArtist . scrobbledSong) <$> getTopScrobbles srvc (userName otherListener)) . take 20 <$> sortOn (Down . userScrobbleCount) . filter ((10_000 <=) . userScrobbleCount) =<< getTopListeners srvc (songId $ scrobbledSong scrobble)) scrobblesSnapshot -- Pure return $ take 200 $ sortOn (Down . songRating) recommendations
The reason I wanted to start here is that with the IORef
out of the way, the only IO
-bound code remaining involves the SongService
methods.
The plan is to replace the SongService
type class with a free monad. Once that's done, there's no more IO
left; it will have been replaced by the free monad. That's why it's easiest to first get rid of everything else involving IO
. If I didn't, the refactoring wouldn't be as easy. As Kent Beck wrote,
"for each desired change, make the change easy (warning: this may be hard), then make the easy change"
Starting from the above incarnation of the code makes the change easy.
Functor #
As a reminder, this type class is the one I'm getting rid of:
class SongService a where getTopListeners :: a -> Int -> IO [User] getTopScrobbles :: a -> String -> IO [Scrobble]
Converting such an 'interface' to a sum type of instructions is such a well-defined process that I think it could be automated (if it were worth the effort). You take each method of the type class and make it a case in the sum type.
data SongInstruction a = GetTopListeners Int ([User] -> a) | GetTopScrobbles String ([Scrobble] -> a) deriving (Functor)
I usually think of such a type as representing possible instructions in a little domain-specific language (DSL). In this case, the DSL allows only two instructions: GetTopListeners
and GetTopScrobbles
.
If you've never seen a free monad before, you may be wondering: What's the a
? It's what we may call the carrier type. If you haven't worked with, say, F-Algebras, this takes some time getting used to, but the carrier type represents a 'potential'. As presented here, it may be anything, but when you want to evaluate or 'run' the program, the a
turns out to be the return type of the evaluation.
Since SongInstruction
is a Functor
, it's possible to map a
to b
, or one concrete type to another, so that the a
type parameter can take on more than one concrete type during a small 'program' written in the DSL.
In any case, at this point, SongInstruction a
is only a Functor
, and not yet a Monad
.
Free monad #
You may, if you want to, introduce a type alias that makes SongInstruction a
(or, rather, its Free
wrapper) a Monad
.
type SongProgram = Free SongInstruction
The Free
wrapper comes from Control.Monad.Free.
In the rest of the code listings in this article, I make no use of this type alias, so I only present it here because it's the type the compiler will infer when running the test. In other words, while never explicitly stated, this is going to be the de-facto free monad in this article.
From action to function #
Changing the getRecommendations
action to a function that returns (effectively) Free (SongInstruction [Song])
turned out to be easier than I had expected.
Since I've decided to omit type declarations whenever possible, refactoring is easier because the type inferencing system can allow changes to function and action types to ripple through to the ultimate callers. (To be clear, however, I here show some of the code with type annotations. I've only added those in the process of writing this article, as an aid to you, the reader. You will not find those type annotations in the Git repository.)
First, I added a few helper methods to create 'program instructions':
getTopListeners sid = liftF $ GetTopListeners sid id getTopScrobbles un = liftF $ GetTopScrobbles un id
The liftF
function wraps a Functor
(here, SongInstruction
) in a free monad. This makes it easer to write programs in that DSL.
The only changes now required to getRecommendations
is to remove srvc
in four places:
getRecommendations :: MonadFree SongInstruction m => String -> m [Song] getRecommendations un = do -- 1. Get user's own top scrobbles -- 2. Get other users who listened to the same songs -- 3. Get top scrobbles of those users -- 4. Aggregate the songs into recommendations -- Impure scrobbles <- getTopScrobbles un -- Pure let scrobblesSnapshot = take 100 $ sortOn (Down . scrobbleCount) scrobbles -- Impure recommendations <- join <$> traverse (\scrobble -> fmap join $ traverse (\otherListener -> fmap scrobbledSong . take 10 . sortOn (Down . songRating . scrobbledSong) . filter (songHasVerifiedArtist . scrobbledSong) <$> getTopScrobbles (userName otherListener)) . take 20 <$> sortOn (Down . userScrobbleCount) . filter ((10_000 <=) . userScrobbleCount) =<< getTopListeners (songId $ scrobbledSong scrobble)) scrobblesSnapshot -- Pure return $ take 200 $ sortOn (Down . songRating) recommendations
It's almost the same code as before. The only difference is that srvc
is not longer a parameter to either getRecommendations
, getTopListeners
, or getTopScrobbles
.
Again, I'll point out that I've only added the type annotation for the benefit of you, the reader. We see now, however, that the return type is m [Song]
, where m
is any MonadFree SongInstruction
.
Interpreter #
As described in Porting song recommendations to Haskell I use a FakeSongService
as a Test Double. SongService
is now gone, but I can repurpose the instance implementation as an interpreter of SongInstruction
programs.
interpret :: FakeSongService -> Free SongInstruction a -> a interpret (FakeSongService ss us) = iter eval where eval (GetTopListeners sid next) = next $ uncurry User <$> Map.toList (sum <$> Map.filter (Map.member sid) us) eval (GetTopScrobbles un next) = next $ fmap (\(sid, c) -> Scrobble (ss ! sid) c) $ Map.toList $ Map.findWithDefault Map.empty un us
Compare that code to the previous SongService
instance shown in Porting song recommendations to Haskell. The functions are nearly identical, only return
is replaced by next
, and a few other, minute changes.
Test changes #
The hardest part of this refactoring was to adjust the tests. As I've described in Code That Fits in Your Head, I don't like when I have to change the System Under Test and the test code in the same commit, but in this case I lacked the skill to do it more incrementally.
The issue is that since SongService
methods were IO
-bound, the tests ran in IO
. Particularly for the QuickCheck properties, I had to remove the ioProperty
and monadicIO
combinators. This also meant adjusting some of the assertions. The "One user, some songs"
test now looks like this:
testProperty "One user, some songs" $ \ (ValidUserName user) (fmap getSong -> songs) -> do scrobbleCounts <- vectorOf (length songs) $ choose (1, 100) let scrobbles = zip songs scrobbleCounts let srvc = foldr (uncurry (scrobble user)) emptyService scrobbles let actual = interpret srvc $ getRecommendations user return $ counterexample "Should be empty" (null actual)
Notice the interpreter in action, to the left of the application of getRecommendations
. Since interpret
reduces any Free SongInstruction a
to a
, it evaluates Free SongInstruction [Song]
to [Song]
; that's the type of actual
.
The change represents a net benefit in my view. All tests are now pure, instead of running in IO
. This makes them simpler.
Final refactoring #
The above version of the code was only a good starting point for making the changeover to a free monad. Extracting the usual helper functions, you can arrive at this variation of the getRecommendations
function:
getRecommendations un = do -- 1. Get user's own top scrobbles -- 2. Get other users who listened to the same songs -- 3. Get top scrobbles of those users -- 4. Aggregate the songs into recommendations userTops <- getTopScrobbles un <&> getUsersOwnTopScrobbles otherListeners <- traverse (getTopListeners . (songId . scrobbledSong)) userTops <&> getOtherUsersWhoListenedToTheSameSongs . join songs <- traverse (getTopScrobbles . userName) otherListeners <&> getTopScrobblesOfOtherUsers . join return $ aggregateTheSongsIntoRecommendations songs
Again, it looks a lot like one of the variations shown in Porting song recommendations to Haskell, just without the srvc
parameter.
Conclusion #
Refactoring from a type class, which in other languages could be an interface, is so easy that you may rightfully ask what the point is. To me, the most important benefit is that you restrict your options. As I discussed in a podcast episode some years ago, constraints liberate. Here, we move from allowing anything (IO
) to only allowing a limited set of instructions.
You could argue that a 'real' interpreter (rather than a test-specific interpreter) might run in IO
, such as the interpreter shown here. Still, I would argue that most interpreters are smaller, and more stable, than the programs you may write with free monads. Thus, the amount of code you have to review that's allowed to do anything is smaller than it otherwise would have been.
In any case, refactoring to a free monad wasn't too difficult in Haskell. How easy is it in F#?