Porting song recommendations to Haskell by Mark Seemann
An F# code base translated to Haskell.
This article is part of a larger article series that examines variations of how to take on a non-trivial problem using functional architecture. In a previous article we established a baseline C# code base. Future articles are going to use that C# code base as a starting point for refactored code. On the other hand, I also want to demonstrate what such solutions may look like in languages like F# or Haskell. In this article, you'll see how to port the baseline to Haskell. To be honest, I first ported the C# code to F#, and then used the F# code as a guide to implement equivalent Haskell code.
If you're following along in the Git repositories, this is a repository separate from the .NET repositories. The code shown here is from its master branch.
If you don't care about Haskell, you can always go back to the table of contents in the 'root' article and proceed to the next topic that interests you.
Data structures #
When working with statically typed functional languages like Haskell, it often makes most sense to start by declaring data structures.
data User = User { userName :: String , userScrobbleCount :: Int } deriving (Show, Eq)
This is much like an F# or C# record declaration, and this one echoes the corresponding types in F# and C#. The most significant difference is that here, a user's total count of scrobbles is called userScrobbleCount
rather than TotalScrobbleCount
. The motivation behind that variation is that Haskell data 'getters' are actually top-level functions, so it's usually a good idea to prefix them with the name of the data structure they work on. Since the data structure is called User
, both 'getter' functions get the user
prefix.
I found userTotalScrobbleCount
a bit too verbose to my tastes, so I dropped the Total
part. Whether or not that's appropriate remains to be seen. Naming in programming is always hard, and there's a risk that you don't get it right the first time around. Unless you're publishing a reusable library, however, the option to rename it later remains.
The other two data structures are quite similar:
data Song = Song { songId :: Int , songHasVerifiedArtist :: Bool , songRating :: Word8 } deriving (Show, Eq) data Scrobble = Scrobble { scrobbledSong :: Song , scrobbleCount :: Int } deriving (Show, Eq)
I thought that scrobbledSong
was more descriptive than scrobbleSong
, so I allowed myself that little deviation from the idiomatic naming convention. It didn't cause any problems, but I'm still not sure if that was a good decision.
How does one translate a C# interface to Haskell? Although type classes aren't quite the same as C# or Java interfaces, this language feature is close enough that I can use it in that role. I don't consider such a type class idiomatic in Haskell, but as an counterpart to the C# interface, it works well enough.
class SongService a where getTopListeners :: a -> Int -> IO [User] getTopScrobbles :: a -> String -> IO [Scrobble]
Any instance of the SongService
class supports queries for top listeners of a particular song, as well as for top scrobbles for a user.
To reiterate, I don't intend to keep this type class around if I can help it, but for didactic reasons, it'll remain in some of the future refactorings, so that you can contrast and compare the Haskell code to its C# and F# peers.
Test Double #
To support tests, I needed a Test Double, so I defined the following Fake service, which is nothing but a deterministic in-memory instance. The type itself is just a wrapper of two maps.
data FakeSongService = FakeSongService { fakeSongs :: Map Int Song , fakeUsers :: Map String (Map Int Int) } deriving (Show, Eq)
Like the equivalent C# class, fakeSongs
is a map from song ID to Song
, while fakeUsers
is a bit more complex. It's a map keyed on user name, but the value is another map. The keys of that inner map are song IDs, while the values are the number of times each song was scrobbled by that user.
The FakeSongService
data structure is a SongService
instance by explicit implementation:
instance SongService FakeSongService where getTopListeners srvc sid = do return $ uncurry User <$> Map.toList (sum <$> Map.filter (Map.member sid) (fakeUsers srvc)) getTopScrobbles srvc userName = do return $ fmap (\(sid, c) -> Scrobble (fakeSongs srvc ! sid) c) $ Map.toList $ Map.findWithDefault Map.empty userName (fakeUsers srvc)
In order to find all the top listeners of a song, it finds all the fakeUsers
who have the song ID (sid
) in their inner map, sum all of those users' scrobble counts together and creates User
values from that data.
To find the top scrobbles of a user, the instance finds the user in the fakeUsers
map, looks each of that user's scrobbled song up in fakeSongs
, and creates Scrobble
values from that information.
Finally, test code needs a way to add data to a FakeSongService
value, which this test-specific helper function accomplishes:
scrobble userName s c (FakeSongService ss us) = let sid = songId s ss' = Map.insertWith (\_ _ -> s) sid s ss us' = Map.insertWith (Map.unionWith (+)) userName (Map.singleton sid c) us in FakeSongService ss' us'
Given a user name, a song, a scrobble count, and a FakeSongService
, this function returns a new FakeSongService
value with the new data added to the data already there.
QuickCheck Arbitraries #
In the F# test code I used FsCheck to get good coverage of the code. For Haskell, I'll use QuickCheck.
Porting the ideas from the F# tests, I define a QuickCheck generator for user names:
alphaNum :: Gen Char alphaNum = elements (['a'..'z'] ++ ['A'..'Z'] ++ ['0'..'9']) userName :: Gen String userName = do len <- choose (1, 19) first <- elements $ ['a'..'z'] ++ ['A'..'Z'] rest <- vectorOf len alphaNum return $ first : rest
It's not that the algorithm only works if usernames are alphanumeric strings that start with a letter and are no longer than twenty characters, but whenever a property is falsified, I'd rather look at a user name like "Yvj0D1I"
or "tyD9P1eOqwMMa1Q6u"
(which are already bad enough), than something with line breaks and unprintable characters.
Working with QuickCheck, it's often useful to wrap types from the System Under Test in test-specific Arbitrary wrappers:
newtype ValidUserName = ValidUserName { getUserName :: String } deriving (Show, Eq) instance Arbitrary ValidUserName where arbitrary = ValidUserName <$> userName
I also defined a (simpler) Arbitrary
instance for Song
called AnySong
.
A few properties #
With FakeSongService
in place, I proceeded to add the test code, starting from the top of the F# test code, and translating each as faithfully as possible. The first one is an Ice Breaker Test that only verifies that the System Under Test exists and doesn't crash when called.
testProperty "No data" $ \ (ValidUserName un) -> ioProperty $ do actual <- getRecommendations emptyService un return $ null actual
As I've done since at least 2019, it seems, I've inlined test cases as anonymous functions; this time as QuickCheck properties. This one just creates a FakeSongService
that contains no data, and asks for recommendations. The expected result is that actual
is empty (null
), since there's nothing to recommend.
A slightly more involved property adds some data to the service before requesting recommendations:
testProperty "One user, some songs" $ \ (ValidUserName user) (fmap getSong -> songs) -> monadicIO $ do scrobbleCounts <- pick $ vectorOf (length songs) $ choose (1, 100) let scrobbles = zip songs scrobbleCounts let srvc = foldr (uncurry (scrobble user)) emptyService scrobbles actual <- run $ getRecommendations srvc user assertWith (null actual) "Should be empty"
A couple of things are worthy of note. First, the property uses a view pattern to project a list of songs from a list of Arbitraries, where getSong
is the 'getter' that belongs to the AnySong
newtype
wrapper.
I find view patterns quite useful as a declarative way to 'promote' a single Arbitrary
instance to a list. In a third property, I take it a step further:
(fmap getUserName -> NonEmpty users)
This not only turns the singular ValidUserName
wrapper into a list, but by projecting it into NonEmpty
, the test declares that users
is a non-empty list. QuickCheck picks all that up and generates values accordingly.
If you're interested in seeing this more advanced view pattern in context, you may consult the Git repository.
Secondly, the "One user, some songs"
test runs in monadicIO
, which I didn't know existed before I wrote these tests. Together with pick
, run
, and assertWith
, monadicIO
is defined in Test.QuickCheck.Monadic. It enables you to write properties that run in IO
, which these properties need to do, because getRecommendations
is IO
-bound.
There's one more QuickCheck property in the code base, but it mostly repeats techniques already shown here. See the Git repository for all the details, if necessary.
Examples #
In addition to the properties, I also ported the F# examples; that is, 'normal' unit tests. Here's one of them:
"One verified recommendation" ~: do let srvc = scrobble "ana" (Song 2 True 5) 9_9990 $ scrobble "ana" (Song 1 False 5) 10 $ scrobble "cat" (Song 1 False 6) 10 emptyService actual <- getRecommendations srvc "cat" [Song 2 True 5] @=? actual
This one is straightforward, but as I already discussed when characterizing the original code, some of the examples essentially document quirks in the implementation. Here's the relevant test, translated to Haskell:
"Only top-rated songs" ~: do -- Scale ratings to keep them less than or equal to 10. let srvc = foldr (\i -> scrobble "hyle" (Song i True (toEnum i `div` 2)) 500) emptyService [1..20] actual <- getRecommendations srvc "hyle" assertBool "Should not be empty" (not $ null actual) -- Since there's only one user, but with 20 songs, the implementation -- loops over the same songs 20 times, so 400 songs in total (with -- duplicates). Ordering on rating, only the top-rated 200 remains, that -- is, those rated 5-10. Note that this is a Characterization Test, so not -- necessarily reflective of how a real recommendation system should work. assertBool "Should have 5+ rating" (all ((>= 5) . songRating) actual)
This test creates twenty scrobbles for one user: One with a zero rating, two with rating 1, two with rating 2, and so on, up to a single song with rating 10.
The implementation of GetRecommendationsAsync uses these twenty songs to find 'other users' who have these top songs as well. In this case, there's only one user, so for every of those twenty songs, you get the same twenty songs, for a total of 400.
There are more unit tests than these. You can see them in the Git repository.
Implementation #
The most direct translation of the C# and F# 'reference implementation' that I could think of was this:
getRecommendations srvc un = do -- 1. Get user's own top scrobbles -- 2. Get other users who listened to the same songs -- 3. Get top scrobbles of those users -- 4. Aggregate the songs into recommendations -- Impure scrobbles <- getTopScrobbles srvc un -- Pure let scrobblesSnapshot = take 100 $ sortOn (Down . scrobbleCount) scrobbles recommendationCandidates <- newIORef [] forM_ scrobblesSnapshot $ \scrobble -> do -- Impure otherListeners <- getTopListeners srvc $ songId $ scrobbledSong scrobble -- Pure let otherListenersSnapshot = take 20 $ sortOn (Down . userScrobbleCount) $ filter ((10_000 <=) . userScrobbleCount) otherListeners forM_ otherListenersSnapshot $ \otherListener -> do -- Impure otherScrobbles <- getTopScrobbles srvc $ userName otherListener -- Pure let otherScrobblesSnapshot = take 10 $ sortOn (Down . songRating . scrobbledSong) $ filter (songHasVerifiedArtist . scrobbledSong) otherScrobbles forM_ otherScrobblesSnapshot $ \otherScrobble -> do let song = scrobbledSong otherScrobble modifyIORef recommendationCandidates (song :) recommendations <- readIORef recommendationCandidates -- Pure return $ take 200 $ sortOn (Down . songRating) recommendations
In order to mirror the original implementation as closely as possible, I declare recommendationCandidates
as an IORef so that I can incrementally add to it as the action goes through its nested loops. Notice the modifyIORef
towards the end of the code listing, which adds a single song to the list.
Once all the looping is done, the action uses readIORef
to pull the recommendations
out of the IORef
.
As you can see, I also ported the comments from the original C# code.
I don't consider this idiomatic Haskell code, but the goal in this article was to mirror the C# code as closely as possible. Once I start refactoring, you'll see some more idiomatic implementations.
Conclusion #
Together with the previous two articles in this article series, this establishes a baseline from which I can refactor the code. While we might consider the original C# code idiomatic, this port to Haskell isn't. It is, on the other hand, similar enough to both its C# and F# peers that we can compare and contrast all three.
Particularly two design choices make this Haskell implementation less than idiomatic. One is the use of IORef
to update a list of songs. The other is using a type class to model an external dependency.
As I cover various alternative architectures in this article series, you'll see how to get rid of both.
Next: Song recommendations as an Impureim Sandwich.