Porting song recommendations to Haskell

An F# code base translated to Haskell.

This article is part of a larger article series that examines variations of how to take on a non-trivial problem using functional architecture. In a previous article we established a baseline C# code base. Future articles are going to use that C# code base as a starting point for refactored code. On the other hand, I also want to demonstrate what such solutions may look like in languages like F# or Haskell. In this article, you'll see how to port the baseline to Haskell. To be honest, I first ported the C# code to F#, and then used the F# code as a guide to implement equivalent Haskell code.

If you're following along in the Git repositories, this is a repository separate from the .NET repositories. The code shown here is from its master branch.

If you don't care about Haskell, you can always go back to the table of contents in the 'root' article and proceed to the next topic that interests you.

Data structures #

When working with statically typed functional languages like Haskell, it often makes most sense to start by declaring data structures.

data User = User
  { userName :: String
  , userScrobbleCount :: Int }
  deriving (Show, Eq)

This is much like an F# or C# record declaration, and this one echoes the corresponding types in F# and C#. The most significant difference is that here, a user's total count of scrobbles is called userScrobbleCount rather than TotalScrobbleCount. The motivation behind that variation is that Haskell data 'getters' are actually top-level functions, so it's usually a good idea to prefix them with the name of the data structure they work on. Since the data structure is called User, both 'getter' functions get the user prefix.

I found userTotalScrobbleCount a bit too verbose to my tastes, so I dropped the Total part. Whether or not that's appropriate remains to be seen. Naming in programming is always hard, and there's a risk that you don't get it right the first time around. Unless you're publishing a reusable library, however, the option to rename it later remains.

The other two data structures are quite similar:

data Song = Song
  { songId :: Int
  , songHasVerifiedArtist :: Bool
  , songRating :: Word8 }
  deriving (Show, Eq)
 
data Scrobble = Scrobble
  { scrobbledSong :: Song
  , scrobbleCount :: Int }
  deriving (Show, Eq)

I thought that scrobbledSong was more descriptive than scrobbleSong, so I allowed myself that little deviation from the idiomatic naming convention. It didn't cause any problems, but I'm still not sure if that was a good decision.

How does one translate a C# interface to Haskell? Although type classes aren't quite the same as C# or Java interfaces, this language feature is close enough that I can use it in that role. I don't consider such a type class idiomatic in Haskell, but as an counterpart to the C# interface, it works well enough.

class SongService a where
  getTopListeners :: a -> Int -> IO [User]
  getTopScrobbles :: a -> String -> IO [Scrobble]

Any instance of the SongService class supports queries for top listeners of a particular song, as well as for top scrobbles for a user.

To reiterate, I don't intend to keep this type class around if I can help it, but for didactic reasons, it'll remain in some of the future refactorings, so that you can contrast and compare the Haskell code to its C# and F# peers.

Test Double #

To support tests, I needed a Test Double, so I defined the following Fake service, which is nothing but a deterministic in-memory instance. The type itself is just a wrapper of two maps.

data FakeSongService = FakeSongService
  { fakeSongs :: Map Int Song
  , fakeUsers :: Map String (Map Int Int) }
  deriving (Show, Eq)

Like the equivalent C# class, fakeSongs is a map from song ID to Song, while fakeUsers is a bit more complex. It's a map keyed on user name, but the value is another map. The keys of that inner map are song IDs, while the values are the number of times each song was scrobbled by that user.

The FakeSongService data structure is a SongService instance by explicit implementation:

instance SongService FakeSongService where
  getTopListeners srvc sid = do
    return $
      uncurry User <$>
      Map.toList (sum <$> Map.filter (Map.member sid) (fakeUsers srvc))
  getTopScrobbles srvc userName = do
    return $
      fmap (\(sid, c) -> Scrobble (fakeSongs srvc ! sid) c) $
      Map.toList $
      Map.findWithDefault Map.empty userName (fakeUsers srvc)

In order to find all the top listeners of a song, it finds all the fakeUsers who have the song ID (sid) in their inner map, sum all of those users' scrobble counts together and creates User values from that data.

To find the top scrobbles of a user, the instance finds the user in the fakeUsers map, looks each of that user's scrobbled song up in fakeSongs, and creates Scrobble values from that information.

Finally, test code needs a way to add data to a FakeSongService value, which this test-specific helper function accomplishes:

scrobble userName s c (FakeSongService ss us) =
  let sid = songId s
      ss' = Map.insertWith (\_ _ -> s) sid s ss
      us' = Map.insertWith (Map.unionWith (+)) userName (Map.singleton sid c) us
  in FakeSongService ss' us'

Given a user name, a song, a scrobble count, and a FakeSongService, this function returns a new FakeSongService value with the new data added to the data already there.

QuickCheck Arbitraries #

In the F# test code I used FsCheck to get good coverage of the code. For Haskell, I'll use QuickCheck.

Porting the ideas from the F# tests, I define a QuickCheck generator for user names:

alphaNum :: Gen Char
alphaNum = elements (['a'..'z'] ++ ['A'..'Z'] ++ ['0'..'9'])
 
userName :: Gen String
userName = do
  len <- choose (1, 19)
  first <- elements $ ['a'..'z'] ++ ['A'..'Z']
  rest <- vectorOf len alphaNum
  return $ first : rest

It's not that the algorithm only works if usernames are alphanumeric strings that start with a letter and are no longer than twenty characters, but whenever a property is falsified, I'd rather look at a user name like "Yvj0D1I" or "tyD9P1eOqwMMa1Q6u" (which are already bad enough), than something with line breaks and unprintable characters.

Working with QuickCheck, it's often useful to wrap types from the System Under Test in test-specific Arbitrary wrappers:

newtype ValidUserName = ValidUserName { getUserName :: String } deriving (Show, Eq)
 
instance Arbitrary ValidUserName where
  arbitrary = ValidUserName <$> userName

I also defined a (simpler) Arbitrary instance for Song called AnySong.

A few properties #

With FakeSongService in place, I proceeded to add the test code, starting from the top of the F# test code, and translating each as faithfully as possible. The first one is an Ice Breaker Test that only verifies that the System Under Test exists and doesn't crash when called.

testProperty "No data" $ \ (ValidUserName un) -> ioProperty $ do
  actual <- getRecommendations emptyService un
  return $ null actual

As I've done since at least 2019, it seems, I've inlined test cases as anonymous functions; this time as QuickCheck properties. This one just creates a FakeSongService that contains no data, and asks for recommendations. The expected result is that actual is empty (null), since there's nothing to recommend.

A slightly more involved property adds some data to the service before requesting recommendations:

testProperty "One user, some songs" $ \
  (ValidUserName user)
  (fmap getSong -> songs)
  -> monadicIO $ do
  scrobbleCounts <- pick $ vectorOf (length songs) $ choose (1, 100)
  let scrobbles = zip songs scrobbleCounts
  let srvc = foldr (uncurry (scrobble user)) emptyService scrobbles
 
  actual <- run $ getRecommendations srvc user
 
  assertWith (null actual) "Should be empty"

A couple of things are worthy of note. First, the property uses a view pattern to project a list of songs from a list of Arbitraries, where getSong is the 'getter' that belongs to the AnySong newtype wrapper.

I find view patterns quite useful as a declarative way to 'promote' a single Arbitrary instance to a list. In a third property, I take it a step further:

(fmap getUserName -> NonEmpty users)

This not only turns the singular ValidUserName wrapper into a list, but by projecting it into NonEmpty, the test declares that users is a non-empty list. QuickCheck picks all that up and generates values accordingly.

If you're interested in seeing this more advanced view pattern in context, you may consult the Git repository.

Secondly, the "One user, some songs" test runs in monadicIO, which I didn't know existed before I wrote these tests. Together with pick, run, and assertWith, monadicIO is defined in Test.QuickCheck.Monadic. It enables you to write properties that run in IO, which these properties need to do, because getRecommendations is IO-bound.

There's one more QuickCheck property in the code base, but it mostly repeats techniques already shown here. See the Git repository for all the details, if necessary.

Examples #

In addition to the properties, I also ported the F# examples; that is, 'normal' unit tests. Here's one of them:

"One verified recommendation" ~: do
  let srvc =
        scrobble "ana" (Song 2 True 5) 9_9990 $
        scrobble "ana" (Song 1 False 5) 10 $
        scrobble "cat" (Song 1 False 6) 10 emptyService
 
  actual <- getRecommendations srvc "cat"
 
  [Song 2 True 5] @=? actual

This one is straightforward, but as I already discussed when characterizing the original code, some of the examples essentially document quirks in the implementation. Here's the relevant test, translated to Haskell:

"Only top-rated songs" ~: do
  -- Scale ratings to keep them less than or equal to 10.
  let srvc =
        foldr (\i -> scrobble "hyle" (Song i True (toEnum i `div` 2)) 500) emptyService [1..20]
 
  actual <- getRecommendations srvc "hyle"
 
  assertBool "Should not be empty" (not $ null actual)
  -- Since there's only one user, but with 20 songs, the implementation
  -- loops over the same songs 20 times, so 400 songs in total (with
  -- duplicates). Ordering on rating, only the top-rated 200 remains, that
  -- is, those rated 5-10. Note that this is a Characterization Test, so not
  -- necessarily reflective of how a real recommendation system should work.
  assertBool "Should have 5+ rating" (all ((>= 5) . songRating) actual)

This test creates twenty scrobbles for one user: One with a zero rating, two with rating 1, two with rating 2, and so on, up to a single song with rating 10.

The implementation of GetRecommendationsAsync uses these twenty songs to find 'other users' who have these top songs as well. In this case, there's only one user, so for every of those twenty songs, you get the same twenty songs, for a total of 400.

There are more unit tests than these. You can see them in the Git repository.

Implementation #

The most direct translation of the C# and F# 'reference implementation' that I could think of was this:

getRecommendations srvc un = do
  -- 1. Get user's own top scrobbles
  -- 2. Get other users who listened to the same songs
  -- 3. Get top scrobbles of those users
  -- 4. Aggregate the songs into recommendations

  -- Impure
  scrobbles <- getTopScrobbles srvc un
 
  -- Pure
  let scrobblesSnapshot = take 100 $ sortOn (Down . scrobbleCount) scrobbles
 
  recommendationCandidates <- newIORef []
  forM_ scrobblesSnapshot $ \scrobble -> do
    -- Impure
    otherListeners <- getTopListeners srvc $ songId $ scrobbledSong scrobble
 
    -- Pure
    let otherListenersSnapshot =
          take 20 $
          sortOn (Down . userScrobbleCount) $
          filter ((10_000 <=) . userScrobbleCount) otherListeners
 
    forM_ otherListenersSnapshot $ \otherListener -> do
      -- Impure
      otherScrobbles <- getTopScrobbles srvc $ userName otherListener
 
      -- Pure
      let otherScrobblesSnapshot =
            take 10 $
            sortOn (Down . songRating . scrobbledSong) $
            filter (songHasVerifiedArtist . scrobbledSong) otherScrobbles
 
      forM_ otherScrobblesSnapshot $ \otherScrobble -> do
        let song = scrobbledSong otherScrobble
        modifyIORef recommendationCandidates (song :)
 
  recommendations <- readIORef recommendationCandidates
  -- Pure
  return $ take 200 $ sortOn (Down . songRating) recommendations

In order to mirror the original implementation as closely as possible, I declare recommendationCandidates as an IORef so that I can incrementally add to it as the action goes through its nested loops. Notice the modifyIORef towards the end of the code listing, which adds a single song to the list.

Once all the looping is done, the action uses readIORef to pull the recommendations out of the IORef.

As you can see, I also ported the comments from the original C# code.

I don't consider this idiomatic Haskell code, but the goal in this article was to mirror the C# code as closely as possible. Once I start refactoring, you'll see some more idiomatic implementations.

Conclusion #

Together with the previous two articles in this article series, this establishes a baseline from which I can refactor the code. While we might consider the original C# code idiomatic, this port to Haskell isn't. It is, on the other hand, similar enough to both its C# and F# peers that we can compare and contrast all three.

Particularly two design choices make this Haskell implementation less than idiomatic. One is the use of IORef to update a list of songs. The other is using a type class to model an external dependency.

As I cover various alternative architectures in this article series, you'll see how to get rid of both.

Next: Song recommendations as an Impureim Sandwich.

Published: Monday, 21 April 2025 10:19:00 UTC

Porting song recommendations to Haskell by Mark Seemann