Anagrams kata as a one-liner by Mark Seemann
A futile exercise in code compaction.
Recently I was doing the Anagrams kata in F# with Grzegorz Dziadkiewicz, and along the way realised that the implementation is essentially a one-liner. I thought it would be fun to redo the exercise in Haskell and see how compact code I could get away with.
In short, in the exercise, you're given a list of words, and you need to find all the anagrams in the list. For example, given the list bar, foo, bra, the result should be bar, bra, and foo shouldn't be part of the output, since it's not an anagram of any other word in the list.
A pipeline of transformations
My idea was to collect all the words in a Map (dictionary) keyed by the string, but sorted. Even if the sorted string is a nonsense word, all anagrams sort to the same sequence of letters:
ghci> sort "bar"
"abr"
ghci> sort "bra"
"abr"
Each of the keys should contain a Set of words, since I don't care about the order.
Once I have that map of sets, I can throw away the singleton sets, and then the keys. Or perhaps first throw away the keys, and then the singleton sets. The order of those two steps doesn't matter.
The reason I don't want the singleton sets is that a set with only one word means that no anagrams were found.
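As a minimal sketch of why the order of those two steps doesn't matter, assume the map of sets already exists. The names `sample`, `filterThenElems`, and `elemsThenFilter` are hypothetical, introduced only for illustration:

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)
import qualified Data.Set as Set
import Data.Set (Set)

-- A hypothetical, hand-built map of sets for the list bar, foo, bra.
sample :: Map String (Set String)
sample = Map.fromList
  [ ("abr", Set.fromList ["bar", "bra"])
  , ("foo", Set.fromList ["foo"])
  ]

-- Throw away the singleton sets first, then the keys.
filterThenElems :: Map k (Set a) -> [Set a]
filterThenElems = Map.elems . Map.filter ((1 <) . Set.size)

-- Throw away the keys first, then the singleton sets.
elemsThenFilter :: Map k (Set a) -> [Set a]
elemsThenFilter = filter ((1 <) . Set.size) . Map.elems
```

Both functions should return the same list for any input map, since filtering a map keeps the remaining entries in key order.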
Creating the map
How to create the desired map? The Map module exports the fromListWith function that enables you to go through an association list and combine values along the way, in case you encounter the key more than once. That sounds useful, but means that first I have to convert the list of words to an association list.
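To illustrate how fromListWith combines values for a repeated key, here is a small sketch that is unrelated to anagrams. The names `pairs` and `combined` are hypothetical:

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- Hypothetical association list in which the key 'a' occurs twice.
pairs :: [(Char, [Int])]
pairs = [('a', [1]), ('b', [2]), ('a', [3])]

-- fromListWith calls the combining function as 'f newValue oldValue',
-- so with (++) the value encountered later ends up in front.
combined :: Map Char [Int]
combined = Map.fromListWith (++) pairs
-- combined == Map.fromList [('a', [3,1]), ('b', [2])]
```

The argument order matters for a non-commutative function like `(++)`. With a commutative operation such as set union, it makes no difference.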
Importing Control.Arrow, I can do it like this:
ghci> fmap (sort &&& Set.singleton) ["bar", "foo", "bra"]
[("abr",fromList ["bar"]),("foo",fromList ["foo"]),("abr",fromList ["bra"])]
Each element in the list is a pair of a key, and a set containing a single word. Notice that the set containing "bar" has the same key as the set containing "bra". When using fromListWith, the function will have to unite these two sets whenever it encounters the same key.
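For plain functions, `f &&& g` applies both functions to the same input and pairs the results. The following sketch spells out the pairing step with an explicit argument. The names `toPair` and `toPair'` are hypothetical:

```haskell
import Control.Arrow ((&&&))
import Data.List (sort)
import qualified Data.Set as Set
import Data.Set (Set)

-- Point-free version of the pairing step.
toPair :: Ord a => [a] -> ([a], Set [a])
toPair = sort &&& Set.singleton

-- The same step written with an explicit argument.
toPair' :: Ord a => [a] -> ([a], Set [a])
toPair' w = (sort w, Set.singleton w)
```

Both versions turn a word into a pair of its sorted letters and a singleton set holding the word itself.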
ghci> Map.fromListWith Set.union $ fmap (sort &&& Set.singleton) ["bar", "foo", "bra"]
fromList [("abr",fromList ["bar","bra"]),("foo",fromList ["foo"])]
The two anagrams "bar" and "bra" now belong to the same set, while "foo" is still solitary.
Finding the anagrams
Now that we've grouped sets according to key, we no longer need the keys:
ghci> Map.elems $ Map.fromListWith Set.union $ fmap (sort &&& Set.singleton) ["bar", "foo", "bra"]
[fromList ["bar","bra"],fromList ["foo"]]
The anagrams are those sets that have more than one element, so we can throw away those that are smaller.
ghci> filter ((1 <) . Set.size) $ Map.elems $ Map.fromListWith Set.union
      $ fmap (sort &&& Set.singleton) ["bar", "foo", "bra"]
[fromList ["bar","bra"]]
The expression has now grown to such a width that I've broken it into two lines to make it more readable. It really is just one line, though.
To save a bit of space, I eta-reduced the expression before I made it a function:
anagrams :: Ord a => [[a]] -> Set (Set [a])
anagrams =
  Set.fromList . filter ((1 <) . Set.size) . Map.elems . Map.fromListWith Set.union
  . fmap (sort &&& Set.singleton)
Set.fromList converts the list of anagrams to a Set of anagrams, since I didn't think that it was a postcondition that the anagrams should be returned in a specific order.
Unfortunately the expression is still so wide that I found it necessary to break it into two lines.
Just for the hell of it, I tried to fix the situation by changing the imports:
import Control.Arrow
import Data.List (sort)
import Data.Map.Strict (fromListWith, elems)
import Data.Set (Set, fromList, singleton)
With this very specific set of imports, the expression now barely fits on a single line:
anagrams :: Ord a => [[a]] -> Set (Set [a])
anagrams = fromList . filter ((1 <) . length) . elems . fromListWith (<>) . fmap (sort &&& singleton)
Here, I also took advantage of Semigroup append (<>) being equal to Set.union for Set.
Is it readable? Hardly.
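The compact version swaps `Set.union` for `<>` and `Set.size` for `length`. A small sketch, using the hypothetical names `xs`, `ys`, `sameUnion`, and `sameSize`, suggests why both substitutions preserve behaviour:

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- Two hypothetical singleton sets, as produced by the pairing step.
xs, ys :: Set String
xs = Set.fromList ["bar"]
ys = Set.fromList ["bra"]

-- The Semigroup instance for Set combines two sets by taking their union.
sameUnion :: Bool
sameUnion = (xs <> ys) == Set.union xs ys

-- length comes from Foldable and counts the elements, like Set.size.
sameSize :: Bool
sameSize = length (xs <> ys) == Set.size (xs <> ys)
```

Both values should evaluate to `True`.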
My main goal with the exercise was to implement the desired functionality as a single expression. Perhaps I was inspired by Dave Thomas, who wrote:
"I hacked a solution together in 25 lines of Ruby."
25 lines of Ruby? I can do it in one line of Haskell!
Is that interesting? Does it make sense to compare two languages? Why not? By trying out different languages you learn the strengths and weaknesses of each. There's no denying that Haskell is expressive. On the other hand, what you can't see in this blog post is that compilation takes forever. Not for this code in particular, but in general.
I'm sure Dave Thomas was done with his Ruby implementation before my machine had even finished compiling the empty, scaffolded Haskell code.
Dave Thomas also wrote:
"It runs on this wordlist in 1.8s on a 1.7GHz i7."
Usually I don't care that much about performance as long as it's adequate. Or rather, I find that good software architecture with poor algorithms usually beats bad architecture with good algorithms. But I digress.
How fares my one-liner against Dave Thomas' implementation?
ghci> :set +s
ghci> length . anagrams . lines <$> readFile "wordlist.txt"
20683
(3.56 secs, 1,749,984,448 bytes)
Oh, 3.56 seconds isn't particularly g... Holy thunk, Batman! 1.7 gigabytes!
That's actually disappointing, I admit. If only I could excuse this by running on a weaker machine, but mine is a 1.9 GHz i7. Nominally faster than Dave Thomas' machine.
At least, the time it takes to run through that 3.7 MB file is the same order of magnitude.
Since I had a good idea about the kind of implementation I was aiming for, I didn't write that many tests. Only three, actually.
main :: IO ()
main = defaultMain $ hUnitTestToTests (TestList [
  "Examples" ~: do
    (words, expected) <-
      [
        (["foo", "bar", "baz"], Set.empty),
        (["bar", "foo", "bra"], Set.fromList [Set.fromList ["bar", "bra"]]),
        (["foo", "bar", "bra", "oof"],
          Set.fromList [
            Set.fromList ["foo", "oof"], Set.fromList ["bar", "bra"]])
      ]
    let actual = anagrams words
    return $ expected ~=? actual
  ])
Doing katas is a good way to try out new ideas, dumb or otherwise. Implementing the Anagrams kata as a one-liner was fun, but the final code shown here is sufficiently unreadable that I wouldn't recommend it.
You could still write the anagrams function based on the idea presented here, but in a shared code base with an indefinite life span, I'd break it up into multiple expressions with descriptive names.
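One possible way to break it up might look like the following sketch. The helper names `groupBySortedLetters` and `hasAnagrams` are hypothetical, and this is only one of many reasonable decompositions:

```haskell
import Data.List (sort)
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)
import qualified Data.Set as Set
import Data.Set (Set)

-- Group words by their sorted letters; anagrams share the same key.
groupBySortedLetters :: Ord a => [[a]] -> Map [a] (Set [a])
groupBySortedLetters ws =
  Map.fromListWith Set.union [(sort w, Set.singleton w) | w <- ws]

-- A group with only one word contains no anagrams.
hasAnagrams :: Set a -> Bool
hasAnagrams group = Set.size group > 1

anagrams :: Ord a => [[a]] -> Set (Set [a])
anagrams ws = Set.fromList anagramGroups
  where
    groups        = Map.elems (groupBySortedLetters ws)
    anagramGroups = filter hasAnagrams groups
```

This version is longer than the one-liner, but each step has a name that says what it's for.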