Enumerate Wordle combinations with an applicative functor by Mark Seemann
An example of ad-hoc programming.
Like so many others, I recently started solving the daily Wordle puzzle. As is normal when one is a beginner, I initially struggled a bit. One day, I couldn't find a good word to proceed.
To be clear, this article isn't really about Wordle strategies or tools. Rather, it's an example of ad-hoc programming. Particularly, it's an example of how the applicative nature of lists can be useful when you need to enumerate combinations. While I've previously shown a similar example, I think one more is warranted.
Inured from tears #
Last Monday, I'd started with the word TEARS. (I've subsequently learned that better starting words exist.) The puzzle responded with a yellow E and a green R:
In case you haven't played the game, this means that the fourth letter of the hidden word is an R, and that the word also contains an E. The second letter, however, is not an E. Also, the hidden word contains neither T, A, nor S.
While the green and yellow letters may be repeated, you only have six guesses, so it's a good idea to attempt to exhaust as many letters as possible. The first compatible word with five distinct letters that I could think of was INURE.
This gave me a bit of new information. The hidden word also contains a U, but not as the third letter. Furthermore, the E isn't the last letter either. Keep in mind that from TEARS we also know that E isn't the second letter.
While I believe myself to have a decent English vocabulary, at this point I was stuck. While I knew that the E had to be in either first or third position, I couldn't think of a single word that fit all observations.
After trying to come up with a word for ten minutes, I decided that, instead of giving up, I'd use the applicative nature of lists to generate all possible combinations. I was hoping that with the observations already in place, there wouldn't be too many to sift through.
While you could do this in other languages (such as F# or C#), it's natural to use Haskell because it natively understands applicative functors. Thus, I launched GHCi (Glasgow Haskell Compiler interactive - the Haskell REPL).
Wordle is kind enough to show a keyboard with colour-coded keys:
All letters except the dark ones remain valid, so I first defined a list of all available letters:
> avail = "QWERYUOPDFGHJKLZXCVBM"
avail is a
String, but in Haskell, a
String is a type synonym for a (linked) list of
Char values (characters). Since lists form an applicative functor, that'll work.
Most of the letters are still in play - only five letters are already out of the picture: T, I, A, S, and N. Thus,
availstill spells out most of the alphabet.
Next, I wrote an expression that enumerates all five-letter combinations of these letters, with one constraint: The fourth letter must be an R:
> candidates = (\x y z æ ø -> [x,y,z,æ,ø]) <$> avail <*> avail <*> avail <*> "R" <*> avail
The leftmost expression (
(\x y z æ ø -> [x,y,z,æ,ø])) is a lambda expression that takes five values (one from each list of available letters) and combines them to a single list. Each value (
y, and so on) is a
Char value, and since
String in Haskell is the same as
[Char], the expression
[x,y,z,æ,ø] is a
String. In Danish we have three more letters after
z, so I after I ran out of the the usual Latin letters, I just continued with the Danish
Notice that between each of the
<*> operators (apply) I've supplied the list of available letters. In the fourth position there's no choice, so there the list contains only a single letter. Recall that a
String is a list of characters, so
"R" is the same as
How many combinations are there? Let's ask GHCi:
> length candidates 194481
Almost 200,000. That's a bit much to look through, but we can peek at the first ten as a sanity check:
> take 10 candidates ["QQQRQ","QQQRW","QQQRE","QQQRR","QQQRY","QQQRU","QQQRO","QQQRP","QQQRD","QQQRF"]
There are no promising words in this little list, but I never expected that.
I needed to narrow down the options.
How do you make a collection smaller? You could filter it.
candidates contains illegal values. For example, the third value in the above list (of the ten first candidates) is
"QQQRE". Yet, we know (from the INURE attempt) that the last letter isn't E. We can filter out all strings that end with
> take 10 $ filter (\s -> s!!4 /= 'E') candidates ["QQQRQ","QQQRW","QQQRR","QQQRY","QQQRU","QQQRO","QQQRP","QQQRD","QQQRF","QQQRG"]
!! is the indexing operator, so
s!!4 means the (zero-based) fourth element of the string
/= is the inequality operator, so the lambda expression
(\s -> s!!4 /= 'E') identifies all strings where the fifth element (or fourth element, when starting from zero) is different from
We know more than this, though. We also know that the second element can't be E, and that the third element isn't U, so add those predicates to the
> take 10 $ filter (\s -> s!!1 /= 'E' && s!!2 /= 'U' && s!!4 /= 'E') candidates ["QQQRQ","QQQRW","QQQRR","QQQRY","QQQRU","QQQRO","QQQRP","QQQRD","QQQRF","QQQRG"]
How many are left?
> length $ filter (\s -> s!!1 /= 'E' && s!!2 /= 'U' && s!!4 /= 'E') candidates 168000
Still too many, but we aren't done yet.
Notice that all of the first ten values shown above are invalid. Why? Because the word must contain at least one E, and none of them do. Let's add that predicate:
> take 10 $ filter (\s -> s!!1 /= 'E' && s!!2 /= 'U' && s!!4 /= 'E' && 'E' `elem` s) candidates ["QQERQ","QQERW","QQERR","QQERY","QQERU","QQERO","QQERP","QQERD","QQERF","QQERG"]
The Boolean expression
'E' `elem` s means that the character
'E' must be an element of the string (list)
The same rule applies for U:
> take 10 $ filter (\s -> s!!1 /= 'E' && s!!2 /= 'U' && s!!4 /= 'E' && 'E' `elem` s && 'U' `elem` s) candidates ["QQERU","QWERU","QRERU","QYERU","QUERQ","QUERW","QUERR","QUERY","QUERU","QUERO"]
There's a great suggestion already! The eighth entry spells QUERY! Let's try it:
QUERY was the word of the day!
A bit embarrassing that I couldn't think of query, given that I often discuss Command Query Separation.
Was that just pure luck? How many suggestions are left in the filtered list?
> length $ filter (\s -> s!!1 /= 'E' && s!!2 /= 'U' && s!!4 /= 'E' && 'E' `elem` s && 'U' `elem` s) candidates 1921
Okay, a bit lucky. If I ask GHCi to display the filtered list in its entirety, no other word jumps out at me, looking like a real word.
While I admit that I was a bit lucky that QUERY was among the first ten of 1,921 possible combinations, I still find that applicative combinations are handy as an ad-hoc tool. I'm sure there are more elegant ways to solve a problem like this one, but for me, this approach had low ceremony. It was a few lines of code in a terminal. Once I had the answer, I could just close the terminal and no further clean-up was required.
I'm sure other people have other tool preferences, and perhaps you'd like to leave a comment to the effect that you have a better way with Bash, Python, or APL. That's OK, and I don't mind learning new tricks.
I do find this capability of applicative functors to do combinatorics occasionally useful, though.
It's not "better", but here's a similar approach in Python.