The Art of Wordle

There’s a new game that made it onto internet, Wordle. Somewhat similar to Mastermind, your aim is to guess a five letter word. After each guess, you are told which letters you got right (highlighted green), which letters you got right albeit in the wrong places (highlighted yellow), and which letters you guessed completely wrong (left grey). And then you try again, up to six times.

Now, where Wordle differs from Mastermind is that each guess must be an actual word, and the answer is an actual word. This means that letters, and indeed combinations of letters, aren’t equally likely.

This got me thinking — what is the optimal way of solving the game? Should you use common letters, or rare letters? Rare letters are great if they are found, but are unlikely to be found. Common letters are likely to be found, but this won’t help you too much.

So I decided to explore the game with a bit of python code.

[Note, I have since improved my logic, keep reading here, but afterwards check out my update.]

To start, I needed a vocabulary, and I found a website that had 58,109 words. I reduced this down to 2973 which had 5 letters and didn’t end in the letter S (I didn’t want plurals).

I next wrote a little function that would score a particular guess, returning for each letter a 2 if correct, 1 if correct but wrong place, and 0 if wrong. So in this example, only the A in LATER appears in the actual word ABACK, albeit in the wrong position.

Now, when I guess, my goal is to pick a guess that will maximise the information content of the results I get back. So I wrote a function that would score a guess against each potential word, and tallying how many potential words got each score:

So, we can see with this guess, there were 107 different potential actual words that give us the same score, which isn’t ideal. Ideally we want lots of possible scores, with low counts for each score. I decided to use an objective function minimising the average count, but I might also have sought to minimise the maximum count, or maximised the number of words with counts of 1.

I then tested this on all possible guesses. For each one, I checked what the average count was across all possible scores. Here I show the best and worst 10 guesses based on this metric:

It turns out guesses with rare letters aren’t great, and guesses with repeated letters particularly unhelpful. In fact, my initial guess of LATER was actually not far off optimal, with an average count of 20.9.

Now, if I guess TRACE, I will get one of 152 scores. The worst score to get back is [0,0,0,0,0], which means it is one of 283 potential words. The best is to get back is one of the 32 scores that specify the word exactly, for example, if it got [0,1,1,1,2] it could only be CARVE.

So, I next want to go a level deeper. If I guessed TRACE, for each of those 152 potential scores, I will work out the reduced set of words, and see what my best next guess will be and what its average count is.

So from this, we can see that if we got (0,0,0,2,1), this would indicate it was one of 10 words. We would be best to guess LEECH next, which would lead to an average count of 1.111. This means that for 8 of those words, our subsequent score would give it exactly, and otherwise we’d be left guessing between the other two.

If we had (0,0,0,0,1), this would indicate it was one of 238 words. We would be best to guess LINED, which would lead to an average count of 4.1. Similarly, if we had (0,1,0,0,1) we would be best to guess RILED, which would lead to an average count of 4.44. In both of these cases, we would need to go a step deeper.

So, using the example of LINED, we reduce the words to 238, and try again. It tells us that the worst case scenario we get a score of (0,0,0,2,2), for which there are 43 words. In this case we are best to guess MOPED, which leads to an average score of 3.58.

Going a few steps further, in each case assuming the worst outcome but then making the best guess:

So, in the worse outcome, by guessing TRACE, LINED, MOPED, BOWED, DOSED and JOKED, in most cases we would then know the answer, unless we got (0,2,0,2,2) in which case it would be FOXED or OOZED.

Applying this strategy with today’s puzzle:, we would have first guessed TRACE, then SLIMY, and finally SLUMP, which is one less guess than I managed.

Applying this strategy with yesterday’s puzzle (BANAL), we would have first guessed TRACE, then SALON, then MANLY and finally known it was BANAL.

Using this strategy, of the 2973 actual words in my dictionary, 1 would have been guessed in the single turn (TRACE), 151 on the second guess, 1183 on the third guess, 1190 on the fourth guess, 316 on the fifth guess, 90 on the 60th guess, 32 on the 7th guess, 9 on the 8th, and 1 on the 9th guess.

This isn’t quite the optimal way of guessing. I’m assuming the best thing to do is to pick the word which minimises the average count. What would be better would have been to apply recursion, working out which word would actually lead to a lower expected number of turns, or to maximise my chance of solving it within say 4 or 6 turns.

And obviously, if I was willing to make further assumptions about the words that might or might not be in the dictionary, it would lead to a more optimal strategy.

Anyway — I hope that hasn’t ruined the fun of the game!

For an update showing some improvements to the algorithm, check out my next post.



Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Guy Lipman

Fascinated by what makes societies and markets work, especially in sustainable energy. Views not necessarily reflect those of my employer.