The Art of Wordle
There’s a new game that made it onto internet, Wordle. Somewhat similar to Mastermind, your aim is to guess a five letter word. After each guess, you are told which letters you got right (highlighted green), which letters you got right albeit in the wrong places (highlighted yellow), and which letters you guessed completely wrong (left grey). And then you try again, up to six times.
Now, where Wordle differs from Mastermind is that each guess must be an actual word, and the answer is an actual word. This means that letters, and indeed combinations of letters, aren’t equally likely.
This got me thinking — what is the optimal way of solving the game? Should you use common letters, or rare letters? Rare letters are great if they are found, but are unlikely to be found. Common letters are likely to be found, but this won’t help you too much.
So I decided to explore the game with a bit of python code.
[Note, I have since improved my logic, keep reading here, but afterwards check out my update.]
To start, I needed a vocabulary, and I found a website that had 58,109 words. I reduced this down to 2973 which had 5 letters and didn’t end in the letter S (I didn’t want plurals).
import requests
words = requests.get('http://www.mieliestronk.com/corncob_caps.txt')
words = words.text.splitlines()
words = [w for w in words if (len(w)==5)&(w[-1]!='S')]
I next wrote a little function that would score a particular guess, returning for each letter a 2 if correct, 1 if correct but wrong place, and 0 if wrong. So in this example, only the A in LATER appears in the actual word ABACK, albeit in the wrong position.
def score_word(guess, actual):
outcome = []
for i, c in enumerate(guess):
if actual[i]==c:
outcome.append(2)
elif c in actual:
outcome.append(1)
else:
outcome.append(0)
return tuple(outcome)score_word('LATER', 'ABACK')> (0,1,0,0,0)
Now, when I guess, my goal is to pick a guess that will maximise the information content of the results I get back. So I wrote a function that would score a guess against each potential word, and tallying how many potential words got each score:
def get_scores(guess, words):
results = {}
for word in words:
score = score_word(guess, word)
results[score] = results.get(score, 0) + 1
return resultsget_scores('LATER', words)> {(0, 2, 0, 0, 1): 40,
(0, 1, 0, 0, 0): 107,
...
(1, 1, 0, 2, 2): 2,
(0, 1, 1, 0, 1): 30,
(1, 1, 2, 0, 1): 1,
(1, 2, 1, 2, 0): 1}
So, we can see with this guess, there were 107 different potential actual words that give us the same score, which isn’t ideal. Ideally we want lots of possible scores, with low counts for each score. I decided to use an objective function minimising the average count, but I might also have sought to minimise the maximum count, or maximised the number of words with counts of 1.
I then tested this on all possible guesses. For each one, I checked what the average count was across all possible scores. Here I show the best and worst 10 guesses based on this metric:
def get_score_counts_for_each_guess(words):
outer_results = {}
for guess in words:
results = list(get_scores(guess, words).values())
outer_results[guess] = sum(results)/len(results)
return outer_resultsscore_counts = get_score_counts_for_each_guess(words)import pandas as pd
pd.Series(score_counts).sort_values()[:10]> TRACE 19.559211
SLATE 19.688742
CRATE 19.820000
TILER 19.820000
TRIED 19.953020
RATED 20.363014
PARSE 20.363014
CARED 20.363014
RILED 20.363014
TREAD 20.363014pd.Series(score_counts).sort_values()[-10:]
> YUMMY 80.351351
WHIZZ 80.351351
QUAFF 80.351351
QUIFF 82.583333
KAYAK 82.583333
FUZZY 82.583333
CIVIC 82.583333
FLUFF 82.583333
JAZZY 90.090909
QUEUE 99.100000
It turns out guesses with rare letters aren’t great, and guesses with repeated letters particularly unhelpful. In fact, my initial guess of LATER was actually not far off optimal, with an average count of 20.9.
Now, if I guess TRACE, I will get one of 152 scores. The worst score to get back is [0,0,0,0,0], which means it is one of 283 potential words. The best is to get back is one of the 32 scores that specify the word exactly, for example, if it got [0,1,1,1,2] it could only be CARVE.
So, I next want to go a level deeper. If I guessed TRACE, for each of those 152 potential scores, I will work out the reduced set of words, and see what my best next guess will be and what its average count is.
def get_best_guess_for_score(score, new_words):
if len(new_words)==1:
return new_words[0], 1
new_score_counts = get_score_counts_for_each_guess(new_words)
best_guesses = pd.Series(new_score_counts).sort_values()
return best_guesses.index[0], best_guesses.values[0]def get_outcomes_for_each_score(guess, words):
score_counts = get_scores(guess, words)
results = {}
for k, v in score_counts.items():
new_words = [word for word in words
if score_word(guess, word)==score]
best_guess = get_best_guess_for_score(k, new_words)
results[k] = [v, best_guess[0], best_guess[1]]
return resultsguess = 'TRACE'
get_outcomes_for_each_score(guess, words)>{(0, 0, 0, 2, 1): [10, 'LEECH', 1.1111111111111112],
(0, 0, 0, 0, 1): [238, 'LINED', 4.103448275862069],
...
(0, 1, 0, 0, 1): [160, 'RILED', 4.444444444444445]}
So from this, we can see that if we got (0,0,0,2,1), this would indicate it was one of 10 words. We would be best to guess LEECH next, which would lead to an average count of 1.111. This means that for 8 of those words, our subsequent score would give it exactly, and otherwise we’d be left guessing between the other two.
If we had (0,0,0,0,1), this would indicate it was one of 238 words. We would be best to guess LINED, which would lead to an average count of 4.1. Similarly, if we had (0,1,0,0,1) we would be best to guess RILED, which would lead to an average count of 4.44. In both of these cases, we would need to go a step deeper.
So, using the example of LINED, we reduce the words to 238, and try again. It tells us that the worst case scenario we get a score of (0,0,0,2,2), for which there are 43 words. In this case we are best to guess MOPED, which leads to an average score of 3.58.
guess = 'TRACE'
score = (0,0,0,0,1)
new_words = [word for word in words
if score_word(guess, word)==score]
new_guess = 'LINED'
get_outcomes_for_each_score(new_guess, new_words)
>{(0, 0, 0, 1, 0): [10, 'BEEFY', 1.25],
(0, 1, 1, 1, 0): [7, 'BEGIN', 1.1666666666666667],
(0, 0, 1, 1, 0): [8, 'EBONY', 1.1428571428571428],
(1, 0, 0, 1, 0): [19, 'WHELP', 1.3571428571428572],
(1, 0, 0, 2, 0): [13, 'BLEEP', 1.4444444444444444],
(0, 2, 0, 2, 2): [14, 'BIDED', 2.8],
(1, 0, 0, 2, 2): [6, 'DOLED', 1.5],
(1, 0, 1, 1, 2): [1, 'BLEND', 1.0],
(0, 0, 0, 2, 2): [43, 'MOPED', 3.5833333333333335],
...}
Going a few steps further, in each case assuming the worst outcome but then making the best guess:
guess = 'TRACE'
score = (0,0,0,0,1)
new_words = [word for word in words if score_word(guess, word)==score]
new_guess = 'LINED'
score = (0, 0, 0, 2, 2)
new_words = [word for word in new_words if score_word(new_guess, word)==score]
new_guess = 'MOPED'
score = (0,2,0,2,2)
new_words = [word for word in new_words if score_word(new_guess, word)==score]
new_guess = 'BOWED'
score = (0,2,0,2,2)
new_words = [word for word in new_words if score_word(new_guess, word)==score]
new_guess = 'DOSED'
score = (1,2,0,2,2)
new_words = [word for word in new_words if score_word(new_guess, word)==score]
new_guess = 'JOKED'
get_outcomes_for_each_score(new_guess, new_words)
>{(0, 2, 0, 2, 2): [2, 'FOXED', 1.0],
(2, 2, 2, 2, 2): [1, 'JOKED', 1.0],
(2, 2, 0, 2, 2): [1, 'JOYED', 1.0],
(0, 2, 2, 2, 2): [1, 'YOKED', 1.0]}
So, in the worse outcome, by guessing TRACE, LINED, MOPED, BOWED, DOSED and JOKED, in most cases we would then know the answer, unless we got (0,2,0,2,2) in which case it would be FOXED or OOZED.
Applying this strategy with today’s puzzle:, we would have first guessed TRACE, then SLIMY, and finally SLUMP, which is one less guess than I managed.
ANSWER = 'SLUMP'
guess = 'TRACE'
score_word(guess, ANSWER)
#> (0,0,0,0,0)
score = (0,0,0,0,0)
new_words = [word for word in words if score_word(guess, word)==score]
get_best_guess_for_score(score, new_words)
#> ('SLIMY', 3.824324324324324)
new_guess = 'SLIMY'
score_word(new_guess, ANSWER)
#> (2,2,0,2,0)
score = (2,2,0,2,0)
new_words = [word for word in new_words if score_word(new_guess, word)==score]
get_best_guess_for_score(score, new_words)
#> ('SLUMP', 1)
Applying this strategy with yesterday’s puzzle (BANAL), we would have first guessed TRACE, then SALON, then MANLY and finally known it was BANAL.
ANSWER = 'BANAL'
guess = 'TRACE'
score_word(guess, ANSWER)
#> (0,0,1,0,0)
score = (0,0,1,0,0)
new_words = [word for word in words if score_word(guess, word)==score]
get_best_guess_for_score(score, new_words)
#> ('SALON', 3.35)
new_guess = 'SALON'
score_word(new_guess, ANSWER)
#> (0, 2, 1, 0, 1)
score = (0, 2, 1, 0, 1)
new_words = [word for word in new_words if score_word(new_guess, word)==score]
get_best_guess_for_score(score, new_words)
#> ('MANLY', 1.0)
new_guess = 'MANLY'
score_word(new_guess, ANSWER)
#> (0, 2, 2, 1, 0)
score = (0, 2, 2, 1, 0)
new_words = [word for word in new_words if score_word(new_guess, word)==score]
get_best_guess_for_score(score, new_words)
# ('BANAL', 1)
Using this strategy, of the 2973 actual words in my dictionary, 1 would have been guessed in the single turn (TRACE), 151 on the second guess, 1183 on the third guess, 1190 on the fourth guess, 316 on the fifth guess, 90 on the 60th guess, 32 on the 7th guess, 9 on the 8th, and 1 on the 9th guess.
This isn’t quite the optimal way of guessing. I’m assuming the best thing to do is to pick the word which minimises the average count. What would be better would have been to apply recursion, working out which word would actually lead to a lower expected number of turns, or to maximise my chance of solving it within say 4 or 6 turns.
And obviously, if I was willing to make further assumptions about the words that might or might not be in the dictionary, it would lead to a more optimal strategy.
Anyway — I hope that hasn’t ruined the fun of the game!
For an update showing some improvements to the algorithm, check out my next post.