Wordle Best Starting Words is AROSE not ADIEU - Episode 2463

If you like this content, please consider visiting the video on YouTube and subscribe to the MrExcel Channel to show your support!
This video has been published on Jan 25, 2022.
Everyone has their theories on the best starting word for the Wordle game. A lot of people suggest ADIEU. But when you look at the frequency of each letter in the 8000 possible words, a better start word has AROSE.

This video starts out with the most popular words. After that video wraps, a deep dive into the Dynamic Array Formulas that made all of this possible.

Table of Contents
(0:00) What's the best Wordle Starting Word?
(0:48) Statistical analysis of most common letters
(1:20) A few alternate best starting words
(2:11) Scoring ADIEU and other common starting words
(2:49) Learn Excel on Retrieve
(3:10) Organist Nancy Faust plays
(3:21) Using UNIQUE with By_Col set to True
(3:50) VSTACK from Charles Williams
(3:55) Getting rid of 0 from empty cells using FILTER
(4:23) FIND to find an array of letters
(5:29) XLOOKUP of each letter in a word.
maxresdefault.jpg


Transcript of the video:
Wordle what's the best starting word?
It's not ADIEU.
AROSE run circles.
Hey welcome back to the MrExcel Netcast, I am Bill Jelen.
As a follow up to my video about the Wordle Helper that I built in Excel.
Question is what is the best Wordle starting word?
There's lots of articles about this.
A lot of people like ADIEU because it gets rid of a lot of vowels. In this great Tweet here: “On the day that the Wordle answer is ADIEU, social media will implode”.
I don't think ADIEU is a great word.
These are the words that I found online.
And a lot of them seem to be using the Wheel of Fortune strategy.
You know at the end of Wheel of Fortune they give you R, S, T, L, N, and E?
So they are looking for words that have a lot of those.
Rather than the Wheel of Fortune strategy, let's take a statistical look at this.
In column A, I have the 8000 plus Scrabble words that are 5 letters.
Use some Excel functions here, which I'll explain in the out take to break that out to figure out how many words have the letter A or B or C.
We end up with this pivot table over here.
And the top five most popular letters are S, E, A, R, and O.
A word that has those five letters is going to win.
I think the best word is AROSE.
It has all five target letters.
If you're looking at the top 6 letters which includes the letter “i”, you also get ARISE and OSIER.
If you think about it, RAISE and SERAI are just basically anagrams of ARISE, so I think we throw those out.
Amongst the items here, I came up with how many of the target letters does it have.
But then also this one.
For the Score over here, I'm adding up the count of letters for each of the five letters.
The theory being that if we guess an “S” and we get a grey tile, then we eliminate 4124.
That's more value to me than eliminating let's say 924 if we find out there's not a K or 79 words If there's not a Q.
Comparing my word AROSE or these backup words here?
I mean if you think about it, if we all started using AROSE every day, the Wordle people would take advantage of that.
So it's nice, once in awhile throw in ARISE just for a little variability.
But let's compare these scores to the words that I find online all the time.
I've used some of these STARE, or STORE or ADIEU or AUDIO.
You know the problem with ADIEU and AUDIO is the “U”, right?
Getting, throwing the “U” out, the “U” is used in relatively so few of 1657.
Compared to the O or the I, it really scores very poorly.
So while ADIEU used to be my go to starting word from now on, it's going to be AROSE.
Give that a try on your next Wordle.
If you want to see some of the geeky Excel behind this, stick around for the out takes.
Otherwise let's wrap up.
Now, if you love Excel, check out my new courses on the Retrieve platform.
They're video courses but you just type what you're looking for.
It takes you right to that spot in the video and there's a complete transcript in several languages.
It's a super fast way to learn.
If you like these videos, please, down below, Like, Subscribe. and Ring the Bell.
Feel free to post any questions or comments down in the comments below.
As Nancy plays us out, don't forget to hang around if you want to see some of the Excel deep dive behind this.
I want to thank you for stopping by.
We'll see you next time for another netcast from MrExcel.
Alrighty, let's check out some of this cool Excel here to break a 5 letter word into five individual letters.
Using the MID function but asking for the numbers 1, 2, 3, 4 and 5e which gives me A, A, H, E, D.
But then this cool second argument to the UNIQUE saying that it's by column.
That's the first time I think I've ever used By Column.
And that gets rid of the duplicate extra A in there and gets me A, H, E, D like that.
The VSTACK - this is a Charles Williams function.
And then to get rid of the zeros that are being returned by that, just a simple little FILTER function there.
Give me everything from H2 Hash where H2 Hash is not equal to zero.
Straight pivot table here.
I'm actually wondering if I should have just done this with a couple of array functions.
Then you put in how many letters you're looking for.
Cool little function here.
The INDEX asking for the sequence of 1, 3, 4, 5, 6, 6, which gets me S, E, A, R, O, I.
And then this one – I was really, really happy with this little formula over here.
Let me break that one down for you.
Alright, So what I did is I said start up by saying I want to figure out how many of my target letters S, E, A, R, O, I are in AROSE.
So I asked for the equal FIND.
And what am I trying to find?
I'm going to try and find all six of those letters so 05 Hash.
05 hash right now.
Putting in a hash, I still need to press F4 to lock that down.
And where are we looking?
We're looking in the word AROSE.
And so for each of the six letters we're looking for, we're going to get where it's found.
So the S is found in the fourth position.
The E is found in the fifth position, and so on.
But the I down here is not found at all, right?
So in this case it's returning an array of 6.
I would ask for the top 7 letters, I'm getting an array of seven.
You can see how that works, right?
To simplify it down to just a single value, I wrap that in the COUNT function.
The COUNT function will throw out the #VALUE!
errors and can be just the ones that hit.
I was really happy with that.
OK, in this big formula here for Score.
Working from the inside.
Asking for the MID of AROSE and I passed the sequence of 1 through 5.
Of course, SEQUENCE one through 5 gives me the five letters.
So that forces MID to give me A, R, O, S, E which we'll see right here.
In case there's any duplicates there, I get rid of them.
That way if there were two S’s, it doesn't count twice.
So then I have those five letters A, R, O, S, E and I do an XLOOKUP of each of those letters into this table.
Looking for the letter, and then the count of letters.
And finally sending the whole thing back into the SUM function to sum up the score.
Cool little function there, so all kinds of great dynamic array formulas being used throughout this.
 
YouTube is driving me nuts, Bill, repeatedly deleting my reply to your comment there, seconds after posting it, with a "Returned error" message. And, no further edits I apply are able fix it. So, here's my reply, which will hopefully make it through to you through this alternative path:
_____

Larry Robinson

I always learn so much from your problem-solving Excel formulas in these videos, Bill. But, I was wondering, isn't there a simpler scoring algorithm that yields the same scores?

Why not rank all ~9000 Scrabble dictionary words across the entire alphabet? In other words, fix the "Based on Top" variable to 26 on your sheet? The resultant rank order of your top words stays the same, since all your top scores would still be ranked identically in this more comprehensive scoring.

However, this would also properly calibrate all the lesser value words as well. Thus, once and done, with simpler formulas. Plus, the resultant word scores would then be truly scalar across this particular vocabulary set, so their relative strengths could be directly compared. Just curious. If my logic is off here, Bill or any other commenter, please reply. Thanks!

BTW, this game reminds me a lot of MasterMind, all the rage 1/2 century ago. In it, you try to guess 4 colored pegs on a board instead of 5 letters in a word as in Wordle, with color and position feedback similarly provided after each guess. But, there's a big difference between guessing 6 color combinations versus 26 letter possibilities in these respective games.

To commenter Jeff Davis' optimal "guess #2" question, Bill responds with a smart recursion of his same algorithm. An add'l solving tactic might be borrowable from those MasterMind solvers from so long ago. They had a play called the "impossible negative." Even though it's a guaranteed wrong answer, it maximally narrows the field of remaining possibilities. Those MasterMind gurus proved its efficacy with formal logic and, visually, with a set of interlocking truth tables illustrating the mechanism at work, play by play. Because of that game's inherent symmetry, they were able to demonstrate there's a fixed strategy which guarantees a solution in 5 guesses or less. Of course, the structure of the English language affords no such solution symmetries to Wordle here ...

Anyway, a delightful puzzle. And, such a smart solver/helper routine illustrated here!
_____

Highlighted reply
MrExcel.com

I used to play MasterMind all the time and I forgot about the impossible negative strategy! I was convinced that my iterative strategy would allow me to solve in 3 guesses because I had a three-day run of three guesses. But then today, my third guess was wrong and I had to use a 4th guess (although only one word was still "in play" by then). I am going to consider how to apply the MasterMind to this.
_____

Larry Robinson

Thanks for the reply from a fellow old-time MasterMind aficionado, Bill. You mention you are currently further refining this same tool. So, in that spirit, here are a few add'l ideas you might wish to consider.

Recursively applying your initial algo as you propose above would certainly work. However, after the initial guess, I'd argue that the objective changes. As in MasterMind, the goal of successive guesses is to maximally narrow the field, until the odds of actually guessing the word surpass the benefits of further winnowing down the possibilities list.

Your existing algo could be expanded a bit further to enable a standard chess app strategy of building out a decision tree of possible next moves and then value-scoring them. Now, such a tree in Wordle gets broad fast, of course. However, Excel's calc engine is clearly up to the task. And, unlike chess, where grandmasters routinely need to see 10-15 moves ahead, the advantage of a 2 move Bayesian look-ahead in Wordle would beat any human challenger, due to that very same tree breadth difference between these two games.

Finally, if you want to apply a game theory twist to this resultant decision tree, consider whether your opponent (the word creator) is active or passive. If words are selected at random, then no further adjustment is necessary. However, if your opponent is judged to be either friendly or hostile, a further weighting layer applied to the objective function merits consideration.

We can speculate that the Wordle creator is not hostile, since deliberately picking words which only Scrabble nerds would know would probably needlessly constrain the propagation of this game across its target web audience. On the other hand, if the creator is biased toward selecting only common 5 letter words, then it would make sense to further weight your Scrabble dictionary words here with their relative frequency in common English usage.

A quick scan of the Internet surfaces such a digital data set here:

NEW: COCA 2020 data
This site contains what is probably the most accurate word frequency data for English. The data is based on the one billion word Corpus of Contemporary American English (COCA) -- the only corpus of English that is large, up-to-date, and balanced between many genres.
Source: Word frequency: based on one billion word COCA corpus

Now, the Excel skill required to implement such an algorithm is way beyond my pay grade. But, perhaps, not yours. Plus, you'd have the honor of building the Wordle solver equivalent of "Deep Blue" vs Gary Kasparov in chess. <smile>

You recalled MasterMind from the '80's. So, how about Instant Insanity from the '70's? It was a similar sensation back then: 4 cubes, 6 colors. The goal was to display different colors on each of the four outward-facing sides. So long ago, all I had available to build my solver was Fortran on a mainframe. Primitive, yes -- but, it did the job.
_____
 
Hi Larry

I had not heard of Instant Insanity. I had previously built a solver for the 1978 board game Black Box using a TI-99/4A. (There were multiple games called Black Box... the one I am referring to is this.)

Since publishing my videos. I found a couple of interesting word lists.
First... there are only 2315 words that will ever be the Wordle solution.
Second, there are 10657 words that you are allowed to guess but that will never be the solution.

If I look at the letter frequency of the 2315 words, then there is a 3-way anagram tie for using alert, alter, or later as the first guess. irate is one point less.

We are thinking of a tool to analyze all 2315 words.
Loop through all 2315 words assuming the Word(i) is the correct word.
Loop through the top 10 first words. Guess each word and see the response.
Then test the top 20 words from either the list of 2315 or the list of (2315+10657). Get the response for each of those 20 words.
My goal is to always win by Step 3. For the third guess, my theory is that I can only analyze the 2315 words since I don't want to guess something that can not be a winner.
I am pretty sure after Guess 2 or after Guess 3, there will only be one solution.

This all seems fairly do-able in Excel. 2315 words x 10 x 10 is only 231,500 iterations.

I am honestly torn for guess 2. Using the Mastermind Negative theory, it is fine to guess something that could not be a winner if it will narrow down guess 3 for the win.

All of this sounds great, but my friend and co-author Oz from ExcelOnFire has finished his pages for our new book and I need to finish mine. So - the development is going to get tucked away into a spare hour some day.

Bill
 
"Black Box - The Game of Hide and Seek." Now, that's a game I've never heard of, Bill. But, from your Pinterest photo, it looks intriguing. And, writing a solver for it at age 17 on a "Trash-99?" ****. Here's a snapshot of Instant Insanity, that '70's game I mentioned, in case you're curious: https://www.jaapsch.net/puzzles/images/insanity.jpg

Back to Wordle, your further intelligence gathering to reduce the target word set by 75% makes a world of difference in sol'n design. You also took the 2-deep decision tree idea and added a crucial, practical element to it. Namely, you prune the tree as you go along, winnowing it down to only the most likely candidates. Your proposed heuristic reduces an otherwise exhaustive search down to a doable quarter million iterations on a personal computer.

Regarding your ambivalence to the Mastermind "impossible negative" tactic, the simple test you describe is a pretty straightforward expected value calc at the tip of each decision tree branch: which choice yields the better payoff for each endgame situation, pass or punt? <wink>

"It is fine to guess something that could not be a winner if it will narrow down guess 3 for the win." (Bill J)

I know you've got lots of other real work to do down there in Florida, so I don't expect a response. (The luxury of retirement allows me to toy with these kinds of puzzles to my heart's content.) At some point, though, I would love to see the endgame Guess #3 histogram of residual word candidates over multiple runs, whenever you eventually implement your algorithm:

"I am pretty sure after Guess 2 or after Guess 3, there will only be one solution." (Bill J)

Then, if you subsequently elect to run this solver engine against every single word in the Wordle dictionary, that would constitute pretty decisive proof of your algo's efficacy.

In closing, young Bill and Larry may have been justly proud of their teenage computer prowess. However, here's a tidbit to mull over. A few years back, I contacted a young college kid who'd written a Minesweeper solver, complete with web player interface. (This and Solitaire were Microsoft's free PC games back then, I'm sure you remember.) He shared both his code (C++, running on a real computer, not a PC) and his algorithm, which was just stunning in its sophistication. And, unlike yours and my rote sol'ns from way back, his was an exhaustive, analytic solver, guaranteeing the best possible probabilistic play given any 10x10 game board. Admirable. And, humbling. I assume Google or Wall Street has long since snapped him up ...

I remember studying Arthur Samuel's checkers player, written in assembler for an IBM 701 in 1952, followed a couple decades later by Mikhail Botvinnik's chess-playing algorithm. He combined his ~20 year world chess champion skills with his "day job," back in the USSR, as an electrical engineer and computer scientist, to develop one of the 1st workable chess-playing programs. His algo took an entirely different approach to the game from my measly coder perspective. Very illuminating. And, just plain cool stuff!

Anyway, hope to hear your Wordle routine results, whenever time allows you to get back to it. Then, all your fans out here across the Internet will look forward to awarding you the "Big Blue" crown for Wordle. Cheers!
 
Hands down, your proposed further algo evolution yields the best bang for the buck, Bill -- a straightforward, logical extrapolation of your work to-date on it. But, I just can't resist expanding on those chess-like decision tree build and prune thoughts from my prior post. Now, you've got better things to do with your life, I realize. I just want to lay it out here, for you and your Excel brain trust's consideration, in case it could spark some hybrid ideas there in your own shop. So, here goes ...

The MasterMind analogy is instructive, at least for its proof of the utility of selectively invoking "impossible negative" guesses. However, unlike MasterMind, Wordle clue results are neither symmetric nor transitive. And, thus, such a pure logic sol'n which is derivable for MasterMind is not also possible for Wordle. Yet, Wordle is, like MasterMind, still a deterministic game. (As are checkers and chess, btw -- at least theoretically.) And, unlike chess, Wordle's complete possible outcome set is actually calculable.

Altho' the number of potential Wordle games is very large, it is still reasonably finite. It's 2500 for all possible 1 guess games, + 2500x2500 for all 2 guess games, + 2500x2500x2500 for 3 guess games, etc. Now, the vast majority of such games would be idiotic, of course, and thus obviously not worth exploring. Yet, this means that a unique "best guess" sequence can be determined for each and every one of those 2500 possible game words. The key question is, how much calculation is worth cranking through for how much incremental sol'n improvement? Here's where a prudent, iterative decision tree building and pruning algorithm proves essential. And, this is my proposal for such an algorithm:

Guess #1: Select each of the 2500 possible solution words, one at a time.
Clue #1: For each of these possible solution words, generate its 3^5 = 243 possible clue results, with each clue yielding its own unique, reduced candidate word array. (Utilizing your magnificent winnowing engine, Bill, already built!)
Tally these individual clue candidate word counts across all 243 arrays into a single aggregate word count score. This resultant count ultimately becomes the comparative rating system for the relative "resolving power" across all 2500 possible guess #1 words. So, low score wins, since the objective of each guess is to minimize the remaining list of eligible solution words.
However, that's just round 1, yielding only a provisional solution. It isn't sufficient/definitive, because the various sequences of guess #1 + guess #2 + guess #3 + guess #n turn out to be highly inter-dependent. Hence, we really need chained results (Bayesian what/ifs) across each complete game in order to answer what is truly the "best guess" combo for any particular solution word. (Plus, such a determination can be made with confidence only in retrospect!)
Guess #2: So, as an heuristic, select now, say, the 10 lowest guess #1 scoring words, one by one.
Clue #2: Expand out exactly the same sort of 243 clue results tree as done for clue #1, above, and similarly score their respective "resolving powers."
Guess #n and Clue #n: Apply this same algorithm, recursively, 'til all 2500 solution words are "present and accounted for," however long each of their individual "best guess" word chains turns out to be. Thus, of all the zillions of possible combos enumerated in paragraph 2, above, only this tiny optimal subset remains.

Is this a perfect solution? No! Because we've somewhat arbitrarily pruned the decision tree down to its 10 provisionally best candidates after each guess, based on their discriminant value. An analogy would be to a chess-playing algorithm which sees ahead 10 half-moves (which is magnificent, btw). Yet, this same algorithm is blind to a potential checkmate opportunity at move #11, which a human grandmaster would see in a heartbeat.

Fortunately, unlike chess games, which are typically 40, 50, 60, or more moves long, it appears that Wordle games are usually over after just 3 or 4 expert guesses. Plus, the winnowing process in Wordle appears to be much more uniform and orderly than trying to apply a similar scheme to the multitude of potential moves available from any given chess position.

So, how aggressively you prune your decision tree after each move affects how confident you can be that you've truly reached the "best possible" solution. Now, some further sensitivity testing, experimenting with varying pruning depths, could probably resolve such a question to any level of certainty desired.

Finally, you ask, should you include all 10,000 "impossible negative" words, from that list you've recently uncovered, in your early guesses? Almost certainly the answer is yes, since at least some of them will likely prove to have greater "resolving power" scores than the much smaller 2500 word set of actual possible solutions. Only in later guesses, once the solution candidate list has been greatly reduced, will you need to run that expected value tradeoff calc, as described in my previous post, to decide whether to "fish or cut bait." Modifying my proposed algorithm above to explore all 12,500 words (2500 possible + 10,000 impossible) would be, while not trivial, merely a linear expansion of the computational effort required, not a geometric or exponential one. Thus, it's probably worth such an expansion for its likely improved sol'n yield.

Whew. Long post. Sorry. I look forward to seeing where you eventually land on all this, Bill, once you find the time to build and publish your planned further refinements to this Wordle solver (and, accept appropriate kudos), out there on YouTube.
 
The Insanity puzzle now looks vaguely familiar.
A Minesweeper Solver is something that I would buy for myself for Christmas.

Do we really need to study all 3^5 243 possibilities?
After guess 1, I am usually down to 80 possible words. After guess 2, I am down to 4-6 words.

As I solve each day's puzzle, I am realizing that I am in an interesting quandary when I get to four words left.
(Each of the last three days, I have been down to four words after two guesses).

Obviously, I have a 25% chance of winning on this guess.
But let's say I don't win on this guess. I want to make sure that I minimize the words left after guess 3 to one word. Having 2 words or (horrors!) 3 words left after guess 3 is not desired.

I've been manually drawing out this 4x4 matrix each day on a piece of paper. What if I guess X and the answer is Y?
PenultimateGuess.png


Then I look at the four Results that could happen in each case to see if I have four unique responses in that column.

In the image above, guessing the word DRUNK is a bad choice, because if I am wrong, I have learned nothing about the remaining three words.
For Friday and Saturday, there was one definitive guess that generated four unique answers.

Today, though, none of the four words would guarantee a penultimate guess. If I would guess CRUMB, there is a 25% chance of winning in three, 25% chance of winning for sure in 4, and a 50% chance of [50% winning in 4, 50% of winning in 5].

Does the logic have to change when I get to 4, 5, or 6 words left? Or is this all logic that I should be using before Guess 1? If the matrix is 2,322 x 2,322, I could generate a list of guesses that would maximize the number of unique values and/or minimize the How Many Times Did This Answer happen?

I need to built out some functions here. Like you say, I can prove these theories by running the algo. I am working on the some of the algo functions now.
 

Forum statistics

Threads
1,223,640
Messages
6,173,501
Members
452,517
Latest member
SoerenB

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top