Python Wordle on GitHub

Let’s save some test time by saving some data. And let’s look at the data to see what we can learn naively. Spoiler: I quickly ditch the caching idea.

By “naively”, I mean “without resorting to information theory”, which I did study briefly some decades ago, but about which I only remember a bit¹ now. I do semi-plan to learn more someday and see whether I’d like to write about it, but today is not that day. And that day may never come.

Yesterday’s exercise created, for each word in the big list, a dictionary with an entry for each possible score, holding the solution words that would produce that score against the guess. In other words (sorry), given a guess and its score, we can use that dictionary to instantly return all the solutions that could possibly be represented.

Note to Self:
30 seconds for 12000 words. I’ll return to that thought.

As I was about to say, that dictionary of dictionaries takes 30 seconds to create, which is a long time if you’re running unit tests. In addition, my tests create word lists from files rather often. How often? 14 times, and there are only three files, and we only really use two of them very often. A bit of caching of those files seems to make some sense. Our WordCollection object—the object into which we read a file—is immutable as far as I know, so we are surely safe caching them.

Let’s do that, not because it’s actually important, but just because it’s a bit of fun and it’s always good to speed up the tests.
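If we did cache, the simplest thing might be functools.lru_cache keyed on the filename. This is just a sketch—the WordCollection here is a toy stand-in for the real class—and it is safe only because the instances never change:

```python
from functools import lru_cache

class WordCollection:
    """Toy stand-in for the real WordCollection; assumed immutable."""
    def __init__(self, words=None):
        self.words = words if words is not None else []

    @classmethod
    @lru_cache(maxsize=None)  # safe only because instances never change
    def from_file(cls, filename):
        with open(filename) as f:
            return cls([line.strip() for line in f if line.strip()])
```

Every call after the first returns the very same instance, which is exactly why immutability matters here.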

But first, that note to self. The data structure I described above provides, for each possible guess, a collection of the possible scores that guess might see, and for each score, the words from the solution list that would provide that score. It takes 30 seconds to create that 12000+ entry structure. So it takes less than 0.0025 seconds to create one word’s information. If we were willing to cheat, i.e. to write a solver that knew the solutions list, it would be quite simple, given a guess, to fetch all the possible solutions and retain the list. Do that a couple of times and the intersection of those lists would be the answer.
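The cheating solver’s intersection step might look like this—entirely hypothetical, with a toy dictionary standing in for the real SolutionDictionary lookup:

```python
# Toy stand-in: (guess, score) -> possible solution words.
solution_dict = {
    ("berth", 100): ["frail", "grasp", "rival"],
    ("doris", 1000): ["grasp", "frail"],
}

def candidates_after(observations):
    """Intersect the candidate lists over a series of (guess, score) pairs."""
    result = None
    for guess, score in observations:
        words = set(solution_dict.get((guess, score), []))
        result = words if result is None else result & words
    return result or set()

remaining = candidates_after([("berth", 100), ("doris", 1000)])
```

A couple of observations and the intersection has already narrowed things to the words consistent with both scores.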

Unfortunately, we can’t do that for all 12000 words, because it would be truly unfair to submit them all to get their score. Unless we were to spawn 12000 bots into the internet, log each one in to Wordle … no, that would be bad.

Change of Plan

I’ve lost interest in caching my file reads. Instead, I want to get some statistics from the big dictionary thing. It seems to me that we can get some ideas about good words to use as guesses. We reason as follows:

We have, for each possible guess, the specific scores it could get, and how many solutions would have provided that score. A good word to choose would get a large number of groups, with each group being about the same size. Why? Because, then, for our next guess, we would have, on the average, the smallest number of possible solutions to eliminate.
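One way to make “about the same size” concrete (my framing, not anything from the project’s code): if a guess splits the N solutions into buckets of sizes n₁, n₂, …, the answer lands in bucket i with probability nᵢ/N and leaves nᵢ candidates, so the expected number remaining is Σnᵢ²/N, which is smallest when the buckets are all equal.

```python
def expected_remaining(bucket_sizes):
    # If a guess splits N solutions into buckets of sizes n_i, the answer
    # lands in bucket i with probability n_i / N and leaves n_i candidates,
    # so the expected number left is sum(n_i**2) / N.
    total = sum(bucket_sizes)
    return sum(n * n for n in bucket_sizes) / total

even = expected_remaining([5, 5, 5, 5])    # 20 solutions, even split -> 5.0
skew = expected_remaining([17, 1, 1, 1])   # 20 solutions, lopsided -> 14.6
```

Same 20 solutions, same four buckets, but the lopsided split leaves nearly three times as many candidates on average.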

Let’s get some statistics from our dictionary structure. These runs will take some time but they will be interesting.

First question: min, max, and average number of score dictionaries for each word. Second question: min, max, and average number of elements in that word’s score dictionaries.

Let’s write a test against SolutionDictionary to answer these questions. We’ll use a subset of the data to drive out the statistics.

    def test_easy_statistics(self):
        all_guesses = WordCollection.from_file("valid_combined.txt")
        guesses = all_guesses.words[0:10000:500]
        all_solutions = WordCollection.from_file("valid_solutions.txt")
        solutions = all_solutions.words[0:2000:100]
        sd = SolutionDictionary(guesses, solutions)
        stats = sd.create_statistics()
        stat = stats[0]
        assert stat["buckets"] == 100

I’m thinking here that the stats will be a dictionary of statistics for each word. I hate that idea already. We’ll make a little data class.

    def test_easy_statistics(self):
        all_guesses = WordCollection.from_file("valid_combined.txt")
        guesses = all_guesses.words[0:10000:500]
        all_solutions = WordCollection.from_file("valid_solutions.txt")
        solutions = all_solutions.words[0:2000:100]
        sd = SolutionDictionary(guesses, solutions)
        stats = sd.create_statistics()
        stat = stats[0]
        assert stat.number_of_buckets == 100

Now to write create_statistics:

class SolutionDictionary:
    def create_statistics(self):
        stats = []
        for word in self.dict:
            word_dict = self.dict[word]
            stat = Statistic(word, len(word_dict))
            stats.append(stat)

        def my_key(stat: Statistic):
            return -stat.number_of_buckets

        stats.sort(key=my_key)
        return stats

The good news is that that creates a list of words and associates the number of buckets with each of them, and sorts them high to low. I modified the tests to print a bit:

    def test_easy_statistics(self):
        all_guesses = WordCollection.from_file("valid_combined.txt")
        guesses = all_guesses.words[0:10000:500]
        all_solutions = WordCollection.from_file("valid_solutions.txt")
        solutions = all_solutions.words[0:2000:100]
        sd = SolutionDictionary(guesses, solutions)
        stats = sd.create_statistics()
        print()
        for stat in stats:
            print(stat)
        stat = stats[0]
        assert stat.number_of_buckets == 100

That gives me some insight:

S('berth' buckets:  15
S('cupel' buckets:  15
S('chuts' buckets:  14
S('doris' buckets:  14
S('salal' buckets:  14
S('aahed' buckets:  13
S('kutis' buckets:  13
S('ardri' buckets:  12
S('eupad' buckets:  12
S('herns' buckets:  12
S('recon' buckets:  12
S('jeans' buckets:  11
S('pases' buckets:  11
S('brown' buckets:  10
S('goeth' buckets:  10
S('mitis' buckets:  10
S('powan' buckets:  10
S('fouds' buckets:   9
S('nixed' buckets:   8
S('lownd' buckets:   7

What even ARE these words??? I begin to see why Grant Sanderson fetched a list of common words to use. I’m not ever going to guess “mitis”.

What have I learned from this first statistics test? Quite a bit, mostly not about Wordle.

  1. It was hard to think about the dictionary structure. This is a result, of course, of using a native collection of native collections. There’s no semantics in there, other than in my head, which is a very bad place to keep things. We should come up with a couple of classes here. We might even enlist their help in creating statistics, since when the SolutionDictionary is created, we have the numbers at our fingertips. (We should probably rename that class to something other than “dictionary” while we’re at it.)

  2. Getting information from tests this way is kind of tedious. I begin to see why Hill has been writing little GUIs to display interesting statistics. I have avoided learning tkinter or any other GUI-making Python thing, so I’d have to divert to learn one, but it might pay off. For now, we’ll continue to suffer.

  3. A little bit better formatting might help.

  4. Maybe saving the statistics somewhere will be handy. Possibly a StatisticsList that knows how to sort on various columns would be handy. Here again we re-learn the lesson about not using native collections.
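Item 4’s StatisticsList is easy to imagine. A hypothetical sketch, with a cut-down Statistic carrying just the fields we’d sort on:

```python
class Statistic:
    # Minimal stand-in with only the fields this sketch needs.
    def __init__(self, word, number_of_buckets, max_words):
        self.word = word
        self.number_of_buckets = number_of_buckets
        self.max_words = max_words

class StatisticsList:
    """A wrapper so callers say what they mean instead of juggling
    a raw list and ad hoc sort keys."""
    def __init__(self, stats):
        self.stats = list(stats)

    def sorted_by(self, column, descending=False):
        # column is any attribute name on Statistic, e.g. "max_words"
        key = lambda stat: getattr(stat, column)
        return StatisticsList(sorted(self.stats, key=key, reverse=descending))

    def __iter__(self):
        return iter(self.stats)
```

Returning a new StatisticsList from sorted_by keeps the original order intact, which seems friendlier for poking at the data from tests.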

Let’s address the native collections issue in the SolutionDictionary. It looks like this now:

class Statistic:
    def __init__(self, word, buckets):
        self.word = word
        self.number_of_buckets = buckets

    def __repr__(self):
        return f"S('{self.word.word}' buckets: {self.number_of_buckets:3d}"

class SolutionDictionary:
    def __init__(self, guesses, solutions):
        self.dict = self.create_dict(guesses, solutions)

    @staticmethod
    def create_dict(guesses, solutions):
        solutions_dict = {}  # guess -> dict (score -> [solutions])
        for guess in guesses:
            guess_dict = {}  # score -> [solutions]
            solutions_dict[guess] = guess_dict
            for solution in solutions:
                score = guess.score(solution)
                if score not in guess_dict:
                    guess_dict[score] = []
                guess_dict[score].append(solution)
        return solutions_dict

    def create_statistics(self):
        stats = []
        for word in self.dict:
            word_dict = self.dict[word]
            stat = Statistic(word, len(word_dict))
            stats.append(stat)

        def my_key(stat: Statistic):
            return -stat.number_of_buckets

        stats.sort(key=my_key)
        return stats

    def solutions_for(self, word, score):
        try:
            return self.dict[word][score]
        except KeyError:
            return []
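A small aside on create_dict above: the “if score not in guess_dict” dance can be folded into dict.setdefault. A sketch with toy stand-ins, where score_fn stands in for the real guess.score(solution) method on Word:

```python
def create_dict(guesses, solutions, score_fn):
    # score_fn(guess, solution) stands in for guess.score(solution).
    solutions_dict = {}  # guess -> dict (score -> [solutions])
    for guess in guesses:
        guess_dict = {}
        for solution in solutions:
            score = score_fn(guess, solution)
            # setdefault returns the existing list, or installs [] first
            guess_dict.setdefault(score, []).append(solution)
        solutions_dict[guess] = guess_dict
    return solutions_dict
```

collections.defaultdict(list) would do the same job; setdefault just avoids exposing a defaultdict to callers afterward.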

Let’s think about what this is trying to be. Maybe it’s an Analysis, or a Directory. We want to ask it for the possible solutions, given a guess and a score.

Change that method right now:

    def solutions_for(self, guess, score):
        try:
            return self.dict[guess][score]
        except KeyError:
            return []

Now looking at it, it’s returning, well, from its name, solutions. It’s the possible solutions, given that Word guess scored score. And it’s a list of Words.

But the code. The code says self.dict[guess][score]. What is the object that self.dict[guess] returns? It is a dictionary, a map, from score to the words with that score, given guess. Let’s call that inner thing a ScoredWords.

It is a kind of dictionary, clearly. I am reluctant to subclass system classes. We’ll create a class that has a dictionary.

This is a refactoring: We have tests that will break until this works. But I think I’d like to drive out some classes directly. What does a ScoredWords do? It has a score, and a list of words. Wait. We have a perfectly good thing to be a list of words, a WordCollection. Let’s use it.

    def test_drive_out_scored_words(self):
        scored = ScoredWords(10101)
        assert scored.score == 10101

Create the new class.

class ScoredWords:
    def __init__(self, score):
        self.score = score

That passes. Make the test do a bit more:

    def test_drive_out_scored_words(self):
        scored = ScoredWords(10101)
        assert scored.score == 10101
        assert not scored.words

Implement a bit:

class ScoredWords:
    def __init__(self, score):
        self.score = score
        self.words = WordCollection()

Passing. Commit: Early days ScoredWords. Now we’ll add some words and make sure we get them back.

    def test_drive_out_scored_words(self):
        scored = ScoredWords(10101)
        assert scored.score == 10101
        assert not scored.words
        scored.add_word(Word("abcde"))
        scored.add_word(Word("fghij"))
        assert len(scored.words) == 2

And:

class ScoredWords:
    def __init__(self, score):
        self.score = score
        self.words = WordCollection()

    def add_word(self, word: Word):
        self.words.append(word)

And the test runs. Commit: Can add words to ScoredWords.

Now let’s use it in our SolutionDictionary:

    @staticmethod
    def create_dict(guesses, solutions):
        solutions_dict = {}  # guess -> dict (score -> [solutions])
        for guess in guesses:
            guess_dict = {}  # score -> [solutions]
            solutions_dict[guess] = guess_dict
            for solution in solutions:
                score = guess.score(solution)
                if score not in guess_dict:
                    guess_dict[score] = ScoredWords(score)
                guess_dict[score].add_word(solution)
        return solutions_dict

This breaks a test, which needs a small fix:

    def test_solution_dictionary(self):
        all_guesses = WordCollection.from_file("valid_combined.txt")
        all_solutions = WordCollection.from_file("valid_solutions.txt")
        guesses = all_guesses.words[0:10000:500]
        assert len(guesses) <= 20
        assert Word("berth") in guesses
        solutions = all_solutions.words[0:2000:100]
        assert len(solutions) <= 20
        assert Word("frail") in solutions
        solution_dict = SolutionDictionary(guesses, solutions)
        guess = Word("berth")
        solution = Word("frail")
        score = guess.score(solution)
        assert score == 100
        solutions = solution_dict.solutions_for(guess, score)
        expected = ScoredWords(score)
        items = [Word("frail"), Word("grasp"), Word("rival")]
        for item in items:
            expected.add_word(item)
        assert solutions == expected

And we need to implement __eq__ on ScoredWords, which we do naively:

    def __eq__(self, other):
        return self.score == other.score and self._matches(other)

    def _matches(self, other):
        my_words = set(self.words.words)
        his_words = set(other.words.words)
        return my_words == his_words

Now a convenience method on ScoredWords. In the test:

...
        solutions = solution_dict.solutions_for(guess, score)
        expected = ScoredWords(score).with_strings("frail", "grasp", "rival")
        assert solutions == expected

And:

    def with_strings(self, *strings):
        for string in strings:
            self.add_word(Word(string))
        return self

I think that’s a bit naff, because it is a mutator and returns self. We could do a class method, called this way:

...
        expected = ScoredWords.from_strings(score, "frail", "grasp", "rival")
        assert solutions == expected

Implemented:

class ScoredWords:
    def __init__(self, score, words=None):
        self.score = score
        if words:
            self.words = words
        else:
            self.words = WordCollection()

    @classmethod
    def from_strings(cls, score, *strings):
        words = WordCollection.from_strings(*strings)
        return cls(score, words)

class WordCollection:
    @classmethod
    def from_strings(cls, *strings):
        words = [Word(string) for string in strings]
        return cls(words)

We’re green. Commit: from_strings methods tidy testing a bit.

Reflection

I was leaning too far forward, taking too long strides in that last bit. I nearly lost the thread. Let’s settle down.

We were doing statistics and observed that creating them, and creating the SolutionDictionary itself was a bit messy. We created the new ScoredWords object to help improve clarity.

Here’s the statistics creator now. Let’s try to add in another statistic, the maximum number of words in any ScoredWords.

I’ll do this without a test, shall I? No, now that I asked, I shan’t. I’ll posit a new stat and assert on it:

        stat = stats[0]
        assert stat.number_of_buckets == 15
        assert stat.max_words == 30

No such value on stat. Add it:

class SolutionDictionary:
    def create_statistics(self):
        stats = []
        for word in self.dict:
            word_dict = self.dict[word]  # {score -> scoredWords}
            max_words = max(len(bucket) for bucket in word_dict.values())
            stat = Statistic(word, len(word_dict), max_words)
            stats.append(stat)

        def my_key(stat: Statistic):
            return -stat.number_of_buckets

        stats.sort(key=my_key)
        return stats

Let’s do the min and average. Then I need a serious think or two.

class SolutionDictionary:
    def create_statistics(self):
        stats = []
        for word in self.dict:
            word_dict = self.dict[word]  # {score -> scoredWords}
            
            number_of_buckets = len(word_dict)
            max_words = max(len(bucket) for bucket in word_dict.values())
            min_words = min(len(bucket) for bucket in word_dict.values())
            avg_words = sum(len(bucket) for bucket in word_dict.values()) / number_of_buckets
            stat = Statistic(word, number_of_buckets, max_words, min_words, avg_words)
            stats.append(stat)

        def my_key(stat: Statistic):
            return -stat.number_of_buckets

        stats.sort(key=my_key)
        return stats

class Statistic:
    def __init__(self, word, number_of_buckets, max_words, min_words, avg_words):
        self.word = word
        self.number_of_buckets = number_of_buckets
        self.max_words = max_words
        self.min_words = min_words
        self.avg_words = avg_words

    def __repr__(self):
        return (f"{self.word.word} {self.number_of_buckets:4d} {self.min_words:5d} "
                f"{self.avg_words:7.2f}{self.max_words:5d}")

    @classmethod
    @property  # note: chaining classmethod and property was deprecated in Python 3.11 and removed in 3.13
    def header(cls):
        return "\nWord  Buckets Min   Avg   Max"
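While we’re at it: those three generator expressions in create_statistics walk the buckets three times. Gathering the sizes once would feed all four numbers in one pass—a sketch, using plain lists as stand-in buckets:

```python
def bucket_stats(word_dict):
    # word_dict maps score -> bucket; one pass over the bucket sizes
    # yields (number_of_buckets, max, min, average).
    sizes = [len(bucket) for bucket in word_dict.values()]
    return len(sizes), max(sizes), min(sizes), sum(sizes) / len(sizes)
```

For example, `bucket_stats({100: ["frail"], 200: ["grasp", "rival", "aahed"]})` gives `(2, 3, 1, 2.0)`.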

And after tweaking the test a bit to print the header and all the stats, I get:

Word  Buckets Min   Avg   Max
berth   15     1    1.33    3
cupel   15     1    1.33    4
chuts   14     1    1.43    3
doris   14     1    1.43    4
salal   14     1    1.43    4
aahed   13     1    1.54    5
kutis   13     1    1.54    5
ardri   12     1    1.67    5
eupad   12     1    1.67    7
herns   12     1    1.67    4
recon   12     1    1.67    4
jeans   11     1    1.82    5
pases   11     1    1.82    4
brown   10     1    2.00    5
goeth   10     1    2.00    6
mitis   10     1    2.00    6
powan   10     1    2.00    8
fouds    9     1    2.22    6
nixed    8     1    2.50    8
lownd    7     1    2.86    8

Reflection

Commit: Producing min, average, and max bucket size in stats.

We’ve created a couple of little objects to help us get information and to help make things more clear. I think we’re doing fairly well on the info, and not very well on the clarity.

Our new objects include Statistic, which includes a score, a bucket count and the minimum, average and maximum number of words in each bucket. That’s close to useful, in that a “good” statistic is one with a high number of buckets, and a narrow range of min and max, so that the buckets are about the same size.
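That description of a “good” statistic maps naturally onto a composite sort key—this is my speculation, not code from the project: more buckets first, and among ties, the smaller worst-case bucket first.

```python
from collections import namedtuple

# Toy stand-in carrying just the fields the key needs.
Stat = namedtuple("Stat", "word number_of_buckets max_words")

def strength_key(stat):
    # More buckets sorts first; ties break on the smaller worst-case bucket.
    return (-stat.number_of_buckets, stat.max_words)

# Values taken from the full-scan table above.
stats = [Stat("crate", 148, 246), Stat("salet", 148, 221), Stat("trace", 150, 246)]
best = sorted(stats, key=strength_key)
```

With this key, salet edges out crate: they tie at 148 buckets, but salet’s worst bucket is smaller.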

We also created the ScoredWords object, which is just an object containing a score and a collection of words with that score. (The ScoredWords doesn’t know how those values came about. Perhaps it should: if it had the original guess it could in principle recreate its contents.) The SolutionDictionary is now a dictionary from guess to a dictionary of score to instances of ScoredWords. So we haven’t really made SolutionDictionary more clear … yet.

I want to think about it a bit more but probably we want a SolutionDictionary to be something like a map from guess to a ScorePartition, which is (inside) a map / dictionary from score to ScoredWords. What I’m wanting is a set of terms that I can use to describe how the thing works, and that a listener like me or you could remember. Right now, dictionary of dictionary doesn’t ring any chimes in my very limited brain.
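That ScorePartition idea, purely hypothetical at this point, might start out something like this, with a toy ScoredWords standing in for the real one:

```python
class ScoredWords:
    # Toy stand-in for the article's ScoredWords.
    def __init__(self, score):
        self.score = score
        self.words = []

    def add_word(self, word):
        self.words.append(word)

class ScorePartition:
    """Hypothetical: the inner map from score to ScoredWords,
    named so that we can talk about it."""
    def __init__(self):
        self._by_score = {}

    def add(self, score, word):
        if score not in self._by_score:
            self._by_score[score] = ScoredWords(score)
        self._by_score[score].add_word(word)

    def words_for(self, score):
        bucket = self._by_score.get(score)
        return bucket.words if bucket else []
```

Then SolutionDictionary becomes a map from guess to ScorePartition, which at least gives us nouns to say out loud.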

All that said, what I think we might want to do is to create the statistics as we go rather than after the fact. I could be wrong about that. We’re going to create a big map / dictionary either way, and iterating it afterward to get information won’t be much slower than gathering the information along the way, and it might offer more analytical flexibility, so that we could query the big thing for the words with some characteristic or other.

In fact … that might actually be interesting to do. In fact, while I finish up here, I’ll run the big statistics and print a few lines.

Summary

What you’re observing here is someone thinking about Wordle and what information is out there, and fiddling with it to gain enough intuition to work out something useful, perhaps good words to guess. I do not have a real plan, nor a deep understanding of Wordle. I’m using the computer to gain understanding, hoping then to come up with something we could imagine was useful in playing the game.

I do not see much fun in actually having the program play the game other than to get statistics or performance data on strategies that computers might follow to solve the game. What will happen? I honestly do not know.

News Flash

Here are the first 20 items from the full scan:

Word  Buckets Min   Avg   Max
trace  150     1   15.43  246
crate  148     1   15.64  246
salet  148     1   15.64  221
reast  147     1   15.75  227
slate  147     1   15.75  221
carte  146     1   15.86  246
parse  146     1   15.86  270
caret  145     1   15.97  246
peart  145     1   15.97  268
carle  144     1   16.08  249
crane  142     1   16.30  263
earst  142     1   16.30  227
stale  142     1   16.30  221
heart  141     1   16.42  261
reist  141     1   16.42  247
least  140     1   16.54  221
react  140     1   16.54  246
carse  139     1   16.65  237
cater  139     1   16.65  246
prase  138     1   16.78  270

Interesting. I happen to know that crate and salet, as well as slate and crane, have been among words that other analysts have found to be potentially powerful initial choices. And I am wondering about that Min = 1 throughout. I wonder if there are any stats where min is not 1. Change test:

        stats = sd.create_statistics()
        print(Statistic.header)
        for i in range(20):
            print(stats[i])
        non_ones = [stat for stat in stats if stat.min_words > 1]
        print(len(non_ones))
        # print("here", non_ones[0])
        assert False

Sure enough, it says there are none. I must look into this. Later. For now, I deserve a break.

See you next time!


  1. Ha Ha, “bit”, get it?