An odd Wordle observation. W13.

Python Wordle on GitHub

GeePaw Hill and I both noticed something odd about the Wordle words. Is it an inherent property of the game, chance, or intelligent design? Spoiler: It’s none of the above!

Hill and I both noticed something odd about the Wordle dictionaries, which I will call the Singleton Property:

For every guess g in the guess set G, there is exactly one score such that there is exactly one solution s in the solution set S that gets that score.
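To make the property concrete, here is a minimal sketch of how one might test it on a toy word list. It is not the code from the repo: the names `score`, `buckets_for`, `singleton_scores`, and `has_singleton_property` are hypothetical, and the G/Y/. score notation is an assumption made for illustration.

```python
from collections import Counter, defaultdict


def score(guess: str, solution: str) -> str:
    """Score a guess: G = right letter, right place; Y = right letter, wrong place; . = absent."""
    result = ["."] * len(guess)
    unmatched = Counter()  # solution letters not consumed by an exact match
    for i, (g, s) in enumerate(zip(guess, solution)):
        if g == s:
            result[i] = "G"
        else:
            unmatched[s] += 1
    for i, g in enumerate(guess):
        if result[i] != "G" and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)


def buckets_for(guess, solutions):
    """Group the solutions by the score each one gives against this guess."""
    buckets = defaultdict(list)
    for solution in solutions:
        buckets[score(guess, solution)].append(solution)
    return dict(buckets)


def singleton_scores(guess, solutions):
    """Scores that exactly one solution produces for this guess."""
    return [s for s, words in buckets_for(guess, solutions).items() if len(words) == 1]


def has_singleton_property(guesses, solutions):
    """True only if every guess has exactly one singleton score."""
    return all(len(singleton_scores(g, solutions)) == 1 for g in guesses)


toy_solutions = ["crate", "grate", "crane"]
print(buckets_for("slate", toy_solutions))
print(has_singleton_property(["slate"], toy_solutions))
```

With this toy list, "slate" puts crate and grate in one bucket and crane alone in another, so that one guess happens to satisfy the property.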

Foreshadowing: I want to emphasize that Hill and I both agreed that this property held. I think that I asserted it, and that Hill confirmed, perhaps without checking his statistics. Mistakes in this idea, if any, are mine.

I felt that this must somehow be an inherent mathematical property of the game. Hill felt that the solutions were somehow curated to produce this property. I think we both agree that the property is very unlikely to have arisen by chance.

I convinced myself this morning, while trying not to get up too early, that it cannot be an inherent property. The reasoning is simple:

For a given guess g there is one solution s in the solutions S such that score(g, s) is X, and no other solution s' in S produces that score. Therefore, if the solution s were not in the solutions, the singleton property would not hold. Therefore, the singleton property is not some inherent property of arbitrary sets of guesses and solutions.
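The removal argument can be sketched with a small example. The buckets below are illustrative (they are what the guess "slate" would produce against three toy solutions, in an assumed G/Y/. notation), and the helper names `singleton_scores` and `without_solution` are hypothetical.

```python
def singleton_scores(buckets):
    """Scores whose bucket holds exactly one solution."""
    return [score for score, words in buckets.items() if len(words) == 1]


def without_solution(buckets, solution):
    """Return the buckets as they would look if `solution` were not in the dictionary."""
    trimmed = {}
    for score, words in buckets.items():
        kept = [w for w in words if w != solution]
        if kept:  # an emptied bucket means that score can no longer occur
            trimmed[score] = kept
    return trimmed


# One guess, three solutions: exactly one singleton bucket, so the property holds here.
buckets = {"..GGG": ["crate", "grate"], "..G.G": ["crane"]}
print(singleton_scores(buckets))

# Take away the lone solution and the guess has no singleton at all.
print(singleton_scores(without_solution(buckets, "crane")))
```

Since the property can be broken just by deleting a word, it cannot follow from the rules of the game alone.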

I suppose it could be some more complex mathematical property of a solution set or guess set, but I’m at least convinced that if we have a set of guesses and solutions with this property, it’s down to chance or intelligent design.

I propose this morning to do some work to verify that we really have this property and to explore whatever that brings to mind.

Down Tuit

In my Wordle code, I have an object Statistic:

class Statistic:
    def __init__(self, word, number_of_buckets, max_words, min_words, avg_words, expected_info):
        self.word = word
        self.number_of_buckets = number_of_buckets
        self.max_words = max_words
        self.min_words = min_words
        self.avg_words = avg_words
        self.expected_info = expected_info

The SolutionDictionary can produce instances of Statistic:

class SolutionDictionary:
    def create_statistics(self):
        stats = []
        for word in self.dict:
            guess_description = self.dict[word]  # {score -> scoredWords}
            expected_info = guess_description.expected_information()
            number_of_buckets = guess_description.number_of_buckets
            max_words = max(len(bucket) for bucket in guess_description.buckets)
            min_words = min(len(bucket) for bucket in guess_description.buckets)
            avg_words = sum(len(bucket) for bucket in guess_description.buckets) / number_of_buckets
            stat = Statistic(word, number_of_buckets, max_words, min_words, avg_words, expected_info)
            stats.append(stat)

I would like to begin by exploring whether min_words is always 1 (meaning that there is at least one score such that only one solution generates that score against the given word), then whether there is always only one such bucket, and then what the guess and solution words actually are.

I think that for convenience, I’ll just pass the SolutionDictionary into the Statistic, which will let us increase its intelligence to help answer questions.

Change Signature.

class Statistic:
    def __init__(self, word, number_of_buckets, max_words, min_words, avg_words, expected_info, solution_dictionary):
        self.word = word
        self.number_of_buckets = number_of_buckets
        self.max_words = max_words
        self.min_words = min_words
        self.avg_words = avg_words
        self.expected_info = expected_info
        self.solution_dictionary = solution_dictionary

class SolutionDictionary:
    def create_statistics(self):
        stats = []
        for word in self.dict:
            guess_description = self.dict[word]  # {score -> scoredWords}
            expected_info = guess_description.expected_information()
            number_of_buckets = guess_description.number_of_buckets
            max_words = max(len(bucket) for bucket in guess_description.buckets)
            min_words = min(len(bucket) for bucket in guess_description.buckets)
            avg_words = sum(len(bucket) for bucket in guess_description.buckets) / number_of_buckets
            stat = Statistic(word, number_of_buckets, max_words, min_words, avg_words, expected_info, self)
            stats.append(stat)

Now, the things we want to know about:

Is min_words always 1?
Is there always just one bucket of length one?
What are the guess and the solution?
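Those three questions can all be answered from one guess’s buckets. This is only a sketch under assumptions: it takes a plain dict of score to solutions rather than the real GuessDescription, and `singleton_report` is a hypothetical helper, not something in the repo.

```python
def singleton_report(buckets):
    """buckets: dict mapping a score string to the list of solutions with that score."""
    sizes = [len(words) for words in buckets.values()]
    singletons = [(score, words[0]) for score, words in buckets.items() if len(words) == 1]
    return {
        "min_words": min(sizes),             # is the smallest bucket of size 1?
        "singleton_count": len(singletons),  # is there just one such bucket?
        "singletons": singletons,            # which score and which solution?
    }


# Illustrative buckets for the guess "slate" against four toy solutions.
buckets = {
    "..GGG": ["crate", "grate"],
    "..G.G": ["crane"],
    ".G..Y": ["elbow"],
}
report = singleton_report(buckets)
print(report)
```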

Immediate Reversal: Upon reflection, I think I could have done better to pass in the GuessDescription rather than the whole SolutionDictionary. Let’s just fix that.

class Statistic:
    def __init__(self, word, number_of_buckets, max_words, min_words, avg_words, expected_info, guess_description):
        self.word = word
        self.number_of_buckets = number_of_buckets
        self.max_words = max_words
        self.min_words = min_words
        self.avg_words = avg_words
        self.expected_info = expected_info
        self.guess_description = guess_description

class SolutionDictionary:
    def create_statistics(self):
        stats = []
        for word in self.dict:
            guess_description = self.dict[word]  # {score -> scoredWords}
            expected_info = guess_description.expected_information()
            number_of_buckets = guess_description.number_of_buckets
            max_words = max(len(bucket) for bucket in guess_description.buckets)
            min_words = min(len(bucket) for bucket in guess_description.buckets)
            avg_words = sum(len(bucket) for bucket in guess_description.buckets) / number_of_buckets
            stat = Statistic(word, number_of_buckets, max_words, min_words, avg_words, expected_info, guess_description)
            stats.append(stat)

Reflection

Could I have avoided that immediate correction by thinking harder or being smarter? Perhaps. But seeing the concrete code allowed me to focus better on how I’d use it. So I am satisfied by what happened.

Back Tuit

Let’s count the singletons. I’ll do it in the init.

class Statistic:
    def __init__(self, word, number_of_buckets, max_words, min_words, avg_words, expected_info, guess_description):
        self.word = word
        self.number_of_buckets = number_of_buckets
        self.max_words = max_words
        self.min_words = min_words
        self.avg_words = avg_words
        self.expected_info = expected_info
        self.guess_description = guess_description
        self.singletons = [bucket for bucket in guess_description.buckets if len(bucket) == 1]
        
    def __repr__(self):
        return f"{self.word.word} #singletons: {len(self.singletons)}"

And I have a test that does some stats. I run it and get this:

Word  Info  Buckets Min   Avg   Max
berth  3.78   15     1    1.33    3 #singletons: 11
cupel  3.72   15     1    1.33    4 #singletons: 12
chuts  3.61   14     1    1.43    3 #singletons: 11
doris  3.58   14     1    1.43    4 #singletons: 11
salal  3.58   14     1    1.43    4 #singletons: 11
aahed  3.44   13     1    1.54    5 #singletons: 9
herns  3.42   12     1    1.67    4 #singletons: 6
kutis  3.40   13     1    1.54    5 #singletons: 10
recon  3.35   12     1    1.67    4 #singletons: 8
ardri  3.34   12     1    1.67    5 #singletons: 7
pases  3.18   11     1    1.82    4 #singletons: 7
jeans  3.17   11     1    1.82    5 #singletons: 7
eupad  3.14   12     1    1.67    7 #singletons: 9
brown  3.04   10     1    2.00    5 #singletons: 5
goeth  3.01   10     1    2.00    6 #singletons: 5
mitis  2.97   10     1    2.00    6 #singletons: 6
fouds  2.81    9     1    2.22    6 #singletons: 5
powan  2.78   10     1    2.00    8 #singletons: 7
nixed  2.55    8     1    2.50    8 #singletons: 4
lownd  2.25    7     1    2.86    8 #singletons: 4

Right. Or should I say WRONG? Every word in this sample has multiple singletons: scores that only one solution produces. It is true that every word has at least one singleton, but none of the words here has exactly one.

My whole assertion about the singleton property is mistaken!

Now I wonder if there are any words such that there is exactly one singleton. I suspect not.

So I change this:

    def create_statistics(self):
        stats = []
        for word in self.dict:
            guess_description = self.dict[word]  # {score -> scoredWords}
            expected_info = guess_description.expected_information()
            number_of_buckets = guess_description.number_of_buckets
            max_words = max(len(bucket) for bucket in guess_description.buckets)
            min_words = min(len(bucket) for bucket in guess_description.buckets)
            avg_words = sum(len(bucket) for bucket in guess_description.buckets) / number_of_buckets
            stat = Statistic(word, number_of_buckets, max_words, min_words, avg_words, expected_info, guess_description)
            stats.append(stat)

        def my_key(statistic: Statistic):
            # Statistic keeps the list of singleton buckets, so sort on its length
            return len(statistic.singletons)

        stats.sort(key=my_key, reverse=False)
        return stats

Now we’re sorting on the number of singletons, so if there are any words with just one singleton, they’ll print first. The print says this:

Word  Info  Buckets Min   Avg   Max
abaca  3.16   38     1   60.92  993 #singletons: 3
chizz  2.90   34     1   68.09 1112 #singletons: 3
qajaq  1.89   18     1  128.61 1369 #singletons: 3
zizit  2.60   28     1   82.68 1170 #singletons: 3
irids  4.08   59     1   39.24  555 #singletons: 4
jugum  2.39   36     1   64.31 1410 #singletons: 4
kukus  2.43   28     1   82.68 1258 #singletons: 4
nikau  4.21   58     1   39.91  411 #singletons: 4
quiff  2.60   35     1   66.14 1172 #singletons: 4
quoll  3.45   45     1   51.44  890 #singletons: 4
vizir  3.06   44     1   52.61  950 #singletons: 4
...

And so on. So there are in fact no words with my supposed singleton property.
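As a cross-check, the printed rows can be parsed and tested directly. This sketch uses only the eleven rows shown above; the `parse` helper is hypothetical. It also checks that buckets times average is the same for every row, which it should be if Avg is the total number of solutions divided by the number of buckets. The total it implies is only what the arithmetic gives, since the dictionary size is not stated here.

```python
rows = """\
abaca  3.16   38     1   60.92  993 #singletons: 3
chizz  2.90   34     1   68.09 1112 #singletons: 3
qajaq  1.89   18     1  128.61 1369 #singletons: 3
zizit  2.60   28     1   82.68 1170 #singletons: 3
irids  4.08   59     1   39.24  555 #singletons: 4
jugum  2.39   36     1   64.31 1410 #singletons: 4
kukus  2.43   28     1   82.68 1258 #singletons: 4
nikau  4.21   58     1   39.91  411 #singletons: 4
quiff  2.60   35     1   66.14 1172 #singletons: 4
quoll  3.45   45     1   51.44  890 #singletons: 4
vizir  3.06   44     1   52.61  950 #singletons: 4
""".splitlines()


def parse(row):
    """Split one printed row into (word, buckets, avg_words, singleton_count)."""
    stats, _, singletons = row.partition("#singletons:")
    word, info, buckets, min_words, avg_words, max_words = stats.split()
    return word, int(buckets), float(avg_words), int(singletons)


parsed = [parse(row) for row in rows]

# The list is sorted ascending, so the smallest count shown is the smallest overall.
fewest_singletons = min(p[3] for p in parsed)

# buckets * avg_words should recover the number of solutions, up to rounding.
implied_totals = {round(p[1] * p[2]) for p in parsed}

print(fewest_singletons)  # 3
print(implied_totals)     # {2315}
```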

Debacle? Or Devilish Cleverness?

Here I am, writing about a case where, based on what I thought my stats were telling me, I drew a completely erroneous conclusion. What’s up with that?

Well, my practice here is to be quite up front about my mistakes, although mostly the mistakes I write about are in the code, showing how I work my way through the twisty little paths all different that lead from where we are to where we want to be. There will be missteps, plain typographical errors, poor design decisions, all the mistakes that I can make while finding my way toward a solution that works and whose code I can tolerate.

But here … this was a pure and seemingly harmless belief that I had about the data, the idea that each word had one mystical aspect, a single word in the solutions that produced a score that no other solution produced, and that that was the only such word in the solutions.

It turns out that there are many such words in the solutions.

Does it matter? Well, if we were going to write a solver for Wordle, it might well matter, because we might find some clever way to take advantage of the singleton property. If we were to do that, there’s a good chance that our solver would be, um, inferior.

So this morning, dozing, I realized that the singleton property couldn’t be built into reality, that if a guess did have the singleton property, removing one solution from the dictionary would make it no longer have the property. And that led me to want to explore the data further, and that exploration led me to the truth … or something a lot like the truth:

For every word, whether it comes from the guesses or from the solutions, in the collection of score:{solutions_with_that_score}, there are several scores (at least three, in this data) that only one word in the solutions will produce.

Lesson?

It’s better to measure than to guess? Don’t trust Ron? Don’t trust your guesses as much as you’d like to?

Me, I’m a bit embarrassed by having made a mistake, but frequent readers know that I make many mistakes, so that my incremental embarrassment per mistake is pretty small.

I’m glad we had this little chat.