Python Wordle on GitHub

A bit of study tells me that my scoring algorithm is wrong, although there seems to be some debate about it. Then I have to figure out what to do next. I burn an hour of CPU time.

There is debate about the handling of matches like this:

g: AAXXX
s: ABCDE

The question is what the score for guess “g” should be for the second A. Should it be yellow (my 1) because there is an A in the solution, or should it be grey (my 0) because the first A has “used up” the only A in the solution “s”. I believe that the correct answer is grey / 0, but there are versions of the game out there that work both ways.

I propose to go with what Hill has started to call the “strict” scoring method. It will be interesting to see how we will do it. Let’s begin with a test:

    def test_strict_score(self):
        g = Word("aaxxx")
        s = Word("abcde")
        score = g.score(s)
        assert score == [2, 0, 0, 0, 0]

This should fail saying [2, 1, 0, 0, 0], if I am not mistaken, which I frequently am.

Expected :[2, 0, 0, 0, 0]
Actual   :[2, 1, 0, 0, 0]

Ha! (Tallies a point in the air.) Just as planned. Now the minor detail of how to do it. Here’s the score method now:

    def score(self, solution):
        # word:  abcde
        # sol:   ecbdx
        # score: 01121
        score = [0, 0, 0, 0, 0]
        sol = solution.word
        for i, c in enumerate(self.word):
            if sol[i] == c:
                score[i] = 2
            elif c in sol:
                score[i] = 1
        return score

So. The “idea” is that when a letter is found, whether in the exact spot or elsewhere, the matching letter in the solution should be removed from contention thereafter. And we need to do that in some way that doesn’t mess up the order of things for the subsequent letters. (We need a test for that. I’ve written a note on a card for that.)

The loop depends on the two inputs being parallel lists / vectors / strings whatever, so that we can check for exact match as we do. So we cannot just remove the matched letter from the list, at least not as constituted now. We could, however, replace it with something that cannot match.

Let’s try that. First, we’ll need to make a list that we can safely change, containing the letters of the solution. I think we can just do that.

    def score(self, solution):
        # word:  abcde
        # sol:   ecbdx
        # score: 01121
        score = [0, 0, 0, 0, 0]
        sol = list(solution.word)
        for i, c in enumerate(self.word):
            if sol[i] == c:
                score[i] = 2
            elif c in sol:
                score[i] = 1
        return score

Tests run as before, the new one still failing of course.

The if side is easy, and I think doing just that side will make our test pass.

    def score(self, solution):
        # word:  abcde
        # sol:   ecbdx
        # score: 01121
        score = [0, 0, 0, 0, 0]
        sol = list(solution.word)
        for i, c in enumerate(self.word):
            if sol[i] == c:
                score[i] = 2
                sol[i] = 0
            elif c in sol:
                score[i] = 1
        return score

I just toss in a numeric zero, which will match no letter. Our new test passes. Let’s write another, that will fail.

    def test_strict_score_2(self):
        g = Word("aaxxx")
        s = Word("zzzza")
        score = g.score(s)
        assert score == [1, 0, 0, 0, 0]

This should fail with [1, 1, 0, 0, 0].

Expected :[1, 0, 0, 0, 0]
Actual   :[1, 1, 0, 0, 0]

I love it when a plan comes together. So how to make it work? I guess we want not just whether c is in sol, but its index. I think I’ll go for a simple but perhaps inefficient solution using index after in:

    def score(self, solution):
        # word:  abcde
        # sol:   ecbdx
        # score: 01121
        score = [0, 0, 0, 0, 0]
        sol = list(solution.word)
        for i, c in enumerate(self.word):
            if sol[i] == c:
                score[i] = 2
                sol[i] = 0
            elif c in sol:
                score[i] = 1
                sol[sol.index(c)] = 0
        return score

Green. I have this card saying “test that pool removal does not break exact matches”. I feel confident, but when invited to write a test to check something, it’s a good idea to accept the invitation.

    def test_exact_after_removal(self):
        g = Word("azczx")
        s = Word("axcxq")
        score = g.score(s)
        assert score == [2, 0, 2, 0, 1]

This passes, not surprising me a bit. Commit: Convert to strict scoring.

What now? Let me brainstorm a bit. Well, partly cloudy anyway.

  • analyze the data a bit
  • do something about the disjoint sets of words (!)
  • start working on selecting desirable guesses
  • change score format to color or EYX or something?

Let me explain the one with the bang. The data I’m using came in two files, “valid_guesses.txt” and “valid_solutions.txt”. Those are used, I suppose, in the real Wordle program somehow. But I have shown something important with this test that I wrote yesterday when you weren’t watching:

    def test_no_solutions_in_guesses(self):
        sols = WordCollection.from_file("valid_solutions.txt")
        guesses = WordCollection.from_file("valid_guesses.txt")
        for solution_word in sols:
            assert not guesses.has_word(solution_word), f"guesses includes {solution_word}"

This test checks and determines that none of the solutions are in the valid guesses. As such, if we were to write a program that uses valid guesses as its source of ideas … it could never win, because it could never guess the solution, because it doesn’t know that word. This seems unfair.

One simple solution would be to create a combined file and use it. Yes, why not. I can just do that in Sublime.

Done. Called the file “valid_combined.txt”. Sublime happily sorted it for me. I add a new test to my two existing file tests, with a nice cross-check that it worked:

    def test_reading_guesses(self):
        with open("valid_guesses.txt", "r") as guesses:
            lines = guesses.readlines()
        assert len(lines) == 10657

    def test_reading_solutions(self):
        with open("valid_solutions.txt", "r") as solutions:
            lines = solutions.readlines()
        assert len(lines) == 2315

    def test_reading_combined(self):
        with open("valid_combined.txt", "r") as combined:
            lines = combined.readlines()
            assert len(lines) == 10657 + 2315

Since the files are disjoint, the lines in combined is the sum of the lines in the other two. Life is good. Committed.

Now what?

  • analyze the data a bit
  • do something about the disjoint sets of words (!)
  • start working on selecting desirable guesses
  • change score format to color or EYX or something?

I think that as we start working toward a strategy, we’ll want to score all the words against all the solutions, and it seems likely to me that we’ll want to find words that “do well”. What will we mean by “do well”? Certainly one thing we’ll do is something like this:

Select a word as solution. Select a word as initial guess. Get score. Eliminate from guesses all the words containing letters that scored 0 in our initial guess. Hm. Let’s write a test for that, more to get a sense of how to say it than how to do it, though we’ll move to that quickly. I’m going to make a new set of tests for this. The tests so far have been pretty rudimentary.

class TestCollectionManipulation:
    def test_elimination(self):
        strings = "abcde fghij aghij fghid".split(" ")
        words = [Word(s) for s in strings]
        coll = WordCollection(words)
        assert len(coll) == 4

That’s there to drive out the ability to pass a list of words to create a WordCollection, implemented thus:

class WordCollecion:
    def __init__(self, words=None):
        if words:
            self.words = words
        else:
            self.words = []

Test passes. Commit: WordCollection can be created with list of Words.

Now (a bit late) I rename that test test_creation. We’ll commit that in a moment. No, we’ll commit it now: rename test.

Now lets test elimination, which was the original idea.

I’m just feeling my way here. Here’s what I’ve got:

    def test_elimination(self):
        guess = Word("abcde")
        soln = Word("dayyy")
        score = guess.score(soln)
        assert score == [1, 0, 0, 1, 0]
        strings = "abcde fghij aghij fghid".split(" ")
        words = [Word(s) for s in strings]
        coll = WordCollection(words)
        to_eliminate = guess.to_eliminate(score)
        assert to_eliminate == "bce"

The idea here is that to_eliminate, sent to a word, will return a string containing all the letters that cannot occur in the answer.

That fails for want of the method.

    def to_eliminate(self, score):
        result = ""
        for i, s in enumerate(score):
            if not s:
                result += self.word[i]
        return result

That’s correct but I don’t like the string append that’s going on there. We’re going to be doing a lot of these things, and I know that Ken has had issues with speed. We’ll let it ride for now.

Now to finally do the elimination. Best rename that test along the way. No, I’ll extend it.

    def test_elimination(self):
        guess = Word("abcde")
        soln = Word("dayyy")
        score = guess.score(soln)
        assert score == [1, 0, 0, 1, 0]
        strings = "abcde fghij aghij fghid".split(" ")
        words = [Word(s) for s in strings]
        coll = WordCollection(words)
        to_eliminate = guess.to_eliminate(score)
        assert to_eliminate == "bce"
        limited_words = words.eliminate(to_eliminate)
        assert len(limited_words) == 2

I think we’re going to eliminate the first and last of the words. If I’m not mistaken. Test fails for want of eliminate:

class WordCollection:
    def eliminate(self, string):
        new_words = [word for word in self.words if word.contains_none(string)]
        return WordCollection(new_words)

class Word:
    def contains_none(self, string):
        my_string = self.word
        for c in string:
            if c in my_string:
                return False
        return True

I sort of expect that to work. The test does not agree Good news, it’s wrong, it’s processing a list not a WordCollection. Fix it:

    def test_elimination(self):
        guess = Word("abcde")
        soln = Word("dayyy")
        score = guess.score(soln)
        assert score == [1, 0, 0, 1, 0]
        strings = "abcde fghij aghij fghid".split(" ")
        words = [Word(s) for s in strings]
        coll = WordCollection(words)
        to_eliminate = guess.to_eliminate(score)
        assert to_eliminate == "bce"
        limited_words = coll.eliminate(to_eliminate)  # <===
        assert len(limited_words) == 2

Still fails, gets 3 not 2. It’s returning these three:

[Word(fghij), Word(aghij), Word(fghid)]

It’s right and I’m not. I do want to eliminate two so I’ll change the input list.

    def test_elimination(self):
        guess = Word("abcde")
        soln = Word("dayyy")
        score = guess.score(soln)
        assert score == [1, 0, 0, 1, 0]
        strings = "abcde fghij ageij fghid".split(" ")
        words = [Word(s) for s in strings]
        coll = WordCollection(words)
        to_eliminate = guess.to_eliminate(score)
        assert to_eliminate == "bce"
        limited_words = coll.eliminate(to_eliminate)
        assert len(limited_words) == 2

I changed the third word to contain “e”. I am confident in this code, but I do know it is likely to change. Is this testing robust enough? I think we’ll let it go. If it bites me later, well, you’ll know where the mistake was made.

Commit: WordCollection can eliminate words given a string of letters to eliminate.

Let’s see what happens if we run an elimination on the real data.

I get this far and find a problem:

    def test_big_elimination(self):
        combined = WordCollection.from_file("valid_combined.txt")
        solution = Word("aback")
        guess = Word("abbas")
        score = guess.score(solution)
        assert score == [2, 2, 0, 1, 0]
        to_eliminate = guess.to_eliminate(score)
        assert to_eliminate == "ck"

My to_eliminate is returning “bs”, not “ck” as I expected. This is in fact BS: the right answer is that “abbas” should eliminate only the “s” character. Change the test, then let’s see how to do this.

    def test_big_elimination(self):
        combined = WordCollection.from_file("valid_combined.txt")
        solution = Word("aback")
        guess = Word("abbas")
        score = guess.score(solution)
        assert score == [2, 2, 0, 1, 0]
        to_eliminate = guess.to_eliminate(score)
        assert to_eliminate == "s"

Back to to_eliminate:

    def to_eliminate(self, score):
        result = ""
        for i, s in enumerate(score):
            if not s:
                result += self.word[i]
        return result

I think I’ll capture the ones to keep and then remove them to get the ones to eliminate.

    def to_eliminate(self, score):
        keep = ""
        for i, s in enumerate(score):
            if s:
                keep += self.word[i]
        result = ""
        for c in self.word:
            if c not in keep:
                result += c
        return result

That’s terribly inefficient but works. I want to get to the end of this test and then address efficiency a bit. And we’ll talk about it first.

Tests are green. Commit: fix issue in to_eliminate. Finish the test:

    def test_big_elimination(self):
        combined = WordCollection.from_file("valid_combined.txt")
        solution = Word("aback")
        guess = Word("abbas")
        score = guess.score(solution)
        assert score == [2, 2, 0, 1, 0]
        to_eliminate = guess.to_eliminate(score)
        assert to_eliminate == "s"
        remaining = combined.eliminate(to_eliminate)
        assert len(remaining) == 7036
        check = Word("sssss")
        for word in remaining.words:
            assert word.score(check) == [0, 0, 0, 0, 0]

Green. Commit: can eliminate words from collection based on score. Needs improvement.

I decide to profile the tests, and get this:

part of profile report

There are about a billion lines here. I’ve highlighted the range containing two of my actual methods, score and append. We’re dominated here by the file I/O, and we should deal with that anyway, in a setup method somewhere or something like that.

In the tests so far, we’re doing a lot of scoring and not very much eliminating, so I think that when we get to deeper analysis we’ll likely find the elimination functions coming up higher in the list. Let’s take a quick look at things, maybe make some improvements, but mostly think in terms of first, how to do these things much faster by using a different approach, and second, how to eliminate the need to do them so often.

Suppose score was blindingly fast. Then we could do our elimination by creating a string of the letters to eliminate and removing all the words whose score was non-zero. (We could do that now, actually. I wonder if we should.)

But my point is, we need to look for better ways, not just faster ways.

Premature Optimization?

Is it premature to be thinking about optimizing the program? A case could be made that it is: we have no reason to believe that it is too slow. I can probably think of one, but I need more thinking anyway.

Suppose we wanted to figure out what word is a good guess for the first word. We would want to assess each word in the guess list against each word in the solutions list, and pick the word that “does best”. What would be a good definition for “does best”? I don’t know. Probably we want an information measure. But an ad-hoc measure might be to give each guess word a score vs each test word, perhaps the number of words it eliminates from the guess list given its score.

That might look something like this:

for guess in guesses:
    scores[guess] = 0
    for word in solutions:
        score = guess.score(word)
        elim = guess.to_eliminate(score)
        retained = guesses.eliminate(elim)
        scores[guess] += len(retained)

I’m going to run that.

    def test_massive(self):
        combined = WordCollection.from_file("valid_combined.txt")
        solutions = WordCollection.from_file("valid_solutions.txt")
        scores = {}
        for word in combined.words:
            scores[word.word] = 0
            for soln in solutions. words:
                score = word.score(soln)
                elim = word.to_eliminate(score)
                retained = combined.eliminate(elim)
                scores[word.word] += len(retained)

So far it has been running for about five minutes on my M1 MacBook Air. I think my point is made about needing something more optimal. But the most optimal code is the code we don’t run at all, so if we can think of ways to avoid doing some of this work, that’ll be ideal. It’s definitely not OK for my tests to run for five minutes. Fifteen now.

What we probably need is a more fundamental data structure here, not these clever objects. Twenty minutes now. (I’ve been wandering around the house, updating phones and such.)

Musing about what to do, I suspect that we’d do well to work with a list of strings rather than a list of instances of Word containing strings.

The loop ran for over an hour. Clearly we need to do better. Meanwhile I have made a few possibly useful changes.

In Word, I’ve optimized the scoring and elimination logic a bit:

class Word:
    def __init__(self, word):
        assert isinstance(word, str)
        self.word = word
        self._optimize_score = [0, 0, 0, 0, 0]
        # saved so we never reallocate the list

    def score(self, solution):
        # word:  abcde
        # sol:   ecbdx
        # score: 01121
        score = self._optimize_score
        sol = list(solution.word)  # cannot cache this, we destroy it
        for i, c in enumerate(self.word):
            if c in sol:
                if sol[i] == c:
                    score[i] = 2
                    sol[i] = 0
                else:
                    score[i] = 1
                    sol[sol.index(c)] = 0
            else:
                score[i] = 0
        return score

Here, I’ve created a score list of five elements in __init__ and reuse it in score. I rearranged the logic a bit so that it always stores an explicit zero, one, or two in each member of score. Since lists are implemented as arrays and grow and shrink when we append, this code will avoid allocating new lists and the appends we were previously doing.

This is a bit naff, because the _optimize_score list will hold an old score until we reuse it, and if we were to mess this up, we’d get bad scores. I think it’ll be worth something. I freely grant that I have no tests to prove it. I’ve made a note to write one.

Note also that the code that creates the sol variable has a comment on it about caching. I thought I could cache that list, but that’s clearly wrong since we are removing things from it by zeroing them out. I am sorely tempted to resort to some bit-twiddling here, but not yet.

That variable needs a better name. Hill calls his pool or something like that. I’ll try avaailable_letters for a while.

    def score(self, solution):
        # word:     abcde
        # solution: ecbdx
        # score:    01121
        score = self._optimize_score
        available_letters = list(solution.word)  # cannot cache this, we destroy it
        for i, c in enumerate(self.word):
            if c in available_letters:
                if available_letters[i] == c:
                    score[i] = 2
                    available_letters[i] = 0
                else:
                    score[i] = 1
                    available_letters[available_letters.index(c)] = 0
            else:
                score[i] = 0
        return score

OK, what else did I do? I changed how contains_none works:

    def contains_none(self, string):
        word = self.word
        return all((c not in word for c in string))

This method is given a string and returns True if none of the characters of the string are in the word. It caches self.word because we found out last night that a tight loop calling self is sped up notably by caching the value.

Is it sufficiently clear that “word contains no c” is the same as “all c are not in word”?

We could say, um …

return not any(c in word for c in string)

I think I like that better. Done and committed.

Let’s close this out, it’s terribly long already.

Summary

I’m feeling that I’m beginning to get my arms around this data. And I clearly have a big performance problem if I’m going to do anything like analyze all the words against all the solutions. Oh, we should glance at that in prep for thinking:

    def eliminate(self, string):
        new_words = [word for word in self.words if word.contains_none(string)]
        return WordCollection(new_words)

Is that good or bad? Kind of depends on how the comprehension works. If it is going to allocate and reallocate a list a lot, it’s terrible. If it doesn’t, not so bad.

I think we’ll have to table the question of the best way to do that operation. It’s tempting to set a flag in each Word saying whether it is alive or dead. I bet that would be faster than creating a new list. But could we make use of if effectively?

Enough speculation. I’m clearly fried.

We have pretty solid tests and the ability to cull the lists. We may want to find a better way to do it. In fact I’m sure we will.

Oh! One idea comes to me. In my massive test, I don’t really need to create the list of still-alive objects … I just need to count them. See? A rest is already helping me.

See you next time!

P.S. I did the arithmetic. The massive test is trying to do about 300 billion iterations of its inner loop. US billion, not European. Still, tho …