Python Wordle on GitHub

Trying to do 300 billion operations makes me think I should take a new angle. I need a better idea, or a computer that is 1000 times as fast as this one. A better idea may be possible …

So, right, 389,551,494,960 operations is probably going to take a while, especially when I was going to restrict a list in the inner loop. I think I need to approach this differently. Which is fine, I just started, and I don’t know anything, so learning is what needs to happen.

I have some random ideas, including trying out numpy arrays, and keeping the score in an integer instead of a list. Let’s just play around a bit.

First, let’s pack the score list into an integer. That may cost a bit of computing but will save memory, and memory seems important. Anyway, it’s a thing to get me going.

``````    def test_list_score_to_integer(self):
score = [2, 1, 0, 2, 3]
int_score = 0
for s in score:
int_score = 10*int_score + s
assert int_score == 21023

def test_integer_to_list_score(self):
int_score = 21023
score = []
for i in range(5):
score.insert(0, int_score%10)
int_score //= 10
assert score == [2, 1, 0, 2, 3]
``````

I want a better way. Here’s one:

``````    def test_integer_to_list_score(self):
int_score = 21023
assert str(int_score) == "21023"
score = [int(i) for i in str(int_score)]
assert score == [2, 1, 0, 2, 3]
``````

At least that’s a one-liner. Can we go the other way? We can:

``````    def test_list_score_to_int_reduce(self):
score = [2, 1, 0, 2, 3]
int_score = reduce(lambda product, factor: 10*product+factor, score)
assert int_score == 21023
``````

OK. Had to pull `functools.reduce` out of our bag of tricks, but getting a one-liner will be worth it.

I wonder how badly switching over to this kind of score would hurt us. Let’s commit these tests and give it a try. Commit: test converting score lists to integer.

Now let’s just change the score in Word to return an integer and see who explodes.

I just went ahead and recoded `Word.score`:

``````    def score(self, solution):
# word:     abcde
# solution: ecbdx
# score:    01121
score = 0
available_letters = list(solution.word)  # cannot cache this, we destroy it
for i, c in enumerate(self.word):
if c in available_letters:
if available_letters[i] == c:
score = 10*score + 2
available_letters[i] = 0
else:
score = 10*score + 1
available_letters[available_letters.index(c)] = 0
else:
score = 10*score
return score
``````

Seven tests break. They all just require me to change [2, 0, 2, 0, 1] to 20201 and the like. Fixed. Commit: converted score to an integer.

Yikes, I am sure they all ran, but now two are breaking. They’re the ones that use the score, of course. The problem comes down to this:

``````    def to_eliminate(self, score):
keep = ""
for i, s in enumerate(score):
if s:
keep += self.word[i]
result = ""
for c in self.word:
if c not in keep:
result += c
return result
``````

I need the decimal digits of score here. And I am sure that my conversion test didn’t handle leading zero.

I’ll do this ugly and then try for nice.

``````    def to_eliminate(self, score):
keep = ""
string = f"00000{score:d}"[-5:]
for i, s in enumerate(string):
if s != "0":
keep += self.word[i]
result = ""
for c in self.word:
if c not in keep:
result += c
return result
``````

This and a bit of value changing in tests to match the integer representation of score and we’re green. Commit: fix issue with eliminate in previous commit. Sorry.

That’s not the most clear code I’ve ever written. A bit of research tells me that f strings can do leading zeros. Let’s write a test:

``````    def test_leading_zero(self):
n = 0
assert f"{n:05}" == "00000"
n = 123
assert f"{n:05}" == "00123"
n = 20123
assert f"{n:05}" == "20123"
``````

I should mention that I often like to record odd little bits of language discovery in my tests. It helps me remember the idea better and sometimes I even look back and check them again. YMMV.

With that in hand, we can do this:

``````    def to_eliminate(self, score):
keep = ""
for i, s in enumerate(f"{score:05}"):
if s != "0":
keep += self.word[i]
result = ""
for c in self.word:
if c not in keep:
result += c
return result
``````

My tests continue to pass. I trust the elimination tests, since they just broke when I had this wrong.

Can we do better? I think the last bit can be a comprehension. Maybe.

``````    def to_eliminate(self, score):
keep = ""
for i, s in enumerate(f"{score:05}"):
if s != "0":
keep += self.word[i]
return "".join([c for c in self.word if c not in keep])
``````

Sure enough. The comprehension is the list of letters to keep, the join appends them together, separated by an empty string.

What about the first bit? And, while we’re at it, maybe we should look at who uses this thing. Aren’t they going to iterate the expanded list?

The idea was—maybe still is—that these are some letters which cannot occur in the solution, because given this word’s score, they were present zero times in the solution. So we intend to use these letters to eliminate other potential words.

We actually do test eliminating the words:

``````    def test_elimination(self):
guess = Word("abcde")
soln = Word("dayyy")
score = guess.score(soln)
assert score == 10010
strings = "abcde fghij ageij fghid".split(" ")
words = [Word(s) for s in strings]
coll = WordCollection(words)
to_eliminate = guess.to_eliminate(score)
assert to_eliminate == "bce"
limited_words = coll.eliminate(to_eliminate)
assert len(limited_words) == 2

def test_big_elimination(self):
combined = WordCollection.from_file("valid_combined.txt")
solution = Word("aback")
guess = Word("abbas")
score = guess.score(solution)
assert score == 22010
to_eliminate = guess.to_eliminate(score)
assert to_eliminate == "s"
remaining = combined.eliminate(to_eliminate)
assert len(remaining) == 7036
check = Word("sssss")
for word in remaining.words:
assert word.score(check) == 0
``````

So right, there is a method `eliminate` that is using that result. Let’s see what that wants.

``````class WordCollection:
def eliminate(self, string):
new_words = [word for word in self.words if word.contains_none(string)]
return WordCollection(new_words)

def contains_none(self, string):
word = self.word
return not any((c in word for c in string))
``````

That would work just as well if the parameter `string` to `contains_none` were a list instead of a string, and it would save us converting. However, it is quite possible that the `in` operator is faster on a string than on a list. Let’s test that.

``````    def test_in_speed(self):
n = 10000000
r0 = time.time()
for i in range(n):
pass
r1 = time.time()
raw = r1 - r0
string_time_0 = time.time()
for i in range(n):
t = "z" in "abc"
string_time_1 = time.time()
string_time = string_time_1 - string_time_0 - raw
list_time_0 = time.time()
for i in range(n):
t = "z" in ["a", "b", "c"]
list_time_1 = time.time()
list_time = list_time_1 - list_time_0 - raw
assert string_time*3 < list_time
``````

Using `in` against a string is about 3x faster than using it against a list. That’s settled, then. We’ll go to the trouble of packing up the string. After some messing about, heree is a new version of `to_eliminate`:

``````    def to_eliminate(self, score: int):
keep = [c for c, s in zip(self.word, score_with_leading_zeros) if s != "0"]
return "".join([c for c in self.word if c not in keep])
``````

Yes, that’s not easy to decode. But it is compact and probably pretty efficient.

1. suppose guess is `abcde` with solution `byodu`, perhaps a friend of Grogu.
2. the score will be 01020, or as we say in English, 1020.
3. score_with_leading_zeros will be “01020” if score is 1020.
4. zip iterates the word characters and the score characters in parallel, so
5. keep will be a list of the characters with non-zero score, or [“b”, “d”]
6. the second comprehension selects characters in word not in keep, or
7. [“a”, “c”, “e”]
8. join will join them into a single string with no separator, or
9. “ace”

Let’s prove that with a test.

``````    def test_to_eliminate(self):
word = Word("abcde")
score = 1020
assert word.to_eliminate(score) == "ace"
``````

Passes. Commit: revising `to_elininate` for compactness and possibly speed. speed not very important.

QED, etc. Let’s talk about where we are and what we ought to be doing.

## Reflection

We can score a guess word against a solution word, returning an integer which, if expressed with leading zeros, has zero if the guess character does not occur in the solution, one if it occurs but not in the same position, and two if it occurs in the same position. The score does the right thing if the guess contains duplicate characters and there are no duplicates in the solution.

Or does it always handle duplicates correctly? It’s supposed to but maybe we should try some more combinations. I’ll make a test note about that.

We can eliminate all the words from a word collection that contain characters known not to be in the solution. Those are words that cannot possibly be a solution. (I could imagine using one as a probe anyway, for example if I couldn’t think of a word that tested what I wanted to test, but without a letter that I knew was not included.)

I just ran this test with a timeout of 300 seconds:

``````    @pytest.mark.timeout(300)
@pytest.mark.skip("slow")
def test_massive(self):
# processed 185 words in 300 seconds.
# about 185*2000*2000 = 740 000 000 iterations
count = 0
combined = WordCollection.from_file("valid_combined.txt")
solutions = WordCollection.from_file("valid_solutions.txt")
scores = {}
for word in combined.words:
count += 1
print(count)
scores[word.word] = 0
for soln in solutions.words:
score = word.score(soln)
elim = word.to_eliminate(score)
tally = 0
for candidate in solutions.words:
if candidate.contains_none(elim):
tally += 1
scores[word.word] += tally
assert not scores
``````

It processed 185 of my about 13000 combined words. To run to completion will require nearly six hours, if I’ve done the arithmetic properly. It could happen.

Clearly I need to think of something better than this.

That will do for today.

## Summary

I have some possibly useful utility things, but I seriously need to think about what I really want to do to analyze these word lists.

Could we compute the score for each word against each solution? 13000 times 2000 values? 26 million? Seems a lot. If we did, what good would it be. What uses of the solutions list are “fair”, and what uses are “unfair” as we do this work?

I truly need a better idea. See you next time!