Yesterday’s work useless. Last three days generally ragged. Today, I’ll try to be more focused and go for speed. See Kerth’s Prime Directive.

Norm Kerth, the father of retrospectives in what we used to call “Agile Software Development”, defined the “Prime Directive”:

Regardless of what we discover, we understand and truly believe that everyone did the best job they could, given what they knew at the time, their skills and abilities, the resources available, and the situation at hand.

Back when that was happening, I took the directive as a stance. I didn’t “truly believe” that people always do the best they can. I do believe it now, because I have come to understand that “the best we can” isn’t measured by what we might possibly do on the best day of our lives in the best possible situation. I have come to see that, on a given day, we give what we’ve got. Some days, we just don’t have a lot to give, between the cat being sick, the ranting of the politicians, the cold we feel coming on, and that music we almost hear even though Teddy has his earphones on. We have to get through the day … so we do the best we can.

That doesn’t mean that we should feel great about that day. It just means that we don’t blame ourselves, or anyone else, for not operating at peak 24/7.

So I’m not casting blame upon myself here, but the past few days haven’t been my best. I don’t know why, I’m not going to explore it. I observe it, and today I think I can do better.

The low point was the neat little ListEditor object that I wrote yesterday. (Note the use of “I”: I usually say “we”, in an attempt to draw you in to thinking along with me, except when mistakes are made. Then, I generally try to say “I”, because pace Kerth, you clearly didn’t make the mistake.) But I digress.

The ListEditor is a neat little object that uses the “piece table” notion. It sits on top of a list and when you add things to the ListEditor, it remembers them, and when you ask it for the value at an index, it returns the changed value, if you’ve changed it, and otherwise the original. It’s a neat little object, implementing only dunder methods, enough to act somewhat like a list.
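The ListEditor source isn’t shown in this article, so here’s a minimal sketch of the idea as described above; the class name matches, but the internals are my assumption:

```python
# Minimal sketch of the ListEditor idea: an overlay on a list, in the
# "piece table" style. The real class isn't shown here, so these
# internals are assumptions, not the project's actual code.
class ListEditor:
    def __init__(self, a_list):
        self._original = a_list   # the underlying list, never mutated here
        self._changes = {}        # index -> replacement value

    def __getitem__(self, index):
        # Return the changed value if there is one, otherwise the original.
        if index in self._changes:
            return self._changes[index]
        return self._original[index]

    def __setitem__(self, index, value):
        self._changes[index] = value

    def __len__(self):
        return len(self._original)
```

Reads fall through to the original list unless that index has been changed, which is why creating one is cheap: nothing gets copied.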

The thing is, our use of lists is not to edit them! As the game evolves, its individual candidates lists don’t get edited. They get reduced. They start at 1 through 9, and only ever have things removed. ListEditor doesn’t even support removal.

ListEditor is essentially useless for our purposes. I enjoyed writing it, but useless. It’s gotta go.

This is not starting out well.

I tell a lie. It’s not really quite useless after all. In use, the ListEditor holds the inner lists of the Candidates object. It is true that we only delete from those … but all the ListEditor does is accept those new slightly smaller collections. ListEditor handles that well enough, and saves us a very tedious process of copying the whole candidates list.

And
It will turn out that using the ListEditor speeds up Candidates.copy and is part of how we’ll quickly resolve the speed issues below.

Well, except that it was supposed to speed things up and it appears that it didn’t. It may be useful. It may also be spreading out a chunk of execution that was localized, without actually reducing the time involved.

Enough. This is sounding like blame or self-castigation or something. What’s the plan for today?

Speed

Today, we’re going to see what we can do about speeding up the Solver. Why? Well, mostly because I’m sure that I’ve seen it solving the 9x9 in about 200 milliseconds and it is now requiring over 20 seconds. That’s like 100 times longer than it used to require.

So we are going to do some speed improvement. We’ll take two approaches. First, since we do have some history written down, we’ll review the articles and see what we can find. Second, we’ll use our profiling information to find things to improve. Third, since we have Git, we’ll check out some old versions and see what we can find out.

Not a lie, just not correct
We won’t use the profiling information, because the search into the past gives me an idea that, with a little work, essentially resolves the performance problem.

As always happens, I call “two” and get three or more. It’s a puzzlement.

Let’s do some checking out. On 30 June, the article says that the tests run in 200-some milliseconds. Let’s verify that.

The Solver really does run in 200 ms on the 9x9.

Let’s do a bit of probing. Maybe we can find what we did. It only takes moments to check out a version and run the tests.

date           time (ms)
30 Jun              233
 9 Jul              344
11 Jul              430
14 Jul              468
15 Jul 0948         600
15 Jul 1100       27000

Well, well, well. I think we have a winner. Let’s be looking at the diff between these two versions.

In the fast version, we had not yet finished the Possibilities object, now called Candidates, I think, which analyzed the puzzle to see which candidates to consider. The fast version has:

    def fill_in_solved(self, a_game_string):
        pass

And the next version, the slow one, has:

    def fill_in_solved(self, a_game_string):
        for cell, value in enumerate(a_game_string):
            if value != '0':
                self.possibilities[cell] = [value]
                row = Component.row(self, cell)
                col = Component.column(self, cell)
                sub = Component.sub_grid(self, cell)
                positions = set(row.positions) | set(col.positions) | set(sub.positions)
                positions.discard(cell)
                for pos in positions:
                    self.possibilities[pos] = [v for v in self.possibilities[pos] if v != value]
                self.possibilities[cell] = [value]

The check-in message on the slow one, by the way, is “ignore slow tests temporarily”. I think I’ll review the 15 July article to see what I thought I was doing.

A little comparing and a bit of judicious commenting in the code tells me that the issue is the creation of the “possibilities” during the creation of Puzzle instances.

Let’s check out today’s version and work there a bit. The corresponding code is this:

    def __init__(self, a_game_string):
        Puzzle.puzzle_count += 1
        if len(a_game_string) == 9:
            self.available_values = '123'
            self.line_size = 3
        elif len(a_game_string) == 81:
            self.available_values = '123456789'
            self.line_size = 9
        else:
            raise ValueError('problem must have length 9 or 81')
        self.candidates = Candidates(a_game_string)
        self.game = a_game_string

What happens if we comment out that candidates line? Well, unfortunately, lots of things break, because of this:

class Puzzle:
    def possible_answers(self, position):
        if self.game[position] != '0':
            return [self.game[position]]
        return self.candidates.at(position)

Let’s put that back the way it used to be in the olden days.

    def possible_answers(self, position):
        if self.game[position] != '0':
            return [self.game[position]]
        row = Component.row(self, position)
        column = Component.column(self, position)
        sub_grid = Component.sub_grid(self, position)
        return [c for c in self.available_values if c not in row and c not in column and c not in sub_grid]

Still not quite. We have to comment out a bit:

class Puzzle:
    def copy(self):
        new_puzzle = Puzzle(self.game)
        # new_puzzle.candidates = self.candidates.copy()
        return new_puzzle

Now the tests run in about 1 second. Much better than 24, but we have some breakage. All the failures either check Candidates directly or, in the case of a few Technique tests, run code that references candidates.

Let’s roll back.

Think (for a change)

OK, we know that one major issue in our current performance is the creation of the Candidates list, formerly known as possibilities. I daresay it’s the major issue.

We know that the Solver can run very rapidly without looking at Candidates, except for the candidates at the location where it wants to guess. We know that when we calculate those candidates and only those, we go much faster. This should not be a surprise, since calculating them all will be 81 times slower than calculating just the one.

The Bug

As is so often true, the bug was the human. The human (YT) set a test (or two) to be skipped because it was too slow. XP 101 says that when our tests are too slow we have to make them not too slow. Instead, YT buried their head in the sand and pressed on. I don’t even see any real thinking about making it faster later. Just marked and moved on. Thinking about something else, no thought for tomorrow.

Well, we understand and truly believe that YT was doing their best.

The Question

The question, of course, is what we should do now. Possibilities include:

  1. Find a way to make creating 81 possibilities as fast as creating one is now. This seems unlikely.
  2. Maybe the candidates collection could be made lazy, calculating possibilities on the fly. Since the Solver doesn’t use them, it would fly. Not a terrible idea, but not very thoughtful either.
  3. Rip out the Candidates and all the tests and code that rely on them, and start from there.

Of these, #1 seems very unlikely, and #3, while rewarding in a way, seems extreme. Worth exploring what the minimum loss might be.

#2 kind of finesses the problem. If it can be made to work—and I’m sure it can—it would keep all the tests running and speed up the brute. I think the main objection to it is that if Candidates is a bigger issue than I currently realize, it defers dealing with it.

Denial isn’t just a river in Egypt. Let’s try #2.

Note
This turns out to be a good idea. With a few related changes, we run the Solver without ever computing any Candidates, since it doesn’t use them.

Step one will be to store the Candidates in _candidates and have an accessor:

    def __init__(self, a_game_string):
        Puzzle.puzzle_count += 1
        if len(a_game_string) == 9:
            self.available_values = '123'
            self.line_size = 3
        elif len(a_game_string) == 81:
            self.available_values = '123456789'
            self.line_size = 9
        else:
            raise ValueError('problem must have length 9 or 81')
        self.candidates = Candidates(a_game_string)
        self.game = a_game_string

    @property
    def candidates(self):
        return self._candidates

    @candidates.setter
    def candidates(self, value):
        self._candidates = value

Green. Let’s make it lazy:

    def __init__(self, a_game_string):
        Puzzle.puzzle_count += 1
        if len(a_game_string) == 9:
            self.available_values = '123'
            self.line_size = 3
        elif len(a_game_string) == 81:
            self.available_values = '123456789'
            self.line_size = 9
        else:
            raise ValueError('problem must have length 9 or 81')
        self._candidates = None
        self.game = a_game_string

    @property
    def candidates(self):
        if not self._candidates:
            self._candidates = Candidates(self.game)
        return self._candidates

    @candidates.setter
    def candidates(self, value):
        self._candidates = value
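As an aside, the standard library offers `functools.cached_property`, which does this lazy caching automatically. It wouldn’t quite fit here, because `copy` assigns to `candidates` and `cached_property` provides no setter, but the pattern in isolation looks like this:

```python
# Sketch of the same lazy-initialization idea via functools.cached_property.
# Note: this variant has no setter, so it wouldn't support the assignment
# that Puzzle.copy does; it's shown only for comparison.
from functools import cached_property

class Expensive:
    def __init__(self):
        self.computed_count = 0

    @cached_property
    def value(self):
        # Computed at most once, on first access, then cached on the instance.
        self.computed_count += 1
        return [1, 2, 3]
```

The explicit property/setter pair in the real code is the right call whenever something else needs to assign the cached value directly.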

One test fails, the one that times copies:

    def test_puzzle_copies(self):
        puzzle_list = [
            '020000000',
            '000600003',
            '074080000',
            '000003002',
            '080040010',
            '600500000',
            '000010780',
            '500009000',
            '000000040'
        ]
        puzzle = Puzzle.from_list(puzzle_list)
        count = 1000
        start = timeit.default_timer()
        for i in range(count):
            pass
        stop = timeit.default_timer()
        overhead = stop - start
        start = timeit.default_timer()
        for i in range(count):
            new_puzzle = puzzle.copy()
        stop = timeit.default_timer()
        total = stop - start - overhead
        microseconds = int(total * 1000000 // count)
        assert microseconds == pytest.approx(145, abs=10)

This is good, because:

Expected :145 ± 1.0e+01
Actual   :1

One microsecond to copy now. What does copy do? I may be about to be confused. Ah. Because we are copying the same puzzle, this code:

    def copy(self):
        new_puzzle = Puzzle(self.game)
        new_puzzle.candidates = self.candidates.copy()
        return new_puzzle

That code only generates the candidates once, because the lazy init caches them and we are copying the same puzzle over and over. We, however, are paying the price for Candidates.copy:

class Candidates:
    def __init__(self, a_game_string, candidates=None):
        self.game = a_game_string
        if not candidates:
            self.line_size = 0
            self.candidates = self.create_raw_candidates(a_game_string)
            self.fill_in_solved(a_game_string)
        else:
            self.line_size = candidates.line_size
            self.candidates = ListEditor(candidates.candidates)

    def copy(self):
        return Candidates(self.game, self)

Our ListEditor, useless though it may be, is a quick copy, so this is also fast.

Yay!
I was wrong when I said that ListEditor was useless! Without it, the copy would be much slower. We don’t get any good out of it, but at least we don’t waste time copying stuff.
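To see why the wrapping style of copy is fast, compare it with copying all 81 inner lists. This sketch is hypothetical, not the project’s code; it just contrasts the two shapes of “copy”:

```python
# Hypothetical comparison: duplicating 81 candidate lists versus wrapping
# them piece-table style, as ListEditor does. Wrapping does constant work
# no matter how big the wrapped structure is.
import timeit

candidates = [[str(v) for v in range(1, 10)] for _ in range(81)]

def copy_everything(lists):
    # The tedious way: copy every inner list.
    return [inner[:] for inner in lists]

def wrap(lists):
    # The ListEditor-style way: keep a reference plus an empty change set.
    return {'original': lists, 'changes': {}}

slow = timeit.timeit(lambda: copy_everything(candidates), number=10_000)
fast = timeit.timeit(lambda: wrap(candidates), number=10_000)
print(f'copy: {slow:.4f}s  wrap: {fast:.4f}s')
```

On my machine-independent assumption, the wrap wins handily, and the gap only grows with the size of the wrapped structure.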

All our tests that use Candidates in anger run. Let’s turn on the slow tests and see what we get.

Tests require 15 seconds with all three long tests turned on. And a couple of things break.

The test with techniques enabled is doing fewer hits on only_one than expected:

    # @pytest.mark.skip("tedious and boring")
    def test_first_9x9_with_technique(self):
        assert self.puzzle_list == self.puzzle_list_2  # ensuring no typos
        puzzle = Puzzle.from_list(self.puzzle_list)
        Solver.solver_count = 0
        SudokuTechniques.only_one = 0
        solver = Solver(puzzle, apply_techniques=True)
        solved_puzzle = solver.solve()
        assert solved_puzzle.is_correctly_solved
        assert Solver.solver_count < 4000
        assert SudokuTechniques.only_one > 70000

We get about 10,000 instead of 70,000. I think that’ll be OK, though it should be thought about.

Turns out
It turns out, after wasting more time than I might have, this change was intentional and I just never updated the test because it was skipped. Something something never skip tests!

This one is more interesting:

    # @pytest.mark.skip("tedious and boring")
    def test_is_ambiguous(self):
        puzzle = Puzzle.from_list(self.puzzle_list)
        puzzle_1 = Puzzle.evolve_with_change(puzzle, '1', 71)
        solved_1 = Solver(puzzle_1).solve()
        pretty_print(solved_1.game)
        assert solved_1 is not None and solved_1.is_correctly_solved
        puzzle_6 = Puzzle.evolve_with_change(puzzle, '6', 71)
        solved_6 = Solver(puzzle_6).solve()
        assert solved_6 is not None and solved_6.is_correctly_solved

The theory here is that the puzzle mentioned there has two solutions. But the test is now saying that it isn’t ambiguous. On the other hand, we haven’t run this test for a long time.

I think I’ll commit and try that test as of a bit in the past. I could stash, I suppose. I’ve never done that, let’s try it.

I’m not sure what happened there. I’m going to get back to where I was, commit, then check out an older version again.

I think I’m back where I was. Remind me not to try some new tool when I’m in the middle of juggling three knives and a kitten.

Three tests failing as above. Committing anyway: lazy candidates, three tests failing, be patient.

Now to check out something a ways back.

On July 20 the test ran for 2 minutes and failed.

When was it ever passing? Dammit, it NEVER was. Somehow I had concluded that the puzzle was ambiguous, but it isn’t. And, again, I didn’t change the test because by then I had skipped it? I guess so. Something something NEVER skip tests.

Check out main and remove that test altogether. Commit: still two failing tests, remove ambiguous one because flawed.

I know a lot of folks would be working on a branch here. Like most of my close colleagues I eschew branches. (Bless me.) Of course I wouldn’t be pushing, but I prefer always to be working at HEAD.

Now there’s this test:

    # @pytest.mark.skip("tedious and boring")
    def test_first_9x9_with_technique(self):
        assert self.puzzle_list == self.puzzle_list_2  # ensuring no typos
        puzzle = Puzzle.from_list(self.puzzle_list)
        Solver.solver_count = 0
        SudokuTechniques.only_one = 0
        solver = Solver(puzzle, apply_techniques=True)
        solved_puzzle = solver.solve()
        assert solved_puzzle.is_correctly_solved
        assert Solver.solver_count < 4000
        assert SudokuTechniques.only_one > 70000

That’s coming back with

Expected :70000
Actual   :10669

Why? Let’s see how that gets set.

class SudokuTechniques:
    def apply(self):
        self.only_one_possible_value()
        self.only_one_possible_position()
        return self.puzzle

    def only_one_possible_value(self):
        changed = True
        while changed:
            changed = False
            for cell in self.puzzle.unknown_cells():
                available = self.puzzle.possible_answers(cell)
                # print(f'{cell} {available}')
                if len(available) == 1:
                    SudokuTechniques.only_one += 1
                    # print(f'forced {cell=} = {available[0]}')
                    self.puzzle = Puzzle.evolve_with_change(self.puzzle, available[0], cell)
                    changed = True

I don’t trust anyone any more. I’m going to go back into the past and run that one too.

Sufficient research points out that I changed how the count was done but did not revise the test. Duh.

Now I’m down to this test:

    def test_puzzle_copies(self):
        puzzle_list = [
            '020000000',
            '000600003',
            '074080000',
            '000003002',
            '080040010',
            '600500000',
            '000010780',
            '500009000',
            '000000040'
        ]
        puzzle = Puzzle.from_list(puzzle_list)
        count = 1000
        start = timeit.default_timer()
        for i in range(count):
            pass
        stop = timeit.default_timer()
        overhead = stop - start
        start = timeit.default_timer()
        for i in range(count):
            new_puzzle = puzzle.copy()
        stop = timeit.default_timer()
        total = stop - start - overhead
        microseconds = int(total * 1000000 // count)
        assert microseconds == pytest.approx(145, abs=10)

I’m going to rename this test and leave it running with a new answer, on the grounds that we’ve changed how copies work. Nothing else is broken or deferred.

    def test_time_for_puzzle_copies(self):
        puzzle_list = [
            '020000000',
            '000600003',
            '074080000',
            '000003002',
            '080040010',
            '600500000',
            '000010780',
            '500009000',
            '000000040'
        ]
        puzzle = Puzzle.from_list(puzzle_list)
        count = 1000
        start = timeit.default_timer()
        for i in range(count):
            pass
        stop = timeit.default_timer()
        overhead = stop - start
        start = timeit.default_timer()
        for i in range(count):
            new_puzzle = puzzle.copy()
        stop = timeit.default_timer()
        total = stop - start - overhead
        microseconds = int(total * 1000000 // count)
        assert microseconds == pytest.approx(1, abs=10)

We now have no skipped tests. That one may fail at some future time, or may not. If it does, it’ll be interesting.

We are green. Tests take 2 seconds including two 9x9 solutions, one with techniques on and one with them off. I am curious about the two times.

382 ms with techniques on, 2148 ms with them off. Remind me what the difference is?

Right. We run this before trying our guesses:

    def solve(self) -> Puzzle | None:
        if self.puzzle.is_filled_in:
            return self.puzzle
        for new_puzzle in self.puzzle.find_next_puzzles():
            new_puzzle = self.try_techniques(new_puzzle)
            solved_puzzle = Solver(new_puzzle, self.apply_techniques).solve()
            if solved_puzzle:
                return solved_puzzle
        return None

And this technique finds all the cells that have just one possible answer and fills them in:

    def only_one_possible_value(self):
        changed = True
        while changed:
            changed = False
            for cell in self.puzzle.unknown_cells():
                available = self.puzzle.possible_answers(cell)
                if len(available) == 1:
                    SudokuTechniques.only_one += 1
                    self.puzzle = Puzzle.evolve_with_change(self.puzzle, available[0], cell)
                    changed = True

This was slow when we had a slow copy, but we have a fast copy now and it is fast.

Two issues:

  1. Techniques are supposed to deal with the candidate values. Their purpose is to reduce the number of candidates in various cells until we’re down to at least one cell with just one candidate, at which point we fill it in, rinse, repeat. So this is not a technique, since it fills in values.
  2. We aren’t constrained to have the dumbest possible Solver. We already consider only possible values, not all values, which reduces the problem space from two zillion down to something manageable. So we can have a better Solver, by folding this technique right into it.
  3. (This always happens.) We know running the techniques before solving is better, at least now with only one very useful “technique”, so let’s remove the flag and just always run them.

The Solver becomes:

class Solver:
    solver_count = 0

    def __init__(self, puzzle):
        Solver.solver_count += 1
        self.puzzle = puzzle

    def solve(self) -> Puzzle | None:
        if self.puzzle.is_filled_in:
            return self.puzzle
        for new_puzzle in self.puzzle.find_next_puzzles():
            new_puzzle = self.try_techniques(new_puzzle)
            solved_puzzle = Solver(new_puzzle).solve()
            if solved_puzzle:
                return solved_puzzle
        return None
    
    def try_techniques(self, new_puzzle):
        techniques = SudokuTechniques(new_puzzle)
        return techniques.apply()

And the test becomes:

    def test_first_9x9_with_technique(self):
        assert self.puzzle_list == self.puzzle_list_2  # ensuring no typos
        puzzle = Puzzle.from_list(self.puzzle_list)
        Solver.solver_count = 0
        SudokuTechniques.only_one = 0
        solver = Solver(puzzle)
        solved_puzzle = solver.solve()
        assert solved_puzzle.is_correctly_solved
        print(f'\n{Solver.solver_count=}')
        print(f'{Puzzle.puzzle_count=}')
        print(f'{Puzzle.guess_count=}')
        assert Solver.solver_count < 800
        assert SudokuTechniques.only_one < 15000

I removed the other 9x9 test, because we always apply techniques now, so the two are equivalent.

We are green. Tests require 390 milliseconds. Used to be 24 seconds.

Final commit for the day: Solver always applies techniques. Related tests consolidated. Tidying. Rename and adjust timing test.

Summary

Two really excellent results here:

  1. We have much better information about what’s going on, because we’re running a 9x9 Solver execution, which will flag any further slowdowns.
  2. We are reminded that it’s rarely a good idea to skip a test because it is slow. Better to suffer until you make it faster.
  3. (As usual.) The brute solver, with two changes, is far faster than it was 3 hours ago.

What might have happened with that test getting so slow, had I not just told it to shut up? Hard to say for sure, but I think it’s pretty clear that at worst we’d have done what we did today sooner, and I think we’d have avoided some work as well. It’s not clear, though, because as long as we do copy the candidates forward, we’d have paid some price until we did something, perhaps something like the ListEditor.

I’m not allowed to go back on my own timeline, for obvious reasons, so there’s not much point in speculating woulda shoulda coulda, but I don’t think life would have been worse and probably would have been better.

But we are here now, and it’s a good place to be. What about tomorrow?

Tomorrow and beyond, I think we would do well to focus on Techniques, aimed at having two separate ways of solving Sudoku, the brute way, which is really pretty fast now, and the more human way consisting of building and applying known techniques that people use. We might come up with a suite of puzzles and check out how many can be solved when we apply yea many techniques, and so on.

There is also an interesting excursion into whether it’s “yay”, “yea” or “yeigh”. I’ll leave that to you, unless I turn up something really interesting.

Bottom line, I think I’ll try much harder never to set a test to skip because it runs too long, and I think I’d do well to focus more on separating the brute from the human approach, and to focus more on the human one. It might be fun to optimize the brute further. I do have some ideas on that as well.

You never know what will catch my attention. Nor do I. But I’ll at least try harder to keep my eye on the ball. Whatever that is.

See you next time!