I wonder whether it would be that difficult to have constant Component instances rather than creating them dynamically. Kent Beck haunts me.

Kent Beck taught us that we should not undertake an optimization in the code without a performance measurement having shown that it is needed. I mention that because I have an interesting optimization in mind, and I wanted to put myself in the position of either doing the right thing, or doing the wrong thing right in front of you. Of course, I could just erase this paragraph.

The row, column, and sub-grid objects are represented by the class Component. At heart, the Component is a collection of (usually) 9 positions in the grid. (There is a way to get a component pointing to all the positions that affect a given position, row, column, or sub-grid.) Creation of a Component typically looks like this:

class Component:
    def __init__(self, puzzle, start, steps):
        self.puzzle = puzzle
        self.start = start
        self.positions = [start + step for step in steps]

    @classmethod
    def row(cls, puzzle, position):
        if puzzle.line_size == 9:
            steps = [0, 1, 2, 3, 4, 5, 6, 7, 8]
        else:
            steps = [0, 1, 2]
        start = position - position % puzzle.line_size
        return cls(puzzle, start, steps)
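To make the position arithmetic concrete, here's a standalone check of the `row` computation above, assuming positions are numbered 0 through 80, left to right and top to bottom:

```python
# Standalone check of the row-start arithmetic in Component.row,
# assuming positions run 0..80, left to right, top to bottom.
line_size = 9
position = 40  # the center cell of a 9x9 grid
start = position - position % line_size  # first cell of that row
positions = [start + step for step in range(line_size)]
print(start)      # start of the middle row
print(positions)  # the nine cells of that row
```

For position 40, the start comes out to 36, and the positions are 36 through 44, the middle row of the grid.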

There are various methods on Component, accessing either the puzzle value at a position or the candidates at a position. They’re not what interests me at the moment. What interests me is this line from the init:

        self.positions = [start + step for step in steps]

That line creates a list of positions, typically nine long. I believe that it happens rather often during puzzle solution. Let’s measure to find out whether that’s true:

class Component:
    count = 0

    def __init__(self, puzzle, start, steps):
        self.puzzle = puzzle
        self.start = start
        self.positions = [start + step for step in steps]
        Component.count += 1

And I’ll check the count in our 9x9 test.

    def test_first_9x9_with_technique(self):
        assert self.puzzle_list == self.puzzle_list_2  # ensuring no typos
        puzzle = Puzzle.from_list(self.puzzle_list)
        Solver.solver_count = 0
        Component.count = 0
        SudokuTechniques.only_one = 0
        solver = Solver(puzzle)
        solved_puzzle = solver.solve()
        assert solved_puzzle.is_correctly_solved
        print(f'\n{Solver.solver_count=}')
        print(f'{Puzzle.puzzle_count=}')
        print(f'{Puzzle.guess_count=}')
        print(f'{Component.count=}')
        assert Component.count == 0
        assert Solver.solver_count < 800
        assert SudokuTechniques.only_one < 15000

That’ll fail with the answer.

Solver.solver_count=779
Puzzle.puzzle_count=11456
Puzzle.guess_count=798
Component.count=34425

test_solver.py:63 (TestSolver.test_first_9x9_with_technique)
34425 != 0

Expected :0
Actual   :34425

So we’re creating over 34,000 Component instances for one puzzle. Maybe we shouldn’t care: the test runs in just under 400 milliseconds. Even if the total cost of the exercise were in creating those arrays, they’d be costing only about 11 microseconds each.
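The back-of-envelope arithmetic, using the counts from the test:

```python
# Back-of-envelope arithmetic for the per-instance cost claimed above.
total_seconds = 0.400      # the test runs in just under 400 ms
instances = 34425          # Component.count from the failing assertion
per_instance_us = total_seconds / instances * 1_000_000
print(round(per_instance_us, 1))  # roughly 11.6 microseconds each
```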

I’ll run a profile. That was interesting before, probably will be interesting again.

Here is the profile. I’ll highlight the relevant lines.

[profile screenshot]

Those lines seem to be saying, if we look over at “own time”, that the list comprehension in our init is using a whopping 13 milliseconds out of the total, which, under profiling, is about 800 milliseconds in solve.

OK, great, be that way. I had a really neat idea for caching those arrays so that they’d only be created 81 times instead of 34,000. But according to these figures, it’s not worth it.
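For the record, the caching idea might have looked something like this. This is purely a sketch, not the project’s code, and the `row_positions` function is a name I’ve made up for illustration; the point is that each position tuple would be computed once per (start, line_size) pair and shared thereafter:

```python
from functools import lru_cache

# A hypothetical sketch of the caching idea: compute each position
# sequence once per (start, line_size) pair, then reuse it, so the
# lists are built 81 times rather than 34,000.
@lru_cache(maxsize=None)
def row_positions(start, line_size):
    return tuple(start + step for step in range(line_size))

# Every Component for the same row would then share one tuple:
assert row_positions(36, 9) is row_positions(36, 9)
assert row_positions(0, 9) == (0, 1, 2, 3, 4, 5, 6, 7, 8)
```

But as the profile showed, those 13 milliseconds aren’t where the money is.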

The graph is interesting. Here’s the top of it:

[profile call graph]

If we look at Own Time in those cells, we see that at: is using up 460 ms of our approximately 900, most of it in the __getitem__. And we see that there are 175 ms of own time in the evolve.

Unless we plan to solve a million Sudoku, there may be no reason to do any further optimization. Still, let’s at least look at those two methods to see what’s going on. The question in my mind is: which at:? I don’t see how we relate these profile entries back to classes.

There is only one, in Component:

class Component:
    def at(self, cell):
        return self.candidates[cell]

We recall, or not, that candidates has two forms. It can be a list of lists, or it can be a ListEditor pointing to a list of lists. That’s why there is a loop around the __getitem__, because the ListEditor looks like this:

class ListEditor:
    def __init__(self, covered):
        self.cover = dict()
        self.covered = covered

    def __setitem__(self, key, value):
        self.cover[key] = value

    def __getitem__(self, key):
        try:
            return self.cover[key]
        except KeyError:
            return self.covered[key]
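To make the cover/covered behavior concrete, here’s a quick demonstration, with the class repeated so the sketch runs standalone: reads check the cover dictionary first and fall through to the covered list, while writes go only to the cover.

```python
# Demonstrating ListEditor's read-through behavior: the cover dict
# shadows the covered list without ever modifying it.
class ListEditor:
    def __init__(self, covered):
        self.cover = dict()
        self.covered = covered

    def __setitem__(self, key, value):
        self.cover[key] = value

    def __getitem__(self, key):
        try:
            return self.cover[key]
        except KeyError:
            return self.covered[key]

editor = ListEditor([['1', '2'], ['3']])
print(editor[1])          # ['3']: nothing in cover, falls through
editor[1] = ['9']
print(editor[1])          # ['9']: now shadowed by the cover
print(editor.covered[1])  # ['3']: the covered list is untouched
```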

I don’t see anything clever to do there … well, wait. The ListEditor assumes that the value sought might be in its dictionary, so it always looks there first. A performance measurement has shown that doing something here could pay off. Let’s not create the dictionary until a set happens.

Like this:

class ListEditor:
    def __init__(self, covered):
        self.cover = None
        self.covered = covered

    def __setitem__(self, key, value):
        if not self.cover:
            self.cover = dict()
        self.cover[key] = value

    def __getitem__(self, key):
        if self.cover:
            try:
                return self.cover[key]
            except KeyError:
                return self.covered[key]
        else:
            return self.covered[key]

I expect that to reduce my time and to make a difference to the profile.

I am mistaken. It made no difference. If anything, we’re a bit slower. Roll back.

So that was interesting. I guess that we do set items in all cases. I am curious to know where. A breakpoint tells me that it’s coming from evolve_with_change. We were going to look at that anyway.

class Puzzle:
    @classmethod
    def evolve_with_change(cls, puzzle, change, position):
        old_string = puzzle.game
        new_string = old_string[:position] + change + old_string[position+1:]
        copy = puzzle.copy()
        copy.game = new_string
        positions_needing_update = Component.all_impacted_positions(copy, position)
        for position in positions_needing_update:
            values = [v for v in copy.candidates.at(position) if v != change]
            copy.candidates.set_at(position, values)
        return copy
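The string evolution at the top of that method is worth a moment: Python strings are immutable, so the new game string is built by slicing around the changed position.

```python
# The string evolution from evolve_with_change, in isolation:
# slice before the position, insert the change, slice after it.
old_string = '123456789'
position = 4
change = '0'
new_string = old_string[:position] + change + old_string[position + 1:]
print(new_string)  # '123406789'
print(old_string)  # '123456789', unchanged
```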

Ah yes. We make our copy and then immediately adjust the candidates affected by the change we’re making. That seems to make sense, since our Solver does now look for forced cells:

class SudokuTechniques:
    def only_one_possible_value(self):
        changed = True
        while changed:
            changed = False
            for cell in self.puzzle.unknown_cells():
                available = self.puzzle.possible_answers(cell)
                if len(available) == 1:
                    SudokuTechniques.only_one += 1
                    self.puzzle = Puzzle.evolve_with_change(self.puzzle, available[0], cell)
                    changed = True

possible_answers returns the candidates at that position, so we need the updated values.

Looking at the evolve above, one small optimization would be to fetch candidates from the original puzzle, since at this point they are the same.

    @classmethod
    def evolve_with_change(cls, puzzle, change, position):
        old_string = puzzle.game
        new_string = old_string[:position] + change + old_string[position+1:]
        copy = puzzle.copy()
        copy.game = new_string
        positions_needing_update = Component.all_impacted_positions(copy, position)
        for position in positions_needing_update:
            values = [v for v in puzzle.candidates.at(position) if v != change]
            copy.candidates.set_at(position, values)
        return copy

That saves about 40 ms out of nearly 400. Ten percent isn’t bad. Commit: use puzzle rather than copy in evolve for small cost savings.

I expect that the real cost is in the list comprehension, where we spin through the original candidates weeding out the changed value, creating a new list. I run the profile one more time with this graph:

[updated call graph]

We see in evolve_with_change a drop from 741 ms to 662 ms, fairly nice. The graph is a bit hard to follow, but the bulk of the time is still in __getitem__. I don’t have a magic solution for that one, but it might be worth some thinking … if we actually care to get a solution down to a fraction of its current time, around 350 milliseconds per solution. If we had to solve a million, mumble, 350,000 seconds, 5800 minutes, about 100 hours. Even if we could get 10x, I’m not going to run my MacBook Air for 10 hours solving Sudoku.
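Spelling out that arithmetic:

```python
# The million-puzzle arithmetic from the paragraph above.
per_solution = 0.350                # about 350 ms per 9x9 solution
total_seconds = per_solution * 1_000_000
print(total_seconds)                # 350,000 seconds
print(round(total_seconds / 60))    # about 5833 minutes
print(round(total_seconds / 3600))  # about 97 hours
```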

Of course there are lots of cores … but no.

Let’s sum up.

Summary

I had a really neat plan for caching the Component definitions for each cell, resulting in creating far fewer of them. A performance measurement suggested that creating them was at a cost of about 11 microseconds each, so that didn’t seem like a good idea.

Performance measurement did point out a few hot spots. I thought that lazy-init on the ListEditor’s local dictionary would save some double __getitem__ calls, but that was not the case, as a quick measurement showed me. I actually made that change, tested it, then didn’t commit it, as it wasn’t worth anything. I suppose I could have figured out from the code that it would be wasted, but it was just a few lines to try it.

The graph then led us to look at evolve_with_change, where we happened to notice that we could fetch old values from the old puzzle, instead of indirectly through ListEditor in the new puzzle. That change was a one-liner and saved about ten percent of the full 9x9 solution.

I’d rate that one as worth doing by a small margin.

Our final graph still shows the bulk of the action in __getitem__ in the candidates list. In the solution of the 9x9, there are over a million calls to that method, at a cost of about 400 milliseconds in profiling mode. If we really wanted to cut the time, we’d have to cut that number: there’s not likely much time to be saved in the actual get.

There’s one more measurement of potential interest. Here are the counts from the 9x9 solution:

Solver.solver_count=779
Puzzle.puzzle_count=11456
Puzzle.guess_count=798

The solver recurs almost 800 times … but we create more than 11,000 puzzles. Why is that? I think it’s because when we check for forced values, we create a new puzzle for every forced value:

    def only_one_possible_value(self):
        changed = True
        while changed:
            changed = False
            for cell in self.puzzle.unknown_cells():
                available = self.puzzle.possible_answers(cell)
                if len(available) == 1:
                    SudokuTechniques.only_one += 1
                    self.puzzle = Puzzle.evolve_with_change(self.puzzle, available[0], cell)
                    changed = True

If we were a bit more clever with that feature, we might be able to reduce the calls to evolve, which would eliminate a corresponding bunch of __getitem__ calls.
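One hypothetical shape for that cleverness, sketched with plain data rather than the real Puzzle: sweep once to collect every currently forced cell, then apply the whole batch in one evolve-style step, so each sweep creates one puzzle instead of one per forced cell. None of these names exist in the current code; `candidates` here just stands in for what possible_answers would return.

```python
# Hypothetical sketch of batching forced cells. `candidates` maps
# cell -> list of possible values, standing in for possible_answers.
def forced_cells(candidates):
    # Collect every cell with exactly one remaining candidate.
    return {cell: values[0]
            for cell, values in candidates.items()
            if len(values) == 1}

candidates = {0: ['4'], 5: ['2', '7'], 8: ['9']}
print(forced_cells(candidates))  # {0: '4', 8: '9'}
```

only_one_possible_value could then evolve once with the whole batch instead of once per cell. Whether that preserves correctness when one forced value in turn forces another would need checking; we might still loop, but far fewer times.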

In some rare circumstances, we might chase these milli- and microseconds. Here, honestly, it’s great exercise but there’s no reason other than practice to do it.

But that’s why we’re here, isn’t it, to explore problems in a small example that we can almost understand, with an eye to what we might do in a more real, larger, situation. And it’s fun.

The big learning? Well, for me, it is that I still cannot guess well about what changes will have significant impact on overall performance, and that I would do well to adhere to Beck’s voice, saying, every time someone on the team spoke of optimizing something:

A performance measurement having shown … ?

Hated it then. Hate it now. But he was ever so right. What would happen if you tried more performance measuring before optimizing? What would you learn? And, even if your guesses are better than mine, it’s good to have the proof.

See you next time!