Sudoku: a nError?
I thought that I had made a misteak, but now I think I was wrong. I refactor to very small methods. Too far? Not for me, what about you?
Just about bedtime last night, somehow it popped into my mind that with the current “naked pairs” code, if the same pair appears more than twice in the component, something bad might happen. So, after turning in, I ran through a few scenarios in my mind.
The concern is this: the current code has a two-phase search to find naked pairs:
def find_pairs_of_indices_of_naked_pairs(self):
result = []
for first_position in range(9):
candidates = self.component.candidates_at(first_position)
if len(candidates) == 2:
for second_position in range(first_position+1, 9):
second_candidates = self.component.candidates_at(second_position)
if second_candidates == candidates:
result.append([first_position, second_position])
return result
So if a naked pair, say [‘8’, ‘9’] occurs in the component at positions 1 and 3, this code will add [1, 3]to the results, and so on. My concern was what it might do if a third occurrence of [‘8’, ‘9’] was in the component. Mind you, if that were to happen, there is something wrong with the puzzle … or we have made a mistake in our Candidates. The former is quite unlikely. As we are experienced programmers here in this room, we know that the latter is entirely possible. And experienced Sudoku solvers are probably also aware that they can make mistakes in their notes about what’s possible, as well.
What will happen if there are three or more occurrences of the same naked pair in a component? Well, if the third one was at, oh, 5, I think the find will return a result that includes [1, 3] and [3, 5]. In due time, we’ll do the second phase of the technique, which will be to remove any ‘8’ and ‘9’ values. First we’ll remove them from every cell but 1 and 3, removing the pair at 5, and reducing it to empty. And then, we’ll do everything but 3 and 5, which will remove the pair at 1, leaving us with an empty there. If we were wise, we’d test that.
Let’s do.
def test_find_result_with_three(self):
empty = [
'000000000', '000000000', '000000000', '000000000',
'000000000', '000000000', '000000000', '000000000',
'000000000',
]
puzzle = Puzzle.from_list(empty)
row = Component.row(puzzle, 0)
row.set_candidates_at(1, ['7', '8'])
row.set_candidates_at(3, ['7', '8'])
row.set_candidates_at(5, ['7', '8'])
technique = NakedPairsTechnique(row)
pairs = technique.find_pairs_of_indices_of_naked_pairs()
assert pairs[0] == [1, 3]
assert pairs[1] == [3, 5]
The test fails, surprising me for at least a moment:
Expected :[3, 5]
Actual :[1, 5]
We found both the trailing pairs in the first find. Reviewing the code, we see why: upon finding a second pair, the inner for loop doesn’t break, it continues, so it will sweep up all the copies. It reports 1, 3 first and then 1, 5.
- But you know what?
- There’s probably a third result in that result. We should be checking that … and we will in a moment.
That’s interesting. If the inner for loop does find more than one copy, the puzzle candidates are already malformed. We could use that fact to do something. One thing that I generally don’t want is an exception, but it’s worth thinking about.
One thing that we could do would be to modify the find to return the positions of all copies of the pair, but if we did that, we’d still find 1, 3 and 5 first, and then 3, 5 next.
Let’s improve this test to check the results list length. I’ll change the others to do so as well.
This passes:
def test_find_result_with_three(self):
empty = [
'000000000', '000000000', '000000000', '000000000',
'000000000', '000000000', '000000000', '000000000',
'000000000',
]
puzzle = Puzzle.from_list(empty)
row = Component.row(puzzle, 0)
row.set_candidates_at(1, ['7', '8'])
row.set_candidates_at(3, ['7', '8'])
row.set_candidates_at(5, ['7', '8'])
technique = NakedPairsTechnique(row)
pairs = technique.find_pairs_of_indices_of_naked_pairs()
assert len(pairs) == 3
assert pairs[0] == [1, 3]
assert pairs[1] == [1, 5]
assert pairs[2] == [3, 5]
Hmm, I thought there were more tests for find, but there are just two. They suffice, but why is the first one named test_find_etc if there wasn’t one named test_find or test_find_something? I’ll commit this new test and then browse some history. Commit: test more than two copies of pairs.
Well, I don’t find any other tests. So be it. We do have the ones that test the full application of the technique. Let’s document (and verify for ourselves) what will happen with the malformed case.
Looking at the technique, we observe at least one issue:
class NakedPairsTechnique:
def __init__(self, component):
self.component = component
def fake_apply(self):
result = ['3', '4', '5', '6', '7', '8', '9']
for cell in [1, 2, 3, 5, 6, 7, 8]:
self.component.set_candidates_at(cell, result)
def apply(self):
pair_indices = self.find_pairs_of_indices_of_naked_pairs()
if not pair_indices:
return
for pair in pair_indices:
cells = set(range(9)) - set(pair)
to_remove = self.component.candidates_at(pair[0])
for cell in cells:
self.component.remove_candidates_at(cell, to_remove)
def find_pairs_of_indices_of_naked_pairs(self):
result = []
for first_position in range(9):
candidates = self.component.candidates_at(first_position)
if len(candidates) == 2:
for second_position in range(first_position+1, 9):
second_candidates = self.component.candidates_at(second_position)
if second_candidates == candidates:
result.append([first_position, second_position])
return result
Well, at least two issues1. One is that the class is still in the test file. I often start new classes in the test file and then move them when they mature. This one is nearly ready to move, I think. A second issue is that apply method. What is wrong with that, you ask?
Long ago, in his “best practice patterns” books, Kent Beck wrote about the notion of Composed Method, which leads us to methods that just do one identifiable thing, and whose code is all at the same level of abstraction. That’s not the case with apply:
It begins with a simple call to a method that gets pairs. Then, written out longhand, it processes those pairs. Composed Method would have me extract that latter part to a separate method. I’ll do it and then we can discuss how it improves things … at least for some of us.
PyCharm isn’t quite smart enough to extract the whole batch with the return in the middle. I think I’m smart enough but I would prefer to use machine refactorings where possible. Will PyCharm invert that if for me? Yes. “Invert ‘if’ condition” and we have this:
def apply(self):
pair_indices = self.find_pairs_of_indices_of_naked_pairs()
if pair_indices:
for pair in pair_indices:
cells = set(range(9)) - set(pair)
to_remove = self.component.candidates_at(pair[0])
for cell in cells:
self.component.remove_candidates_at(cell, to_remove)
Now is PyCharm smart enough to realize that there is no point testing? Not as things stand: it’s not sure what comes back from the find. No problem, we are here to do an extract, like this:
def apply(self):
pair_indices = self.find_pairs_of_indices_of_naked_pairs()
self.remove_pair_from_other_candidates(pair_indices)
def remove_pair_from_other_candidates(self, pair_indices):
if pair_indices:
for pair in pair_indices:
cells = set(range(9)) - set(pair)
to_remove = self.component.candidates_at(pair[0])
for cell in cells:
self.component.remove_candidates_at(cell, to_remove)
The tests are of course green though this, since PyCharm rarely if ever messes up a refactoring. That’s why we use it. I’ll do one thing manually, though. No, wait, I want to try something.
def remove_pair_from_other_candidates(self, pair_indices: list):
if pair_indices:
for pair in pair_indices:
cells = set(range(9)) - set(pair)
to_remove = self.component.candidates_at(pair[0])
for cell in cells:
self.component.remove_candidates_at(cell, to_remove)
I had hoped that declaring pair_indices to be a list would hint to PyCharm that the if isn’t needed. No problem, we’ll do this one by hand:
def remove_pair_from_other_candidates(self, pair_indices: list):
for pair in pair_indices:
cells = set(range(9)) - set(pair)
to_remove = self.component.candidates_at(pair[0])
for cell in cells:
self.component.remove_candidates_at(cell, to_remove)
You know what? We could refactor further. The code above can be described as “for every pair, do some thing that is a bit tricky”.
Let’s bear down on refactoring this. I expect to like the result, and I expect some of you to agree. Assuming without facts in evidence that there are more than one of you. Anyway, I expect to like it.
First rename a lot. I’m not sure this helps now, but I think it’ll help when we read this at 3 AM.
def remove_pair_from_other_candidates(self, naked_pair_positions: list):
for pair_positions in naked_pair_positions:
other_candidate_positions = set(range(9)) - set(pair_positions)
pair_to_remove = self.component.candidates_at(pair_positions[0])
for other_position in other_candidate_positions:
self.component.remove_candidates_at(other_position, pair_to_remove)
Now let’s pull out the for loop body … oops, in setting out to do so I realize I want this method to reflect that it processes multiple pairs.
def remove_each_pair_from_other_candidates(self, naked_pair_positions: list):
for pair_positions in naked_pair_positions:
other_candidate_positions = set(range(9)) - set(pair_positions)
pair_to_remove = self.component.candidates_at(pair_positions[0])
for other_position in other_candidate_positions:
self.component.remove_candidates_at(other_position, pair_to_remove)
As soon as I come up with that name, the name of the for loop body becomes more clear:
def apply(self):
pair_indices = self.find_pairs_of_indices_of_naked_pairs()
self.remove_each_pair_from_other_candidates(pair_indices)
def remove_each_pair_from_other_candidates(self, naked_pair_positions: list):
for pair_positions in naked_pair_positions:
self.remove_this_pair_from_other_candidates(pair_positions)
def remove_this_pair_from_other_candidates(self, pair_positions):
other_candidate_positions = set(range(9)) - set(pair_positions)
pair_to_remove = self.component.candidates_at(pair_positions[0])
for other_position in other_candidate_positions:
self.component.remove_candidates_at(other_position, pair_to_remove)
In for a penny … extract that last for and more, until:
def apply(self):
pair_indices = self.find_pairs_of_indices_of_naked_pairs()
self.remove_each_pair_from_other_candidates(pair_indices)
def remove_each_pair_from_other_candidates(self, naked_pair_positions: list):
for pair_positions in naked_pair_positions:
self.remove_this_pair_from_other_candidates(pair_positions)
def remove_this_pair_from_other_candidates(self, pair_positions):
other_candidate_positions = set(range(9)) - set(pair_positions)
pair_to_remove = self.component.candidates_at(pair_positions[0])
self.remove_this_pair_from_these_positions(pair_to_remove, other_candidate_positions)
def remove_this_pair_from_these_positions(self, pair_to_remove, other_candidate_positions):
for other_position in other_candidate_positions:
self.remove_pair_from_component(pair_to_remove, other_position)
def remove_pair_from_component(self, pair_to_remove, other_position):
self.component.remove_candidates_at(other_position, pair_to_remove)
I think I’d like this better without quite so many “this” and “these”. How about this:
def apply(self):
pair_indices = self.find_pairs_of_indices_of_naked_pairs()
self.remove_each_pair_from_other_candidates(pair_indices)
def remove_each_pair_from_other_candidates(self, naked_pair_positions: list):
for pair_positions in naked_pair_positions:
self.remove_pair_from_other_candidates(pair_positions)
def remove_pair_from_other_candidates(self, pair_positions):
other_candidate_positions = set(range(9)) - set(pair_positions)
pair_to_remove = self.component.candidates_at(pair_positions[0])
self.remove_pair_from_these_positions(pair_to_remove, other_candidate_positions)
def remove_pair_from_these_positions(self, pair_to_remove, other_candidate_positions):
for other_position in other_candidate_positions:
self.remove_pair_from_component(pair_to_remove, other_position)
def remove_pair_from_component(self, pair_to_remove, other_position):
self.component.remove_candidates_at(other_position, pair_to_remove)
We have remained green through all this. Let’s commit: refactoring to tiny methods.
- Mercy
- At this point I mercifully delete a start at refactoring the other large method in this class. We’ll leave it for another time.
I wouldn’t mind a break. Let’s break and sum up.
Summary
After verifying that my fear of a mistake was ill-founded, but that there is some odd behavior if the candidates are in an impossible state, we observed that the code for the NakedPairsTechnique involves some rather large, convoluted methods. Yes, I do consider a method of ten or eleven lines involving a couple of nested for loops or (horrors!) a for-if-for-if, to be large and convoluted.
So by way of offering an opportunity for our reader, if there is one, to consider their own preferences, and steps they might want to take, we broke seven of those lines down into four methods, each a few lines long, doing approximately one thing each. From this:
def apply(self):
pair_indices = self.find_pairs_of_indices_of_naked_pairs()
if not pair_indices:
return
for pair in pair_indices:
cells = set(range(9)) - set(pair)
to_remove = self.component.candidates_at(pair[0])
for cell in cells:
self.component.remove_candidates_at(cell, to_remove)
To this:
def apply(self):
pair_indices = self.find_pairs_of_indices_of_naked_pairs()
self.remove_each_pair_from_other_candidates(pair_indices)
def remove_each_pair_from_other_candidates(self, naked_pair_positions: list):
for pair_positions in naked_pair_positions:
self.remove_pair_from_other_candidates(pair_positions)
def remove_pair_from_other_candidates(self, pair_positions):
other_candidate_positions = set(range(9)) - set(pair_positions)
pair_to_remove = self.component.candidates_at(pair_positions[0])
self.remove_pair_from_these_positions(pair_to_remove, other_candidate_positions)
def remove_pair_from_these_positions(self, pair_to_remove, other_candidate_positions):
for other_position in other_candidate_positions:
self.remove_pair_from_component(pair_to_remove, other_position)
def remove_pair_from_component(self, pair_to_remove, other_position):
self.component.remove_candidates_at(other_position, pair_to_remove)
That’s ten lines to fifteen, twenty if you count blank lines. How could I consider that to be better?
Well, let me tell you. Suppose we want to know what the technique does. We can read the apply method and find out:
find pairs of indices of naked pairs; remove each pair from other candidates.
Done. Suppose we wonder how we remove a pair? We read that method and find out:
for each of the pair positions, we remove the pair from the other candidates.
How?
find the other candidate positions; find the pair to remove; remove the pair from the positions.
How?
for each of the other positions, remove the pair.
How?
ask our component to remove the pair values from that position.
I think that’s easier to understand. Most Smalltalkers would agree with me. I think that most XP people would agree with me. And I know that there are lots of people who would not agree, who would rather read the one nested loop thing and figure it out.
And, I’ll bet, if we were sitting beside them while they did that, they would say, at some point:
I see what he’s doing here. He’s finding pairs of indices and then he goes down and removes the pair from each of the other candidates.
Here, we’re trying to help people avoid figuring that out, by coming closer to telling them.
For some of us, that’s good. For some, well, some don’t prefer this way. We writes our own code and they pays us our money.
You get to decide how you work. And I’m here to show you how I work, and what happens when I do what I do.
I’m surprised with where we went today, and pleased. I hope that you are pleased as well.
See you next time!
-
No, I’m not going to do the Spanish Inquisition bit. I’m not. Really. ↩