Polishing and Cleanup
I think the new generator is working well. Let’s see if we can improve the way it’s written. A very nice refactoring sequence, if I do say so myself. Thanks to Bill for the nudge.
It’s interesting how a simple comment about the code can trigger a new look and serve as the “root cause” of a better idea. That has certainly happened after Bill Wake’s gentle remarks about it being a bit odd to stuff a random piece of info into each Cell. Don’t forget to watch for Bill’s Twitch sessions, especially if you’re into Swift.
My current feeling is that the new generator_with_function is just about good. Its tests indicate that it works, and since the original generate now calls it, we can be even more confident in it than if it were just running its direct tests. There are issues, not least that while I can tolerate the method generate, generate_with_function really doesn’t make me happy. But first, let’s review and improve the code. We take a look:
def generate_with_function(self, init=0, *, condition=None, randomness=0.0, function=None):
def default_all_cells(_cell):
return True
def default_zero(cell, value):
return 0
cond = condition or default_all_cells
func = function or default_zero
frontier = [(self, init)]
examined = {self}
while frontier:
next_cell, value = frontier.pop(0)
yield next_cell, value
examined.add(next_cell)
if random.random() < randomness:
neighbors = next_cell.random_neighbors
else:
neighbors = next_cell.neighbors
for neighbor in neighbors:
if neighbor not in examined and cond(neighbor):
if neighbor not in (pair[0] for pair in frontier):
frontier.append((neighbor, func(next_cell, value)))
The first thing I note is that the init parameter is on the wrong side of the asterisk that makes the method require keyword arguments. Change Signature. Green. Commit: tidying generate_with_function
def generate_with_function(self, *,
init=0,
condition=None,
randomness=0.0,
function=None,
):
Now let’s order things more sensibly. And rename. init is the initial value for the function, the value associated with self in the tuples. rename it and put it after the function. Move randomness down to last.
def generate_with_function(self, *,
condition=None,
function=None,
initial_value=0,
randomness=0.0
):
Commit. condition is a function that should return True if the cell being looked at should be returned. Rename to select. Commit. Look a bit further:
def generate_with_function(self, *,
select=None,
function=None,
initial_value=0,
randomness=0.0):
def default_all_cells(_cell):
return True
def default_zero(cell, value):
return 0
cond = select or default_all_cells
func = function or default_zero
Let’s inline the two inner default functions. I had to modify generate, because it was explicitly passing None, which overrode the defaults:
def generate(self, *, condition=lambda cell: True, randomness=0.0):
for cell, _ in self.generate_with_function(select=condition, randomness=randomness):
yield cell
def generate_with_function(self, *,
select=lambda cell: True,
function=lambda cell, value: 0,
initial_value=0,
randomness=0.0):
frontier = [(self, initial_value)]
examined = {self}
...
But now we have duplication of those defaults. I tried making them class variables but couldn’t connect to them. They could probably be top-level values, but I’m reluctant to do that. We’ll leave them as is for now.
I wonder if a better name for generate might be flood. As for generate_with_function, maybe flood_applying?
I remain tempted to eliminate the generate that returns just the cell, and always return a pair, (cell, value) and let the users ignore the items. We could call it flood_items or some such.
As an experiment, let’s look at senders of generate and see how hard they’d be to change.
There are only four, all pretty much like this one:
for c in origin.generate(select=lambda c: c.is_available):
All we have to do is say:
for c, _ in origin.generate(select=lambda c: c.is_available):
I’m gonna do it. We’ll delete the current generate, rename the other, fix the defect. I think that’ll work.
It took a few more changes than I thought, more like 8 than 4, but we are green. Now everything is calling the function that accepts the function. Let’s rename that to flood, and we have a convenience method to rename as well:
def flood(self, *,
select=lambda cell: True,
function=lambda cell, value: 0,
initial_value=0,
randomness=0.0):
frontier = [(self, initial_value)]
examined = {self}
while frontier:
next_cell, value = frontier.pop(0)
yield next_cell, value
examined.add(next_cell)
if random.random() < randomness:
neighbors = next_cell.random_neighbors
else:
neighbors = next_cell.neighbors
for neighbor in neighbors:
if neighbor not in examined and select(neighbor):
if neighbor not in (pair[0] for pair in frontier):
frontier.append((neighbor, function(next_cell, value)))
def flood_in_rooms(self):
return self.flood(select=lambda c: c.is_in_a_room)
Commit. Missed one above. My bad. All good now.
There are two things in flood that trouble me a bit. One is the if-else on randomness, which decided dynamically whether to use random neighbors or keep them in the standard order. That lets us tune the wandering effect on some paths or room builds.
The other troubling thing is (pair[0] for pair in frontier), where we check to see whether the cell neighbor is already in frontier, in which case we don’t put it there again.
Maybe we could make frontier into a dictionary, cell pointing at value. Then we could change the check of concern into a check against frontier.keys. We’d have to remove the key but I think dictionaries support pop, so that might just work.
We’re at a save point, let’s try it.
def flood(self, *,
select=lambda cell: True,
function=lambda cell, value: 0,
initial_value=0,
randomness=0.0):
frontier = {self: initial_value}
examined = {self}
while frontier:
next_cell = next(iter(frontier))
value = frontier.pop(next_cell)
yield next_cell, value
examined.add(next_cell)
if random.random() < randomness:
neighbors = next_cell.random_neighbors
else:
neighbors = next_cell.neighbors
for neighbor in neighbors:
if neighbor not in examined and select(neighbor):
if neighbor not in frontier:
frontier[neighbor] = function(next_cell, value)
That’s much nicer. The next(iter(...)) is a bit odd but probably better than frontier.keys[0]. Commit.
We could make the last if into an and, I think.
for neighbor in neighbors:
if (neighbor not in examined and
select(neighbor) and
neighbor not in frontier):
frontier[neighbor] = function(next_cell, value)
Yes. Commit.
One more thing, the if on neighbors. Let’s extract a method. And then, with a little help from me:
def flood(self, *,
select=lambda cell: True,
function=lambda cell, value: 0,
initial_value=0,
randomness=0.0):
frontier = {self: initial_value}
examined = {self}
while frontier:
next_cell = next(iter(frontier))
value = frontier.pop(next_cell)
yield next_cell, value
examined.add(next_cell)
neighbors = self.get_neighbors(next_cell, randomness)
for neighbor in neighbors:
if (neighbor not in examined and
select(neighbor) and
neighbor not in frontier):
frontier[neighbor] = function(next_cell, value)
@staticmethod
def get_neighbors(next_cell, randomness):
if random.random() < randomness:
return next_cell.random_neighbors
return next_cell.neighbors
Commit. I think we’ll wrap here and sum up.
Summary
Six commits in a bit over an hour. Could have been seven or eight but sometimes I could commit and don’t remember to. I’d like to remember, because life is better when a revert, if necessary, will change less code, losing less good stuff.
We reduced two methods down to one, while still allowing simple-enough parameter lists through some useful defaults. We have a better name—according to me—in flood vs generate, and certainly it’s better than generate_with_function.
The parameter condition is better named select, but I still wish for a better word than function, which is pretty generic. The generate method itself is better looking without the embedded long-hand default functions. I like the lambda much better and wish that Python were more accepting of them. Life is tough, I guess.
Converting frontier to a dictionary was triggered by not liking the weird pair[0] expression that worked but made you look about three times before understanding it. The dictionary code may look odd to a non-Python programmer but I think it’s pretty Pythonic.
I am pretty sure that we could extract a couple of methods from flood, possibly simplifying its form a bit but I’m not sure if it would actually help understanding. It is a pretty big method at 18 lines, probably bigger than anything but a big room builder, possibly the longest method in the program. But part of that size is the nature of the beast, which has to keep track of all the cells it examines, ensuring that they get processed once but no more than once.
In that light (or shadow) I do wonder if we could get rid of the apparent duplication of use of examined and frontier but when I try to see if it can be done, it always appears to me that we need them both. I freely grant that I looked up the flood fill code on Red Blob Games, so it is not as clear in my mind as it would be if I were clever enough to have invented flood fill, which I am not now and may never have been.
Bottom line for today, a very nice refactoring sequence, mostly machine-supported, tests running all the time except for the expected adjustment of users of flood to expect two items back instead of one. Fixing up a raft of broken tests came down to inserting , _ about eight or ten times.
Fun morning, good result. Time for iced chai. See you next time!