PathFinder uses generate. But there’s work to be done. And let’s talk about generalizing generate.

In a very courteous DM, @blackoverflow wondered why we had not added the pathfinding capability to generate. It would make generate more capable and—in that sense—more general. Here, before we start with the code, are some thoughts.

Although I think I may not have written about it, I did consider generalizing Cell:generate and then using it in PathFinder. One solid reason why I decided not to put the path stuff directly into Cell was that it creates a large dictionary and many / most users will not want it. So I held off.

In addition, although it does seem that the path stuff might belong on Cell, it seemed to me to be premature to make that decision. And, I felt, it would be easier to modify PathFinder to do what it needed with what the current generate provided. (I may or may not have been right about that.)

There were other considerations that are kind of built into how I work:

Generalizing

I don’t recall why I set that thought aside, but my general thinking on generality is that it comes in two forms.

In one form, we take a relevant object and add additional capability to it, making it able to address more problems.

In the other form, it’s more like we are removing limitations on our object, enabling it to support more problems, but without adding any substantial behavior, though we might consolidate behavior into it, often found via “feature envy”1.

I suspect that in both cases, the object being “generalized” increases in size, but in the first case the changes themselves tend to be large, and in the second they tend to be small, often leading to an overall reduction in code size.

I freely grant that the above may not ring a bell in everyone’s head. I don’t have a solid explanation of the feeling, but to me, sometimes we generalize by adding complexity, and sometimes we add capability without complexity. I prefer the latter kind.

Discovery

Part of my focus in the five million words that make up this site is on “discovering” opportunities to make the code better, after it gets however it is. Again and again in these articles, we are working on some code, or just stumble across some code that needs improvement. And generally(!) without much difficulty we find that we can make an overall improvement to the design of the code without great difficulty or “large” refactorings. We find that we can usually even make those improvements incrementally, over a longer period of time.

The point of this “discovering opportunity” style is to help folx realize that even after the code gets messy, it can be improved with small investments of available time, not “large” efforts that often stall productivity, or that never happen at all.

Working Where We Are

Generally in software development, we have some “story”, some short-term mission that we are out to resolve. We start working there, and while the work may lead us elsewhere, we need to remain somewhat focused on the initial point of our work.

I would not go so far as to suggest that I reasoned through all of that and then decided firmly that the right thing to do was to change PathFinder to use the current form of generate. That didn’t happen. Somehow all the thinking I did converged on “change PathFinder to use generate”, and that’s what we got working yesterday.

We have made it work. Now let’s look at making it right. The new PathFinder code is committed to the repo, and the drawing of paths now uses it, but there are tests that are still exercising the old code. We’d like to keep those tests if they are valuable, running the new code, and remove them if they are not. And we want to remove all the old code from PathFinder.

And then we’ll see what else we might do.

I think I’ll just remove the old code and see what breaks.

I comment out flood and two tests fail:

    def test_flood(self):
        bank = CellSpace(5, 5)
        finder = PathFinder(bank)
        finder.put(bank.at(1, 2))
        finder.flood(None)
        assert len(finder.queue) == 0
        assert len(finder.reached) == 25


    def test_flood_uses_bank(self):
        bank = CellSpace(5, 5)
        taken = [(3,3), (1,4), (2,1)]
        for t in taken:
            bank.take_xy(*t, None)
        finder = PathFinder(bank)
        finder.put(bank.at(1, 2))
        finder.flood(None)
        assert len(finder.reached) == 25 - len(taken)
        for t in taken:
            assert t not in finder.reached

I think we can remove the first test because it tests flood, which is now handled by generate, with its own tests. The second, I think we could repurpose, but there are two much more robust tests of PathFinder below. We’ll remove these and the resulting unused code in PathFinder.

Along the way we find some tests for expand, which has no uses. Remove those as well.

In a few moments, we have removed redundant tests and code and PathFinder looks like this:

class PathFinder:
    def __init__(self, bank):
        self.bank = bank
        self.queue: list[Cell] = []
        self.reached: dict[Cell, Cell|None] = dict()
        self.room_list = []

    def find_path_between_rooms(self, source_room: Room, target_room: Room):
        self.room_list = [source_room, target_room]
        return self.find_path(source_room.origin, target_room.origin)

    def find_path(self, source: Cell, target: Cell) -> list[Cell]:
        self.queue.append(source)
        self.reached[source] = None
        self.generate(source, target)
        path = []
        start = target
        while start is not None:
            path.append(start)
            start = self.reached.get(start)
        return path

    def generate(self, source, target):
        for c in source.generate(self.can_use):
            neighbors = c.neighbors
            for neighbor in random.sample(neighbors, len(neighbors)):
                if neighbor not in self.reached and self.can_use(neighbor):
                    self.reached[neighbor] = c
                    if neighbor == target:
                        return

    def can_use(self, neighbor) -> bool:
        if neighbor.is_available():
            return True
        for room in self.room_list:
            if neighbor in room: # isn't this searching the whole room?
                return True      # can't we just check neighbor.room() now?
        return False

I notice my comments on can_use, of course. I think we can do this:

    def can_use(self, neighbor) -> bool:
        if neighbor.is_available():
            return True
        return neighbor.room() in self.room_list

We’re green. Could have committed once already. Get to it: refactoring PathFinder, removing unused code and tests.

Of course we can simplify can_use further:

    def can_use(self, neighbor) -> bool:
        return neighbor.is_available() \
            or neighbor.room() in self.room_list

Green. Commit.

Now that we have some of the cruft out of the way, let’s take a deeper look at PathFinder:

class PathFinder:
    def __init__(self, bank):
        self.bank = bank
        self.queue: list[Cell] = []
        self.reached: dict[Cell, Cell|None] = dict()
        self.room_list = []

    def find_path_between_rooms(self, source_room: Room, target_room: Room):
        self.room_list = [source_room, target_room]
        return self.find_path(source_room.origin, target_room.origin)

    def find_path(self, source: Cell, target: Cell) -> list[Cell]:
        self.queue.append(source)
        self.reached[source] = None
        self.generate(source, target)
        path = []
        start = target
        while start is not None:
            path.append(start)
            start = self.reached.get(start)
        return path

    def generate(self, source, target):
        for c in source.generate(self.can_use):
            neighbors = c.neighbors
            for neighbor in random.sample(neighbors, len(neighbors)):
                if neighbor not in self.reached and self.can_use(neighbor):
                    self.reached[neighbor] = c
                    if neighbor == target:
                        return

    def can_use(self, neighbor) -> bool:
        return neighbor.is_available() or neighbor.room() in self.room_list

The first thing I notice is the reference to queue in __init__ and find_path. It seems that we no longer need that: it’s left over from when we were doing our own flooding. Remove them, see if anything breaks. Green. Commit.

Is bank used? No. Remove it. Commit. Now we don’t need the bank parameter on init. Change Signature. Green: commit.

What about reached and room_list?

room_list is set up on find, and it is only used in can_use. I think we could remove that as a member if we pass it down to our generate and then use it in a closure. Let’s try that.

However …

I’m beginning to feel that I’m not getting the help I need from my objects, probably from Cell.generate. It was always clear that we were regenerating the neighbors of each cell we process, while generate will be doing the same:

class PathFinder:
    def generate(self, source, target):
        for c in source.generate(self.can_use):
            neighbors = c.neighbors
            for neighbor in random.sample(neighbors, len(neighbors)):
                if neighbor not in self.reached and self.can_use(neighbor):
                    self.reached[neighbor] = c
                    if neighbor == target:
                        return

class Cell:
def generate(start, condition=None):
    def accept_any(_cell):
        return True
    cond = condition or accept_any
    frontier = {start}
    examined = {start}
    while frontier:
        next = frontier.pop()
        yield next
        examined.add(next)
        for neighbor in next.neighbors:
            if neighbor not in examined and cond(neighbor):
                frontier.add(neighbor)

We have the recently-added randomization (sample) in PathFinder, which gives us “better” paths. Arguably we could randomize unconditionally in Cell.generate if we wanted to. It would probably be harmless and might produce more interesting behavior in some cases.

That said, I think it would be weird to put random behavior that deep in the system. Probably ill-advised.

Steady for now

Let’s stick to seeing if we can push the room list down, removing it as a member variable.

My first attempt at that failed. Not sure why, I just rolled back.

This time, I make a local function for can_use:

class PathFinder:
    def generate(self, source, target):
        def can_use(cell):
            return cell.is_available() or cell.room() in self.room_list
        for c in source.generate(can_use):
            neighbors = c.neighbors
            for neighbor in random.sample(neighbors, len(neighbors)):
                if neighbor not in self.reached and can_use(neighbor):
                    self.reached[neighbor] = c
                    if neighbor == target:
                        return

Now, instead of referencing self in the can_use, reference a local:

    def generate(self, source, target):
        rooms = self.room_list
        def can_use(cell):
            return cell.is_available() or cell.room() in rooms
        for c in source.generate(can_use):
            neighbors = c.neighbors
            for neighbor in random.sample(neighbors, len(neighbors)):
                if neighbor not in self.reached and can_use(neighbor):
                    self.reached[neighbor] = c
                    if neighbor == target:
                        return

Now let’s pass it down instead of making it a member.

I tried Change SIgnature and it broke things. Do it by hand.

    def find_path(self, source: Cell, target: Cell) -> list[Cell]:
        self.reached[source] = None
        self.generate(source, target, self.room_list)
        path = []
        start = target
        while start is not None:
            path.append(start)
            start = self.reached.get(start)
        return path

    def generate(self, source, target, rooms):
        def can_use(cell):
            return cell.is_available() or cell.room() in rooms
        for c in source.generate(can_use):
            neighbors = c.neighbors
            for neighbor in random.sample(neighbors, len(neighbors)):
                if neighbor not in self.reached and can_use(neighbor):
                    self.reached[neighbor] = c
                    if neighbor == target:
                        return

One step at a time. We could be committing but I’m not quite confident in this idea. I should be, but anyone just one more move, well, a few small edits:

class PathFinder:
    def __init__(self):
        self.reached: dict[Cell, Cell|None] = dict()

    def find_path_between_rooms(self, source_room: Room, target_room: Room):
        room_list = [source_room, target_room]
        return self.find_path(source_room.origin, target_room.origin, room_list)

    def find_path(self, source: Cell, target: Cell, room_list) -> list[Cell]:
        self.reached[source] = None
        self.generate(source, target, room_list)
        path = []
        start = target
        while start is not None:
            path.append(start)
            start = self.reached.get(start)
        return path

    def generate(self, source, target, rooms):
        def can_use(cell):
            return cell.is_available() or cell.room() in rooms
        for c in source.generate(can_use):
            neighbors = c.neighbors
            for neighbor in random.sample(neighbors, len(neighbors)):
                if neighbor not in self.reached and can_use(neighbor):
                    self.reached[neighbor] = c
                    if neighbor == target:
                        return

We’re down from four member variables to one. Let’s commit what we have, and get rid of that last member:

One: pass self.reached to generate:

    def find_path(self, source: Cell, target: Cell, room_list) -> list[Cell]:
        self.reached[source] = None
        self.generate(source, target, room_list, self.reached)
        path = []
        start = target
        while start is not None:
            path.append(start)
            start = self.reached.get(start)
        return path

    def generate(self, source, target, rooms, reached):
        def can_use(cell):
            return cell.is_available() or cell.room() in rooms
        for c in source.generate(can_use):
            neighbors = c.neighbors
            for neighbor in random.sample(neighbors, len(neighbors)):
                if neighbor not in reached and can_use(neighbor):
                    reached[neighbor] = c
                    if neighbor == target:
                        return

Two: make it local, removing the member:

class PathFinder:
    def __init__(self):
        pass

    def find_path_between_rooms(self, source_room: Room, target_room: Room):
        room_list = [source_room, target_room]
        return self.find_path(source_room.origin, target_room.origin, room_list)

    def find_path(self, source: Cell, target: Cell, room_list) -> list[Cell]:
        reached = {source: None}
        self.generate(source, target, room_list, reached)
        path = []
        start = target
        while start is not None:
            path.append(start)
            start = reached.get(start)
        return path

    def generate(self, source, target, rooms, reached):
        def can_use(cell):
            return cell.is_available() or cell.room() in rooms
        for c in source.generate(can_use):
            neighbors = c.neighbors
            for neighbor in random.sample(neighbors, len(neighbors)):
                if neighbor not in reached and can_use(neighbor):
                    reached[neighbor] = c
                    if neighbor == target:
                        return

Seems we could inline find_path but it has a direct test. Let’s do it anyway and then I want to chat about what’s going on.

After the inline, the test that fails is redundant, so I remove it. Here we are:

class PathFinder:
    def __init__(self):
        pass

    def find_path_between_rooms(self, source_room: Room, target_room: Room):
        room_list = [source_room, target_room]
        reached = {source_room.origin: None}
        self.generate(source_room.origin, target_room.origin, room_list, reached)
        path = []
        start = target_room.origin
        while start is not None:
            path.append(start)
            start = reached.get(start)
        return path

    def generate(self, source, target, rooms, reached):
        def can_use(cell):
            return cell.is_available() or cell.room() in rooms
        for c in source.generate(can_use):
            neighbors = c.neighbors
            for neighbor in random.sample(neighbors, len(neighbors)):
                if neighbor not in reached and can_use(neighbor):
                    reached[neighbor] = c
                    if neighbor == target:
                        return

I see a small issue, the use of Room.origin. I happen to know that it returns the first cell in the room but the intention, I believe, is that it could return an arbitrary cell. We shouldn’t call it twice, so:

    def find_path_between_rooms(self, source_room: Room, target_room: Room):
        room_list = [source_room, target_room]
        source_origin = source_room.origin
        target_origin = target_room.origin
        reached = {source_origin: None}
        self.generate(source_origin, target_origin, room_list, reached)
        path = []
        start = target_origin
        while start is not None:
            path.append(start)
            start = reached.get(start)
        return path

Chat

Right, I wanted to chat about what’s going on. It’s worth noting that with the last couple of changes, PyCharm is pointing out that PathFinder.generate could be static: it references no members of PathFinder. (And for that matter find_path_between_rooms only references generate). All that generate really thinks about is Cells, suggesting that it belongs on Cell.

What we have done over the past 450 lines of article was simply to refactor PathFinder, seeing opportunities to simplify it and taking them. Our changes were limited to PathFinder and its tests. And from those small refactoring tests something has emerged without our focusing on it: the generate method, which does at least half the work of PathFinder, seems to want to be a method on Cell.

Now it’s fair to say that this morning, I knew of this possibility, if only because @blackoverflow mentioned it to me yesterday. But even so, we weren’t working to bring that about: we were just working to improve PathFinder, to simplify PathFinder. That’s really all we were thinking about as we worked

And now, PathFinder is much simpler than it was, and we have a quite clear indication that about half of PathFinder probably belongs on Cell.

Summary

One way of looking at all this would to be ask, “If you’re just going to wind up putting the generation into Cell, why not do it right away?” That’s a fair question and I have three answers to it.

  1. If you would prefer to do that when you’re in a particular situation, go right ahead. You’re probably right and it will probably go OK.

  2. When I considered just going ahead and putting the more complex path saving into Cell, I wasn’t comfortable with the idea. It felt safer and easier, somehow. to put it into PathFinder and leave Cell alone. I don’t recall whether I even thought about moving the code later. It just didn’t seem right to me to make PathFinder better by working on Cell.

  3. I have come to prefer very small steps, such as those we took today, and I have come to trust a series of small steps with local reasons for being made. I trust that I can keep things running; I trust that I can stop improving at any time and be OK in “production”; I trust that if the objects aren’t quite right, that will become apparent and we can improve them at a suitable time.

You get to work in the ways that you find best for you, of course. I am not your uncle and I am not here to tell you how to be. I’m here to show you what I do and how it works out. Quite often things work out quite nicely, as they seem to be over the past couple of days. Other times, I crash and burn. But, like Fawkes, I seem to come back to life and recover.

When I work as I do, with small steps, lots of tests, lots of small refactoring steps, most days are about equally productive. Some are notably better than average, and some are notably worse. But overall, most days are satisfactory, and I think it has to do with how I try to work. It seems, often, that when I deviate from the practices just mentioned, things go worse.

So if you were to move around in your own work space, trying on some of these ideas, maybe you’d find some small ways to adjust that will make your work a bit happier, help you be more like the person you’re trying to be.

These things help me be the person I’m trying to be. If they give you useful ideas, so much the better!

Next Time

Probably, no promises, next time we’ll look at moving more of PathFinder over to Cell. See you then.

Do feel free to toot me: no need to DM, I work in public so open messages are fine. And, @blackoverflow, thanks again for your kind input!



  1. “Feature Envy” is the term that refers to a method in one class that seems to be entirely or mostly concerned with members of other classes. A common example might be code in several objects that adds together the members of two sequences, which suggests that a vector object understanding +might be useful.