NO! A CRASH!
- Warning:
-
You probably don’t want to read this but you might want to scan through. It records much of the thrashing I did while trying to sort out the defect. The succinct version is Found It!.
Everything started out just fine …
We’ll have a look at the note cards listing things we might do. We might do some of them.
I am fond of 3x5 cards with an 0.2 inch grid, for note taking, sketching ideas, and the like. I used to have them custom printed with my logo and such, but now I use some excellent cards from 321done.com. In my present list of things we might do, I’ve drawn square check-boxes by each item, for no good reason other than because it looks sort of appropriate. Anyway, here are the current notes, just as written:
- Clean up find_path to allow randomness to be more orderly
- Paths should not include room cells
- Change CellSpace to sparse?
- Cell class is large
- Can we better integrate PathHelper
- Are there iterators without conditions?
- cell generate random sample -> cell.neighbors ?Do not understand this?
- Shift arrow = move along obvious path
- Keyboard handler object or similar improvement
Those are the verbatim notes, including the comment about not understanding that one idea. Oh, wait, maybe I do. Here’s how we do that:
class Flooder:
def _get_neighbors(self, next_cell):
if random.random() < self._randomness:
return next_cell.random_neighbors
return next_cell.neighbors
I think we’ve already done what I had in mind here. At some point in the past, the code that corresponded to this method fetched next_cell.neighbors and randomized them locally. Now this code decides whether to ask the cell to deliver its neighbors in the canonical order or in random order. We could push this whole method over to cell, by passing the randomness factor on down. That would remove this method from Flooder, and add the same method to Cell. Since Cell already seems overly heavy, I think we won’t do this. Check off that box.
It does bring an idea to mind: we generally produce a cell’s neighbors in the order west, east, north, south, for no particular reason. What if we always randomized the order of neighbors?
A Crash!!
I thought I’d do an experiment to see if the pictures looked very different with that plan in place. When I tried to run main, which I haven’t done for a few days, it failed! I’ve been shipping bad code and did not know it!
All hands to the pumps!
Of course this means that there are tests that we should have that we do not have. So, rather than whatever we were going to do a few moments ago, we have to pull the andon cord and put all hands on this defect. Of course, all hands is still just me, so there won’t be a loud trampling sound as everyone runs over to see what’s up.
Good News: It seems rare. Bad news: intermittent.
The first thing that we find out is that the defect is very rare, probably nothing we’ve done recently. I had to run twenty or thirty times to get the failure again.
The failure is in this code:
class DungeonLayout:
@staticmethod
def find_suite(room) -> Suite:
suite_rooms: set[Room] = set()
start = room.cells[0]
for cell, _ in Flooder(start).in_any_room().flood():
suite_rooms.add(cell.room)
return Suite(suite_rooms)
On the line start = , room is None. We’re called from here:
class DungeonLayout:
def define_suites(self):
suites: list[Suite] = []
unexplored = self.rooms.copy()
while unexplored:
suite: Suite = self.find_suite(unexplored[0])
suites.append(suite)
unexplored = [room for room in unexplored if room not in suite.room_set]
return suites
So for that to happen, there must be a None in unexplored and therefore in self.rooms. How could that happen?
My eyes happen to fall on this method:
class DungeonLayout:
def _straight_path_room(self):
cells = self._straight_path_cells()
if cells:
return Room(cells, 'path')
else:
return None
Callers of that?
class DungeonLayout:
def ensure_connected(self):
while not self.is_fully_connected:
room = self._straight_path_room()
self.add_room(room)
And that is called from main. And there’s yer problem right there.
We can’t make a room with no cells: that would just fail on the next clause, and there is surely lots of code that expects to find at least one cell in a room.
Here is a proposed fix. Change this:
def add_room(self, room):
self.rooms.append(room)
To this:
def add_room(self, room):
if room:
self.rooms.append(room)
That should do the job. But we really should write a test that shows the defect and that than shows it as fixed.
I have an errand to run, and after that, we’ll do just that thing.
Hi, I’m Back!
So, how to prove that this defect exists? This method must have produced an empty list of cells:
def _straight_path_cells(self):
cells = []
suites = self.define_suites()
for s1, s2 in zip(suites, suites[1:]):
cells.extend(s1.find_path_cells(s2))
return list(set(cells))
For that to happen we need a set of suites such that one of them blocks any path between the other two. And I distinctly remember writing a test that showed us needing the while loop in ensure_connected. Let’s find that test. I think we’ll find that it needs improvement.
def test_connected(self):
layout = DungeonLayout()
Cell.create_space(5, 5)
r_0 = Room([Cell.at(2,0), Cell.at(2, 1), Cell.at(2,2),
Cell.at(3,2), Cell.at(4,2)])
layout.add_room(r_0)
layout.add_room(Room([Cell.at(0,4)]))
layout.add_room(Room([Cell.at(4, 0)]))
suites = layout.define_suites()
assert not layout.is_fully_connected
layout.ensure_connected()
assert layout.is_fully_connected
I suspect—but am not certain—that this test may already add a None to the layout. Let’s find out. I add this:
assert None not in layout.rooms
I was hoping it would fail. It did not. Bummer. Also I don’t understand that layout very well …
My plan was to use the string layout for clarity but that doesn’t work because we can’t guarantee the order of rooms. Well, wait, maybe we can … we can, but the test still doesn’t produce a None room. For sure, a failure to connect in one pass through ensure_connection doesn’t cause the failure. For there to be a None, if I’m not mistaken, we need a set of two or more suites such that they are not connected, and cannot be connected by a path between them. (If there are more than two, possibly between any of them?
)
OK what if one suite entirely surrounded another? Let’s look at the connection code more carefully:
class DungeonLayout:
def ensure_connected(self):
while not self.is_fully_connected:
room = self._straight_path_room()
self.add_room(room)
def _straight_path_room(self):
cells = self._straight_path_cells()
if cells:
return Room(cells, 'path')
else:
print('no path')
return None
def _straight_path_cells(self):
cells = []
suites = self.define_suites()
for s1, s2 in zip(suites, suites[1:]):
cells.extend(s1.find_path_cells(s2))
return list(set(cells))
class Suite:
def find_path_cells(self, suite) -> list[Cell]:
source, target = self.find_closest_pair(suite)
cells = source.find_path(target, [source.room, target.room])
on_path = cells[1:-1]
return on_path
We won’t drill into Cell.find_path yet, not unless we have to. The find_path_cells method could return an empty list in two cases. If it found no path whatsoever, which is what I’ve been looking for … or if it found a path of length 2, that is, where the two suites are adjacent. Now we would expect that two suites that are adjacent would be merged into one, resulting in the problem being solved. But can they be adjacent for a while and we don’t realize it?
No, I don’t see how that could happen. In the course of this operation, we don’t allocate the cells to Rooms other than in ensure_connected, and if that addition doesn’t do the job, we’ll call _straight_path_room again and that will redefine the suites, which should pick up the adjacency.
I don’t see (yet!) how to write a test that shows that our defect exists and then shows that we have fixed it. I am sure that the if above will fix it harmlessly, but I do not know how it comes about.
I’m going to get muddy here. We’ll write a test that repeatedly generates dungeons until this defect occurs and then we’re going to dump vast quantities of information.
I’ll start by reducing the size of the main dungeon, to see if a smaller map can be used, making it easier to see what has gone wrong. That seems fruitless, so we’ll go with the big area. I think the trick will be to dump the info and not throw the exception, so that after a bunch or runs, we’ll have a map on screen and a dump in the run window.
Arrgh. I managed to dump enough info while still displaying the map, to make me suspect that the code that finds the nearest points between two suites (or is it rooms) is not serving. It appears that it can select a cell in one or the other room such that a path from there cannot connect.
I need a break. I’ll look at that nearest code to get it somewhat fresh in my mind, and then step away for a bit.
def find_path_cells(self, suite) -> list[Cell]:
source, target = self.find_closest_pair(suite)
cells = source.find_path(target, [source.room, target.room])
on_path = cells[1:-1]
if not on_path:
print(f'{source.room}:{source} {target.room}:{target}')
print([cell for cell in cells])
return on_path
def find_closest_pair(self, suite):
my_best = None
his_best = None
distance = 1_000_000
for mine in self.cells:
for his in suite.cells:
dist = mine.manhattan_distance(his)
if dist < distance:
my_best = mine
his_best = his
distance = dist
return [my_best, his_best]
This seems to be an exhaustive search of all the cells in the two suites, looking for minimum distance. Hard to see how that could be wrong. I’m going to enhance my dump a bit.
Ah. When it occurs, the code is trying to find a path between a cell and itself, in two rooms named path. (Or, possibly,, the same room named path twice but I think not.) My leap to conclusion suggests that we can create a path room with the same cell in it more than once. If I recall, we do extend … but maybe not:
def _straight_path_cells(self):
cells = []
suites = self.define_suites()
for s1, s2 in zip(suites, suites[1:]):
cells.extend(s1.find_path_cells(s2))
if not cells:
print(f'({len(suites)}')
for suite in suites:
print(suite.room_set)
return list(set(cells))
This was the code I had in mind, and it does list(set()), which will remove duplicates from cells. So now I suspect that we can get the same cell into two rooms named path. I’ll dump the id, I guess. I should break now. But one more try …
It’s the same room twice. OK, i give up, take a break. My eyes are worn plumb out anyway. One more print …
OK, I have determined that two suites contain the same room: it is a path room. I’m outa here …
Wednesday AM
I even messed about a bit more last night. Really didn’t find out anything new at all, but I was distracted because it was FGNO. The group was off in political weeds and I wasn’t up for that level of horror, so was only half paying attention.
Anyway, a major issue here is that I have to run the program as many as 20 or 30 times, maybe more, could be infinite I suppose, to get a failure and the corresponding print. So tracking back through events is tedious. It only takes two seconds to do another run and see if it fails, but can you imagine waiting an entire MINUTE for test results? Make your blood boil? I should say!
So this morning, I’ll undergo the tedium once more and use it to find a random seed that will always fail. Then we can print or even debug. It is against my principles to use the debugger, but my principles aren’t totally rigid and once in a log while I’ll do it. We’ll see if my principles get bent this morning.
I just put random.seed(1) at the top of main and plan to increment it until I get a fail.
On the 35th run I get the failure. Let me give you a look at the current info dump, which now we can work to refine and zero in on the problem:
rooms in common:{Room(path: 4367708000)}
rooms in common:{Room(path: 4367708000)}
Room(path: 4367708000):Cell(15, 50) - 0 - Room(path: 4367708000):Cell(15, 50)
[Cell(15, 50)]
(2
{Room(unknown): 4373718048, Room(unknown): 4380828704, Room(cave: 4380828752), Room(cave: 4380828272), Room(round: 4380822656), Room(cave: 4380185216), Room(cave: 4380828320), Room(cave: 4380828368), Room(unknown): 4380828896, Room(unknown): 4380828416, Room(unknown): 4380828464, Room(unknown): 4380828512, Room(path: 4367708000), Room(diamond: 4380828032), Room(cave: 4380828560), Room(diamond: 4380828080), Room(round: 4380828128), Room(unknown): 4380828656}
{Room(cave: 4380828848), Room(unknown): 4380828224, Room(path: 4367708000), Room(cave: 4380828608)}
no path
define suites
suite_rooms={Room(unknown): 4373718048, Room(unknown): 4380828704, Room(cave: 4380828752), Room(cave: 4380828272), Room(round: 4380822656), Room(cave: 4380185216), Room(cave: 4380828320), Room(cave: 4380828368), Room(unknown): 4380828896, Room(unknown): 4380828416, Room(unknown): 4380828464, Room(unknown): 4380828512, Room(path: 4367708000), Room(diamond: 4380828032), Room(cave: 4380828560), Room(diamond: 4380828080), Room(round: 4380828128), Room(unknown): 4380828656}
unexplored=[Room(unknown): 4380828224, Room(cave: 4380828608), Room(cave: 4380828848)]
suite_rooms={Room(cave: 4380828848), Room(unknown): 4380828224, Room(path: 4367708000), Room(cave: 4380828608)}
unexplored=[]
defined
If this looks like unintelligible crapola, you’re not far off. I’ve put in a fair number of prints, some of which are mostly useless, but I’ve left them in, because each run took so long to get a failure that I was unwilling to lose any “information”.
Up there at the top we see that Room(path line with the - 0 -. That’s what’s causing the crash. Those two rooms have the same cell in common, so that when we look for the closest cells for the Suite calculation, we find that cell. We build a path between the cell and itself, which is empty, and we therefore create the None path and crash, except that yesterday we caught the attempt to add None as a room and ignored it. That fixes the apparent problem but not the real problem.
- NO!!!
- Added after a few paragraphs below. The above is wrong! The issue is two Suite instances with the same Room in them, not two Room instances with the same Cell. No harm done, I realized down below before I did anything badly-um-mistaken.
I would like to know one more bit of info, for my own amusement: which room does the Cell think it is in? I believe that what is happening is that we tell the cell it is in room X and then later tell it to be in room Y, without removing it from X. We could of course patch room assignment to be sure to remove the cell from any prior room it may have thought it belonged to, but that would just cover the bug, which is assigning a room cell to a different room without meaning to.
I think also that I would like to know where that common cell is on the map. That might be quite interesting. I think what I’ll try to do is to put Dot at that cell. Dot is not positioned until much later than all this happens, and in an object that doesn’t exist yet. We can’t quite get there from here.
Instead, I cause Dot to display her location, then navigate her to (15,50) to see where the shared cell is. It doesn’t tell me anything.

OK, that was interesting and educational, as I had forgotten how to display text in PyGame, but it wasn’t very useful. Let’s review the code. We want to see how we get the same room into two suites. (Realized I had misapprehended above, and added the NO!!! note.)
Now I imagine that many of us would be setting a breakpoint in the code and stepping around in define_suites to see what was going on. That is not my way: I have found that it often leads to hours of pointless stepping. So I prefer to study the code and work out what may be going on. I either see the defect, or see a place where I need more info.
This may or may not be faster, but it has the benefit that I learn the code. While stepping, I find that I retain very little detail. We’ll do it my way for now. Here’s what goes on:
def _straight_path_cells(self):
cells = []
suites = self.define_suites()
for ss1 in suites:
for ss2 in suites:
if ss1 != ss2:
intersect = ss1.room_set & ss2.room_set
if intersect:
print(f'rooms in common:{intersect}')
for s1, s2 in zip(suites, suites[1:]):
cells.extend(s1.find_path_cells(s2))
if not cells:
print(f'({len(suites)}')
for suite in suites:
print(suite.room_set)
return list(set(cells))
You’ll note my current print statements. Some of them may be useful, some probably not.
There’s where we discover that the suites have a room in common. So it happened, no surprise, in define_suites:
def define_suites(self):
suites: list[Suite] = []
unexplored = self.rooms.copy()
print('define suites')
while unexplored:
suite: Suite = self.find_suite(unexplored[0])
suites.append(suite)
unexplored = [room for room in unexplored if room not in suite.room_set]
print(f'{unexplored=}')
print('defined')
return suites
Now as I read that code, whatever set of rooms we have in the current suite, we remove from unexplored before we go for the new suite. So how could this happen? What does my trace tell us?
define suites
suite_rooms={Room(diamond: 4300154896), Room(cave: 4316374048), Room(unknown): 4316374576, Room(unknown): 4316374096, Room(unknown): 4316374144, Room(cave: 4306996352), Room(unknown): 4316374192, Room(path: 4300809200), Room(cave: 4316374240), Room(diamond: 4316373760), Room(round: 4316373808), Room(round: 4302944064), Room(unknown): 4316374336, Room(unknown): 4316374384, Room(cave: 4316373904), Room(cave: 4316374432), Room(cave: 4316373952), Room(unknown): 4315350512}
unexplored=[Room(unknown): 4316374000, Room(cave: 4316374288), Room(cave: 4316374528)]
suite_rooms={Room(cave: 4316374528), Room(cave: 4316374288), Room(path: 4300809200), Room(unknown): 4316374000}
unexplored=[]
defined
rooms in common:{Room(path: 4300809200)}
rooms in common:{Room(path: 4300809200)}
Room(path: 4300809200):Cell(15, 50) - 0 - Room(path: 4300809200):Cell(15, 50)
I notice that my rooms don’t all print the same, with the unknown ones having a different format. Not a concern for now, just a bit less obvious.
It’s not easy to see but I added blank lines above the “unexplored” print. Note that the following print of suite_rooms finds the three unexplored rooms … and also the path room that we found in the prior suite!
I’m starting to have a theory. Still too vague. Let’s dig down to the next method down, find_suite:
@staticmethod
def find_suite(room) -> Suite:
suite_rooms: set[Room] = set()
start = room.cells[0]
for cell, _ in Flooder(start).in_any_room().flood():
suite_rooms.add(cell.room)
print(f'{suite_rooms=}')
return Suite(suite_rooms)
OK, to find a Suite, we start at a room (one of the unexplored ones), and flood to find all the cells from there that are in any room at all. That should fill all the connected rooms and no others. So if there is a path between one of our rooms and any other, we should pick up the path and the room or rooms at the other end. And all those should and do turn up in the first suite, and unexplored should not and does not contain any of them.
But somehow, when we do the next find_suite, starting in one of the unexplored rooms, we find the path again.
After more fumbling and some interesting display work, I notice that there is but one Room named “path”, and that it contains all the path cells in the dungeon, no matter where they are, scattered all over. How does that even happen? I do recall a note that says, let me find it …, “are paths just one room or n rooms”. The item is checked off, whatever that may mean.
I feel like I’m getting closer. The clock tells me that I’ve been thrashing now for just over two hours. I have learned one probably key thing, maybe more:
- There is just one room named path in the dungeon, and it contains all the cells found in our layout connection code, all over the map;
- I think we never create a second room named path, which means, I think, that after that time, we do correctly discover that the map is disconnected, which is visibly clear, and then we do not find a path between the two resulting suites.
Basically, though, I am stuck. I’m going to make a chai, eat something, and wait for a better idea.
Thursday AM
After yesterday’s break, I was thinking about the issue in the afternoon and realized what was going on. The explanation and summary is Found It