Dungeon 306
Friday the 13th. That bodes well, always my lucky day. Besides, it was yesterday where I interfered with the canine. (Oh No! He did it again!)
In fairness to me, since if I’m not fair, you probably wouldn’t be, the actual canine interference was quite some time ago, when I originally created the EventBus, which, it happens, in a surprising turn of events, can and did retain its grasp on objects which otherwise, in the due and proper course of the game, should have and would have been dropped from consideration, vanishing if you will, all of which sadly was not the case, with said retention resulting in strange and mysterious phenomena such as ghostly appearances of objects and entities, in levels greater than one, unable to be interacted with, yet sometimes responding to queries with responses from no visible source.
It was bad, or, as I so eloquently put it in yesterdays article, “not good”.
The specific defect is now fixed, by virtue of destroying the EventBus each time we create a new level, and requiring the GameRunner to re-subscribe to the new one. There is a new tool that helped me diagnose the problem, but no decent test for the case.
Well, no, it’s not fixed, it’s worse than ever, as we’ll soon find out.
And what about tests for the whole game? Now, my friend and fellow competitor for best-looking software developer above a certain age, GeePaw Hill claims that he seldom / rarely / never writes system-level tests, relying on an iron-clad array of what must be perfect tests for the individual objects. After all, if all the objects work, surely the system itself must work. (It is possible that I am overstating GeePaw’s position just a bit, so as to draw the following comparison.)
The defect in question, ghostly objects showing up in dungeon levels other than the first, is simple to describe, yet quite subtle.
- The Bug
- The EventBus is a collection of all the objects that have subscribed to any message in the system. Each level in the system creates objects that use the Bus. When the player leaves the level, a new level is created, and all the objects from the prior level are no longer accessible to the program. In due time, they should be collected as garbage and recycled into even more terrifying monsters and more precious treasures.
-
However, the EventBus is still holding on to any object that has registered for events. All those objects that are mistakenly held will respond to
publish
andquery
messages. These include the message that gets the current contents of a tile in the game … so that objects that were on that tile in the previous level will respond “I’m there!”. Thus they’ll be drawn, or publish their messages, or otherwise do strange things.
I fixed the defect, as described above, by creating a new Bus and requiring the GameRunner, which does persist, to re-register. The objection to this is that there is a mysterious kind of coupling in this design: the EventBus has to be used in a particular way, so that objects that should disappear really do. If we accidentally changed the way we use the Bus, defects like this could appear again.
And at this moment, I don’t see how to really test this other than at the system level, as the system level is where the problem occurs.
How Might We Fix This?
Yellow Tape
One possibility is to put DFWT comments all around the EventBus, and HTBD comments as well.1
That might work but it’s a lot like putting yellow tape around the big ragged hole in your flooring where coyote’s anvil fell through: it’s not a good solution.
Remove the Bus
We could remove the bus and go back to objects knowing who to ask for information. That would require us to pass more objects around, and we would almost certainly create more globals, more indirection, or both. We’d probably wind up with more objects with very broad public interfaces, which has been a problem in the design.
Temporary Subscribers
This notion has drifted through my mind a few times since yesterday. What if subscribers to the Bus were generally temporary, and what if when we create a new level, we tell the Bus to remove temporary subscribers. Then, we’d add the notion of permanent subscribers, and set the GameRunner to permanent.
Scoped Subscribers
My friend Ryan Latta made a suggestion about solving the problem with “basic scoping”. We’ve not finished that conversation, and I don’t yet get the idea, but the word comes to mind with this idea: what if we could send messages to the Bus saying “scope X begins” and “scope X ends”. And suppose that objects could subscribe to any currently-existing scope. And suppose that when scope X ends, everyone subscribed under that scope is unsubscribed. Permanent/Temporary is a special case of this idea, I reckon.
That may or may not align with what Ryan had in mind, but I certainly owe him for the idea.
At this instant, I don’t know of more scopes than “whole game” and “one level” … but ideas do come streaming in. Current tests save the old Bus and create a temporary one. Instead they could begin a “testFoo” scope in before
and end it in after
. Maybe during combat we’d find a use for a special scope, or during a subquest or something. We could even add finalization if we needed it.
Hold Yer Horses There, Compadre
I can feel myself running with this idea. It must be related to what Ryan had in mind: it certainly isn’t something I thought before just now, though the perm/temp kind of implies more states. I am sure that I could over-engineer the heck out of this. As I envision it just now, scopes could overlap:
{A [B (C A} B] C)
They’d just be independently named notions. Maybe we could consider them “topics”. Be that as it may, there’s no reason why they should have to nest. I do think I’d say that opening X and then later opening X should do an implicit close for X. Nesting them is way beyond what we need.
If we went this way, what we need now are just two scopes, “Game”, and “Level”. Game would stay open for the whole game, and only GameRunner would subscribe with that key. Level would open when a level is created, and close when it is over. Everyone else would subscribe at the Level scope.
I feel a decision coming on.
YAGNI
We don’t really need to do anything: the current fix works and we can surround it with yellow tape. But, if I want to do it, I can certainly justify it by showing how an extended notion of pub-sub like scopes can be refactored into the system without a lot of trouble. The Bus is so nicely isolated that you probably already believe we could do it, but still, it should make a nice article or two.
We certainly don’t need to do anything extra for multiple scopes beyond two, but it seems likely that if we can do two, we can do N with the same code … although one can see “isPerm” and “isTemp” as a bit simpler.
For sure, I wouldn’t dive in and do some kind of stack-based complicated nesting recursive descent hyperbolic solution. I’m pretty good about not doing that.
I think, for now, we’ll ride with the existing fix, but we’ll keep this idea near the top of our minds for implementation soon.
Reflection
First of all, it’s good to wait for a bit to see whether ideas come to mind. Yesterday I didn’t have any of these good ideas.
Second, and arguably more important, it’s every so valuable to have other people to kick ideas around with. Ryan, GeePaw, Beth, Bryan, and others have chimed in with ideas. They don’t know the code and their ideas often don’t quite fit the system, and often I just don’t understand them, but they conversations always result in my doing something better than it would have been without my friends.
For now, I feel the current fix will hold, and that we have a decent idea—or direction—for a solid fix. I’m wanting to let that ride and look elsewhere for something small to do today.
Don’t rush, and chat with others. Important tip. Thanks, Egon. Let’s look for something small to do.
Something Right-Sized
A quick review and I find nothing that seems fun to do. I decide to play the game and immediately discover that it’s broken.
**The ?? button seems not to work at all. In fact, the whole crawl seems to be dead. **
Curses. And worses. My fix doesn’t work. Combat is broken. The inventory seems mostly to work, that is, things go in and out and potions and such add points.
We’re missing receivers. Let’s review all the subscribes … no, just the ones that might not work after resetting the Bus
Inventory:subscribe() -- in Main
-- seems to work, but why?
function Inventory:subscribe()
Bus:subscribe(Inventory, "touchBegan")
end
-- we call this one now when we restart the bus
function GameRunner:doSubscriptions()
Bus:subscribe(self, "createNewLevel")
Bus:subscribe(self, "darkness")
end
-- appears to be missed
function Dungeon:init(runner, tileCountX, tileCountY)
self.runner = runner
self.tileCountX = tileCountX or assert(false, "must provide tile count")
self.tileCountY = tileCountY or assert(false, "must provide tile count")
self.rooms = Array{}
self.tiles = Tiles()
Bus:subscribe(self, "requestinfo")
end
--- Seems not to be done
function Provider:init(default)
self.default = OP:display(default or "default")
self.items = {}
Bus:subscribe(self, "addTextToCrawl")
Bus:subscribe(self, "addItemsToCrawl")
end
This is hopeless. My assumption that only the GameRunner needed to resubscribe is flat wrong. The Floater breaks immediately. The Inventory, I think, will break on the second level. And so on.
Yesterday’s quick fix, despite a lot of in game testing, just doesn’t work. How I could have failed to notice that the crawl wasn’t running, I don’t know. I was just looking for ghosts, I guess.
Excuses or reasons aside, the system is down, second day in a row, with a new fatal defect today after yesterday’s late discovery of the ghosts, which were bad but not fatal. Today you can’t really play at all.
I’ve interfered with the canine again, worse even than before.
What can I do?
A Possible Fix
I see two ways to go. One is to create the scope notion, or at least the perm/temp notion, build it into the Bus, and make sure that everyone who is perm signs up as perm. That seems to me to be going to take a morning or two to do.
The other way is first, turn off the Bus recreation, and second, instead, do a special purge of the Bus that removes instances of dungeonObject from the subscription list. That, with any luck at all, should remove the ghosts and leave everything else running.
I think that’s our only hope for today. Let’s see about doing that function. Here’s a test:
_:test("flush all DungeonObjects, leave others", function()
local o1, o2
local d1, d2
o1 = Other()
o2 = Other()
d1 = DungObj()
d2 = DungObj()
local subs = Bus:allSubscribers()
_:expect(#subs).is(4)
Bus:flush()
subs = Bus:allSubscribers()
_:expect(#subs).is(2)
_:expect(subs).has(o1)
_:expect(subs).has(o2)
end)
That will require me to build Other, a local class which isn’t a dungeon object, and DungObj, a local class that is, plus two new methods on Bus, allSubscribers
and flush
.
Test and implement cyclically:
7: flush all DungeonObjects, leave others -- EventBus:73: attempt to call a nil value (global 'Other')
local Other = class()
function Other:init()
Bus:subscribe(self, "someMessage")
end
function Other:someMessage()
end
Then:
7: flush all DungeonObjects, leave others -- EventBus:84: attempt to call a nil value (global 'DungObj')
local DungObj = class(DungeonObject)
function DungObj:init()
Bus:subscribe(self, "someMessage")
end
function DungObj:someMessage()
end
And then:
7: flush all DungeonObjects, leave others -- EventBus:95: attempt to call a nil value (method 'allSubscribers')
This will be somewhat tricky. I would like to record each subscriber once only, even if they subscribe to more than one thing. Let’s enhance the objects to do that … they are part of the test, so I think it’s OK to do that …
local Other = class()
function Other:init()
Bus:subscribe(self, "someMessage")
Bus:subscribe(self, "anotherMessage")
end
function Other:someMessage()
end
function Other:anotherMessage()
end
local DungObj = class(DungeonObject)
function DungObj:init()
Bus:subscribe(self, "someMessage")
Bus:subscribe(self, "anotherMessage")
end
function DungObj:someMessage()
end
function DungObj:anotherMessage()
end
Now to do allSubscribers
:
function EventBus:allSubscribers()
local subs = {}
for event,sub in pairs(self.events) do
subs[sub] = true
end
local result = {}
for sub, _true in pairs(subs) do
table.insert(result,sub)
end
return result
end
I just slammed that in but it seems right. Test, expecting the test for 4 to work and the flush to fail.
7: flush all DungeonObjects, leave others --
Actual: 2, Expected: 4
7: flush all DungeonObjects, leave others --
EventBus:105: attempt to call a nil value (method 'flush')
No, I need a double loop:
function EventBus:allSubscribers()
local subs = {}
for event,subTable in pairs(self.events) do
for sub, _true in pairs(subTable) do
subs[sub] = true
end
end
local result = {}
for sub, _true in pairs(subs) do
table.insert(result,sub)
end
return result
end
Now then:
7: flush all DungeonObjects, leave others --
EventBus:105: attempt to call a nil value (method 'flush')
And we need flush. We just need to check each subscriber and if it is a DungeonObject nil it.
function EventBus:flush()
local subs = self:allSubscribers()
for _i,sub in ipairs(subs) do
if sub:is_a(DungeonObject) then
self:unsubscribeAll(sub)
end
end
end
I was going to loop directly over the tables, but deleting things while looping bugs me, so I did it as above. Test should run, if my luck is in.
It does run.
Now we can change how we do things in dcPrepare
, changing it from:
function DungeonBuilder:dcPrepare()
Bus = EventBus()
self.RUNNER:doSubscriptions()
TileLock = false
self:createTiles()
self:clearLevel()
DungeonContentsCollection()
MonsterPlayer(self.RUNNER)
Announcer:clearCache()
end
To:
function DungeonBuilder:dcPrepare()
Bus:flush()
TileLock = false
self:createTiles()
self:clearLevel()
DungeonContentsCollection()
MonsterPlayer(self.RUNNER)
Announcer:clearCache()
end
Let’s test. I expect all to be well.
I even went through the Learning Level to be sure no ghosts appeared. It pretty much fills the available space, so a ghost is sure to appear.
I think we have smashed this bug. Commit: Fixed fatal error causing crawl not to run, ?? not to work and so on.
Let’s sum up, with our tail between our legs a bit.
Summary
Well. Yesterday’s quick fix wasn’t a fix, and even though I ran the program extensively, I failed to notice tiny details like getting absolutely no crawl messages. The result was that yesterday’s release was worse than the one it was intended to fix.
I comfort myself that in a real situation, someone smarter than I am would also have run the program and asked me what the hell was wrong with me. On the other hand, if this were my personal game up on the App Store, this release wouldn’t have passed muster. Probably Apple would have refused it. I hope.
I think today’s fix is pretty solid. The things that were ghosting were all instances or subclasses of DungeonObject, and those are flushed. There are, I’m really quite sure, no DungeonObjects that survive between levels. (Arguably the NPC should, but she’s not even done yet.)
The flush, therefore leaves alone all the other subscribers that I noticed, Inventory, Floater, Provider, GameRunner, and so on. Hm … I just dumped the bus and noticed duplicate instances of Dungeon, and Buttons. I think those have been shown to be harmless. Fixing up Dungeon will be tricky, as it is created in GameRunner and then filled in by DungeonBuilder, which means that by the time we get to dcPrepare
we’ve already got two Dungeons in the Bus. Buttons … I don’t understand, because I’d expect them to be flushed … I wonder why they weren’t.
Let’s enhance flush to unsubscribe Buttons.
_:test("flush all DungeonObjects, leave others", function()
local o1, o2
local d1, d2
o1 = Other()
o2 = Other()
d1 = DungObj()
d2 = DungObj()
b1 = Button("x",0,0,10,10)
local subs = Bus:allSubscribers()
_:expect(#subs).is(5)
Bus:flush()
subs = Bus:allSubscribers()
_:expect(#subs).is(2)
_:expect(subs).has(o1)
_:expect(subs).has(o2)
end)
And …
function EventBus:flush()
local subs = self:allSubscribers()
for _i,sub in ipairs(subs) do
if sub:is_a(DungeonObject) or sub:is_a(Button) then
self:unsubscribeAll(sub)
end
end
end
Test. Everything seems solid, multiple levels work. We have two dungeons in EventBus but that has seemed to be harmless. We’ll recommit: Remove duplicate buttons from EventBus on flush.
Now where were we? Oh yes, we were about to beat me up.
This defect does make me sad, because it shows that I am not perfect, and even though I like for these articles to reflect the reality of being a human while programming, I don’t like it when things go badly wrong and I make them worse, and then don’t even recognize that there’s a problem.
If that never happens to you, well, please start writing books and articles. If it does, well, welcome me to your club. I do my best, and sometimes my best isn’t good enough. This is why we try to work in teams, why we try to have friends, and so on.
And … it’s why we think about things. At least I was thinking about this problem when I realized the system was broken, which made recognizing it easier, and left me with a sense of the object that allowed this quick fix.
I don’t consider the quick fix to be good enough for long term: it deserves to be improved at least as much as it did yesterday, with the single advantage of this one being that it nearly does what we want. Untangling that last bit with the Dungeon will be tricky. I expect that I’m up to it, and if not, there will be more laughing sessions coming up.
Stick with me, there’s more fun to be had!
-
DFWT: Don’t Interfere With This. HTBD: Here There Be Dragons. ↩