I’m wondering whether I need the complex publishing approach that I’m using in the Queue. Only one way to find out: reasoning fails me. (As always, I wind up somewhere else.)

And May the Fourth be with you.

As I was fending off a cat trying to wake me up this morning, I was thinking about the Queue and its relationship to the EventBus used in D2.

When the EventBus is asked to publish a message, it enters a loop sending it to all the subscribers. If one of those subscribers happened to publish a message, EventBus would immediately recur, publishing the second message, and then return to the outer invocation, completing the publication of the first. Contrariwise, the Queue appends sub-tweets to a queue, and completes each publication before undertaking the next.

So I was wondering, I was, whether the simpler style might work just fine. As far as I can recall my thinking, I just wanted the Queue to work that way. So I’d like to test the simpler case and see what, if anything, changes. In particular, I’m wondering if the tests will all run (unless there are tests that test the order of events).

Since the D3ngeon project is still just an experiment, I’ll just keep experimenting. I think I’ll replicate the Scenarios tab and define a new kind of Queue in it.

That much is easy:

local Queue = class()

function Queue:init()
    self.things = {}
    self.publishing = false
end

function Queue:publish(methodName,...)
    for i,thing in ipairs(self.things) do
        local method = thing[methodName]
        assert(method, "Receiver does not support method '"..methodName.."'.")
        method(thing, ...)
    end
end

Some tests are testing the order of things, so will need to be changed to reflect the new reality.

Using these classes:

local Sender = class()

function Sender:act()
    self.called = false
    Q:publish("hello", self)
    _:expect(self.called, "called too soon").is(false)
end

function Sender:greetings()
    table.insert(results,"sender greetings")
end

function Sender:hello()
    table.insert(results,"sender hello")
end

local Publisher = class()

function Publisher:greetings()
    table.insert(results,"publisher greetings")
end

function Publisher:hello()
    table.insert(results,"publisher hello")
    Q:publish("greetings")
end

I have this test:

        _:test("pub is complete before next pub", function()
            results = {}
            local sender = Sender()
            local publisher = Publisher()
            Q:subscribe(publisher)
            Q:subscribe(sender)
            local expect = {"publisher hello", "sender hello", "publisher greetings", "sender greetings"}
            sender:act()
            _:expect(Q:length()).is(0)
            for i,message in ipairs(expect) do
                print(message)
                _:expect(results[i]).is(message)
            end
        end)

Let’s see if we can reason our way to the right thing here. Nothing like a little mental exercise in the morning.

The name of the test needs a NOT. Done.

  1. Sender is told to act;
  2. Sender publishes “hello”;
  3. Publisher, first on the list, sees “hello”;
  4. Publisher puts “publisher hello” into results;
  5. Publisher publishes “greetings”;
  6. Publisher, first on the list, sees “greetings”;
  7. Publisher puts “publisher greetings” into the table;
  8. Sender, second on list, sees “greetings”;
  9. Sender puts “sender greetings” into the table;
  10. “greetings” loop is finished; “hello” resumes;
  11. Sender, second on list, sees “hello”;
  12. Sender puts “sender hello” into the table.
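
The twelve steps above can be replayed in a standalone sketch. This is a minimal stand-in, not the project code: it uses plain Lua metatables instead of Codea's class(), it leaves out the test framework, and the subscribe method is my assumption, since only publish is shown above.

```lua
-- Stand-in for the simpler Queue: publish calls every subscriber immediately.
local Queue = {}
Queue.__index = Queue

function Queue.new()
    return setmetatable({things = {}}, Queue)
end

-- Assumed: subscribers are simply appended in subscription order.
function Queue:subscribe(thing)
    table.insert(self.things, thing)
end

function Queue:publish(methodName, ...)
    for _, thing in ipairs(self.things) do
        local method = thing[methodName]
        assert(method, "Receiver does not support method '"..methodName.."'.")
        method(thing, ...)
    end
end

local Q = Queue.new()
local results = {}

local publisher = {}
function publisher:hello()
    table.insert(results, "publisher hello")
    Q:publish("greetings")   -- nested publish runs to completion right here
end
function publisher:greetings()
    table.insert(results, "publisher greetings")
end

local sender = {}
function sender:hello()
    table.insert(results, "sender hello")
end
function sender:greetings()
    table.insert(results, "sender greetings")
end

Q:subscribe(publisher)
Q:subscribe(sender)
Q:publish("hello", sender)

print(table.concat(results, ", "))
-- publisher hello, publisher greetings, sender greetings, sender hello
```

The nested "greetings" loop finishes inside the publisher's hello, which is why "sender hello" lands last.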

Therefore, we should expect:

            local expect = {
                "publisher hello", 
                "publisher greetings", 
                "sender greetings", 
                "sender hello"}

Test. And the tests all run with the simpler Queue. Now let’s use the new Queue in the Movement tests:

function testMovement()
    
    _:describe("Movement", function()
        
        _:before(function()
            Q = GetQueue()
        end)
        
        _:after(function()
        end)
        
        _:ignore("Monster Moves", function()
            local c1 = Coordinates(10,10)
            local c2 = Coordinates(11,10)
            local monster = Monster()
            Q:subscribe(monster)
            monster:moveTo(c1)
            _:expect(monster:where()).is(c1)
            Q:publish("update")
            _:expect(monster:where()).is(c2)
        end)
        
        _:test("Monster Obstructed", function()
            local c1 = Coordinates(10,10)
            local c2 = Coordinates(11,10)
            local monster = Monster()
            Q:subscribe(monster)
            local obstacle = Obstacle()
            Q:subscribe(obstacle)
            obstacle:moveTo(c2)
            monster:moveTo(c1)
            _:expect(monster:where()).is(c1)
            Q:publish("update")
            _:expect(monster:where()).is(c1)
        end)
        
        _:test("Two Monsters Colliding", function()
            local c1 = Coordinates(10,10)
            local c2 = Coordinates(12,10)
            local c3 = Coordinates(11,9)
            local target = Coordinates(11,10)
            local m1 = Monster()
            Q:subscribe(m1)
            m1:moveTo(c1)
            local m2 = Monster()
            Q:subscribe(m2)
            m2:moveTo(c2)
            local m3 = Monster()
            Q:subscribe(m3)
            m3:moveTo(c3)
            Q:publish("update")
            _:expect(m1:where(),"m1 should have reverted").is(c1)
            _:expect(m2:where(),"m2 should have reverted").is(c2)
            _:expect(m3:where(),"m3 should not have reverted").is(target)
        end)
        
    end)
end

I implemented this function in the new QQscenarios tab:

        _:before(function()
            Q = GetQQueue()
        end)

Test and see whar she blows.

3: Two Monsters Colliding m1 should have reverted -- Actual: (11,10), Expected: (10,10)
3: Two Monsters Colliding m3 should not have reverted -- Actual: (11,9), Expected: (11,10)

That test should be named “Three Monsters Colliding”, by the way.

Let’s review the asserts:

            _:expect(m1:where(),"m1 should have reverted").is(c1)
            _:expect(m2:where(),"m2 should have reverted").is(c2)
            _:expect(m3:where(),"m3 should not have reverted").is(target)

What happened was that m1 did not revert, and m3 did. This isn’t surprising, because m1 will have grabbed the target, and then m2 and m3 try and are rebuffed by m1 in the recursive call.

The general impact is the same, only one gets to move, but it’s the first guy who wins rather than the last.

I change those tests to set a flag RunningQQ to true or false, depending on which version of Queue I am asking for:

            if RunningQQ then
                _:expect(m1:where(),"m1 should not have reverted").is(target)
                _:expect(m2:where(),"m2 should have reverted").is(c2)
                _:expect(m3:where(),"m3 should have reverted").is(c3)
            else
                _:expect(m1:where(),"m1 should have reverted").is(c1)
                _:expect(m2:where(),"m2 should have reverted").is(c2)
                _:expect(m3:where(),"m3 should not have reverted").is(target)
            end

These run appropriately. Commit: Two versions of Queue, one with deferred publication of sub-tweets, one that immediately recurs.

We must think about this.

Reflection

I have no specific recollection of why I wanted the new Queue to complete an upper call before doing any subcalls. The idea just pops onto the page during Queue implementation. One possible reason is this:

Suppose we wanted to have the Queue loop. For example, suppose we had a design where some starting message was published, and the last receiver object published that message again, to make the game run forever. If we put the messages into a Queue, as in our first implementation, the Queue would grow and then contract and then grow again. If we did things as in the second implementation, the final call to the top message would occur while the first loop was still running. We’d recur indefinitely, ultimately causing a stack overflow.

Or so it seems to me.

So one possible response to a recursive call is to complete each upper call before running a lower one, which would allow such a loop to be coded. That’s what yesterday’s implementation does.
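
To make that concrete, here is a rough sketch of a deferring queue running such a loop. It is not yesterday's actual code: the names DeferredQueue and pending, and the looper object, are invented for illustration, and it assumes Lua 5.2 or later for table.pack and table.unpack. The loop is capped at 100000 ticks so that the sketch terminates.

```lua
-- Hypothetical deferring queue: a publish made during a publish is appended
-- to a pending list and handled only after the current message is finished.
local DeferredQueue = {}
DeferredQueue.__index = DeferredQueue

function DeferredQueue.new()
    return setmetatable({things = {}, pending = {}, publishing = false}, DeferredQueue)
end

function DeferredQueue:subscribe(thing)
    table.insert(self.things, thing)
end

function DeferredQueue:publish(methodName, ...)
    table.insert(self.pending, {name = methodName, args = table.pack(...)})
    if self.publishing then return end   -- the outer publish will get to it
    self.publishing = true
    while #self.pending > 0 do
        local msg = table.remove(self.pending, 1)
        for _, thing in ipairs(self.things) do
            local method = thing[msg.name]
            assert(method, "Receiver does not support method '"..msg.name.."'.")
            method(thing, table.unpack(msg.args, 1, msg.args.n))
        end
    end
    self.publishing = false
end

local Q = DeferredQueue.new()
local depth, maxDepth, ticks = 0, 0, 0

-- A receiver that restarts the loop by publishing the message it just received.
local looper = {}
function looper:tick()
    depth = depth + 1
    if depth > maxDepth then maxDepth = depth end
    ticks = ticks + 1
    if ticks < 100000 then Q:publish("tick") end
    depth = depth - 1
end

Q:subscribe(looper)
Q:publish("tick")
print(ticks, maxDepth)   -- 100000 ticks, nesting depth never exceeds 1
```

With the immediately recurring version, each of those ticks would be a nested call inside the previous one, which is where the stack trouble would come from.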

Another response to the concern is “Don’t Do That!”.

As far as a mental model of the thing goes, if we think of the publish as a broadcast, and our receivers are spread out far enough, it is possible that nearby receivers will receive the message well before receivers that are further away. Mars gets our message before Saturn. Mars might therefore take action before Saturn even knows there is an issue. (In planetary situations, Mars’s reply can’t reach Saturn before the original message, unless there are relay stations involved. In our new QQ situation, replies always reach further planets before the original message. No analogy is perfect.)

What does all this mean to our larger concern, which is to evaluate this broadcasting scheme as a fundamental design principle for a dungeon program?

Well, one major point is that, assuming this simpler Queue can accommodate all our needs, we already have an equivalent capability in the D2 program. We wouldn’t need to add a new Queue thing, we could use the EventBus as it is. We might want to add the auto-subscribe that we’re contemplating here, but so far, we haven’t even put that into D3. And we allow the published message and the method called to differ in D2, while we don’t permit that in D3. Either program could be changed to do it the other way, so that’s not a critical factor.

The motivation for this idea came from a larger question, what would we do differently were we to start over, combined with a wild idea from Beth. The focus of a new system would be to keep things simpler, avoiding so many connections up and down the object hierarchy.

If we were actually going to do a new version of the program, I think we’d want to spike a few more ideas, such as the chest-opening protocol and combat, just to see how to do them. But we know that those things can be done, since we’ve already done them. What we do not know is whether there are interesting better ways to do them that rely on the Queue scheme.

And let me put this right on the table: I have absolutely no qualms about a mixed system where some messages reach our objects via publication, and others are sent directly. While it would be interesting to do everything via publication, as a pragmatic matter, I’d happily use both approaches if it seemed appropriate.

There is an interesting possibility with the newer version of our Queue, the one that recurs immediately on a second publication: Whenever an object publishes a message, all its receivers have seen the message by the time the publish returns, and because of the simpler form of the publication loop, we could easily accumulate results from the receivers, which means that we could publish a general question “whereIsThePrincess” and receive an answer immediately after the publication, rather than via a separate callback.

(I don’t think it is possible to do that with the first version of Queue, because sub-publication calls return immediately.)
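
A minimal sketch of that accumulating idea, under stated assumptions: the query method, the princess and monster tables, and the shape of the answer are all invented here for illustration, and nothing like this exists in D3 yet.

```lua
local Queue = {}
Queue.__index = Queue

function Queue.new()
    return setmetatable({things = {}}, Queue)
end

function Queue:subscribe(thing)
    table.insert(self.things, thing)
end

-- Hypothetical: like publish, but collects every non-nil return value.
function Queue:query(methodName, ...)
    local answers = {}
    for _, thing in ipairs(self.things) do
        local method = thing[methodName]
        assert(method, "Receiver does not support method '"..methodName.."'.")
        local answer = method(thing, ...)
        if answer ~= nil then table.insert(answers, answer) end
    end
    return answers
end

local Q = Queue.new()

-- Invented receivers: only the princess knows where she is.
local princess = {x = 3, y = 7}
function princess:whereIsThePrincess()
    return {x = self.x, y = self.y}
end

local monster = {}
function monster:whereIsThePrincess()
    return nil   -- hears the question, has nothing to say
end

Q:subscribe(monster)
Q:subscribe(princess)

local answers = Q:query("whereIsThePrincess")
print(#answers, answers[1].x, answers[1].y)   -- 1  3  7
```

The caller has its answer as soon as query returns, with no callback involved.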

Enough Musing, What’s the Bottom Line Here?

Well, impatient one, there is never a final bottom line, but right now my summary thought is that I’d like to see whether using the ideas I’ve gained in this small experiment would make D2 better.

Now, nearly 300 articles in, I’m tired of D2, though it continues to teach lessons. Still, I’d like to start something new, just for a change. But my strong belief, based on long experience, is that rewrites are rarely allowed, and rarely a good idea even if allowed. YMMV of course, and I’m sure many of us—even I—have a story of at least one successful rewrite. It’s just that I have more stories of unsuccessful ones.

That said, it’s certainly tempting to solve, or solve again, some of the many problems between this spike and a real dungeon program. The fact that this new scheme quickly bottoms out on Coordinates means that we’d have a rather different set of interactions from those implied by Dungeon having Tiles which have a Map which has some kind of coordinate magic going on.

Pushing the D3 experiment further would likely provide ideas for simplification of D2.

An analogy might be an R&D department in an auto manufacturing company. R&D comes up with a new battery that is lighter and holds more power than the ones currently used in the company’s EVs. This will allow cars to go further on a charge, and will reduce overall weight of the vehicle.

It is to the advantage of the company to pull advances like this into manufacturing as soon as possible, because it will increase the value of the product. This fact encourages R&D to come up with new technology that is easy to put in place in the existing process, so that R&D becomes more directed and manufacturing produces more and more advanced vehicles. (Meanwhile the people writing the maintenance manuals are going mad. No analogy is perfect.)

What does this mean for our Dungeon company? It might mean that we should keep the D3 research effort going, and begin as rapidly as possible to use its learning in the company’s existing product D2. A rewrite of the product would take ages and many dollars. But if we could enhance it forever … profit!

I wish I could think of a way to profit(!) from what I’m doing here. But I digress.

So, current bottom line, I want to keep D3 going as a learning and testing platform, and I want to explore how to move the ideas that come from D3 back into D2, to keep it alive by refactoring toward better, simpler design than it has at present.

On the R&D side, I’d like to see how the drawing might be better separated from the logic of game objects interacting in a coordinate space, which could lead to a much cleaner separation of game logic from drawing code.

We might even discover that things aren’t as bad as I feel they are. Most of my concerns come from the difficulty of setting up tests, which seems to involve GameRunners and Dungeons and Tiles and FooFaRaws and Woozles, just to test some simple thing.

On the other hand, probably things are as bad as I feel they are, and we can only benefit from having a clear notion of a better design.

Next Steps

I think one next step will be to find some query or operation in D2 that refers upward to Dungeon or GameRunner, and see if we can improve it with a pub-sub approach. We might be able to disconnect some object linkages.

For example, Monster and Player have a pointer to the GameRunner. Might we be able to remove that connection using pub-sub? If so, couldn’t we then make testing those classes easier by simply providing a suitable TestingEventBus or the like?
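
Here is a very rough sketch of what such a test double might look like. Everything in it is hypothetical: this Monster is not D2's Monster, the isOccupied message and stepRight method are invented, and it assumes a publish that returns an answer, which D2's EventBus may not do today.

```lua
-- Hypothetical test double: records what was published, returns canned answers.
local TestingEventBus = {}
TestingEventBus.__index = TestingEventBus

function TestingEventBus.new(cannedAnswers)
    return setmetatable({answers = cannedAnswers or {}, published = {}}, TestingEventBus)
end

function TestingEventBus:publish(methodName, ...)
    table.insert(self.published, methodName)
    return self.answers[methodName]
end

-- Hypothetical Monster that holds a bus rather than a GameRunner.
local Monster = {}
Monster.__index = Monster

function Monster.new(bus)
    return setmetatable({bus = bus, x = 10, y = 10}, Monster)
end

function Monster:stepRight()
    -- Ask the world whether the next cell is taken, instead of asking a runner.
    local blocked = self.bus:publish("isOccupied", self.x + 1, self.y)
    if not blocked then self.x = self.x + 1 end
end

local freeBus = TestingEventBus.new({isOccupied = false})
local m1 = Monster.new(freeBus)
m1:stepRight()

local blockedBus = TestingEventBus.new({isOccupied = true})
local m2 = Monster.new(blockedBus)
m2:stepRight()

print(m1.x, m2.x)   -- 11  10
```

The point is only that the test needs a bus and a Monster, with no GameRunner, Dungeon, or Tiles in sight.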

We’ll see. I think the general scheme will be to use D3 as a laboratory for ideas that can help us improve the design of D2. That’s not as glamorous as rewriting D2. But consider why rewrites are a problem.

Suppose we spawn off two parallel efforts. One is to produce the fantastic new D3 product, better than D2 in design, supporting all its capabilities and able to support even more wonderful things.

The other effort has the drudge job of maintaining D2, trying to keep it alive to serve as the cash cow that funds D3. However, the D2 team doesn’t want to lose. They want to win. So they watch what D3 does, and they steal ideas from D3 and plug them into D2 when it makes sense. And they implement new features. They aren’t as fast at new features as we plan to be, but the D3 team keeps getting those new features added to their queue. They have to do everything D2 can do. It’s a drag, man.

If I had to choose a team to be on, I think I’d choose D2. If I had to choose a team to invest in, I’d almost certainly choose D2. Our motto would be “D3 Delenda Est”, and we would end all our stand-ups by shouting it.

Summary

I started this exercise with a simple question about what would happen if I used a simpler scheme in D3’s publication Queue. The answer was, roughly, “it would be OK either way”.

But that experiment made me think more clearly about something I already “knew”, namely that D2’s EventBus could certainly do the broadcasting thing that D3 has.

That led me to think about improving D2 using this brand new (well, sort of brand new) idea from D3. That led me to think of treating D3, not as an alternative or competitor to D2, and not as a replacement, but instead as a source of ideas and better understanding of what D2 needs and how we might get it.

I think this is an example of why we do better when we don’t get all worked up about “extra work”, or about trying things and throwing them away, or about implementing a simple solution today and improving it tomorrow, or even about implementing a complex solution today and simplifying it tomorrow.

When we write software, we are exploring a complicated web of paths and tunnels that we are creating as we go. We discover interesting things as we travel, and we incorporate those things into our understanding and into our code. We can’t possibly plan the program details in advance, because we can’t possibly know what we’ll encounter when we begin to do the work.

Kent Beck used to say “Let the code participate in your design sessions”. He may say it now, for all I know. Whether he does or not, this is the way.

GeePaw Hill writes about Rework Avoidance Theory to much the same purpose. Changing things is what we do. We change our plans, our designs, our tests, our code, from day one on.

Plan all the time, design all the time, test all the time, code all the time. Things go better that way.