Wastage?
The present state of the code leads me to think about its evolution, and in particular, whether evolving it as I did led to substantial ‘wastage’ in code or tests. The answer may not surprise you![1]
When we started this exercise, I very consciously chose not to concern myself with the client-server aspect of things. Even though that’s clearly a major architectural characteristic of what we need, I wanted to find out what would happen if we deferred doing it. Would it be possible to change the program’s architecture so radically after it had been written without that separation built in?
The answer has become evident: it was possible. It was also a bit tedious, and it has taken me at least a dozen of my article-writing sessions, nominally two hours each, to do it. So probably less than a week of work, spread over the period from October 24th to November 8th, today. And we have not actually set up to run over a network connection yet.
To me, it was always clearly possible. I never had any doubt that we could convert it to client-server. It was, however, a bit more tedious than I had anticipated, but everything always is.
The whole effort started back in mid-August, with a viewer, released on August 22, showing a robot running around. Between then and late October, programming focused on adding capability to the Bots, and corresponding capability to the World. For most of that time, the architecture of the game was simple: when a Bot wanted to do something, it called a method on World; World mediated the request and returned the Bot’s new state.
That’s what I would expect for a simple old-school video game, such as one where you had some keyboard controls and could drive your Bot around picking things up, and such. It was not at all what one would need for client-server.
I tell a lie. Actually, it was a lot like what you’d need if you had a high-speed connection between Bot and World, such that each Bot could send a single command and receive the result rapidly enough. If each Bot were being driven by keyboard commands from a human, it would probably be fine across the Internet. But if a single client could field a large number of Bots making decisions at computer speed, it seemed clear that commands needed to be batched up, processed as a group, and results sent back.
So, we just made batches work yesterday. Yesterday’s final test looks like this:
def test_new_syntax_for_raw_collections(self):
    WorldEntity.next_id = 100
    world = World(10, 10)
    block_id = world.add_block(6, 5)  # 101
    batch_in = RequestBuilder() \
        .request(world.add_bot(5, 5, direction=Direction.EAST)) \
        .action('take', None) \
        .action('turn', 'SOUTH') \
        .action('step', None) \
        .action('step', None) \
        .action('drop', block_id) \
        .request(world.add_bot(7, 7)) \
        .action('step') \
        .result()
    assert len(batch_in) == 2
    rq_0 = batch_in[0]
    assert rq_0['entity'] == 102
    assert len(rq_0['actions']) == 5
    batch_out = world.process(batch_in)
    result = batch_out[0]
    assert result['location'] == Location(5, 7)
    result = batch_out[1]
    assert result['location'] == Location(8, 7)
    item = world.map.at_xy(5, 8)
    assert item.id == block_id
That one test operates two bots in the world, giving one of them a lot to do and one of them just a little, and it checks that the World carries out all of those actions correctly. It only tests the World side of things. If we wanted a single round-trip test for World, this is it.
Smaller Tests are Better?
In our infrequent pairing on this program, or when I’ve shown it on our FGNO Zoom, GeePaw Hill has often commented on the tests. He prefers tests that are much more “micro” than the tests we have in this program. Our tests are generally more like “story tests”, where we set up a World and a Bot and some Blocks in the World, then we tell the Bot to do_something, and then we test to see whether the World and Bot reflect what we expect.
def test_wandering(self):
    world = World(10, 10)
    connection = DirectConnection(world)
    client_bot = connection.add_bot(5, 5)
    client_bot.do_something(connection)
    loc = client_bot.location
    assert loc != Location(5, 5)
So Hill’s not wrong. That’s nearly the simplest Bot test we have. It is testing whether the Bot, upon being asked to do something, will take a step. The test has surely evolved over time: when it was first written, DirectConnection didn’t even exist. But it was always a full round trip. It’s a story test, not a microtest.
The code back then would have started in do_something and then executed code that decided what to do. In early days, all the Bot could do was wander randomly. And, in those days, somewhere in the code, at the end of what would one day become a decision process mediated by a state machine, there was the code self.world.step(), or something very like that.
If we have two cooperating objects, where one directly calls the other like that, I can only think of two reasonable ways to test whether that step operation happens when it should:
1. Set up the objects, run the code, check the resulting state;
2. Set up the Bot with a fake world, run the code, and see whether the Bot calls the fake.
There may be other reasonable ways, but those are the only two that tend to come to my mind. I dislike fake objects, especially of the kind we’d need there. It seems to me that we learn more from the #1-style test, because it tells us that the Bot did what it should, and also that the World did what it should.
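For concreteness, here is roughly what that second style might look like. This is only a sketch: FakeWorld, WanderingBot, and the step_called flag are all invented for illustration, and the real Bot and World are of course far richer than this.

class FakeWorld:
    def __init__(self):
        self.step_called = False

    def step(self):
        # record the call rather than doing anything
        self.step_called = True

class WanderingBot:
    def __init__(self, world):
        self.world = world

    def do_something(self):
        self.world.step()  # the call we want to verify

def test_bot_steps_when_asked():
    world = FakeWorld()
    bot = WanderingBot(world)
    bot.do_something()
    assert world.step_called

Such a test can only fail if the Bot never makes the call, which is exactly the precision that style offers.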
Failures Aren’t Precise
There is a strong argument for the second kind of test, however. When it fails, we know that the Bot didn’t do the step. In contrast, when my test fails, we don’t know whether the Bot failed to step or the World failed to process the step properly. We have more work to do, because our story test can fail in many ways, rather than pointing directly to the problem.
So GeePaw has a legitimate concern about the kind of tests that I[2] wrote while developing this program: the tests do not discriminate as well as we can imagine. They are more indirect, and their failures are quite often much harder to interpret.
Every test in the file ‘test_bot.py’ is a story test, setting up a World and exercising Bot and World together. That is, unquestionably, somewhat wasteful. The tests surely take longer to write, and if they fail, it’s likely that we can’t be sure what broke.
Principle vs Practice
Well, in principle, that’s true. In practice, the tests run every time I stop typing. So if the step story test suddenly fails, I don’t have to look at the test and figure out what happened from scratch. Whatever I just typed broke the test, so I know exactly where to look to find the problem.
I would argue that once they are in place, these tests serve nearly as well as more pointed tests would, not because they point clearly to the problem, but because, usually, I don’t need a pointer to the problem via the test, because I just typed the problem into the code.
But they are more difficult to write, and we see from the test above that changes to how the code works, such as the addition of the DirectConnection, require us to modify the tests. That is necessary, but it is surely wasted effort nonetheless. More pointed tests, more “micro” tests, would very likely not require updating so often.
Mind you, there can be decent microtests for World.
def test_bot_cannot_drop_off_world_west(self):
    self.check_cannot_drop_off_world(0, 5, Direction.WEST)

def check_cannot_drop_off_world(self, x, y, direction):
    world = World(10, 10)
    bot_id = world.add_bot(x, y, direction)
    bot = world.entity_from_id(bot_id)
    block = WorldEntity.block(4, 4)
    bot.receive(block)
    world.drop_forward(bot, block)
    assert bot.has(block), 'drop should not happen'
This is about as close as we can get to a test that the bot cannot drop a block off the edge of the world. It is parameterized so that we can check all four sides of the world, but beyond that, we need a bot holding a block in the world, and when it tries to drop the block off the edge, the drop shouldn’t happen.
I wonder if there is a simpler test. Let’s look at drop_forward:
class World:
    def drop_forward(self, bot, entity):
        if self.map.place_at(entity, bot.forward_location()):
            bot.remove(entity)

class Map:
    def place_at(self, entity, drop_location):
        if self.location_is_open(drop_location):
            entity.location = drop_location
            self.place(entity)
            return True
        else:
            return False

    def location_is_open(self, location: Location):
        return self.is_within_map(location) and not self.is_occupied(location)

    def is_within_map(self, location):
        return 0 <= location.x <= self.width and 0 <= location.y <= self.height

    def is_occupied(self, location):
        return self.at_xy(location.x, location.y)
Well, we could certainly have microtests for location_is_open and is_within_map and is_occupied. And we could write a small test for place_at. Those would all be rather nice tests of the Map.
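A microtest for location_is_open, for instance, might look something like this. It’s a sketch: it leans on the World and Map shown above, and the specific coordinates are mine.

def test_location_is_open(self):
    world = World(10, 10)
    world_map = world.map
    assert world_map.location_is_open(Location(5, 5))       # empty and on the map
    assert not world_map.location_is_open(Location(-1, 5))  # off the west edge
    world.add_block(5, 5)
    assert not world_map.location_is_open(Location(5, 5))   # now occupied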
But even with detailed tests of Map in place, I think I’d still want the story test at the World level. And, by my lights, with that test, I don’t need the Map ones.
Individual Differences
There are those who will tell us that there must be a test for every method of every object. Life’s too short.
There are those who will tell us that there must be a test for every bit of conditional logic. These people are not mistaken, but if they interpret that to mean an individual test for each individual bit of conditional logic, I think they’re asking for too much.
I’ve been present during a lot of my work, nearly all of it in fact, and a fair amount of GeePaw Hill’s, and I have noticed that his way of working is rather different from mine. I’ll try to be fair to us both.
GeePaw, in his own work, seems to have a design in mind, at least near where he is working. He keeps his objects well separated, generally with carefully defined types[3], and tests against each class.
In my work, I have intentionally developed a kind of “finding my way” approach to building a system, starting with very simple ideas, evolving them, refactoring the code as it seems to need it, to keep the design “good enough”.
GeePaw Style as I see it
The impact of this is that when GeePaw is about to do something new, he is quite likely to have a step in mind that involves just one object, one method, one change to that method. Whatever he’s doing will probably require lots of steps, but they will tend to involve just one object, method, and change at a time.
The result of that style is that he builds microtests, and the system grows with a generally quite decent structure in place. That’s not to say that he doesn’t refactor: he does. It’s not to say that he doesn’t evolve his design: he does. And still, his individual steps tend always to affect just a single class and method and to allow for microtests.
Ron Style as I see it
In contrast, I do not always work with a good, carefully separated set of objects. Instead, I create a few classes that seem to me to stand about where they should, and then helper classes, like the Map we see above, are discovered because the code does map-like things; at some point I extract methods, then classes, from the mix. My code starts in pretty good order, then deviates from what I’d call “good” until I notice an opportunity, and then we bring it back to something better.
One inevitable result of that is that a test written against an early state of the code will become more and more like a “story test” as the code evolves. Some people argue, quite eloquently, that when we pull out a new class, we should move tests or create tests directly for that class. I do not always do that. Commonly, I’ll have a few tests of the class, but since it is used by other classes, and since their operation depends on it, I let the calling class’s correct execution serve as a test for its subordinate or helper classes as well.
Is this good? Bad? Just weird?
Is that a good thing? Would it be better to do otherwise? Well, if I thought it was better for me, I’d do it. If I hear that it is better because it says in some book to do it, I might try it, but by and large, I’m going to program in the way that I find best.
What happens when we do things wrong?
My purpose is to find out!
My incremental style is important to the overall message[4] of my work here. And that style tends to lead to tests and code that are not as crisp as I might get with more design up front, more focus on tiny objects appearing early on, more focus on microtests, and so on. But my mission here is to do what I do, live with it, and tell you about it.
But was there serious wastage, or not?
Overall, the tests have served their dual purposes: they have helped me write the code initially, and they have flagged most[5] of the mistakes I’ve made right away. They have not required a lot of modification when things changed, and the modifications have almost always been nearly trivial. Unfortunately, though, not so trivial that PyCharm could do them automatically.
I do think that it would be possible to do this project with a clear separation between World and Bot from the beginning, sending an information packet back and forth rather than direct method calls[6]. If we had done that, then a World test might consist of creating a packet, sending it to the World, and observing the packet that came back, or, perhaps, observing the World’s internal state, since it does maintain a complicated state. And a Bot test might consist of providing a world state to the Bot, calling some method, and observing what request packet it created.
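In fact, with yesterday’s batch work in place, the World side of that is nearly possible already. Here’s a sketch using the RequestBuilder from the test above; the expected location assumes, as in that test, that a step to the EAST increases x by one.

def test_world_round_trip_via_packet(self):
    world = World(10, 10)
    batch_in = RequestBuilder() \
        .request(world.add_bot(5, 5, direction=Direction.EAST)) \
        .action('step', None) \
        .result()
    batch_out = world.process(batch_in)
    assert batch_out[0]['location'] == Location(6, 5)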
I’m not sure that would really have been better. Some of the tests would have been more direct, but creating and analyzing the packets is harder than analyzing the object’s result state, because the objects understand messages, and the information packets are just highly structured dictionaries. Yes, we could give them object wrappers[7], but that’s work we have not had to do with the current scheme.
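If we ever did wrap them, the wrapper could be quite thin. Something like this hypothetical BotResult would give the result dictionary a message-understanding face, at the cost of one more class to maintain:

class BotResult:
    def __init__(self, packet):
        self._packet = packet  # the raw result dictionary

    @property
    def location(self):
        return self._packet['location']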
Summary
My best estimate is that our tests would have been different had we considered client-server sooner, but that they would not have been notably simpler to write. Today our tests involve both a Bot and a World, run a full round trip, and check the end state. In client-server mode, we’d prime the World, send it a packet, and then check the returned packet. About the same amount of work … and we haven’t as yet checked that the Bot understands the result and uses it accordingly.
So, my best guess is that we have not wasted very much testing time, space, energy or little grey cells because of not doing client-server earlier. But what about overall development time?
It took less than a work week to convert from direct calls to packet-based calls. That conversion would not have been required had we started right out thinking client-server, though it was neither difficult nor terribly time-consuming.
However, if we had done it earlier, would the work of inventing and evolving the many extensions to Bot and World have proceeded so smoothly? Perhaps not, since each new capability would likely have required not just changes to Bot and World, but changes to the input and output packets as well.
Would starting with client-server have been faster or slower than what we experienced? My intuition is that it would not have been very different.
We don’t even care!
For our purposes here, however, we don’t even care! The lesson we set out to learn was “what would happen if you wrote a regular little video game and they suddenly told you it had to be client-server?” And what we found out was that we could make that change without rewriting the whole program, in small steps, keeping everything working.
If it’s true that we can refactor to support even significant changes, we can keep delivering value smoothly, rather than taking a year out for a rewrite that may never work at all. And that, my friends, I consider to be an important result.
See you next time!
[1] A sort of passive-aggressive kind of click-bait? You decide.

[2] It is my practice to say “we” did things when I’m trying to bring you into the situation and the code, sharing the discoveries and reasoning. It is my practice to say “I” when I’m describing a mistake. I don’t always manage to do that, but I do know who’s making the mistakes around here, and it isn’t you.

[3] GeePaw tends to prefer languages with strict compile-time typing, and uses types quite adeptly.

[4] Well, mostly I do this to entertain myself and keep the little grey cells active. But I do have a fundamental notion in play, which is that most of the code we encounter will not be anywhere near perfectly crafted, and that we can always improve it to the point where we can live comfortably with it, if we choose to do so. So messy code and tests are important to that message. And it’s terribly convenient that I am pretty good at creating messy code and tests!

[5] Exceptions have occurred. There have been times when only running the game has turned up a defect. There has definitely been a crack among the tests, allowing certain defects to slip through. I’ve enhanced the tests often when that happens, but probably not every time. I don’t think a different style of testing would have changed that. I think there’s some kind of operation that the game does that was untested, and I’m not sure that it’s fully tested now.

[6] And, in fact, at one point in this program, the Bot would call the World and the World would call back to the Bot before returning. Dangerous? Weird? Made sense to me …

[7] We might still do that, as part of bullet-proofing the message structures.