This Time for Sure?

The Repo on GitHub

Still on the quest of separating server things from client things, in small steps. I think we’re down to just one issue, the item a Bot might be holding. We discuss refactoring vs rewriting.

Refactor or Rewrite?

It’s the small steps that are important here. We can take it as given that we “waited too long” to set up for client-server, and the result was that our design was not really suited for that mode. Our World (server) and Bot (client) shared various kinds of information, and the World was poking information into the Bot, which, across a network, is probably right out, unless one really wanted to wait a long time for things to happen.

We’ve been working to resolve this situation for a handful or two of sessions, each one improving the situation, creating a connection object, making a new object to represent entities on the World side, creating a dictionary to pass across the connection, and so on. Along the way, we changed many things, and in the sessions that weren’t entirely experimental or entirely debacular¹, we typically did many small changes, and we kept the tests and game all running.²

It follows that we could, if the team were larger, have been adding features to the game at the same time as the client-server adjustments were taking place. I think that would have required a bit of coordination and planning, since we wouldn’t want to be adding cross-connection ideas while the people across the desk were trying to remove them, but we could have interleaved new features and client-server work in whatever proportions we wished.

What does that tell us about anything other than this toy program? Well, what it tells me is that it is very likely that design mistakes, even serious ones, can typically be resolved by refactoring rather than rewriting. It tells me is that it is very likely that large changes of product focus can typically be resolved by refactoring rather than rewriting. And, since I think that rewriting a product is a very risky venture, I think it means that when substantial changes are needed on our product, we owe it to ourselves and the enterprise to very seriously consider refactoring the existing program to meet the new needs rather than rewrite it.

Of course, we often want to rewrite it, and there’s no way to know which way would really be best. My personal experience with rewrites has been pretty poor, and if faced with a product that needed even major design changes, I would seriously consider refactoring. And if you were to take two good teams and have one refactor the old product and the other start writing a new one, I’d put my money on the refactoring team.

YMMV, of course.

Let’s Get to Work

I believe and fondly hope that the one remaining cross-connection issue is in the pick-up and drop-off logic. I believe that we are sending a NewEntity instance to the Bot. I believe we have a test that proves that.

    def test_connection_does_not_hold_world_entity(self):
        world = World(10, 10)
        bot_id = world.add_bot(5, 5)
        world_bot = world.map.at_id(bot_id)
        world.add_block(6, 5)
        world.take_forward(world_bot)
        assert world_bot.holding
        info = world.fetch(bot_id)
        assert not isinstance(info['held_entity'], WorldEntity)

This test inspects the dictionary that the world creates to be sent to the client, and, there, sure enough is an instance of WorldEntity, which as one might guess from the clever name, is a world-side object.

Our mission is to pass something that will serve the needs of the client side, but not something that is a world-side object. Something that can be serialized down to something reasonable.

My tentative plan at this moment is to do something too simple. It does seem likely that as this product grows, we’ll be passing more and more detailed information over to the client. Even now, the Blocks have an aroma, and surely at some point we will allow our bots to pick up more than one kind of thing, and they’ll want to know what those things are, and perhaps know other details about them.

It is therefore very tempting to design some kind of general object description that we can use in place of this WorldEntity, or even to pass through the entire as_dictionary part of the held entity, as we currently pass the as_dictionary of the bot. But that seems to me to be problematical, possibly leading to a dictionary holding a dictionary holding a dictionary ad infinitum, and just no, let’s not create a new problem while solving this one.

So my tentative plan is to pass something trivial, perhaps just a boolean or id or a simple string defining what’s held.

To find out whether that’s feasible, let’s look at what the Bot object does with whatever it’s holding. I think it does almost nothing.

class Bot:
    @property
    def holding(self):
        return self._knowledge.holding

Senders of that include …

class Bot:
    def perform_actions(self, actions, connection):
        for action in actions:
            match action:
                case 'take':
                    connection.take(self)
                case 'drop':
                    connection.drop(self, self.holding)
                ...

There are some tests that might be troubled if we change that value. Depending on decisions I haven’t made yet, I’ll either look at them or just see if they break. It seems to me that the state machine knows if we are holding things but I don’t know how it works, and my search for access to holding didn’t turn them up.

class Looking:
    def action(self, knowledge):
        if knowledge.can_take:
            return ['take']
        return []

    def update(self, knowledge):
        if knowledge.has_block:
            knowledge.use_energy()
            return Walking()
        else:
            return Looking()

Right, we refer to the Knowledge in the state machine, not the Bot. Let’s see who accesses it that way.

class Knowledge:
    @property
    def has_block(self):
        return self._held_entity and self._held_entity.name == 'B'

Since there is no other possibility than a Block, I think we could remove that and clause and everything would still work.

A quick check assures me that yes, everything will still work.

What if we were to pass across either 0 or 1, depending on whether the world-side entity is holding something? Isn’t there a check in drop-forward, though? I’m going to try this and see where it leads. If, as I suspect, it leads somewhere better, we’ll stay there. If not, we’ll back out.

class Knowledge:
    def as_dictionary(self):
        return {'direction': self._direction,
                'held_entity': self._held_entity,
                'id': self._id,
                'location': self._location,
                'scent': self._scent,
                'vision': self._vision}

I propose to change the held_entity to be 0 or the id of the entity. I think this may break the drop logic for a moment. We’ll see.

class Knowledge:
    def as_dictionary(self):
        return {'direction': self._direction,
                'held_entity': 0 if not self._held_entity else self._held_entity.id,
                'id': self._id,
                'location': self._location,
                'scent': self._scent,
                'vision': self._vision}

Our test for this feature still fails. What’s up with that?

A bit of study tells me that world.fetch doesn’t quite do what I thought it did:

class World:
    def fetch(self, entity_id):
        return self.map.at_id(entity_id).as_dictionary()

class WorldEntity:
    def as_dictionary(self):
        return self._dict

Well. We don’t use Knowledge here. We need that translation code here in WorldEntity.

class WorldEntity:
    def as_dictionary(self):
        held = 0 if not self.holding else self.holding.id
        return {'direction': self.direction,
                'held_entity': held,
                'eid': self.id,
                'location': self.location,
                'scent': self.scent,
                'vision': self.vision}

Note that I had to say eid, not id. We’re trying, not very hard, to move away from the word id, because it shadows a Python function.

One test fails:

    def test_set_and_fetch(self):
        entity = WorldEntity.bot(0, 0, Direction.EAST)
        assert entity.direction == Direction.EAST
        entity.location = Location(6, 4)
        assert entity.location == Location(6,4)
        block = WorldEntity.block(9, 9, 0)
        entity.receive(block)
        assert entity.holding == block
        entity.scent = 37
        assert entity.scent == 37
        entity.vision = []
        assert entity.vision == []
        assert self.is_valid(entity)
        assert entity.as_dictionary() is entity._dict

That final assertion is just not appropriate any more. We don’t want to send the whole dictionary across. I’m not sure we need this test at all, but for now, let’s write it out longhand, to see how awful it is.

    def test_set_and_fetch(self):
        # not much of a test really
        entity = WorldEntity.bot(0, 0, Direction.EAST)
        assert entity.direction == Direction.EAST
        entity.location = Location(6, 4)
        assert entity.location == Location(6,4)
        block = WorldEntity.block(9, 9, 0)
        entity.receive(block)
        assert entity.holding == block
        entity.scent = 37
        assert entity.scent == 37
        entity.vision = []
        assert entity.vision == []
        assert self.is_valid(entity)
        fetched = entity.as_dictionary()
        assert fetched['eid'] == entity.id
        assert fetched['direction'] == Direction.EAST
        assert fetched['held_entity'] == block.id
        assert fetched['location'] == Location(6, 4)
        assert fetched['scent'] == 37
        assert fetched['vision'] == []

Well, there it is. It’s what I wanted, and all the tests are green. Test the game just to be sure it works. I really expected some trouble with dropping.

Yes, the game crashes trying to do a drop. But why don’t we have a test that fails on drop?

The change needed is in DirectConnection:

class Bot:
            ...
                case 'drop':
                    connection.drop(self, self.holding)

class DirectConnection:
    def drop(self, client_bot, block):
        self.world.command('drop', client_bot.id, block.id)
        self.update_client(client_bot)

Since we’re passing the entity id into holding, that’s what’s sent from Bot, so we don’t need to dereference in drop.

class DirectConnection:
    def drop(self, client_bot, block_id):
        self.world.command('drop', client_bot.id, block_id)
        self.update_client(client_bot)

And that breaks a test.

    def test_drop_block_on_open_cell(self):
        world = World(10, 10)
        connection = DirectConnection(world)
        client_bot = connection.add_bot(5, 5)
        bot_id = client_bot.id
        world_bot = world.map.at_id(bot_id)
        block_id = world.add_block(1, 9)
        client_block = Block(1, 9)
        client_block.id = block_id
        world_block = world.map.at_id(block_id)
        world_bot.receive(world_block)
        assert len(world.map.contents.keys()) == 2
        connection.drop(client_bot, client_block)
        assert len(world.map.contents.keys()) == 2
        assert not world_bot.has(world_block)
        assert not world.is_empty(Location(6, 5))

Th connection line should say client_block.id and then it passes. And the game works.

I’ve made a note to dig into why no test broke when DC wasn’t in sync with held_entity. I suspect we have no drop tests that go through DirectConnection. They probably are “legacy”. Let’s look for one. It’d be nice to fix it now.

    def command(self, action, bot_id, parameter=None):
        world_bot = self.map.at_id(bot_id)
        if action == 'step':
            self.step(world_bot)
        elif action == 'take':
            self.take_forward(world_bot)
        elif action == 'drop':
            block = self.map.at_id(parameter)
            self.drop_forward(world_bot, block)
        elif action == 'turn':
            self.set_direction(world_bot, parameter)
        else:
            raise Exception('Unknown command')

    def drop_forward(self, bot, entity):
        if self.map.place_at(entity, bot.forward_location()):
            bot.remove(entity)

class Map:
    def place_at(self, entity, drop_location):
        if self.location_is_open(drop_location):
            entity.location = drop_location
            self.place(entity)
            return True
        else:
            return False

I think that what will have happened here is that no one has checked to see what is being placed at the location. I think that if we did check, we’d have found an integer. I’ll write a test that should find the issue we just fixed. It’s going to be messy:

    def test_bot_drops_and_world_receives(self):
        world = World(10, 10)
        connection = DirectConnection(world)
        client_bot = connection.add_bot(5, 5)
        world.add_block(6, 5)
        block = world.map.at_xy(6, 5)
        assert isinstance(block, WorldEntity)
        connection.take(client_bot)
        assert client_bot.has_block()
        assert not world.map.at_xy(6, 5)
        connection.drop(client_bot, client_bot.holding)
        test_block = world.map.at_xy(6, 5)
        assert isinstance(test_block, WorldEntity)

This one has to be a bit messy, because it is working both client side and server side. It has a client bot and a world instance, and it is checking integrity between them. It does fail when the DirectConnect was fetching the id and shouldn’t have been, so I think it is legitimately the sort of test that was missing when the game failed and the tests ran earlier today.

That doesn’t make me love it any better, though. And I noticed that some of the related tests are weak, arguably flat wrong. For example:

    def test_bot_cannot_drop_off_world_west(self):
        world = World(10, 10)
        bot = DirectConnection(world).add_bot(0, 5)
        block = Block(4, 4)
        bot.receive(block)
        bot.direction = Direction.WEST
        world.drop_forward(bot, block)
        assert bot.has(block), 'drop should not happen'

We shouldn’t give the Bot a Block: it’s supposed to receive an id now. Here’s a legitimate rewrite for that one.

    def test_bot_cannot_drop_off_world_west(self):
        world = World(10, 10)
        connection = DirectConnection(world)
        client_bot = connection.add_bot(0, 5)
        world.add_block(1 , 5)
        connection.take(client_bot)
        assert client_bot.has_block()
        connection.set_direction(client_bot, 'WEST')
        connection.drop(client_bot, client_bot.holding)
        assert client_bot.has_block(), 'drop should not happen'

That’s green also. I think we’re good. Commit: We send an integer id to held_entity to avoid sending a server item across the wires.

Retrospective

Well, it was a small change, but because it was a change to the interface, we had to change both sides and the connection. One would like to avoid doing that as often as one could. And we were fortunate in that we tested the game, because the connection wasn’t quite right: I had changed the wrong as_dictionary method.

That said, with this commit, I am about 95% confident that there are no world objects going to the client side, and no client objects coming to the server side … in the operational code.

The tests are a different story entirely, and as the product goes forward, we need to address that. The test that I just changed above was stuffing a Block instance through Bot.receive, rather than an id. There are a number of those. We intend that that method will only receive an integer id for now. And here, Python’s duck typing has not served us well. Since the receive method just tucks whatever it receives away in a dictionary, we could pass it literally anything.

Should we use type hinting?

I’ve consciously made the decision not to check types and raise exceptions, since they would cause the server (or client) to crash, and we definitely don’t want that. So I want to rely on our tests to ensure that everything is as it should be. In a compile-time type-checking language, we’d have had to declare the type of the parameter to receive, and we’d get compile errors from any code that does it wrong. Here, we are more on our own.

One thing we could and probably should experiment with is to declare more types in Python, which it does support, and then perhaps even running a type-checking linter over the code, if PyCharm’s type checking isn’t strong enough.

Here’s a short experiment:

    def receive(self, entity_id: int):
        self._knowledge.receive(entity_id)

Here we’ve declared the parameter to be int. (Someday it might be a real class but it’s an int now.) Mow in this test:

    def test_bot_cannot_drop_off_world_north(self):
        world = World(10, 10)
        bot: Bot = DirectConnection(world).add_bot(5, 0)
        block = Block(4, 4)
        bot.receive(block)
        bot.direction = Direction.NORTH
        world.drop_forward(bot, block)
        assert bot.has(block), 'drop should not happen'

We declared the bot to be of type Bot, and now the receive is squiggled by PyCharm:

type check indication

PyCharm has “Inspect Code” and similar tools, and that one shows, among other issues, “incorrect type”, highlighting that statement. So … it seems that it would be a valuable practice to begin declaring more type hints, and checking the inspections to see what they are turning up.

Certainly in a large python application, type declarations and checking would be very wise. They might be a very good tool here, especially for checking between server and client. I’ve made a note of that and will work on it as the fancy strikes.

Summary

I think that we probably have no one-side objects crossing the wire to the other side, though I would not bet my BMW on it. And I’m sure that we have tests, like the one above, that work but really shouldn’t.

And we have an issue coming up in the future, around what the client bots might be holding. But, for now, all they really need to know is that they are holding something. Do we use the id properly, checking to be sure we’ve been given the id of the thing to drop? I don’t think I would bet even a C-note on that one, but it’s in my notes.

With the separation more strongly in place, we can see that some, perhaps many of our tests that try to test round trip behavior need improvement. They do things like the one fixed above, and lots of it.

But all this is to be expected! We intentionally write code that was not focused on client-server, and allowed world-side and client-side objects to reference each other. We intentionally built up “legacy code” so that we could experience what we’re experiencing now. Slow, tedious work to resolve the issues. But it’s slow, tedious work that can be done in small steps, keeping the system running, improving the tests and code, making it better.

And that is the work: making code better by improving it and its tests, in small steps, without breaking things.

That said … it’s not the most fun I ever had. But we could have gone back to fun at any point along the past ten days, could go back to fun tomorrow. Or we could continue to improve tests and such.

I think what we’ll do is work on most of the test improvement outside of these articles, unless something really interesting arises. We’ll try some improved type hinting and report on that, because I expect that we’ll find it useful. And, because that’s the way I am, I’ll probably create a “user requirement” that breaks today’s trick, requiring more detailed information to get to the client Bot about what it is holding.

And, education and energy permitting … we’ll probably even make a real client-server hookup. I’m trying to recruit someone to suffer with me on that one.

Next time? I honestly don’t know. Join me to find out!

No, it’s not a word. But it sure deserves to be one, doesn’t it? ↩
There were a few mistakes that briefly broke the game along the way, but they don’t change the essential truth of the story. We can refactor and keep the system running. And yes, we do have to be careful. ↩