Some thoughts on choosing small steps. I think it’s always possible. Is it always the right way to go? I issue a challenge to myself.

Yesterday, I used up almost all my time with design thinking, trying to find a place to start. It would have gone much faster had I not written down so much of it. But I also suspect that, without the writing, I might not have made such a good decision: slowing down is good. Slow is smooth, smooth is fast, they say.

We had a list of five big things that we might do on the road to client-server: build something representing a batch of commands; build something that needs a batch, like a Cohort; figure out a simpler message format; apply that format using JSON; actually set up a client-server connection in Robot World.

Some of those obviously precede others. No point setting up a client-server connection without our batch of messages in an agreed-upon form.

Ultimately, we chose to build a trivial Cohort object. The idea of the Cohort is that it will contain all the Bot instances for a client, that it will receive a message do_something and forward that to all its Bots, and that it will batch up all their commands and return the batch to the caller, to be shipped to the World and executed. And the Cohort will later receive the results of that batch and distribute them to the Bot instances.

Almost all our development so far has centered on a single Bot. The World, DirectConnection, and Bot all operate one Bot at a time. The Game runs several Bots, but really just builds a list of them and ticks through it manually, nothing you could call part of the real system.

In my browsing yesterday, I rediscovered the rather nice experiments I did toward a kind of “command builder” object, which could allow for a very nice syntax for issuing commands. The test example looks like this:

    world_input = InputBuilder(world) \
        .request(bot_id) \
            .action('take') \
            .action('turn','SOUTH') \
            .action('step') \
            .action('step') \
            .action('drop', block_id) \
        .request(world.add_bot(7, 7)) \
            .action('step') \
            .action('step') \
        .result()

With these ideas in hand, we can begin to see how things will wind up, something like this, perhaps:

  1. Receiving do_something, the Cohort creates a command builder.
  2. For each Bot in the Cohort, the Cohort issues a request to the builder, identifying the bot, and then
  3. Passes the builder to the Bot’s do_something, where
  4. The Bot uses whatever decision process it has to add actions to the builder.
  5. When all Bots have had their shots (things start getting vague here), someone(?)
  6. Asks the builder for the JSON of the command input, and
  7. The Connection passes it to the World, where
  8. Using some scheme of its own, the World applies the commands to its bot entities,
  9. Accumulating results into some undefined return structure, and then
  10. A response list is sent back, whereupon
  11. The Cohort distributes it back to the Bots in some reasonable way.

If I were more of a CSS master, I’d arrange those lines to become increasingly faded, as my understanding of what we’ll need to do fades. Not that we couldn’t decide those matters. It’s just that we haven’t, and we don’t really need to.
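
To make the flow above a bit more concrete, here’s a throwaway sketch of how a Cohort might drive it. The FakeBuilder stands in for the real InputBuilder (which takes the world as a parameter); run_batch on the connection, a builder-accepting do_something on Bot, and the 'bot_id' key are all guesses, not decisions.

    class FakeBuilder:
        # A stand-in for the real InputBuilder, just enough to run the loop.
        def __init__(self):
            self.requests = []

        def request(self, bot_id):
            self.requests.append({'bot_id': bot_id, 'actions': []})
            return self

        def action(self, *args):
            self.requests[-1]['actions'].append(list(args))
            return self

        def result(self):
            return self.requests


    class Cohort:
        def __init__(self, bots):
            self._bots = {bot.id: bot for bot in bots}

        def do_something(self, connection):
            builder = FakeBuilder()                   # step 1
            for bot in self._bots.values():           # step 2
                builder.request(bot.id)
                bot.do_something(builder)             # steps 3 and 4
            batch = builder.result()                  # steps 5 and 6
            results = connection.run_batch(batch)     # steps 7 through 10
            for result in results:                    # step 11
                bot = self._bots[result['bot_id']]
                bot._knowledge.update(result)         # as Cohort.update does today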

But there is a concern, and for me it’s a fairly large one: we need to do this rather significant change in small steps. And by small steps, I mean moves that only take a few minutes from green to green. And by a few, I mean that twenty minutes is probably too long for the actual change. Ten is better. Five would be quite nice. An hour is right out.

Here’s how I think about errors, specifically defect injection, which leads me to want such tiny steps:

Defect Injection. Poisson Arrivals?

To a first approximation, I assume I inject errors into my code at some more or less constant rate over time. A better approximation would consider different types of errors, but I think a constant rate for each type is probably close to right. However, an even better approximation would consider how many ideas I’m juggling in my head at the same time. If I were to simultaneously try to write an input builder, keeping its JSON output up to date, while adjusting the JSON reader side to use the new JSON features, and the resulting output list therefore receiving new information that had to go into the output JSON and back into the Bots … I think my defect injection rate went up just typing that.

So, because defect injection is either constant over time or worse, we want to minimize our time between green and green. If we discover that tests are red after just one error has been injected, it tends to be easy to find. When we have injected two or more, it’s more confounding and fixing them takes longer.
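
Just to put rough numbers on that intuition: if defects really do arrive Poisson-fashion at some constant rate, the chance of two or more of them tangled together grows fast with the length of the interval. The one-per-30-minutes rate below is invented; the shape of the result is the point.

    from math import exp, factorial

    # Back-of-envelope, assuming defects arrive Poisson-fashion at a
    # constant rate: say, one injected defect per 30 minutes of work.
    def p_defects(k, minutes, per_minute=1 / 30):
        lam = per_minute * minutes
        return exp(-lam) * lam ** k / factorial(k)

    def p_two_or_more(minutes):
        return 1 - p_defects(0, minutes) - p_defects(1, minutes)

    print(f'{p_two_or_more(10):.2f}')   # 0.04: tangled errors are rare
    print(f'{p_two_or_more(60):.2f}')   # 0.59: more likely tangled than not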

Imagine how fine it would be if the moment we put in a defective line we got an immediate gentle notification. We’d just fix it right there.

Like PyCharm’s test runner.

In PyCharm, I use the automatic test running feature. When I pause in typing for a couple of seconds, PyCharm runs the tests. Generally I know what to expect. If I’m making a new test run, I know exactly when it should go green. If it doesn’t, I know I’ve done something wrong and it’s usually easy to find. The best part is that I don’t have to do anything but pause to see how we’re doing.

But if I’m working on a long series of moves all of which have to work … the tests go red and stay there. I lose information, and I don’t like it. My tests will tell me how I’m doing when they can, so I like to give them the chance.

We had it tough in those days …

Now, I’ve been programming a long time, probably since before you were born, and I do know how to write a lot of code between test runs. Believe me, when you’re lucky to get two runs a day, you learn to write code that is likely to work. But it’s hard, it’s tedious, and even if you’re quite clever, it only takes one missing or incorrect character to waste half a day. Back in the day, I learned to sweet-talk Senior Master Sergeant Whittaker into letting me sit down at the console of the IBM 7090 and run my programs myself, in the middle of the night. I still had to run back and forth to a keypunch machine, but I got a lot more runs in that way.

Today’s way is much better than that. For one, I don’t have to stay up until 6 AM to get computer time. Instead I go to bed at a reasonable hour and get up at 6 AM to put in my computer time. Sounds the same, but no, it’s much nicer this way.

So I want to write minimal code between runs, and I want each run to show me exactly what I expect. I want most of the runs to show me all my tests running green. Sometimes I want all but one to run green, and the one to show me exactly the error I expect … or, if it must, an error that I did not expect. I want there to be only one error, one thing to change at that moment.

Could we always do it?

Is it even possible to work such that there is never, say, more than 20 minutes between green bars? I think it is, but sometimes you have to cheat. I am pretty confident that I could Small Step TDD the Cohort class until I knew it worked. I have already done the InputBuilder, so I’m sure I can Small Step TDD the one I really want, which will be very similar. I’m sure I can do JSON in small steps. So I’m sure that I can do each piece of this job in small steps.

But I can’t actually install that code until it all works … and then installing it will take more than twenty minutes, if only to fix up all the places that need to use the new code. To break that time into small chunks, I can put all the changes behind a “feature flag”, so that the old code is used until I flip the flag. Clearly I can put in the feature flags one at a time, in small steps: I already have the code for both sides of the flag.
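
In case “feature flag” sounds grander than it is, here’s a minimal sketch of what I mean, assuming nothing fancier than a module-level boolean and a single seam in DirectConnection. USE_BATCHED_STEP and _step_via_batch are names I just made up.

    USE_BATCHED_STEP = False  # flip to True when the new path is trusted

    class DirectConnection:
        def step(self, cohort, client_bot_id):
            if USE_BATCHED_STEP:
                self._step_via_batch(cohort, client_bot_id)  # new code
            else:
                self.world.command('step', client_bot_id)    # old code
                result_dict = self.world.fetch(client_bot_id)
                cohort.update(result_dict)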

So I think it’s (almost?) always possible to go in small steps. The questions remaining include: Will we get done faster that way? It’s Hill’s Money Proposition: We work this way because it gets us to working shippable code faster. If it were not faster, Hill would not do it. I might still do it, because I’m here to discover what happens when I do things … but in fact I also find it faster to work in small steps … when I can find them.

Sometimes I give up on small steps. And often, after working for a few hours, I lose the thread and toss it all out. The next day, or the day after, I often seem to have become smarter and more able to do the thing, generally in small steps.

Are small steps always better?

I think they are, probably even if it takes a day to figure out a few minutes’ work. But I don’t always work that way, because a day of figuring out often includes a lot of programming, trying things, and sometimes, that code is “good enough”, and I keep it. I’m pretty good in that mode, and I’m pretty good at improving code that needs it, so if I do accept a long and somewhat sloppy solution, I can generally refactor it back to something good.

Is it faster that way, accepting code that I debugged into existence? Or would I do better to invariably throw it away and do it again? I do not know. I know what I do, and I know that I generally get away with it. Sometimes I feel a bit grubby afterward, but I can generally clean things up. I’m not perfect.

But I suspect, very strongly, that I’d do better if I could always find the approach that brings me closer to where I want to be, every ten or twenty minutes, green to green.

What about where we are now?

OK, down to cases. Be that way. Right now we have a trivial Cohort:

    class Cohort:
        def __init__(self, bot):
            self._bot = bot

        def update(self, result_dict):
            self._bot._knowledge.update(result_dict)

It only handles one Bot, not many. And in fact we only use it for one command:

    class Bot:
        def perform_actions(self, actions, connection):
            for action in actions:
                match action:
                    case 'take':
                        connection.take(self)
                    case 'drop':
                        connection.drop(self, self.holding)
                    case 'step':
                        self._old_location = self.location
                        cohort = Cohort(self)
                        connection.step(cohort, self.id)
                    case 'NORTH' | 'EAST' | 'SOUTH' | 'WEST':
                        connection.set_direction(self, action)
                    case _:
                        assert 0, f'no case {action}'

And only the step command in DirectConnection understands the Cohort:

    class DirectConnection:
        def step(self, cohort, client_bot_id):
            self.world.command('step', client_bot_id)
            result_dict = self.world.fetch(client_bot_id)
            cohort.update(result_dict)

Now, somewhere in (or attached to) the code we see above, we need to build up a collection of requests, one for each Bot in a Cohort, and that collection needs to go to the DirectConnection as a unit, perhaps as a plain collection, perhaps as a JSON string.

Later, we need to send the batch over to the World, have the whole batch processed there, and get the results back, also in a batch.

For this to happen, I suspect we’ll need some new objects, or at least some new fairly complicated methods that will be calling for new objects.
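
Just to have one concrete possibility in view, here’s a sketch of what a batch might look like as JSON, round-tripped by hand. The message shape is entirely a guess; settling the real format is one of the five big things on our list.

    import json

    # A guessed message shape: a batch is a list of requests, each naming
    # a bot and its actions.
    batch = [
        {'bot_id': 101, 'actions': [['take'], ['turn', 'SOUTH'], ['step']]},
        {'bot_id': 102, 'actions': [['step'], ['step']]},
    ]
    message = json.dumps(batch)         # the string that would cross the wire

    # On the World side: decode, apply each action, accumulate results.
    results = []
    for request in json.loads(message):
        for action in request['actions']:
            pass                        # world.command(*action) or similar
        results.append({'bot_id': request['bot_id'], 'results': {}})
    response = json.dumps(results)      # shipped back for the Cohort to distribute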

The 2X20 challenge.

I’ll try to evolve from where we are now, to a scheme that passes JSON batches of requests from the Bot side to the World side, and JSON batches of responses back, ready to be sent over a real socket (but not including that part), with a focus on these concerns:

  1. 2X20: Have everything green at least every 40 minutes if not 20.
  2. If the 2X20 expires, I must roll back without saving a secret copy of the expiring code.
  3. Have the Game class using the new scheme, not just our tests.
  4. Spikes are allowed, to try out new ideas, but they, too, must follow the 2X20 rule.

I’ll probably refine these rules. The idea I have in mind is to require myself to adhere more closely to the 20-minute ideal, while not being a jerk about it. I’ll try to call my shots, and of course when I mess up, as I surely will, I’ll be telling you all about it.

See you next time!