Arrgh! No, not space pirates all of a sudden. I released a fatal defect yesterday. What’s up with that?

Imagine my surprise when I moved Invaders to my other iPad to look at something, and this happened:

gunner in wrong position

I touched the screen, and this happened:

crash

The message was this:

Main:20: attempt to index a nil value (global 'Gunner')
stack traceback:
	Main:20: in function 'touched'

The code I released yesterday was fatally broken and unplayable. That doesn’t spark joy.

Nobody’s perfect.

When I tweeted my dismay, I got a few replies like “at least you shipped” and “even giants are not perfect” and “then you are doing great” and “you are human after all”. And one that said “cancelled”.

I am human. Overweight a bit but certainly not giant. And I’m nowhere near perfect: you can’t read these articles and imagine that I’m perfect, and I sure as heck can’t write them and think I’m perfect. I make as many mistakes per cubic meter as anyone, and more than many.

The underlying point of my articles, if there is one, is that we can ship improved versions of our program frequently. I ship mine after work sessions of a couple of hours, and while they’re often not free of defects, I generally know what the defects are, and the program is almost aways better.

This version, prepared rather more carefully than some, didn’t work at all.

We deserve to know why it got released, why it didn’t work, and how to improve.

Shunryu Suzuki said:

Each of you is perfect the way you are
… and you can use a little improvement.

And that’s the trick, isn’t it? A little improvement in the program, every time. A little improvement in ourselves, every time. So we’ll take a look here at what’s wrong in the code, what needs improvement in the way we code, and what needs improvement in our release process.

The Code

I’ll spare you the few moments I spent looking for this, they were the usual debugging things. No great insights. But I did find the problem in two phases.

First I found that the Gunner had been clobbered by the tests. There’s a test that sets the gunner coordinates and bomb coordinates and tests the collision-checking code. It was hammering the game’s own gunner, not a private copy. On my watching tv iPad, I patched in some save and restore code in the test.

Then I found the touch problem. Fiddled with that and found the real problem, which is this:

function setup()
    Runner = GameRunner()
    runTests()
    SoundPlayer = Sound()
    createShields()
    createBombTypes()
    invaderNumber = 1
    Lives = 3
    Score = 0
    Line = image(208,1)
    for x = 1,208 do
        Line:set(x,1,255, 255, 255)
    end
end

The Runner is set up before we run the tests. I did this on purpose at some time in the past, to make some of the tests run, who were using the Runner in some harmless ways. Until they weren’t.

That code makes the game’s actual innards accessible to the tests. Inevitably, as things changed and more tests came into being, somebody stabbed the game in its actual innards.

So the fix is “simple”, reverse those two statements and then make the tests work again if any break. None do. Nice.

However the touch still crashes. Why’s that?

function touched(touch)
    Gunner:touched(touch)
end

Gunner is no longer global. We “fixed” that yesterday. And, clearly never touched the screen again after that fix. So:

function touched(touch)
    Runner.player:touched(touch)
end

And now the game runs properly.

What about your tests?

Mind you, I’ve been writing all along how, in a visual game like this, I don’t know good ways to test certain things. And I’ve been occasionally getting in trouble not testing things that could be tested, having created this nice excuse for not testing much. And my tests have usually helped.

Could we test touched?

How did these two problems not get found? Well, I used Codea’s search to look for accesses to Gunner when we removed the global, and somehow I missed that one. Human error.

There is no test that exercises touched and it would be difficult, though not impossible, to write a decent one. It would be easy to write a trivial one, and maybe we’ll do that in a moment, to close the barn door.

Could we test gunner position during game play?

I’m not sure how an automated test could tell me that the gunner was going to appear on screen at the wrong Y coordinate if a test set its coordinate awry.

Well … you could have all the game initialization broken out into a separate function. We might be pretty close to that now. And then you could run that init in a test and then rip the guts out of it checking all the coordinates that you had just put in. To me, that’s a very poor kind of test, because you just get the values right and then don’t mess with them and the test isn’t needed. Any changes that you make to coordinates are purposeful and trigger additional changes in the poor test. I avoid that sort of thing.

Could we at least run the program?

Of course a manual test of the most trivial kind would have found the defect. Somehow, yesterday, I did all that refactoring, relying on my tests, and never ran the game after some crucial change. Certainly didn’t run it right before I released.

So maybe we can write a test for touched, just to see how it might be done. We’ll try. Unless I forget.

It’s just not reasonable to write a test to find the gunner defect.

Avoiding problems like these

What we can do, and should do in my view, is explore how to avoid problems like these, to make it impossible for the tests to clobber the real game, and to make ourselves more certain that the game will play.

Globals

The fundamental cause of this problem, in the code if not in my head, is the globals. If Gunner were not global, then the tests would have had to make their own, and we couldn’t have smashed a live one.

A short term, effective, but risky fix is to create the game globals only after the tests have run. That’s the line swap that I made above. And I’ll sure do my best to remember to keep that line first. But it’s risky in general, because it’s tempting to create some useful test stuff first. I fell into that trap once already. Some new programmer, unsteeped as we are in the wisdom of the ages, could make that mistake again.

There are other issues with the globals: some tests just won’t run at all if some globals aren’t set up. Here’s some system code:

function Missile:handleOffScreen()
    if self.pos.y > TheArmy:saucerTop() then
        self.explodeCount = 15
    end
end

This is the code that explodes a missile after it has passed all possible targets. And it references that global, TheArmy. If I wanted to test that, I’d have no choice today but to set up that global and access it, and once it is set up, we can’t be sure it’ll get reset later. We could save and restore it in our test, which would be fairly safe.

Globals are useful, and they are mistakes waiting to happen.

So we can improve by using fewer globals, and getting rid of as many as we can.

Of course we got into this problem by removing a global, so were we not the intrepid adventurers we are, we might be afraid to remove another. I’m pretty sure that if I had brought down Minecraft yesterday, instead of a program that maybe six people even know about, I’d be afraid to do it again.

Improvement Ideas

I’m not here to be perfect other than in the Suzuki sense, but I am here to improve. Here are the ideas I’ve got so far about how to avoid things like this in the future:

  • Limit the use of globals to a few
  • No, fewer than that.
  • Always briefly test the game manually right before release.
  • Make tests run in their own sandbox.

I’ve probably mentioned an idea further above and already forgotten it. That’s OK, it leaves room for more improvement.

For now, I’ll manually test this baby and ship it.

See you soon, probably Monday!

Invaders.zip