Some thoughts on tests and code, inspired by shipping a defect and by ideas from Rickard. And then some changes. I have a nice one in mind.
Rickard Lindberg and I have been conversing over in the prehistoric elephant land, and his points yesterday seem quite germane. His remarks are shown indented, mine at the left margin.
I also agree that it’s good to test at as a low level as possible. My thinking with the decentralized version was that the lowest level possible was the “Interactor”. If you test at a lower level, you can for example mix up the order of begin/end calls (arguably not that likely). But you could end up with a test that represents something that will never happen in reality. And the test is not as relevant any more.
This can happen. And I do have some irrelevant tests right now. As we work on them, I’ll try to keep this conversation in mind and remark on the extent to which Rickard’s remarks here apply. In the cases I can remember offhand, the irrelevance is coming from changing assumptions about how the Fleets and Fleet objects work.
When I build a new Flyer, one of the special interacting kind, or when I add some detection or counting to an existing flyer, I am thinking about how that object responds during the
end cycle. The design goal is for those functions to provide everything that a given Flyer needs in its life cycle. So, naturally enough, I have tended toward just instantiating one and exercising it.
However, in favor of Rickard’s concerns, that life cycle does require access to a Fleets instance, and the changes the object makes are made to the Fleets instance. So I do have to build a Fleets, and that has been problematical in at least some tests.
As I think about this, it comes to me that there is a kind of asymmetry here. The Interactor does send the begin-interact-end sequence, but it does not send the
tick. That is sent by Game.
Perhaps responsibilities are not well-aligned here. If that’s the case, testing “correctly” will be difficult and will often seem to require either testing at a higher level than one would like, or a lower one than would be ideal. That awkwardness in testing would likely result in tests that can too easily become irrelevant, or worse, misleading.
One way of thinking is dividing the code base in two parts saying this first part, I’m going to test extensively, then the other part will be mostly trivial and it will for sure work without tests.
That might work well in practice. However, I’ve been bitten by the “will for sure work” too many times (I feel).
Yes. I seem to have the ability to make mistakes in the most trivial code one can imagine. Is it because mistakes are random, or because when writing “trivial” code I don’t pay as much attention? Either way, I seem to be able to make mistakes anywhere.
The question is what to do. Given that I could get a Python
@property wrong, should I test all my properties? The better question is: “would testing properties get me to working code sooner, or later, than not testing them?”
I don’t know the answer. I just try to pay attention and lean a bit differently based on what happens.
So I am pursuing a way of testing everything. I want feedback within seconds if I broke something. Anything. Not sure it’s worth it. But that’s what I’m currently exploring.
It’s a good exploration in my view. If we always do the same thing, we are never exploring the boundaries near where we are. So we do well, I think, to try a bit more of this, a bit less of that, to see what happens. When good things happen, the good habit is reinforced, and when bad things happen, we’ll do less of it. In short: we learn a lot at the boundaries, so we probably benefit from pushing them a bit all the time.
My current favored approach is overlapping, sociable testing. (What James Shore writes about.) That means “integration” tests as well. I try to push as much functionality as I can to the edges where it’s easier to test. But I still want the (overlapping, sociable) integration tests to catch mistakes there as well.
James is surely a better programmer than I am, and I strongly recommend his articles and videos. He thinks hard and deeply about how things work, and he explains things well.
It seems intuitively clear that overlapping testing is best, though I suspect it is only best because we are imperfect animals. That’s not likely to change, so it is likely to remain best for the duration. When we test only at the “integration” level, we will often miss possible calling sequences that would break the objects if only we had tested them. When we test only at the lowest object level, we can miss, among other possibilities, interactions between objects that will occur in the integrated code but not in our tests.
Yesterday’s discovery is a perfect example. My tests that determined that missiles disappear after hitting something continued to work, because they were using a combination of objects that worked, and the actual game code used a different combination that did not work. I’m not sure whether to put the blame on missing detailed tests or missing integrated ones, but there’s no doubt that the defect was at the integration level, so my suspicion in this case is that Rickard’s concerns are valid: at least some of my tests would work better at a higher level of integration.
I will take Rickard’s ideas, and all the other ideas that may come up, and try to integrate them into my work. Thanks!
We’re trying to get to a design where all the logic of Asteroids is in the individual Flyer objects, and none of it in the Game or Fleets or Fleet classes. I expect that Game may become less complicated, and that Fleets/Fleet surely will. In fact we have already removed at least one specialized Fleet object and are using another one only once, where we used to use it twice.
To my thinking, the decentralized objects are all quite simple. However, I’d freely grant that that since it’s the interaction of the objects that does the work, two simple objects can result in somewhat surprising results. I’ve found that folx I tell about how it works do not instantly get the idea, so it’s kind of “simple, not simple”.
Things have almost always gone quite smoothly in the transition, but yesterday there was the horrible discovery that the ship’s missiles were invulnerable and would sail on, destroying asteroid after asteroid, until they finally timed out. That improved my score but was not an intended feature. (Maybe if I put in additional quarters?)
I’ve attributed the defect to the fact that some of my most important tests are setting up Fleets instances by providing collections, and by accessing specific collections from inside the Fleets object, and generally behaving as if they know what’s going on … but I’m changing what’s going on.
Yesterday I created the FleetsInspector object (FI), to be used in tests, providing things like “all the saucer missiles” or “all the asteroids”. Today, as my first coding exercise, I propose to change how that works.
class FleetsInspector: def __init__(self, fleets): self.fleets = fleets @property def all(self): return self.fleets.all_objects @property def asteroids(self): return [a for a in self.all if isinstance(a, Asteroid)] @property def missiles(self): return [m for m in self.all if isinstance(m, Missile)] @property def saucers(self): return [s for s in self.all if isinstance(s, Saucer)] @property def saucer_missiles(self): return [m for m in self.all if isinstance(m, Missile) and m.is_saucer_missile] @property def ships(self): return [s for s in self.all if isinstance(s, Ship)] def clear_saucer_missiles(self): for m in self.fleets.saucer_missiles: self.fleets.remove_saucer_missile(m) FI = FleetsInspector
I note some duplication there. I think we could call it Feature Envy. It occurs to me that Fleets could help us here, and that we might also be better protected against changes to Fleets and its Fleet instances.
It’ll be a bit deep in the bag of tricks but I hope not too far.
I’ll write a test for the feature. I write a fairly comprehensive one:
def test_fleets_select(self): fleets = Fleets() fleets.add_asteroid(Asteroid(2)) fleets.add_asteroid(Asteroid(1)) fleets.add_wavemaker(WaveMaker()) wm = fleets.select(lambda x: isinstance(x, WaveMaker)) assert len(wm) == 1 as2 = fleets.select(lambda x: isinstance(x, Asteroid)) assert len(as2) == 2 as1 = fleets.select(lambda x: isinstance(x, Asteroid) and x.size == 2) assert len(as1) == 1
Probably you get the plan: I’m going to add a
select method to Fleets that the FI can use instead of those weird comprehension things.
class Fleets: def select(self, condition): return [flyer for flyer in self.all_objects if condition(flyer)]
The test runs. Commit: Fleets.select tested.
But wait, there’s more. I also want
count. This is speculative, I grant, but I think it’ll be useful.
assert fleets.count(lambda x: isinstance(x, Asteroid)) == 2
class Fleets: def count(self, condition): return len(self.select(condition))
OK, nice. Commit: Fleets.count method.
Now I can improve FleetsInspector:
class FleetsInspector: def __init__(self, fleets): self.fleets = fleets @property def asteroids(self): return self.fleets.select(lambda a: isinstance(a, Asteroid)) @property def missiles(self): return self.fleets.select(lambda m: isinstance(m, Missile)) @property def saucers(self): return self.fleets.select(lambda s: isinstance(s, Saucer)) @property def saucer_missiles(self): return self.fleets.select(lambda m: isinstance(m, Missile) and m.is_saucer_missile) @property def ships(self): return self.fleets.select(lambda s: isinstance(s, Ship)) def clear_saucer_missiles(self): for m in self.saucer_missiles: self.fleets.remove_saucer_missile(m) FI = FleetsInspector
My tests all run, which gives me confidence that these changes are good. I immediately notice some duplication
self.fleets.select. Extract …
self.fleet.select and did extract method, and PyCharm did something very odd:
def select(self): return self.fleets.select @property def asteroids(self): return self.select()(lambda a: isinstance(a, Asteroid))
That’s correct but it certainly looks odd. Selecting Convert Method to Property gives me this, which is what I had in mind:
@property def select(self): return self.fleets.select @property def asteroids(self): return self.select(lambda a: isinstance(a, Asteroid))
Well, no, it isn’t quite what I had in mind. I had in mind that the local
select would take the parameter. Let’s undo those and try another path.
I can’t convince PyCharm to do the extract that I want. I’ll do it manually.
def select(self, condition): return self.fleets.select(condition) @property def asteroids(self): return self.select(lambda a: isinstance(a, Asteroid))
And so on. That’s what I had in mind. My tests are all running.
Let’s look at some of the FI ones. Here’s one that justifies my speculation:
def test_missile_asteroid_scores_with_missiles_in_others(self): fleets = Fleets() pos = Vector2(100, 100) asteroid = Asteroid(2, pos) fleets.add_asteroid(asteroid) missile = Missile.from_ship(pos, Vector2(0, 0)) fleets.add_flyer(missile) fleets.add_scorekeeper(ScoreKeeper()) interactor = Interactor(fleets) interactor.perform_interactions() interactor.perform_interactions() missiles = FI(fleets).missiles assert not missiles assert interactor.testing_only_score == 20 asteroids = FI(fleets).asteroids assert len(asteroids) == 2
Well, no, actually it doesn’t justify my idea of offering
count in Fleets. To use it, I’d either have to change this test to use a lambda, or I’d have to add a count_asteroids method to FI. Neither is worth it. I think my speculative method was a waste.
Anyway we are green, so commit: refactor FI to provide
select convenience method.
I think this is just a bit nicer. Even though FI is just a tool, there’s value in making it work a bit more smoothly, and typing those comprehensions is a pain at the best of times.
The price paid is use of
lambda, which probably makes Guido frown, but I like it and my house my rules.
I have a possibly controversial idea. The tests have a habit of building Fleets objects by passing in collections. Fleets’ constructor looks like this:
class Fleets: def __init__(self, asteroids=None, missiles=None, saucers=None, saucer_missiles=None, ships=None): asteroids = asteroids if asteroids is not None else  missiles = missiles if missiles is not None else  saucers = saucers if saucers is not None else  saucer_missiles = saucer_missiles if saucer_missiles is not None else  ships = ships if ships is not None else  self.fleets = ( Fleet(asteroids), Fleet(missiles), SaucerFleet(saucers), MissileFleet(saucer_missiles, u.SAUCER_MISSILE_LIMIT), ShipFleet(ships), Fleet(), # explosions not used Fleet()) self.thumper = Thumper(self.beat1, self.beat2)
We are changing Fleets, moving in the direction of putting all the objects into the
others collection, the last one in the list above. (I am wondering as we look at that why we don’t use a dictionary here, so that we could access those lists by name instead of index. Good question.) But I digress.
It seems to me that we’d do better to create our fleets collection however we do it, and then use the various
add methods to add any provided inputs. Then they’d go into the appropriate collection even as things change.
I think if we were to do this, some tests would break, because in addition to deciding on behalf of Fleets where things go, we’re probably looking there later. We’re at a green bar, so I think I’d like to try this idea.
It gives us one more advantage: we could default the collections to
 rather than
None, since we won’t be storing them directly into our Fleets object. Python creates defaults at compile time or some such horrible thing, and I have been concerned that we might wind up with spurious objects in the Fleets. (I should write a test to figure out what really happens, shouldn’t I? Perhaps one day.)
I’m going to make the change and see how much breaks.
With this in place:
class Fleets: def __init__(self, asteroids=(), missiles=(), saucers=(), saucer_missiles=(), ships=()): self.fleets = ( Fleet(), Fleet(), SaucerFleet(), MissileFleet(, u.SAUCER_MISSILE_LIMIT), ShipFleet(), Fleet(), # explosions not used Fleet()) for asteroid in asteroids: self.add_asteroid(asteroid) for missile in missiles: self.add_missile(missile) for saucer in saucers: self.add_saucer(saucer) for saucer_missile in saucer_missiles: self.add_saucer_missile(saucer_missile) for ship in ships: self.add_ship(ship) self.thumper = Thumper(self.beat1, self.beat2)
I think that’s what I intended … but 13 tests are failing. Let’s have a look.
Here’s an example of what I expected:
def test_ship_rez(self): ShipFleet.rez_from_fleet = True ships =  fleets = Fleets(, , , , ships) ship_fleet = fleets.ships assert not ships ship_fleet.tick(0.1, fleets) assert not ships ship_fleet.tick(u.SHIP_EMERGENCE_TIME, fleets) assert ships
Because we’re protecting our collections now, this test can’t hold on to ships and expect it to change. But can’t this test refer to the fleet instead?
def test_ship_rez(self): ShipFleet.rez_from_fleet = True fleets = Fleets() ship_fleet = fleets.ships assert not ship_fleet ship_fleet.tick(0.1, fleets) assert not ship_fleet ship_fleet.tick(u.SHIP_EMERGENCE_TIME, fleets) assert ship_fleet
The test does run. But I think we’d do better using FI.
def test_ship_rez(self): ShipFleet.rez_from_fleet = True fleets = Fleets() fi = FI(fleets) assert not fi.ships fleets.tick(0.1) assert not fi.ships fleets.tick(u.SHIP_EMERGENCE_TIME) assert fi.ships
That one goes green. I think we have some slightly tedious but simple changes to get things back to green, and if we do, the Fleets object will be much safer, both in its own implementation and because we’re accessing it in a more regular fashion rather than by haruspicy .
Along the way, I found it desirable to change this:
class Fleets: @property def missiles(self): return self.select(lambda m: isinstance(m, Missile))
That was returning the
missiles collection, which is no longer even used.
I’m still ticking away at tests, nothing to see here. After a fairly long session of typing
fi = FI(fleets) and
fi.this_or_that, my tests are green. I will try the game. Should be fine, I don’t think I changed anything that could break, but if I did I want to know it.
Game seems solid, not to my surprise. Commit: Refactor Fleets to copy input collections via add_ methods, refactor 13 tests to match.
That is more than sufficient. Let’s sum up.
I made a small addition to Fleet, allowing selection of objects via a conditional provided by the caller. That allowed me to simplify the code in FleetsInspector, removing the duplicated list comprehensions.
I also speculatively added a
count method which has so far never been used. Speculation wasted. Fortunately it was just a half-dozen lines of code and test.
Then I wanted to make a substantial change to Fleets, which is never populated on creation in actual use, but is often populated by tests. The tests really shouldn’t do that, and should be changed, but as an interim measure I changed Fleets to create its individual Fleet instances empty and then add to them, which allows Fleets to move things around among Fleet instances with a bit more flexibility.
Along the way I found that Fleets needed to make at least one of its collections more virtual, and changed it:
class Fleets: @property def missiles(self): return self.select(lambda m: isinstance(m, Missile))
This means that while Fleets decides where to put missiles, it didn’t have to be changed to remember where they were. This made things work nicely, but it makes me a bit uneasy about whether all the other collections should work the same way.
In particular, the
missiles property is returning a list, not a Fleet instance. The other properties are returning a possibly smart Fleet.
This is an inconsistency and while we have to expect some of these as we move things around I am concerned that there will be defects lurking.
But overall OK …
That said, the fleets and interactions tests are pretty robust, even more so now that they are mostly converted to use FleetsInspector, so we’re pretty safe … but still I do feel uneasy.
We’ll see what happens. I would like to move toward actually pushing capability into new flyers, but I think these changes have made the work a bit easier. I’m reminded of this article, showing how we might clear the work area before making the changes we really want.
The changes to the tests are not interesting enough to paste all the code in there, though you are free to browse them. the “test_interactions” file is most germane. If you did look in there, not really recommended, you’d see that I’m more often sending
tick to the Fleets instead of a specific Fleet, and that the tests are running Interactor most of the time.
That seems to me to be about the right level for tests that check to see that things appear and disappear on schedule. Some of the tests are tricky. One thing was that the ship emergence time and the missile timeout are such that I had to slip the missile into the middle of the test, because it would normally have timed out already and I wanted to be sure that if any missiles are still around, the ship won’t emerge.
I would argue that the rule itself is tricky: the ship will not emerge if there are missiles on screen, if there are saucer missiles on screen, or if any asteroids are too close to the center. Ask a multi-part question, expect a lot of individual tests.
Are all my tests just where they should be, doing just what they should do? I would not make that claim, but despite that it was a pain to fix them up, the fact that 13 tests failed and needed a little revision was a good sign that they’re supporting me pretty well.
And yet …
That said, I did want to try the game. I’ll need to think about what that tells me about what I should test that may not be well-tested now.
Still happy though!
Overall, a nice morning of cleanup, with no really scary bits. I like it that way. Thanks, Rickard, thanks everyone, and I’ll see you next time!