Yesterday’s “diversion” wasn’t a debacle, quite. The code solution is OK, but the development process not so much. Let’s try again. Outcome: Redeemed! This is up to my standards!

“Ever tried. Ever failed. No matter. Try again. Fail again.Fail better.”
– Samuel Beckett

Let’s review the spec as provided by GeePaw Hill:

The CollatedTests class is a class that is used to collect test results across multiple Junit runs. During each Junit run, log records are collected that indicate which tests passed, failed, were disabled, or were aborted. (I do not know what aborted means in this context, and have never seen an aborted test.)

Imagine a TestStatus, which is Pass, Fail, Disable, Abort, and one other, Unrun.

The CollatedTests class has two functions. 1) add(testname:String, status:TestStatus), and 2) endRun():List<TestResult>. This is because the data comes in as four arrays associated with the first four TestStatus enums above. (No Unruns are added.)

endRun() is the DS&A part. It must return a list of TestResult, which has three fields: Name of the test, TestStatus, and a boolean isNew. Further, that list must be in the same order every time it’s called. That is, if the 0’th test was “MyTest.testSomething()”, that must be the 0th entry at every call to endRun(). Even if it was not run during that JUnit run, and hence is missing in log record.

Finally, if a test has never been run before a particular JUnit run, that TestResult must be marked as “isNew”.

As we should do with any spec, we’ll take it as guidelines, not rules, and implement something that makes sense and provides what is asked for.

My focus this morning will be on being my best, using my best judgment and practices. Not by rote, but thoughtfully. I’m going to try to be a bit more explicit with things, perhaps even to the point of providing type hints. We’ll see.

And I have a new scheme in mind, as we’ll see.

There’s still a “big” decision to make: shall I proceed from a plan, or incrementally, discovering the plan? I think I do my best work in the second mode, but of course I do have a tentative plan, which is this:

Have trivial objects for Status and Result. Consider use of namedtuple, at least for Status.

Work toward two “ideas”: First, a result maker that holds a collection of Status elements, and upon request, provides a Result of a given name and a provided value of is_new. Second: a name list that provides the ordered list of aged names (age being new or not).

Test everything unless I know the reason why. I’ll surely test any conditional logic that shows up, and may decide not to test simple data classes that have no real behavior.

Let’s do it!

I have a new project and hooked up test file. I think I’ll even write at least a trivial test to drive out my base objects.

    def test_status(self):
        status = Status('TestFoo', 'Pass')
        assert status.name == 'TestFoo'
        assert status.outcome == 'Pass'

I answer that challenge:

class Status:
    def __init__(self, name: str, outcome: str):
        self.name = name
        self.outcome = outcome

I’m following Keith Braithwaite’s technique of writing my code in the test file, but I will move classes like this to separate files as soon as they start to stabilize or to grow. This one is ready to go.

Commit: trivial Status class Move it to its own file. Commit again: Status file.

Now the Result class, which is of course rather similar. I am consciously trying to code for documentation and to produce something that transmits meaning, so I decide to try this layout:

    def test_result(self):
        result = Result(name='TestBar', outcome='Fail', is_new=True)

PyCharm wants to help.

class Result:
    def __init__(self, name:str, outcome:str, is_new:bool):
        self.name = name
        self.outcome = outcome
        self.is_new = is_new

The test is green, and the fact is, I trust it. But let’s add the checks before we commit, and move it.

    def test_result(self):
        result = Result(name='TestBar', outcome='Fail', is_new=True)
        assert result.name == 'TestBar'
        assert result.outcome == 'Fail'
        assert result.is_new is True

Green. Commit: trivial Result class.

Reflection

OK, that was little more than a good warmup. Let’s think about what we really have to do. I think we’ll be writing an object called Collator, not CollatedTests. I think of it as active, not passive. I have in mind a simple protocol:

  1. A begin_run method that is to be called before we start a run;
  2. An add method that receives a Status;
  3. A method results that can be iterated to get the ordered list of results.

We’ll code this directly and then refactor to good code, as is my usual fashion.

Begin with a test. I think this is what I want:

    def test_collator_initialized(self):
        collator = Collator()
        collator.begin()
        assert list(collator.results) == []

My initial Collator looks like this:

class Collator:
    def __init__(self):
        pass

    def begin(self):
        pass

    def results(self):
        return []

I move the Collator to its own file and commit: initial Collator.

I started to write this test:

    def test_result_for_provided_status(self):
        collator = Collator()
        collator.begin()
        collator.add(Status('TestBar', 'Pass'))
        results = list(collator.results())

I decide not to start here, with two reasons:

  1. I think it’ll be better to start with new vs known names;
  2. I’m not quite sure what results() is, so I want to wait a bit before testing it a lot.

I delete that test, even though I can imagine it might be close to useful. I’d rather remove it than deal with ignored or spurious red tests.

    def test_added_name_is_new(self):
        collator = Collator()
        collator.begin()
        collator.add_name('TestBar')
        aged_names = list(collator.aged_names())
        assert len(aged_names) == 1
        assert aged_names[0] == AgedName(name='TestBar', is_new=True)

I’m imagining that aged_names() is an iterator or generator that can produce a list of AgedName instances, with a name string and an is_new boolean. That’s a bit large for one bite, but I didn’t quite realize that I wanted the AgedName type until I got there. Yesterday we just had a tuple representing this idea.

First the little class:

class AgedName:
    def __init__(self, name:str, is_new:bool):
        self.name = name
        self.is_new = is_new

And in Collator, I get this far. Note that I’ve paused at the method aged_names:

class Collator:
    def __init__(self):
        self.new_names = []

    def begin(self):
        pass

    def add_name(self, name:str):
        self.new_names.append(name)

    def aged_names(self):


    def results(self):
        return []

I intend that the method aged_names will be a generator. This is a bit fancy, I admit, but it will be a simple one and I think we’ll like it. I did an experiment in Juno Python on my iPad last night, and it seemed fruitful.

    def aged_names(self):
        while self.new_names:
            name = self.new_names.pop(0)
            yield AgedName(name=name, is_new=True)

The test still fails, because AgedName doesn’t have an equal comparison yet. Let’s defer that and change the test.

    def test_added_name_is_new(self):
        collator = Collator()
        collator.begin()
        collator.add_name('TestBar')
        aged_names = list(collator.aged_names())
        assert len(aged_names) == 1
        assert aged_names[0].name == 'TestBar'
        assert aged_names[0].is_new is True

The test is green. We have a generator. It is an iterable, a thing that we can create a list from or loop over. And, for now, it is creating AgedResults directly. We’ll change that in due time.

Belay this idea

Let’s refactor. We’ll extract a method from aged_names to create the AgedName, which will give us a place to extend how that works. No, on second thought, let’s not do that. There’s no need for it yet.

We need a new test. Let’s try one that shows the transition of a name from new to known:

    def test_names_become_known(self):
        collator = Collator()
        collator.begin()
        collator.add_name('TestBar')
        collator.begin()
        collator.add_name('TestFoo')
        aged_names = list(collator.aged_names())
        assert len(aged_names) == 2
        assert aged_names[0].name == 'TestBar'
        assert aged_names[0].is_new is False
        assert aged_names[1].name == 'TestFoo'
        assert aged_names[1].is_new is True

I expect this to be failing on the check for False. It is:

>       assert aged_names[0].is_new is False
E       assert True is False

Perfect.

Reflection

Here is where the magic happens. My plan is this:

  1. Add known_names to Collator;
  2. Change the generator to generate first the known_names, then the new_names;
  3. Known ones will have is_new False, for new it will be true;
  4. As we iterate the new ones, we’ll move them to the known ones;
  5. Because we’ll be modifying the known ones on the fly, we’ll need a copy to iterate.

That’s a lot. As I mentioned above, I did a little spike last night, in Juno. It’s not hard, though. Watch:

I discover that my test is flawed. I need to get the names before I add in the second tranche. Or batch. You do you.

    def test_names_become_known(self):
        collator = Collator()
        collator.begin()
        collator.add_name('TestBar')
        unused = list(collator.aged_names())  # ADDED
        collator.begin()
        collator.add_name('TestFoo')
        aged_names = list(collator.aged_names())
        assert len(aged_names) == 2
        assert aged_names[0].name == 'TestBar'
        assert aged_names[0].is_new is False
        assert aged_names[1].name == 'TestFoo'
        assert aged_names[1].is_new is True

The test is green, running this Collator code:

class Collator:
    def __init__(self):
        self.new_names = []
        self.known_names = []

    def aged_names(self):
        known = self.known_names[:]
        while known:
            name = known.pop(0)
            yield AgedName(name=name, is_new=False)
        while self.new_names:
            name = self.new_names.pop(0)
            self.known_names.append(name)
            yield AgedName(name=name, is_new=True)

Let’s look closely at the aged_names method. In case it’s not obvious it works like this:

  1. Make a copy of the known names;
  2. While there are names in known, pop one off and yield an AgedName, marked not new;
  3. Then, while there are new names, pop one off, append it to the known names because it’s not new any more, and yield an AgedName, marked new.

Nothing to it!

Commit: aged_names supports known and new names.

Reflection

I think I missed a possible commit up there a ways back. One of my flaws is that I don’t commit, sometimes, when I could. I believe strongly that it’s always best to commit when I can: I just get ahead of myself.

T&C|R

I’m reminded of Kent Beck’s notion of T&C|R, Test-and-Commit or Revert, where he has an automated hook that commits every time the tests are all green, and reverts the code (but not the tests) when they go red. I’ve never had the nerve to try that: it would be reverting my code all over and it would just tick me off.

But what harm would it do if I had a hook that committed on all tests green? I have no idea how to add a hook to PyCharm, or whether it’s even possible. Maybe such a thing exists: I’ll try to remember to look around.

aged_names

The aged_names method is exactly as advertised. Make a copy of the known names. Iterate, yielding a suitable AgedName for each, with is_new=False. Then iterate the new names, moving each one to the known names, and yielding an AgedName with is_new=True.

I freely grant that I like this almost too much. It was what I was reaching for yesterday: I see a little machine that produces the names in order, producing Result instances made our of Status instances, or created when a Status is missing.

Is a generator too deep in the bag of tricks? I would argue that it is not. Generators are a first-class Python feature and a professional Python programmer and shop should surely know them. It’s not like you’d use them every day, but when they’re called for, I think they are appropriate. Anyway, my house, my rules.

An object is appearing

It’s pretty clear that the name aging process is going to be separate from the status-to-result conversion process. So it’s quite likely that there is an object waiting to be born, an AgedNames class or some such name. We’ll hold off on that and leave our two collections in place for now, just working in Collator class.

Begin was useless

I think it was fair to expect that the call to begin would have flushed the new names to the old names, even if we didn’t read out the prior aged names result. We’ll write a test for that and make it work.

We’re not handling duplicates yet.

We’ll also write a test for that and make it work.

I’m glad we had this little chat. Let’s get back to it.

Duplicates

I think we’ll do duplicates first: it should make begin easier.

    def test_duplicates_do_not_occur(self):
        collator = Collator()
        collator.begin()
        collator.add_name('TestBar')
        collator.add_name('TestBar')
        aged_names = list(collator.aged_names())
        assert len(aged_names) == 1

This fails, of course, with length 2. We know how to fix this: we did it yesterday:

    def add_name(self, name:str):
        if name not in self.known_names and name not in self.new_names:
            self.new_names.append(name)

Green. Commit: cannot duplicate a name in aged names.

Now we want to make getting the names optional, given that you call begin:

    def test_begin_works_even_if_names_not_read_out(self):
        collator = Collator()
        collator.begin()
        collator.add_name('TestBar')
        collator.begin()
        collator.add_name('TestFoo')
        aged_names = list(collator.aged_names())
        assert len(aged_names) == 2
        assert aged_names[1].name == 'TestFoo'
        assert aged_names[1].is_new is True
        assert aged_names[0].name == 'TestBar'
        assert aged_names[0].is_new is False

This fails on the last line, as we know. Fix:

class Collator:
    def begin(self):
        self.known_names.extend(self.new_names)
        self.new_names = []

Green. Commit: begin moves new names to known and resets it.

Reflection

I think we’re ready to handle the Result-to-Status capability. Let’s review what will be needed.

  1. The add method will hand us a Status instance;
  2. We’ll save those, and we’ll push the name into our name aging method, add_name;
  3. We’ll produce a real results method, which will behave as a generator, unless we change out mind;
  4. that generator will yield instances of Result, with missing ones having outcome=Unrun (horrible word, that);

I think we’ll start with a couple of tests, one to accept a Status and return a corresponding Result, and another to return a default result.

We’ll run into an issue with is_new, and we’ll resolve that when it arises, I think, driving it from a couple of final tests.

Let’s go.

Status to Result

A review of the spec tells me that our add method takes a name and an outcome, not a Status. I do think we’ll continue to use the Status object internally, but let’s stay open to the possibility that we don’t really need it. I think we’ll prefer having it, though.

Added in Post
It turns out we don’t need it, and it gets removed at the end.

Fortunately, we deleted that early test that I wrote for add. Now we can do one that should actually work. But first a couple of helper tests.

    def test_add_name_provides_status(self):
        collator = Collator()
        collator.begin()
        collator.add(name='TestFoo', outcome='Fail')
        result = list(collator.results())[0]
        assert result.name == 'TestFoo'
        assert result.outcome == 'Fail'
        assert result.is_new is True

PyCharm prompted most of that but it looks good to me. We have no add method. Let’s have one.

class Collator:
    def __init__(self):
        self.outcomes = []
        self.new_names = []
        self.known_names = []

    def begin(self):
        self.known_names.extend(self.new_names)
        self.new_names = []
        self.outcomes = []

    def add(self, name:str, outcome:str):
        self.outcomes.append(Status(name=name, outcome=outcome))

That much was clear. Note that I remembered to clear the outcomes on begin. Also note that there’s a good chance we’d prefer the name Outcome for the object currently known as Status.

Now we have the small problem of getting them back. That tells me that a list isn’t what I want. I want a dictionary. Change:

Oh my. This makes perfect sense:

class Collator:
    def __init__(self):
        self.outcomes = dict()
        self.new_names = []
        self.known_names = []

    def begin(self):
        self.known_names.extend(self.new_names)
        self.new_names = []
        self.outcomes = dict()

    def add(self, name:str, outcome:str):
        self.outcomes[name] = outcome

We can make this just a dictionary from name to outcome: that’s all we need to know. The entire Status object is wasted. All three or four lines of it.

Is that bad?
No, it’s good. It’s also an indication that writing a class before we actually need it might be wasteful. So far we have discovered that Status is probably not needed at all, and we haven’t used Result even though we wrote it ages ago. Interesting lesson there. YAGNI, anyone?[^yagni]

We need a method to return a Result and I want to make the test easier. Like this:

    def test_add_name_provides_result(self):
        collator = Collator()
        collator.begin()
        collator.add(name='TestFoo', outcome='Fail')
        result = collator.result_for('TestFoo')
        assert result.name == 'TestFoo'
        assert result.outcome == 'Fail'
        assert result.is_new is True

Why did you do that?

I did that because I saw that I would need a way to convert an outcome to a result and the test was getting all the results, and then plucking one out. Better to get one, in the two forms it can arise, and then drive out the collection.

Now we need fesult_for:

    def result_for(self, name:str):
        return Result(name=name, outcome=self.outcomes[name], is_new=True)

Green. Commit: initial add and result_for methods New test:

    def test_missing_name_provides_unrun_result(self):
        collator = Collator()
        collator.begin()
        result = collator.result_for('TestBar')
        assert result.name == 'TestBar'
        assert result.outcome == 'Unrun'
        assert result.is_new is True

This should be easy enough. Extract:

    def result_for(self, name:str):
        outcome = self.outcomes[name]
        return Result(name=name, outcome=outcome, is_new=True)

Pythonically, use try:except:

    def result_for(self, name:str):
        try:
            outcome = self.outcomes[name]
        except KeyError:
            outcome = 'Unrun'
        return Result(name=name, outcome=outcome, is_new=True)

Extract a method to make that neater:

    def result_for(self, name:str):
        outcome = self.outcome_for(name)
        return Result(name=name, outcome=outcome, is_new=True)

    def outcome_for(self, name):
        try:
            outcome = self.outcomes[name]
        except KeyError:
            outcome = 'Unrun'
        return outcome

Inline in result_for. Return directly in outcome_for:

    def result_for(self, name:str):
        return Result(name=name, outcome=(self.outcome_for(name)), is_new=True)

    def outcome_for(self, name):
        try:
            return self.outcomes[name]
        except KeyError:
            return 'Unrun'

Green. Commit: result_for handles missing outcomes.

Reflection

I think we’re nearly done, having only the results generator to deal with.

class Collator:
    def __init__(self):
        self.outcomes = dict()
        self.new_names = []
        self.known_names = []

    def begin(self):
        self.known_names.extend(self.new_names)
        self.new_names = []
        self.outcomes = defaultdict()

    def add(self, name:str, outcome:str):
        self.outcomes[name] = outcome

    def add_name(self, name:str):
        if name not in self.known_names and name not in self.new_names:
            self.new_names.append(name)

    def aged_names(self):
        known = self.known_names[:]
        while known:
            name = known.pop(0)
            yield AgedName(name=name, is_new=False)
        while self.new_names:
            name = self.new_names.pop(0)
            self.known_names.append(name)
            yield AgedName(name=name, is_new=True)

    def result_for(self, name:str):
        return Result(name=name, outcome=(self.outcome_for(name)), is_new=True)

    def outcome_for(self, name):
        try:
            return self.outcomes[name]
        except KeyError:
            return 'Unrun'

    def results(self):
        return []

We see that we are unconditionally returning all the Result instances from result_for with is_new=True. That won’t do. And we know that the caller of that method will know whether the name is new or not, because the caller has the name, which it’ll get from the aged_names generator. So let’s change the signature of result_for to accept the is_new:

    def result_for(self, name: str, is_new):
        return Result(name=name, outcome=(self.outcome_for(name)), is_new=is_new)

PyCharm correctly filled in existing callers with the default that I asked for, True, so the tests stay green. Commit: result_for accepts is_new parameter.

Now let’s do one story test for a full run. Just a simple one should suffice. Here’s more than enough to drive out the method, but we’ll want more:

    def test_story_test(self):
        collator = Collator()
        collator.begin()
        collator.add(name='TestFoo', outcome='Pass')
        collator.add(name='TestBar', outcome='Fail')
        initial_result = list(collator.results())
        assert initial_result[0].name == 'TestFoo'
        assert initial_result[0].outcome == 'Pass'
        assert initial_result[0].is_new is True
        assert initial_result[1].name == 'TestBar'
        assert initial_result[1].outcome == 'Fail'
        assert initial_result[1].is_new is True

And for result(), we’ll want to iterate the names, fetch outcomes, and produce results. I think like this:

    def results(self):
        for aged_name in self.aged_names():
            yield self.result_for(aged_name.name, is_new=aged_name.is_new)

My test fails, list seems to be empty. Need a simpler test. I’ll mark this one to skip, because I think it needs to work, and write a simpler test.

    def test_story_test_aged_names(self):
        collator = Collator()
        collator.begin()
        collator.add(name='TestFoo', outcome='Pass')
        collator.add(name='TestBar', outcome='Fail')
        aged_names = list(collator.aged_names())
        assert len(aged_names) == 2

This fails with length 0. I’ve forgotten to add the names when they come in.

    def add(self, name:str, outcome:str):
        self.outcomes[name] = outcome
        self.add_name(name)

Un-skip the story test. It passes. Commit: results method working on initial test.

I think it’s totally working. Extend the story test to prove it:

    def test_story_test(self):
        collator = Collator()
        collator.begin()
        collator.add(name='TestFoo', outcome='Pass')
        collator.add(name='TestBar', outcome='Fail')
        initial_result = list(collator.results())
        self.check(initial_result, 0,
                   'TestFoo', 'Pass', True)
        self.check(initial_result, 1,
                   'TestBar', 'Fail', True)
        collator.begin()
        collator.add(name='TestBaz', outcome='Pass')
        collator.add(name='TestBar', outcome='Pass')
        second_result = list(collator.results())
        self.check(second_result, 0,
                   'TestFoo', 'Unrun', False)
        self.check(second_result, 1,
                   'TestBar', 'Pass', False)
        self.check(second_result, 2,
                   'TestBaz', 'Pass', True)

    def check(self, results, index, name, outcome, new):
        result = results[index]
        assert result.name == name
        assert result.outcome == outcome
        assert result.is_new is new

I got tired of typing in those asserts, so wrote check by Wishful Thinking, first typing the calls and then creating the method. And I went back and replaced the original long-hand assertions with appropriate results.

The test is green We have passed our story test. Commit: Collator passes all tests.

I remove the Status class and commit again, as it is unused except for one test, which I also remove.

We’ll call it a morning at this point. It’s nigh on to 1100 hours and I started just before Breakfast with the Beatles started at 0800. I’ll show all the code below, and in the next session we’ll see what refactoring we might want.

Kudos to PyCharm

I do not have the “AI” turned on in PyCharm, and will not have it. But even its regular suggestions are quite good and made writing the tests much easier, and often it would even generate the right code for an __init__ method or the like. It is a marvelous product and I recommend it highly. And no, they don’t pay me for that, although if they wanted to, it would be OK by me.

Summary

I feel redeemed. This went very well, despite a lot of new stuff with the generators and so on. I had one or two tests that were a bit large and in one case I just removed it and did a similar test much later, and for the other, I skipped it, did a simpler test that immediately showed me the defect, and then un-skipped and finished he prior test.

Remind me
I think that our aged_names generator can be simpler. Remind me to do that tomorrow. I just thought of it: it’s more complex than it needs to be.

I think that our new Collator class is rather clear and clean, although a refactoring session with fresh eyes tomorrow will probably pay off. I think mostly we might just extract a separate class for aged names.

Here’s the Collator class. I hope you like it, and feel free to get in touch with remarks, questions, whatever. See you next time!



class Collator:
    def __init__(self):
        self.outcomes = dict()
        self.new_names = []
        self.known_names = []

    def begin(self):
        self.known_names.extend(self.new_names)
        self.new_names = []
        self.outcomes = dict()

    def add(self, name:str, outcome:str):
        self.outcomes[name] = outcome
        self.add_name(name)

    def add_name(self, name:str):
        if name not in self.known_names and name not in self.new_names:
            self.new_names.append(name)

    def aged_names(self):
        known = self.known_names[:]
        while known:
            name = known.pop(0)
            yield AgedName(name=name, is_new=False)
        while self.new_names:
            name = self.new_names.pop(0)
            self.known_names.append(name)
            yield AgedName(name=name, is_new=True)
    def outcome_for(self, name):
        try:
            return self.outcomes[name]
        except KeyError:
            return 'Unrun'

    def results(self):
        for aged_name in self.aged_names():
            yield self.result_for(aged_name.name, is_new=aged_name.is_new)

    def result_for(self, name: str, is_new):
        return Result(name=name, outcome=(self.outcome_for(name)), is_new=is_new)