Even Better!?!?

Collator Repo on GitHub

A few words with GeePaw lead me to envision an even simpler Collator. Let’s see. (Very smooth refactoring to a different, possibly better version.)

On a Slack we use, GeePaw Hill and I were talking about the Collator. We spoke of “bullet-proofing” it, making it as impervious as possible to misuse. We agreed that we generally do not do much of that, but observed that it might be desirable sometimes, depending on one’s users. After that chat, I was thinking and now I think I have an even better idea for how this thing should work. I’m adding these goals to the spec:

Multiple calls to get results after adding should all return the same results. (I think we already have this feature.)
It should work without with.
New batches will be started automatically whenever results are requested.
In fact, we should remove with altogether.

We might choose not to remove the with if we had users, but so far we have none, so we can change the protocol as we see fit.

I propose to refactor to meet these new goals, and I expect to end up with an object even smaller and simpler than we have now. This outcome—if and when it occurs—will amaze Recent Past Ron, who felt that the current solution was essentially minimal. And recall, we have reduced its size from about 90 lines to about 20 already, and from four classes (two trivial) to two (one trivial).

Most of our current tests, perhaps all of them, use with. We’ll keep the with working until the last step, after which all those tests will have to be modified in a simple lexical fashion, removing the with and tabbing out the lines under its control.

Let’s begin with a new test that tells the whole new story. Here’s the version still with with:

    def test_collator_without_with(self):
        collator = Collator()
        with collator:
            collator.add(name='TestFoo', outcome='Pass')
            collator.add(name='TestBar', outcome='Fail')
            initial_result = list(collator.results())

        self.check(initial_result, 0,
                   'TestFoo', 'Pass', True)
        self.check(initial_result, 1,
                   'TestBar', 'Fail', True)
        duplicate_initial_result = list(collator.results())
        assert duplicate_initial_result == initial_result

        with collator:
            collator.add(name='TestBaz', outcome='Pass')
            collator.add(name='TestBar', outcome='Pass')
            second_result = list(collator.results())

        self.check(second_result, 0,
                   'TestFoo', 'Unrun', False)
        self.check(second_result, 1,
                   'TestBar', 'Pass', False)
        self.check(second_result, 2,
                   'TestBaz', 'Pass', True)
        duplicate_second_result = list(collator.results())
        assert duplicate_second_result == second_result

That’s verifying that the results() method can be called repeatedly with the same, um, results.

Now we’ll remove the with calls. The test will fail. Then we’ll mark it to skip until we’re ready, because we plan some refactoring and changes that we’d prefer to commit as we go. Here it is with the with removed:

    def test_collator_without_with(self):
        collator = Collator()
        collator.add(name='TestFoo', outcome='Pass')
        collator.add(name='TestBar', outcome='Fail')
        initial_result = list(collator.results())

        self.check(initial_result, 0,
                   'TestFoo', 'Pass', True)
        self.check(initial_result, 1,
                   'TestBar', 'Fail', True)
        duplicate_initial_result = list(collator.results())
        assert duplicate_initial_result == initial_result

        collator.add(name='TestBaz', outcome='Pass')
        collator.add(name='TestBar', outcome='Pass')
        second_result = list(collator.results())

        self.check(second_result, 0,
                   'TestFoo', 'Unrun', False)
        self.check(second_result, 1,
                   'TestBar', 'Pass', False)
        self.check(second_result, 2,
                   'TestBaz', 'Pass', True)
        duplicate_second_result = list(collator.results())
        assert duplicate_second_result == second_result

It fails, of course. It’ll fail on the second result, because the object isn’t reset. Skip that test for now.

    @pytest.mark.skip("soon")

OK Green, one ignored. Commit: Final story test skipped but ready.

Here’s Collator. 20 line breaks, let’s make a note of that:

class Collator:
    def __init__(self):
        self.outcomes = dict()
        self.high_water = 0

    def __enter__(self) -> Self:
        for name in self.outcomes.keys():
            self.outcomes[name] = 'Unrun'
        self.high_water = len(self.outcomes)
        return self

    def __exit__(self, *args):
        pass

    def add(self, name: str, outcome: str):
        self.outcomes[name] = outcome

    def results(self) -> Generator[Result, None, None]:
        return (Result(k, v, i>= self.high_water)
                for i, (k, v) in enumerate(self.outcomes.items()))

Extract method from __enter__:

    def __enter__(self) -> Self:
        self._prepare_for_next_batch()
        return self

    def _prepare_for_next_batch(self):
        for name in self.outcomes.keys():
            self.outcomes[name] = 'Unrun'
        self.high_water = len(self.outcomes)

Green. Commit: refactoring extract method

Add and initialize a new member variable:

class Collator:
    def __init__(self):
        self.outcomes = dict()
        self.high_water = 0
        self.reset_on_add = self._prepare_for_next_batch

Green. Commit: new reset_on_add member initialized

Change _prepare_for_next_batch to only occur once per batch:

    def _prepare_for_next_batch(self):
        for name in self.outcomes.keys():
            self.outcomes[name] = 'Unrun'
        self.high_water = len(self.outcomes)
        self.reset_on_add = lambda: None

Green. Commit: _prepare_for_next_batch resets reset_on_add to do nothing.

Now change results to set _prepare_for_next_batch to do the init:

    def results(self) -> Generator[Result, None, None]:
        self.reset_on_add = self._prepare_for_next_batch
        return (Result(k, v, i>= self.high_water)
                for i, (k, v) in enumerate(self.outcomes.items()))

Green commit: results sets up for a reset.

Let’s think about what’s happening now. We init the reset variable to prepare for a new batch. When we prepare for a new batch, we change reset to do nothing. When we fetch results, we set the reset variable back to preparing. It’s a switch. Let’s call it in add:

    def add(self, name: str, outcome: str):
        self.reset_on_add()
        self.outcomes[name] = outcome

Green. Commit: add calls reset_on_add on every addition.

I claim that we no longer need to call the prepare method in __enter__, which currently looks like this:

    def __enter__(self) -> Self:
        self._prepare_for_next_batch()
        return self

Make it look like this:

    def __enter__(self) -> Self:
        return self

Two tests fail! They are checking collator state. Here’s one:

    def test_story_test(self):
        collator = Collator()
        with collator:
            assert collator.high_water == 0
            collator.add(name='TestFoo', outcome='Pass')
            collator.add(name='TestBar', outcome='Fail')
            initial_result = list(collator.results())
        self.check(initial_result, 0,
                   'TestFoo', 'Pass', True)
        self.check(initial_result, 1,
                   'TestBar', 'Fail', True)
        with collator:
            ### The following assertion fails
            assert collator.high_water == 2
            collator.add(name='TestBaz', outcome='Pass')
            collator.add(name='TestBar', outcome='Pass')
            second_result = list(collator.results())
        self.check(second_result, 0,
                   'TestFoo', 'Unrun', False)
        self.check(second_result, 1,
                   'TestBar', 'Pass', False)
        self.check(second_result, 2,
                   'TestBaz', 'Pass', True)

We can change this test to work by moving the assertion down one line:

        ...
        with collator:
            collator.add(name='TestBaz', outcome='Pass')
            assert collator.high_water == 2
            collator.add(name='TestBar', outcome='Pass')
            ...

That one passes. Here’s the other failure:

    def test_collator_aged_names(self):
        collator = Collator()
        with collator:
            collator.add(name='TestFoo', outcome='Pass')
            collator.add(name='TestBar', outcome='Fail')
        with collator:
            collator.add(name='TestBaz', outcome='Pass')
            aged = list(collator.results())
        assert len(aged) == 3
        assert [a.name for a in aged] == ['TestFoo', 'TestBar', 'TestBaz']
        assert [a.is_new for a in aged] == [False, False, True]

It’s failing because we didn’t ask for results after the first two, and our new definition is that it’s all the same batch until you ask for results. So this will fix that test:

    def test_collator_aged_names(self):
        collator = Collator()
        with collator:
            collator.add(name='TestFoo', outcome='Pass')
            collator.add(name='TestBar', outcome='Fail')
            collator.results()  # necessary to delineate batches
        with collator:
            collator.add(name='TestBaz', outcome='Pass')
            aged = list(collator.results())
        assert len(aged) == 3
        assert [a.name for a in aged] == ['TestFoo', 'TestBar', 'TestBaz']
        assert [a.is_new for a in aged] == [False, False, True]

Green: commit: with does not reset. new batches are delineated by asking for results.

I think that now our new test should go green. Remove the skip. It runs green. Commit: test without with runs green. OK to remove with at your pleasure.

Here’s the Collator as it stands:

class Collator:
    def __init__(self):
        self.outcomes = dict()
        self.high_water = 0
        self.reset_on_add = self._prepare_for_next_batch

    def __enter__(self) -> Self:
        return self

    def __exit__(self, *args):
        pass

    def add(self, name: str, outcome: str):
        self.reset_on_add()
        self.outcomes[name] = outcome

    def results(self) -> Generator[Result, None, None]:
        self.reset_on_add = self._prepare_for_next_batch
        return (Result(k, v, i>= self.high_water)
                for i, (k, v) in enumerate(self.outcomes.items()))

    def _prepare_for_next_batch(self):
        for name in self.outcomes.keys():
            self.outcomes[name] = 'Unrun'
        self.high_water = len(self.outcomes)
        self.reset_on_add = lambda: None

Now we can change all our tests not to do the with any more. For example, from this:

    def test_collator_aged_names(self):
        collator = Collator()
        with collator:
            collator.add(name='TestFoo', outcome='Pass')
            collator.add(name='TestBar', outcome='Fail')
            collator.results()
        with collator:
            collator.add(name='TestBaz', outcome='Pass')
            aged = list(collator.results())
        assert len(aged) == 3
        assert [a.name for a in aged] == ['TestFoo', 'TestBar', 'TestBaz']
        assert [a.is_new for a in aged] == [False, False, True]

To this:

    def test_collator_aged_names(self):
        collator = Collator()
        collator.add(name='TestFoo', outcome='Pass')
        collator.add(name='TestBar', outcome='Fail')
        collator.results()
        collator.add(name='TestBaz', outcome='Pass')
        aged = list(collator.results())
        assert len(aged) == 3
        assert [a.name for a in aged] == ['TestFoo', 'TestBar', 'TestBaz']
        assert [a.is_new for a in aged] == [False, False, True]

I’ll remove the __enter__ and __exit__ to find all the users of with, which we choose to remove as an unnecessary notion to think about. That only takes moments.

Commit: Collator no longer requires or supports with syntax.

Here’s our final Collator:

class Collator:
    def __init__(self):
        self.outcomes = dict()
        self.high_water = 0
        self.reset_on_add = self._prepare_for_next_batch

    def add(self, name: str, outcome: str):
        self.reset_on_add()
        self.outcomes[name] = outcome

    def results(self) -> Generator[Result, None, None]:
        self.reset_on_add = self._prepare_for_next_batch
        return (Result(k, v, i>= self.high_water)
                for i, (k, v) in enumerate(self.outcomes.items()))

    def _prepare_for_next_batch(self):
        for name in self.outcomes.keys():
            self.outcomes[name] = 'Unrun'
        self.high_water = len(self.outcomes)
        self.reset_on_add = lambda: None

Still 20 line breaks, but no longer with the somewhat obscure __enter__ and __exit__ dunder methods. There isn’t a single line of code that can be removed without breaking a test.

Let’s reflect and sum up.

Summary | yrammuS

The good news is that this refactoring went very smoothly with no real surprises and only one point where a couple of tests needed changing to accommodate the new protocol. I am pleased with the process: very small commits and few surprises.

Is this version better than the preceding one? I had thought that it would be obviously better and possibly shorter. Now that we’re done, the comparison is closer than I had expected. The new version is the same length, and is more complex in one way, less so in another.

I would assess whether it’s “better” as follows:

Better: No longer requires (or allows) with.; Removed two somewhat obscure dunder methods.; User code is less nested.; Implementation may be slightly simpler.
Not Better: Internal toggling via call is a bit fancy. Perhaps less obvious than enter / exit?; Requiring the with notation might actually have produced better-reading code from users.

I don’t know the answer as to which is better. I think that last point, about the user code, should perhaps carry a bit more weight than the others, because there will be a lot of user code, and anything we can do to nudge it to a better form can have a big payoff.

And I have no vested interest in either answer: I wrote all the versions we have considered here, four or five of them. In one case my result was OK and my process made me feel that I had brute-forced the answer rather than creating it with a smooth flow. I then did it again and got a similar result more smoothly. I did it again, with a new idea, and got a much better result. I then refactored the larger solution to be the same as the new idea one, showing that we could refactor to the good idea, without a full and risky rewrite. Always better.

And then today’s version. Better in some ways, perhaps worse in others. A very close judgment call, up to the team doing the work.

I’m here for the discoveries. I’m pleased that my idea for the day worked out, and interested to find that upon seeing the result, I like it, but not as well as I had expected to.

Next time, back to Forth! Probably. See you then!