Better Still

The new Collator seems very nice by my standards. Let’s make it even better. 🎉 🎊 🍾 🥂

`aged_names`

Yesterday, in closing, I mentioned possibly improving the aged_names generator method. Let’s begin with that: it should be simple, and I like to start the day with something easy. Here it is in its present form:

class Collator:
        def aged_names(self):
        known = self.known_names[:]
        while known:
            name = known.pop(0)
            yield AgedName(name=name, is_new=False)
        while self.new_names:
            name = self.new_names.pop(0)
            self.known_names.append(name)
            yield AgedName(name=name, is_new=True)

I see a couple of things. Mainly, this can be done with for rather than the intricate while-pop construction we have here. Like this:

    def aged_names(self):
        for name in self.known_names:
            yield AgedName(name=name, is_new=False)
        for name in self.new_names:
            self.known_names.append(name)
            yield AgedName(name=name, is_new=True)
        self.new_names = []

We do the resetting of new_names because the prior code popped everything out of it, and because our whole intention is that after we process the new items, they’re all known instead of new.

This is pretty good, but I’d like to get all the way to Composed Method pattern, which leads us to methods that either do a single loop or conditional, and methods that just call other methods. To get there, we’ll use PyCharm Extract Method twice, like this:

    def aged_names(self):
        yield from self.yield_known_names()
        yield from self.yield_and_age_new_names()
        self.new_names = []

    def yield_known_names(self):
        for name in self.known_names:
            yield AgedName(name=name, is_new=False)

    def yield_and_age_new_names(self):
        for name in self.new_names:
            self.known_names.append(name)
            yield AgedName(name=name, is_new=True)

Does the yield from surprise you? It surprised me. That construct is used when a generator method, like aged_names wants to delegate some of its yielding to subordinate functions. Otherwise, I guess, Python would not recognize the upper method as a generator. PyCharm did that for me. What would have happened if I had just called the subordinate methods without the yield from?

Tests would fail, like this:

    def results(self):
>       for aged_name in self.aged_names():
E       TypeError: 'NoneType' object is not iterable

The aged_names method is not recognized as a generator, because it does not contain a yield or yield from.

Anyway, I like the new formulation better. I like the for loops better than the while, I like that we avoided the weird popping, and I like seeing the aged_names method being nicely expressive of what’s going on.

Commit: refactor aged_names for simplicity and clarity.

`class Result`

Here is the complete listing of our little class Result:

class Result:
    def __init__(self, name:str, outcome:str, is_new:bool):
        self.name = name
        self.outcome = outcome
        self.is_new = is_new

It has no behavior It is a data class. It could be a Python dataclass, and I think it should be:

@dataclass
class Result:
    name: str
    outcome: str
    is_new: bool

Commit: convert Result to dataclass

`class AgedName`

Same treatment:

@dataclass
class AgedName:
    name: str
    is_new: bool

Commit: convert AgedName to dataclass.

`with`

Python has the notion of context managers, which one uses with a with statement, such as when we read files and the like. I think our object should work that way, and I have written a test to make it so:

    @pytest.mark.skip("make this work")
    def test_with(self):
        collator = Collator()
        with collator:
            collator.add(name='TestFoo', outcome='Pass')
            collator.add(name='TestBar', outcome='Fail')
            initial_result = list(collator.results())

        with collator:
            collator.add(name='TestBaz', outcome='Pass')
            collator.add(name='TestBar', outcome='Pass')
            second_result = list(collator.results())

        self.check(initial_result, 0,
                   'TestFoo', 'Pass', True)
        self.check(initial_result, 1,
                   'TestBar', 'Fail', True)
        self.check(second_result, 0,
                   'TestFoo', 'Unrun', False)
        self.check(second_result, 1,
                   'TestBar', 'Pass', False)
        self.check(second_result, 2,
                   'TestBaz', 'Pass', True)

This is just one of our existing tests converted to use with. Because the Collator has memory, we use the same one in each with. This test, of course, does not run when I remove the skip:

    def test_with(self):
        collator = Collator()
>       with collator:
E       TypeError: 'Collator' object does not support the context manager protocol

Not a big surprise, but this might be:

class Collator:
    def __enter__(self):
        self.begin()
        return self

    def __exit__(self, *args):
        pass

That’s all it takes to give us this nice syntax. If we wanted to do any fancy exception handling or the like, the __exit__ method actually takes a bunch of parameters specifying what to do, but implemented as above, any exceptions would be triggered outside the with.

Commit: context manager with protocol.

One Thing and Another

Our Collator object has two rather separate aspects. It converts test name+status calls into a sequence of Result objects, but internally it also keeps a pair of lists of new names and previously known ones, always returning them in the order of arrival.

The name management aspect of Collator could readily be placed in a separate class. When that’s true, it’s usually a good idea to do so. It makes things easier to understand, it makes them easier to change, and often, it promotes reuse. Let’s extract a new class from Collator.

What is the name of the new class? I don’t know. Is it a NameSequencer? A NameMemory? A RememberTheOrderOfTheseThings?

We’ll call it Sequencer for now, I think.

Let’s review the parts of Collator that refer to its sequencing aspect:

class Collator:
    def __init__(self):
        self.outcomes = dict()
        self.new_names = []    # Sequencer
        self.known_names = []  # Sequencer

Those two members belong in the Sequence, it seems clear.

    def begin(self):
        self.known_names.extend(self.new_names) # Sequencer
        self.new_names = []                     # Sequencer
        self.outcomes = dict()

    def add(self, name:str, outcome:str):
        self.outcomes[name] = outcome
        self.add_name(name) # Sequencer

The two methods above are partly concerned with the sequence and partly with the outcomes, which I think are the proper subject for the Collator.

And the following four methods are all about the Sequencer and have nothing to do with anything else.

    def add_name(self, name:str):
        if name not in self.known_names and name not in self.new_names:
            self.new_names.append(name)

    def aged_names(self):
        yield from self.yield_known_names()
        yield from self.yield_and_age_new_names()
        self.new_names = []

    def yield_known_names(self):
        for name in self.known_names:
            yield AgedName(name=name, is_new=False)

    def yield_and_age_new_names(self):
        for name in self.new_names:
            self.known_names.append(name)
            yield AgedName(name=name, is_new=True)

PyCharm does not offer a magical Extract Class refactoring. We’ll do this manually, but it should be mostly a matter of copy-pasta.

Let’s try to do this incrementally. Watch closely: this might be very nice. It might be laughably awful. Either way it should be interesting.

We’ll create the sequencer in a new file and give it the two members known_names and new_names:

class Sequencer:
    def __init__(self):
        self.new_names = []
        self.known_names = []

So far so good. Now we’ll remove the two member variables in Collator and add a Sequencer instead:

class Collator:
    def __init__(self):
        self.outcomes = dict()
        self.sequencer = Sequencer()

All the tests fail. Collator won’t compile, because it says things like this:

class Collator:
    def begin(self):
        self.known_names.extend(self.new_names)
        self.new_names = []
        self.outcomes = dict()

Let’s give Collator two new properties:

    @property
    def known_names(self):
        return self.sequencer.known_names

    @known_names.setter
    def known_names(self, known_names):
        self.sequencer.known_names = known_names

    @property
    def new_names(self):
        return self.sequencer.new_names

    @new_names.setter
    def new_names(self, new_names):
        self.sequencer.new_names = new_names

We are green. The Collator is just accessing its properties from the Sequencer. But now, it seems to me, we should be able to move capabilities across one at a time.

Commit: extracting Sequencer

class Collator:
    def begin(self):
        self.known_names.extend(self.new_names)
        self.new_names = []
        self.outcomes = dict()

We change that to be this:

class Collator:
    def begin(self):
        self.sequencer.begin()
        self.outcomes = dict()

class Sequencer:
    def begin(self):
        self.known_names.extend(self.new_names)
        self.new_names = []

Green. Commit.

Now this:

class Collator:
    def add(self, name:str, outcome:str):
        self.outcomes[name] = outcome
        self.add_name(name)

    def add_name(self, name:str):
        if name not in self.known_names and name not in self.new_names:
            self.new_names.append(name)

We move add_name to Sequencer and call it:

class Collator:
    def add(self, name:str, outcome:str):
        self.outcomes[name] = outcome
        self.add_name(name)

    def add_name(self, name:str):
        self.sequencer.add_name(name)

class Sequencer:
    def add_name(self, name:str):
        if name not in self.known_names and name not in self.new_names:
            self.new_names.append(name)

Green. Commit.

Now this:

class Collator:
    def aged_names(self):
        yield from self.yield_known_names()
        yield from self.yield_and_age_new_names()
        self.new_names = []

    def yield_known_names(self):
        for name in self.known_names:
            yield AgedName(name=name, is_new=False)

    def yield_and_age_new_names(self):
        for name in self.new_names:
            self.known_names.append(name)
            yield AgedName(name=name, is_new=True)

This moves en-masse to Sequencer, and we build a forwarder in Collator:

class Collator:
    def aged_names(self):
        yield from self.sequencer.aged_names()

class Sequencer:
    def aged_names(self):
        yield from self.yield_known_names()
        yield from self.yield_and_age_new_names()
        self.new_names = []

    def yield_known_names(self):
        for name in self.known_names:
            yield AgedName(name=name, is_new=False)

    def yield_and_age_new_names(self):
        for name in self.new_names:
            self.known_names.append(name)
            yield AgedName(name=name, is_new=True)

Green. Commit.

What’s left? I check usages of my properties. There are none. Remove them.

Green. Commit: Extracted class Sequencer from Collator

Reflection

OK, first of all, that was just about the sweetest Extract Class that I have ever done. Let me take a brief moment for a little victory celebration.

🎉 🎊 🍾 🥂

👏👏 Thank you, thank you. 👏👏

There are still some things not to like about what we have here. For example:

    def add(self, name:str, outcome:str):
        self.outcomes[name] = outcome
        self.add_name(name)

    def add_name(self, name:str):
        self.sequencer.add_name(name)

I don’t think that add_name method is doing much. I could inline it, but there are eight usages, 7 of them in tests. Here’s one:

    def test_added_name_is_new(self):
        collator = Collator()
        collator.begin()
        collator.add_name('TestBar')
        aged_names = list(collator.aged_names())
        assert len(aged_names) == 1
        assert aged_names[0].name == 'TestBar'
        assert aged_names[0].is_new is True

I see two ways to go here. One would be to recast tests like this as tests on Sequencer, like this:

    def test_added_name_is_new(self):
        sequencer = Sequencer()
        sequencer.begin()
        sequencer.add_name('TestBar')
        aged_names = list(sequencer.aged_names())
        assert len(aged_names) == 1
        assert aged_names[0].name == 'TestBar'
        assert aged_names[0].is_new is True

The other option would be to do the inline refactoring and let PyCharm sort it out. The latter would tell the story of how we go where we are a bit better, perhaps, but I am not much of a believer in that notion of testing. I don’t recall ever reading them to reconstruct the build process.

So I think I’ll see about fixing up these tests.

Commit: redirecting tests to Sequencer.

Now there is only one reference to Collator.add_name, so I inline:

    def add(self, name:str, outcome:str):
        self.outcomes[name] = outcome
        self.sequencer.add_name(name)

Commit: tidying

There are two usages of aged_names:

class Collator:
    def aged_names(self):
        yield from self.sequencer.aged_names()

One is this test:

    def test_story_test_aged_names(self):
        collator = Collator()
        collator.begin()
        collator.add(name='TestFoo', outcome='Pass')
        collator.add(name='TestBar', outcome='Fail')
        aged_names = list(collator.aged_names())
        assert len(aged_names) == 2

I see other tests, many of them, that verify this. I think I wrote this test in search of a defect. I remove it and inline the method.

    def results(self):
        for aged_name in self.sequencer.aged_names():
            yield self.result_for(aged_name.name, is_new=aged_name.is_new)

I think we’re done for now. Let’s sum up and I’ll put all the code at the bottom of the article.

Summary

We did a few insignificant but nice changes, including converting our two data classes to dataclass, and setting up Collator to behave as a Context Manager, supporting with.

Then a very smooth Extract class, done by first forwarding the member accesses, and then, incrementally, moved methods over to the new class. We used four commits in extracting sequencer, and in all we had a dozen commits in a about two hours, about every ten minutes, which is quite good for me.

We finished up by rewiring tests to use Sequencer instead of testing through Collator, where appropriate, and removed one test as redundant.

The only thing I’m aware of at this moment that I’d really like to improve is to add type hinting for all my return types, and for any parameters that currently don’t have them.

And we could make a case that instead of using a string for the outcome, we should create an enum, and that we might do well to indicate private methods with leading underbar.

Conclusion … for now

I think this is much better and I am quite pleased with my personal process over the past two days compared to my first attempt. We don’t always get the chance to do something over, but I’m glad I took an extra four or five hours to do this one.

Yay, me!

Here’s the code:

@dataclass
class AgedName:
    name: str
    is_new: bool

@dataclass
class Result:
    name: str
    outcome: str
    is_new: bool

class Collator:
    def __init__(self):
        self.outcomes = dict()
        self.sequencer = Sequencer()

    def __enter__(self):
        self.begin()
        return self

    def __exit__(self, *args):
        pass

    def begin(self):
        self.sequencer.begin()
        self.outcomes = dict()

    def add(self, name:str, outcome:str):
        self.outcomes[name] = outcome
        self.sequencer.add_name(name)

    def outcome_for(self, name:str):
        try:
            return self.outcomes[name]
        except KeyError:
            return 'Unrun'

    def result_for(self, name:str, is_new:bool):
        return Result(name=name, outcome=(self.outcome_for(name)), is_new=is_new)

    def results(self):
        for aged_name in self.sequencer.aged_names():
            yield self.result_for(aged_name.name, is_new=aged_name.is_new)

class Sequencer:
    def __init__(self):
        self.new_names = []
        self.known_names = []

    def begin(self):
        self.known_names.extend(self.new_names)
        self.new_names = []

    def add_name(self, name:str):
        if name not in self.known_names and name not in self.new_names:
            self.new_names.append(name)

    def aged_names(self):
        yield from self.yield_known_names()
        yield from self.yield_and_age_new_names()
        self.new_names = []

    def yield_known_names(self):
        for name in self.known_names:
            yield AgedName(name=name, is_new=False)

    def yield_and_age_new_names(self):
        for name in self.new_names:
            self.known_names.append(name)
            yield AgedName(name=name, is_new=True)