Better Still
The new Collator seems very nice by my standards. Let’s make it even better. 🎉 🎊 🍾 🥂
aged_names
Yesterday, in closing, I mentioned possibly improving the aged_names generator method. Let’s begin with that: it should be simple, and I like to start the day with something easy. Here it is in its present form:
class Collator:
def aged_names(self):
known = self.known_names[:]
while known:
name = known.pop(0)
yield AgedName(name=name, is_new=False)
while self.new_names:
name = self.new_names.pop(0)
self.known_names.append(name)
yield AgedName(name=name, is_new=True)
I see a couple of things. Mainly, this can be done with for
rather than the intricate while-pop
construction we have here. Like this:
def aged_names(self):
for name in self.known_names:
yield AgedName(name=name, is_new=False)
for name in self.new_names:
self.known_names.append(name)
yield AgedName(name=name, is_new=True)
self.new_names = []
We do the resetting of new_names
because the prior code popped everything out of it, and because our whole intention is that after we process the new items, they’re all known instead of new.
This is pretty good, but I’d like to get all the way to Composed Method pattern, which leads us to methods that either do a single loop or conditional, and methods that just call other methods. To get there, we’ll use PyCharm Extract Method twice, like this:
def aged_names(self):
yield from self.yield_known_names()
yield from self.yield_and_age_new_names()
self.new_names = []
def yield_known_names(self):
for name in self.known_names:
yield AgedName(name=name, is_new=False)
def yield_and_age_new_names(self):
for name in self.new_names:
self.known_names.append(name)
yield AgedName(name=name, is_new=True)
Does the yield from
surprise you? It surprised me. That construct is used when a generator method, like aged_names
wants to delegate some of its yielding to subordinate functions. Otherwise, I guess, Python would not recognize the upper method as a generator. PyCharm did that for me. What would have happened if I had just called the subordinate methods without the yield from
?
Tests would fail, like this:
def results(self):
> for aged_name in self.aged_names():
E TypeError: 'NoneType' object is not iterable
The aged_names method is not recognized as a generator, because it does not contain a yield
or yield from
.
Anyway, I like the new formulation better. I like the for
loops better than the while, I like that we avoided the weird popping, and I like seeing the aged_names
method being nicely expressive of what’s going on.
Commit: refactor aged_names
for simplicity and clarity.
class Result
Here is the complete listing of our little class Result:
class Result:
def __init__(self, name:str, outcome:str, is_new:bool):
self.name = name
self.outcome = outcome
self.is_new = is_new
It has no behavior It is a data class. It could be a Python dataclass, and I think it should be:
@dataclass
class Result:
name: str
outcome: str
is_new: bool
Commit: convert Result to dataclass
class AgedName
Same treatment:
@dataclass
class AgedName:
name: str
is_new: bool
Commit: convert AgedName to dataclass.
with
Python has the notion of context managers, which one uses with a with
statement, such as when we read files and the like. I think our object should work that way, and I have written a test to make it so:
@pytest.mark.skip("make this work")
def test_with(self):
collator = Collator()
with collator:
collator.add(name='TestFoo', outcome='Pass')
collator.add(name='TestBar', outcome='Fail')
initial_result = list(collator.results())
with collator:
collator.add(name='TestBaz', outcome='Pass')
collator.add(name='TestBar', outcome='Pass')
second_result = list(collator.results())
self.check(initial_result, 0,
'TestFoo', 'Pass', True)
self.check(initial_result, 1,
'TestBar', 'Fail', True)
self.check(second_result, 0,
'TestFoo', 'Unrun', False)
self.check(second_result, 1,
'TestBar', 'Pass', False)
self.check(second_result, 2,
'TestBaz', 'Pass', True)
This is just one of our existing tests converted to use with
. Because the Collator has memory, we use the same one in each with
. This test, of course, does not run when I remove the skip:
def test_with(self):
collator = Collator()
> with collator:
E TypeError: 'Collator' object does not support the context manager protocol
Not a big surprise, but this might be:
class Collator:
def __enter__(self):
self.begin()
return self
def __exit__(self, *args):
pass
That’s all it takes to give us this nice syntax. If we wanted to do any fancy exception handling or the like, the __exit__
method actually takes a bunch of parameters specifying what to do, but implemented as above, any exceptions would be triggered outside the with
.
Commit: context manager with
protocol.
One Thing and Another
Our Collator object has two rather separate aspects. It converts test name+status calls into a sequence of Result objects, but internally it also keeps a pair of lists of new names and previously known ones, always returning them in the order of arrival.
The name management aspect of Collator could readily be placed in a separate class. When that’s true, it’s usually a good idea to do so. It makes things easier to understand, it makes them easier to change, and often, it promotes reuse. Let’s extract a new class from Collator.
What is the name of the new class? I don’t know. Is it a NameSequencer? A NameMemory? A RememberTheOrderOfTheseThings?
We’ll call it Sequencer for now, I think.
Let’s review the parts of Collator that refer to its sequencing aspect:
class Collator:
def __init__(self):
self.outcomes = dict()
self.new_names = [] # Sequencer
self.known_names = [] # Sequencer
Those two members belong in the Sequence, it seems clear.
def begin(self):
self.known_names.extend(self.new_names) # Sequencer
self.new_names = [] # Sequencer
self.outcomes = dict()
def add(self, name:str, outcome:str):
self.outcomes[name] = outcome
self.add_name(name) # Sequencer
The two methods above are partly concerned with the sequence and partly with the outcomes, which I think are the proper subject for the Collator.
And the following four methods are all about the Sequencer and have nothing to do with anything else.
def add_name(self, name:str):
if name not in self.known_names and name not in self.new_names:
self.new_names.append(name)
def aged_names(self):
yield from self.yield_known_names()
yield from self.yield_and_age_new_names()
self.new_names = []
def yield_known_names(self):
for name in self.known_names:
yield AgedName(name=name, is_new=False)
def yield_and_age_new_names(self):
for name in self.new_names:
self.known_names.append(name)
yield AgedName(name=name, is_new=True)
PyCharm does not offer a magical Extract Class refactoring. We’ll do this manually, but it should be mostly a matter of copy-pasta.
Let’s try to do this incrementally. Watch closely: this might be very nice. It might be laughably awful. Either way it should be interesting.
We’ll create the sequencer in a new file and give it the two members known_names
and new_names
:
class Sequencer:
def __init__(self):
self.new_names = []
self.known_names = []
So far so good. Now we’ll remove the two member variables in Collator and add a Sequencer instead:
class Collator:
def __init__(self):
self.outcomes = dict()
self.sequencer = Sequencer()
All the tests fail. Collator won’t compile, because it says things like this:
class Collator:
def begin(self):
self.known_names.extend(self.new_names)
self.new_names = []
self.outcomes = dict()
Let’s give Collator two new properties:
@property
def known_names(self):
return self.sequencer.known_names
@known_names.setter
def known_names(self, known_names):
self.sequencer.known_names = known_names
@property
def new_names(self):
return self.sequencer.new_names
@new_names.setter
def new_names(self, new_names):
self.sequencer.new_names = new_names
We are green. The Collator is just accessing its properties from the Sequencer. But now, it seems to me, we should be able to move capabilities across one at a time.
Commit: extracting Sequencer
class Collator:
def begin(self):
self.known_names.extend(self.new_names)
self.new_names = []
self.outcomes = dict()
We change that to be this:
class Collator:
def begin(self):
self.sequencer.begin()
self.outcomes = dict()
class Sequencer:
def begin(self):
self.known_names.extend(self.new_names)
self.new_names = []
Green. Commit.
Now this:
class Collator:
def add(self, name:str, outcome:str):
self.outcomes[name] = outcome
self.add_name(name)
def add_name(self, name:str):
if name not in self.known_names and name not in self.new_names:
self.new_names.append(name)
We move add_name
to Sequencer and call it:
class Collator:
def add(self, name:str, outcome:str):
self.outcomes[name] = outcome
self.add_name(name)
def add_name(self, name:str):
self.sequencer.add_name(name)
class Sequencer:
def add_name(self, name:str):
if name not in self.known_names and name not in self.new_names:
self.new_names.append(name)
Green. Commit.
Now this:
class Collator:
def aged_names(self):
yield from self.yield_known_names()
yield from self.yield_and_age_new_names()
self.new_names = []
def yield_known_names(self):
for name in self.known_names:
yield AgedName(name=name, is_new=False)
def yield_and_age_new_names(self):
for name in self.new_names:
self.known_names.append(name)
yield AgedName(name=name, is_new=True)
This moves en-masse to Sequencer, and we build a forwarder in Collator:
class Collator:
def aged_names(self):
yield from self.sequencer.aged_names()
class Sequencer:
def aged_names(self):
yield from self.yield_known_names()
yield from self.yield_and_age_new_names()
self.new_names = []
def yield_known_names(self):
for name in self.known_names:
yield AgedName(name=name, is_new=False)
def yield_and_age_new_names(self):
for name in self.new_names:
self.known_names.append(name)
yield AgedName(name=name, is_new=True)
Green. Commit.
What’s left? I check usages of my properties. There are none. Remove them.
Green. Commit: Extracted class Sequencer from Collator
Reflection
OK, first of all, that was just about the sweetest Extract Class that I have ever done. Let me take a brief moment for a little victory celebration.
🎉 🎊 🍾 🥂
👏👏 Thank you, thank you. 👏👏
There are still some things not to like about what we have here. For example:
def add(self, name:str, outcome:str):
self.outcomes[name] = outcome
self.add_name(name)
def add_name(self, name:str):
self.sequencer.add_name(name)
I don’t think that add_name
method is doing much. I could inline it, but there are eight usages, 7 of them in tests. Here’s one:
def test_added_name_is_new(self):
collator = Collator()
collator.begin()
collator.add_name('TestBar')
aged_names = list(collator.aged_names())
assert len(aged_names) == 1
assert aged_names[0].name == 'TestBar'
assert aged_names[0].is_new is True
I see two ways to go here. One would be to recast tests like this as tests on Sequencer, like this:
def test_added_name_is_new(self):
sequencer = Sequencer()
sequencer.begin()
sequencer.add_name('TestBar')
aged_names = list(sequencer.aged_names())
assert len(aged_names) == 1
assert aged_names[0].name == 'TestBar'
assert aged_names[0].is_new is True
The other option would be to do the inline refactoring and let PyCharm sort it out. The latter would tell the story of how we go where we are a bit better, perhaps, but I am not much of a believer in that notion of testing. I don’t recall ever reading them to reconstruct the build process.
So I think I’ll see about fixing up these tests.
Commit: redirecting tests to Sequencer.
Now there is only one reference to Collator.add_name
, so I inline:
def add(self, name:str, outcome:str):
self.outcomes[name] = outcome
self.sequencer.add_name(name)
Commit: tidying
There are two usages of aged_names:
class Collator:
def aged_names(self):
yield from self.sequencer.aged_names()
One is this test:
def test_story_test_aged_names(self):
collator = Collator()
collator.begin()
collator.add(name='TestFoo', outcome='Pass')
collator.add(name='TestBar', outcome='Fail')
aged_names = list(collator.aged_names())
assert len(aged_names) == 2
I see other tests, many of them, that verify this. I think I wrote this test in search of a defect. I remove it and inline the method.
def results(self):
for aged_name in self.sequencer.aged_names():
yield self.result_for(aged_name.name, is_new=aged_name.is_new)
I think we’re done for now. Let’s sum up and I’ll put all the code at the bottom of the article.
Summary
We did a few insignificant but nice changes, including converting our two data classes to dataclass, and setting up Collator to behave as a Context Manager, supporting with
.
Then a very smooth Extract class, done by first forwarding the member accesses, and then, incrementally, moved methods over to the new class. We used four commits in extracting sequencer, and in all we had a dozen commits in a about two hours, about every ten minutes, which is quite good for me.
We finished up by rewiring tests to use Sequencer instead of testing through Collator, where appropriate, and removed one test as redundant.
The only thing I’m aware of at this moment that I’d really like to improve is to add type hinting for all my return types, and for any parameters that currently don’t have them.
And we could make a case that instead of using a string for the outcome, we should create an enum
, and that we might do well to indicate private methods with leading underbar.
Conclusion … for now
I think this is much better and I am quite pleased with my personal process over the past two days compared to my first attempt. We don’t always get the chance to do something over, but I’m glad I took an extra four or five hours to do this one.
Yay, me!
Here’s the code:
@dataclass
class AgedName:
name: str
is_new: bool
@dataclass
class Result:
name: str
outcome: str
is_new: bool
class Collator:
def __init__(self):
self.outcomes = dict()
self.sequencer = Sequencer()
def __enter__(self):
self.begin()
return self
def __exit__(self, *args):
pass
def begin(self):
self.sequencer.begin()
self.outcomes = dict()
def add(self, name:str, outcome:str):
self.outcomes[name] = outcome
self.sequencer.add_name(name)
def outcome_for(self, name:str):
try:
return self.outcomes[name]
except KeyError:
return 'Unrun'
def result_for(self, name:str, is_new:bool):
return Result(name=name, outcome=(self.outcome_for(name)), is_new=is_new)
def results(self):
for aged_name in self.sequencer.aged_names():
yield self.result_for(aged_name.name, is_new=aged_name.is_new)
class Sequencer:
def __init__(self):
self.new_names = []
self.known_names = []
def begin(self):
self.known_names.extend(self.new_names)
self.new_names = []
def add_name(self, name:str):
if name not in self.known_names and name not in self.new_names:
self.new_names.append(name)
def aged_names(self):
yield from self.yield_known_names()
yield from self.yield_and_age_new_names()
self.new_names = []
def yield_known_names(self):
for name in self.known_names:
yield AgedName(name=name, is_new=False)
def yield_and_age_new_names(self):
for name in self.new_names:
self.known_names.append(name)
yield AgedName(name=name, is_new=True)