Diversion 3: Refactoring

Continuing Diversion. It’s working. I am not proud of what it is nor of what I’ve done. We can improve what it is.

The code is not good. It’s not quite terrible but it’s not good enough to be allowed to stay as it is. I’ll pull out bits and improve them.

class Collator:
    def __init__(self):
        self.statuses = dict()
        self.keeper = NameKeeper()

    def add(self, status):
        self.statuses[status.name] = status
        self.keeper.add_name(status.name)

    def report(self):
        report = []
        for name in self.keeper.known_names():
            result = self.get_result(name, False)
            report.append(result)
        for name in self.keeper.new_names():
            report.append(self.get_result(name, True))
        self.keeper.memorize()
        self.statuses.clear()
        return report

The report method needs improvement, thus:

Extract method:

    def report(self):
        report = self.create_report()
        self.keeper.memorize()
        self.statuses.clear()
        return report

    def create_report(self):
        report = []
        for name in self.keeper.known_names():
            result = self.get_result(name, False)
            report.append(result)
        for name in self.keeper.new_names():
            report.append(self.get_result(name, True))
        return report

Green. Machine refactoring. Commit: tidying.

Make those two loops look alike by inlining.

    def create_report(self):
        report = []
        for name in self.keeper.known_names():
            report.append(self.get_result(name, False))
        for name in self.keeper.new_names():
            report.append(self.get_result(name, True))
        return report

Commit. Extract two tiny methods for clarity:

    def create_report(self):
        report = []
        self.report_known_names(report)
        self.report_new_names(report)
        return report

    def report_new_names(self, report):
        for name in self.keeper.new_names():
            report.append(self.get_result(name, True))

    def report_known_names(self, report):
        for name in self.keeper.known_names():
            report.append(self.get_result(name, False))

I don’t love this enough. Roll that back, try another approach.

Extract a variable:

    def create_report(self):
        report = []
        for name in self.keeper.known_names():
            result = self.get_result(name, False)
            report.append(result)
        for name in self.keeper.new_names():
            report.append(self.get_result(name, True))
        return report

Edit to get a comprehension:

    def create_report(self):
        report = [self.get_result(name, False) for name in self.keeper.known_names()]
        for name in self.keeper.new_names():
            report.append(self.get_result(name, True))
        return report

Do again for the other half:

    def create_report(self):
        report = [self.get_result(name, False) for name in self.keeper.known_names()]
        report += [self.get_result(name, True) for name in self.keeper.new_names()]
        return report

Nearly good. More improvement possible. I want to do it by machine refactorings when I can: extract method twice:

    def create_report(self):
        report = self.known_results()
        report += self.new_results()
        return report

    def known_results(self):
        return [self.get_result(name, False) for name in self.keeper.known_names()]

    def new_results(self):
        return [self.get_result(name, True) for name in self.keeper.new_names()]

Now make the two report lines one:

    def create_report(self):
        report = self.known_results() +  self.new_results()
        return report

Commit, I keep forgetting to commit on every green. I’m having too much fun, going too fast. Steady down.

One more inline:

    def create_report(self):
        return self.known_results() + self.new_results()

    def known_results(self):
        return [self.get_result(name, False) for name in self.keeper.known_names()]

    def new_results(self):
        return [self.get_result(name, True) for name in self.keeper.new_names()]

Commit: refactoring Collator.

Let’s look at the NameKeeper:

class NameKeeper:
    def __init__(self):
        self._known_names = []
        self._new_names = []

    def add_name(self, name):
        if name not in self._known_names:
            if name not in self._new_names:
                self._new_names.append(name)

    def known_names(self):
        return self._known_names

    def new_names(self):
        return self._new_names

    def memorize(self):
        self._known_names.extend(self._new_names)
        self._new_names = []

That’s nearly good. I think we might benefit from renaming memorize to memorize_new_names.

    def memorize_new_names(self):
        self._known_names.extend(self._new_names)
        self._new_names = []

Reflection

OK this is getting better, but it seems to me that the responsibilities aren’t quite right. When we’re creating the report, we don’t really want to do two loops at all, do we? Don’t we just want to know whether the name we’re looking up is new or not?

We could certainly return a little object, name and is_new, even a tuple. Let’s try it. We’ll TDD the new method.

    def test_tuples(self):
        nk = NameKeeper()
        nk.add_name('T1')
        nk.add_name('T2')
        assert nk.tuples() == [('T1', True), ('T2', True)]
        nk.memorize_new_names()
        nk.add_name('T3')
        nk.add_name('T1')
        assert nk.tuples() == [('T1', False), ('T2', False), ('T3', True)]

That is green with this:

    def tuples(self):
        return self.known_tuples() + self.new_tuples()

    def known_tuples(self):
        return [(name, False) for name in self._known_names]

    def new_tuples(self):
        return [(name, True) for name in self._new_names]

So now let’s use that in the Collator:

    def create_report(self):
        tuples = self.keeper.tuples()
        return [self.get_result(*t) for t in tuples]

We can remove the other two methods, known_names and new_names

Green. Commit: using new tuples method in NameKeeper

I decide to inline the temp:

    def create_report(self):
        return [self.get_result(*t) for t in (self.keeper.tuples())]

Commit: tidying

Let’s review and reflect on Collator now:

class Collator:
    def __init__(self):
        self.statuses = dict()
        self.keeper = NameKeeper()

    def add(self, status):
        self.statuses[status.name] = status
        self.keeper.add_name(status.name)

    def report(self):
        report = self.create_report()
        self.keeper.memorize_new_names()
        self.statuses.clear()
        return report

    def create_report(self):
        return [self.get_result(*t) for t in (self.keeper.tuples())]

    def get_result(self, name, is_new):
        try:
            status = self.statuses[name]
            result = ResultOfTest(status.name, status.status, is_new)
        except KeyError:
            result = ResultOfTest(name, "Unrun", is_new)
        return result

    def report_string(self):
        string = ''
        for result in self.report():
            string += result.report_string() + ' '
        return string

We can probably clean up that report_string a bit with a join and comprehension:

    def report_string(self):
        return ' '.join([result.report_string() 
            for result in self.report()])

That breaks the checks that are looking for the trailing space. Fix them:

    def test_even_harder(self):
        c = Collator()
        c.add(StatusOfTest('T1', 'Pass'))
        assert c.report_string() == 'T1Pnew'
        c.add(StatusOfTest('T2', 'Pass'))
        c.add(StatusOfTest('T1', 'Fail'))
        assert c.report_string() == 'T1F T2Pnew'
        c.add(StatusOfTest('T3', 'Pass'))
        c.add(StatusOfTest('T1', 'Fail'))
        report_string = c.report_string()
        assert report_string == 'T1F T2U T3Pnew'

Green Commit: tidying.

Time to reflect, then I’ll post all the code.

Reflection

I am quite disappointed with how this went. I think that the idea behind this is quite nice. My initial thinking was to have the two collections of names, new and known, and just to shuffle the new to the known after each cycle. That’s what we do. And then combining the output into tuples just came to me, and I like that well enough. One might argue for a tiny dataclass, and I wouldn’t object to that: it would surely be a more, how can I put this, formal approach than just flinging in a tuple.

I think it’s a decent solution and I think the code, now, is pretty much as I’d like it. It was the process of getting here that I didn’t like. I wish I had had a better save point early on, or that I had just accepted things and rolled back for a smaller test. Then, possibly, I’d have had tiny commits right along, instead of that long passage in the middle there where I couldn’t commit.

That doesn’t feel good. I wasn’t worried: somehow I was confident that I could do it. But I kind of did it by brute force instead of intelligent step by step planned work, and so I just feel kind of dirty.

I really thought this would just go whip whip there, done, impressing myself, GeePaw, and everyone with what a nifty and simple solution I had devised. It went OK, and we wound up about where I expected, but it just wasn’t impressive at all. It’s like winning a game of skill with brute strength. You have the win but not the honor that goes with it.

But I do think the result is rather nice. We have a couple of little classes, the StatusOfTest, which we receive from outside and is really just a stub here, and ResultOfTest, our little reporting object. And I am pleased with the report_string idea and even think that if we had to draw, say, a graph from this information, that string might be useful. It was certainly useful in testing.

Then the main deal is a Collator that just keeps track of one run at a time, and a NameKeeper that keeps track of the order and newness of names. We add things freely, then when we want a report, we pull the canonical name list out of the NameKeeper, together with the newness flag, we fetch all the tests that have a result, and substitute one of our own if any are missing.

I think that’s rather nice, and I hope you’re still awake to appreciate the good that is there. But the process just wasn’t the best and that’s on me.

Lessons Relearned

Stop As Soon As I Get In trouble: When things start to go bad,I should stop, back away, reflect, probably roll back, start again, start smaller.; I have to relearn this every few days. Apparently I am not a clever man.
Tests Are Lifesavers: On the good side, my tests were pretty good and rarely needed revision except when I wanted to improve them: there wasn’t much forced rewriting or revising.; And they were effective tests as well, breaking when something was wrong, or occasionally when a new capability meant that a test needed revision.
The Order Wasn’t Perfect: Well, we never get perfect. But in particular there were some instances where a test broke because I had added a new capability, like the newness, and that made me need to change a test. That does happen, but when we get things just right, it happens quite seldom. I think a bit more thinking was in order.
In Love With My Idea?: Was I so sure that my two-collection idea of known and new was a good one, that I rushed the code? It was a good idea and has held water. And I did rush the code. It’s actually possible that I would have done better to think a little less before sitting down here … or perhaps a bit more?; I did feel like I was just jogging along, almost running, because I was so sure of my footing. That probably didn’t serve me well.
Would Starting With “new” Have Been Better?: I do wonder if starting with “new” would have been better than starting with the order. There was some retrofitting that went on that we might have avoided had we started with new and then imposed the order. In fact, the order might very likely have just dropped out. I don’t see a “lesson” here. We had to start with one or the other. Maybe thinking “order first or new first, new first or order first” would have tipped me off to the idea that new was a better starting point but it seemed harder and therefore I left it until the other structure was in place.; Probably not, though: I think most of the trouble came long before new. New actually got handled pretty smoothly anyway. My troubles weren’t about “new” or “not new”. More likely I was just too sure that I had it all in hand and rushed.
What About A More Functional Approach?: I wonder whether a more functional approach would be nice. Maybe I’ll try this again with a new approach.

Summary

I like the result. I do not like the process, and want to believe that I usually do better. But what GeePaw wants and what you deserve, is the truth of what happened, not some polished made-up story of a perfect implementation.

This is what really happened. I’m decently proud of the result, and hope that you can see that it’s a pretty good one. I’m not proud of the stumbling chaotic way that I did it. And that, too, is the truth.

Source Code

I’ve renamed some of the classes, because class names starting with Test might confuse pytest and it certainly confuses me.

I’m sure there are things that could still be improved, but I think that mostly we’re in good shape. There are only two methods plus a string maker that make any kind of decision at all. There are only two methods with any indentation at all. Everything else just flows. I find that pleasing.

So here is the whole thing as it stands now:

class StatusOfTest:
    def __init__(self, name, status):
        self.name = name
        self.status = status


class ResultOfTest:
    def __init__(self, name, status, is_new=False):
        self.name = name
        self.status = status
        self.is_new = is_new

    def report_string(self):
        return self.name + self.status[0] + ('new' if self.is_new else '')


class Collator:
    def __init__(self):
        self.statuses = dict()
        self.keeper = NameKeeper()

    def add(self, status):
        self.statuses[status.name] = status
        self.keeper.add_name(status.name)

    def report(self):
        report = self.create_report()
        self.keeper.memorize_new_names()
        self.statuses.clear()
        return report

    def create_report(self):
        return [self.get_result(*t) for t in (self.keeper.tuples())]

    def get_result(self, name, is_new):
        try:
            status = self.statuses[name]
            result = ResultOfTest(status.name, status.status, is_new)
        except KeyError:
            result = ResultOfTest(name, "Unrun", is_new)
        return result

    def report_string(self):
        return ' '.join([result.report_string() for result in self.report()])

And the helper class NameKeeper:

class NameKeeper:
    def __init__(self):
        self._known_names = []
        self._new_names = []

    def add_name(self, name):
        if name not in self._known_names:
            if name not in self._new_names:
                self._new_names.append(name)

    def known_names(self):
        return self._known_names

    def new_names(self):
        return self._new_names

    def memorize_new_names(self):
        self._known_names.extend(self._new_names)
        self._new_names = []

    def tuples(self):
        return self.known_tuples() + self.new_tuples()

    def known_tuples(self):
        return [(name, False) for name in self._known_names]

    def new_tuples(self):
        return [(name, True) for name in self._new_names]

And all the tests:

class TestNameKeeper:
    def test_namekeeper(self):
        nk = NameKeeper()
        nk.add_name('T1')
        assert nk.new_names() == ['T1']
        nk.memorize_new_names()
        assert nk.new_names() == []
        assert nk.known_names() == ['T1']

    def test_phases(self):
        nk = NameKeeper()
        nk.add_name('T1')
        nk.add_name('T2')
        assert nk.known_names() == []
        assert nk.new_names() == ['T1', 'T2']
        nk.memorize_new_names()
        assert nk.known_names() == ['T1', 'T2']
        assert nk.new_names() == []
        nk.add_name('T3')
        nk.add_name('T1')
        assert nk.known_names() == ['T1', 'T2']
        assert nk.new_names() == ['T3']
        nk.memorize_new_names()
        assert nk.known_names() == ['T1', 'T2', 'T3']
        assert nk.new_names() == []

    def test_tuples(self):
        nk = NameKeeper()
        nk.add_name('T1')
        nk.add_name('T2')
        assert nk.tuples() == [('T1', True), ('T2', True)]
        nk.memorize_new_names()
        nk.add_name('T3')
        nk.add_name('T1')
        assert nk.tuples() == [('T1', False), ('T2', False), ('T3', True)]



class TestCollated:
    def test_collator_exists(self):
        Collator()

    def test_no_tests(self):
        c = Collator()
        assert c.report() == []

    def test_one_status(self):
        c = Collator()
        c.add(StatusOfTest('T1', 'Pass'))
        report = c.report()
        result = report[0]
        assert type(result) == ResultOfTest
        assert result.name == 'T1'
        assert result.status == 'Pass'

    def test_two_phases(self):
        c = Collator()
        c.add(StatusOfTest('T1', 'Pass'))
        c.report()
        c.add(StatusOfTest('T2', 'Fail'))
        result = c.report()
        assert result[0].name == 'T1'
        assert result[1].name == 'T2'

    def test_second_report_shows_unrun(self):
        c = Collator()
        c.add(StatusOfTest('T1', 'Pass'))
        c.report()
        unrun = c.report()
        assert len(unrun) == 1
        assert unrun[0].name == 'T1'
        assert unrun[0].status == 'Unrun'

    def test_two_harder_phases(self):
        c = Collator()
        c.add(StatusOfTest('T1', 'Pass'))
        c.report()
        c.add(StatusOfTest('T2', 'Pass'))
        c.add(StatusOfTest('T1', 'Fail'))
        result = c.report()
        assert len(result) == 2
        assert result[0].name == 'T1'
        assert result[0].status == 'Fail'
        assert result[1].name == 'T2'
        assert result[1].status == 'Pass'

    def test_even_harder(self):
        c = Collator()
        c.add(StatusOfTest('T1', 'Pass'))
        assert c.report_string() == 'T1Pnew'
        c.add(StatusOfTest('T2', 'Pass'))
        c.add(StatusOfTest('T1', 'Fail'))
        assert c.report_string() == 'T1F T2Pnew'
        c.add(StatusOfTest('T3', 'Pass'))
        c.add(StatusOfTest('T1', 'Fail'))
        report_string = c.report_string()
        assert report_string == 'T1F T2U T3Pnew'