Changes

Mistakes? Technical Debt? Better ideas? All the above?

As my brother GeePaw Hill puts it, we here are in the business of changing code. It turns out that some of this code needs changing, and, for at least some of it, a sufficiently cruel person would say that I “should have known better”. But before I hang my head in shame, I realize that I know at any given time only what I know, and I don’t know the rest. At every moment of programming, I do the best that I can do to balance all the concerns that face me. Among those concerns, even here at home, is “getting something done”. That concern is probably even higher for you at work.

As we balance “getting something done” against “do everything perfectly”, getting something done needs to win often enough to keep bread on the table, the wolf from the door, and management off our back. And, often, there comes a time when we discover something that needs improvement, and, often, it’s best to improve it, whether because the improved code will be easier to understand, or be more reliable, or just because we have a moment and it needs to be better.

Yesterday I noticed something when “improving” the code, and upon reflection I think it should work differently.

Frequent readers will recall that our class, XSet, implements all the behavior of Extended Set Theory that we choose to build. That class relies on a handful of subclasses of XImplementation to store sets and to produce their elements and perform the few basic operations that are necessary. The various XImplementation subclasses organize their data in a handful of different fashions that are optimized for particular cases.

One convention that we have is that a set operation in XSet can call a method on an XImpementation, offering it the chance to do that operation differently than how XSet might do it. We usually do that to allow the implementation to do something more efficient than the generalized set operation might do.

And yesterday I realized that we don’t do that as nicely as we might. Let’s get down to cases.

class XSet:
    def diff(self, other):
        """
        set difference or relative complement
        :param other: any set
        :return: self - other
        """
        try:
            return self.implementation.diff(other)
        except AttributeError:
            mine = set(t for t in self)
            others = set(t for t in other)
            remaining = mine - others
            return XSet.from_tuples(remaining)

class XFrozen(XImplementation):
    def diff(self, other):
        from xset import XSet
        if not isinstance(other.implementation, self.__class__):
            raise AttributeError
        else:
            diff = self.data - other.implementation.data
            impl = self.__class__(diff)
            return XSet(impl)

We see here the full convention that we’re currently using. We send a message to our implementation, and if it responds, we return the result it gives us, and if it returns AttributError, we do our standard generic implementation. AttributeError is returned when an object does not implement an operation at all, so the upshot is that we do not need to make any provision in XImplementation subclasses that do not handle the message diff: they’ll automatically return AttributeError.

When we did that code yesterday, note that we had to say from xset import XSet inside the method. Why? Well, because XSet imports all the implementations, because it uses them, so if we import XSet into an implementation we get a circular import and pytest can’t cope.

As I thought about that yesterday, it came to me that the XImplementations are used by XSet and as a helper kind of class, it’s really not good for them to know the names of their user classes. In particular, it’s not good for XFrozen.diff to know that it’s supposed to return an XSet. It has no real need to know that XSets exist at all.

And it came to me that when we ask an XImplementation to do some operation, it should return an XImplementation instance, not an XSet instance. (And, most probably, it should only ever return an instance of its own class.)

So we’re going to change the rules, to something like this:

XImplementations can optionally offer specialized methods in support of set operations. XSet can call those methods, fielding AttributeError in case the operation is not implemented. If the operation completes, it will typically return an XImplementation instance that the XSet can use as all or part of the resultant set.

I don’t love that phrasing. I do think it’s what we want to do.

Let’s fix the case above to see how it goes.

class XSet:
    def diff(self, other):
        """
        set difference or relative complement
        :param other: any set
        :return: self - other
        """
        try:
            return XSet(self.implementation.diff(other))
        except AttributeError:
            mine = set(t for t in self)
            others = set(t for t in other)
            remaining = mine - others
            return XSet.from_tuples(remaining)


class XFrozen(XImplementation):
    def diff(self, other):
        if not isinstance(other.implementation, self.__class__):
            raise AttributeError
        else:
            diff = self.data - other.implementation.data
            return self.__class__(diff)

That’s all it takes. We went from green to red to green as I changed those few lines. Commit: diff attempts deferral to implementation, expecting an implementation back.

Let’s find any other cases. Here’s one:

class XSet:
    def rename_contents(self, re_scoping_set: Self):
        try:
            return self.implementation.rename_contents(re_scoping_set)
        except AttributeError:
            new_tuples = ((e.rename(re_scoping_set), s) for e, s in self)
            return XSet.from_tuples(new_tuples)

class XFlatFile(XImplementation):
    def rename_contents(self, re_scoping_set):
        new_names = []
        for name, start, len in self.fields:
            changed_name = name
            for old, new in re_scoping_set:
                if name == old:
                    changed_name = new
            new_names.append((changed_name, start, len))
        new_impl = self.__class__(self.full_file_path, new_names, self.scope_set)
        return XSet(new_impl)

We’ll give it the same treatment:

    def rename_contents(self, re_scoping_set: Self):
        try:
            return XSet(self.implementation.rename_contents(re_scoping_set))
        except AttributeError:
            new_tuples = ((e.rename(re_scoping_set), s) for e, s in self)
            return XSet.from_tuples(new_tuples)

    def rename_contents(self, re_scoping_set):
        new_names = []
        for name, start, len in self.fields:
            changed_name = name
            for old, new in re_scoping_set:
                if name == old:
                    changed_name = new
            new_names.append((changed_name, start, len))
        return self.__class__(self.full_file_path, new_names, self.scope_set)

Commit: change rename_contents in XFlatFile to return an implementation not an XSet.

It turns out that XFlatFile includes another reference to XSet, here:

    def element_at(self, scope):
        if not isinstance(scope, int) or scope < 1:
            return None
        rec = self.get_record(scope - 1)
        if rec == '':
            return None
        return XSet(XFlat(self.fields, rec))

This is tricky. XFlatFile is defined to implement a set of sets. It converts a flat file into a set of records, and it does so without converting the record to tuples, instead returning the flat record as an instance of XSet containing an instance of XFlat, which makes one flat record look like a set of tuples.

Can we reasonably eliminate this reference to XSet? Other than tests, which we can change, we have two usages:

class XFlatFile(XImplementation):
    def __contains__(self, item):
        if self.scope_set is not None:
            return any(es == item for es in self)
        de, ds = item
        return de == self.element_at(ds)

    def __iter__(self):
        def lots():
            n = 1
            while True:
                yield n, n
                n += 1

        it = iter(self.scope_set) if self.scope_set else lots()
        for old_scope, new_scope in it:
            rec = self.element_at(old_scope)
            if rec is None:
                return
            yield rec, new_scope

In the __contains__ method, which can be called from all over, the only time we can have a result of True is if the item (which is by convention a tuple) contains a set as its element part. (We could early-out this method with a check for that and nothing could break.)

We could remove the __contains__ method, and Python would use __iter__ instead, but if we were to do that, it would scan the entire file.

In general, de can be any set. XFlatFile, in __contains__, isn’t really aware of that, relying on the == operator to do the right thing.

I think that in __contains__ we might be able to work around not returning an XSet from element_at. But what about __iter__? That one is used by users of XSets: it’s the method that would be used to return all the records of the flat file. Those are sets and must appear to be sets.

There is another thing that we could do, but we won’t. We could treat XSet as an abstract class and provide a subclass that contains a flat record. In other words, it would be as if we promoted the XFlat class to be a kind of XSet. I don’t think we want to do that. I think we want our implementations to be under XSet, not types of XSet.

I could be wrong about that. I feel sure, however, that we shouldn’t have some implementations inside XSet and some others as subclasses of an abstract XSet.

We’ll back away slowly from this case of an XImplementation knowing about XSet. Are there others?

There is one:

class XSet:
    def re_scope(self, other) -> Self:
        try:
            return self.implementation.re_scope(other)
        except AttributeError:
            return self.generic_re_scope(other)

    def generic_re_scope(self, other):
        tuples = ((e, new) for e, s in self for old, new in other if old == s)
        return XSet.from_tuples(tuples)

class XFlatFile(XImplementation):
    def re_scope(self, re_scoping_set):
        if self.scope_set is not None:
            re_scoping_set = self.scope_set.re_scope(re_scoping_set)
        re_scoping_set = self.validate_scope_set(re_scoping_set)
        if len(re_scoping_set) == 0:
            return XSet.null
        new_impl = self.__class__(self.full_file_path, self.fields, re_scoping_set, self.buffer)
        return XSet(new_impl)

OK this is our same thing and we can change it to return the implementation as per our new standard.

We have two cases in the lower-lever re_scope, dealing with the possibility of a zero-length re-scoping set, which can occur and is defined to return the null set. We’ll return None and make it work. Doing so immediately breaks a test. So let’s fix it and commit.

    def re_scope(self, other) -> Self:
        try:
            impl = self.implementation.re_scope(other)
            if impl is None:
                return XSet.null
            return impl
        except AttributeError:
            return self.generic_re_scope(other)

Commit: XFlatFile re-scope returns None if scope set is empty.

Now the other change, to return just the implementation:

class XFlatFile(XImplementation):
    def re_scope(self, re_scoping_set):
        if self.scope_set is not None:
            re_scoping_set = self.scope_set.re_scope(re_scoping_set)
        re_scoping_set = self.validate_scope_set(re_scoping_set)
        if len(re_scoping_set) == 0:
            return None
        return self.__class__(self.full_file_path, self.fields, re_scoping_set, self.buffer)

That demands this:

class XSet:
    def re_scope(self, other) -> Self:
        try:
            impl = self.implementation.re_scope(other)
            return XSet.null if impl is None else XSet(impl)
        except AttributeError:
            return self.generic_re_scope(other)

Green. Commit: XFlatFile re-scope returns implementation not XSet per convention.

Let’s reflect and sum up.

Summary

We found a jagged bit in our design with subordinate classes referring back up to superior classes, as part of our desire to let the subordinates have a chance at helping with operations based on their specialized knowledge of their own structure. We had then allowed those classes to refer to the presumed class of their caller, to create a result of that class.

It seems better for an implementation to return an implementation and for its user class to wrap it in XSet if it wants to.

The changes were all quite simple, mostly just moving the decision to wrap the result up to XSet, and in one case, dealing with a specialized return. That all went well, with no surprises, and a series of four very small commits. Those could have been spread over time if need be, but in fact they were so small that doing them all today made perfect sense. In a larger system we might do the transition over a longer period of time.

We had hoped to remove all the cross-references between implementations and the XSet class, but we ran across one whose resolution is not the same as the others, in essence because an element of an XFlatFile-based XSet is an XFlat-based XSet, and since the iteration of a set is in its implementation, this implementation needs to know how to create a suitable instance.

This does suggest that there could, maybe should, be a different relationship between XSet and at least the XFlat implementation, perhaps a subclass relationship. But that would be a big decision to make, and as we sit today, it would not obviously be better. So we chose to leave that circular reference in place, and, fortunately, it’s not one that bothers pytest.

You sound uncertain, Ron.

Yes. I have decades of programming experience but only about a year of Python, and what we’re doing here is a bit deep in the bag of tricks even for a Python expert. I freely grant that I am ignorant about a lot of the intricacies of Python, and what I’ve read about dealing with circular imports tells me that there is no well-known always do this kind of solution. That said, I do feel that I “should” know more, and I continue to try to lean.

But this will definitely mean, in case it wasn’t obvious, that what I’m doing here isn’t always suitable for a large-scale Python program. (Some would argue that Python is not suitable for a large-scale Python program, and we can see their point sometimes.)

Recall that the overall name of this recent series is FAFO, where we Fool Around with Extended Set Theory in Python and Find Out what happens, and we’ve only been doing that since the end of January, about two months. So I plead for mercy on the grounds of inexperience mitigated by long years of service.

But seriously, we are building software for a known-tricky subject, extended set theory, using an incremental and iterative approach, where we always discover better ways after using the not-so-better ways for a while, and where we often come, first, to realize that what we have isn’t quite the thing, and then, often, later, to see what would be the thing, and then we move toward it. In the case of the simple helper implementations, we see what seems like a better convention. In the case of the XFlatFile iterator … no one here chez Ron sees the better way. Yet.

We move in small steps from less done to more done, less capability to more capability, improving everything, trying to keep everything in balance.

This is the way.

See you next time! I can’t wait to find out what we do next.