FAFO on GitHub

We’ll start with a simple removal of the requirement for implementations to implement __contains__. After that, we’ll see. (And that’s not what happens.)

Hello, friends!

We made a small but significant design change last time, implementing includes by iterating the set in question rather than requiring each implementation type to support __contains__. So we can, and should remove that method from the abstract class XImplementation. I think that what we’ll do, however, is to leave it defined in the abstract class, implemented as raising the NotImplemented exception. I’m not sure why I want to do that, but two things come to mind:

  1. If someone did use in on an XImplementation, I’d rather catch it than let it run wild;
  2. I foresee a day when we might allow implementations to provide __contains__ optionally.

Here goes. This should be just a simple removal of @abstractmethod and deletion of a few implementations.

class XImplementation(ABC):
    # @abstractmethod  no longer required, might be used later
    def __contains__(self, item):
        raise NotImplemented

    @abstractmethod
    def __iter__(self):
        raise NotImplemented

    @abstractmethod
    def __hash__(self):
        raise NotImplemented

    @abstractmethod
    def __repr__(self):
        raise NotImplemented

Belay that! Upon examination, I find that XFlat and XFrozen have decent implementations of __contains__. Even XFlatFile has a fair one.

Damn. I have messed up the repo. Somehow got into detached head state. OK, I think I’m all better. The issue was that I had a commit hanging and in trying to sort it, somehow created a branch on GitHub. Meh.

OK, we have a direction change. Since we have what seem to be decent implementations of __contains__ in (some of?) our concrete implementations, let’s change our XSet includes to use them if they are there. I am confident enough in our tests to think this will be a reasonable thing to try.

Here’s XSet.includes:

    def includes(self, element, scope) -> bool:
        if scope is None:
            scope = self.null
        return any(e == element and s == scope for e, s in self)

We want to try __contains__ first, and if it raises NotImplemented, use our one-liner there.

    def includes(self, element, scope) -> bool:
        if scope is None:
            scope = self.null
        try:
            (element, scope) in self.implementation
        except NotImplemented:
            return any(e == element and s == scope for e, s in self)

Right. 17 tests fail. This is not good. Oh. I didn’t return the answer. Duh. That is Jeffries Standard Error Number One by the way.

    def includes(self, element, scope) -> bool:
        if scope is None:
            scope = self.null
        try:
            return (element, scope) in self.implementation
        except NotImplemented:
            return any(e == element and s == scope for e, s in self)

And all the tests pass. Whew. Commit: XImplementation subclasses may optionally implement __contains__.

Reflection

Well, good catch on noticing that those implementations had decent __contains__ methods. But let’s review them as a group:

class XFrozen:
    def __contains__(self, item):
        return item in self.data

class XFlat:
    def __contains__(self, item):
        element, scope = item
        for symbol, start, end in self.fields:
            if symbol == scope:
                field = self.record[start:end].strip()
                return field == element
        return False

class XFlatFile:
    def __contains__(self, item):
        de, ds = item
        for e,s in self:
            if de == e and ds == s:
                return True
        return False

I like the first two. The XFrozen defers the in to its frozenset and we can generally trust Python to do the right thing. XFlat spins through the symbol table, but only slices out the one field we need. The XSet level implementation would slice them all. So that’s quite righteous. (It would be even more righteous if our symbol table were a bit smarter.)

But what about XFlatFile? First, is it even tested? Quite possibly not, it is still under development in the test file. There is a test:

    def test_ff_contains(self):
        path = '~/Desktop/job_db'
        fields = XFlat.fields(('last', 12, 'first', 12, 'job', 12, 'pay', 8))
        ff = XFlatFile(path, fields)
        flat_set = XSet(ff)
        flat_iter = iter(flat_set)
        flat_rec, index = next(flat_iter)
        assert flat_rec.includes('jeffries', 'last')
        assert flat_set.includes(flat_rec, index)

I believe that XFlatFile returns the record number (starting at 1) as the scope:

class XFlatFileIterator:
    def __next__(self):
        rec = self.file.get_record(self.index)
        if rec == '':
            raise StopIteration
        else:
            self.index += 1
            flat_set = XSet(XFlat(self.file.fields, rec))
            return flat_set, self.index

It does exactly that. But that means that in __contains__ we should just fetch that one record and test it.

And I predict that after we do that, we’ll need a little refactoring.

class XFlatFile:
    def __contains__(self, item):
        de, ds = item
        if not isinstance(ds, int):
            return False
        rec = self.get_record(ds - 1)
        if rec == '':
            return False
        flat_set = XSet(XFlat(self.fields, rec))
        return flat_set == de

We ensure that the desired scope is an integer, fetch the bytes for that scope, turn them into a suitable XSet and then check just that one set for equality. The tests pass.

Commit: speed up XFlatFile.contains by mass quantities.

Now we have some duplication between XFlatFile and its iterator. We need a method to return the actual element at the given index, that is, the XSet(XFlat(…)). If we’re off the end of the file, we’ll return None.

This looks a bit tricky to do as a refactoring, with all the exits along the way. Let’s TDD the new method.

    def test_ff_element_at(self):
        path = '~/Desktop/job_db'
        fields = XFlat.fields(('last', 12, 'first', 12, 'job', 12, 'pay', 8))
        ff = XFlatFile(path, fields)
        e1 = ff.element_at(1)
        assert e1.includes('jeffries', 'last')

This test is not robust enough but it is enough to get us started.

    def element_at(self, scope):
        if not isinstance(scope, int) or scope < 1:
            return None
        rec = self.get_record(scope - 1)
        if rec == '':
            return None
        return XSet(XFlat(self.fields, rec))

Test passes. Let’s use this new thing in _contains__:

    def __contains__(self, item):
        de, ds = item
        return de == self.element_at(ds)

That’s all it takes. Commit: New XFlatFile.element_at, used in __contains__.

We can use it in our iterator also:

class XFlatFileIterator:
    def __next__(self):
        self.index += 1
        rec = self.file.element_at(self.index)
        if rec is None:
            raise StopIteration
        else:
            return rec, self.index

THis was a bit tricky because the element_at expects scope and the get_record expects the index of the record in the file. I failed to increment the index first. I think some renaming may be in order here. But we are green.

Commit: using element_at in iterator.

Before we forget, let’s improve that test.

    def test_ff_element_at(self):
        path = '~/Desktop/job_db'
        fields = XFlat.fields(('last', 12, 'first', 12, 'job', 12, 'pay', 8))
        ff = XFlatFile(path, fields)
        e1 = ff.element_at(1)
        assert e1.includes('jeffries', 'last')
        assert e1.includes('9000', 'pay')
        assert ff.element_at(0) is None
        assert ff.element_at(1001) is None
        assert ff.element_at('three') is None

PyCharm alerts me about the ‘three’, since it notices that you can’t subtract 1 from ‘three’.

Commit: improve test. Time to reflect:

Reflection

This has been good throughout. And the element_at method seems important to me, because at the XSet level, we have no way to get the value of an element at a desired scope. We can test for it, but we have no way to access it. From a set-theoretic point of view, this is not even a question: the elements and scopes are arbitrary as far as set theory is concerned. You can’t really see them outside the set, though you can compare them and calculate with them inside a set definition.

The question is a bit harder than it may seem. Given a set and a particular scope, like ‘pay’, there are three cases regarding the value of ‘pay’ in the set:

  1. It’s not there;
  2. It’s there exactly once;
  3. It’s there more than once.

Right. A set’s element-scope combinations are unique, but this is a valid set:

{ 9000pay, 13000pay }

I don’t see a use for it, but it’s a valid set. So when we have a set (think record from the people data set), and we want the pay field, we have to cater to the three cases. At this writing, I don’t have an answer that I actually like, but a thing like element_at, returning a possible None, is one possibility.

We will face this issue soon, I think. For now, we note that this method is a forerunner to the real thing.

Taking the wider view, we’ve done some good stuff this morning. Though I and thought we’d defer our optional __contains__ for a while, a review of the code told me that now was a good time. Had I been a little smarter, I’d have noticed that before starting to remove it. And if I wrote a different style of article, I’d have just started with “OK, now let’s make sure that we can use …”.

I prefer to show you what really happens, because, as I see it, good programming is a process of continual discovery and adjustment, not a process of predicting the future and then making the prediction come true no matter what we learn along the way. I think it’s far better to take small steps in what seems like a good direction, and to continually pay attention to what we learn, so that our next step will be better than something we might have predicted in the past.

Note also the little discovery in doing __contains__ for XFlatFile, that there was a need for a method that returns a flat file element (an XSet(XFlat)), and using it then in two places, giving them a striking amount of commonality.

There is still the matter of the index = 0 when scope = 1 thing. In the past, I have redefined ordered n-tuples to start at index 0. For now, at least, they start at 1, as Childs defined them. Let’s see if we can make the code more clear around that topic. First, the iterator. It can use scope:

class XFlatFileIterator:
    def __init__(self, flat_file):
        self.file = flat_file
        self.scope = 0 

    def __iter__(self):
        return self

    def __next__(self):
        self.scope += 1
        rec = self.file.element_at(self.scope)
        if rec is None:
            raise StopIteration
        else:
            return rec, self.scope

This is better, but I do not like the scope being 0, which is known not to be valid. Let’s try this:

class XFlatFileIterator:
    def __init__(self, flat_file):
        self.file = flat_file
        self.scope = 1

    def __iter__(self):
        return self

    def __next__(self):
        result = (rec := self.file.element_at(self.scope), self.scope)
        self.scope += 1
        if rec is None:
            raise StopIteration
        else:
            return result

The walrus operator (rec :=) is a bit deep in the bag, but I think it’s more clear what’s going on. YMMV and I certainly welcome your thoughts and ideas at the usual place. I don’t love this, but I like it more, or at least dislike it less.

Commit: tidy XFlatFileIterator to use scope rather than index.

I try one more renaming:

    def __next__(self):
        element_tuple = (rec := self.file.element_at(self.scope), self.scope)
        self.scope += 1
        if rec is None:
            raise StopIteration
        else:
            return element_tuple

Committed. I’m not sure I love it but again, I dislike it less.

It’s Sunday, so there is a sumptuous breakfast in the offing, so I think we’ll sum up here.

Summary

The big lesson of course, is to stay loose. When an idea starts not to look good, consider not sticking to it, even if we have to back up a bit. The code knows things that our mind and memory do not, and the code, while it works for us, as Hill would put it, does have … well feels to me as if it has … ways it wants to be. I find it valuable to stay open to the code telling me how things should be.

If that’s too whifty for you, well, recast it to your liking. Maybe “we can’t remember everything about the code, so remain open to discovering that our plans may need modification in the light of rediscovering what the code is”.

Yeah, I prefer the metaphor of the code telling us things.

We’ve made some decent progress here, but it is down in the innards of the system. I would prefer to see some action more up at the top, in terms of new things that the code can do for us. And I’m still looking for an actual application of a suitable size. Feel free to toot or email with ideas.

I do have more ideas about the internals, especially another implementation that might be called XRelation, which will be a potentially large set of records, all with the same fields, represented by one set of field names, in order and field values in each record, in the same order. Do we need that? Not really. But we are here to explore how to implement XST, and some of our experiments will be motivated by our overall experience in database and XST, and one of us here has a lot of history.

Until next time, be well, and do come back!