FAFO on GitHub

Ran across this. Rather nice. Then I took one more step. Off a cliff. Fortunately, I didn’t look down.

Hello, again, friends! I was looking at XFlat, to see if we can get rid of it. I don’t think we should, and maybe we’ll discuss that later. But I noticed this:

class XFlat(XImplementation):
    def __contains__(self, item):
        element, scope = item
        for symbol, start, end in self.fields:
            if symbol == scope:
                field = self.record[start:end].strip()
                if field == element:
                    return True
        return False

    def __iter__(self):
        for symbol, start, end in self.fields:
            yield self.record[start:end].strip(), symbol

I thought perhaps __contains__ could be improved by just using the ability to iterate the set item by item, like this:

    def __contains__(self, item):
        for pair in self:
            if pair == item:
                return True
        return False

And then, like this:

    def __contains__(self, item):
        return any(pair == item for pair in self)

Rather nice, I think. I hope you agree. See you soon!


The Next Day …

Note
The following gets a bit convoluted. Let me explain. No, there is too much. Let me sum up.

We “know” that if an object does not implement __contains__, Python automatically falls back to iterate the object, and if the object cannot be iterated, only then does Python give up and raise an exception. So we set out to remove our __contains__, because all it does is iterate the set. Might as well let Python just do it.

However, our implementation of the abstract class XImplementation included raising a NotImplementedError as part of _contains__, even though we always intended __contains__ to be optional. There was code of our own, at the XSet level, that fielded the error and did an iterative solution. That seemed wrong. (See Summary.)

It appears that Python does not rely on an exception to fall back from __contains__ to iteration. It seems to happen by some internal magic. This is probably legitimate, I’m not complaining. So it takes a little unwinding of our code to accommodate optional support of __contains__ in concrete subclasses of XImplementation, and, in particular, the abstract class no longer implements __contains__ in any form.

That’s what you’re in for. Read on, skim, it’s up to you.

I wrote the above last night, just before a lovely supper of home-made Reuben sandwiches. I had the following idea just as the dinner bell rang.

Why do we need this __contains__ at all? Doesn’t Python use __iter__ anyway, when there is no __contains__? I think it does. If so, we can optimize these two lines down to zero lines.

I get an error on removing the method. The abstract method catches the call and returns NotImplementedError. What if that was not in the abstract class?

With __contains__ entirely removed from the abstract class, all tests run. I believe that my understanding is flawed about just how Python decides which way to handle these things. Let’s have a test:

    def test_how_in_fails(self):
        x = 37
>       assert 5 in x
E       TypeError: argument of type 'int' is not iterable

XSet’s includes is, I think, mistaken. It reads this way:

class XSet:
    def includes(self, element, scope) -> bool:
        if scope is None:
            scope = self.null
        try:
            return (element, scope) in self.implementation
        except NotImplementedError:
            return any(e == element and s == scope for e, s in self)

I believe that in fact that if that try raises any exception, there would be no , because of this:

class XSet:
    def __iter__(self):
        for e, s in self.implementation:
            yield e,s

For that for to work, the implementation must implement __iter__, and if it does, then the in in includes must also succeed. What should we do?

One possibility is to change the except to accept all Exceptions, and then … do what? We have no recovery. If any exception is raised, we’re doomed. There’s a serious undetected defect in the system.

Or, we can remove the try/except and just do the in, on the grounds that in has to work because every implementation can iterate the set. The try/except was there because I mistakenly thought that raising NotImplementedException was a good way to signal that we do not have a fast __contains__ in the implementation and to do it the long way. But Python just handles that.

We’ll remove the try/except since we can’t really do anything useful.

Curiously, if I remove it, two tests fail. Fascinating.

Ah. I find that XFlatFile has a fast __contains__ that only works sometimes, and it tries to signal giving up by raising the exception that we no longer honor:

class XFlatFile(XImplementation):
    def __contains__(self, item):
        if self.scope_set is not None:
            raise NotImplementedError
        de, ds = item
        return de == self.element_at(ds)

First let’s see if raising TypeError fixes the issue. It does not. Does Python deal with this __contains__ turning to __iter__ thing with exceptions? Or does it instead inspect the dictionary of the object and decide. I am unable to find out for sure. We can of course, do this:

    def __contains__(self, item):
        if self.scope_set is not None:
            return any(es == item for es in self)
        de, ds = item
        return de == self.element_at(ds)

If scope_set is none, the flat file can be seen as indexed by record numbers starting at 1. I really don’t think we want to have any other kind of flat file. Let me make a note of that.

With that code in place, the tests all run green. I think we’ll commit this: Removing default __contains__ from abstract set, forcing Python to revert to __iter__ for XImplementations without __contains__. Adjustments made to XSet, XFlat and XFlatFile to accommodate this change.

The scope_set on an XFlatFile is a true scope-set as in re_scope:

    def test_uses_scope_set(self):
        path = '~/Desktop/job_db'
        fields = XFlat.fields(('last', 12, 'first', 12, 'job', 12, 'pay', 8))
        scopes = XSet.from_tuples(((107, 1), (932, 2)))
        ff = XFlatFile(path, fields, scopes)
        ff_set = XSet(ff)
        assert len(ff_set) == 2
        e, s = ff_set.select(lambda e, s: s == 1).pop()
        assert e.includes('amy', 'first')
        assert any(s == 1 and e.includes('amy', 'first') for e,s in ff_set)
        assert any(s == 2 and e.includes('janet', 'first') for e,s in ff_set)
        assert all(e.includes('amy', 'first') for e,s in ff_set if s == 1)
        assert all(e.includes('janet', 'first') for e,s in ff_set if s == 2)
        amy = ff_set.select(lambda e, s: s == 1 and e.includes('amy', 'first'))
        assert len(amy) == 1

This somewhat intricate test has two elements, scoped 1 and 2. Those elements are records 107 and 932, respectively, from the flat file ‘job_db’. The test verifies that #1 is amy and #2 is janet and that there are just the two records in the result.

This is allowed, to give us the future possibility of creating flat file subsets and groupings without creating new files. Premature and speculative? Yes, but it seemed to make sense at the time.

I think we can write an even more interesting test. Let’s try it.

    def test_uses_named_scope(self):
        path = '~/Desktop/job_db'
        fields = XFlat.fields(('last', 12, 'first', 12, 'job', 12, 'pay', 8))
        scopes = XSet.from_tuples(((107, "amy"), (932, "janet")))
        ff = XFlatFile(path, fields, scopes)
        ff_set = XSet(ff)
        assert len(ff_set) == 2
        e, s = ff_set.select(lambda e, s: s == "amy").pop()
        assert e.includes('amy', 'first')
        assert any(s == 'amy' and e.includes('amy', 'first') for e,s in ff_set)
        assert any(s == 'janet' and e.includes('janet', 'first') for e,s in ff_set)
        assert all(e.includes('amy', 'first') for e,s in ff_set if s == 'amy')
        assert all(e.includes('janet', 'first') for e,s in ff_set if s == 'janet')
        amy = ff_set.select(lambda e, s: s == 'amy' and e.includes('amy', 'first'))
        assert len(amy) == 1

Passes with flying green colors. Commit: test showing rename to non-integer in subset of flat file.

Summary

Added in Post

One thing came out of this that I only realized in final review of this article. The __contains__ method and NotImplementedError stuff was my early attempt at allowing implementations to provide some capability optionally, in this case, inclusion.

I now think that was not the right way to handle the issue. I think I’d prefer that the base of implementations would provide a default implementation that must work, and that the XSet level should just speak in the terms it understands. Those come down to inclusion and iteration, which I believe are the two primitives that are necessary and sufficient for set theory. I want to think more about this but I believe that what we have now is better than what we had before.

Read on … back to the standard timeline:

Well. That was interesting. I noticed the possibility for a nice little refactoring. And it was nice, taking seven lines of code down to one. That led, however, to the recognition that the one line only did what Python would do anyway, so we “could” remove our method entirely.

That didn’t quite work, because our use of NotImplementedError is incompatible with the inner workings of Python. That took a bit of thinking, unwinding, and explaining.

A bit uncertain

I’m not entirely sure whether to feel good or badly about this. In making our Python XSets as Pythonic as we can, we need to deal with a number of Pythons “dunder” methods, the inner workings of what makes it the powerful and pleasant language that it is. That kind of work, wiring new objects into a powerful object hierarchy, is always tricky and requires some pretty deep knowledge of the mystic arts.

Deep in the bag

I am working rather deep in the bag of tricks, intentionally, and even though I’ve written almost 400 articles using Python — OMG what is wrong with me?!?! — I haven’t worked in the deep innards all that much, so I need to expect to learn better ways to do things. So far, my tests seem to be keeping me honest.

But you are a fool!

I freely grant that it’s a bit scary. I am just a bit concerned that I will embarrass myself with some kind of goof mistake that will make me look like a fool. But wait: I am a fool. So if I learn something, even by falling on my face, I’ll be a bit less of a fool.

So it’s OK. And anyway, the tests all run. How bad could it be?

Stop by next time and find out. See you then!


Notes as of now

  • Should XFlatFile be limited to only scope-set = 1..n? (No, scope_set can be used for random access.)
  • Lex underbar ala “total_pay”
  • refactor result method. (I would have to search to find out what this refers to.)
  • Should XSet implement __contains__?
  • Should we use complex scopes like <salary, int>?
  • Improve restrict code?
  • Use reduce in XFlat fields?
  • XRelation - one set of scopes for entire set, save memory?
  • Spill storage - tuples could be easy to store?
  • Better use of map and other itertools?
  • Remember scopes in project operator? (Not sure what this means.)
  • Improve Flat symbol table? Use XST in it?
  • Could a, b, c = Xset return three tuples? (Probably not.)
  • Does re_scope use comprehensions?
  • What about very large sets?
  • Do we want is_null?