A Nice Refactoring
Ran across this. Rather nice. Then I took one more step. Off a cliff. Fortunately, I didn’t look down.
Hello, again, friends! I was looking at XFlat, to see if we can get rid of it. I don’t think we should, and maybe we’ll discuss that later. But I noticed this:
class XFlat(XImplementation):
def __contains__(self, item):
element, scope = item
for symbol, start, end in self.fields:
if symbol == scope:
field = self.record[start:end].strip()
if field == element:
return True
return False
def __iter__(self):
for symbol, start, end in self.fields:
yield self.record[start:end].strip(), symbol
I thought perhaps __contains__
could be improved by just using the ability to iterate the set item by item, like this:
def __contains__(self, item):
for pair in self:
if pair == item:
return True
return False
And then, like this:
def __contains__(self, item):
return any(pair == item for pair in self)
Rather nice, I think. I hope you agree. See you soon!
The Next Day …
- Note
- The following gets a bit convoluted. Let me explain. No, there is too much. Let me sum up.
-
We “know” that if an object does not implement
__contains__
, Python automatically falls back to iterate the object, and if the object cannot be iterated, only then does Python give up and raise an exception. So we set out to remove our__contains__
, because all it does is iterate the set. Might as well let Python just do it. -
However, our implementation of the abstract class XImplementation included raising a NotImplementedError as part of
_contains__
, even though we always intended__contains__
to be optional. There was code of our own, at the XSet level, that fielded the error and did an iterative solution. That seemed wrong. (See Summary.) -
It appears that Python does not rely on an exception to fall back from
__contains__
to iteration. It seems to happen by some internal magic. This is probably legitimate, I’m not complaining. So it takes a little unwinding of our code to accommodate optional support of__contains__
in concrete subclasses of XImplementation, and, in particular, the abstract class no longer implements__contains__
in any form. -
That’s what you’re in for. Read on, skim, it’s up to you.
I wrote the above last night, just before a lovely supper of home-made Reuben sandwiches. I had the following idea just as the dinner bell rang.
Why do we need this __contains__
at all? Doesn’t Python use __iter__
anyway, when there is no __contains__
? I think it does. If so, we can optimize these two lines down to zero lines.
I get an error on removing the method. The abstract method catches the call and returns NotImplementedError
. What if that was not in the abstract class?
With __contains__
entirely removed from the abstract class, all tests run. I believe that my understanding is flawed about just how Python decides which way to handle these things. Let’s have a test:
def test_how_in_fails(self):
x = 37
> assert 5 in x
E TypeError: argument of type 'int' is not iterable
XSet’s includes
is, I think, mistaken. It reads this way:
class XSet:
def includes(self, element, scope) -> bool:
if scope is None:
scope = self.null
try:
return (element, scope) in self.implementation
except NotImplementedError:
return any(e == element and s == scope for e, s in self)
I believe that in fact that if that try
raises any exception, there would be no , because of this:
class XSet:
def __iter__(self):
for e, s in self.implementation:
yield e,s
For that for
to work, the implementation must implement __iter__
, and if it does, then the in
in includes
must also succeed. What should we do?
One possibility is to change the except
to accept all Exceptions, and then … do what? We have no recovery. If any exception is raised, we’re doomed. There’s a serious undetected defect in the system.
Or, we can remove the try/except
and just do the in
, on the grounds that in
has to work because every implementation can iterate the set. The try/except
was there because I mistakenly thought that raising NotImplementedException
was a good way to signal that we do not have a fast __contains__
in the implementation and to do it the long way. But Python just handles that.
We’ll remove the try/except
since we can’t really do anything useful.
Curiously, if I remove it, two tests fail. Fascinating.
Ah. I find that XFlatFile has a fast __contains__
that only works sometimes, and it tries to signal giving up by raising the exception that we no longer honor:
class XFlatFile(XImplementation):
def __contains__(self, item):
if self.scope_set is not None:
raise NotImplementedError
de, ds = item
return de == self.element_at(ds)
First let’s see if raising TypeError fixes the issue. It does not. Does Python deal with this __contains__
turning to __iter__
thing with exceptions? Or does it instead inspect the dictionary of the object and decide. I am unable to find out for sure. We can of course, do this:
def __contains__(self, item):
if self.scope_set is not None:
return any(es == item for es in self)
de, ds = item
return de == self.element_at(ds)
If scope_set is none, the flat file can be seen as indexed by record numbers starting at 1. I really don’t think we want to have any other kind of flat file. Let me make a note of that.
With that code in place, the tests all run green. I think we’ll commit this: Removing default __contains__
from abstract set, forcing Python to revert to __iter__
for XImplementations without __contains__
. Adjustments made to XSet, XFlat and XFlatFile to accommodate this change.
The scope_set on an XFlatFile is a true scope-set as in re_scope
:
def test_uses_scope_set(self):
path = '~/Desktop/job_db'
fields = XFlat.fields(('last', 12, 'first', 12, 'job', 12, 'pay', 8))
scopes = XSet.from_tuples(((107, 1), (932, 2)))
ff = XFlatFile(path, fields, scopes)
ff_set = XSet(ff)
assert len(ff_set) == 2
e, s = ff_set.select(lambda e, s: s == 1).pop()
assert e.includes('amy', 'first')
assert any(s == 1 and e.includes('amy', 'first') for e,s in ff_set)
assert any(s == 2 and e.includes('janet', 'first') for e,s in ff_set)
assert all(e.includes('amy', 'first') for e,s in ff_set if s == 1)
assert all(e.includes('janet', 'first') for e,s in ff_set if s == 2)
amy = ff_set.select(lambda e, s: s == 1 and e.includes('amy', 'first'))
assert len(amy) == 1
This somewhat intricate test has two elements, scoped 1 and 2. Those elements are records 107 and 932, respectively, from the flat file ‘job_db’. The test verifies that #1 is amy and #2 is janet and that there are just the two records in the result.
This is allowed, to give us the future possibility of creating flat file subsets and groupings without creating new files. Premature and speculative? Yes, but it seemed to make sense at the time.
I think we can write an even more interesting test. Let’s try it.
def test_uses_named_scope(self):
path = '~/Desktop/job_db'
fields = XFlat.fields(('last', 12, 'first', 12, 'job', 12, 'pay', 8))
scopes = XSet.from_tuples(((107, "amy"), (932, "janet")))
ff = XFlatFile(path, fields, scopes)
ff_set = XSet(ff)
assert len(ff_set) == 2
e, s = ff_set.select(lambda e, s: s == "amy").pop()
assert e.includes('amy', 'first')
assert any(s == 'amy' and e.includes('amy', 'first') for e,s in ff_set)
assert any(s == 'janet' and e.includes('janet', 'first') for e,s in ff_set)
assert all(e.includes('amy', 'first') for e,s in ff_set if s == 'amy')
assert all(e.includes('janet', 'first') for e,s in ff_set if s == 'janet')
amy = ff_set.select(lambda e, s: s == 'amy' and e.includes('amy', 'first'))
assert len(amy) == 1
Passes with flying green colors. Commit: test showing rename to non-integer in subset of flat file.
Summary
- Added in Post
-
One thing came out of this that I only realized in final review of this article. The
__contains__
method and NotImplementedError stuff was my early attempt at allowing implementations to provide some capability optionally, in this case, inclusion. -
I now think that was not the right way to handle the issue. I think I’d prefer that the base of implementations would provide a default implementation that must work, and that the XSet level should just speak in the terms it understands. Those come down to inclusion and iteration, which I believe are the two primitives that are necessary and sufficient for set theory. I want to think more about this but I believe that what we have now is better than what we had before.
-
Read on … back to the standard timeline:
Well. That was interesting. I noticed the possibility for a nice little refactoring. And it was nice, taking seven lines of code down to one. That led, however, to the recognition that the one line only did what Python would do anyway, so we “could” remove our method entirely.
That didn’t quite work, because our use of NotImplementedError is incompatible with the inner workings of Python. That took a bit of thinking, unwinding, and explaining.
A bit uncertain
I’m not entirely sure whether to feel good or badly about this. In making our Python XSets as Pythonic as we can, we need to deal with a number of Pythons “dunder” methods, the inner workings of what makes it the powerful and pleasant language that it is. That kind of work, wiring new objects into a powerful object hierarchy, is always tricky and requires some pretty deep knowledge of the mystic arts.
Deep in the bag
I am working rather deep in the bag of tricks, intentionally, and even though I’ve written almost 400 articles using Python — OMG what is wrong with me?!?! — I haven’t worked in the deep innards all that much, so I need to expect to learn better ways to do things. So far, my tests seem to be keeping me honest.
But you are a fool!
I freely grant that it’s a bit scary. I am just a bit concerned that I will embarrass myself with some kind of goof mistake that will make me look like a fool. But wait: I am a fool. So if I learn something, even by falling on my face, I’ll be a bit less of a fool.
So it’s OK. And anyway, the tests all run. How bad could it be?
Stop by next time and find out. See you then!
Notes as of now
-
Should XFlatFile be limited to only scope-set = 1..n? (No, scope_set can be used for random access.) - Lex underbar ala “total_pay”
- refactor
result
method. (I would have to search to find out what this refers to.) - Should XSet implement
__contains__
? - Should we use complex scopes like
<salary, int>
? - Improve
restrict
code? - Use
reduce
in XFlat fields? - XRelation - one set of scopes for entire set, save memory?
- Spill storage - tuples could be easy to store?
- Better use of
map
and other itertools? - Remember scopes in project operator? (Not sure what this means.)
- Improve Flat symbol table? Use XST in it?
- Could a, b, c = Xset return three tuples? (Probably not.)
- Does re_scope use comprehensions?
- What about very large sets?
- Do we want
is_null
?