FAFO 11: Restrict
Let’s see how we can select records from an XSet, using the
restrict
operator.
Although Childs seemed mostly to use integers as scopes, it seems to me that both elements and scopes are arbitrary, and I like to use strings. We have a couple of tests creating records in that fashion; here’s a key one:
def test_xset_records_in(self):
r1 = XSet([Atom("jeffries", "last"), Atom("ron", "first")])
r2 = XSet([Atom("chet", "first"), Atom("hendrickson", "last")])
r2rev = XSet([Atom("hendrickson", "last"), Atom("chet", "first")])
r3 = XSet([Atom("hill", "last"), Atom("geepaw", "first")])
personnel = XSet([r1, r2])
assert r1 in personnel
assert r2 in personnel
assert r2rev in personnel
assert r3 not in personnel
We’ll do our restrict test like this:
def test_xset_restrict(self):
ron = XSet([Atom("jeffries", "last"), Atom("ron", "first"), Atom("boss", "job")])
chet = XSet([Atom("chet", "first"), Atom("hendrickson", "last"), Atom("boss", "job")])
hill = XSet([Atom("hill", "last"), Atom("geepaw", "first"), Atom("serf", "job")])
personnel = XSet([ron, chet, hill])
boss_record = XSet([Atom("boss", "job")])
boss_set = XSet([boss_record])
bosses = personnel.restrict(boss_set)
assert ron in bosses
assert chet in bosses
assert hill not in bosses
The definition of restrict that I will use here is that a is an element of A.restrict(B) if, for some b in B, a is in A and b is a subset of A. That is, if there’s a record in B that “matches” a record in A, then that record of A is in the result.
I’m going to write it out longhand first, and then figure out how to use some of Python’s more clever bits.
def restrict(self, other):
if not isinstance(other, self.__class__):
return NotImplemented
result = []
for match in self.contents:
for check in other.contents:
if check.is_subset(match):
result.append(match)
return frozenset(result)
Believe it or not, that passed the test the first time! Let’s add another test:
def test_xset_restrict_again(self):
ron = XSet([Atom("jeffries", "last"), Atom("ron", "first"), Atom("boss", "job")])
chet = XSet([Atom("chet", "first"), Atom("hendrickson", "last"), Atom("boss", "job")])
hill = XSet([Atom("hill", "last"), Atom("geepaw", "first"), Atom("serf", "job")])
personnel = XSet([ron, chet, hill])
serf_record = XSet([Atom("serf", "job")])
serf_set = XSet([serf_record])
serfs = personnel.restrict(serf_set)
assert ron not in serfs
assert chet not in serfs
assert hill in serfs
That one passes as well. Woot!
Now let’s pythonate the code. Pythonify? Pythonerate? Whatever.
def restrict(self, other):
if not isinstance(other, self.__class__):
return NotImplemented
result = []
for match in self.contents:
if any([check.is_subset(match) for check in other.contents]):
result.append(match)
return frozenset(result)
And …
def restrict(self, other):
if not isinstance(other, self.__class__):
return NotImplemented
return frozenset(
[candidate for candidate in self.contents
if any([check.is_subset(candidate) for check in other.contents])])
I need to think about whether that’s really better, but letting Python do the comprehension has a chance of something clever happening inside. Commit: initial restrict operation.
Reflection
This is a Big Deal, despite the simplicity of the method. Restrict can select any subset of records from a set, based on any matching fields. Depending on how we arrange the restricting set, we can get a number of potentially useful selections:
{ {serf@job}, {middle_manager@job} } will select all the serfs and all the middle managers: it’s an or.
{ {serf@job, coal@department} } will select all the serfs in the coal department: it’s an and.
What it cannot do, however, is select everyone who makes over $10,000 per month: there’s no way to do an exact match for numbers > 10,000.
- Note
- Here I begin a brief experimental excursion down a too-clever path.
I’m wondering now how we might do that in some clever way. Assume that we have salary stored as an integer, just to make the question solid. We know that all this subset stuff comes down to whether two Atoms are equal; at least we can suppose that’s what happens when frozenset.issubset
runs.
What if there was another kind of Atom that wasn’t quite as honest as it might be about equality?
Let’s try a test.
OK … this works but I don’t think we can accept it:
def test_gt_atom(self):
a1 = Atom(10000, "value")
a2 = GtAtom(9999, "value")
assert a2 == a1
This new kind of atom says it is equal to another atom if its value is less than or equal to that other atom’s value:
Atom = namedtuple("Atom", ["element", "scope"])
class GtAtom(Atom):
def __eq__(self, other):
if not isinstance(other, Atom):
return NotImplemented
return self.scope == other.scope and self.element <= other.element
I think that with this we really could select on salary. Let’s prove it. My attempt fails:
def test_xset_restrict_gt(self):
ron = XSet([Atom("jeffries", "last"), Atom("ron", "first"), Atom(10000, "salary")])
chet = XSet([Atom("chet", "first"), Atom("hendrickson", "last"), Atom(10001, "salary")])
hill = XSet([Atom("hill", "last"), Atom("geepaw", "first"), Atom("10002", "salary")])
personnel = XSet([ron, chet, hill])
salary_gt_record = XSet([GtAtom(10000, "salary")])
salary_gt_set = XSet([salary_gt_record])
highly_paid = personnel.restrict(salary_gt_set)
assert ron not in highly_paid
assert chet in highly_paid
assert hill in highly_paid
Why? Well, because my horrible lying __eq__
did not get called, doubtless because of how things unwound. I have no idea whatsoever what happens when a namedtuple
is checked for equality, or whatever happens in issubset
in frozenset
. Too mysterious. I’m going to back all that out and then reflect and sum up.
- Note
- And we roll back, having learned a bit: one way not to do it.
Reflection and Summary
We have taken a “big” step with restrict, which is a very powerful operation even in the simple form that I use. We’re now processing sets of “records” and can select them based on contents. Before we can do anything truly useful, if ever we can, we’ll need more operations, but the first one is always the most significant … unless, for some reason, our scheme really won’t work and just happens to work thus far.
But I don’t think that’s going to happen. I think we’re in good shape so far.
At this instant, I don’t know what I’ll do next. I think that the idea of a weird kind of atom that compares for greater than instead of equality was clever, but too clever to be allowed to live. However, if there were something a bit less clever that did push the decision-making down further into the objects, that would be a good thing.
Formally speaking, the base of set theory is the question “is this an element of that?”, and that is always a check for equality (perhaps even identity). It’s easy enough to define a set operation with a greater-than comparison, but it really starts getting ad hoc quite quickly.
Perhaps there is a way to do what we need by passing a lambda down deep, but I fear that it will make the simplicity wither away. And right now, despite what you may be thinking, it’s really quite simple, with checks for equality all that’s going on down at the bottom.
I’ll have to think about this. One possibility might be to redo the Atom not as a named tuple, but I’m not adept with hashing and if we’re to use frozenset
things need to be hashable. Of course, there’s no real reason to use frozenset
, but it does seem quite powerful and useful.
I honestly don’t know what’s next … which is kind of fun, for me at least. I hope a few of you will follow along!
See you next time!