FAFO on GitHub

It is 0253 hours. I have a report and another report.

After I closed yesterday, I tried something which I think is questionable, but that seems right.

When last we met, an XSet was a collection of Atoms. An Atom was a namedtuple, essentially a tiny class that amounts to a Python tuple with named fields. Ours were named element and scope.

As an experiment, I removed the Atom. Where we would have written this:

r1 = XSet([Atom("jeffries", "last"), Atom("ron", "first")])

we now write this:

r1 = XSet([("jeffries", "last"), ("ron", "first")])

As you can see, the code is shorter the new way. Is it more clear? I would say that once we know that an XSet must be a collection of two-tuples, and get used to it for a bit, it is more clear because it is shorter and easier to read. I could be wrong.

Internally, an XSet has a single member contents, which is a frozenset of 2-tuples. At present, XSet does not enforce the 2-tuples rule, and I believe that it should.

More questionable, however, is the use of the tuple at all. I have been taught and have learned, two distinct things, that it is generally “better” to wrap native collections in a class of my own. And this tuple decision flies in the face of that. Right now, in principle, we could make an XSet like this:

    def test_invalid_xset(self):
        with pytest.raises(AttributeError):
            bad = XSet([1, 2, 3])

This test fails. It should pass by raising the AttributeError exception. (Unless we find or create a better choice.) Let’s make it so. Here’s XSet:

class XSet:
    null = None

    @classmethod
    def classical_set(cls, a_list) -> Self:
        null = cls([])
        wrapped = [(item, null) for item in a_list]
        return cls(wrapped)

    @classmethod
    def tuple_set(cls, a_list) -> Self:
        wrapped = [(item, index+1) for index, item in enumerate(a_list)]
        return cls(wrapped)

    def __init__(self, a_list):
        self.contents = frozenset(a_list)

The two class methods do the right thing. The direct call to create the set does not. Let’s fix that. We want it to raise an exception. (I could imagine a more benign behavior but let’s be harsh for now.)

I try this:

    def __init__(self, a_list):
        def is_2_tuple(a):
            return isinstance(a, tuple) and len(a) == 2

        if all(is_2_tuple(a) for a in a_list):
            self.contents = frozenset(a_list)
        else:
            raise AttributeError

My new test passes, and six tests fail. I am surprised. Even at 3 bloody AM, I am surprised. Here’s the first one:

    def test_xset_records_in(self):
        r1 = XSet([("jeffries", "last"), ("ron", "first")])
        r2 = XSet([("chet", "first"), ("hendrickson", "last")])
        r2rev = XSet([("hendrickson", "last"), ("chet", "first")])
        r3 = XSet([("hill", "last"), ("geepaw", "first")])
        personnel = XSet([r1, r2])
        assert r1 in personnel
        assert r2 in personnel
        assert r2rev in personnel  # this test killed Python {}
        assert r3 not in personnel

Well, sorry, charlie, but that test is wrong. personnel is not a legal XSet. We must fix the test:

    def test_xset_records_in(self):
        r1 = XSet([("jeffries", "last"), ("ron", "first")])
        r2 = XSet([("chet", "first"), ("hendrickson", "last")])
        r2rev = XSet([("hendrickson", "last"), ("chet", "first")])
        r3 = XSet([("hill", "last"), ("geepaw", "first")])
        personnel = XSet.classical_set([r1, r2])
        null = XSet.null
        assert (r1, null) in personnel
        assert (r2, null) in personnel
        assert (r2rev, null) in personnel  # this test killed Python {}
        assert (r3, null) not in personnel

We could use the includes method, which post-dates this test. We’ll leave it this way. One down, five to go.

    def test_xset_restrict(self):
        ron = XSet([("jeffries", "last"), ("ron", "first"), ("boss", "job")])
        chet = XSet([("chet", "first"), ("hendrickson", "last"), ("boss", "job")])
        hill = XSet([("hill", "last"), ("geepaw", "first"), ("serf", "job")])
        personnel = XSet.classical_set([ron, chet, hill])
        boss_record = XSet([("boss", "job")])
        boss_set = XSet.classical_set([boss_record])
        bosses = personnel.restrict(boss_set)
        assert isinstance(bosses, XSet)
        assert bosses.includes(ron, None)
        assert bosses.includes(chet, None)
        assert bosses.excludes(hill, None)

Why did this fail? It’s failing because of the assert on ron, and it does not raise an exception. It has returned a null set: I added an assert. Is select broken? Yes, it is, we’ll fix that and everything might heal up.

    def test_select(self):
        def sel(e, s):
            return e > 3
        s1 = XSet.tuple_set((0, 1, 2, 3, 4, 5, 6))
        selected = s1.select(sel)
        assert (4, 5) in selected

Oh my. I think I know what this is. A generator cannot be used twice, and the lists coming into XSet are generators. Interesting! We recast thus:

    def __init__(self, a_list):
        def is_2_tuple(a):
            return isinstance(a, tuple) and len(a) == 2

        self.contents = frozenset(a_list)
        if not all(is_2_tuple(a) for a in self.contents):
            raise AttributeError

We save and freeze first, then check the result. Tests are green. Commit: XSet checks that it contains only 2-tuples.

Reflection

At some cost, an additional scan of every set created, we now have guaranteed integrity, at least to the extent that we know they all contain only two-tuples. That said, if we had a way of building a set up incrementally with a class method or factory, and we protected the basic init method somehow, we might avoid that check. We’ll keep that in mind.

Larger Reflection

The way that select is implemented has an aspect that I really like:

    def select(self, cond) -> Self:
        return XSet(((e, s) for e, s in self.contents if cond(e, s)))

In Extended Set Theory, we tend to say things like this:

S = { es | e ∈s X and blah blah other stuff about e and s}

In our code, we are breaking out e and s (which of course stand for element and scope, but we are going to be saying that a lot so we’ll quickly get to know what e and s mean if we don’t already) and then we express a condition on e and s. So the code matches the set theory.

My plan is that we will invariably break out e and s when we iterate a set. We’ll never (what, never? well, hardly ever.) never pull out the tuple itself. We’ll behave as if an XSet is a set of e and s things. Which, actually, it is.

Here, as another example, is restrict:

    def restrict(self, other) -> Self:
        def other_has_match(our_e, our_s):
            return any((other_e.is_subset(our_e) for other_e, other_s in other.contents))

        if not isinstance(other, self.__class__):
            return NotImplemented
        return self.select(other_has_match)

Let me try something with that.

    def restrict(self, restrictor) -> Self:
        def other_has_match(e, s):
            return any((e_r.is_subset(e) for e_r, s_r 
                in restrictor.contents))

        if not isinstance(restrictor, self.__class__):
            return NotImplemented
        return self.select(other_has_match)

Can’t we drop the .contents? Yes.

    def restrict(self, restrictor) -> Self:
        def other_has_match(e, s):
            return any((e_r.is_subset(e) for e_r, s_r 
                in restrictor))

        if not isinstance(restrictor, self.__class__):
            return NotImplemented
        return self.select(other_has_match)

Commit: simplify restrict a bit. I find a few more places that say .contents but do not need to. It is possible that only a very few of our methods, mostly the dunder ones, need to do that. We can iterate and check in at the XSet level, because we do have the dunder methods for iteration.

Commit: remove unneeded .contents.

Even Bigger Reflection

OK, I admit that is it 4 AM, so I might not be at my best, although this session has gone quite nicely. Anyway, I am getting a very good feeling about what we have here. Though it is still small, the XSet feels solid to me, and I think it supports arbitrarily complicated nested sets. What is making that true is that tuples and frozensets are immutable and therefore hashable, which means we can create any element scope tuple and ask whether it is in any given set, and get an answer.

    def excludes(self, element, scope) -> bool:
        return not self.includes(element, scope)

    def includes(self, element, scope) -> bool:
        if not scope:
            scope = self.null
        return (element, scope) in self

With the includes method in hand, we can, in essence do anything in set theory, because set theory comes down to

x ∈s X

Therefore, I am inclined to “promote” this experiment to a project, in which I will push XST as far as I can manage. I’m not going to say quite what direction we’ll go in, and I am not saying it will be a product, but I’m going to see how far I can go.

If anyone from ComShare is out there: I do not think I can get as far as Steve Weiss and the rest of you got with s-expressions and all that. But, on the other hand, I think I might get further into the theory than our code ever got.

Or I might crash and burn. Or, I’m old. These might be my last words. I really hope not, though: I’m still having a good time.

So … stand back … and I’ll see you next time!