FAFO 10: More XST

FAFO on GitHub

I propose to push a bit further on the use of Python frozenset to do a little Extended Set Theory. I mention symmetric difference!

Pro Tip: When using PyCharm, try to name your test files such that the one you’re working on is lowest in the alphabet. It will be easier to access the test tree. As for how to make that happen, well, you’re on your own.

In my copious free time yesterday, I read about frozenset and even did a few trivial tests in my iPad’s Pythonista app. I think it will make an excellent basis for my XSet extended set class. I’ll begin this morning by converting that class to use frozenset. I’m not sure whether it will change any of the methods or not. Let’s review that class:

class XSet:
    def __init__(self, a_list):
        self.contents = a_list

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.is_subset(other) and other.is_subset(self)
        else:
            return False

    def __iter__(self):
        return self.contents.__iter__()

    def is_subset(self, other):
        if not isinstance(other, self.__class__):
            return False
        return all(atom in other for atom in self)

Oh it definitely will change things! That’s great. Let’s review our tests as well:

    def test_frozen_sets(self):
        r1 = frozenset([Atom("jeffries", "last"), Atom("ron", "first")])
        r2 = frozenset([Atom("chet", "first"), Atom("hendrickson", "last")])
        r2rev = frozenset([Atom("hendrickson", "last"), Atom("chet", "first")])
        r3 = frozenset([Atom("hill", "last"), Atom("geepaw", "first")])
        personnel = frozenset([r1, r2])
        assert r1 in personnel
        assert r2 in personnel
        assert r2rev in personnel  # <======
        assert r3 not in personnel

    def test_xset_in(self):
        list1 = [Atom("x", 1), Atom("y", 2)]
        list2 = [Atom("y", 2), Atom("x", 1)]
        assert list1 != list2
        set1 = XSet(list1)
        set2 = XSet(list2)
        assert set1 == set2
        set3 = XSet([Atom("z", 1), Atom("y", 2)])
        assert set1 != set3

    def test_xset_records_in(self):
        r1 = XSet([Atom("jeffries", "last"), Atom("ron", "first")])
        r2 = XSet([Atom("chet", "first"), Atom("hendrickson", "last")])
        r2rev = XSet([Atom("hendrickson", "last"), Atom("chet", "first")])
        r3 = XSet([Atom("hill", "last"), Atom("geepaw", "first")])
        personnel = XSet([r1, r2])
        assert r1 in personnel
        assert r2 in personnel
        assert r2rev in personnel  # this test killed Python {}
        assert r3 not in personnel

    def test_frozen_operators(self):
        s1 = frozenset(("a", "b", "c"))
        s2 = frozenset(("x", "y"))
        s3 = s1 | s2
        assert s3 == {"a", "x", "c", "b", "y"}
        s2 |= s1
        assert s2 == s3

I think those should cover the conversion changes pretty well. Let’s do it:

I note that PyCharm has squiggles on both other and self in this method:

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.is_subset(other) and other.is_subset(self)
        else:
            return False

The message is that “Type XSet doesn’t have expected attribute __contains__”.

Ah … it’s saying that because my is_subset method is using in, which says that one really ought to implement __contains__. But the tests are not failing. I think that Python knows more than one way to do the “contains” check, i.e. by iteration, and this is a warning. We’ll proceed.

class XSet:
    def __init__(self, a_list):
        self.contents = frozenset(a_list)

A test breaks:

    def test_xset_records_in(self):
        r1 = XSet([Atom("jeffries", "last"), Atom("ron", "first")])
        r2 = XSet([Atom("chet", "first"), Atom("hendrickson", "last")])
        r2rev = XSet([Atom("hendrickson", "last"), Atom("chet", "first")])
        r3 = XSet([Atom("hill", "last"), Atom("geepaw", "first")])
        personnel = XSet([r1, r2])
        assert r1 in personnel
        assert r2 in personnel
        assert r2rev in personnel  # this test killed Python {}
        assert r3 not in personnel

The error is that the XSet is unhashable, so it can’t be converted to a frozenset. What if we return the hash of our contents as our hash?

    def __hash__(self):
        return hash(self.contents)

Test runs. Let’s carry on, looking at the methods of XSet to see what we might improve:

    def is_subset(self, other):
        if not isinstance(other, self.__class__):
            return False
        return all(atom in other for atom in self)

That should really return NotImplemented rather than false if the other isn’t an XSet. And we can now do this:

    def is_subset(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return self.contents.issubset(other.contents)

Defer the operation to your member variable. Super. We should be committing: Converting XSet to use frozenset.

We can also change this:

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.is_subset(other) and other.is_subset(self)
        else:
            return False

To this:

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.contents == other.contents
        else:
            return NotImplemented

I want to reverse the if on is_subset to make the two methods have the same shape. PyCharm will do that for me, I think. It does this:

    def is_subset(self, other):
        if isinstance(other, self.__class__):
            return self.contents.issubset(other.contents)
        return NotImplemented

The two are still not the same, because I have an else in the one and not in the other.

Truth be told, I prefer the else in this case. I’ll add it.

    def is_subset(self, other):
        if isinstance(other, self.__class__):
            return self.contents.issubset(other.contents)
        else:
            return NotImplemented

Commit: XSet uses frozenset as contents.

Let’s reflect.

Reflection

Certainly using frozenset makes XSet much nicer than it was. We might almost ask why we should have XSet at all.

I have some reasons, some of which apply always, and some of which relate to what I (think I) know about the future.

Wrap native collections: I probably learned this by received wisdom at some point, but I have subsequently had plenty of experience that agrees with that wisdom. Sooner or later, when I use a native collection, I seem to wind up wanting a specialized method that the native does not have.
Wrap rather than subclass: When I do need a specialized method on a collection, one possibility is to subclass the native, then add the method to my subclass. (Another possibility is to monkey-patch the native, but that way lies madness.) My real reason for subclassing is that the native collection generally has a wide variation of available methods, and it’s not clear to me whether or not some of those will cause me trouble if they get used.; Now, sure, I trust my tests and I trust myself (somewhat) but the thing with my own classes is that they are limited in a way that a native class is not, and that limitation makes them easier to think about.
Wrap for new capability: As mentioned above, there may come a time when we need a new method, in which case wrapping will be necessary. As it happens in the case of XST I know that’s going to happen, because I know the kind of methods I’ll be adding, so that creating the class is the thing to do.

A wrapped collection is more narrow than the native collection. It’s more extendable. I prefer that.

Looking to the future

I’m not entirely sure why I’m working on this just now. I was nudged this way by some ideas I had about curriculum management, but following GeePaw Hill’s work, I don’t really see the use of the kind of searching and name management that XST might provide. But now I’m just interested, so I’ll push it a bit to see what I can learn.

There is one very interesting aspect to using frozenset, and that is that all our sets, once defined, are based on an immutable object. Now, XSet wouldn’t have to stay immutable. We could create a new frozenset internally and plug it in … although I would be concerned about what that would mean relative to __hash__. I think the rule of hashing is that if two objects’ hashes are unequal, the objects are unequal, and if the hashes are equal, the objects might be equal. I honestly do not know enough about what Python is going to do internally to be certain.

But it’s tempting to make XSets immutable also. If we were to do that, how would we manage updating? Something, somewhere, has to be mutable.

Suppose we had a name table, string -> XSet, and suppose that table was mutable. The bulk of the program would operate on sets from the name table, which would all be XSets. When we want to update a set, we create the new contents (also immutable) and replace the name table entry’s XSet with the new one.

That would be interesting. And in that table, there might be different kinds of XSets. For example, there could be memory-resident sets, the only kind we have now, but also file-resident sets. There might be sets that help us keep the file-resident sets also immutable. Let’s look at how that might be done.

Symmetric Difference

The “symmetric difference” of two sets is all the elements that are in one or the other but not both. Python uses the ^ to signify this, and therefore:

{ a, b, c } ^ { c, 1 } -> { a, b, c1 }

One way of looking at that is that we took the set abc and updated c to c1!

We can do some nifty things with symmetric difference. One is that we can retain history:

Name Table:
ABC as of 0829 -> { a, b, c }
ABC as of 0830 -> ABC as of 0829 ^ { c, c1 }

It turns out that many interesting operations distribute over symmetric difference. For example, any operation that selects a subset will do the right thing:

select(x ^ y) = select(x) ^ select(y)

It’s easy to see why this works. If, in our example above, the select would not have selected c, it still won’t, but it may or may not select a, b, or c1. And if it did select c, it will select it on both sides of the symmetric difference, and it will be cancelled out as if it were never there.

What’s Next?

I think this article is long enough, though I don’t feel done. I guess we’ll sum up here and I’ll start another.

Converting to frozenset leads to a simpler implementation of our XSet so far, because the frozenset has all the capability we needed. Since it has a lot more capability, we’ll probably be glad we found out about it.

I think we’ll follow a sort of dual thread for a while. We’ll build up a name table of string names pointing to XSets, or, more likely, to some new object type that itself may contain an XSet or a formula / expression in XSets. And we’ll add operations as needed to XSets.

In particular, before we go much further at all, I want to set up some nested XSets that contain record-like things, and to demonstrate selecting specific record-like things with a method we may call restrict. That will serve to ensure that our frozenset decision is a good one.

Good times, at least for me! I hope you get a bit of fun as well.

See you next time!