FAFO 9: Atoms and Sets and Tuples, Oh My!

FAFO on GitHub

Some random reading, and the messing about that I’ve done in the FAFO series, leads me to want to explore Extended Set Theory in Python. I do not expect it to be useful but it might be interesting.

XST

Extended Set Theory (XST) is a mathematical reformulation of set theory, created by David L Childs, then at the University of Michigan. Dave and people he has influenced, including YT¹, believe, or did believe, that XST should be very useful for large databases, because its solid formal basis allows it to represent and deal with very complex data structures. So, for example, rather than have an SQL relation, all mathematically nice, backed up by a complex b-tree indexing structure, with no formal basis, or at best a different formal basis than the relation, we might have, using XST, a common formalism throughout.

Many have tried to build database systems using XST. Some have succeeded somewhat. None have thrived, and as far as I know there is no commercial XST-based system extant in the world. Nonetheless, I’ve found the idea to be fascinating and occasionally play with the ideas a bit on my own. I have already driven a real XST effort almost entirely into the ground, although after my unfortunate demise, the team actually made it work. The company, however, elected not to take it to market. Ah well.

Let me say just a few more words about XST here, and of course more will come up as we go forward. In mathematical set theory, one quickly wants an ordered (not sorted) set, such as a set of x,y coordinates. So you invent the ordered pair <x, y>, and it’s all good, unless you are a mathematician, in which case you want to define the ordered pair in terms of set theory. The most common definition is:

<a, b> == {a, {a, b}}

Using this definition, from Kuratowski, it is possible to include ordered n-tuples as part of axiomatic set theory, and mathematicians are generally happy with this, mostly because they’re interested in the properties of sets, not doing anything useful with them. But if you’re interested in manipulating sets as the underlying formalism for a database kind of system, this definition is, well, weird.

Imagine that you have two perfectly reasonable sets, {x} and { {x,y} } and you take their union. You get {x,{x, y}}, and that is the ordered pair <x,y>! But had you done the union of {x} and { { z,y} }, you would not get an ordered pair! That’s just too weird to be tolerated.

So Dave Childs devised a whole new set of basic axioms for set theory, one which includes a solid definition of an ordered n-tuple, but which doesn’t act quite so weirdly.

As we’ll see, in XST a set’s elements have a “scope”, which Dave writes as an exponent, and the ordered triple <x,y,z> is the set {x¹, y², z³}.

I do not enjoy typing the x<sup>1</sup> that it takes to get superscripts, so it is my practice, of late, to write {x@1, y@2, z@3}. The at-sign seems apropos, because the superscript tells us where in the set the element is. i.e. where it is at. I may or may not use @ in Python, since it means matrix multiplication. Anyway, if you see it in the text, you’ll know what I mean.

Moving toward Python

In Python, two of the built-in data types are the tuple, and the set. A tuple is an immutable sequence of objects, and a set is a collection of objects with no defined order, such that any given object appears at most once in the set. It seems to me that these types might be quite useful for building an XST system, and what I proposed to do is to FAFO²

In earlier FAFO articles (should I tag them?) I did a bit of set experimentation, but I plan to start a new set of tests in the FAFO project, aimed more directly at XST. Let’s get started.

class TestXST:
    def test_hookup(self):
        assert 2 + 1 == 4

My weak implementation of 4 fails, so I correct the initial test and am now confident that my new tests are recognized by PyCharm, so that I can get started.

class TestXST:
    def test_hookup(self):
        assert 2 + 2 == 4

We are green. Commit: initial XST test.

Now my plan is that I will use a two-tuple (element, scope) as an internal object, to serve as the elements of our extended sets. In XST there is no way to hold on to an element-scope “pair” as an atomic thing. But in an implementation, we need to have both, as far as I’ve ever been able to figure out, so I’m going with this plan.

There is a very interesting thing in Python, the named tuple. It works like this:

from collections import namedtuple
Atom = namedtuple("Atom", ["element", "scope"])

class TestXST:
    def test_tuples(self):
        a1 = Atom(1, 2)
        a2 = Atom(2-1, 3-1)
        a3 = Atom(2, 1)
        assert a1 == a2
        assert not a1 == a3

I decided to call the thing Atom, because the word is short. ScopedElement just didn’t appeal to me. The test demonstrates that elements of the new class Atom are equal if their components are equal by value and not otherwise.

I’ll do another test, more for documentation than anything else:

    def test_members(self):
        a1 = Atom (31, 42)
        assert a1.element == 31
        assert a1.scope == 42

These are passing. I should be committing. Commit: initial definition of Atom with tests.

I should mention that Atom, as a named tuple, is a subclass of tuple. I don’t usually subclass native collections, but this seems so apt that I’m trying it.

Anyway, some set tests.

    def test_set_in(self):
        a1 = Atom("jeffries", "name")
        a2 = Atom("hendrickson", "name")
        a3 = Atom("hill", "name")
        s1 = {a1, a2}
        assert a1 in s1
        assert a2 in s1
        assert a3 not in s1

This seems sensible. Note that {a1, a2, …} is the notation for a literal set in Python.

Commit: testing atom in set.

Let’s try something deeper.

    def test_records(self):
        r1 = {Atom("jeffries", "last"), Atom("ron", "first")}
        r2 = {Atom("chet", "first"), Atom("hendrickson", "last")}
        r3 = {Atom("hill", "last"), Atom("geepaw", "first")}
        personnel = {r1, r2}
        assert r1 in personnel
        assert r3 not in personnel

This test fails, because Python cannot create the set personnel, because its elements (r1 and r2) are sets, and sets are not inherently hashable. Bummer. I had not thought of that in my brief thinking about how this might work.

Well, by way of a test, we could make personnel a list and I think the test would run … and in fact it does. But then we shouldn’t make r1, r2, r3 into Python sets, they, too, should be lists. If they are, I think the test can be made to break. Like this, it works:

    def test_records(self):
        r1 = {Atom("jeffries", "last"), Atom("ron", "first")}
        r2 = {Atom("chet", "first"), Atom("hendrickson", "last")}
        r3 = {Atom("hill", "last"), Atom("geepaw", "first")}
        personnel = [r1, r2]
        assert r1 in personnel
        assert r2 in personnel
        assert r3 not in personnel

But when we change those {} to [] it’s going to fail. I’m mistaken. This still works:

    def test_records(self):
        r1 = [Atom("jeffries", "last"), Atom("ron", "first")]
        r2 = [Atom("chet", "first"), Atom("hendrickson", "last")]
        r3 = [Atom("hill", "last"), Atom("geepaw", "first")]
        personnel = [r1, r2]
        assert r1 in personnel
        assert r2 in personnel
        assert r3 not in personnel

But this will not:

    def test_records(self):
        r1 = [Atom("jeffries", "last"), Atom("ron", "first")]
        r2 = [Atom("chet", "first"), Atom("hendrickson", "last")]
        r2rev = [Atom("hendrickson", "last"), Atom("chet", "first")]
        r3 = [Atom("hill", "last"), Atom("geepaw", "first")]
        personnel = [r1, r2]
        assert r1 in personnel
        assert r2 in personnel
        assert r2rev in personnel
        assert r3 not in personnel

The r2rev check fails, because the two lists r2 and r2rev are not equal, so r2rev is in fact not in personnel.

Of course, we wish that it were. This is telling me that although my Atoms may be useful, Python sets are not as useful as I had hoped they would be. I’ll make this test pass so that I can commit:

    def test_show_sets_insufficient_for_xst(self):
        r1 = [Atom("jeffries", "last"), Atom("ron", "first")]
        r2 = [Atom("chet", "first"), Atom("hendrickson", "last")]
        r2rev = [Atom("hendrickson", "last"), Atom("chet", "first")]
        r3 = [Atom("hill", "last"), Atom("geepaw", "first")]
        personnel = [r1, r2]
        assert r1 in personnel
        assert r2 in personnel
        assert r2rev not in personnel  # but we need it to be
        assert r3 not in personnel

Commit: test showing sets won’t cut it.

Reflection

I freely grant that I was thinking that Python sets would do the useful work for me here. It is clear that they will not, because sets themselves are not hashable. I think it unlikely that we could make them generally hashable, so we’ll not go down that path. So we need a new plan.

Generally speaking, when I have worked with XST, I have created a new type, perhaps called XSet, and implemented it to have the properties that it needs to have. It would be very pleasant, in Python, if we could implement the in capability, as it is a very nice notation. I must do some research to determine what it takes to implement in. A tiny bit of research tells me that it is the method __contains__ that allows us to say in.

Let’s devise a test that we would like to have pass. I think I’ll use simpler values this time.

    def test_xset_in(self):
        list1 = [Atom("x", 1), Atom("y", 2)]
        list2 = [Atom("y", 2), Atom("x", 1)]
        assert list1 != list2
        set1 = XSet(list1)
        set2 = XSet(list2)
        assert set1 == set2

This is a bit of a reach but I think we can do it.

We start with this, which continues to fail the test:

class XSet:
    def __init__(self, a_list):
        self.contents = a_list

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return False
        else:
            return False

We need a somewhat better definition of set equality. There are many possibilities. Two sets A and B are equal iff:

every element a of A is in B and every element b in B is in A
A is a subset of B and B is a subset of A
A - B == B - A == the null set
A.symmetric_difference(B) is the null set
and surely many more

I think we’ll go with the subset definition because I’m working toward in:

class XSet:
    def __init__(self, a_list):
        self.contents = a_list

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.is_subset(other) and other.is_subset(self)
        else:
            return False

    def is_subset(self, other):
        if not isinstance(other, self.__class__):
            return False
        for atom in self.contents:
            if atom not in other.contents:
                return False
        return True

Our test passes. I am almost surprised: it worked the first time. We can simplify this code a bit.

    def is_subset(self, other):
        if not isinstance(other, self.__class__):
            return False
        him = other.contents
        me = self.contents
        return all(atom in him for atom in me)

I made the local variables for “efficiency” with no real reason to think it matters.

We need a harder test.

    def test_xset_in(self):
        list1 = [Atom("x", 1), Atom("y", 2)]
        list2 = [Atom("y", 2), Atom("x", 1)]
        assert list1 != list2
        set1 = XSet(list1)
        set2 = XSet(list2)
        assert set1 == set2
        set3 = XSet([Atom("z", 1), Atom("y", 2)])
        assert set1 != set3

Still passes, no surprise.

Now I would really like to have XSet respond to in, so that my earlier test could be recast to work. Let’s copy and adjust it and see if we can pull this one out of the fire.

    def test_xset_records_in(self):
        r1 = XSet([Atom("jeffries", "last"), Atom("ron", "first")])
        r2 = XSet([Atom("chet", "first"), Atom("hendrickson", "last")])
        r2rev = XSet([Atom("hendrickson", "last"), Atom("chet", "first")])
        r3 = XSet([Atom("hill", "last"), Atom("geepaw", "first")])
        personnel = XSet([r1, r2])
        assert r1 in personnel
        assert r2 in personnel
        assert r2rev in personnel  # this test killed Python {}
        assert r3 not in personnel

I think this is written such that it should pass when things work right. It’s possible that there is an error in the test but I don’t think so.

It’s failing now:

>       assert r1 in personnel
E       TypeError: argument of type 'XSet' is not iterable

One of the ways that in is implemented is by iterating the collection in question. Let’s try that. Since our underlying data structure is a list, we ca probably just do this:

class XSet:
    def __init__(self, a_list):
        self.contents = a_list

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.is_subset(other) and other.is_subset(self)
        else:
            return False

    def __iter__(self):
        return self.contents.__iter__()

The test runs. in works. We could rewrite is_subset:

    def is_subset(self, other):
        if not isinstance(other, self.__class__):
            return False
        return all(atom in other for atom in self)

Time to commit: XSet in works and reflect.

Reflection

Wow, 348 lines already this morning. Time to stop, most likely. Anyway …

I sort of noticed that in addition to set, Python has a frozenset type. I think that frozenset is hashable. That might mean that the test that fails on sets would work on the frozen version. Must find out.

    def test_frozen_sets(self):
        r1 = frozenset([Atom("jeffries", "last"), Atom("ron", "first")])
        r2 = frozenset([Atom("chet", "first"), Atom("hendrickson", "last")])
        r2rev = frozenset([Atom("hendrickson", "last"), Atom("chet", "first")])
        r3 = frozenset([Atom("hill", "last"), Atom("geepaw", "first")])
        personnel = frozenset([r1, r2])
        assert r1 in personnel
        assert r2 in personnel
        assert r2rev in personnel  # <======
        assert r3 not in personnel

It passes!! Most interesting. If we were to make all our sets immutable, via frozenset, we would inherit all the nifty set operations that I was hoping we could use in XST. It almost seems as if much of it would just work.

I must read more about frozen sets and do some experiments.

That aside, we have found a way to write our own class that can likely do the job, so we are at a good spot relative to trying some XST stuff in Python.

Why do we want that? Well, the impetus for me was the desire to search a database of documents for various key-value pairs, which surely can be done in other ways with a less nuclear weapon than extended set theory. But XST would provide a generality that might be nice, and, well, I like it.

Since my apparently endless series of articles is really just an extended joy project for me, we’ll push this idea a bit further and see where it leads.

So far, it’s interesting … and frozenset, which I just stumbled over, may make it even more interesting.

See you next time!

Yours Truly, of course. ↩
FAFO = “Fool Around and Find Out”, more or less … ↩