Bear Bites Ron

I make some useful initial observations, and then, well, I crash and burn. And mix a metaphor.

Hello, friends!

The following was the original blurb for this article:

Of course we all know that you have to make all the important design decisions before you start programming. When we know less than we will ever know again. So obviously true.

The blurb, of course, is sarcasm or irony or litotes or one of those trope things. As we program, we learn. If we do not manage to embed our learning into the code, the code will be, well, less learned than it could be. It will fall behind in school, possibly be held back, finally drop out, and will surely wind up on a street corner smoking weed and making catcalls at the smarter programs as they go by.

Note: I think the initial ideas below are pretty valid, but then my particular attempt at evolving the design fails. It happens. It never makes me happy, but it happens. Read on, I’ll tell you when the horror starts.

Don’t let that happen to our programs. When we learn something, let’s put it into the program, so that it learns and improves as well. We have such a case before us, just taking form out of the mists.

The notion has not yet taken form, but we can see enough of it to know what’s coming, because we also know some things about the “theory” part of Extended Set Theory, or, as its friends call it, XST:

Everything in XST comes down to being able to answer whether a given element at a given scope is in the set, and being able to produce all the element-scope pairs that the set contains.

That’s because every set definition comes down to definitions in terms of “for every x at scope y”, “there exists an x at scope y”, “S contains x at scope y”. That’s really pretty much it. So what does it tell us about our program? Let’s look at what we know right now.

We already have operation definitions that use the notions above;
They work on our basic extended set type, the frozenset of 2-tuples containing element and scope;
The XSet itself implements includes and, via __iter__, which it defers to the frozen set, it can produce all its elements.
We have just started on a new way of representing the set contents, the X_tuple, which keeps the whole set in a tuple, with implied scopes of 1, 2, 3, …;

To test the X_tuple, as things stand right now, we will need to test at least the select operator. In principle, one might argue that we need to test all the operations, so we are lucky that we have so few, but since restrict uses select, we’re probably good with just select.

Our longer range plan, though, is to allow any number of different implementations of XSet, each with some custom characteristics that might be useful. Some that come to mind include:

A set representing a set of records all with the same fields;
A set representing a set like the above, represented in comma-separated values (CSV) format;
A set represented in JSON. Could be pretty arbitrary, or might be constrained somehow;
A very large set with associated indexes;

We propose to allow such a set to implement certain operations internally. So far, our very simple X_tuple set just implements includes, basically as a test of how we’ll offer an implementation the opportunity to do an operation and to use the top-level one if the implementation doesn’t do that one.

What is coming out of the mist? I think it is this:

The XSet object will contain definitions of all the set operations we decide to have, implemented in terms of includes and for. It will contain a single member variable, now called contents, which can be any object that implements includes and __iter__, where, of course, it is __iter__ that provides for.

We will probably rename contents to implementation.

But there is something else coming out of the mist. Our new implementation, X_tuple, is a class of our own, and we can give it includes. Our existing implementation is an instance of a Python class, frozenset, and we cannot give it includes by any reasonable means.

Therefore our design needs to change, so that our base implementation is an instance of a class of our own, X_frozen or X_basic or whatever we may choose to call it.

Do we have to do that? No, we do not. Another possibility is to use Python’s in, which can be used to inquire whether one object is in a collection. We could implement includes as in and our rule would then be that all our implementations have to implement in, which one does by implementing __contains__. Since frozenset implements that, we’re good.

But if we did that, all our implementations would (probably) be classes of our own, except for one. True, we might conceivable think of another implementation solely in terms of python objects, or things grabbed from imports, but every time we do that, we expose ourselves to a huge list of methods that those classes support.

It’s just not the way to go. The way to go is two wrap our frozenset into a class of our own and defer operations to it, in the same way we’ll be doing with X_tuple and other such things.

When should we do this? We should do it quite soon. We might wait until we’re done with X_tuple, or we might do it now. I vote for now. Why? Because I do not expect to discover anything thrilling from finishing X_tuple, it’s not presently in use, and our current implementation is tested extensively, which means that when we change it, we’ll have lots of tests that, when they work, give us confidence in what we’ve done.

Let’s Get To It

Note: Right here is where things start to go wrong. At this writing, I do not see quite what happened, but I think that the overall issue is that I’m starting without a clean division between set theory and the specific implementation, and the fuzzy line causes too many things to fail.; I am not sure whether there is anything valuable from here on down, other than the thrilling sight of YT getting more and more confused, while, at the same time, learning things that will be useful next time. I’ll put in some more notes, but if you leave now, that’ll be OK.
Subnote: Further reading and writing of Notes tells me that there is some value in what I do below. If I had called it a Spike, I think I’d have been happy. It was a Spike: I just didn’t plan it that way.

The Big Decision here is what to call the new class. I can imagine: X_basic, X_fundamental, X_frozen, X_FS. I can probably imagine a lot more. Who cares? We can rename it every day if we want to. Let’s use X_frozenset, as that describes what’s inside and as an implementation, that’s a thing we want to know. (Is that a good reason? I don’t care. I just need a reason good enough to get down to it.)

Our main use of frozenset is here:

class XSet:
    def __init__(self, a_list):
        def is_2_tuple(a):
            return isinstance(a, tuple) and len(a) == 2

        self.contents = frozenset(a_list)
        if not all(is_2_tuple(a) for a in self.contents):
            raise AttributeError

I reckon we need a new class, X_frozenset, and we need to rename contents to implementation. First the rename:

    def __init__(self, a_list):
        def is_2_tuple(a):
            return isinstance(a, tuple) and len(a) == 2

        self.implementation = frozenset(a_list)
        if not all(is_2_tuple(a) for a in self.implementation):
            raise AttributeError

I’m not sure I’m going to like that. I can rename it again if need be. Commit: rename contents to implementation.

Now change the creation to use our new set which does not exist yet:

class XSet:
    def __init__(self, a_list):
        self.implementation = X_frozenset(a_list)

We’ll leave the validation up to the X_frozenset, for now. I do think this init is going to change, but not now.

class X_frozenset:
    def __init__(self, a_list):
        def is_2_tuple(a):
            return isinstance(a, tuple) and len(a) == 2

        self.contents = frozenset(a_list)
        if not all(is_2_tuple(a) for a in self.contents):
            raise AttributeError

Eleven tests fail. Excellent. They’ll be failing for lack of __eq__ and __iter__ and perhaps other little details. That’s why we have them. Fie! I tell you, Fie! on those who say we have to be able to predict what will happen. The computer works for me, and one of its jobs is to tell me what happened.

Note: I stand by the notion that the failing tests can generally be trusted to tick through the issues with a new bit of implementation. However, I suspect that the fact that the relationship between XSet and the frozenset implementation is a bit permeable leaves too many doors open for problems.; There is a difficult issue here, however, or at least at this still fuzzy moment I think there is. I’m not entirely clear one what the line should be. Theoretically, it should be at includes and iteration. But we were just working on pushing certain operations down into the implementations. The idea was that a given implementation might know a better way to do a restrict or something. As I worked, below, I wasn’t clear in my mind what parameters an implementation might receive. I think we have to clarify that, in general, it will receive a XSet, never an implementation, unless we use a double dispatch or something similar to give it more details about its set parameters.; But I am not sure. I’ll need to be before I try again.

The first test is in fact failing this assert:

>       assert set1 == set2
E       assert XSet(<xset.X_frozenset object at 0x1017e3e50>) == XSet(<xset.X_frozenset object at 0x1017e3ed0>)

And we therefore need __eq__ in X_frozenset. We’ll try this:

    def __eq__(self, other):
        return self.contents == other

The failing test passes. I said “try” there, because when I typed the method, I wasn’t sure if I could get away with referring to the implementation of the other set or not. I am now sure that I cannot, because it could be a vector or anything. I think we may have come to a realization and not a good one:

Two sets are equal if they have all the same elements at all the same scopes. This is not a question that can be answered directly.

Yikes! We’re going to need a bigger boat, and a bigger implementation of equality. Let’s trudge forward for now and see what else comes up. We may be going to roll all this back. For now, I will remove the __eq__.

Here’s a different failing check:

>       assert (r1, null) in personnel

I tried implementing __contains__, which I thought was how in works, but no luck. So:

    def __iter__(self):
        return iter(self.contents)

Down to 9 tests failing, from 11.

>       assert (r1, null) in personnel
E       assert (XSet(<xset.X_frozenset object at 0x105d17d50>), ∅) in XSet(<xset.X_frozenset object at 0x105d19350>)

The test:

    def test_xset_records_in(self):
        r1 = XSet([("jeffries", "last"), ("ron", "first")])
        r2 = XSet([("chet", "first"), ("hendrickson", "last")])
        r2rev = XSet([("hendrickson", "last"), ("chet", "first")])
        r3 = XSet([("hill", "last"), ("geepaw", "first")])
        personnel = XSet.classical_set([r1, r2])
        null = XSet.null
        assert (r1, null) in personnel
        assert (r2, null) in personnel
        assert (r2rev, null) in personnel  # this test killed Python {}
        assert (r3, null) not in personnel

Note: I am aware that things are going poorly. I don’t think I had given up hope at this point, but I was at least aware that there was trouble. Stopping here would have been premature, I think, although perhaps writing some simpler tests would have been a better idea.

This is not going particularly well. I’ll plug forward a bit, but I am aware that this idea is more tricky than I had thought. I’ll print the set personnel:

XSet(<xset.X_frozenset object at 0x105ed3f50>) @ XSet(<xset.X_frozenset object at 0x105e8f850>)
XSet(<xset.X_frozenset object at 0x105ed2150>) @ XSet(<xset.X_frozenset object at 0x105e8f850>)

I’ll bet that that right hand thing is trying to be a Null, but isn’t. The true null set prints as a slashed zero.

    @classmethod
    def classical_set(cls, a_list) -> Self:
        null = cls([])
        wrapped = [(item, null) for item in a_list]
        return cls(wrapped)

Arrgh. It used to be true that XSet([]) == XSet.null. It appears to be true no longer, because I removed the __eq__. I have to put it back. Let’s put it back correctly, horrid though it may be:

No. Didn’t work. I’m thrashing. Time to stop, but let’s try to learn a bit more first and then revert out of here. Unless a miracle occurs.

Note: OK, I recognize that I’m clearly in trouble. One way to go would have been to stop right here and start over. But I do learn some things by proceeding. Which would have been better? I’m not sure.

I don’t see how to give classical a reference to the null set.

No wait, this works:

    @classmethod
    def classical_set(cls, a_list) -> Self:
        null = cls.null
        wrapped = [(item, null) for item in a_list]
        return cls(wrapped)

Note: That null solution was worth seeing. Stopping now might have been good but …

Down from 9 to 7 failing tests. Maybe we’re not dead yet.

Next fail in the test I was working on:

    def test_xset_records_in(self):
        r1 = XSet([("jeffries", "last"), ("ron", "first")])
        r2 = XSet([("chet", "first"), ("hendrickson", "last")])
        r2rev = XSet([("hendrickson", "last"), ("chet", "first")])
        r3 = XSet([("hill", "last"), ("geepaw", "first")])
        personnel = XSet.classical_set([r1, r2])
        print("null", XSet.null)
        for e, s in personnel:
            print(e, "@", s)
        null = XSet.null
        assert (r1, null) in personnel
        assert (r2, null) in personnel
>       assert (r2rev, null) in personnel  # this test killed Python {}
E       assert (XSet(<xset.X_frozenset object at 0x105ec9ad0>), ∅) in XSet(<xset.X_frozenset object at 0x105ecb8d0>)

I wonder how it checked that. A quick change tells me that using includes does not fix the issue.

I suspect that the issue is that the two sets being compared are not equal, but they should be. Change the test to check that. Fails. We need __eq__ and do not yet have it.

The base definition of XSet equality rests on the old scheme of the implementations all being the same:

class XSet:
    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.implementation == other.implementation
        else:
            return NotImplemented

How can we most readily improve this? I really hate writing out the longhand definition, but I think we may need to just do it. We are almost sure to throw this away, so let’s keep trying to learn things.

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            for e,s in self:
                if other.excludes(e, 2):
                    return False
            for e,s in other:
                if self.excludes(e, s):
                    return False
            return True
        else:
            return NotImplemented

This is longhand for A == B if A subset B and B subset A. I am still getting a failure on the comparison of r2rev and r2 here:

Note: Good exercise, isn’t really the problem. I’m not sure I know what the problem even is.

    def test_xset_records_in(self):
        r1 = XSet([("jeffries", "last"), ("ron", "first")])
        r2 = XSet([("chet", "first"), ("hendrickson", "last")])
        r2rev = XSet([("hendrickson", "last"), ("chet", "first")])
        r3 = XSet([("hill", "last"), ("geepaw", "first")])
        personnel = XSet.classical_set([r1, r2])
        print("null", XSet.null)
        for e, s in personnel:
            print(e, "@", s)
        null = XSet.null
        assert (r1, null) in personnel
        assert (r2, null) in personnel
        assert r2rev == r2
        assert (r2rev, null) in personnel  # this test killed Python {}
        assert (r3, null) not in personnel

OK this is really beyond my limits: I’m going to debug to find out where that comparison is going wrong.

As soon as I do that, I see the “2” where it should say “s”. How did I do that?

Note: Starting that method over would have fixed the 2, but we didn’t have a save point. Debugging caused me to read the code, even though I knew it had to be right. It wasn’t, of course.

Only 4 tests failing now.

Ah a decent error!

    def is_subset(self, other) -> bool:
        if isinstance(other, self.__class__):
>           return self.implementation.issubset(other.implementation)
E           AttributeError: 'X_frozenset' object has no attribute 'issubset'

That’s the kind of error I was hoping for, the ones that tell me that I need to implement a method.

We need to write this out longhand for now.

class XSet:
    def is_subset(self, other) -> bool:
        if isinstance(other, self.__class__):
            return all(self.includes(e, s) for e, s in other)
        else:
            return NotImplemented

Note: This was worth learning. I am building up a list of things to be aware of next time. Unfortunately I was not writing them on cards. Fortunately, I was writing this article, which I can scan for ideas before next time.

This does not make more tests work. I am disappoint. just briefly, though, because it’s a new useful error:

>       assert len(bosses.implementation) > 0
E       TypeError: object of type 'X_frozenset' has no len()

Sweet. We can do this.

class X_frozenset:
    def __len__(self):
        return len(self.contents)

Note: This, too, was worth the learning. I think len is going to turn out to be a key notion, although there are issues with infinite sets or just sets where we’d have to generate the data to get that number. Maybe len is a bad idea …

Down to three. One is this:

    def test_xset_restrict(self):
        ron = XSet([("jeffries", "last"), ("ron", "first"), ("boss", "job")])
        chet = XSet([("chet", "first"), ("hendrickson", "last"), ("boss", "job")])
        hill = XSet([("hill", "last"), ("geepaw", "first"), ("serf", "job")])
        personnel = XSet.classical_set([ron, chet, hill])
        boss_record = XSet([("boss", "job")])
        boss_set = XSet.classical_set([boss_record])
        bosses = personnel.restrict(boss_set)
        assert isinstance(bosses, XSet)
        assert len(bosses.implementation) > 0
        assert bosses.includes(ron, None)
        assert bosses.includes(chet, None)
        assert bosses.excludes(hill, None)

I don’t think we should be looking at that .implementation there. Test still fails, now:

>       assert bosses.includes(ron, None)
E       assert False
E        +  where False = <bound method XSet.includes of ∅>(XSet(<xset.X_frozenset object at 0x105c7e910>), None)
E        +    where <bound method XSet.includes of ∅> = ∅.includes

I’ve lost the thread. Well past time. I have determined that the restrict in the test above is not finding any elements, so the subset-checking in restrict must be failing.

Note: I was moving along smoothly and suddenly, here, I just plain dropped all the balls that I had in the air. Nothing. Brain empty. I do look for just a bit more info, and it may or may not pay off. Cost was very low, just a print.

XSet(X_fs(frozenset({('boss', 'job')}))) subset? XSet(X_fs(frozenset({('hill', 'last'), ('geepaw', 'first'), ('serf', 'job')}))) False
XSet(X_fs(frozenset({('boss', 'job')}))) subset? XSet(X_fs(frozenset({('boss', 'job'), ('hendrickson', 'last'), ('chet', 'first')}))) False
XSet(X_fs(frozenset({('boss', 'job')}))) subset? XSet(X_fs(frozenset({('jeffries', 'last'), ('ron', 'first'), ('boss', 'job')}))) False

Those should say False, True, True.

Let me line one of those up differently:

XSet(X_fs(frozenset({('boss', 'job')}))) subset? 
XSet(X_fs(frozenset({('boss', 'job'), ('hendrickson', 'last'), ('chet', 'first')}))) False

Those look right to me. I am tired, confused, and have lost the thread. Roll back. Tests all back to green.

Summary

The notes above highlight things that we actually did learn from this, um, effort. If I had pre-committed to the notion that it was a spike, to be thrown away, I’d feel better. Truth is, I really thought it would be easy and wold work out.

Would it have been “better” to have rolled back sooner, at least at the second indication that I was thrashing? Perhaps, though there was learning after that.

Breakfast isn’t ready yet, so I had time to burn. That said … I definitely burned it.

Smaller steps? I don’t see quite how, but smaller steps will be good. The first batch of failures was 14 tests, if I recall. Had one change healed most of them, the step might have been small enough. But no, it dropped to 11, 7, 4 … all while red. Clear indication that creating the new implementation set was too big a step.

I think we may need to do at least two things next time:

Try to better define the interface between XSet and the implementations inside it;
Test-drive the new X_frozenset or whatever we pick, before plugging it in, ensuring that it meets the desired interface.

Not as much of a debacle as I thought … if I had called it a spike, we’d all be nodding right now.

See you next time!