FAFO 14: Select

FAFO on GitHub

In which, we report on a nifty little thing, and then, well, I don’t know yet what I’ll do. Some concluding remarks on what it is that I do here.

Hello, friends!

Last evening during the desultory beginning of FGNO¹, while waiting for things to liven up, I implemented a nice little method on XSet, select, and then used it in restrict. It goes like this:

    def test_select(self):
        def sel(a):
            return a.element > 3

        s1 = XSet.tuple_set((0, 1, 2, 3, 4, 5, 6))
        selected = s1.select(sel)
        assert Atom(4, 5) in selected

To use select, you provide a function that, when passed an atom from a set, returns a boolean. If True, the atom is in the result, and if False, it is not.

In the test above, we just check to see if the element part is greater than three. Since the input set is a tuple set, each element has an index and they go { 0¹, 1², …, 4⁵, … }.

The select function is this:

    def select(self, cond):
        return XSet((candidate_atom for candidate_atom in self.contents 
            if cond(candidate_atom)))

select just scans the contents of the set, and, well, selects the ones where cond is true. Quite straightforward. With that in hand, I did a harder test:

    def test_harder_select(self):
        def sel(a):
            return a in likes

        likes = XSet.classical_set((3, 4, 5))
        haves = XSet.classical_set((1, 2, 3, 4, 5, 6, 7))
        result = haves.select(sel)
        null = XSet.null
        assert Atom(1, null) not in result
        assert Atom(2, null) not in result
        assert Atom(3, null) in result
        assert Atom(4, null) in result
        assert Atom(5, null) in result
        assert Atom(6, null) not in result

Here we include items from haves if they are also in likes. (There is an easier way to do that, of course, but we haven’t implemented it yet.) This test is also green.

In addition, the test highlights the need for an easier way to express membership than creating an atom. We really don’t want anyone to even know about atoms.

But now that we have select, we can rewrite restrict to use it:

    def restrict(self, other) -> Self:
        if not isinstance(other, self.__class__):
            return NotImplemented

        def other_has_match(our_atom):
            return any((other_item.element.is_subset(our_atom.element) 
                for other_item in other.contents))
        return self.select(other_has_match)

Here, our condition function is that any element of the “other” set matches (is a subset of) the given record in the source set. This is the same condition we wrote out longhand previously.

Reflection

I did this, not because I needed the new select, but rather speculatively, imagining that it would be useful. But I was also motivated by the complexity of the restrict method, which used to look like this:

    def restrict(self, other):
        if not isinstance(other, self.__class__):
            return NotImplemented
        return frozenset((candidate_atom for candidate_atom in self.contents
                          if any((check_atom.element.is_subset(candidate_atom.element)
                                  for check_atom in other.contents))))

That’s no longer than the newer version (if we ignore a blank line) but it has that rather complicated comprehension, which I found hard to write and maybe harder to understand.

With just the one application of select, I’m not sure it’s bearing its weight, but I am confident that either it will, or that we’ll unwind it. One thing is for sure … as a user of this imaginary XSet library, I might need to select records on some weird condition that set operations cannot provide, and there will have to be some way of doing it.

One example: our speculative problem of a day or so ago, selecting the records with salary greater than some constant. That one will not fall readily into a standard set operation other than one where we pass in a predicate condition.

We can be confident that select will turn out to bear its own weight. Perhaps more important, in these early days with the XSet implementation, we’re learning how things can be done. If we go too far in some direction, that’ll be fine. We can’t know we’re going far enough unless we sometimes go too far. (This observation works better in programming than it does in cliff-walking, so be advised.)

And Now …

Let’s just test in a few other little things. I think I want the has_at method, which will answer whether a set contains a given element-scope pair.

    def test_has_at(self):
        odd_set = XSet([Atom(42, "answer"), Atom(666, XSet.null)])
        assert odd_set.has_at(42, "answer")
        assert odd_set.has_at(666, XSet.null)
        assert odd_set.has_at(666)
        assert not odd_set.has_at(42)

I intend that has_at accepts an element and a scope, but that it defaults the scope to XSet.null, for convenience in working with classical sets. Test does not run, for want of the method. The method should be easy:

    def has_at(self, element, scope=None):
        atom = Atom(element, scope if scope else self.null)
        return atom in self

Green. Commit: has_at.

You might have a question here, and I have an objection. The question is: is this really enough of a test for this method? I believe it is, because I already trust in to work, based on long experience with existing tests. I do think that now that it exists, I’ll quickly revise some tests to use it.

But first, I don’t really like the name has_at. I’ve used it in previous universes, and I didn’t really like it then, either. The method represents this math:

A.has_at(a, b) iff a ∈_b A

When I read that kind of thing without the subscript, I think/say “a element A” or “a is an element of A”. With the subscript, if I were talking with someone, I might say “a is element-b of A” or “a is element sub b of A”.

I would not likely say “A has a at scope b”, or anything like that.

I see no way to make the element the subject of a Python statement.

Would has_es be better? Or maybe has_e_s? Not easy to type.

How about includes? Let’s try that for a while.

    def includes(self, element, scope=None):
        atom = Atom(element, scope if scope else self.null)
        return atom in self

And let’s recast some tests to see how it looks.

I think this one will be a winner:

    def test_harder_select(self):
        likes = XSet.classical_set((3, 4, 5))
        haves = XSet.classical_set((1, 2, 3, 4, 5, 6, 7))

        def sel(a):
            return a in likes
        result = haves.select(sel)
        null = XSet.null
        assert Atom(1, null) not in result
        assert Atom(2, null) not in result
        assert Atom(3, null) in result
        assert Atom(4, null) in result
        assert Atom(5, null) in result
        assert Atom(6, null) not in result

That becomes:

    def test_harder_select(self):
        likes = XSet.classical_set((3, 4, 5))
        haves = XSet.classical_set((1, 2, 3, 4, 5, 6, 7))

        def sel(a):
            return a in likes
        result = haves.select(sel)
        assert not result.includes(1)
        assert not result.includes(2)
        assert result.includes(3)
        assert result.includes(4)
        assert result.includes(5)
        assert not result.includes(6)

And, clearly, this code asks for this method:

    def excludes(self, element, scope=None):
        return not self.includes(element, scope)

Allowing:

    def test_harder_select(self):
        likes = XSet.classical_set((3, 4, 5))
        haves = XSet.classical_set((1, 2, 3, 4, 5, 6, 7))

        def sel(a):
            return a in likes
        result = haves.select(sel)
        assert result.excludes(1)
        assert result.excludes(2)
        assert result.includes(3)
        assert result.includes(4)
        assert result.includes(5)
        assert result.excludes(6)

I like that. I’ll recast the other tests. For example:

    def test_xset_tuple_restrict(self):
        ron = XSet([Atom("jeffries", "last"), Atom("ron", "first"), Atom("boss", "job")])
        chet = XSet([Atom("chet", "first"), Atom("hendrickson", "last"), Atom("boss", "job")])
        hill = XSet([Atom("hill", "last"), Atom("geepaw", "first"), Atom("serf", "job")])
        personnel = XSet.tuple_set([ron, chet, hill])
        boss_record = XSet([Atom("boss", "job")])
        boss_set = XSet.tuple_set([boss_record])
        bosses = personnel.restrict(boss_set)
        assert bosses.includes(ron, 1)
        assert bosses.includes(chet, 2)
        assert bosses.excludes(hill, 3)

I am pleased enough to commit this: rename has_at to includes. Provide excludes.

Reflection

I am a bit concerned about the default in includes, allowing the shorthand where we can leave scope out and it defaults to the null set (the signal for a classical set element). I think it would be better not to default the value but to allow None to stand in for null.

I’ll make that change:

    def excludes(self, element, scope):
        return not self.includes(element, scope)

    def includes(self, element, scope):
        atom = Atom(element, scope if scope else self.null)
        return atom in self

A bunch of tests cry out for the obvious changes, such as:

    def test_xset_restrict(self):
        ron = XSet([Atom("jeffries", "last"), Atom("ron", "first"), Atom("boss", "job")])
        chet = XSet([Atom("chet", "first"), Atom("hendrickson", "last"), Atom("boss", "job")])
        hill = XSet([Atom("hill", "last"), Atom("geepaw", "first"), Atom("serf", "job")])
        personnel = XSet.classical_set([ron, chet, hill])
        boss_record = XSet([Atom("boss", "job")])
        boss_set = XSet.classical_set([boss_record])
        bosses = personnel.restrict(boss_set)
        assert isinstance(bosses, XSet)
        assert bosses.includes(ron, None)
        assert bosses.includes(chet, None)
        assert bosses.excludes(hill, None)

I think that’s legit. Complete then commit: require scope in includes excludes, allow None for null

Reflection

I feel good about includes rather than has_at or any of the other alternatives I thought of. It seemed at first that defaulting scope to null set was good, but with reflection it seemed to me to be error prone.

I do think we may someday want another method, perhaps includes_at_any_scope, hopefully with a shorter name.

We could do a few more set operations, at least including those that frozenset will implement for us, but I think I’ll step back from that for now, for at least two reasons: first, and most important, this article is more than long enough for my reader (hi, reader!). Second, while the implementation using frozenset seems solid, I am a bit concerned about relying on that as my fundamental implementation.

It might be desirable to have an even more rudimentary form of sets. It will almost certainly be desirable to have some specialized forms. Arguably the frozenset implementation is specialized and there should be a more fundamental one? Or does it really matter?

Childs used to say that all that mattered about a class was its membership condition, and that the form of that membership condition didn’t matter, it was the same set if it had the same membership condition. Or something like that. Anyway, no particular representation is “primary”. There are representations (currently only one in this version) and we use them as appropriate. So long as they all support all the same operations, it’s all good.

Last night at FGNO, a member had the audacity to ask me why I was doing this, and what it was for. My answer to the first question is “because it interests me”, and to the second, “I don’t know yet”. It would be nice if I actually had some real use for it, but at present, I have none. I can thrive for a while on the first. I hope you, dear reader, can find some interest here as well. To that end … see the summary.

Summary

As I look at this article and the recent series, and, for that matter, at the last thousand or so code-related articles, they’re all trying to demonstrate a very few things:

Code can be created in an evolutionary fashion and can stay well-designed and well-crafted indefinitely.
There’s lots of thinking involved, all the time, but none of it needs to be cast in concrete: we can improve anything if we need to. Code and its design can be quite malleable.
Testing, and in particular test-first, and in particular TDD, provides an unmatched level of confidence as we evolve the code. Without tests, I’m uncertain and my mistakes bite me. With tests, that happens far less often.
Small steps are so important. When I can commit after every few lines, my progress is stable, and generally in a direction toward more capability better expressed. When I do make missteps, they are small and I can quickly step in the right direction, or, sometimes, quickly revert and then take a different step. When my steps are larger, I more often get trapped in a debugging situation, or with a lot of code that I suddenly do not like.
Programming can be a peaceful, thoughtful, calm endeavor, delivering a feeling of accomplishing something, and doing it well.

I’m not trying to get you to do any particular thing. I am, however, trying to give you things to think about and possibly to try. Why? Because I enjoy what I do when I’m programming, and I wish the same for you. However you find your joy is fine with me, although I do think you might find some among the ideas that I express and demonstrate here.

See you next time, I hope!

Friday Geeks Night Out, a zoom session held reliably every Tuesday evening. ↩