Hold My Chai

In an astounding flurry, we are going to build a rename operation.¹

Hello, friends!

This morning, I propose to build a set operation that will rename desired fields of a set, while leaving any others untouched. Let me show you a test:

    def test_rename(self):
        new_names = XSet.from_tuples((
            ('first', 'first_name'), 
            ('last', 'last_name')))
        person = XSet.from_tuples((
            ('ron', 'first'), 
            ('jeffries', 'last'), 
            ('serf', 'job')))
        renamed = person.rename(new_names)
        assert renamed.includes('jeffries', 'last_name')
        assert renamed.includes('ron', 'first_name')
        assert renamed.includes('serf', 'job')

This desired operation is “just like” our re_scope operation, except that it leaves untouched any fields not mentioned in the renaming set.

Note that the desired operation just renames the elements in one set. In a relational system, we would presumably want to rename the elements of all the records (sets) in the outer relation (set). We’ll address that at a later time. One step at a time.

Let me call my shot this morning in more detail. I propose to build this operation out of other, simpler operations, like this:

Using a new operation, scope_set, get all the existing scopes from the record;
Using a new operation, element_set, get all the old scopes that are to be renamed, from the rename set;
Using a number of new operations, compute a set that renames all the scopes, old to old and old to new as appropriate.²

Let’s get started. I’ll begin by mark-skipping the test above, so that I can commit on greens.

    def test_scope_set(self):
        person = XSet.from_tuples((
            ('ron', 'first'), 
            ('jeffries', 'last'), 
            ('serf', 'job')))
        scopes = person.scope_set()
        assert scopes.includes('first', 'first')
        assert scopes.includes('last', 'last')
        assert scopes.includes('job', 'job')

We will proceed here in two phases. First, we will define all our operations at the XSet level, then, later³, we’ll see about providing more efficient versions at the XImplementation level. So:

class XSet:
    def scope_set(self):
        return XSet.from_tuples((s, s) for e, s in self)

Green. Commit: implement scope_set.

Reflection

This isn’t very well tested, is it? Well, no, but look at the implementation. How wrong could it be?

Let’s move on.

    def test_element_set(self):
        new_names = XSet.from_tuples((
            ('first', 'first_name'), 
            ('last', 'last_name')))
        old_names = new_names.element_set()
        assert old_names.includes('first', 'first')
        assert old_names.includes('last', 'last')

The implementation seems clear:

    def element_set(self):
        return XSet.from_tuples((e, e) for e, s in self)

Green. Commit: implement element_set.

Let’s look in a bit more detail at part 3 of our plan:

Using a number of new operations, compute a set that renames all the scopes, old to old and old to new as appropriate.

I propose to implement symmetric difference, which will itself be made up of a few other set operations, although it could also be open-coded. The symmetric difference of two sets is all the elements that are in one or the other, but not both. There are a few well-known ways to compute symmetric difference including:

A.sym_diff(B) = (A.union(B)).diff(A.intersect(B)), or
A.sym_diff(B) = (A.diff(B)).union(B.diff(A))

So I need the union and difference operations to make symmetric difference. It has been my long practice to shorten some of those names, like diff rather than difference. I hope you’ll be understanding about that.

    def test_union(self):
        a = XSet.from_tuples((
            ('first', 'first_name'), 
            ('last', 'last_name')))
        b = XSet.from_tuples((
            ('first', 'first'), 
            ('last', 'last')))
        u = a.union(b)
        assert u.includes('first', 'first')
        assert u.includes('first', 'first_name')
        assert u.includes('last', 'last')
        assert u.includes('last', 'last_name')

And …

    def union(self, other):
        mine = set((e, s) for e, s in self)
        others = set((e, s) for e, s in other)
        both = mine.union(others)
        return XSet.from_tuples(both)

This relies on the Python set.union method, which will work for us because our tuples make perfectly good sets.

Now difference:

    def test_diff(self):
        a = XSet.from_tuples((
            ('first', 'first_name'), 
            ('last', 'last_name')))
        b = XSet.from_tuples((('last', 'last_name'), ))
        diff = a.diff(b)
        assert diff.includes('first', 'first_name')
        assert diff.excludes('last', 'last_name')

And, of course …

    def diff(self, other):
        mine = set((e, s) for e, s in self)
        others = set((e, s) for e, s in other)
        remaining = mine.difference(others)
        return XSet.from_tuples(remaining)

And now let’s do symmetric difference:

    def test_sym_diff(self):
        first = XSet.classical_set(('a', 'b', 'c'))
        second = XSet.classical_set(('c', 'd'))
        sym = first.sym_diff(second)
        expected = XSet.classical_set(('a', 'b', 'd'))
        assert sym == expected

Weird: For some reason, I have been checking individual elements in my tests. Why not check for set equality? We have it implemented, after all. Weird.

With

    def sym_diff(self, other):
        d1 = self.diff(other)
        d2 = other.diff(self)
        result = d1.union(d2)
        return result

Green. I forgot to commit diff. Commit: implement diff and sym_diff.

Now before we jump to rename, I want to test-drive computing the rename set. Here’s a new test with a key chunk of code left out:

    def test_sym_diff_names(self):
        old_names = XSet.from_tuples((
            ('first', 'first'), 
            ('last', 'last'), 
            ('job', 'job')))
        new_names = XSet.from_tuples((
            ('first', 'first_name'), 
            ('last', 'last_name')))
        # renames = then a miracle occurs
        expected = XSet.from_tuples((
            ('first', 'first_name'), 
            ('last', 'last_name'), 
            ('job', 'job')))
        assert renames == expected

The result is surely the one we want if old_names are all the names from the set. We want to rename first and last to their new names, and rename job to job.

How do we get that result? How about this, coding directly in the test:

    def test_sym_diff_names(self):
        old_names = XSet.from_tuples((
            ('first', 'first'), 
            ('last', 'last'), 
            ('job', 'job')))
        new_names = XSet.from_tuples((
            ('first', 'first_name'), 
            ('last', 'last_name')))
        replaced = new_names.element_set()
        update = new_names.union(replaced)
        renames = old_names.sym_diff(update)
        expected = XSet.from_tuples((
            ('first', 'first_name'), 
            ('last', 'last_name'), 
            ('job', 'job')))
        assert renames == expected

You may find this hard to think about. Let’s step through it.

We have the old names from the set: we’ll get those with scope_set when we code rename.

old_names = { first^first, last^last, job^{job }}

We have the desired new names: those are given:

new_names = { first^first-name, last^last_name }

We get a list of the old names that will be replaced, using element_set:

replaced = { first^first, last^last }

We union those with the new ones to get the update set

update = { first^first, last^last, first^first_name, last^last_name }

And symmetric difference of the old names and the update set gives us the correct renaming set:

renames = { first^first_name, last^last_name, job^job }

Note: This a a general characteristic of symmetric difference. If in some set we propose to update items x and y to x’ and y’, we can create the set {x, y, x’, y’} and symmetric difference it with the original set. The symmetric difference will remove the duplicated items (x and y) and add in the new ones (x’ and y’). Voila! update.; The astute reader may wonder at this point whether, when working with immutable sets as we are here, it might sometimes be wise to virtualize the symmetric difference operator rather than produce a whole new set with the new items in it. The astute reader is correct. For a small number of updates, pipe-lining through a series of small symmetric differences would slow down read operations a little bit, while speeding up updates profoundly.; Actually doing this is outside our scope (no pun intended) at the moment.

We should be in a position to implement rename now. I’ll turn on the rename test and implement rename.

    def rename(self, re_scoping_set: Self):
        old_names = self.scope_set()
        replaced_names = re_scoping_set.element_set()
        update = replaced_names.union(re_scoping_set)
        renames = old_names.sym_diff(update)
        return self.re_scope(renames)

And we are green. Commit: implement rename using rescoping set.

Let’s reflect and segue into a summary.

Reflection

Hand me back my chai⁴. We implemented a general rename that takes a re-scoping set {old^new} and renames all the scopes mentioned in the re-scoping set, while leaving all the other scopes in the set as they were. We built that operation out of five new set operations: scope_set, element_set, difference, union, and symmetric difference.

Surely you are thinking that no one, absolutely no one, could pull those five operations out⁵ and then combine them to do a rename. As far as I know, you are absolutely right. There is a lot of familiarity with set theory on my part, and a lot of math before that, and a lot of work with Dave Childs and with others and alone, to get me to a point where I can just about do that.

And I did play with it a bit with Pencil and Paper (well, Procreate) last night, to get the math somewhat clear in my mind.

The only thing that is relatively new to me here is the re_scope method, which either I was not aware of until recently, or I had forgotten it. And I had to refresh on scope_set and element_set as well, though I knew that I wanted them. I think I used to know what they were but had forgotten those two, probably because I didn’t have a use for them in the past. Or I just forgot. There’s only so much room in there.

Now we could, presumably, code up a direct implementation of the rename, looking something like this:

for e,s in self:
    find whether s appears in the rename set as s, new
    if yes:
       add in e, new
    else
        add in e, s

The ‘find’ bit is left to the reader. Point is, we could do that. Why, by all that’s holy, would we instead implement five set operations and use them to get the same result? You might well ask.

Here are some reasons, some good, all true:

Building a program from well-defined elementary operations is a good design practice.
We can, if and when we wish, do formal mathematical proofs of our work. (We will rarely if ever do that.)
We can, with practice, reason more accurately in terms of well-defined set operations. (I do that frequently. Others could, with practice.)
We believe that we can implement a complex information system more rapidly this way.
We believe that we can gain interesting run-time efficiencies using this approach
We are having fun doing this.

Items 4 and 5 are matters of belief. I think that 4 is pretty clear, and follows from #1. As for #5, I’ve seen it done in experiments and even a time or two in real life, though we have not done anything like that here. If I last long enough, maybe we will.

I would freely grant that unfamiliarity with math, or with set theory, would not equip a developer to use these things readily. It would take practice. I would freely grant that some of those folx⁶ would be good-enough programmers that they could probably implement whatever user-oriented thing we might do without all the theory. I am not sure they could do it more reliably or faster, but it would probably come down to individual differences: I wouldn’t say that everyone familiar with this stuff would out-perform everyone who isn’t.

I do think that one has to grant that an operation that renames all the elements of a set, given a set of new names, leaving unmentioned names alone, is rather neatly implemented as:

    def rename(self, re_scoping_set: Self):
        old_names = self.scope_set()
        replaced_names = re_scoping_set.element_set()
        update = replaced_names.union(re_scoping_set)
        renames = old_names.sym_diff(update)
        return self.re_scope(renames)

I would welcome observations and comments via mastodon (ronjeffries at mastodon dot social) or email (ronjeffries at acm dot org).

Caveats

What would happen if we wanted to rename all the fields of all the records of our 1000-record file-resident set? Well, bunkie, I have to admit that we’d create a thousand record set of four fields, and rename every single one of them, resulting in a lot of stuff in memory. That would probably be bad.

We’ll want to address how we might do that sort of thing, and the general question of large sets and exceedingly complex sets, both of which we could probably create and hurt ourselves with. We’ll do that as time goes on.

Summary

Called it. Rename, done with five new set operations, all in one session, from 0730 to 1000 hours.

I am pleased. See you next time!

If you can name the source of “in an astounding flurry” I will give you an official Ron Jeffries hat tip on the social medium of my choice. ↩
A small miracle will be needed here, but I know what it is. ↩
By later, I mean “not today”. ↩
I don’t drink beer: I don’t like it. Especially not at 0730. ↩
… out of some orifice, that is … ↩
Folks, of course. I say “folx” in an attempt to be more inclusive. I hope it’s not too trendy, or too naff. If it’s too liberal, well, you can get your set theory somewhere else. ↩