A Better Way?

FAFO on GitHub

Laurent challenged me to find a better, more elegant formulation for getting the correct re_scoping set from a provided renaming set. I thought I had it. Then I thought I hadn’t. Then I thought I had.

I took Laurent’s challenge to my heart, and to my iPad, and last night, late, worked out what I thought was a better formulation. I felt sure that to consider the solution better, it needed to involve symmetric difference, because symmetric difference is so cool that I felt sure Laurent would lean that way.

Last night, I thought I had it, and sent him my solution.Then I headed for bed, later than usual. By the time I got there, I had thought of a case that my solution didn’t handle. Too late, I turned in.

This morning I found a note from Laurent. It seems that he found the same solution that I had: at least he congratulated me. But I am still concerned about a specific case as you’ll see.

Let’s talk about the basic requirement. The notion is that we want to rename some of the scopes in a set, and we want to leave all the rest alone. We propose to do this with a re_scope operation, which does a rename, but also returns a set containing only the names mentioned in the control set. This makes re_scope a lot more powerful, because, for example, if you have a set of a million records, and you re_scope by this set:

{ 13579¹, 251² },

you’ll get a set back containing first, record number 13579 from the original set, and then record 251, as an n-tuple. Very powerful. We want to retain that power, and we want rename to be a bit different from that. That’s why there are two methods, though we do use re_scope to do rename, by formulating a re_scope set that does the job.

Laurent’s original bug report involved the user supplying pairs renaming an old name to itself, which caused us to lose that field entirely, not at all what we wanted. Here is the test for that, now working:

    def test_rename_per_laurent(self):
        new_names = XSet.from_tuples((('first', 'first_name'), ('last', 'last')))
        person = XSet.from_tuples((('ron', 'first'), ('jeffries', 'last'), ('serf', 'job')))
        renamed = person.rename(new_names)
        assert renamed.includes('jeffries', 'last')
        assert renamed.includes('ron', 'first_name')
        assert renamed.includes('serf', 'job')

When it failed, this test lost the jeffries^last entirely, clearly not what we might want. So, with my fix, we have an implementation that doesn’t lose fields. And Laurent says he has a much nicer formulation (and I assumed that it has the same behavior as my current implementation, which turns out not to be the case, as you’ll see below).

I knew that Laurent thought his solution was very nice, and he had said something about not using union. That gave me a clue: it was very likely that Laurent’s solution involved symmetric difference? Why? Because symmetric difference is a very powerful operator and it tends to produce compact and elegant expressions for things. (It is also nearly as hard to think about as APL, but when one has one’s math hat on, that’s OK.)

To solve the problem, first I scribbled on my iPad, until I got a formula that I thought would work, and then wrote several tests. They all passed. Here’s the first one, essentially the original form of the bug that Laurent found in my first implementation:

    def test_new_rename(self):
        s = XSet.from_tuples((
            ('a','x'), 
            ('b','y'), 
            ('c','z')))
        r = XSet.from_tuples((
            ('y', 'y'), 
            ('z', 'w')))
        sss = s.scope_set()
        assert sss == XSet.from_tuples((
            ('x', 'x'), 
            ('y','y'),
            ('z','z')))
        res = r.element_set()
        assert res == XSet.from_tuples((
            ('y', 'y'), 
            ('z', 'z')))
        r_clean = r ^ res
        assert r_clean == XSet.from_tuples((
            ('z','w'),
            ('z', 'z')))
        scoper = sss ^ r_clean
        assert scoper == XSet.from_tuples((
            ('x', 'x'), 
            ('y','y'), 
            ('z', 'w')))

That is, in fact the result we want.

I included all those interspersed assertions more to document the steps than as actual tests: they show the intermediate results.

Let me try to explain in words how it works:

sss is the scope set of the input s, just each of its scopes duplicated top ad bottom. That set, used in a re_scoping operation, would rename everything to be what it already is. We actually want that, because, except for the ones we’re changing, we want the others unchanged. So this set is a bit more than we want, because it will contain any names that the r set might want to remove. We intend to fix that as we proceed.

res is the element set of the input r. Those pairs are all the names mentioned in the element side of each pair in r, namely the list of names to be changed.

r_clean is r ^ res. Here, that will remove the y^y from r and add in z^z.

Then we compute scoper = sss ^ r_clean, which, because symmetric difference, adds in the x and y pairs, and removes the z, giving us the result we wanted. The test passes.

Now let’s use a working test to produce a compact form of our solution. Here’s my working Laurent test without the intermediate assertions:

    def test_new_rename_compact(self):
        s = XSet.from_tuples((('a','x'), ('b','y'), ('c','z')))
        r = XSet.from_tuples((('y', 'y'), ('z', 'w')))
        sss = s.scope_set()
        res = r.element_set()
        r_clean = r ^ res
        scoper = sss ^ r_clean
        assert scoper == XSet.from_tuples((('x', 'x'), ('y','y'), ('z', 'w')))

Inline r_clean:

    def test_new_rename_compact(self):
        s = XSet.from_tuples((('a','x'), ('b','y'), ('c','z')))
        r = XSet.from_tuples((('y', 'y'), ('z', 'w')))
        sss = s.scope_set()
        res = r.element_set()
        scoper = sss ^ r ^ res
        assert scoper == XSet.from_tuples((('x', 'x'), ('y','y'), ('z', 'w')))

Inline sss and res:

    def test_new_rename_compact(self):
        s = XSet.from_tuples((('a','x'), ('b','y'), ('c','z')))
        r = XSet.from_tuples((('y', 'y'), ('z', 'w')))
        scoper = s.scope_set() ^ r ^ r.element_set()
        assert scoper == XSet.from_tuples((('x', 'x'), ('y','y'), ('z', 'w')))

That little formula just has to be right. Isn’t it pretty? So that’s what I had last night.

Then, nearly asleep, I thought of this test:

    def test_new_rename_with_all_duplicates(self):
        s = XSet.from_tuples((
            ('a','x'), 
            ('b','y'), 
            ('c','z')))
        r = XSet.from_tuples((
            ('x', 'x'), 
            ('y', 'y'), 
            ('z', 'z'), 
            ('z', 'w')))
        scoper = s.scope_set() ^ r ^ r.element_set()
        expected = XSet.from_tuples((
            ('x', 'x'), 
            ('y', 'y'), 
            ('z', 'w')))
        assert scoper == expected

That fails, and I thought that expected was what we should expect. This test passes:

    def test_new_rename_with_all_duplicates(self):
        s = XSet.from_tuples((
            ('a','x'), 
            ('b','y'), 
            ('c','z')))
        r = XSet.from_tuples((
            ('x', 'x'), 
            ('y', 'y'), 
            ('z', 'z'), 
            ('z', 'w')))
        scoper = s.scope_set() ^ r ^ r.element_set()
        expected = XSet.from_tuples((
            ('x', 'x'), 
            ('y', 'y'), 
            ('z', 'z'), 
            ('z', 'w')))
        assert scoper == expected

This rename will keep all of x y and z scopes plus add in a w-scoped element for anything formerly scoped as z. Last night, nearly asleep, I thought that was a bug. Now, I’m coming to think that it’s a feature. I am thinking that for two reasons:

First—and you can call me weird—the formulation of the operation is so elegant that it just has to make sense.

Second, I am starting to see how to explain, even in words, what the renaming set can and will do. Something like this:

The rename operator accepts a renaming set of scoped elements,

{ …, old_name^new_name, … },

and produces a result set where any element x^old_name in the input set will appear as x^new_name in the result set. Any elements of the input set whose scopes are not mentioned in the renaming set will appear as is.

The renaming set can contain multiple elements with the same old name and different new names. In this case, the element renamed will appear with each of the new names as scope. To rename an element and retain the old name as well, include both old_name^old_name and old_name^new_name in the renaming set.

For example, if we have

{ jeffries^last, ron^first },

and we rename with { last^last_name, last^last },

we will get

{ jeffries^last_name, ron^first, jeffries^last }

Note that ‘jeffries’ appears under both scopes ‘last_name’ and ‘last’.

I think a description like that, plus a few examples in the manual or the tests, is quite reasonable.

I decide to retain this definition. So now I can install this new implementation in the rename method:

    def rename(self, old_to_new_re_scope_set: Self):
        # renames this set, not its contents sets
        complete_re_scope = self.rename_to_re_scope(old_to_new_re_scope_set)
        return self.re_scope(complete_re_scope)

    def rename_to_re_scope(self, r):
        return self.scope_set() ^ r ^ r.element_set()

All 103 tests continue to pass. But I need to change them to call the rename_to_re_scope method, not their own local versions.

    def test_new_rename_inline(self):
        s = XSet.from_tuples((('a','x'), ('b','y'), ('c','z')))
        r = XSet.from_tuples((('y', 'y'), ('z', 'w')))
        sss = s.scope_set()
        assert sss == XSet.from_tuples((('x', 'x'), ('y','y'),('z','z')))
        res = r.element_set()
        assert res == XSet.from_tuples((('y', 'y'), ('z', 'z')))
        r_clean = r ^ res
        assert r_clean == XSet.from_tuples((('z','w'),('z', 'z')))
        scoper = sss ^ r_clean
        assert scoper == XSet.from_tuples((('x', 'x'), ('y','y'), ('z', 'w')))

    def test_new_rename_method(self):
        s = XSet.from_tuples((('a','x'), ('b','y'), ('c','z')))
        r = XSet.from_tuples((('y', 'y'), ('z', 'w')))
        scoper = s.rename_to_re_scope(r)
        assert scoper == XSet.from_tuples((('x', 'x'), ('y','y'), ('z', 'w')))

    def test_new_rename_with_all_duplicates(self):
        s = XSet.from_tuples((('a','x'), ('b','y'), ('c','z')))
        r = XSet.from_tuples((('x', 'x'), ('y', 'y'), ('z', 'z'), ('z', 'w')))
        scoper = s.rename_to_re_scope(r)
        expected = XSet.from_tuples((('x', 'x'), ('y', 'y'), ('z', 'z'), ('z', 'w')))
        # note that this will duplicate (z, z) and (z, w) AS INTENDED
        assert scoper == expected

I retain the first test to show how it works, and change the next two to call the method in the XSet class.

Commit: new formulation for converting rename set to re_scope operation

Reflective Remarks

The process here is a bit ragged, for two reasons. First, last night I was writing the tests to develop the method, and first thing this morning, before I even started to write, I wrote the test with z^z, thinking it reflected a bug, and decided during writing that, no, I was wrong, it was not wrong, it was right. So my thought process was a bit /\/\/\/\ to begin with.

Second, what might have been better would have been to stop, save my new scheme in the test, then carefully refactor it into XSet. Instead, I just slapped it in and then redid the tests. Doing the tests to call XSet first probably would have been better.

Overall, I think it is fair to say that the formula sss^r^res is so compact as to be nearly impossible to take in, and nearly impossible to analyze for correctness. In my long experience with both math and extended set theory on the computer, this is a perennial issue. Set theory gives us the ability to do very powerful things with very few operations. This is particularly true when we use symmetric difference. It’s just hard to think about.

Now I invariably say that our code needs to be clear, readable, obviously correct. The new method is really none of those, by most standards:

    def rename_to_re_scope(self, r):
        return self.scope_set() ^ r ^ r.element_set()

Maybe a better name for the method?

    def rename(self, old_to_new_re_scope_set: Self):
        # renames this set, not its contents sets
        complete_re_scope = self.convert_rename_to_re_scope(old_to_new_re_scope_set)
        return self.re_scope(complete_re_scope)

    def convert_rename_to_re_scope(self, r):
        return self.scope_set() ^ r ^ r.element_set()

Helpful, but far from perfect. Commit: rename method.

I don’t know. It’s a dilemma: the formal expression for things can be very compact. Could it be that explaining temporary variable names would be more clear? We did have those in the tests:

    def test_new_rename_inline(self):
        s = XSet.from_tuples((('a','x'), ('b','y'), ('c','z')))
        r = XSet.from_tuples((('y', 'y'), ('z', 'w')))
        sss = s.scope_set()
        assert sss == XSet.from_tuples((('x', 'x'), ('y','y'),('z','z')))
        res = r.element_set()
        assert res == XSet.from_tuples((('y', 'y'), ('z', 'z')))
        r_clean = r ^ res
        assert r_clean == XSet.from_tuples((('z','w'),('z', 'z')))
        scoper = sss ^ r_clean
        assert scoper == XSet.from_tuples((('x', 'x'), ('y','y'), ('z', 'w')))

Might help. Doesn’t seem to help much. I welcome ideas, but I think that the point of an XST system is to provide mathematically correct operations and use them in correct ways.

Summary

This past few days has been a bumpy ride for me. First, Laurent found a bug, which I fixed easily. Then he challenged me to find a better implementation, and with some scribbling I came up with one, the same one he had in mind. Good sign! Then I thought that the elegant solution did a bad thing. Then I realized it was a good thing once I really thought about it.

And now I have a compact lovely little expression whose actual operation is clear to almost no one, but that I am confident works exactly as we want.

Very interesting indeed.

See you next time!