Calculated Fields?

FAFO on GitHub

Suppose we want a set where one element is the sum of two others. What about calculating a scope? I am interrupted and then spot a squirrel.

Suppose we want a set of “records”:

{ { a^A, b^B, a+b^C }, … }

It’s easy enough to code that up directly, but how could we somehow provide a little language, or even a just a function, to do it?

I think we should start with “easy enough to code up directly” and see if it is in fact easy. I get about this far and become irritated:

    def test_calc_field(self):
        r1 = XSet.from_tuples(((1, 'a'), (2, 'b')))
        r2 = XSet.from_tuples(((10, 'a'), (20, 'b')))
        records = XSet.n_tuple((r1, r2))
        result = []
        for rec, _s in records:
            a = rec.element_at('a')
            b = rec.element_at('b')
            c = a + b
            new_rec = XSet.from_tuples(((a, 'a'), (b, 'b'), (c, 'c')))
            result.append(new_rec)
        new_set = XSet.from_tuples(result)
        ...

I invented the element_at method on the fly and have not implemented it yet. Its definition is something like if there is a scope-element pair with the scope provided, return the element part, otherwise None. We might want to say “exactly one” such pair. Right now, I just want a reasonable way to take a record apart. And down at the bottom, I want to quickly say that I want to see a == 1, b == 2 and c == 3, and a == 10 and b == 20 and c == 30 in the result set.

Let me finish the test the hard way:

Mysterious Pause

I was interrupted just after typing the sentence above. It was around 1500 hours Friday. Now it is 0812 Saturday. And I have had some thinking, none of it making me happy. Let’s do finish the test. I’ve come up with a different name for the method that doesn’t exist.

    def test_calc_field(self):
        r1 = XSet.from_tuples(((1, 'a'), (2, 'b')))
        r2 = XSet.from_tuples(((10, 'a'), (20, 'b')))
        records = XSet.n_tuple((r1, r2))
        result = []
        for rec, _s in records:
            a = rec.get('a')
            b = rec.get('b')
            c = a + b
            new_rec = XSet.from_tuples(((a, 'a'), (b, 'b'), (c, 'c')))
            result.append(new_rec)
        new_set = XSet.from_tuples(result)
        c1 = new_set.get(1)
        assert c1.get('c') == 3

There is no get method. I declare it to return any element of the set whose scope is the one provided, or None if there is no such element. We’ll write it now:

    def get(self, scope):
        for e, s in self:
            if s == scope:
                return e
        return None

I am curious why my test did not go green. I haven’t looked at it this morning to see why it was failing, so there may be some other difficulty.

Yes. The test needs to create an n-tuple of records. Couched this way, we’re green:

    def test_calc_field(self):
        r1 = XSet.from_tuples(((1, 'a'), (2, 'b')))
        r2 = XSet.from_tuples(((10, 'a'), (20, 'b')))
        records = XSet.n_tuple((r1, r2))
        result = []
        for rec, _s in records:
            a = rec.get('a')
            b = rec.get('b')
            c = a + b
            new_rec = XSet.from_tuples(((a, 'a'), (b, 'b'), (c, 'c')))
            result.append(new_rec)
        new_set = XSet.n_tuple(result)
        c1 = new_set.get(1)
        assert c1.get('c') == 3

Reading the test code and passing over the many dunder methods in XSet made me think of allowing one to subscript an XSet with the same meaning as get: return an element with that scope, or None.

Recast the test:

    def test_calc_field(self):
        r1 = XSet.from_tuples(((1, 'a'), (2, 'b')))
        r2 = XSet.from_tuples(((10, 'a'), (20, 'b')))
        records = XSet.n_tuple((r1, r2))
        result = []
        for rec, _s in records:
            a = rec['a']
            b = rec['b']
            c = a + b
            new_rec = XSet.from_tuples(((a, 'a'), (b, 'b'), (c, 'c')))
            result.append(new_rec)
        new_set = XSet.n_tuple(result)
        c1 = new_set[1]
        assert c1['c'] == 3

This code demands that we implement another dunder method:

    def __getitem__(self, scope):
        return self.get(scope)

I thought of inlining rec['a'] and so on but they’re used twice so we’ll leave that. We’re green. Commit: new test studying calculated fields. Drives out get and getitem.

Reflection

Let’s reflect. Looking at that test, we see a fairly simple idea, which is something like:

Create a relation with a couple of records with a couple of fields. Create a derived relation including also the sum of those two fields.

But the code! It’s more like:

Create a record set. Create another record set. Create a set containing those records. Create a list. Iterate over the set. On each iteration get the elements, calculate the function. Make a record. Put it in the list. When done, make a set from the list.

Now some of this hassle only hassles us in tests, but honestly there’s value to making tests easier to write and read. But the make a list and then make a set of it? I’m sure that occurs in our production code quite a bit. So sure that I’m not even going to look, yet.

SQUIRREL!!

I decide to divert to work on building sets more readily.

We’ll make a new test file for this.

class SetBuilder:
    pass


class TestSetBuilder:
    def test_exists(self):
        sb = SetBuilder()

So far so good. Let’s sketch how it might work. Our task, since it’s fresh in our mind, will be to build a set containing a couple of records each with a couple of fields. To keep it general, we’ll make the records have different shapes.

I immediately decide to go in smaller steps. First, commit: initial SetBuilder and test.

Then this first test:

    def test_small_set(self):
        sb = SetBuilder()
        sb.put('a', 'A')
        sb.put('b', 'B')
        xset = sb.set()
        assert xset['A'] == 'a'
        assert xset['B'] == 'b'

I am imagining that somehow, SetBuilder will know, or be told, what kind of set we’re trying to build. Or maybe there are different kinds. I don’t know yet. I’m using the tests to work out what a reasonable API for set building might be.

This test fails for want of those methods, of course.

class SetBuilder:
    def __init__(self):
        self.contents = []

    def put(self, element, scope):
        self.contents.append((element, scope))

    def set(self):
        return XSet.from_tuples(self.contents)

We are green. Commit: SetBuilder creates Xset.

Now let’s see about creating a set of records. No, first let’s allow this code:

    def test_streamlined(self):
        xset = SetBuilder()\
            .put('a', 'A')\
            .put('b','B')\
            .set()
        assert xset['A'] == 'a'
        assert xset['B'] == 'b'

And in SetBuilder:

    def put(self, element, scope):
        self.contents.append((element, scope))
        return self

I already wish I could nest these. Let’s do the test initially planned, a set of records.

    def test_set_of_records(self):
        rb = SetBuilder()
        r1 = SetBuilder()\
            .put('a', 'A')\
            .put('b', 'B')\
            .set()
        rb.put(r1, 1)
        r2 = SetBuilder()\
            .put('aa', 'A')\
            .put('bb', 'B')\
            .set()
        rb.put(r2, 2)
        records = rb.set()
        assert records[2]['B'] == 'bb'

That passes. Here’s what I would write on paper, that I can’t write in code:

records = {
{ a^A, b^B }¹,
{ aa^A, bb^B }²
}

Even that’s a pain, of course. And Python is quite picky about what you can and cannot do inside braces, so anything close to that is right out.

Fortunately, in set operations, we don’t usually have to do anything that explicit, so the notation here mostly affects us in tests. I think we’ll want to enhance SetBuilder, but it has shown enough value that I’ll move it to prod and use it.

I find and change this:

    def generic_re_scope(self, other):
        sb = SetBuilder()
        for e, s in self:
            for old, new in other:
                if old == s:
                    sb.put(e, new)
        return sb.set()

Green. Commit: use SetBuilder.

I see a number of places in tests for use of SetBuilder, but am surprised to find no other uses in the actual set operations. I guess I was just noticing how many times I had to build lists in tests.

Summary

The calculation test drove out the ability to “subscript” into a set and fetch a value from a given scope. It would be more pythonic, I suppose, if it were to raise an exception on absent scopes, but I prefer the convenience of returning None, I think. So far, we’ve not had the occasion to worry about either case. Maybe we’ll decide differently and change it. The code works for us, after all.

I’m not at all sorry about chasing that squirrel. The SetBuilder is somewhat useful already, will make writing tests easier, and may lead to something more useful. So it was a useful squirrel.

I observe that specifying calculations on fields in sets is a pain, not least because we have to do all that fetching of values and creating new fields. What I’d like to do would be to be able to write a little function and have a set operation that iterated the set, applying the function and putting the result into a new set. Or maybe a set that did that on the fly, mot bothering to create a new physical set at all.

We’ll return to this, the original topic at the top of the article, soon, and see if we can come up with something good. I am afraid that, for any real use, we will wind up needing to parse some kind of little expression language, so that something like

FROM S X = A + B*C

could somehow magically turn into X = S[‘A’] + S[‘B’]*S[‘C’], and actually do the job. We’d need a little parser / interpreter. That’s entirely feasible, but at this writing does not seem like fun to me. We’ll see.

I remain interested in our whole list of ideas. The trick will be to find the ones that best combine utility, fun, and learning. Overall, I remain uncertain of the future direction. I can’t wait to find out what we do.

See you then, I hope!