XST 23

Transformations, optimizations, and the relationship between OO and XST. Got some thinking to do. You get to watch, if you’re tough enough. At least a tiny bit of code.

Let’s begin with a glance at some code that bugged me yesterday. This function is meant to return either the index for a given value of a given scope, or nil:

function XSet:findSelectIndex(scope,value,index)
    if not index then return nil end
    local indexes = index:elementProject("INDEXES")
    if not indexes then return nil end
    local fieldIndex = indexes:elementProject(scope)
    if not fieldIndex then return nil end
    return fieldIndex:elementProject(value)
end

We use this method here:

function XSet:selectFieldValue(scope,value, index)
    local ix = self:findSelectIndex(scope,value,index)
    if ix then
        print("used index")
        return self:scopeRestrict(ix)
    else
        return self:select(function(record, recordScope)
            return record:hasAt(value,scope)
        end)
    end
end

This code is only mostly correct. It’s just used in our test for indexing, where we have created a set containing a set under “INDEXES, which contains a set under “state”, which contains a set under “MI”, which contains the scopes, i.e. record numbers, of the records of the input set that have the value “MI” under their “state” scope.

(When I say “contains x under y”, i mean that the set in question has an element {…,x^y,…} among its elements. I just invented this phrase, perhaps not for the first time, to represent that situation.)

In the fullness of time, my plan is something like this:

Sets of type XExpression have an XSet named xset as their contents. Inside xset is to be found an XSet under “DATA”, and that set is the actual data for the set in question. The convention to be is that there will be other sets in xset, with potentially useful stuff in them. In particular, there can be a set under “INDEXES” as described above.

Now I think we have at least one significant design issue here, because this magical xset variable only appears in sets of the XExpression class, and so we dare not even ask the question ‘findSelectIndex` without being careful about where and how that function is actually implemented. Most classes will have to respond nil. So as implemented now, as scaffolding, the function surely really should return nil in XSet and do all this activity in XExpression. Easy to fix if we are sure, or confident that we can change it again if need be.

The Desired Flow

What we’re really trying to do, according to me, is to create a “smart” restrict function, that will analyze the restricting set to see if it is suitable for indexing, then find the necessary indexes, if they exist, and then instead of a restrict, do the scopeRestrict that just pulls out records by their record number.

For that to be a very good idea, the restricted input set needs to be a Tuple, so that it can be directly accessed by record number.

You see that it’s getting complicated and not easy to think about.

And yet, the Lua object-oriented code to do this stuff isn’t really that bad. And there’s the dilemma:

Choosing between OO, XST, and Other

I’ve been thinking about building some kind of “expression language” to represent some planned set of XST operations, and some kind of analyzer / optimizer that would make decisions about how to transform the expression to make it faster. All with the sort of guaranteed correctness of mathematics.

(The guaranteed correctness of mathematic is, in layman’s terms: this is probably correct if you’ve been really careful, and you’ve written a proof for it, and a bunch of other mathematicians don’t dispute it. It’s a lot like the guaranteed correctness of computer programs.)

The question in my mind this morning is how to balance the rather delightfully simple expression of XST in object-oriented terms, the reasonably simple functions like the one above for choosing optimizations, versus the perhaps more general solution of building an expression tree of some kind and manipulating it.

Automated code refactoring might serve as a good example. The best way to automate code refactoring is believed to be to work from an abstract tree-like translation of the program, as opposed to working directly with the program text. Simple renaming will serve as an example.

Suppose we have a method selectFieldValue and we want to rename it to findFieldOfGivenValue. We could just grep the hell out of the code and substitute the strings. That would almost certainly be bad. We’d have to fiddle with our grep to make sure we didn’t change selectFieldValues, and we’d need to do some kind of magic to figure out whether each usage was calling the version that we want to change, instead of the perfectly good version that we want not to rename in another class. It quickly becomes impossible.

It’s never easy, but with a parse tree for the whole program, you have a chance.

Similarly, there are some things we might do in XST that would be easier to process with an abstract process model rather than just in code. There’s a simple example:

A:restrict(B):scopeSet()

This would better be done as

A:restrictToScope(B)

Unless our program has that whole idea A:restrict(B):scopeSet() in hand, it won’t know there’s a better way until it’s too late to do anything about it. So it’s “clear” that our optimizations are limited without the expression tree idea, or some equivalent.

Laziness As a Strategy

Suppose that we want to know how many people are in Michigan. We write:

C = People:restrict({ {MI^state} }):scopeSet():card()

And suppose that when we say that, nothing really happens. Well, not quite nothing: the system remembers that those things were supposed to be done, and at some future time if the answer is needed, the system tugs on the end of the string, causing it to start delivering data.

This might therefore look like someone tugging on the scope set, saying “give me your elements”. The scope set then could look at its input, see that it is a restrict, and trigger the necessary analysis to decide what to do. For example, if the restricting set B is just { MI^state}, the whole expression can be replaced with, roughly,

A["INDEXES"]["state"]["MI"]:card

That is, fetch the pre-existing index set for Michigan and ask for its cardinality. And, if that index is a Tuple, that just does this:

function Tuple:card()
    return #self.contents
end

Doesn’t even count ‘em. Tuples know their length. So in the presence of the right indexes, that complex set expression just returns the length of an already existing array.

Oh, there’s a little work in there, munging around in the index set and so on, but as we’ve seen, that’s just a few element project operations on some sets that are in memory.

It all seems very tempting. And I don’t know how to do it. What I do know how to do is to do what we’ve been doing here for 23 articles so far, write OO code that does many of the same things. We’ve seen some significant optimizations done that way. And we see some that are unlikely, but a clever user of our operations could get the same result by themselves.

I think we can proceed as we have been for a while yet. There are a few more useful set operations to build, and we should at least finish up making restrict discover and use an index if there is one. Meanwhile, or afterwards, as the winds may blow, we can look into how we might build up a lazy structure that’s amenable to at least a bit of analysis and optimization.

Which brings me back to this code that I don’t like:

function XSet:findSelectIndex(scope,value,index)
    if not index then return nil end
    local indexes = index:elementProject("INDEXES")
    if not indexes then return nil end
    local fieldIndex = indexes:elementProject(scope)
    if not fieldIndex then return nil end
    return fieldIndex:elementProject(value)
end

I really wanted to say this:

return index:elementProject("INDEXES"):elementProject(scope):elementProject(value)

Just fetching my way down the path until I either found what I wanted or element project returned nil. Thing is, if it returns nil in the middle, say looking for scope, you send elementProject to nil, and nil doesn’t understand.

I could presumably field an exception, but that seems more than a bit naff. I could have an elementProject that returns NULL instead of nil. That would probably work, but in general, a NULL is a valid value. Suppose that there are no People in North Dakota. (This may be the case: I’m not sure.) Then the index for ND in our INDEXES might legitimately be a NULL set and that would imply, not “no index available” but “no records as a result of your search”.)

So we can’t do that.

What if we did have elementProjectOrNULL? Could we legitimately say this:

return index:elementProjectOrNULL("INDEXES"):elementProjectOrNULL(scope):elementProject(value)

I think we could. Shall we try it?

function XSet:elementProjectOrNULL(scope)
    return self:elementProject(scope) or NULL
end

function XSet:findSelectIndex(scope,value,index)
    if not index then return nil end
    return index:elementProjectOrNULL("INDEXES")
        :elementProjectOrNULL(scope)
        :elementProject(value)
end

Tests are green. Commit: used elementProjectOrNULL in findSelectIndex.

I like that better. It’s not a bit step of progress but I think that it’s generally desirable to move to simpler code when we can. It pays off in easier changes later.

Let’s think about what that findSelectIndex is doing. In essence, it’s following a path, drilling down into an input set, in this case index. It’s a sort of nested element project.

Let’s create that method.

elementProjectPath

        _:test("elementProjectPath", function()
            local answer = XSet(3,"result")
            local inner = XSet():addAt(answer,"inner")
            local middle = XSet():addAt(inner,"middle")
            local nearTop = XSet():addAt(middle,"nearTop")
            local top = XSet():addAt(nearTop,"top")
            local found = top:elementProjectPath{"top","nearTop","middle","inner"}
            _:expect(found:hasAt(3,"result")).is(true)
        end)

This’ll fail not finding the method:

38: elementProjectPath -- Tests:508: attempt to call a nil value (method 'elementProjectPath')

Now we merely have to write it. I’m not sure I like its calling sequence ,but it’s analogous to elementProject.

function XSet:elementProjectPath(pathTable)
    local result = self
    for i,p in ipairs(pathTable) do
        result = result:elementProject(p)
        if not result then return nil end
    end
    return result
end

There was a typo in the test, should have said:

            local answer = XSet():addAt(3,"result")

The test runs. I’ll try a path that doesn’t work.

        _:test("elementProjectPath", function()
            local answer = XSet():addAt(3,"result")
            local inner = XSet():addAt(answer,"inner")
            local middle = XSet():addAt(inner,"middle")
            local nearTop = XSet():addAt(middle,"nearTop")
            local top = XSet():addAt(nearTop,"top")
            local found = top:elementProjectPath{"top","nearTop","middle","inner"}
            _:expect(found:hasAt(3,"result")).is(true)
            found = top:elementProjectPath{"top","nearTop","OOPS","inner"}
            _:expect(found).is(nil)
        end)

However, I think the input shouldn’t be a table, it should be a tuple. Change the test:

        _:test("elementProjectPath", function()
            local answer = XSet():addAt(3,"result")
            local inner = XSet():addAt(answer,"inner")
            local middle = XSet():addAt(inner,"middle")
            local nearTop = XSet():addAt(middle,"nearTop")
            local top = XSet():addAt(nearTop,"top")
            local found = top:elementProjectPath(Tuple{"top","nearTop","middle","inner"})
            _:expect(found:hasAt(3,"result")).is(true)
            found = top:elementProjectPath(Tuple{"top","nearTop","OOPS","inner"})
            _:expect(found).is(nil)
        end)

Test will fail on the table access or something worse.

No, the expects just fail. Why? Because you can in fact iterate a class instance. It’s just that you get nothing out of it.

Make it expect a set:

function XSet:elementProjectPath(aTuple)
    local result = self
    for pathElement,_scope in aTuple:elements() do
        result = result:elementProject(pathElement)
        if not result then return nil end
    end
    return result
end

Commit: new elementProjectPath method.

Now we can use this in our find. And we can delete that OrNULL method we just wrote. It was weird anyway.

function XSet:findSelectIndex(scope,value,index)
    if not index then return nil end
    local path = Tuple{"INDEXES",scope,value}
    return index:elementProjectPath(path)
end

Tests green. Commit: findSelectIndex uses elementProjectPath. elementProjectOrNULL removed.

It’s 1100 hours. Let’s sum up.

Summary

An interesting cycle this morning. Let me try to explain why.

I mostly wanted to think about how to do more advanced optimizations, even though it’s probably premature to actually do anything about them, since I don’t see a good starting foothold. But we had an optimization coming into being to look at, and it included some code that I didn’t like, that long series of if statements. Very ad hoc.

So after a bit of musing, I turned back to that code and devised a clever–perhaps too clever–operation to make it more compact. And it was compact.

Looking at that code, I saw that it was still pretty ad-hoc, since you’d have to write out longhand any nested element fetching that you wanted. That made me think of providing a path and looping over it.

Looping over the path made the new elementProjectOrNULL method unnecessary. I had not foreseen that. In fact, I was supposing that I’d have to do something tricky to use the OrNULL for all but the last element of the table. The code told me otherwise.

Then looking at that code, about the fourth generation of the find thing, I realized that passing in a raw table wasn’t consistent with what we’re doing here, namely processing sets.

So we converted the code to use a Tuple set instead. A Tuple, because otherwise you’d get a random order on your path, and no good could come of that.

Kent Beck used to tell us to “let the code participate in the design discussion”. Here we see a terrific example of why. Each version I wrote worked, and taught me something. Being willing to write it over and over again gave me the opportunity to see some good ways to do what was needed … and to see the flaws in each one, leading to a better one.

Is this one the best possible? I’d never say that. But it does provide an ability that it seems I need, to drill down into a nested structure to get something out.

I do think that Dave Childs may have had another approach to that, and I’ll look to see if I can find it. Just now, I rather like this one.

I believe that it was the combination of some high level design thinking, plus a few detailed iterations on a small part of the system, that led to a generally useful and very clean result.

It may look like thrashing. It might seem that if I had just thought harder about the problem, the nice solution would have popped into my mind, and I could have saved “all that typing”. Frankly, typing is not my bottleneck. Getting good ideas into my mind is the bottleneck, and seeing code that could be improved is a big part of getting good ideas.

For me. YMMV. These are the things that happen to me, for you to observe and learn from. If you only learn “don’t do like Ron”, that’s OK by me. But I do think there’s a bit more in here than that, and so must you, or you wouldn’t still be reading.

For me a fun day and some fun discoveries of things that were OK to do and things that were better to do.

See you next time!