XST 40: Refactoring Set Operations

I guess there’s nothing for it but to figure out how to rewrite, i.e. refactor, a set operation based on the existence of helper structures. But how? I have ideas but are any of them any good?

We have an elementary tree structure to represent a series of set operations. The background idea is that there might be a higher-level language, like SQL or something, that generates the tree of operations that need to be executed to perform the query.

I have no immediate plans to write such a language. I can imagine a public clamor great enough to induce me to try, but it’s about at the Gaga or Adele level of volume, so I think I’m safe.

The “idea” is that once a tree has been created that shows a solid set-theoretic way to get the answer, there are often other trees that get the same answer but that are more efficient. (There are also at least a countable infinity of trees that are worse. We’ll try to avoid at least half of those.)

The Question is: How?

We might imagine a truly generic tree structure, with certain values at the nodes, and a “pattern”, a smaller tree structure, that we can slide around the tree to see whether the pattern is matched. If it is, then the pattern by its nature identifies the places where the matched bit can be snipped out and a substitution pasted in.

tree match and substitution

I’m sure that somewhere in the literature there are articles and code showing how to do arbitrary tree-pattern matching and substitution. And I know for sure that there are articles about automated refactoring that are at least somewhat applicable, reaching back to the early 1990s. And LISP systems have written and rewritten programs on the fly even before that.

Suppose we were a little company, working to release our amazing set theoretic data management software. We’ve been working on it for almost 40 days now (part time, mind you), so we’re getting anxious to release the product. And we feel strongly that it must include at least one automated optimization at the top level. We already have some small optimizations in hasAt, and one partially implemented in selectValues, but we think that the big sales point for the product is that we’ll be releasing new optimizations over time–probably for a small add-on price.

As the developers, we know that we don’t really know how to do automated refactoring on an abstract syntax tree, and we don’t really know how to do expression replacement in LISP, and we’re not programming in LISP anyway, we’re programming in Lua.

We’re afraid that until we figure out a general purpose pattern matching and substitution ability, we won’t be able to deliver the ongoing improvements that Marketing sees as critical to success. This would take time, perhaps a lot of time. And that conflicts with our other need, which is to have the capability demonstrable at the upcoming Festival of Set Theory, to be held (online of course) Real Soon Now.

Were we not who we are, we’d be quite afraid that we’d paint ourselves into a corner. But we are who we are, so we are pretty confident that we can do something small and simple now, and that using our solid design and refactoring principles, we’ll be able to plug in something more general if and when we figure out how to do it.

Armed with the confidence that comes from years of incremental development, we decide to go ahead with something simple.

You Call This Simple??

We’re going to try an automated refactoring that completes the one we have done in code: selectFieldValue.

The selectFieldValue operation is defined here:

function XSet:selectFieldValue(scope,value, index)
    local ix = self:findSelectIndex(scope,value,index)
    if ix then
        return self:scopeRestrict(ix)
    else
        return self:select(function(record, recordScope)
            return record:hasAt(value,scope)
        end)
    end
end

The third parameter there, index, is optional. Let’s trace how it works a bit further:

function XSet:findSelectIndex(scope,value,index)
    if not index then return nil end
    local path = XSet:tuple{"INDEXES",scope,value}
    return index:elementProjectPath(path)
end

If there is a third parameter, it’s expected to be an XSet, and we expect it to have this form:

{ indexes^INDEXED }

And we expect indexes to have the form:

{ ageIndex^age, state_index^state }

And we expect, say, state_index, to have the form

{ mi_index^MI, oh_index^OH }

And we expect xx_index to be a set of record numbers (record scopes) that identify the records of state xx.

The elementProjectPath operation just fetches down that chain and either returns the desired index or nil.

function XSet:elementProjectPath(aTuple)
    local result = self
    for pathElement,_scope in aTuple:elements() do
        result = result:elementProject(pathElement)
        if not result then return nil end
    end
    return result
end

So if a set comes back to selectFieldValue, it’ll be used to scopeRestrict the input set rather than performing the long-hand implementation via select.

Our plan for today is to replicate enough of this logic to actually rewrite the tree to perform the scopeRestrict operation if it is appropriate.

This seems like a lot of mechanism. Let’s see if we can do it incrementally.

Admission, nay, statement

I freely state here that I am not sure just how to proceed. I only see vaguely what needs to be done.

A Small Step

I’m going to stop writing and draw another picture.

what-is-xxx

I propose to write a test that performs the substitution above. I am really unsure what XXX is, but if it works, I guess it can be anything. Anyway, It’s just the first test.

We’re working in the XPR spike.

I’m not even done with the test and I’m scaling back my plan. I want to just test finding the place in the string to do the job.

No, on third thought, that’s harder than the substitution. I’ll press on:

        _:test("substitute", function()
            local receiver = XPR{op="setNamed", arg1="A"}
            local fieldName = XPR{op="string", arg1="state"}
            local fieldValue = XPR{op="string", arg1="MI"}
            local operation = XPR{op="selectValue", arg1=fieldName, arg2=fieldValue}
            local save = XPR{op="saveAs",arg1="C",arg2=operation}
            
            local match = XPR{op="selectValue", arg1="1", arg2="2"}
            local replacement = XPR{op="scopeRestrict", arg1="1", arg2="XXX"}
            
            replace(save,match,replacement) -- then a miracle occurs
            
            local replaced = save.arg2
            _:expect(replaced.op).is("scopeRestrict")
        end)

I expect this to fail looking for replace, which for now I’ve just left as a function.

2: substitute -- Tests:35: attempt to call a nil value (global 'replace')

Sweet. What could replace possibly do?

function replace(tree, match, replacement)
    -- we're on `save` and arg2 needs to be replaced. 
    -- we just "know this".
    local to_replace = tree.arg2
    local replacement = XPR{op=replacement.op, arg1=replacement.arg1, arg2=replacement.arg2}
    if replacement.arg1 == 1 then
        replacement.arg1 = to_replace.arg1
    elseif replacement.arg1 == 2 then
        replacement.arg1 = to_replace.arg2
    end
    if replacement.arg2 == 1 then
        replacement.arg2 = to_replace.arg1
    elseif replacement.arg2 == 2 then
        replacement.arg2 = to_replace.arg2
    end
    -- replacement is ready
    tree.arg2=replacement
end

I just went up on my lines a bit here. I changed the test to expect numeric 1,2 rather than string “1”, “2” to plug in the old arguments, and long-hand coded something that’s supposed to do that. The test actually runs:

        _:test("substitute", function()
            local receiver = XPR{op="setNamed", arg1="A"}
            local fieldName = XPR{op="string", arg1="state"}
            local fieldValue = XPR{op="string", arg1="MI"}
            local operation = XPR{op="selectValue", arg1=fieldName, arg2=fieldValue}
            local save = XPR{op="saveAs",arg1="C",arg2=operation}
            
            local match = XPR{op="selectValue", arg1=1, arg2=2}
            local replacement = XPR{op="scopeRestrict", arg1=1, arg2="XXX"}
            
            replace(save,match,replacement) -- then a miracle occurs
            
            local replaced = save.arg2
            _:expect(replaced.op).is("scopeRestrict")
        end)

To make it fail, I can change the expect.

2: substitute  -- Actual: scopeRestrict, Expected: scopeRestrictFOO

So that’s nice. I have of course finessed the search, because I think it’s hard. Let’s extend the test to check the args.

Oops. The selectValue actually needs three args. Fix the test.

        _:test("substitute", function()
            local receiver = XPR{op="setNamed", arg1="A"}
            local fieldName = XPR{op="string", arg1="state"}
            local fieldValue = XPR{op="string", arg1="MI"}
            local operation = XPR{op="selectValue", arg1 = receiver, arg2=fieldName, arg3=fieldValue}
            local save = XPR{op="saveAs",arg1="C",arg2=operation}
            
            local match = XPR{op="selectValue", arg1=1, arg2=2}
            local replacement = XPR{op="scopeRestrict", arg1=1, arg2="XXX"}
            
            replace(save,match,replacement) -- then a miracle occurs
            
            local replaced = save.arg2
            _:expect(replaced.op).is("scopeRestrict")
            _:expect(replaced.arg1).is(receiver)
            _:expect(replaced.arg2).is("XXX")
        end)

Run test. I expect it to work, but I don’t expect it very strongly. It does work. Fascinating.

Of course if we were doing a replacement with more args, we’d be in trouble. And that code, oh how awful.

We could change our XPR object to have an array of arguments. But maybe we could do something more local. Let me try something here.

function replace(tree, match, replacement)
    -- we're on `save` and arg2 needs to be replaced. 
    -- we just "know this".
    local to_replace = tree.arg2
    local replacement = XPR{op=replacement.op, arg1=replacement.arg1, arg2=replacement.arg2}
    local args = get_args(to_replace)
    local reps = get_args(replacement)
    for i, rep in ipairs(reps) do
        if type(rep)=="number" then
            reps[i] = args[rep]
        end
    end
    replacement.arg1 = reps[1]
    replacement.arg2 = reps[2]
    -- replacement is ready
    tree.arg2=replacement
end

function get_args(xpr)
    return { xpr.arg1, xpr.arg2, xpr.arg3 }
end

So I’m basically pretending that the args are an array. And the test runs, so the unwinding part works. But I don’t really want to make the arguments in XPR an array. I think it’ll be better with them given even these not very exciting names. So lets’s add get_args and put_args methods to XPR.

function XPR:get_args()
    return { self.arg1, self.arg2, self.arg3 }
end

function XPR:put_args(ary)
    self.arg1 = ary[1]
    self.arg2 = ary[2]
    self.arg3 = ary[3]
end

And use them:

function replace(tree, match, replacement)
    -- we're on `save` and arg2 needs to be replaced. 
    -- we just "know this".
    local to_replace = tree.arg2
    local replacement = XPR{op=replacement.op, arg1=replacement.arg1, arg2=replacement.arg2}
    local args = to_replace:get_args()
    local reps = replacement:get_args()
    for i, rep in ipairs(reps) do
        if type(rep)=="number" then
            reps[i] = args[rep]
        end
    end
    replacement:put_args(reps)
    -- replacement is ready
    tree.arg2=replacement
end

Let’s review what we have, and what we haven’t.

Review

We have a fairly general replacement function, which, given a node in the tree, and s single replacement pattern node, can compute an actual replacement node, plugging in the arguments from the original node as desired in the replacement.

What we do not have is a way to find the right location in the tree to do the replacement, nor do we have a general way to wire the new one back in.

In particular, if the tree has more than one reference to the subtree, we’re in big trouble. However, I don’t think we’re going to have to deal with that for at least two reasons:

First, we’re probably not ever going to build trees that do that: there’s no reasonable language for doing it without a named-set reference
Second, we’re certainly not going to do it now. We are brave, but not stupid, and since we don’t have a solution for that in mind, we’ll just not do it.

Is this “technical debt”? No. Technical debt is about discovering that we now know a better way to do something we’ve done in a decent way. What we have here is “story slicing”. We have carved off a thing that we can do, and we are going to do it well.

Doubtless there is an amazing high-tech way to do this. Doubtless in our free time we’ll even think about it, and read about it. But for now, we don’t need to solve the general problem: we’ll be happy solving the specific case.

LISP

Yes, probably we should write a LISP interpreter, or buy one, and do all this in LISP. That won’t get us to the Festival of Set Theory, but our simple example surely will.

We don’t have the ability to find the pattern in the tree and wire it back in, at least not quite. But we may be pretty close.

We’ll find out over the next few sessions. For now, let’s commit: Rudimentary function replace substitutes one set operation for another.

See you next time for another thrilling chapter …