Yesterday’s search for strawberries in the XST patch discovered problems. I want to at least double check what we did. TL;DR: It all works. Odd morning.

One way of looking at yesterday’s work is to think “There was a small bug in the system, and we fixed it”. I’m not convinced about “small”. Here’s what concerns me, before we look at the code.

The “small bug” was that the base-level data structure for storing sets did not properly deal with sets as element scopes. I often think of scopes as field names, as in this example:

{ Ronfirst, Jeffrieslast }

Yesterday I read about what a horrible example the notion of “first name / last name” is, because people’s names are far more complex than that. Check Falsehoods Programmers Believe About Names for more. I apologize for my examples.

There are a few storage modes in XST, including special forms for CSV files and for tuples, but the core mode is XGeneric, and that’s where yesterday’s defect was. I have two concerns:

First and foremost, I want to be sure that XGeneric is now bulletproof, fully supporting extended sets as elements and scopes throughout. Second, I’d like to at least check the XTuple to be sure it can support XSets as elements.

My desire is to inspect the code to be sure. My better angels tell me that I should write some tests. Because I’m trying to be good today, I’ll do both.

I see something that may be a problem, but may not

Here’s how we add elements to an XGeneric:

function XGeneric:addAt(element, scope)
    assert(scope~=nil,"nil scope "..tostring(element))
    if XSet:isXSet(scope) and scope:isNull() then scope = NULL end
    if self:hasAt(element,scope) then return end
    if not self.contents[scope] then
        self.contents[scope] = {}
    end
    self.contents[scope][element] = true
    return self
end

I’ll write a test to demonstrate the concern.

        _:test("Sets as scopes", function()
            local S1 = XSet:fromTable{field="value"}
            local S2 = XSet:fromTable{field="value"}
            local Single = XSet()
            Single:addAt("foo", S1)
            Single:addAt("foo", S2)
            _:expect(Single:card()).is(1)
        end)

The two sets S1 and S2 are equal but not identical. I am concerned that the code above will put the value “foo” in theree twice. But the test runs. I am a bit surprised. Why does it run correctly?

Ah. The line:

    if self:hasAt(element,scope) then return end

If hasAt works, this method is safe. Here’s hasAt, which we believe we fixed yesterday:

function XGeneric:hasAt(element, scope)
    local elements = self:elementsAtScope(scope)
    return self:elementsInclude(elements,element)
end

function XGeneric:elementsAtScope(aScope)
    for scope,elementsTable in pairs(self.contents) do
        if scope == aScope then return elementsTable end
    end
    return {}
end

function XGeneric:elementsInclude(tab,element)
    for e,_true in pairs(tab) do
        if e==element then
            return true
        end
    end
    return false
end

Ah, good. elementsAtScope returns the elements table when the scope is equal, not identical. So when we go to add the S2 version, S2==S1 so we return our table with the existing “foo” in it.

I am reassured. However, next to the addAt is this:

function XGeneric:putAt(element,scope)
    assert(scope~=nil,"nil scope "..tostring(element))
    self.contents[scope] = {} -- remove any existing contents at this scope
    self.contents[scope][element] = true
    return self
end

Now it turns out that putAt is only used in the XStatistics sum function:

function XStatistics:sumFunction()
    return function(input, accumulator, control)
        for _i,field in ipairs(control.sum) do
            local accum = accumulator:at(field) or 0
            accum = accum + input:at(field) or 0
            accumulator:putAt(tostring(accum),field)
        end
    end
end

We can either look deeper into this code, or just try to break it. Let’s try to break it, it’ll be more fun. Let’s find a test to work from.

        _:test("No grouping", function()
            local control = {sum={"pay"} }
            local result = People:stats(control)
            _:expect(result:card(),"result").is(1)
            local correct = result:hasElement{pay="3300"}
            _:expect(correct).is(true)
        end)

Ah. Now that my memory is refreshed, I see that we provide the scope of the “field” we want to sum on, and the sum goes back into that scope. (We haven’t addressed what we might do about two statistics on the same field, such as count or standard deviation.) I can see at least two ways to go.

First, we might create a set with sets as scopes and make the stats function work on it. That would be the mathematically correct thing to do. But second, we could simply constrain the stats function to string scopes only. I am so tempted to do the second one. But let’s at least write the test.

I start by making a test that works:

        _:test("Set as Scope", function()
            local scope = XSet:fromTable{x="x"}
            scope = "value"
            local rec = XSet()
            rec:addAt(300,scope)
            local recs = XSet:tuple{rec, rec}
            local scopeSet = {scope}
            local control = {sum=scopeSet}
            local result = recs:stats(control)
            local find = {}
            find[scope] = "600"
            local has600 = result:hasElement(find)
            _:expect(has600).is(true)
        end)

That was harder than I’d have liked. There’s a message here about tests that are hard to write. The main thing I had to remember was that the output is always converted to a string, thus the “600” there toward the end. Anyway, this test runs, and if I comment out the scope-"value" it should still run, but I don’t think it will.

To my amazement, it does run, even with the set as a scope. What if the two scopes were equal but different?

        _:test("Set as Scope", function()
            local S1 = XSet:fromTable{x="x"}
            local S2 = XSet:fromTable{x="x"}
            local S3 = XSet:fromTable{x="x"}
            local S4 = XSet:fromTable{x="x"}
            --scope = "value"
            local rec1 = XSet():addAt(300,S1)
            local rec2 = XSet():addAt(300,S2)
            local recs = XSet:tuple{rec1, rec2}
            local scopeSet = {S3}
            local control = {sum=scopeSet}
            local result = recs:stats(control)
            local find = {}
            find[S4] = "600"
            local has600 = result:hasElement(find)
            _:expect(has600).is(true)
        end)

Here I’ve created three equal scope sets, S1, S2, S3, S4. I use each one once, but since they are equal, we have a right to expect the summing to work right. Surprisingly, it works. This is so surprising that I decide to dump the result set.

            for e,s in result:elements() do
                print("result rec")
                for ee,ss in e:elements() do
                    print("  rec", ee, " at scope")
                    for eee,sss in ss:elements() do
                        print("    ", eee.."@"..sss)
                    end
                end
            end

The dump is correct too, which I suppose it had to be:

result rec
  rec	600	 at scope
    	x@x

Time for heads our of the sand.

Retro …

I started out thinking that this couldn’t possibly work. I wrote a test that I feel sure describes the issue I was concerned about, namely that scopes that were equal but not identical would mess up the statistical sum. The test passes.

We are left with at least two possible actions now.

One is to say “OK, the hardest test I could think of passes, let’s move on to something interesting”. The other is to decide to dig into the code to understand why it works. Either of these can be the right thing to do sometimes. If our tests are solid and they run, the details aren’t terribly important, unless we’re actually working in that area and perhaps not even then. And, more understanding is always good.

Today, I’m going with the second option for a while. Let’s see why we were smarter than I thought we were.

Statistics calculation starts here:

function XStatistics:getStatistics(controlTable)
    return self.data:reduce(XSet(), function(result,record,scope)
        local accumulator = self:produceAccumulator(result,record)
        self:sumFunction()(record, accumulator, self.control)
        return result
    end)
end

We find the desired accumulator record–the one with our sum in it– using produceAccumulator:

function XStatistics:produceAccumulator(result, record)
    return XStatistics:produceAccumulatorMatching(result, self:createMatchRecord(record))
end

function XStatistics:produceAccumulatorMatching(result, matchRecord)
    return result:restrictElement(matchRecord):at(NULL) or self:createResultRecord(result,matchRecord)
end

The result set is of course an XSet. What does restrictElement do?

function XSet:restrictElement(E)
    return self:restrict(XSet():addAt(E,NULL))
end

This code works because restrict works. We could trace through to see why restrict works, and that will bring us back to the code we fixed yesterday to handle sets that are equal.

So I am not about 99% convinced that this is just fine. I won’t say 100%, because I think it’s unwise every to be dead certain: we might wind up certainly dead.

I am satisfied. What have we learned?

Summary

Certainly an odd morning’s work. I had two things in mind to do, making sure that XGeneric is bulletproof, and checking on XTuple. In doing the first, I drilled down into XStatistics, a digression but an important one.

In fairness, possibly important. Based on how many orifices I had to jump through to write that test, and given the most common use of statistics (if there were any uses at all), no one is likely to ever do statistics on a set with set scopes. Still, it should either work or be disallowed, rather than return wrong answers.

So the result was some additional testing, and the tests worked, which provides additional confidence in XGeneric.

Was it worthwhile? Yes! I woke up this morning and in my thinking I became uncertain that cases like these were properly handled. If this were a real product, out there in the hands of users, a subtle problem like this one could be very serious. So the right thing to do, I suggest, is to create a card or task or moment to go in and make sure that the product is solid. We might find “yes, already solid”, or we might find “wasn’t solid but is now”. Either way, we owe it to ourselves, our team, our product, and our customers to be sure that we’re delivering what we set out to build.

So yes, worthwhile.

But what about those XTuples? Well, I did just take a quick glance at it, and I am pretty confident in the code, because it’s quite simple. But I think we owe ourselves a test. I’d make a card on that: “Test XSets as elements of tuples”, and put it on the board to be done. Maybe next time.

For this time, thanks for tuning in. Odd morning.