It’s Christmas, I’m waiting for the household to wake up, and I enjoy what I’m doing. Perfect holiday so far!

I even had a nice visit with one son and his wife, and briefly saw the other and his, yesterday, so it’s all good. Of course, if programming on weekends and holidays is not your thing, more power to you!

I closed last time with code that displayed the results of my test for grouped summing:

        _:test("Grouping", function()
            local control = {sum={"pay"}, group={"state"} }
            local result = People:stats(control)
            _:expect(result:card(),"result").is(2)
            local report = ""
            for record,_ignored in result:elements() do
                for value,field in record:elements() do
                    report = report..field.."="..value.."\n"
                end
                report = report.."\n"
            end
            print(report)
        end)

The result of that is this display in Codea’s console:

pay=3000
state=MI

pay=300
state=OH

That’s even the correct answer. Let’s come up with a way to actually test results more easily. I’ll try a thing or two right in the test.

        _:test("Grouping", function()
            local control = {sum={"pay"}, group={"state"} }
            local result = People:stats(control)
            _:expect(result:card(),"result").is(2)
            local foundMI = result:exists(function(rec,scope)
                return rec:at("state") == "MI" and rec:at("pay") == "3000"
            end)
            _:expect(foundMI).is(true)
        end)

Here I use “exists” to assert that a record exists with state=MI and pay=3000. Test runs. Now I’ll imagine a new set operation that will let me encode that information more compactly, in a table.

            local foundOH = result:hasElement{state="OH",pay="300"}

I’m not sure about the function name, but the idea is there. The test should fail, not finding hasElement.

5: Grouping -- TestSum:68: attempt to call a nil value (method 'hasElement')

I wonder if that’s hard to write. Probably not.

function XSet:hasElement(valueTable)
    return self:exists(function(rec,_s)
        return false
    end)
end

This should fail the test, once I actually check the result in the test.

            _:expect(foundOH,"OH").is(true)

5: Grouping OH -- Actual: false, Expected: true

I love it when a plan comes together. The name of this method is troubling, because we have hasAt and hasntAt, but maybe it’s OK. I am tempted to name it hasRecord. But we don’t have an agreed definition for record, yet.

function XSet:hasElement(valueTable)
    return self:exists(function(rec,_s)
        local hasAll = true
        for scope,value in pairs(valueTable) do
            hasAll = hasAll and rec:hasAt(value,scope)
        end
        return hasAll
    end)
end
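To see the idea outside of XSet, here is a minimal sketch using plain Lua tables. It is not the XSet code: `matchesAll`, the free function `hasElement`, and the array-of-tables representation are all stand-ins assumed for illustration. Unlike the version above, it stops at the first mismatched field rather than and-ing across every field, which gives the same answer.

```lua
-- Hypothetical stand-in: a record is a plain table, a "set" is an array of records.
local function matchesAll(record, wanted)
    for field, value in pairs(wanted) do
        if record[field] ~= value then
            return false   -- one mismatch is enough to reject this record
        end
    end
    return true            -- an empty `wanted` table matches every record
end

local function hasElement(records, wanted)
    for _, record in ipairs(records) do
        if matchesAll(record, wanted) then
            return true
        end
    end
    return false
end

local result = {
    { state = "MI", pay = "3000" },
    { state = "OH", pay = "300" },
}
print(hasElement(result, { state = "OH", pay = "300" }))   -- true
print(hasElement(result, { state = "OH", pay = "3000" }))  -- false
print(hasElement(result, { state = "MI" }))                -- true
```

Note that a partial table such as `{state="MI"}` matches any record with that state, so the check is "contains these fields", not "equals this record".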

Tests run. Let’s use this elsewhere in the grouping test:

        _:test("Grouping", function()
            local control = {sum={"pay"}, group={"state"} }
            local result = People:stats(control)
            _:expect(result:card(),"result").is(2)
            local foundMI = result:hasElement{state="MI",pay="3000"}
            _:expect(foundMI,"MI").is(true)
            local foundOH = result:hasElement{state="OH",pay="300"}
            _:expect(foundOH,"OH").is(true)
        end)

We could inline those, but I think they’re easier to read this way.

Tests run. Commit: Grouping works with one group item. new hasElement set operation takes table input.

What Now?

Let’s try a test for two levels of grouping. I believe it should just work.

        _:test("Two Levels of Grouping", function()
            local control = {sum={"pay"},group={"state","county"} }
            local result = People:stats(control)
            _:expect(result:card(),"result size").is(4)
        end)

This works. Let’s change the data so that there are three output records, by putting two people in the same county. I don’t think that will break any other tests.

        _:before(function()
            p1 = XSet():record{name="p1",state="MI",county="Livingston",pay=1000,bonus=50,age=50}
            p2 = XSet():record{name="p2",state="OH",county="Butler",pay="100", bonus="50", age="30"}
            p3 = XSet():record{name="p3",state="MI",county="Wayne",pay="2000",bonus="50",age="60"}
            p4 = XSet():record{name="p4",state="OH",county="Butler",pay="200",bonus="50",age="40"}
            People = Tuple():add(p1):add(p2):add(p3):add(p4)
        end)

Test fails as expected:

6: Two Levels of Grouping result size -- Actual: 3, Expected: 4

Fix test and add record checks:

        _:test("Two Levels of Grouping", function()
            local control = {sum={"pay"},group={"state","county"} }
            local result = People:stats(control)
            _:expect(result:card(),"result size").is(3)
            local foundOH = result:hasElement{state="OH",county="Butler",pay="300"}
            _:expect(foundOH,"OH").is(true)
        end)

Shall we go for complete? Sure.

        _:test("Two Levels of Grouping", function()
            local control = {sum={"pay"},group={"state","county"} }
            local result = People:stats(control)
            _:expect(result:card(),"result size").is(3)
            local foundOH = result:hasElement{state="OH",county="Butler",pay="300"}
            _:expect(foundOH,"OH").is(true)
            local foundLivingston = result:hasElement{state="MI",county="Livingston",pay="1000"}
            _:expect(foundLivingston,"Livingston").is(true)
            local foundWayne = result:hasElement{state="MI",county="Wayne",pay="2000"}
            _:expect(foundWayne,"Wayne").is(true)
        end)

Nice. Grouping works with 1 or 2 levels, and I’m sure now that it’ll work for N greater than 1. What does it do at zero?

        _:test("No Grouping", function()
            local control = {sum={"pay"}}
            local result = People:stats(control)
            _:expect(result:card()).is(1)
            local totalPay = result:hasElement{pay="3300"}
            _:expect(totalPay,"pay").is(true)
        end)

I think that’s what we want: only one output record, with everything summed into it. I don’t expect to get that. Shall I read the code and predict the result? OK.

function XSet:stats(controlTable)
    local sum = function(input,accumulator, control)
        local fields = control.sum
        for i,field in ipairs(fields) do
            local accum = accumulator:at(field) or 0
            accum = accum + (input:at(field) or 0) -- parenthesized: + binds tighter than or
            accumulator:putAt(tostring(accum),field)
        end
    end
    local result = XSet()
    local groupFields = controlTable.group -- deal with missing
    for record,scope in self:elements() do
        local accumulator = result:findOrCreateAccumulator(record, groupFields)
        sum(record, accumulator, controlTable)
    end
    return result
end

function XSet:findOrCreateAccumulator(record, groupFields)
    local accumulator
    local matchRecord = self:createMatchRecord(record, groupFields)
    local matchSet = XSet():addAt(matchRecord,NULL)
    local accumulatorSet = self:restrict(matchSet)
    if accumulatorSet:isNull() then
        accumulator = matchRecord
        self:addAt(matchRecord,NULL)
    else
        accumulator = accumulatorSet:at(NULL)
    end
    return accumulator
end

function XSet:createMatchRecord(record,groupFields)
    local matchRecord = XSet()
    for i,field in ipairs(groupFields) do
        local value = record:at(field) or "MISSING"
        matchRecord:addAt(value, field)
    end
    return matchRecord
end

Well, we’ll pass nil to findOrCreateAccumulator, which will pass it along to createMatchRecord, and that will not be able to loop over a nil groupFields. Test to be sure.

7: No Grouping -- attempt to index a nil value

Right. What if we passed in an empty table? Worth a try.

function XSet:stats(controlTable)
    local sum = function(input,accumulator, control)
        local fields = control.sum
        for i,field in ipairs(fields) do
            local accum = accumulator:at(field) or 0
            accum = accum + (input:at(field) or 0) -- parenthesized: + binds tighter than or
            accumulator:putAt(tostring(accum),field)
        end
    end
    local result = XSet()
    local groupFields = controlTable.group or {} -- <---
    for record,scope in self:elements() do
        local accumulator = result:findOrCreateAccumulator(record, groupFields)
        sum(record, accumulator, controlTable)
    end
    return result
end

The test runs, that’s what. Why? Because we create an empty match record, and restrict the accumulator set to find it, and it isn’t there the first time, so we add it. Thereafter, it is there and we repeatedly find it. Meanwhile, the summing function never checks the match fields, because we consciously separated that idea from the summing, so it sums into the one and only record in the output set.

And that is exactly what we want.
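The same behavior can be seen in a small plain-table sketch. This is an analogue, not the XSet implementation: `groupKey`, the `"|"` separator, and the array-of-tables representation are assumptions made for illustration, and the sums are kept as numbers rather than strings. With no group fields, every record produces the same empty key, so there is exactly one accumulator.

```lua
-- Hypothetical plain-table analogue of stats; not the XSet implementation.
local function groupKey(record, groupFields)
    local parts = {}
    for i, field in ipairs(groupFields) do
        parts[i] = tostring(record[field] or "MISSING")
    end
    -- Assumes values never contain "|". No group fields gives the empty string.
    return table.concat(parts, "|")
end

local function stats(records, control)
    local groupFields = control.group or {}   -- missing group means "no grouping"
    local accumulators, order = {}, {}
    for _, record in ipairs(records) do
        local key = groupKey(record, groupFields)
        local acc = accumulators[key]
        if not acc then
            acc = {}                           -- first record seen for this group
            for _, field in ipairs(groupFields) do
                acc[field] = record[field] or "MISSING"
            end
            accumulators[key] = acc
            order[#order + 1] = acc
        end
        for _, field in ipairs(control.sum) do
            acc[field] = (acc[field] or 0) + (tonumber(record[field]) or 0)
        end
    end
    return order                               -- accumulators in first-seen order
end

local people = {
    { name = "p1", state = "MI", county = "Livingston", pay = 1000 },
    { name = "p2", state = "OH", county = "Butler", pay = "100" },
    { name = "p3", state = "MI", county = "Wayne", pay = "2000" },
    { name = "p4", state = "OH", county = "Butler", pay = "200" },
}
local byState = stats(people, { sum = { "pay" }, group = { "state" } })
local byCounty = stats(people, { sum = { "pay" }, group = { "state", "county" } })
local total = stats(people, { sum = { "pay" } })
print(#byState, #byCounty, #total)
print(total[1].pay)  -- 3300
```

The three calls give two, three, and one accumulator respectively, and the only concession to the ungrouped case is the `or {}` default.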

Commit: stats set op returns one summary record if no grouping is defined.

I’d say that’s enough for Christmas morning. Let’s sum up.

Summary

This final outcome is almost scary, it’s so perfect. If the set (really a table) of grouping fields is empty, the match record is empty, there’s only one accumulator record, and it gets all the answers.

That, it seems to me, is exactly what should happen. We get one group for each unique combination of the fields in the group table, and if there are none, there’s just one combination, none.

This is the sort of thing you’d like to have happen when using a clean abstraction of any kind, and especially when your abstraction is all mathematical, such as we have here. I wasn’t confident at the outset about this case: I figured we might have to handle it in some special way, but other than providing an empty group list when there is none, no special handling was required.

Very nifty.

I do think we’ll want to do a bit more with the stats set operation. I think we should produce new names for the fields, like “pay-sum” and “bonus-sum”, so that we can also have “pay-mean” and such. I do not believe that I’ll go to the extreme of implementing a lot of statistics, but I’ll probably do count and mean. Maybe standard deviation, but probably not. It would just be showing off at this point.
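As a rough sketch of how that might look, count and mean fit in the same single pass as the sum: keep a running sum and a running count, and derive the mean at the end. The helper names `accumulate` and `finish`, the "pay-count" field, and the plain-table records are assumptions for illustration, not anything that exists in XSet yet.

```lua
-- Hypothetical helper: accumulate sum and count for one field in a single pass,
-- storing them under suffixed names such as "pay-sum" and "pay-count".
local function accumulate(acc, record, field)
    local value = tonumber(record[field])
    if value then                      -- skip missing or non-numeric values
        acc[field .. "-sum"] = (acc[field .. "-sum"] or 0) + value
        acc[field .. "-count"] = (acc[field .. "-count"] or 0) + 1
    end
end

-- The mean is derived at the end from the two running totals.
local function finish(acc, field)
    local count = acc[field .. "-count"] or 0
    if count > 0 then
        acc[field .. "-mean"] = acc[field .. "-sum"] / count
    end
    return acc
end

local acc = {}
local records = { { pay = 1000 }, { pay = "100" }, { pay = "2000" }, { pay = "200" } }
for _, record in ipairs(records) do
    accumulate(acc, record, "pay")
end
finish(acc, "pay")
print(acc["pay-sum"], acc["pay-count"], acc["pay-mean"])
```

Standard deviation could be handled in a similar running fashion, but that is beyond what this sketch attempts.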

I would have to think about mode, but I don’t think we can do that in a simple no-memory pass over the data, so it would require something more exotic. We could do it, no doubt, but I am not inclined that way.
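To make the memory concern concrete, here is a minimal sketch of a straightforward mode calculation over plain tables. The function `mode` and the record layout are hypothetical. It still makes only one pass, but it has to keep a tally for every distinct value it sees, which is what sets it apart from sum, count, and mean.

```lua
-- Hypothetical sketch: finding the mode needs a tally per distinct value,
-- so memory grows with the number of distinct values seen.
local function mode(records, field)
    local counts = {}
    local best, bestCount = nil, 0
    for _, record in ipairs(records) do
        local value = record[field]
        if value ~= nil then
            counts[value] = (counts[value] or 0) + 1
            if counts[value] > bestCount then
                -- On ties, the value that reached the count first is kept.
                best, bestCount = value, counts[value]
            end
        end
    end
    return best, bestCount
end

local people = {
    { name = "p1", county = "Livingston" },
    { name = "p2", county = "Butler" },
    { name = "p3", county = "Wayne" },
    { name = "p4", county = "Butler" },
}
print(mode(people, "county"))
```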

If there is a lesson here, it lies somewhere in the region of “clean abstractions lead to clean code, and mathematics is one source of clean abstractions”. For most of us, on most days, the first part of that notion is more valuable than the second.

Beyond that, it may be time to stick our head up out of the burrow and look at the big picture. There are probably some methods that could be improved, and perhaps even some duplication that we can discover and condense.

Soon, we’ll do that. Maybe for my birthday.

See you then!