No, I’m not hearing voices. But the code does tell us things, just like any working material. We need to learn to listen. Today, we listen and the result is good.

I’m pretty sure I first heard the phrase “listening to the code” from Kent Beck. I recall also the notion of “letting the code participate in our design sessions”. These ideas are not some new form of animism or digital mysticism. They express the common experience that a maker has in working with their materials, that the materials have a say in what happens.

The carpenter knows that you saw and polish different woods differently, and even depending on whether you’re working with or across or against the grain. The knife maker knows that different metals work differently. The chemist knows that pressure and temperature change how chemicals react. The artist knows that different paints mix differently, and different materials take paint differently. The child knows that some snows make good snowballs and snowmen, and some do not.

As we work with any material, and as we improve in that work, we begin to feel the material responding to the work. It’s relatively easy to get that feeling with physical materials: they do have differing hardness and friction, which influences how our tools, and our hands, respond.

The metaphor remains strong in writing and programming. The work flows well when we’re approaching it well, and gets all sticky and messy when we aren’t going about things well enough. We’re here for the code, of course, so let’s look at our statistics making code again today, and see if we can see what it’s trying to tell us.

Stats Operation

Here, again, is our stats operation:

function XSet:stats(controlTable)
    local sum = function(input,accumulator, control)
        local fields = control.sum
        for i,field in ipairs(fields) do
            local accum = accumulator:at(field) or 0
            accum = accum + input:at(field) or 0
            accumulator:putAt(tostring(accum),field)
        end
    end
    local result = XSet()
    local groupFields = controlTable.group or {}
    for record,scope in self:elements() do
        local accumulator = result:findOrCreateAccumulator(record, groupFields)
        sum(record, accumulator, controlTable)
    end
    return result
end

function XSet:findOrCreateAccumulator(record, groupFields)
    local accumulator
    local matchRecord = self:createMatchRecord(record, groupFields)
    local accumulatorSet = self:restrictElement(matchRecord)
    if accumulatorSet:isNull() then
        accumulator = matchRecord
        self:addAt(matchRecord,NULL)
    else
        accumulator = accumulatorSet:at(NULL)
    end
    return accumulator
end

function XSet:createMatchRecord(record,groupFields)
    local matchRecord = XSet()
    for i,field in ipairs(groupFields) do
        local value = record:at(field) or "MISSING"
        matchRecord:addAt(value, field)
    end
    return matchRecord
end

It makes some sense that stats could be a set operation, although it is pretty custom-sized to a specific kind of set. But createMatchRecord? findOrCreateAccumulator? These do not seem like set operations at all.

Set operations, mostly, have fallen into two groups. There are what we might call primitives, which are low-level operations, like hasAt, which answers whether a particular element occurs at a particular scope, or elements, which enumerates the elements and scopes of a set. And then there are general operations, union, restrict, and so on, which carry out quite general operations on sets.

I guess there’s a third group, I suppose, like isNull, or isSubset. Should I count the various iterators, reduce, select, exists and every as another group?

Be that as it may, all those methods apply to all sets. More or less. Go with me here. There’s always a grey area. Or a gray area. Whichever.

These stats methods just aren’t like that at all. They are telling us something. What are they telling us? They’re telling us “I don’t belong here: I belong somewhere else”.

Let’s do a tiny bit of design thinking.

What’s Going on Here?

What’s going on here is that we have a set with pretty much the second simplest structure we use. It’s a set of “records”, where we mean by that “all the elements of this set have pretty much the same scopes throughout”, and we probably also mean “and most of the elements are simple values”.

And we want an output set that has a record for each combination of some grouping fields in those records, and that record should contain the sum (or other statistic) of certain fields that we’ll name.

That’s pretty darn specific.

I think the code is telling us that we want some new class here, to contain all this code. We can think of it, perhaps, as like a factory, given some input it produces a set containing the requested statistics. Maybe it’s a kind of “Factory” pattern, but I’m not going to push that notion. I am, however, going to try putting this code into a set of its own.

I think I’d like to leave the method stats on XSet, though we’ll move all the work elsewhere. I think we have sufficient tests to support this refactoring.

function Xset:stats(controlTable)
    return XStatistics(self,controlTable)
end

I’ll just start by creating the new class and moving the methods over. I think we’ll want a new tab for this.

function XStatistics:init(input, control)
    self.data = input
    self.control = control
    return self:getStatistics()
end

Now I can almost just move everything, with a few adjustments.


function XStatistics:getStatistics(controlTable)
    local sum = function(input,accumulator, control)
        local fields = control.sum
        for i,field in ipairs(fields) do
            local accum = accumulator:at(field) or 0
            accum = accum + input:at(field) or 0
            accumulator:putAt(tostring(accum),field)
        end
    end
    local result = XSet()
    local groupFields = controlTable.group or {}
    for record,scope in self.data:elements() do
        local accumulator = result:findOrCreateAccumulator(record, groupFields)
        sum(record, accumulator, controlTable)
    end
    return result
end

Oh, I’d better fix the references to controlTable …

function XStatistics:getStatistics(controlTable)
    local sum = function(input,accumulator, control)
        local fields = control.sum
        for i,field in ipairs(fields) do
            local accum = accumulator:at(field) or 0
            accum = accum + input:at(field) or 0
            accumulator:putAt(tostring(accum),field)
        end
    end
    local result = XSet()
    local groupFields = self.control.group or {}
    for record,scope in self.data:elements() do
        local accumulator = result:findOrCreateAccumulator(record, groupFields)
        sum(record, accumulator, self.control)
    end
    return result
end

This is rather a large method all of a sudden, since it’s not trivial to scan it for places where I need to rename or add self. Moving along …

function XStatistics:findOrCreateAccumulator(record, groupFields)
    local accumulator
    local matchRecord = self:createMatchRecord(record, groupFields)
    local accumulatorSet = self.data:restrictElement(matchRecord)
    if accumulatorSet:isNull() then
        accumulator = matchRecord
        self:addAt(matchRecord,NULL)
    else
        accumulator = accumulatorSet:at(NULL)
    end
    return accumulator
end

function XStatistics:createMatchRecord(record,groupFields)
    local matchRecord = XSet()
    for i,field in ipairs(groupFields) do
        local value = record:at(field) or "MISSING"
        matchRecord:addAt(value, field)
    end
    return matchRecord
end

I found one more place where I needed to change self to self.data. Let’s run the tests and see what breaks, let them lead the way if they need to.

First thing they do is find the typo in the initial call, where I said Xset, not XSet. Then:

5: Grouping -- XStatistics:21: attempt to call a nil value (method 'findOrCreateAccumulator')

I get three occurrences of that. It’s here:

function XStatistics:getStatistics(controlTable)
    local sum = function(input,accumulator, control)
        local fields = control.sum
        for i,field in ipairs(fields) do
            local accum = accumulator:at(field) or 0
            accum = accum + input:at(field) or 0
            accumulator:putAt(tostring(accum),field)
        end
    end
    local result = XSet()
    local groupFields = self.control.group or {}
    for record,scope in self.data:elements() do
        local accumulator = result:findOrCreateAccumulator(record, groupFields)
        sum(record, accumulator, self.control)
    end
    return result
end

Ah. Yes. Our weird method findOrCreateAccumulator is not called on the data but on a specific record. We can either pass in result as a parameter, or make that into an actual set operation. We’re in the mode of not creating weird single purpose set ops, so let’s draw it inside XStatistics for now.

A bit of reading carefully and we have this:

function XSet:stats(controlTable)
    local stat = XStatistics(self,controlTable)
    return stat:getStatistics()
end

That’s because we can’t return anything but an XStatistics from the init so we have go ask it to give us the stats.

And the implementation, now working:

-- XStatistics
-- 20211227

XStatistics = class()

function XStatistics:init(input, control)
    self.data = input
    self.control = control
end

function XStatistics:getStatistics(controlTable)
    local sum = function(input,accumulator, control)
        local fields = control.sum
        for i,field in ipairs(fields) do
            local accum = accumulator:at(field) or 0
            accum = accum + input:at(field) or 0
            accumulator:putAt(tostring(accum),field)
        end
    end
    local result = XSet()
    local groupFields = self.control.group or {}
    for record,scope in self.data:elements() do
        local accumulator = self:findOrCreateAccumulator(result,record, groupFields)
        sum(record, accumulator, self.control)
    end
    return result
end

function XStatistics:findOrCreateAccumulator(result, record, groupFields)
    local accumulator
    local matchRecord = self:createMatchRecord(record, groupFields)
    local accumulatorSet = result:restrictElement(matchRecord)
    if accumulatorSet:isNull() then
        accumulator = matchRecord
        result:addAt(matchRecord,NULL)
    else
        accumulator = accumulatorSet:at(NULL)
    end
    return accumulator
end

function XStatistics:createMatchRecord(record,groupFields)
    local matchRecord = XSet()
    for i,field in ipairs(groupFields) do
        local value = record:at(field) or "MISSING"
        matchRecord:addAt(value, field)
    end
    return matchRecord
end

First let’s commit: Moved stats method to new XStatistics class.

Now that it works, let’s make it more nearly right.

Making It Right

What’s not to like about this code? Well. First of all, the methods hardly ever refer to the input data at all. They are processing other sets. Let’s start with the worst offender:

function XStatistics:createMatchRecord(record,groupFields)
    local matchRecord = XSet()
    for i,field in ipairs(groupFields) do
        local value = record:at(field) or "MISSING"
        matchRecord:addAt(value, field)
    end
    return matchRecord
end

This thing, given a list of scopes and a record, produces the set containing the specific values under those scopes.

I think we have a name for that. I believe it is scopeRestrict, though it expects a set as its second parameter, not a table:

function XSet:scopeRestrict(B)
    return self:select(function(_ignored,aScope)
        return B:exists(function(bElement,_ignored)
            return bElement==aScope
        end)
    end)
end

Let’s see if we can use that method, and if we can, whether we can get code that we actually like:

function XStatistics:createMatchRecord(record,groupFields)
    local scopeSet = XSet()
    for i,field in ipairs(groupFields) do
        scopeSet:addAt(field,field)
    end
    return record:scopeRestrict(scopeSet)
end

Here I create a set of {scope@scope} for each of the scopes in groupFields and use the scopeRestrict to get the match record. Commit: use scopeRestrict to create matchRecord.

Now let’s compute that scopeSet once and for all. First extract the method:

function XStatistics:createMatchRecord(record,groupFields)
    self.scopeSet = self:getScopeSet(groupFields)
    return record:scopeRestrict(self.scopeSet)
end

function XStatistics:getScopeSet(groupFields)
    local scopeSet = XSet()
    for i,field in ipairs(groupFields) do
        scopeSet:addAt(field,field)
    end
    return scopeSet
end

Commit: extract method for scopeSet.

Now move the call to the init.

function XStatistics:init(input, control)
    self.data = input
    self.control = control
    self.scopeSet = self:getScopeSet(groupFields)
end

Now groupFields isn’t defined, so:

function XStatistics:init(input, control)
    self.data = input
    self.control = control
    self.scopeSet = self:getScopeSet(self.control.group or {})
end

Tests green. Commit: move creation of scopeSet to init.

Now these methods don’t use groupFields:

function XStatistics:getStatistics(controlTable)
    local sum = function(input,accumulator, control)
        local fields = control.sum
        for i,field in ipairs(fields) do
            local accum = accumulator:at(field) or 0
            accum = accum + input:at(field) or 0
            accumulator:putAt(tostring(accum),field)
        end
    end
    local result = XSet()
    local groupFields = self.control.group or {}
    for record,scope in self.data:elements() do
        local accumulator = self:findOrCreateAccumulator(result,record, groupFields)
        sum(record, accumulator, self.control)
    end
    return result
end

function XStatistics:findOrCreateAccumulator(result, record, groupFields)
    local accumulator
    local matchRecord = self:createMatchRecord(record, groupFields)
    local accumulatorSet = result:restrictElement(matchRecord)
    if accumulatorSet:isNull() then
        accumulator = matchRecord
        result:addAt(matchRecord,NULL)
    else
        accumulator = accumulatorSet:at(NULL)
    end
    return accumulator
end

function XStatistics:createMatchRecord(record,groupFields)
    return record:scopeRestrict(self.scopeSet)
end

Remove the parameter throughout.

function XStatistics:getStatistics(controlTable)
    local sum = function(input,accumulator, control)
        local fields = control.sum
        for i,field in ipairs(fields) do
            local accum = accumulator:at(field) or 0
            accum = accum + input:at(field) or 0
            accumulator:putAt(tostring(accum),field)
        end
    end
    local result = XSet()
    for record,scope in self.data:elements() do
        local accumulator = self:findOrCreateAccumulator(result,record)
        sum(record, accumulator, self.control)
    end
    return result
end

function XStatistics:findOrCreateAccumulator(result, record)
    local accumulator
    local matchRecord = self:createMatchRecord(record)
    local accumulatorSet = result:restrictElement(matchRecord)
    if accumulatorSet:isNull() then
        accumulator = matchRecord
        result:addAt(matchRecord,NULL)
    else
        accumulator = accumulatorSet:at(NULL)
    end
    return accumulator
end

Green, commit: Remove unused groupFields parameter.

Now we can remove the parameter from the match record creation as well.

function XStatistics:init(input, control)
    self.data = input
    self.control = control
    self.scopeSet = self:getScopeSet()
end

function XStatistics:getScopeSet()
    local groupFields = self.control.group or {}
    local scopeSet = XSet()
    for i,field in ipairs(groupFields) do
        scopeSet:addAt(field,field)
    end
    return scopeSet
end

Green. Commit: push group field fetch down to method that needs it.

So that’s nice.

What Else?

Is that all? Not remotely.

function XStatistics:getStatistics(controlTable)
    local sum = function(input,accumulator, control)
        local fields = control.sum
        for i,field in ipairs(fields) do
            local accum = accumulator:at(field) or 0
            accum = accum + input:at(field) or 0
            accumulator:putAt(tostring(accum),field)
        end
    end
    local result = XSet()
    for record,scope in self.data:elements() do
        local accumulator = self:findOrCreateAccumulator(result,record)
        sum(record, accumulator, self.control)
    end
    return result
end

I don’t like including the details of the sum function here in the method that uses it. We can give ourselves a method for that now.

function XStatistics:getStatistics(controlTable)
    local result = XSet()
    for record,scope in self.data:elements() do
        local accumulator = self:findOrCreateAccumulator(result,record)
        self:sumFunction()(record, accumulator, self.control)
    end
    return result
end

function XStatistics:sumFunction()
    return function(input,accumulator, control)
        local fields = control.sum
        for i,field in ipairs(fields) do
            local accum = accumulator:at(field) or 0
            accum = accum + input:at(field) or 0
            accumulator:putAt(tostring(accum),field)
        end
    end
end

Is that too obscure, with the sumFunction()(record...)? I think I’ll allow it, but it is a bit uncommon. Let’s see what else we can find and then check to see whether that sticks out as a problem.

Green. Commit: sumFunction returned from a method in XStatistics.

This …

function XStatistics:getStatistics(controlTable)
    local result = XSet()
    for record,scope in self.data:elements() do
        local accumulator = self:findOrCreateAccumulator(result,record)
        self:sumFunction()(record, accumulator, self.control)
    end
    return result
end

Isn’t that last bit just a reduce in wolf’s clothing? Let’s see if we can make it so.

Well, my first attempt crashed. Fortunately, we just committed. So I’ll revert. Try again:

function XStatistics:getStatistics(controlTable)
    return self.data:reduce(XSet(), function(result,record,scope)
        local accumulator = self:findOrCreateAccumulator(result,record)
        self:sumFunction()(record, accumulator, self.control)
        return result
    end)
end

Green: use reduce in getStatistics.

We could debate how much better or worse that is. It’s not the cleanest reduce ever. For now, I’ll allow it: I’m trying to make the operations like reduce part of our official coding standard, so we’ll build up our comfort with them as time goes on.

My thinking is that reduce is solid, and ad-hoc loops are less so, so that we should push things into reduce and its associates when we can.

Let’s glance at this:

function XStatistics:sumFunction()
    return function(input,accumulator, control)
        local fields = control.sum
        for i,field in ipairs(fields) do
            local accum = accumulator:at(field) or 0
            accum = accum + input:at(field) or 0
            accumulator:putAt(tostring(accum),field)
        end
    end
end

I think we can inline a bit here to our benefit. After reflection, just one tiny change, removing that local.

function XStatistics:sumFunction()
    return function(input, accumulator, control)
        for _i,field in ipairs(control.sum) do
            local accum = accumulator:at(field) or 0
            accum = accum + input:at(field) or 0
            accumulator:putAt(tostring(accum),field)
        end
    end
end

I’d like to reduce that three line loop content, but I don’t see quite how to do it. No sense walking out on thin ice. Green, commit: Inline temp in sumFunction.

And Now …

function XStatistics:findOrCreateAccumulator(result, record)
    local accumulator
    local matchRecord = self:createMatchRecord(record)
    local accumulatorSet = result:restrictElement(matchRecord)
    if accumulatorSet:isNull() then
        accumulator = matchRecord
        result:addAt(matchRecord,NULL)
    else
        accumulator = accumulatorSet:at(NULL)
    end
    return accumulator
end

This one’s a real [DELETED], isn’t it? I just hate methods with “Or” or “And” in their names. You know they’re not doing just one thing.

What could we call it that wouldn’t have a boolean operator in the name? produceAccumulator? Maybe.

What could we write that wouldn’t be so messy? Something like this, maybe?

function XStatistics:getStatistics(controlTable)
    return self.data:reduce(XSet(), function(result,record,scope)
        local accumulator = self:produceAccumulator(result,record)
        self:sumFunction()(record, accumulator, self.control)
        return result
    end)
end

function XStatistics:produceAccumulator(result, record)
    return XStatistics:produceAccumulatorMatching(result, self:createMatchRecord(record))
end

function XStatistics:produceAccumulatorMatching(result, matchRecord)
    local accumulatorSet = result:restrictElement(matchRecord)
    if accumulatorSet:isNull() then
        result:addAt(matchRecord,NULL)
        return matchRecord
    end
    return accumulatorSet:at(NULL)
end

I extracted the createMatchRecord to make that one-liner, removing one line from the produceAccumulatorMatching method. Green. Commit: refactoring getStatistics to use produceAccumulator.

Now this:

function XStatistics:produceAccumulatorMatching(result, matchRecord)
    local accumulatorSet = result:restrictElement(matchRecord)
    if accumulatorSet:isNull() then
        result:addAt(matchRecord,NULL)
        return matchRecord
    end
    return accumulatorSet:at(NULL)
end

We can do this:

function XStatistics:produceAccumulatorMatching(result, matchRecord)
    local accumulator = result:restrictElement(matchRecord):at(NULL)
    if not accumulator then
        result:addAt(matchRecord,NULL)
        return matchRecord
    end
    return accumulator
end

Better. What about this:

function XStatistics:produceAccumulatorMatching(result, matchRecord)
    return result:restrictElement(matchRecord):at(NULL) or self:createOne(result,matchRecord)
end

function XStatistics:createOne(result,matchRecord)
    result:addAt(matchRecord,NULL)
    return matchRecord
end

Now that’s what I’m talkin’ about. Here’s where we are:

-- XStatistics
-- 20211227

XStatistics = class()

function XStatistics:init(input, control)
    self.data = input
    self.control = control
    self.scopeSet = self:getScopeSet()
end

function XStatistics:createMatchRecord(record)
    return record:scopeRestrict(self.scopeSet)
end

function XStatistics:createResultRecord(result,matchRecord)
    result:addAt(matchRecord,NULL)
    return matchRecord
end

function XStatistics:getScopeSet()
    local groupFields = self.control.group or {}
    local scopeSet = XSet()
    for i,field in ipairs(groupFields) do
        scopeSet:addAt(field,field)
    end
    return scopeSet
end

function XStatistics:getStatistics(controlTable)
    return self.data:reduce(XSet(), function(result,record,scope)
        local accumulator = self:produceAccumulator(result,record)
        self:sumFunction()(record, accumulator, self.control)
        return result
    end)
end

function XStatistics:produceAccumulator(result, record)
    return XStatistics:produceAccumulatorMatching(result, self:createMatchRecord(record))
end

function XStatistics:produceAccumulatorMatching(result, matchRecord)
    return result:restrictElement(matchRecord):at(NULL) or self:createResultRecord(result,matchRecord)
end

function XStatistics:sumFunction()
    return function(input, accumulator, control)
        for _i,field in ipairs(control.sum) do
            local accum = accumulator:at(field) or 0
            accum = accum + input:at(field) or 0
            accumulator:putAt(tostring(accum),field)
        end
    end
end

Now then. Most of that is pretty short and simple.

We started off this morning with 40 lines. We now have 54, of which around ten are whitespace or commentary. And what we have is overall simpler, with fewer if statements and loops, though of course the conditions are still met and the right sets are still enumerated. We’ve just hidden that complexity away, pushed down to existing methods.

Most of what we have is pretty coherent with the notion of creating statistics. I have some interest in this function, however:

function XStatistics:getScopeSet()
    local groupFields = self.control.group or {}
    local scopeSet = XSet()
    for i,field in ipairs(groupFields) do
        scopeSet:addAt(field,field)
    end
    return scopeSet
end

That seems perilously close to doing something generically useful. We do have a function now, scopeSet, that returns the scopes from a given set, under NULL. Let’s change the definition to put them under themselves, and see if the tests mind.

function XSet:scopeSet()
    return self:reduce(XSet(), function(r,e,s)
        return r:addAt(s,s)
    end)
end

This test fails:

        _:test("restrictToScope", function()
            local S = XSet:on(CSVSet(CSVnames,CSVdata))
            local mi = XSet():addAt("MI","state")
            local restrictor = XSet():addAt(mi,NULL)
            local miFolksScopesHardWay = S:restrict(restrictor):scopeSet()
            local miFolksScopes = S:restrictToScope(restrictor)
            _:expect(miFolksScopes:card(),"scopes").is(14)
            _:expect(miFolksScopesHardWay:card(),"hard").is(14)
            _:expect(miFolksScopesHardWay:equals(miFolksScopes),"sets equal").is(true)
        end)

Another one needed the results changed to match. I think we need to look at the restrictToScope and see if it’s OK for it to return scope@scope.

function XSet:restrictToScope(B)
    local scopes = XSet()
    for a,s in self:elements() do
        if B:exists(function(b,_ignored) return b:isSubset(a) end) then
            scopes:addAt(s,NULL)
        end
    end
    return scopes
end

Let’s do it.

function XSet:restrictToScope(B)
    local scopes = XSet()
    for a,s in self:elements() do
        if B:exists(function(b,_ignored) return b:isSubset(a) end) then
            scopes:addAt(s,s)
        end
    end
    return scopes
end

Test. Green. Commit: scope sets are now scope at scope.

And now:

function XStatistics:getScopeSet()
    local groupFields = self.control.group or {}
    local scopeSet = XSet()
    for i,field in ipairs(groupFields) do
        scopeSet:addAt(field,field)
    end
    return scopeSet
end

Well, darn. I was thinking I could now call scopeSet but this is not to be. The input to scopeSet needs to be a set of records with some scopes. To create that is just as much code as we have here now. So, boo.

The code is telling me that it doesn’t want to do that.

But it’s also telling me that XSet is much simpler without all the statistics stuff woven in, and XStatistics is cleaner than a pure extraction of the stats stuff into a new class.

Summary

We listened to the code telling us that it didn’t belong in XSets, then telling us that bits of it were too complex. We moved things around, at least once finding we were working against the grain, and smoothed out much of the code. We deferred some operations back to XSet, as one would hope would happen. Is there more we could do? Almost always, and here, yes. Is it better? By my lights, it’s much better, if only because we’ve split the XSet_stats blob into two separate things.

And the new thing’s code is simpler as well.

Is it perfect? No, it’s not perfect. But it’s better. That’s a good thing: if it were perfect, there’d be nothing to do tomorrow.

But there are plenty of things to do tomorrow. See you then!


XSet2.zip