XST 29: Listening to the Code
No, I’m not hearing voices. But the code does tell us things, just like any working material. We need to learn to listen. Today, we listen and the result is good.
I’m pretty sure I first heard the phrase “listening to the code” from Kent Beck. I recall also the notion of “letting the code participate in our design sessions”. These ideas are not some new form of animism or digital mysticism. They express the common experience that a maker has in working with their materials, that the materials have a say in what happens.
The carpenter knows that you saw and polish different woods differently, and even depending on whether you’re working with or across or against the grain. The knife maker knows that different metals work differently. The chemist knows that pressure and temperature change how chemicals react. The artist knows that different paints mix differently, and different materials take paint differently. The child knows that some snows make good snowballs and snowmen, and some do not.
As we work with any material, and as we improve in that work, we begin to feel the material responding to the work. It’s relatively easy to get that feeling with physical materials: they do have differing hardness and friction, which influences how our tools, and our hands, respond.
The metaphor remains strong in writing and programming. The work flows well when we’re approaching it well, and gets all sticky and messy when we aren’t going about things well enough. We’re here for the code, of course, so let’s look at our statistics making code again today, and see if we can see what it’s trying to tell us.
Stats Operation
Here, again, is our stats
operation:
function XSet:stats(controlTable)
local sum = function(input,accumulator, control)
local fields = control.sum
for i,field in ipairs(fields) do
local accum = accumulator:at(field) or 0
accum = accum + input:at(field) or 0
accumulator:putAt(tostring(accum),field)
end
end
local result = XSet()
local groupFields = controlTable.group or {}
for record,scope in self:elements() do
local accumulator = result:findOrCreateAccumulator(record, groupFields)
sum(record, accumulator, controlTable)
end
return result
end
function XSet:findOrCreateAccumulator(record, groupFields)
local accumulator
local matchRecord = self:createMatchRecord(record, groupFields)
local accumulatorSet = self:restrictElement(matchRecord)
if accumulatorSet:isNull() then
accumulator = matchRecord
self:addAt(matchRecord,NULL)
else
accumulator = accumulatorSet:at(NULL)
end
return accumulator
end
function XSet:createMatchRecord(record,groupFields)
local matchRecord = XSet()
for i,field in ipairs(groupFields) do
local value = record:at(field) or "MISSING"
matchRecord:addAt(value, field)
end
return matchRecord
end
It makes some sense that stats
could be a set operation, although it is pretty custom-sized to a specific kind of set. But createMatchRecord
? findOrCreateAccumulator
? These do not seem like set operations at all.
Set operations, mostly, have fallen into two groups. There are what we might call primitives, which are low-level operations, like hasAt
, which answers whether a particular element occurs at a particular scope, or elements
, which enumerates the elements and scopes of a set. And then there are general operations, union
, restrict
, and so on, which carry out quite general operations on sets.
I guess there’s a third group, I suppose, like isNull
, or isSubset
. Should I count the various iterators, reduce
, select
, exists
and every
as another group?
Be that as it may, all those methods apply to all sets. More or less. Go with me here. There’s always a grey area. Or a gray area. Whichever.
These stats methods just aren’t like that at all. They are telling us something. What are they telling us? They’re telling us “I don’t belong here: I belong somewhere else”.
Let’s do a tiny bit of design thinking.
What’s Going on Here?
What’s going on here is that we have a set with pretty much the second simplest structure we use. It’s a set of “records”, where we mean by that “all the elements of this set have pretty much the same scopes throughout”, and we probably also mean “and most of the elements are simple values”.
And we want an output set that has a record for each combination of some grouping fields in those records, and that record should contain the sum (or other statistic) of certain fields that we’ll name.
That’s pretty darn specific.
I think the code is telling us that we want some new class here, to contain all this code. We can think of it, perhaps, as like a factory, given some input it produces a set containing the requested statistics. Maybe it’s a kind of “Factory” pattern, but I’m not going to push that notion. I am, however, going to try putting this code into a set of its own.
I think I’d like to leave the method stats
on XSet, though we’ll move all the work elsewhere. I think we have sufficient tests to support this refactoring.
function Xset:stats(controlTable)
return XStatistics(self,controlTable)
end
I’ll just start by creating the new class and moving the methods over. I think we’ll want a new tab for this.
function XStatistics:init(input, control)
self.data = input
self.control = control
return self:getStatistics()
end
Now I can almost just move everything, with a few adjustments.
function XStatistics:getStatistics(controlTable)
local sum = function(input,accumulator, control)
local fields = control.sum
for i,field in ipairs(fields) do
local accum = accumulator:at(field) or 0
accum = accum + input:at(field) or 0
accumulator:putAt(tostring(accum),field)
end
end
local result = XSet()
local groupFields = controlTable.group or {}
for record,scope in self.data:elements() do
local accumulator = result:findOrCreateAccumulator(record, groupFields)
sum(record, accumulator, controlTable)
end
return result
end
Oh, I’d better fix the references to controlTable …
function XStatistics:getStatistics(controlTable)
local sum = function(input,accumulator, control)
local fields = control.sum
for i,field in ipairs(fields) do
local accum = accumulator:at(field) or 0
accum = accum + input:at(field) or 0
accumulator:putAt(tostring(accum),field)
end
end
local result = XSet()
local groupFields = self.control.group or {}
for record,scope in self.data:elements() do
local accumulator = result:findOrCreateAccumulator(record, groupFields)
sum(record, accumulator, self.control)
end
return result
end
This is rather a large method all of a sudden, since it’s not trivial to scan it for places where I need to rename or add self
. Moving along …
function XStatistics:findOrCreateAccumulator(record, groupFields)
local accumulator
local matchRecord = self:createMatchRecord(record, groupFields)
local accumulatorSet = self.data:restrictElement(matchRecord)
if accumulatorSet:isNull() then
accumulator = matchRecord
self:addAt(matchRecord,NULL)
else
accumulator = accumulatorSet:at(NULL)
end
return accumulator
end
function XStatistics:createMatchRecord(record,groupFields)
local matchRecord = XSet()
for i,field in ipairs(groupFields) do
local value = record:at(field) or "MISSING"
matchRecord:addAt(value, field)
end
return matchRecord
end
I found one more place where I needed to change self
to self.data
. Let’s run the tests and see what breaks, let them lead the way if they need to.
First thing they do is find the typo in the initial call, where I said Xset
, not XSet
. Then:
5: Grouping -- XStatistics:21: attempt to call a nil value (method 'findOrCreateAccumulator')
I get three occurrences of that. It’s here:
function XStatistics:getStatistics(controlTable)
local sum = function(input,accumulator, control)
local fields = control.sum
for i,field in ipairs(fields) do
local accum = accumulator:at(field) or 0
accum = accum + input:at(field) or 0
accumulator:putAt(tostring(accum),field)
end
end
local result = XSet()
local groupFields = self.control.group or {}
for record,scope in self.data:elements() do
local accumulator = result:findOrCreateAccumulator(record, groupFields)
sum(record, accumulator, self.control)
end
return result
end
Ah. Yes. Our weird method findOrCreateAccumulator
is not called on the data but on a specific record. We can either pass in result as a parameter, or make that into an actual set operation. We’re in the mode of not creating weird single purpose set ops, so let’s draw it inside XStatistics
for now.
A bit of reading carefully and we have this:
function XSet:stats(controlTable)
local stat = XStatistics(self,controlTable)
return stat:getStatistics()
end
That’s because we can’t return anything but an XStatistics from the init
so we have go ask it to give us the stats.
And the implementation, now working:
-- XStatistics
-- 20211227
XStatistics = class()
function XStatistics:init(input, control)
self.data = input
self.control = control
end
function XStatistics:getStatistics(controlTable)
local sum = function(input,accumulator, control)
local fields = control.sum
for i,field in ipairs(fields) do
local accum = accumulator:at(field) or 0
accum = accum + input:at(field) or 0
accumulator:putAt(tostring(accum),field)
end
end
local result = XSet()
local groupFields = self.control.group or {}
for record,scope in self.data:elements() do
local accumulator = self:findOrCreateAccumulator(result,record, groupFields)
sum(record, accumulator, self.control)
end
return result
end
function XStatistics:findOrCreateAccumulator(result, record, groupFields)
local accumulator
local matchRecord = self:createMatchRecord(record, groupFields)
local accumulatorSet = result:restrictElement(matchRecord)
if accumulatorSet:isNull() then
accumulator = matchRecord
result:addAt(matchRecord,NULL)
else
accumulator = accumulatorSet:at(NULL)
end
return accumulator
end
function XStatistics:createMatchRecord(record,groupFields)
local matchRecord = XSet()
for i,field in ipairs(groupFields) do
local value = record:at(field) or "MISSING"
matchRecord:addAt(value, field)
end
return matchRecord
end
First let’s commit: Moved stats method to new XStatistics class.
Now that it works, let’s make it more nearly right.
Making It Right
What’s not to like about this code? Well. First of all, the methods hardly ever refer to the input data at all. They are processing other sets. Let’s start with the worst offender:
function XStatistics:createMatchRecord(record,groupFields)
local matchRecord = XSet()
for i,field in ipairs(groupFields) do
local value = record:at(field) or "MISSING"
matchRecord:addAt(value, field)
end
return matchRecord
end
This thing, given a list of scopes and a record, produces the set containing the specific values under those scopes.
I think we have a name for that. I believe it is scopeRestrict
, though it expects a set as its second parameter, not a table:
function XSet:scopeRestrict(B)
return self:select(function(_ignored,aScope)
return B:exists(function(bElement,_ignored)
return bElement==aScope
end)
end)
end
Let’s see if we can use that method, and if we can, whether we can get code that we actually like:
function XStatistics:createMatchRecord(record,groupFields)
local scopeSet = XSet()
for i,field in ipairs(groupFields) do
scopeSet:addAt(field,field)
end
return record:scopeRestrict(scopeSet)
end
Here I create a set of {scope@scope} for each of the scopes in groupFields
and use the scopeRestrict
to get the match record. Commit: use scopeRestrict to create matchRecord.
Now let’s compute that scopeSet once and for all. First extract the method:
function XStatistics:createMatchRecord(record,groupFields)
self.scopeSet = self:getScopeSet(groupFields)
return record:scopeRestrict(self.scopeSet)
end
function XStatistics:getScopeSet(groupFields)
local scopeSet = XSet()
for i,field in ipairs(groupFields) do
scopeSet:addAt(field,field)
end
return scopeSet
end
Commit: extract method for scopeSet.
Now move the call to the init.
function XStatistics:init(input, control)
self.data = input
self.control = control
self.scopeSet = self:getScopeSet(groupFields)
end
Now groupFields isn’t defined, so:
function XStatistics:init(input, control)
self.data = input
self.control = control
self.scopeSet = self:getScopeSet(self.control.group or {})
end
Tests green. Commit: move creation of scopeSet to init.
Now these methods don’t use groupFields:
function XStatistics:getStatistics(controlTable)
local sum = function(input,accumulator, control)
local fields = control.sum
for i,field in ipairs(fields) do
local accum = accumulator:at(field) or 0
accum = accum + input:at(field) or 0
accumulator:putAt(tostring(accum),field)
end
end
local result = XSet()
local groupFields = self.control.group or {}
for record,scope in self.data:elements() do
local accumulator = self:findOrCreateAccumulator(result,record, groupFields)
sum(record, accumulator, self.control)
end
return result
end
function XStatistics:findOrCreateAccumulator(result, record, groupFields)
local accumulator
local matchRecord = self:createMatchRecord(record, groupFields)
local accumulatorSet = result:restrictElement(matchRecord)
if accumulatorSet:isNull() then
accumulator = matchRecord
result:addAt(matchRecord,NULL)
else
accumulator = accumulatorSet:at(NULL)
end
return accumulator
end
function XStatistics:createMatchRecord(record,groupFields)
return record:scopeRestrict(self.scopeSet)
end
Remove the parameter throughout.
function XStatistics:getStatistics(controlTable)
local sum = function(input,accumulator, control)
local fields = control.sum
for i,field in ipairs(fields) do
local accum = accumulator:at(field) or 0
accum = accum + input:at(field) or 0
accumulator:putAt(tostring(accum),field)
end
end
local result = XSet()
for record,scope in self.data:elements() do
local accumulator = self:findOrCreateAccumulator(result,record)
sum(record, accumulator, self.control)
end
return result
end
function XStatistics:findOrCreateAccumulator(result, record)
local accumulator
local matchRecord = self:createMatchRecord(record)
local accumulatorSet = result:restrictElement(matchRecord)
if accumulatorSet:isNull() then
accumulator = matchRecord
result:addAt(matchRecord,NULL)
else
accumulator = accumulatorSet:at(NULL)
end
return accumulator
end
Green, commit: Remove unused groupFields parameter.
Now we can remove the parameter from the match record creation as well.
function XStatistics:init(input, control)
self.data = input
self.control = control
self.scopeSet = self:getScopeSet()
end
function XStatistics:getScopeSet()
local groupFields = self.control.group or {}
local scopeSet = XSet()
for i,field in ipairs(groupFields) do
scopeSet:addAt(field,field)
end
return scopeSet
end
Green. Commit: push group field fetch down to method that needs it.
So that’s nice.
What Else?
Is that all? Not remotely.
function XStatistics:getStatistics(controlTable)
local sum = function(input,accumulator, control)
local fields = control.sum
for i,field in ipairs(fields) do
local accum = accumulator:at(field) or 0
accum = accum + input:at(field) or 0
accumulator:putAt(tostring(accum),field)
end
end
local result = XSet()
for record,scope in self.data:elements() do
local accumulator = self:findOrCreateAccumulator(result,record)
sum(record, accumulator, self.control)
end
return result
end
I don’t like including the details of the sum function here in the method that uses it. We can give ourselves a method for that now.
function XStatistics:getStatistics(controlTable)
local result = XSet()
for record,scope in self.data:elements() do
local accumulator = self:findOrCreateAccumulator(result,record)
self:sumFunction()(record, accumulator, self.control)
end
return result
end
function XStatistics:sumFunction()
return function(input,accumulator, control)
local fields = control.sum
for i,field in ipairs(fields) do
local accum = accumulator:at(field) or 0
accum = accum + input:at(field) or 0
accumulator:putAt(tostring(accum),field)
end
end
end
Is that too obscure, with the sumFunction()(record...)
? I think I’ll allow it, but it is a bit uncommon. Let’s see what else we can find and then check to see whether that sticks out as a problem.
Green. Commit: sumFunction returned from a method in XStatistics.
This …
function XStatistics:getStatistics(controlTable)
local result = XSet()
for record,scope in self.data:elements() do
local accumulator = self:findOrCreateAccumulator(result,record)
self:sumFunction()(record, accumulator, self.control)
end
return result
end
Isn’t that last bit just a reduce
in wolf’s clothing? Let’s see if we can make it so.
Well, my first attempt crashed. Fortunately, we just committed. So I’ll revert. Try again:
function XStatistics:getStatistics(controlTable)
return self.data:reduce(XSet(), function(result,record,scope)
local accumulator = self:findOrCreateAccumulator(result,record)
self:sumFunction()(record, accumulator, self.control)
return result
end)
end
Green: use reduce
in getStatistics
.
We could debate how much better or worse that is. It’s not the cleanest reduce ever. For now, I’ll allow it: I’m trying to make the operations like reduce
part of our official coding standard, so we’ll build up our comfort with them as time goes on.
My thinking is that reduce
is solid, and ad-hoc loops are less so, so that we should push things into reduce
and its associates when we can.
Let’s glance at this:
function XStatistics:sumFunction()
return function(input,accumulator, control)
local fields = control.sum
for i,field in ipairs(fields) do
local accum = accumulator:at(field) or 0
accum = accum + input:at(field) or 0
accumulator:putAt(tostring(accum),field)
end
end
end
I think we can inline a bit here to our benefit. After reflection, just one tiny change, removing that local.
function XStatistics:sumFunction()
return function(input, accumulator, control)
for _i,field in ipairs(control.sum) do
local accum = accumulator:at(field) or 0
accum = accum + input:at(field) or 0
accumulator:putAt(tostring(accum),field)
end
end
end
I’d like to reduce that three line loop content, but I don’t see quite how to do it. No sense walking out on thin ice. Green, commit: Inline temp in sumFunction
.
And Now …
function XStatistics:findOrCreateAccumulator(result, record)
local accumulator
local matchRecord = self:createMatchRecord(record)
local accumulatorSet = result:restrictElement(matchRecord)
if accumulatorSet:isNull() then
accumulator = matchRecord
result:addAt(matchRecord,NULL)
else
accumulator = accumulatorSet:at(NULL)
end
return accumulator
end
This one’s a real [DELETED], isn’t it? I just hate methods with “Or” or “And” in their names. You know they’re not doing just one thing.
What could we call it that wouldn’t have a boolean operator in the name? produceAccumulator
? Maybe.
What could we write that wouldn’t be so messy? Something like this, maybe?
function XStatistics:getStatistics(controlTable)
return self.data:reduce(XSet(), function(result,record,scope)
local accumulator = self:produceAccumulator(result,record)
self:sumFunction()(record, accumulator, self.control)
return result
end)
end
function XStatistics:produceAccumulator(result, record)
return XStatistics:produceAccumulatorMatching(result, self:createMatchRecord(record))
end
function XStatistics:produceAccumulatorMatching(result, matchRecord)
local accumulatorSet = result:restrictElement(matchRecord)
if accumulatorSet:isNull() then
result:addAt(matchRecord,NULL)
return matchRecord
end
return accumulatorSet:at(NULL)
end
I extracted the createMatchRecord
to make that one-liner, removing one line from the produceAccumulatorMatching
method. Green. Commit: refactoring getStatistics
to use produceAccumulator
.
Now this:
function XStatistics:produceAccumulatorMatching(result, matchRecord)
local accumulatorSet = result:restrictElement(matchRecord)
if accumulatorSet:isNull() then
result:addAt(matchRecord,NULL)
return matchRecord
end
return accumulatorSet:at(NULL)
end
We can do this:
function XStatistics:produceAccumulatorMatching(result, matchRecord)
local accumulator = result:restrictElement(matchRecord):at(NULL)
if not accumulator then
result:addAt(matchRecord,NULL)
return matchRecord
end
return accumulator
end
Better. What about this:
function XStatistics:produceAccumulatorMatching(result, matchRecord)
return result:restrictElement(matchRecord):at(NULL) or self:createOne(result,matchRecord)
end
function XStatistics:createOne(result,matchRecord)
result:addAt(matchRecord,NULL)
return matchRecord
end
Now that’s what I’m talkin’ about. Here’s where we are:
-- XStatistics
-- 20211227
XStatistics = class()
function XStatistics:init(input, control)
self.data = input
self.control = control
self.scopeSet = self:getScopeSet()
end
function XStatistics:createMatchRecord(record)
return record:scopeRestrict(self.scopeSet)
end
function XStatistics:createResultRecord(result,matchRecord)
result:addAt(matchRecord,NULL)
return matchRecord
end
function XStatistics:getScopeSet()
local groupFields = self.control.group or {}
local scopeSet = XSet()
for i,field in ipairs(groupFields) do
scopeSet:addAt(field,field)
end
return scopeSet
end
function XStatistics:getStatistics(controlTable)
return self.data:reduce(XSet(), function(result,record,scope)
local accumulator = self:produceAccumulator(result,record)
self:sumFunction()(record, accumulator, self.control)
return result
end)
end
function XStatistics:produceAccumulator(result, record)
return XStatistics:produceAccumulatorMatching(result, self:createMatchRecord(record))
end
function XStatistics:produceAccumulatorMatching(result, matchRecord)
return result:restrictElement(matchRecord):at(NULL) or self:createResultRecord(result,matchRecord)
end
function XStatistics:sumFunction()
return function(input, accumulator, control)
for _i,field in ipairs(control.sum) do
local accum = accumulator:at(field) or 0
accum = accum + input:at(field) or 0
accumulator:putAt(tostring(accum),field)
end
end
end
Now then. Most of that is pretty short and simple.
We started off this morning with 40 lines. We now have 54, of which around ten are whitespace or commentary. And what we have is overall simpler, with fewer if statements and loops, though of course the conditions are still met and the right sets are still enumerated. We’ve just hidden that complexity away, pushed down to existing methods.
Most of what we have is pretty coherent with the notion of creating statistics. I have some interest in this function, however:
function XStatistics:getScopeSet()
local groupFields = self.control.group or {}
local scopeSet = XSet()
for i,field in ipairs(groupFields) do
scopeSet:addAt(field,field)
end
return scopeSet
end
That seems perilously close to doing something generically useful. We do have a function now, scopeSet
, that returns the scopes from a given set, under NULL. Let’s change the definition to put them under themselves, and see if the tests mind.
function XSet:scopeSet()
return self:reduce(XSet(), function(r,e,s)
return r:addAt(s,s)
end)
end
This test fails:
_:test("restrictToScope", function()
local S = XSet:on(CSVSet(CSVnames,CSVdata))
local mi = XSet():addAt("MI","state")
local restrictor = XSet():addAt(mi,NULL)
local miFolksScopesHardWay = S:restrict(restrictor):scopeSet()
local miFolksScopes = S:restrictToScope(restrictor)
_:expect(miFolksScopes:card(),"scopes").is(14)
_:expect(miFolksScopesHardWay:card(),"hard").is(14)
_:expect(miFolksScopesHardWay:equals(miFolksScopes),"sets equal").is(true)
end)
Another one needed the results changed to match. I think we need to look at the restrictToScope
and see if it’s OK for it to return scope@scope.
function XSet:restrictToScope(B)
local scopes = XSet()
for a,s in self:elements() do
if B:exists(function(b,_ignored) return b:isSubset(a) end) then
scopes:addAt(s,NULL)
end
end
return scopes
end
Let’s do it.
function XSet:restrictToScope(B)
local scopes = XSet()
for a,s in self:elements() do
if B:exists(function(b,_ignored) return b:isSubset(a) end) then
scopes:addAt(s,s)
end
end
return scopes
end
Test. Green. Commit: scope sets are now scope at scope.
And now:
function XStatistics:getScopeSet()
local groupFields = self.control.group or {}
local scopeSet = XSet()
for i,field in ipairs(groupFields) do
scopeSet:addAt(field,field)
end
return scopeSet
end
Well, darn. I was thinking I could now call scopeSet
but this is not to be. The input to scopeSet
needs to be a set of records with some scopes. To create that is just as much code as we have here now. So, boo.
The code is telling me that it doesn’t want to do that.
But it’s also telling me that XSet is much simpler without all the statistics stuff woven in, and XStatistics is cleaner than a pure extraction of the stats stuff into a new class.
Summary
We listened to the code telling us that it didn’t belong in XSets, then telling us that bits of it were too complex. We moved things around, at least once finding we were working against the grain, and smoothed out much of the code. We deferred some operations back to XSet, as one would hope would happen. Is there more we could do? Almost always, and here, yes. Is it better? By my lights, it’s much better, if only because we’ve split the XSet_stats blob into two separate things.
And the new thing’s code is simpler as well.
Is it perfect? No, it’s not perfect. But it’s better. That’s a good thing: if it were perfect, there’d be nothing to do tomorrow.
But there are plenty of things to do tomorrow. See you then!