XST 24: Hold My Beer. Well, Chai.
Today, rather than make any deep progress, I plan to work on something I consider interesting: sums, averages, and grouping. I promise to publish this even if it explodes. (It doesn’t, quite.)
I woke up with this idea in my head, so I thought it would be fun to try. The mission, and I choose to accept it, is to implement a set operation (or multiple operations) that can sum, average, and group values. Naively, I want it to work roughly like this.
We have some set of records (other sets) with fields (scopes) of various names. We want to specify some of these fields to be summed, or averaged, and to group the output by others. This is not my proposed syntax but it’ll give the idea:
input = People
group = "state"
sum = {"pay","bonus"}
avg = { "age" }
Why isn’t that the syntax? Maybe it should be, or something like it. Why should all my inputs be so hard to create? We’re working in Lua, why not have Lua help us out more than it currently does?
I’m going to work my way up to this, with tests of course. Let’s begin with some data.
-- test summing etc
local People
function testSummingEtc()
_:describe("Summing etc", function()
_:before(function()
local p1 = XSet():addAt("p1","name"):addAt("MI","state"):addAt("1000","pay"):addAt("50","age")
local p2 = XSet():addAt("p2","name"):addAt("OH","state"):addAt("100","pay"):addAt("30","age")
local p3 = XSet():addAt("p3","name"):addAt("MI","state"):addAt("2000","pay"):addAt("60","age")
local p4 = XSet():addAt("p4","name"):addAt("OH","state"):addAt("200","pay"):addAt("40","age")
People = Tuple():add(p1):add(p2):add(p3):add(p4)
end)
_:after(function()
end)
_:test("People exist", function()
_:expect(People:card()).is(3)
end)
end)
end
The test should fail because card is 4 not 3.
1: People exist -- Actual: 4, Expected: 3
Change the test to be correct and we’re green.
Now for an actual test. A major issue here will be to invent some syntax. I think I’ll try a lua table:
_:test("Sum pay", function()
local control = {sum="pay"}
local result = People:stats(control)
end)
I’m quite sure that this won’t last but it’ll get us going. I’m positing a new operation, stats, that takes a control table as shown. We’d like it to be a set, at least internally, but for now, it isn’t.
This should fail not finding stats.
2: Sum pay -- TestSum:26: attempt to call a nil value (method 'stats')
OK … that drives out an empty function stats. What do we want the output to be? Looking forward to grouping, we want a set of sets, each set representing a group, each containing, for now, the summed pay, under “pay”.
_:test("Sum pay", function()
local control = {sum="pay"}
local result = People:stats(control)
local foundRec = result:elementProject(NULL)
_:expect(foundRec:hasAt("3300","pay")).is(true)
end)
I’m guessing at what I want here, as well as guessing at how to implement it. I hope this step is small enough, but if not we’ll try something easier.
I want a method Set:choose() that will return an arbitrary element from a set. I had to decide here that the output would have a NULL scope and I’m sure that won’t last. We’ll find out.
Here’s what I blurted out for stats. It’s almost certainly too much. My test fails.
function XSet:stats(controlTable)
local summed = {}
for record,s in self:elements() do
for operation,field in pairs(controlTable) do
if k=="sum" then
local current = summed[field] or 0
summed[field] = current + (record:elementProject(field) or 0)
end
end
end
local result = XSet()
for field,sum in pairs(summed) do
result:addAt(sum.."",field)
end
return XSet():addAt(result,NULL)
end
I’ve put some prints in the test. It’s just really difficult to figure out what you’ve got with sets. Clearly this bite is too big, but I don’t yet see how to make it smaller. Here’s the test and its results:
_:test("Sum pay", function()
local control = {sum="pay"}
local result = People:stats(control)
_:expect(result:card(),"result").is(1)
for e,s in result:elements() do
print("result ", e,s)
end
local foundRec = result:elementProject(NULL)
_:expect(foundRec, "foundRec NULL").isnt(NULL)
print(foundRec)
for e,s in foundRec:elements() do
print("foundRec", e,s)
end
_:expect(foundRec:card(),"record").is(1)
_:expect(foundRec:hasAt("3300","pay")).is(true)
end)
result NULL NULL
NULL
2: Sum pay record -- Actual: 0, Expected: 1
2: Sum pay -- Actual: false, Expected: true
So we aren’t getting anything good out of the result, it seems to just have NULL@NULL. Weird. More tracing, a sure sign that I’m off the rails.
A bit more printing and I discover that when I renamed “k,v” in the code, I forgot to change one instance of “k”. The code is:
function XSet:stats(controlTable)
local summed = {}
for record,s in self:elements() do
for operation,field in pairs(controlTable) do
if operation=="sum" then
local current = summed[field] or 0
local value = record:elementProject(field) or 0
summed[field] = current + value
end
end
end
local result = XSet()
for field,sum in pairs(summed) do
result:addAt(sum.."",field)
end
return XSet():addAt(result,NULL)
end
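The stray k failed quietly rather than loudly because, in Lua, a name that was never defined simply reads as nil, and comparing nil to a string is false rather than an error. Here is a minimal sketch of that effect in plain Lua, assuming no global named k exists; the counter names are made up for illustration:

```lua
-- `operation` is the loop variable; `k` was never defined, so it reads as nil.
local control = { sum = "pay" }
local matchedWithTypo, matchedFixed = 0, 0
for operation, field in pairs(control) do
  -- nil == "sum" is false, so this branch never runs and nothing complains
  if k == "sum" then matchedWithTypo = matchedWithTypo + 1 end
  -- the corrected comparison uses the real loop variable
  if operation == "sum" then matchedFixed = matchedFixed + 1 end
end
print(matchedWithTypo, matchedFixed)
```

The typo'd branch is never taken, which is why the summed table stayed empty and the result came out as an empty record.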
Test is green. Let’s add in bonus info and try again.
_:before(function()
local p1 = XSet():addAt("p1","name"):addAt("MI","state"):addAt("1000","pay"):addAt("50","bonus"):addAt("50","age")
local p2 = XSet():addAt("p2","name"):addAt("OH","state"):addAt("100","pay"):addAt("50","bonus"):addAt("30","age")
local p3 = XSet():addAt("p3","name"):addAt("MI","state"):addAt("2000","pay"):addAt("50","bonus"):addAt("60","age")
local p4 = XSet():addAt("p4","name"):addAt("OH","state"):addAt("200","pay"):addAt("50","bonus"):addAt("40","age")
People = Tuple():add(p1):add(p2):add(p3):add(p4)
end)
And in the test:
_:test("Sum pay", function()
local control = {sum="pay", sum="bonus"}
local result = People:stats(control)
_:expect(result:card(),"result").is(1)
local foundRec = result:elementProject(NULL)
_:expect(foundRec, "foundRec NULL").isnt(NULL)
_:expect(foundRec:card(),"record").is(1)
local pay = foundRec:elementProject("pay")
_:expect(pay,"pay").is("3300")
local bonus = foundRec:elementProject("bonus")
_:expect(bonus,"bonus").is("200")
end)
The results will surprise you. At least they surprised me:
2: Sum pay pay -- Actual: nil, Expected: 3300
The bonus came out OK but now the … ohhh … I can’t have two values for “sum” in the control table. It’ll have to be a table of tables, like this:
_:test("Summing", function()
local control = {sum={"pay","bonus"}}
local result = People:stats(control)
_:expect(result:card(),"result").is(1)
local foundRec = result:elementProject(NULL)
_:expect(foundRec, "foundRec NULL").isnt(NULL)
_:expect(foundRec:card(),"record").is(2)
local pay = foundRec:elementProject("pay")
_:expect(pay,"pay").is("3300")
local bonus = foundRec:elementProject("bonus")
_:expect(bonus,"bonus").is("200")
end)
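The duplicate-key problem is easy to see in plain Lua, outside the test framework. This is only a sketch with made-up variable names: a table constructor that names the same key twice ends up with a single entry, while a list nested under the key keeps every field name.

```lua
-- A table constructor with a repeated key keeps only one value for that key.
local flat = { sum = "pay", sum = "bonus" }
local keyCount = 0
for _ in pairs(flat) do keyCount = keyCount + 1 end
print(keyCount)

-- Nesting a list under the key keeps every field name, in order.
local nested = { sum = { "pay", "bonus" } }
for i, field in ipairs(nested.sum) do
  print(i, field)
end
```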
This will fail in some awful way, so:
function XSet:stats(controlTable)
local summed = {}
for record,s in self:elements() do
for operation,fields in pairs(controlTable) do
if operation=="sum" then
for i,field in ipairs(fields) do
local current = summed[field] or 0
local value = record:elementProject(field) or 0
summed[field] = current + value
end
end
end
end
local result = XSet()
for field,sum in pairs(summed) do
result:addAt(sum.."",field)
end
return XSet():addAt(result,NULL)
end
This passes the test. Time for an interim commit: stats method works for ungrouped sums.
That was heavy. I need a short break, so let’s do a retro.
Retro
This method is not well factored:
function XSet:stats(controlTable)
local summed = {}
for record,s in self:elements() do
for operation,fields in pairs(controlTable) do
if operation=="sum" then
for i,field in ipairs(fields) do
local current = summed[field] or 0
local value = record:elementProject(field) or 0
summed[field] = current + value
end
end
end
end
local result = XSet()
for field,sum in pairs(summed) do
result:addAt(sum.."",field)
end
return XSet():addAt(result,NULL)
end
We’ll deal with that shortly. The process itself was choppy. You didn’t see or feel the worst of it, but you did see that I resorted to printing, which is a sign that my tests aren’t strong enough to tell me what’s really going on. In part that’s because XSets are rather opaque, so it’s not easy to see what they contain.
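For what it's worth, one possible factoring might look like the sketch below. It works over plain Lua tables rather than XSets, and the helper names addFieldsInto and sumFields are hypothetical, so treat it as an illustration of the shape and not as the refactoring this series actually lands on.

```lua
-- Hypothetical helpers over plain Lua tables, not the XSet API.

-- Add one record's values for the given fields into the running totals.
local function addFieldsInto(totals, record, fields)
  for _, field in ipairs(fields) do
    local value = tonumber(record[field]) or 0
    totals[field] = (totals[field] or 0) + value
  end
end

-- Sum the named fields across all records.
local function sumFields(records, fields)
  local totals = {}
  for _, record in ipairs(records) do
    addFieldsInto(totals, record, fields)
  end
  return totals
end

local people = {
  { name = "p1", pay = "1000", bonus = "50" },
  { name = "p2", pay = "100",  bonus = "50" },
  { name = "p3", pay = "2000", bonus = "50" },
  { name = "p4", pay = "200",  bonus = "50" },
}
local totals = sumFields(people, { "pay", "bonus" })
print(totals.pay, totals.bonus)
```

Splitting the per-record accumulation from the loop over records leaves a natural place to plug in averaging or grouping later.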
This gives me an idea. What if we could readily require that a set has certain scopes? That would let me say something like this:
_:test("Summing", function()
local control = {sum={"pay","bonus"}}
local result = People:stats(control)
_:expect(result:card(),"result").is(1)
local foundRec = result:elementProject(NULL)
local scopes = foundRec:scopeArray()
_:expect(scopes).has("pay")
_:expect(scopes).has("bonus")
local pay = foundRec:elementProject("pay")
_:expect(pay,"pay").is("3300")
local bonus = foundRec:elementProject("bonus")
_:expect(bonus,"bonus").is("200")
end)
I just posited a new method scopeArray. Doesn’t exist. Write it.
function XSet:scopeArray()
return self:reduce({}, function(r,e,s)
table.insert(r,s)
return r
end)
end
Right. Tests green. Commit: implemented XSet:scopeArray convenience method to return scopes of a set as an array.
I want another convenience method, at, which is the same as elementProject except shorter.
function XSet:at(scope)
return self:elementProject(scope)
end
That will let me write this:
_:test("Summing", function()
local control = {sum={"pay","bonus"}}
local result = People:stats(control)
_:expect(result:card(),"result").is(1)
local foundRec = result:elementProject(NULL)
_:expect(foundRec:at("pay")).is("3300")
_:expect(foundRec:at("bonus")).is("200")
end)
Much better. I rather like that.
Note that I’m treating all numeric fields as text. I don’t think that’s necessary, but I’m imagining that all the likely input sets I’ll see will have text records. Lua is pretty decent about converting back and forth. If you say 10 + "10", you’ll get 20, and if you say 20 .. "", you’ll get "20". (The spaces around .. matter after a numeric literal; without them Lua tries to read 20.. as a malformed number.)
Unfortunately, CodeaUnit isn’t quite clever enough to conclude that “200” equals 200, and I’m not at all sure that it should, since, well, they aren’t equal.
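A quick sketch of those conversions in plain Lua, with made-up variable names. Arithmetic coerces a numeric string to a number, concatenation coerces a number to a string, but equality does no coercion at all, which is why the tests compare against "200" and not 200.

```lua
-- Arithmetic converts the numeric string to a number.
local sum = 10 + "10"
-- Concatenation converts the number to a string.
local text = 20 .. ""
-- Equality never converts: a string and a number are simply different.
local sameWithoutConversion = ("200" == 200)
-- An explicit conversion makes the comparison meaningful.
local sameAfterConversion = (tonumber("200") == 200)
print(sum, text, sameWithoutConversion, sameAfterConversion)
```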
I’m tempted to add some XSet-oriented capabilities to CodeaUnit. If we were doing a long-term project on XSets, I’d argue that our “Making App”, which includes CodeaUnit, should have that capability. Right this moment, I’m not prepared to do that, but I do expect we’ll do some more convenience methods.
Back To It
Anyway, this has been a bit raggedy-feeling to me. I’m ready to go back in, however. Let’s clean up our method a bit:
No, let’s not. Let’s let it stay a bit messy, and think about grouping instead.
If we were to do the whole thing, we’d allow grouping by, say, county within state and other such nested groups. What would we want our output to look like?
I think we have a basic decision to make. If we allow grouping by X within Y within Z, we could either add in another layer of nesting for each grouping level, or we can create some kind of concatenated scope, like
{
{ 3300pay }<MI,LIVINGSTON>,
{ 2345pay }<OH,BUTLER>
}
We should think a bit about what kind of output the user (me) will actually want in a case like this. Or, then again, maybe we should think about the best way to create the information we want, and then provide transformations for getting the data into another shape for convenience.
Of course, we want to do some of each.
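To make the two shapes concrete, here is a rough sketch using plain Lua tables to stand in for sets. The dotted key format and the nest function are assumptions made for illustration; they show one way a flat, concatenated-scope result could be transformed into one level of nesting per grouping field.

```lua
-- Hypothetical shapes, using plain tables to stand in for sets.
local flat = {
  ["MI.LIVINGSTON"] = { pay = 3300 },
  ["OH.BUTLER"]     = { pay = 2345 },
}

-- Convert the flat shape into one nested level per grouping field.
local function nest(flatGroups)
  local nested = {}
  for key, stats in pairs(flatGroups) do
    -- split the concatenated key on "."
    local parts = {}
    for part in string.gmatch(key, "[^.]+") do
      parts[#parts + 1] = part
    end
    -- walk or create the intermediate levels
    local node = nested
    for i = 1, #parts - 1 do
      node[parts[i]] = node[parts[i]] or {}
      node = node[parts[i]]
    end
    -- hang the stats on the innermost level
    node[parts[#parts]] = stats
  end
  return nested
end

local nested = nest(flat)
print(nested.MI.LIVINGSTON.pay, nested.OH.BUTLER.pay)
```

Going the other way, from nested to flat, would be a similar walk, which suggests building whichever shape is easiest and converting on demand.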
From a coding viewpoint, indexing by a set or table is complicated by the fact that sets and tables are only equal by identity. So if we were to say
_:test("Table Equality", function()
_:expect({1,2,3}).isnt({1,2,3})
end)
That test will pass. There are two tables there and they are deemed to be unequal by Lua.
Strings, however, compare more favorably.
_:test("Table and String Equality", function()
_:expect({1,2,3}).isnt({1,2,3})
_:expect("ab".."c").is("a".."bc")
end)
This test passes. Lua interns strings (in recent versions, at least the short ones), so the two strings here are in fact the same object, not just equal. That costs a bit more on string creation but makes comparison super fast. It probably also saves a bit of memory, depending on what you’re up to.
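The practical consequence for grouping shows up when these values are used as table keys. In this small sketch (plain Lua, made-up names), two tables with the same contents land in separate slots, while two strings built in different ways land in the same slot.

```lua
-- Two tables with equal contents are still two different keys.
local byTable = {}
byTable[{ "MI", "Livingston" }] = 1
byTable[{ "MI", "Livingston" }] = 2
local tableKeys = 0
for _ in pairs(byTable) do tableKeys = tableKeys + 1 end

-- Two equal strings, however they were built, are the same key.
local byString = {}
byString["MI" .. "." .. "Livingston"] = 1
byString["MI." .. "Livingston"] = 2
local stringKeys = 0
for _ in pairs(byString) do stringKeys = stringKeys + 1 end

print(tableKeys, stringKeys)
```

That is the reason a concatenated string looks attractive as a grouping key.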
If we’re gonna do this, I need a more robust set of input data.
I want another convenience method:
local p1 = XSet():record{name="p1",state="MI",county="Livingston",pay=1000,bonus=50,age=50}
This method can only create sets with a single element under any given scope, but that’s fine for what I need today.
Let’s write a test for that.
_:test("record method", function()
local p1 = XSet():record{name="p1",state="MI",county="Livingston",pay=1000,bonus=50,age=50}
_:expect(p1:at("name")).is("p1")
_:expect(p1:at("state")).is("MI")
_:expect(p1:at("pay")).is("1000")
end)
That should do the job. Note that I’m allowing numeric input and expecting strings out. (I’ll need to deal with reals, but not today.)
The test demands record, of course:
function XSet:record(tab)
local result = XSet()
for scope,element in pairs(tab) do
result:addAt(tostring(element),tostring(scope))
end
return result
end
Test runs. Commit: XSet convenience method record creates simple records.
Enhance the test with county:
_:before(function()
local p1 = XSet():record{name="p1",state="MI",county="Livingston",pay=1000,bonus=50,age=50}
local p2 = XSet():record{name="p2",state="OH",county="Butler",pay="100", bonus="50", age="30"}
local p3 = XSet():record{name="p3",state="MI",county="Wayne",pay="2000",bonus="50",age="60"}
local p4 = XSet():record{name="p4",state="OH",county="Adams",pay="200",bonus="50",age="40"}
People = Tuple():add(p1):add(p2):add(p3):add(p4)
end)
Tests still green.
Let’s try to write a grouping test. (I am getting tired, it’s well over three straight hours of this.)
_:test("Grouping", function()
local control = {sum={"pay"}, group={"state"} }
local result = People:stats(control)
_:expect(result:card(),"result").is(2)
end)
I expect this to fail getting only 1 record. Yes. Now let’s see if we can figure out how to group these babies.
It occurs to me that this syntax isn’t addressing how to group by more than one thing. I think I’m already in over my head here, so I’m going to ignore that.
I want to have, for each group value, a statistics table into which we do the summing.
I’m just going to try to code this up, much as I did before, but I’m going to start over and work by intention. I’ll save the old method for reference.
I’m not doing a very good job of working by intention. So far I’ve got this:
function XSet:stats(controlTable)
local result = {}
local groupControl = controlTable.group or {"total"}
local sumControl = controlTable.sum
for record,s in self:elements() do
local gs = self:getGroupString(record)
local statsForGroup = result[gs] or {}
for i,field in ipairs(sumControl) do
local current = statsForGroup[field] or 0
local value = record:elementProject(field) or 0
statsForGroup[field] = current + value
end
result[gs] = statsForGroup
end
end
I think that at the end of that mess, I should have a table result with an entry in it for the sums, broken out by whatever getGroupString returned. Need to write that.
function XSet:getGroupString(record, groupControl)
key = ""
for i,scope in ipairs(groupControl) do
local value = self:elementProject(scope)
key = key..value or ""
end
return key
end
Let me divert and write a test for that.
_:test("group string", function()
local control = { group={"state","county"} }
local set = XSet():record{state="MI", county="Livingston"}
local key = set:getGroupString(set,control)
_:expect(key).is("MI.Livingston")
end)
Run it. Not quite. I home in on this test and code:
_:test("group string", function()
local control = { group={"state","county"} }
local set = XSet():record{state="MI", county="Livingston"}
local key = set:getGroupString(set,control)
_:expect(key).is(".MI.Livingston")
end)
function XSet:getGroupString(record, groupControl)
local key = ""
local groupBy = groupControl.group or {}
for i,scope in ipairs(groupBy) do
local value = self:elementProject(scope)
key = key.."."..(value or "MISSING"..scope)
end
return key
end
I was not clear in my head whether I am passing in the whole table or just the group part.
Let’s do just the group part.
_:ignore("Grouping", function()
local control = {sum={"pay"}, group={"state"} }
local result = People:stats(control.group)
_:expect(result:card(),"result").is(2)
end)
Change the code:
Arrgh, changed the wrong (ignored) test.
_:test("group string", function()
local control = { group={"state","county"} }
local set = XSet():record{state="MI", county="Livingston"}
local key = set:getGroupString(set,control.group)
_:expect(key).is(".MI.Livingston")
end)
function XSet:getGroupString(record, groupControl)
local key = ""
local groupBy = groupControl or {}
for i,scope in ipairs(groupBy) do
local value = self:elementProject(scope)
key = key.."."..(value or "MISSING"..scope)
end
return key
end
Test runs.
I went up on my lines and added in the stuff about MISSING. Improve the test:
_:test("group string", function()
local control = { group={"state","county"} }
local set = XSet():record{state="MI", county="Livingston"}
local key = set:getGroupString(set,control.group)
_:expect(key).is(".MI.Livingston")
key = set:getGroupString(set,{"state","what"})
_:expect(key).is(".MI.MISSINGwhat")
end)
Passes. I don’t like that argument list.
_:test("group string", function()
local control = { group={"state","county"} }
local set = XSet():record{state="MI", county="Livingston"}
local key = set:getGroupString(control.group)
_:expect(key).is(".MI.Livingston")
key = set:getGroupString({"state","what"})
_:expect(key).is(".MI.MISSINGwhat")
end)
function XSet:getGroupString(groupControl)
local key = ""
local groupBy = groupControl or {}
for i,scope in ipairs(groupBy) do
local value = self:elementProject(scope)
key = key.."."..(value or "MISSING"..scope)
end
return key
end
Test is green. Two are ignored.
I can make this one run by calling the old function:
_:ignore("Summing", function()
local control = {sum={"pay","bonus"}}
local result = People:stats(control)
_:expect(result:card(),"result").is(1)
local foundRec = result:elementProject(NULL)
_:expect(foundRec:at("pay")).is("3300")
_:expect(foundRec:at("bonus")).is("200")
end)
function XSet:stats(controlTable)
if not controlTable.group then
return self:statsNoGroup(controlTable)
end
local result = {}
local groupControl = controlTable.group
local sumControl = controlTable.sum
for record,s in self:elements() do
local gs = record:getGroupString(groupControl)
local statsForGroup = result[gs] or {}
for i,field in ipairs(sumControl) do
local current = statsForGroup[field] or 0
local value = record:elementProject(field) or 0
statsForGroup[field] = current + value
end
result[gs] = statsForGroup
end
-- grouped sums are collected in result, but nothing is returned yet
end
I just renamed the old function. So that bit works.
I’m fried. I’m going to commit with one test ignored. Commit: stats in progress, one ignored test.
Let’s sum up.
Summary
I’d feel better about this had I decided to call it a spike[1]. The raggedy feeling would then have been natural, and I might have done the experiment in a more protected form, and perhaps even in a separate little project.
That said, it didn’t go badly, but the code is definitely nothing to be proud of. It’s kind of the code of an inexperienced programmer. What do I mean by that?
Last night (Tuesday night), at the Friday Coding Zoom Ensemble meeting, we were talking about the dimensions of good code, and someone, I think Hill, said that inexperienced coders seem to focus a lot on just getting the right answer, and not so much on even understanding what they have written, nor on the clarity and readability of the code. And that’s what I’ve done here. I just wrote down a bunch of lines and bashed them until they submitted.
In a spike, I feel OK about that. Here, not so much, especially since I’ve worked long enough to need a real break away from the code, so that I’ve left it a bit in the air. There’s nothing broken … but there’s something not done.
But some interesting ideas have come out. I think the trick of appending the field values to get the grouping key will work for us, subject to limitations on the scopes being strings. I’ll try to remember to think about that. While that limit is reasonable, it’s not great. Ideally all our set operations would be agnostic to the types of the scopes and elements encountered.
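Here is a rough sketch of where the key-string trick could lead, written over plain Lua tables rather than XSets. The names groupKey and groupedSums are hypothetical stand-ins, not the unfinished stats method, and unlike getGroupString this key has no leading dot.

```lua
-- Hypothetical stand-ins over plain tables, not the XSet implementation.

-- Build a string key from the values of the grouping fields.
local function groupKey(record, groupFields)
  local parts = {}
  for i, field in ipairs(groupFields) do
    parts[i] = tostring(record[field] or ("MISSING" .. field))
  end
  return table.concat(parts, ".")
end

-- Sum the named fields separately for each distinct group key.
local function groupedSums(records, groupFields, sumFields)
  local groups = {}
  for _, record in ipairs(records) do
    local key = groupKey(record, groupFields)
    local stats = groups[key] or {}
    for _, field in ipairs(sumFields) do
      stats[field] = (stats[field] or 0) + (tonumber(record[field]) or 0)
    end
    groups[key] = stats
  end
  return groups
end

local people = {
  { state = "MI", pay = "1000" },
  { state = "OH", pay = "100" },
  { state = "MI", pay = "2000" },
  { state = "OH", pay = "200" },
}
local byState = groupedSums(people, { "state" }, { "pay" })
print(byState.MI.pay, byState.OH.pay)
```

One limitation worth keeping in mind: if a field value can itself contain the separator, two different groups can collapse into the same key. With this sketch, a record with a="x.y", b="z" and one with a="x", b="y.z" both produce the key "x.y.z".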
Anyway, I at least know to quit when I’m too tired to be smart. I’ll see you next time!
[1] A “spike” is the term Ward Cunningham gives to an experiment where you just kind of bash through something, to learn how to do it (and how not).