XST 17 - Speculation & Experiment
Last night’s Zoom Ensemble netted me a few ideas. I’ll start exploring those today. Hilarity or perhaps something good will ensue. I can’t wait to find out.
As always, the Friday Night Coding group Zoomed last night. Yes, it was Tuesday. That’s when we meet. Because the group wanted to look at some code, and because I was quite willing to show what I’ve been doing, we looked at the XST program and talked about its prospects.
I spoke of my desire to support specialized data structures for performance. It’s not that this program will ever process enough data to need much in the say of performance enhancement. The idea is that the central story of XST is that you can transparently restructure the data for performance, while providing the same appearance “at the top”.
It was GeePaw Hill who suggested that there could be a “factory” that would be told that we want to do such and so an operation on these here sets, and the factory would return a function that was a good one for doing that, or the default function if it had no better ideas.
That led to a brief discussion of maybe having expression trees that could be modified by the program, which is something on my radar from discussions with Bill Wake. Then I drew this lovely picture, which I show in all its late-night glory:
What that picture imagines is that some set we have, like a collection of people, might be represented by a more complex XSet, perhaps with one set labeled “data” and another optional set, maybe “index”, where the latter contained some useful indexing information, carefully defined by the scribbles shown there inside.
So the hasAt
and elements
and perhaps other operations on that set would look to see if there was specialized info like index
and if there was, use it, and otherwise just grab the set under data
and use it in the standard fashion.
It’s fairly easy, I think, to see that we could code up a kind of set, under XData, that could do that for some special case or cases. We don’t quite have that now, but we could make our CSVSet and CSVLineSet work that way with little difficulty. And perhaps we should.
But first we must discuss some set theory.
Membership Condition
When we define a set with that squiggly math stuff, we are usually expressing a “set membership condition”. Even our restrict function is an example of that:
function XSet:restrict(B)
-- return all elements (a) of self such that
-- there exists a record (b) in B such that
-- b:subset(a)
return self:select(function(a,s)
return B:exists(function(b,s)
return b:isSubset(a)
end)
end)
end
That code is just a Lua transcription of the formal definition of our restrict operator.
Now Dave Childs used to say something like “A set is independent of the form of its membership condition”. Or maybe it was the other way around. I was never able to figure out what he meant and any attempts to explain it to me did not reach past my ears into the inner chambers of my mind.
But I was thinking about membership condition last night as I went to sleep, and this morning as I woke up. As one does.
The XST code, at the top, processes data sets like records:
{ Jeffrieslast, Ronfirst, Pinckneycity }
And sets that are collections of those. In fact, with a few restrictions on the scopes, the current program can process pretty much any complex extended set. We don’t have many operators for taking those apart and assembling them but we almost have enough already. More are of course possible. (I think there’s one, at least, that we need to build, although I think we could code around its absence if we had to. But we don’t have to.)
So let’s imagine a set of person records, like the one above, stored with integer scopes:
P = { a1, b2, c3, …, z26, … }
where each of the a-z is a person record with last, first, city.
Now imagine that I gave you this set:
D = { Pdata, { … }info }
And suppose that I challenged you to write the code to tell us how many people there are.
I think you’d quickly come to ask me how to get an element out of a set given its scope, and when I told you it was called rho, you’d write
D:rho("data"):card()
Now if I told you some information about what was under info
, you could probably code up things that used that information to useful ends.
What does this have to do with Membership Condition?
Well, I think we could say something like this:
x ∈ P <=> x ∈ rhodata(D)
x is a person record if and only if x is an element of the set found under the data
scope of D.
If you’re like me, you can almost see that, and almost see how to make it work.
Let’s speculate a bit (more) and then code something simple.
Speculation (More)
I think that to do this with set theory, we’ll probably need some simple expression language for set theory, so that we can say things like “what you’re looking for can be obtained by doing rho(‘data’) to me”, and then execute them.
As an alternative–and I think as our first attempt–we could just provide a Lua function that did the work, but longer term we want the system to be (potentially) smart enough to restructure the data on the fly, and we might wind up wanting to parse, refactor, and execute expressions in set theory.
That’s a wild idea for a random old guy on an iPad, but it has always seemed to me to be the promise of XST, so I’ll keep aiming there until my brain runs out of juice or my energy flag, or a squirrel comes by or something.
Let’s do it.
A Very Small Step
We want to take a very small step toward this big idea. My plan is to create a set, put some items in it, and then embed it in another set, and provide that one as an XData instance, and then decode it.
To do this, I’ll create another subclass of XData. I think I’ll call it XExpression, though it won’t deserve that name for a while, if ever.
We need a test.
_:test("Expression Set", function()
local chet = XSet():addAt("Chet","first"):addAt("Hendrickson","last")
local ron = XSet():addAt("Jeffries","last"):addAt("Ron","first")
local guys = XSet():addAt(chet,NULL):addAt(ron,NULL)
local data = XSet():addAt(guys,"DATA")
local smart = XExpression(data)
local P = XSet:on(smart)
_:expect(guys:card()).is(2)
_:expect(P:card()).is(2)
end)
I think this is right. Kind of messy but I build the guys set, a standard set of people. Then I build a set with that contained as the DATA element. Then I give that to an XExpression instance to manage it. There is no XExpression:
3: Expression Set -- XData:32: attempt to call a nil value (global 'XExpression')
So far so good. Build it:
XExpression = class(XData)
This will fail somewhere, probably one of the missing method messages.
3: Expression Set -- XData:48: Must implement init!
function XExpression:init(anXSet)
self.xset = anXSet
end
3: Expression Set -- XData:56: Must implement elements!
Seems fair.
function XExpression:elements()
-- return iterator
local data = self:findDATA(self.xset)
return data:elements()
end
Could it be this easy? We’ll need to write findDATA of course:
3: Expression Set -- XData:71: attempt to call a nil value (method 'findDATA')
function XExpression:findDATA(aSet)
for e,s in aSet:elements() do
if s=="DATA" then return e end
end
return NULL
end
We should really be checking that there’s only one DATA element, otherwise the situation isn’t well-defined, but we’ll get there. I think this might work. If not, we’ll find out why.
But it does work. Let’s refine findDATA a bit:
function XExpression:findDATA(aSet)
local data = nil
for e,s in aSet:elements() do
if s=="DATA" then
if data == nil then
data = e
else
return nil
end
end
end
return data
end
That should return nil if there’s no DATA scope or if there’s more than one. (not NULL: this is an element selector, not a set selector. It might return a set, it might return an atom.)
Tests still run. Commit: XExpression:elements works in simplest correct case.
We need to beef up the test to check for elements, and also we should make sure that the right thing happens if there’s no DATA or two of them. Right now that is surely not the case.
_:test("Expression Set", function()
local chet = XSet():addAt("Chet","first"):addAt("Hendrickson","last")
local ron = XSet():addAt("Jeffries","last"):addAt("Ron","first")
local guys = XSet():addAt(chet,"1"):addAt(ron,"2")
local data = XSet():addAt(guys,"DATA")
local smart = XExpression(data)
local P = XSet:on(smart)
_:expect(guys:card()).is(2)
_:expect(P:card()).is(2)
_:expect(P:hasAt(chet,"1")).is(true)
_:expect(P:hasAt(ron,"2")).is(true)
end)
I gave the records “numeric” scopes so that I can check for them. This should fail with no hasAt.
3: Expression Set -- XData:62: Must implement hasAt!
And:
function XExpression:hasAt(e,s)
local data = self:findDATA(self.xset)
if data then return data:hasAt(e,s) else return false end
end
Test runs. This, too, is not robust enough. We really do want a set back. This isn’t rho, it’s a use of rho. Must think about that. For now:
function XExpression:findDataSet(aSet)
local data = NULL
for e,s in aSet:elements() do
if s=="DATA" then
if data == NULL and e:is_a(XSet) then
data = e
else
return NULL
end
end
end
return data
end
I think this is fairly bullet-proof against too few or too many DATA scopes now. Tedious to write the tests but here we go:
I went around in my head a few times, trying to decide between throwing an exception on no DATA or just returning NULL as it does now. I think the latter will be confusing when people get it wrong. We’ll require an exception, though I hate that.
_:test("Expression Set", function()
local chet = XSet():addAt("Chet","first"):addAt("Hendrickson","last")
local ron = XSet():addAt("Jeffries","last"):addAt("Ron","first")
local guys = XSet():addAt(chet,"1"):addAt(ron,"2")
local data = XSet():addAt(guys,"DATA")
local bad = XExpression(guys)
local notSmart = XSet:on(notSmart)
_:expect(function() notSmart:card() end).throws("no DATA")
local smart = XExpression(data)
local P = XSet:on(smart)
_:expect(guys:card()).is(2)
_:expect(P:card()).is(2)
_:expect(P:hasAt(chet,"1")).is(true)
_:expect(P:hasAt(ron,"2")).is(true)
end)
3: Expression Set -- Actual: function: 0x286afff60, Expected: no DATA
That’s what a failed throw looks like.
Test was wrong. It’s confusing building these nested things. This is the right test so far:
_:test("Expression Set", function()
local chet = XSet():addAt("Chet","first"):addAt("Hendrickson","last")
local ron = XSet():addAt("Jeffries","last"):addAt("Ron","first")
local guys = XSet():addAt(chet,"1"):addAt(ron,"2")
local data = XSet():addAt(guys,"DATA")
local bad = XExpression(guys)
local notSmart = XSet:on(bad)
_:expect(function() notSmart:card() end).throws("no DATA")
local smart = XExpression(data)
local P = XSet:on(smart)
_:expect(guys:card()).is(2)
_:expect(P:card()).is(2)
_:expect(P:hasAt(chet,"1")).is(true)
_:expect(P:hasAt(ron,"2")).is(true)
end)
And with this code, it runs green:
function XExpression:findDataSet(aSet)
local data = nil
for e,s in aSet:elements() do
if s=="DATA" then
if data == nil and e:is_a(XSet) then
data = e
else
error("more than one DATA")
end
end
end
return data or error("no DATA")
end
I haven’t tested the more than one but I wrote it anyway. Now the test … no, these are too confusing. Break them out.
_:test("Expression Set with no DATA throws", function()
local data = XSet():addAt("foo","bar")
local notSmart = XExpression(data)
local P = XSet:on(notSmart)
_:expect(function() P:card() end).throws("no DATA")
end)
_:test("Expression Set with too many DATA throws", function()
local data = XSet:addAt("x","DATA"):addAt("y","DATA")
local tooSmart = XEpression(data)
local P = XSet:on(tooSmart)
_:expect(function() P:card() end).throws("no DATA")
end)
The first one of these runs. I left the old message there to see the second one fail. But it does this:
5: Expression Set with too many DATA throws -- XSet:25: attempt to index a nil value (field 'data')
That’s in addAt
:
function XSet:addAt(element, scope)
self.data:addAt(element,scope)
return self
end
And it means that self.data
was nil. Oh. Missing parens:
_:test("Expression Set with too many DATA throws", function()
local data = XSet():addAt("x","DATA"):addAt("y","DATA")
local tooSmart = XEpression(data)
local P = XSet:on(tooSmart)
_:expect(function() P:card() end).throws("no DATA")
end)
Test should fail not liking the message now. Well, if I were to spell XExpression correctly …
_:test("Expression Set with too many DATA throws", function()
local data = XSet():addAt("x","DATA"):addAt("y","DATA")
local tooSmart = XExpression(data)
local P = XSet:on(tooSmart)
_:expect(function() P:card() end).throws("no DATA")
end)
Now then:
5: Expression Set with too many DATA throws -- Actual: function: 0x2874a8e40, Expected: no DATA
The message is:
_:expect(function() P:card() end).throws("more than one DATA")
Still fails. Must be a bug in findDataSet …
function XExpression:findDataSet(aSet)
local data = nil
for e,s in aSet:elements() do
if s=="DATA" then
if data == nil and e:is_a(XSet) then
data = e
else
error("more than one DATA")
end
end
end
return data or error("no DATA")
end
I sure expect that to work. Going to resort to print, I’m getting panicky.
Oh, wait. I need to give it sets or it won’t count them, it’ll count them as not being there. Recode this thing again.
function XExpression:findDataSet(aSet)
local data = nil
for e,s in aSet:elements() do
if s=="DATA" then
if data == nil then
data = e
else
error("more than one DATA")
end
end
end
return data and data:is_a(XSet) and data or error("no DATA")
end
This passes both tests. Is that end bit too clever?
function XExpression:findDataSet(aSet)
local data = nil
for e,s in aSet:elements() do
if s=="DATA" then
if data == nil then
data = e
else
error("more than one DATA")
end
end
end
if data and data:is_a(XSet) then
return data
else
error("no DATA")
end
end
Perhaps that’s more clear.
We could be checking the XExpression for proper shape on creation but I’m reluctant to do that. I’m not sure why. I feel like the expression might change dynamically, but even then we could check it. Anyway, this is solid enough.
Commit: XExpression supports elements and hasAt with error checking. No smarts beyond finding DATA so far.
I think that’ll suffice for the morning. Let’s look back and sum up.
Summaretropective
Is that a word? It is now.
What we have here is a foot in the door of sets that compute set expressions in order to produce their rightful records. This one is nearly trivial: it just looks into the set it’s given to see if there is a set named DATA and if there is, acts as if it were an XGeneric on that data, i.e enumerating and hasAtting the DATA set. Is “hasAtting” a word? Probably not but it reminds me of a word. Does it remind you of a word?
Be that as it may, I think this is a small but important win. By embedding more cleverness in the XExpression set, we can, I believe, make good things happen. Things like what, you ask? Well, suppose we had a set P of people and the city they live in. And suppose we partitioned that set by city, so that we had
Q = { P1Pinckney, P2Atlanta, P3Dexter, .. }
with the various Px sets consisting of the records from the city under that city’s scope.
Then we could iterate the set P by successively iterating the various Px sets. And we could still answer hasAt()
, perhap at greater cost, by checking all the little sets.
But we could also answer
P:restrict({Pinckneycity})
by returning rho(Q,"Pinckney")
in one fetch.
That could be a big win for an application that did lots of statistics broken out by city. And it could be a big win for us set theory people, just because we could make it work!
Of course, as programmers, we’re like “yabbut you just partitioned the data by city”. And yes. But we did it in a more formal way that promises to deliver easier transitions between structures like that. Will it deliver? Remains to be seen. But I find it promising.
Similarly, we might provide info that gave us an index on patient ID or drivers license or passport number, for instant fetching of someone’s record:
P:restrict({J123-44-123-66DriversLicense })
We’ll need operations and set shapes that we don’t yet have, including n-tuples, which will be directly accessible by an integer index.
But I think we’re on the way. I’m seeing further than I ever have before when I’ve done this. This pleases me.
I hope to see both of you next time. And if you have questions or ideas, tweet me up or email ronjeffries at acm dot org.