XST 21: Some musing, and a bit of coding.
I don’t want to get stuck in a never-ending series of new setops: there won’t be much learning there. Where’s the beef?
I work on these various projects for fun, and to discover and share things that strike me as being generally useful, not just specific to whatever program we’re working on. Those objectives, and my other personality flaws, create a danger that I’ll become bored with what’s going on, and wander off. In my past this did not serve me well. In my present, it’s rather useful, because a change of mental scenes is refreshing.
I don’t think XST is close to mined out yet, but I do think we need to be sure to stay focused where the discovery and learning are best. I do like to swim where the krill and plentiful and tasty.
Indexing
We’re on our way to indexing. In fact, aside from a bit of packaging, I think we’re nearly there. But indexing isn’t quite as interesting on the iPad as it is in a real computer. And it’s even less interesting in a real computer of today, compared to the computers of yesteryear. In the olden days, you’ll never believe me when I tell you this, file access was slow. It could take ages to read information from hard drives back then.
It has taken about 15ms to read a random chunk of data from a hard drive for over 20 years. This means that the number of reads you can do per second is limited to less than 100. But I haven’t had a hard drive in my computer for years, and the solid state drives can do more like three thousand reads per second.
Of course indexing is still valuable, especially as data volumes increase, and we’re here to figure out how to do things with set theory, not to build a DBMS. But unless I want to start simulating a hard drive using some inter-machine protocol, we’re not going to see orders of magnitude improvement from indexing. Well, maybe one order of magnitude. We’ll see.
Hybrid Structure: OO and Sets
XST, according to Childs, offers the ability to do optimal things with ease and clarity, owing to the power and mathematical purity of Extended Set Theory. Bridging the gap between ability and actually doing has been a problem, for me, for teams I’ve worked with, and for other individuals and organizations who have done work with XST.
We have an interesting situation here with our little program. We have some powerful set operations, but we also have some programmed data structures. Our “generic” structure hashes scope to a table of values under that scope. That means that, given a scope value, we’re two hash probes away from the set elements under that scope. Our Tuple set is an array indexed by integer scope, which means we’re one subscripting operation away from any desired element. And our CSV sets manage comma-separated fields and records with fast string matching operations.
Now, historically, my focus would have been on representing those structures more directly in set theory. Now at the time I was doing it, I was not expert in OO, and we were programming in languages like C. But even then I think we might have had (even?) more success had I relaxed my demand that everything be done with simple set operations. But I’m not here to revise the past, or even to analyze it in much detail, especially since my memory of the past isn’t sufficiently detailed to let us compare the code of the 80s and 90s with today’s.
Still, this hybrid structure seems to be letting me reach understanding that I didn’t reach in the past. So I’m not going to push myself to get setops down to the bit level. This is particularly wise since Lua won’t let me do that at all well anyway. Never try to do what you can’t do, that’s a good motto. Although you never know what you can do until you try. And also “Do or do not. There is no try.”
I don’t know. Historical cryptic advice is like that. Anyway I like the hybrid we’re building.
Expressions
This remains to be addressed, but it seems to me to be an interesting possible path for us. Right now, if I want to optimize some operation, I can define a faster set operation on a specialized set, and in the code for any set op, I can check the situation and do something special. But what if the set expression to be used to solve some problem were represented by a data structure? Could we manipulate that structure, “refactor” it, to get an improved expression? It certainly seems possible. It also seems difficult. I don’t recall ever doing anything like that in the past. Which is as good as never having done it, when it comes to knowing what to do.
I have an example, at least a “thought experiment”.
Last night I finally remembered what set operation returns the set of scopes of an input set. It is called “scope set”. The scope set of {ax,b71} is {x, 71}.
To get an index kind of set, containing all the scopes of the set of people records from Michigan, we could do:
People:restrict({MIstate}):scopeSet()
Naively, that looks like one pass over all the people, plus another pass over the result. The cost is probably about 1.05 times the cost of the original pass. Probably no big deal, unless the intermediate set spilled back to the disc.
But imagine that there was another set operation restrictToScope
that returned, not the records, but the scopes. That operation would be faster than restrict
and it would produce a smaller set, a collection of integers rather than a collection of records. If we were slinging bytes around, it would be a lot smaller.
So it would be really powerful if our system could somehow translate restrict(X):scopeSet()
into restrictToScope(X)
.
So I am holding a small intention to see whether there’s some reasonable structure for expressions that we can create, and submit to some kind of a “planner” that might transform the expression to a better one, automagically.
A man has to have a dream. But just now, let’s do some code.
Restrict To Scope
Having thought of “restrict to scope”, I want to explore it. Actually, I want to implement it.
Here’s restrict:
function XSet:restrict(B)
-- return all elements (a) of self such that
-- there exists a record (b) in B such that
-- b:subset(a)
return self:select(function(a,s)
return B:exists(function(b,_ignored)
return b:isSubset(a)
end)
end)
end
I’m going to change the definition of CSVSet along the way here. Presently it gives each line a scope of NULL:
function CSVSet:elements()
-- return an iterator for contents
local pattern = ".-\n"
return coroutine.wrap(function()
for line in self.data:gmatch(pattern) do
local converted = self:createLineSet(line)
-- local converted = self:convertToXSet(line)
coroutine.yield(converted,NULL)
end
end)
end
I want to have elements include the record number as scope. I could just code it, but I can test it.
_:test("CSVSet elements have scope", function()
local S = XSet:on(CSVSet(CSVnames, CSVminidata))
local j = S:elementProject(2)
_:expect(j,"j not found").isnt(nil)
_:expect(j:elementProject("first_name")).is("Josephine")
end)
I think this will do the job. I expect an error on the j not found.
35: CSVSet elements have scope j not found -- Actual: nil, Expected: nil
Now:
function CSVSet:elements()
-- return an iterator for contents
local pattern = ".-\n"
return coroutine.wrap(function()
local recordNumber = 0
for line in self.data:gmatch(pattern) do
recordNumber = recordNumber + 1
local converted = self:createLineSet(line)
-- local converted = self:convertToXSet(line)
coroutine.yield(converted,recordNumber)
end
end)
end
We used to yield(converted,NULL)
. I expect the test to run.
It does. Commit: CSVSet returns record and recordNumber as scope.
Now a test for restrictToScope
:
_:test("restrictToScope", function()
local S = XSet:on(CSVSet(CSVnames,CSVdata))
local mi = XSet():addAt("MI","state")
local restrictor = XSet():addAt(mi,NULL)
local miFolksScopes = S:restricttoScope(restrictor)
_:expect(miFolksScopes:card()).is(69)
end)
I’ll strengthen this but it’s more than enough to get us going.
36: restrictToScope -- Tests:487: attempt to call a nil value (method 'restricttoScope')
As expected. Hm a typo also.
36: restrictToScope -- Tests:487: attempt to call a nil value (method 'restrictToScope')
Better. Implement:
function XSet:restrictToScope(B)
local scopes = XSet()
local count = 1
for a,s in self:elements() do
if B:exists(function(b,_ignored) return b:isSubset(a) end) then
scopes:addAt(s,count)
count = count + 1
end
end
return scopes
end
I decided to make the scopes a tuple, though not of tuple type. Maybe it should actually be a Tuple? Why not, let me change that. But the test failed:
36: restrictToScope -- Actual: 14, Expected: 69
That’s because there are only 14 Michigan records in my sample data. The 69 was for 5 states.
function XSet:restrictToScope(B)
local scopes = Tuple()
for a,s in self:elements() do
if B:exists(function(b,_ignored) return b:isSubset(a) end) then
scopes:add(s)
end
end
return scopes
end
Saved me messing about with that count. I’m wondering how I can do this method with reduce
but I didn’t want to think about that in the middle of thinking about what I wanted.
Let’s beef up the test a bit.
_:test("restrictToScope", function()
local S = XSet:on(CSVSet(CSVnames,CSVdata))
local mi = XSet():addAt("MI","state")
local restrictor = XSet():addAt(mi,NULL)
local miFolksScopes = S:restrictToScope(restrictor)
_:expect(miFolksScopes:card()).is(14)
local second = miFolksScopes:elementProject(2)
local secondRecord = S:elementProject(second)
local firstName = secondRecord:elementProject("first_name")
_:expect(firstName).is("Yoko")
end)
I’m not sure what the number of the second record is, but I know her name is “Yuki”. Let’s add a check for the record number and fix the name.
_:test("restrictToScope", function()
local S = XSet:on(CSVSet(CSVnames,CSVdata))
local mi = XSet():addAt("MI","state")
local restrictor = XSet():addAt(mi,NULL)
local miFolksScopes = S:restrictToScope(restrictor)
_:expect(miFolksScopes:card()).is(14)
local second = miFolksScopes:elementProject(2)
_:expect(second).is(19)
local secondRecord = S:elementProject(second)
local firstName = secondRecord:elementProject("first_name")
_:expect(firstName).is("Yuki")
end)
The test runs. I am not sure that I like the name restrictToScope
, because it is too much like scopeRestrict
and setScopeRestrict
. That’s bound to confuse me. I don’t have a better idea just now. Commit: restrictToScope
returns scopes of records matching a restrict set.
We don’t have scopeSet
yet. Let’s implement that and then use it in the above test.
_:test("scopeSet", function()
local S = XSet():addAt("a","x"):addAt("b",71):addAt("c",71)
local ss = S:scopeSet()
_:expect(ss:card()).is(2)
_:expect(ss:hasAt("x","x")).is(true)
_:expect(ss:hasAt(71,71)).is(true)
end)
I am tentatively deciding to return scopescope rather than scopeNULL. I might change my mind but at this writing I don’t see why I would.
This should fail not knowing scopeSet.
37: scopeSet -- Tests:498: attempt to call a nil value (method 'scopeSet')
Right. Now:
function XSet:scopeSet()
return self:reduce(XSet(), function(r,e,s)
return r:addAt(s,s)
end)
end
I expect this to work. It does. Commit: scopeSet returns scope@scope for all scopes in input set.
OK, my plan was to use scopeSet in the other test. Like this:
_:test("restrictToScope", function()
local S = XSet:on(CSVSet(CSVnames,CSVdata))
local mi = XSet():addAt("MI","state")
local restrictor = XSet():addAt(mi,NULL)
local miFolksScopesHardWay = S:restrict(restrictor):scopeSet()
local miFolksScopes = S:restrictToScope(restrictor)
_:expect(miFolksScopes:card()).is(14)
_:expect(miFolksScopesHardWay:equals(miFolkScopes)).is(true)
local second = miFolksScopes:elementProject(2)
_:expect(second).is(19)
local secondRecord = S:elementProject(second)
local firstName = secondRecord:elementProject("first_name")
_:expect(firstName).is("Yuki")
end)
What I think I’m doing here is creating the scope set two different ways and checking them for equality. However, the equality check fails.
36: restrictToScope -- XSet:109: attempt to index a nil value (upvalue 'other')
Nasty, that. Ah, but it’s a typo:
_:test("restrictToScope", function()
local S = XSet:on(CSVSet(CSVnames,CSVdata))
local mi = XSet():addAt("MI","state")
local restrictor = XSet():addAt(mi,NULL)
local miFolksScopesHardWay = S:restrict(restrictor):scopeSet()
local miFolksScopes = S:restrictToScope(restrictor)
_:expect(miFolksScopes:card()).is(14)
_:expect(miFolksScopesHardWay:equals(miFolksScopes)).is(true)
local second = miFolksScopes:elementProject(2)
_:expect(second).is(19)
local secondRecord = S:elementProject(second)
local firstName = secondRecord:elementProject("first_name")
_:expect(firstName).is("Yuki")
end)
I missed an “s” in the set name in the call to equals. However, now I get this:
36: restrictToScope -- Actual: false, Expected: true
That’s the equality check. Hm. I wonder what happened. I’m going to resort to print.
Ah. The two sets aren’t remotely equal. The one done the easy way is a tuple <2,19,213,233,…> and the hard way one isn’t a tuple, it’s a set with { 2@2, 19@19, etc }
We’d better make up our mind here.
In fact, I’ll need to go back to the drawing board on this one. I’ll leave these implemented, and the test failing. I’ve changed the test’s message:
36: restrictToScope sets equal NEEDS DECISION -- Actual: false, Expected: true
Commit: restrict:scopeSet and restrictToScope don’t return same answer. Needs determining.
If this were a product, I’d set a release flag for these two new operations, or otherwise ensure that they wouldn’t cause problems. I would probably not open a branch and hang on it. But you need to find your own way on that: I have the advantage that no one is working here but me and the cat, and we all know how hard cats work.
Just thinking here in the middle of things, I think it’s clear that restrict can find more than one element at the same scope. The fact that our CSVSet is a tuple, however, ensures that there isn’t more than one element per scope. So arguably, restrict
on a tuple should return a tuple, and scopeSet on a tuple, well, that’s the integers isn’t it?
No, I’ve gotta do some scribbling on Paper or paper to see what to do here.
Be that as it may, we have a good start on a couple of nice operations. It’s just that our own definitions are not consistent. If the words were different, we wouldn’t be bothered. Oh, wait …
I wonder if defining the scopeSet(s) to be {sNULL} such that exists(e,s) in S would resolve the problem?
Anyway I gotta think on this one. There’s a time to code and a time to think. This is the latter. Maybe you already see the answer. I don’t. There’s no sense coding into the abyss.
I’ll sum up and get to thinking.
Summary
I do think that the identity between restrict:scopeSet
and restrictToScope
is an interesting example of a place where optimization will be possible. And both the implementations were easy … they just weren’t producing the same result. That was a bug in my mind, not in the code.
So I’ll debug my mind and we’ll get back to this. Maybe even tomorrow.
See you then!
Post Script
P.S. I changed the definition of “Scope Set” to Childs’
{sNULL:xs in S}
And adjusted the tests and code:
_:test("restrictToScope", function()
local S = XSet:on(CSVSet(CSVnames,CSVdata))
local mi = XSet():addAt("MI","state")
local restrictor = XSet():addAt(mi,NULL)
local miFolksScopesHardWay = S:restrict(restrictor):scopeSet()
local miFolksScopes = S:restrictToScope(restrictor)
_:expect(miFolksScopes:card(),"scopes").is(14)
_:expect(miFolksScopesHardWay:card(),"hard").is(14)
_:expect(miFolksScopesHardWay:equals(miFolksScopes),"sets equal").is(true)
end)
_:test("scopeSet", function()
local S = XSet():addAt("a","x"):addAt("b",71):addAt("c",71)
local ss = S:scopeSet()
_:expect(ss:card()).is(2)
_:expect(ss:hasAt("x",NULL)).is(true)
_:expect(ss:hasAt(71,NULL)).is(true)
end)
function XSet:restrictToScope(B)
local scopes = XSet()
for a,s in self:elements() do
if B:exists(function(b,_ignored) return b:isSubset(a) end) then
scopes:addAt(s,NULL)
end
end
return scopes
end
function XSet:scopeSet()
return self:reduce(XSet(), function(r,e,s)
return r:addAt(s,NULL)
end)
end
Tests all run. I’d prefer a better check, that let one fetch a record from the result, but we’re not there yet. The thing with set processing is that you have the aggregates, and the instances are in any order they feel like.
Anyway, all good now. See you next time!