XST 18: Index Design
Short morning today. I have a tentative plan for indexes. I’ll scribble some sets. Might code.
I have an appointment at 0945 a half hour from here if the trees aren’t down, and it’s 0745 now, so this will be short. It may, however, be a bit deep. I have a starting idea for indexes.
Faithful readers will recall that the new XExpression set type expects to be given an XSet (not a raw data structure) containing something like this:
{ {your data here}DATA }
A set, containing a set with scope “DATA”, and that contained set is your data to be processed with element
and hasAt
.
The basic starting idea is that that outer set can contain other items as well. Let’s create indexes, from the inside toward the outside.
Suppose we have a set of “records”, and each element of the set has a unique scope. (Currently, I’ve been setting them all to NULL, but suppose now that instead, they’re all unique.)
So that set is like
{ r1x, r2y, … }
and all the scopes are unique. They might be record numbers, but all that matters to us is that they are unique.
Suppose that the records include a field, oh, state
. Consider a set like this:
{ {all the AL records’ scopes}AL,
{all the AK records’ scopes}AK,
…
{all the WA records’ scopes}WA
}
Now it is “easy” to see that if we were looking for the records from NE, we could “just” fetch the record with scope NE and then fetch all the records with the scopes found inside. And, if fetching a record by its scope is fast, then we just implemented indexing.
We could quibble over whether there is a more compact form for the data, and so on, but that’s all a matter of creating a few more specialized structures like CSVSet, only probably a lot simpler. But even now, to fetch a record by a unique scope is only one or two hash accesses, plus an iteration of a table of size 1.
So for indexing, I am envisioning another set inside the XExpression set, looking like this:
{ { s1AL…}state… } }
The set contains all the index sets we have, with each set’s scope being the field name indexed. We’d store the whole index description set at scope, oh, INDEX or something.
I propose to do this over the next few days, including some kind of code that will, in some cases, actually discover and use the indexes. I emphasize “some kind of code”.
Where I think I’d like to wind up is with some kind of general scheme for expressing ideas like
“If you need to restrict this set for state=something, instead do (this complicated but fast expression)”
In essence, I’d like to come up with a little language for expressing rewrite rules for set expressions. Frankly, I consider that to be somewhat daunting, and in my long history with XST I have never yet personally accomplished it, although I once had a brilliant coworker who managed it. The knowledge did not rub off but I hope to get there anyway.
An intermediate step, of course, is to analyze a request such as a restrict inside the restrict
code and branch off to another algorithm. I’m anticipating that after doing that a time or two, it might start becoming clearer how to do it with some kind of expression tree or something. Or maybe someone will help me.
But for now …
New Operations
There are some set operations, these defined in an article by Michael Champion, that I think will come in handy.
First, element projection. This operator, written with a lower-case rho for notation aficionados, is defined in our terms something like this:
EP(A,s) = y iff A:hasAt(y,s)
The function requires that y be unique, exactly one element under s, and returns nil otherwise. I could write that out in set theory but I hope you get the idea without.
We can use this function, for example, to find out whether there are any indexes:
local indexes = EP(self.data,"INDEX")
if indexes then
doSomethingClever()
else
doSomethingOrdinary()
end
I don’t suppose I’ll call it EP, no one could remember it, so it’s probably going to go down as elementProject
…
Champion also defines a couple of useful scope operations. Scope restriction is, roughly:
Given a set A and a set of scopes S, scopeRestrict(A,S)
is the set of all elements of A that have a scope s that’s an element of s.
Too abstract? Me too. Given a set S of record numbers and a set A stored by record number, scopeRestrict
returns all the records with the record numbers in S and no others.
Right. Index retrieval.
Champion also offers “Set Scope Restrict”, which is a scope restrict applied one level in. A bit of thinking will tell you that if
S = { “last”, “first” }
and P is a set of people records with lots of fields, then
A:setScopeRestrict(S)
is the set of records just including the first and last names of all the people in P. It’s relational “project”, for those of you into relational algebra, or “SELECT FIRST,LAST FROM PEOPLE” for you SQuirreLs.
So those are quite nice, and I plan to implement those operations and any more that are needed to build an INDEX set (and then use it).
That will be for another day. For today, I must go out in the wind and keep an appointment.
See you next time!