What Now?
Somewhat enlightened about how early XSP systems worked, there are a few ways we might go. I need too reflect on possibilities and pick a direction.
History
The only published XST systems, other than my various attempts here, were really quite limited in what they did internally. In essence, they processed files of fixed-length records of bytes. These sets had integer scopes, and possibly included field type and length in the scope information. Complex operations were programmed using set operations that dealt with those flat sets, with the choice of operations initially left to human programmers.
Later, some relational-like systems were built on top of that structure, with the choice of operations made by the program. I do not know how “set-theoretic” those next-level-up systems were.
At that basic set level, the human programmer made decisions about how to partition the data and process it efficiently, using partitioning operations to produce manageable subsets, processing those subsets, and then merging the results back together to get complete answers. It’s not hard to imagine that once one got good at doing that, one could specify pretty well how a computer could make similar decisions to get things done.
I believe that the well of historical knowledge about how all this was done is pretty much tapped out. There are details that would surely be interesting but I think we have the drift of how it was done.
Today
The question today is what to do next. Options include:
- Move On to Another Topic
- I find this notion unsatisfactory. I’ve not built anything truly new yet, compared to my earlier experiments with XST, and abandoning it now feels disappointing.
- Add Byte-Thrashing to FAFO
- We could see about building in a fixed-record relational structure into the current program, and learn a bit about how to move into and out of that mode as our sets permit. Currently, FAFO builds and processes sets that are far more “curly” than mere relations, and how that capability would relate to the flat structures might be interesting.
-
One area where we might learn a bit would be the one between “real” extended sets, which the XSet object really does implement (I claim), and the XImplementation classes, which implement limited types of sets, tuples or flat records and files. Generally, as soon as we process any of those special implementations, we are reverting to the more general XFrozen implementation, the only one that allows fully weird extended sets. Adding a layer of byte-thrashing flat relational operations would strain that XSet:XImplementation interface and might drive out some interesting design issues.
- Start Fresh
- There comes a time on every project where we start thinking we should start over. Somehow what we have seems not quite right and we feel that if we could do it over, everything would be so much better.
-
I believe that starting over is never a good idea, unless for some reason one is forced into a new language or the like. I say “never”. What I mean is that if we could bet against rewrites, we could make a lot of money, because the odds are that they are not a good idea.
-
Making this idea even less interesting is that I don’t have a sense of how, if only I could start over, I’d make something better. And I would bet that, if I did have that sense, I could refactor to get there faster than I could by starting over.
- Use Set Theory for Speed
- We could explore some applications of set theory to make some operations better. But what would “better” actually mean? Faster? I am tempted to read a 100 megabyte file into memory on my laptop and sort it, just to see how long it would take. The amazing thing is that I’m sure it would work, and it would be interesting to see how long it took.
-
My timing test for XFlatFile selects all the last=jeffries records from a 1000-record file set. Each record access involves opening the file, seeking and randomly reading one record, and closing the file. The current code processes those one hundred thousand records in less than 1.5 seconds. How rapidly could we process a large set if we simply left the file open? How rapidly if we read in a thousand or ten thousand records at once?
-
Still, we could do some tricks. Re-scoping sets, for example. We don’t have any reason to think we need more speed, but it might be interesting.
- Human User Interface
- We could put some kind of GUI on this thing. It might prove some point, like “Yes, we can build a relational database out of this thing”. That would require me to learn and use some GUI-building thing, which does not appeal to me at all, because nothing I’ve seen seems like much fun.
- Information Interface
- If we were to present a system to human users, we would probably choose from existing well-known information structures, perhaps relational, or perhaps one or more of the “NoSQL” forms that are out there. It is “well-known “ that any such structure can “readily” be expressed in XST. (This is one of those proof left to the student things, but it’s surely true, since XST is built out of math and math can model anything if you’re smart enough.)
-
Somehow, relational appeals to me very little, and emulating MongoDB appeals even less.
- Expressions for Optimization
- We might come up with a “little language” for expressing set theoretic operation sequences, and then optimize them. I’ve thought before about doing this and so far have always stopped short. There aren’t very many interesting optimizations, and blah blah excuses and reasons.
-
Maybe we could work up to this. Do the “set theory for speed” thing a bit, then work out how to discover useful structures that could be optimized.
- Cascading Complexity
- I’m not sure what I mean by this: the phrase just popped into my head. I think I mean that instead of starting, as we do now, with the general implementation that can express anything, what if we built our sets so that they started out as simpler, more specialized structures, and so long as operations allowed, kept results in simple structures, and only reverted to our (presumably) less efficient general structures when we had to.
-
To speculate just a bit, suppose one built a set with some scopes that were strings and constituent elements that were strings, and we expressed that record as a packed string of data and an associated symbol table. Such a thing might even manage to deal with multiple scopes with the same name and different constituent elements. Or the same, just point them both at the same segment. Many operations on such a set could return results in the same form.
-
An interesting question is whether such a structure would be better than XFrozen. What about a similar structure, but with multiple “records” in it, all with the same scopes? What about giving the fields the same length (moving toward flat file format). Might we get to a point where large-volume operations could be done without exploding these more compact structures into XFrozen format? Might we get pretty close to byte-thrashing, with a gain in efficiency?
-
Now that, I think I could get into.
I’m glad we had this little talk. I think I see a direction for the near future.
Today’s Plan
Today’s plan, of course, is the worst plan we will ever have again, at least on this topic, because tomorrow we’ll know more. Of course we can still make really bad plans if we need to, but if we stick close to this project, we can only improve. Probably.
Today’s plan is roughly this:
-
We’ll probably devise some new ways of specifying sets for our processor to process, and/or focus more attention on the specialized structures of n-tuple and XFlatFile in particular.
-
We’ll probably experiment with some operations on those structures, providing some set operations that either return the same structure, or another optimized structure where possible.
-
We’ll try never to produce an XFrozen-style ultra-flexible set when a simpler one will do.
-
We’ll try to generate a more layered architecture, with more clear divisions between information, data, and storage. If we manage to do that well, the interfaces between the layers will be set-theoretic.
-
We’ll try to leverage Python’s facilities, including slicing and functional programming.
-
We’ll consider expressing more methods more set-theoretically, with more there-exists and for-every kinds of expressions. This may lead us to more use of hashing, dictionaries, or binary search. I note that one of these is probably much more readily expressed in set theory than the other two.)
-
We’ll try to remain balanced. While it is likely that Python’s dictionaries were developed without the aid of set theory, they do work just as well as if they had been, so that we should feel free to use such things. Another interesting candidate for use is Python’s
heapq
library. -
We’ll almost certainly make valuable use of the scope-related operation re-scope, which can reorder sets, select records, and for all I know, pat them, prick them, and mark them with a P and that stands for pool.
Where to Begin
I propose not to settle that today. Today we’ll reflect, and tomorrow browse code and decide. I am leaning toward advancing the flat file notion and working toward closure when operating with flat sets. That is, producing flat sets as results where possible.
The project’s name, FAFO, holds true. We’ll continue to Fool Around, and continue to Find Out.
If you haven’t given up yet, please don’t give up now. If you have given up, do not read this.
See you next time!