XST Summary

FAFO on GitHub

I’m not sure I’m done with this XST experiment, but I’ve come to some conclusions that I want to set down. Having done that, maybe I’ll know what to do next.

Welcome back to Extended Set Theory Land. There’s surely plenty to do with the existing code, but between a couple of diversions and events relating to XST, we’re at a convenient pause, which we can use to assess what we know and decide what to do next. That could include what to do with XST, or a “pivot” to something else entirely. Or, who knows, improvements to Space Invaders.

Over the course of the XST series, a month and more in the past now, I was engaged in a series of emails with Dave Childs, the inventor of Extended Set Theory. I asked for and received a lot of articles from the distant past where all this was going on. I asked for and got a bit of information about the implementations of software he was involved with that used XST, somehow. I tried to interest him in what I was doing, but he has no real interest in implementation. It seems that his interest starts and stops with modeling storage, data, and information using set theory. And that’s just fine. As has happened so often in my intervals of working with him, the cycle ended with both of us frustrated and desiring not to continue. No ill will on my end, and I hope it’s the same with Dave.

Lookin’ inta history back, when all this was going on, I think Dave’s primary income came from contracts to work with people interested in applying XST, typically to database software implementation. There may have been some actual commercial releases of software that used the theory but I am nearly certain that none are extant today. In my arrangements with him, Dave was absolutely hands-off regarding implementation, and would really only deal with the math. And yet, he had been involved with software, and in one case he had written a program, in FORTRAN, that had been used to solve some famous relational benchmarks in times that were profoundly faster than the existing database software of the day. That possibility was so tantalizing … but enough history.

Based on what I’ve learned this time around from the papers and emails, Dave’s program was a set of functions, written in FORTRAN, that thrashed bytes around in memory at a very low level, therefore very rapidly. The functions all had clear and straightforward set-theoretic definitions, basically just byte-movers and comparisons and such. And he had a very fast sort. All on what amounted to arrays of arrays of bytes.

The benchmark problems were solved—assuming that I have all this right—by reading the data into flat arrays of arrays of bytes, and then coding up, almost in what you could think of as relational assembly language, a series of byte-thrashing operations that solved the specific problem you were set out to solve. You’d arrange the data with the key fields left-justified, do a one-pass merge operation to subset it, and so on. All the intelligence was in the programmer who created the series of operations that (presumably) solved the problem.

So I am assuming, without certainty, that the only thing Dave’s own implementation did that related to rapid processing of information was a hand-conversion of a complex relational problem into a series of very simple operations that read, write, and moved bytes around.

As programmers, we know that we could do something like that. We could implement fast byte-moving operations of various kinds. We could provide a language that produced code using those operations. We could do at least some forms of optimization. Given advances today in refactoring tools, we might even imagine some very powerful transformations from some SQL kind of language down to the byte-movers.

But I ask myself why we here would do that? What we have in my silly little python program is already blindingly fast, not due to my cleverness but due to the fact that my inexpensive laptop has 16 gig of fast memory and processes instructions at about 13 per zillisecond.

It might be educational to work out a syntax tree and some transformations, just to see how to do it. As for producing amazing performance … I don’t really see much joy there.

No decision today … but maybe look at the existing code, see what we can learn from it. And maybe … maybe … we could do a version that was centered on flat files and byte movers, just to see?

I am assuming that no one but me is interested in this. So maybe I should find something that at least one other person cares about?

We’ll find out. See you next time!