FAFO on GitHub

Study of some ancient scrolls leaves me thoughtful, and a bit disappointed.

Hello, friends!

I found a copy of the old manual for the “set processor” that was written at the University of Michigan back in the 1960s. I had probably studied it back in the 70’s or 80’s but that was long ago. I have developed some understanding that is either new, or long forgotten. I do not know everything about it, but I think this much is nearly accurate. What follows is incomplete, and I think I’d give it about a 75% correctness score. Vaguely similar to the truth …

## Tentative Understanding

The program processed only fixed-length sets of bytes. With a given set there could be an associated symbol table, describing fields in the set, with options including, if I recall, 1, 2, 4, ad 8 byte integers, and character fields of any fixed length. I think, but am not yet certain, that at least some set operations used the symbol table to decide what bytes to consider. For understanding, I don’t think it matters.

All the set operations, which I have been assured all had solid set-theoretic definitions1, came down to little more than string comparisons.

Consider two sets with the same fields and layout. Their records are fixed-length arrays of characters. Sort them naively by those characters. Still the same set, obviously.

Suppose we want to intersect those two sets. Merge the two arrays, off the top of each set. If the two records are equal, step one of the sets to the next record. When the two are not equal, emit the smaller to the intersection, and step that set. If one side is consumed, slap all the remaining records from the other side into the intersection. Voila, done, and by the way it is still sorted.

In an email, Dave Childs mentioned that they might binary search a set, if it was ordered.

There were operators of various kinds, including ones you’d expect, like union and symmetric difference, and at least one kind of restrict operation. There seemed to be a few operators that combined more than one elementary operation, or that did a sort of negation - perhaps a restrict that keeps the records that wold have been rejected by a standard restrict.

Many of the operations seem to be focused on the left-hand leading edge of the record. The first N bytes would be compared and then the remainder copied or rejected or whatever. At this writing, I don’t have more information than that.

I am not sure how it processed sets larger than memory, but it surely must have, as the computer in question was not that large. There are some interesting operations that may relate to handling of large sets, like one called SUBSET that selects every n-th record from a set. I’m not sure what that would be good for, but I can imagine using it to partition a set into still-sorted subsets that are well =suited to a simple merge later. That’s about the only use I could think of for the operation.

If the SUBSET operation was intelligent, it might be just a “view” of the input set, streaming over all the records but producing only the ones in question. So we might restrict some huge set by first sub-setting it, then restricting each subset, then merging the restricted subsets. Just speculating of course.

## Tentative Conclusions

The system implements a sort of “algebra” of operations all of which come down to operations on fixed-length character records. The operations seem to be focused only at the left hand leading edge of the records. As such, comparisons would be fast. (Even if offset, they would be fast but the impression I formed was that sets were usually either unordered or sorted based on pure byte values. (I could be wrong: sorts might have considered numeric fields but would it even matter?))

The user of the system would write scripts using these set operations, loading sets from files, and going through a series of steps that they had devised to get the answers they needed. They would restrict and intersect and union and RMIX (whatever that was), and at the end, apply some FORTRAN formats and the answer would come out.

I do not know how programmers using this little language thought. Did they work out set theory for each step, or did they instead envision what these simple operations did to the tables they were imagining? I suspect the latter, but for all I know they were all set theory wizards. Either way, the programs were much like programs written in a relational algebra: it’s just that it was a set theory algebra instead.

I can see how the operations could be very fast, because they were all just comparing and moving bytes. If there was any interpretation at all through the symbol tables, it would be almost trivially simple.

I do not know whether the program included any “views” or not. Let’s imagine an example.

The restriction operator, if I understand it, restricts against the first domains (fields) in the set. If you wanted to restrict against some other domain, I think you had to first RMIX it to the front of the set. We can imagine that the RMIX operation might actually recopy the set … but it would generally be far better to somehow map the subscripts used in the comparison. From our work with scope sets, we know that we can combine two consecutive re-scoping operations into a single one.

So we can imagine that the restriction code could be written to “think” it was comparing bytes 0 through 15, but through a simple lookup, actually be comparing bytes 20 through 27 and 55 through 62. The upshot would be that it would be possible to “mix” some domains to the front of the set, restrict them, and mix them back … with no actual record-thrashing going on.

Why move things to the front at all? Well, I think that the set theory, or the description of the operation, is simpler that way. We can think of the set as an ordered n-tuple of fields, and simply say how many will be matched upon. So, if I were going to do a restrict on field 5 and field 9, I would do two steps, one to move fields 5 and 9 to the front and then a standard restrict of two fields.

I cannot get into the heads of the people who built the thing, nor, I think, would I want to. But my guesses, whether they apply to history or not, seem interesting enough that we might try to apply them in our current experiment.

## Disappointed?

I “always” knew that the program, named STDS and various other names over time, only had simple n-tuples as its base model. So I am not surprised at what I’ve [re]learned here. I feel a bit like one does when a magician reveals how a trick is done: sometimes it is so simple that we are disappointed. (Other times, of course, the trick require such great skill that we are amazed.)

In programming, I think that simplicity is a very important goal, so perhaps I should be delighted to see how simple this might all be. But right now, I have a bit of “is that all there is” going on as well.

## Next Time?

I’ll consider further what this study suggests for our experiment here. Have I built too much, or too little? Perhaps both. No matter: we’re here to learn.

See you next time!

1. I think it is clear that absent asynchronous processing, any computer program has a solid set-theoretic definition. However, we generally do not know what that definition is, nor is it simple enough to make us confident. A set operation like “intersection” has a very clear set theoretic definition and if the computer implements that faithfully, we can probably reason more readily about what it does.