(Apr 25, 2024)
I'm not sure I'm done with this XST experiment, but I've come to some conclusions that I want to set down. Having done that, maybe I'll know what to do next.
(Mar 26, 2024)
Mistakes? Technical Debt? Better ideas? All the above?
(Mar 25, 2024)
OK with that out of my system, what shall we code today? It goes so nicely, until oops.
(Mar 25, 2024)
I'm not out of ideas, but I am out of directions, and my momentum is low. What shall I do?
(Mar 23, 2024)
The swirl that is my life leads me to think about duplicates and such. Very speculative musings, no conclusion. Sort of Captain's Log Stardate 77691.1.
(Mar 21, 2024)
The statistics code is much better now. But a little widget might be just the thing. Turns into a medium-sized widget, but I think it's an improvement.
(Mar 21, 2024)
The statistics code runs, but is written in 1960's style—yes, I DO know what 1960's code looks like—and needs improvement. An interesting series of events with a fine result.
(Mar 20, 2024)
Commonly, when we group data by some criteria, we desire summary information about the groups: counts, totals, averages, and so on. We take a couple of nice steps before my brain is fried.
(Mar 19, 2024)
Yesterday's experiment with `group_by` seemed very successful. What does it tell us about what we're doing and what we should be doing? Bit of a retrospective.
(Mar 18, 2024)
Let's try an experiment, along the lines of my sketch at the end of this morning's article. It goes very well. Very well indeed!
(Mar 18, 2024)
I'm here for another step on the way to the grouping operator. I have at least one new realization going in. More than one coming out. Some code progress as well.
(Mar 17, 2024)
Time to push a bit further on the grouping operations. I don't get far before realizing that I need to break away and try again later. Not great, but good.
(Mar 16, 2024)
A bit of research into the past of Extended Set Theory suggests an interesting possibility for the future, and provides the joy of working something out on my own. Includes some possibly amusing history about how primitive man created such documents.
(Mar 15, 2024)
I'm continuing directly on from the preceding article, to create my XGroup implementation. It may turn out that a break was needed. We'll see.
(Mar 15, 2024)
This morning I'm working toward a join operator, by way of a grouping operator. I have vague ideas, and need some help from my code.
(Mar 14, 2024)
Fancy as my new streaming select is, I think it's, well, not quite the thing. Easily replaced, but do I need to pivot? I surprise myself.
(Mar 13, 2024)
OK, new rule, we're going to read XFlatFile sets into memory. Should we refactor to this new result, or TDD a new class? Let's refactor. Works quite well!
(Mar 13, 2024)
It's 0415 and I am thinking about buffering. I create an XSet with a billion records in it. Python does not sneeze.
(Mar 12, 2024)
The thing about learning is that when we have sucked most of the learning juice out of some fruitful idea, there is often work to be done to get the thing finished. We have at least two of those hanging fruits right now. One of them has an interesting effect!
(Mar 12, 2024)
The Powers That Be have invited me to discuss the risks around this Python XST effort, on the assumption that it is anything other than play.
(Mar 11, 2024)
There's no doubt that we can create sets or set operations that stream their results, avoiding creating large temporary sets in memory. There are issues to think about. First experiment works!
(Mar 10, 2024)
In which, your intrepid author considers times now and times then, assessing whether and how to adjust what we do with XST given the new reality.
(Mar 9, 2024)
A check of our list of ideas leads to discoveries, to key decisions, and to reflection on the realities of software development. Quite a span, as perhaps it always should be. Interesting article reference included.
(Mar 7, 2024)
Ran across this. Rather nice. Then I took one more step. Off a cliff. Fortunately, I didn't look down.
(Mar 7, 2024)
We've been working on calculated fields. Let's see about getting them built into sets. Yucch! I touched a debugger.
(Mar 7, 2024)
Our work on calculations raises some longer-term concerns about large datasets and statistics. Are we in big trouble?
(Mar 6, 2024)
Let's fetch values from XST records for our Expressions. I expect this to go smoothly. It does. We discuss small steps. No, smaller than that.
(Mar 5, 2024)
I have a bit of time, with distractions. Let's see about error conditions in our expressions.
(Mar 5, 2024)
We need to work on assignments and field values. I was thinking values, and then we do assignments instead. That's where the path looked best.
(Mar 4, 2024)
We'll take some steps along the expression path. I'm slightly questioning part of what we have. The path is zig-zag but leads to a good place.
(Mar 4, 2024)
Spikes for the expression parsing have taught us enough. Let's see how we can best move from the spike code into decent production objects. There's one somewhat large issue: symbols. No, two: data types.
(Mar 3, 2024)
Back to the interpreter. Shall we try the lambda thing again, or something else? Lambda seems fraught. Partial function FTW.
(Mar 3, 2024)
We can parse a simple expression. Let's sketch the expression interpreter. Bear gives us a nip, but we have progress.
(Mar 1, 2024)
I think I'll break down and do expressions. I'm thinking a simple Dijkstra-style parser will do the job for us. Let's see if we can find a simple way to develop one.
(Mar 1, 2024)
Suppose we want a set where one element is the sum of two others. What about calculating a scope? I am interrupted and then spot a squirrel.
(Mar 1, 2024)
Somewhat enlightened about how early XSP systems worked, there are a few ways we might go. I need to reflect on possibilities and pick a direction.
(Feb 29, 2024)
Study of some ancient scrolls leaves me thoughtful, and a bit disappointed.
(Feb 27, 2024)
You probably do not want to read this. I'm writing down my thoughts, in the vain hope of figuring out what they are and what they should be. Dave Childs was kind enough to send me some links to articles and other materials about XST. I include them at the end of this article, for reasons. The materials gave me some things to think about.
(Feb 26, 2024)
I was whining earlier about the difficulty of writing tests that assert about elements inside a result set. A thought has come to me.
(Feb 26, 2024)
I want to work on some indexing operations. Along the way I rediscover XFlatFile's existing scope set feature. Just what we need. Or is it? Includes a judicious rollback.
(Feb 25, 2024)
Laurent challenged me to find a better, more elegant formulation for getting the correct re_scoping set from a provided renaming set. I thought I had it. Then I thought I hadn't. Then I thought I had.
(Feb 24, 2024)
We have a long-form cardinality method. Let's use the len function and require it as part of the implementations.
(Feb 24, 2024)
Let's look around and see what we might do. I'll even make a list. Jira is an abomination upon the land.
(Feb 23, 2024)
I have received a bug report! This is great news! Someone is paying attention!
(Feb 23, 2024)
Putting a specialized rename into our flat set showed us that the symbol table there is quite ad hoc. Let's make it more like a set. (Turns out: no.)
(Feb 22, 2024)
Since the Flat sets know their field names once and for all, we could use a better rename that doesn't copy the data. Simple enough, rather nice.
(Feb 22, 2024)
If there was a kind of set that was expressed as a function, we could possibly pipeline operations, reducing memory impact. Is that possible, and is it a good idea? So far, maybe not.
(Feb 21, 2024)
What if a function IS a set instead of returning a set? This might be significant.
(Feb 21, 2024)
We try and succeed in implementing an XFlatFile that refers only to a subset of the file. This is a big deal!
(Feb 21, 2024)
I'm working on the idea of an XFlatFile that only reads a defined subset of the file. Am I working without a net? Not really.
(Feb 20, 2024)
Plus wasn't the right operator for union. Let's update that and then a few other operators.
(Feb 20, 2024)
I had thought that I'd work on the flat data. My thoughts led me astray, to a glimmering of a possibly good idea. So I'll just code something cute to finish the morning's work.
(Feb 19, 2024)
Thinking about rename in the Flat implementation leads to discovery of an interesting defect, and some thinking about use of the powerful generality of set theory.
(Feb 19, 2024)
In an astounding flurry, we are going to build a rename operation.
(Feb 18, 2024)
Childs has defined two "re-scope" operations. Let's see if we can implement one of them, and if we're glad we did.
(Feb 18, 2024)
We'll start with a simple removal of the requirement for implementations to implement `__contains__`. After that, we'll see. (And that's not what happens.)
(Feb 17, 2024)
In your absence, I made a simple but significant change. I had an interesting idea. And an important observation. And more.
(Feb 16, 2024)
We make some progress on our flat files, but progress is slow and there are a lot of words here. Best skim or skip?
(Feb 15, 2024)
Let's see about getting set creation sorted. We need to be better able to create sets with any possible base implementation. Deep confusion ensues.
(Feb 15, 2024)
To process flat files, we want to avoid leaving a file open, and we don't want to open it a zillion times. Do we have to invent buffering? Perhaps not.
(Feb 14, 2024)
Our experimental tests on flat records look good. Let me report on my off-line work, and then let's see what's next.
(Feb 13, 2024)
Let's get started on a flat-file-focused form of set. We make a bit of progress.
(Feb 13, 2024)
I want to begin by thinking about storage: how do we best produce specific memory / file formats? The article turns out to be pure speculation, but perhaps useful speculation.
(Feb 12, 2024)
Having prepared better than we did yesterday morning, let's proceed with wrapping our `frozenset` with a class of our own.
(Feb 11, 2024)
Let's have another try at wrapping our `frozenset` in an object that will work for us. Things go much better ... so far.
(Feb 11, 2024)
I make some useful initial observations, and then, well, I crash and burn. And mix a metaphor.
(Feb 10, 2024)
I think that this morning, I'll take a small step toward having more than one implementation of a set's data.
(Feb 9, 2024)
If I'm going to get serious about this Extended Set Theory thing, a tiny bit of design thinking seems to be in order.
(Feb 9, 2024)
It is 0253 hours. I have a report and another report.
(Feb 8, 2024)
Let's do `project`, as in projection, as preparation for trying a generator approach to set expressions. This part should be easy.
(Feb 8, 2024)
There's a thing I want to do with this XSet stuff, and I don't quite know how to do it. I need to better understand iterators and generators.
(Feb 7, 2024)
In which, we report on a nifty little thing, and then, well, I don't know yet what I'll do. Some concluding remarks on what it is that I do here.