FAFO 15: Wondering
There’s a thing I want to do with this XSet stuff, and I don’t quite know how to do it. I need to better understand iterators and generators.
Hello, friends!
Long ago, there was a product called Composite ‘77, C77 for short, so maybe you can guess how long ago it was. It was an interactive product, a sort of relational algebra system. You could type things like:
RETRIEVE NAME, SALARY FROM EMPLOYEES FOR SALARY GT 10000 AND JOB EQ 'SERF' INTO EXPENSIVESERFS
Then C77 would do that thing. It was quite powerful and useful, and would be useful even today if you had need of a simple way of talking to your data. I’ve been thinking about how one might implement such a thing using my XSets. And there are things that I don’t see. Some of them, I can wait for, but some are holding me up right now. I don’t need to see the whole path to implementing something, but I need to see what the path looks like, and I need to see some of the steps I might take that seem to be in the general direction we need to head.
I’m supposing that there is some kind of input language or visual display, and that it will generate some kind of “command tree”, a tree of operations and operands, feeding upward to a result. Maybe something like this:
INCLUDE NAME, SALARY
LIMIT SALARY GT 10000
RESTRICT
EMPLOYEES
{ SERF^JOB }
The exact syntax and order don’t matter at the moment. The issue I’m thinking about is how to build that tree of operations. I can think of at least two ways in Python.
One way might be to have a command
object with op code and operands, build a tree of those, and then, I guess, execute it. You’d probably have to unwind it from the bottom up, first doing the restrict, then the limit, then the include, whatever. Each command would run to completion, creating a temporary result, keep going until done. Since the temporary results might be large, this might be less efficient than we would like.
Another way might use generators. A Python generator generates results one at a time. It does so using “yield” to return a value. This permits the user of the generator to see all the values, but since they are produced one at a time, no large temporary results need to be created.
I do not understand generators well enough to see how to use them in a situation like the little example above. Let’s try to learn something about them. We will, of course, use tests. Here’s my first:
class TestGenerators:
def test_doubler(self):
def doubler(seq):
for v in seq:
yield 2*v
result = []
for d in doubler([1,2,3]):
result.append(d)
assert result == [2,4,6]
You can tell that I’ve been studying. I’m here because the reading has helped, but I need more understanding of possibilities than the web pages provide. Anyway, above, doubler
is a generator function, I guess just because it has a yield in it. And, as you can see, it yields twice whatever value it finds in its input sequence.
Let’s try a different one.
def test_twice(self):
def twice(seq):
for v in seq:
yield v
yield v
result = []
for d in twice((1,3,5)):
result.append(d)
assert result == [1, 1, 3, 3, 5, 5]
Here, our generator, twice
, yields each value two times. I am sure we can nest these, but let’s test that. I’ll move the functions outside the tests so that I can write this next one:
def test_nested(self):
result = []
for d in twice(doubler((1, 2, 3))):
result.append(d)
assert result == [2, 2, 4, 4, 6, 6]
So far so good. We can, of course, use a comprehension on these things:
def test_nested_comprehension(self):
result = [d for d in twice(doubler((1, 2, 3)))]
assert result == [2, 2, 4, 4, 6, 6]
Still doing fine here, still wondering what’s going on. What is this thing?
def test_what_is_generator(self):
gen = twice((3, 2, 1))
assert gen.__class__.__name__ == "generator"
I found that out, of course, by asserting that it was “hello” and getting the message. So the function there is a generator. What would a regular function have said? Well, we actually called our generator. If we called a regular function that returns a value, we’d get the value back. So there is magic happening in the compiler, I guess, when it sees a yield in a function definition, it compiles something that, when called, returns a generator as its result. And, I think, we can probably call that thing directly. Let’s try it.
def test_use_saved_generator(self):
gen = doubler((3, 2, 1))
assert next(gen) == 6
assert next(gen) == 4
assert next(gen) == 2
with pytest.raises(StopIteration):
next(gen)
After a few attempts, I find that we can save the generator and then call next
on it. However, when it is finished, it will raise StopIteration
, so we’d have to be ready for that.
Let’s reflect. What have we learned?
Reflection
It’s easy enough to write a generator function. I think we can almost see how we could do a generator that would, say, read records from an XSet and return only the ones that had salary > 10000. We’ll do another test shortly that approaches that. No, let’s do it now.
def test_greater_than(self):
def gt(seq, value):
for v in seq:
if v > value:
yield v
big_ones = [big for big in gt((8, 9, 10, 11, 12), 9)]
assert big_ones == [10, 11, 12]
Right. We could make the condition in the if
as tricky as we need to. So that’s nice.
We have seen that we can nest generators, causing the results from one to feed into the next. I think we can probably do that with saved ones.
def test_nested_saved(self):
doubles = doubler((10, 20, 30))
dups = twice(doubles)
dup_doubles = [d for d in dups]
assert dup_doubles == [20, 20, 40, 40, 60, 60]
Yes … but … we do have to build that nest bottom up, because the bottom one needs its parameters. And I don’t quite see how to build a tree of these generators. I do see that we can probably create a nested nested nested generator for our top level thing, and maybe that’s all we need. But I was thinking more of a tree of objects that we would explicitly traverse to get things done.
Maybe that’s just not necessary. Maybe we can build an expression (here comes LISP again1) and use it to create a suitable nested generator set and then, assuming all the various generators are well tested, the nested calls just unwind ever so nicely.
I freely grant, right here, that I still don’t see the path and goal clearly enough to be sure of next steps. That just tells me that I need to do some more experiments. Here’s a mumbled thought on an experiment we might do.
Let’s build another set operation, maybe one that just includes certain fields from all the records of a set. I think it might want to be called scope_select because it will select by scope. Or maybe it’s called project (pro-ject, not pra-ject, because that’s its formal name.) Then we’ll create a data set of employees, bosses and serfs, and we’ll set up a nested generator set that first restricts employees on job==serf and then projects those records just fetching first name.
Clearly we can write that out longhand, but once that works we can work on some programmatic way of building up such a thing.
It’s a plan. Maybe I’ll work on that later today, maybe tomorrow for now:
Summary
I’ve “generated” a bunch of nice generator tests and have learned a bit about generators, including
- A generator can return different values from its input
- A generator can return more values than are in its input
- A generator can return fewer values that are in its input
- Generators can call other generators
- A generator is a kind of object with a mysterious class name
- You can save that object and iterate it later or even pass it to another generator
- A call to
next(gen)
will return the next value from a generator object - The generator will raise
StopIteration
when exhausted
That’s quite a lot, really. I still don’t feel as adept at juggling generators as I would some kind of Command object, but I have the feeling that, in Python, the generator may well be the better approach for what I vaguely want to do.
We’ll find out. See you next time!
P.S. Sudden idea. Our select
creates a generator and immediately makes a set from it:
def select(self, cond) -> Self:
return XSet((candidate_atom for candidate_atom in self.contents if cond(candidate_atom)))
The set is probably created immediately, because:
def __init__(self, a_list):
self.contents = frozenset(a_list)
I expect that frozenset
needs to iterate the list in order to remove duplicates. That is not a certainty, by the way, you can at least imagine that it is lazy or very clever or both. Never mind … because we could have a method that select uses that returns the generator … and then use that generator as part of building up an expression.
Vague … but I might be onto something … we’ll find out.
-
Back in 2022, I was trying Extended Set Theory in Lua, and came to a sort of LISP-like structure. And now here it comes again. I hope this loop isn’t recursive … ↩