FAFO on GitHub

We try and succeed in implementing an XFlatFile that refers only to a subset of the file. This is a big deal!

Hello, friends! Welcome back!

This article follows directly on the heels of the preceding one, all done in one session. I broke the session into two articles as a small act of mercy.

Huh — not sure how to do this

I’d like to iterate the XSet. I know I could write another iterator that would do the job. But I’m thinking that I just want to somehow inject an alternate generator into the situation somehow. Can the __iter__ function be jiggered to take a parameter?

class XFlatFile(XImplementation):
    def __init__(self, file_path, fields):
        self.file_path = file_path
        self.fields = fields
        field_def = self.fields[-1]
        self.record_length = field_def[-1]

    def __iter__(self):
        return XFlatFileIterator(self)

Since we’re creating the iterator, we can pass it anything we wish. Let’s change things so that we pass in the generator function lots. That turns out to be easy:

class XFlatFile(XImplementation):
    def __iter__(self):
        def lots():
            n = 1
            while True:
                yield n
                n += 1
        return XFlatFileIterator(self, lots)

class XFlatFileIterator:
    def __init__(self, flat_file, generator):
        self.file = flat_file
        self.scope_gen = generator()

That passes all the tests. Let’s commit, even though I feel this isn’t quite the thing. Commit: XFlatFile passes generator to XFlatFileIterator.

I feel that I’m moving a bit too fast, and I do not have my morning chai yet. Let’s have a quick break.

Refresh - plan a next step or a few

What do we want? Well, I think we want a (kind of) XFlatFile whose iterator’s generator is a set of integers in some form or other, probably a standard scope_set kind of thing.

Let’s try that. We’re clean and green, so we can roll this back if we need to, but it feels good to me.

Let’s write a test that allows us to pass an additional parameter to an XFlatFile, signifying the set of integers, if we provide it. We’ll make it optional.

Before I do that: I’ve been working with XFlatFile and XFlatFileIterator in the test file. Early days, that’s a good thing to do: it keeps experimental classes in the test, and it can be more convenient. The classes are large enough now, and solid enough, that I think we want to move them to the src folder.

F6, select what to move, type a name, done. Commit: move XFlatFile and iterator to src I think I got the commit message wrong. Deal with it.

I’m not really sure that it’s broken out enough, there’s still XFlat, XFlatFile, and XFlatFileIterator in one file. Not that they don’t kind of belong together, but it makes for a long file. We’ll see. We can change things: it’s what we do.

Now what is it we want to do? We want to pass a set of integers into an XFlat file, optionally. I think. We’re clean and green again so this can be rolled back if need be. But we usually just move forward anyway.

I forgot to write a test for this:

class XFlatFile(XImplementation):
    def __init__(self, file_path, fields, scope_set=None):
        self.file_path = file_path
        self.fields = fields
        field_def = self.fields[-1]
        self.record_length = field_def[-1]

I added the scope_set parameter. Let’s write a test:

    def test_uses_scope_set(self):
        path = '~/Desktop/job_db'
        fields = XFlat.fields(('last', 12, 'first', 12, 'job', 12, 'pay', 8))
        scopes = XSet.from_tuples(((1, 1), (2,2)))
        ff = XFlatFile(path, fields, scopes)
        ff_set = XSet(ff)
        count = 0
        for _e, _s in ff_set:
            count += 1
        assert count == 2

I think that’s what we intend: if we pass in a set of scopes when we create an XFlatFile set, we intend that it will only produce those records. The test fails, getting 1000 instead of 2.

Let’s bash on XFlatFile now, to do something with the set of integers.

    def __init__(self, file_path, fields, scope_set=None):
        self.file_path = file_path
        self.fields = fields
        field_def = self.fields[-1]
        self.record_length = field_def[-1]
        self.scope_set = scope_set

    def __iter__(self):
        def lots():
            n = 1
            while True:
                yield n
                n += 1
        if self.scope_set:
            ss_iter = iter(self.scope_set)
            return XFlatFileIterator(self, ss_iter)
        else:
            return XFlatFileIterator(self, lots)

I had high hopes for that, but the test fails.

class XFlatFileIterator:
    def __init__(self, flat_file, generator):
        self.file = flat_file
>       self.scope_gen = generator()
E       TypeError: 'set_iterator' object is not callable

I think this is the difference between a generator and an iterator. I freely grant that I am not completely adept with the differences, so we’re deeper in the bag of tricks than is safe. That’s why we have tests. We can afford to learn.

We can either wrap our iterator in a generator function, making it callable, or we can pas i the other generator already called. Lets try the latter.

class XFlatFileIterator:
    def __init__(self, flat_file, generator):
        self.file = flat_file
        self.scope_gen = generator

class XFlatFile(XImplementation):
    def __init__(self, file_path, fields, scope_set=None):
        self.file_path = file_path
        self.fields = fields
        field_def = self.fields[-1]
        self.record_length = field_def[-1]
        self.scope_set = scope_set

    def __iter__(self):
        def lots():
            n = 1
            while True:
                yield n
                n += 1
        if self.scope_set:
            ss_iter = iter(self.scope_set)
            return XFlatFileIterator(self, ss_iter)
        else:
            return XFlatFileIterator(self, lots())

All the other tests continue to run and our fresh test fails, but possibly fails better:

Expected :2
Actual   :0

We were getting 1000 before, so we were off by 998. We’re only off by 2 now. This must be better.

I’m going to patch something into the __iter__ method above, to see if the iterator I get actually iterates.

    def __iter__(self):
        def lots():
            n = 1
            while True:
                yield n
                n += 1
        if self.scope_set:
            ss_iter = iter(self.scope_set)
            print("next", next(ss_iter))
            print("next", next(ss_iter))
            try:
                print("next", next(ss_iter))
            except StopIteration:
                print("done")
            ss_iter = iter(self.scope_set)
            return XFlatFileIterator(self, ss_iter)
        else:
            return XFlatFileIterator(self, lots())

This should print two records and then done, and it does:

next (1, 1)
next (2, 2)
done

So we know the iterator is good. It must be that we are not using it correctly. What happens to it?

class XFlatFileIterator:
    def __init__(self, flat_file, generator):
        self.file = flat_file
        self.scope_gen = generator

    def __iter__(self):
        return self

    def __next__(self):
        scope = next(self.scope_gen)
        element_tuple = (rec := self.file.element_at(scope), scope)
        if rec is None:
            raise StopIteration
        else:
            return element_tuple

Wow this article is too long. But we’re getting close. I’ll deal with the length later. This might work.

Let’s see what happens in our next. I’ll add in some prints. The print tells me right away what my mistake is. I am getting a tuple back from the next and I only want the scope. Fix that, but ow my generator fails because it isn’t generating a tuple. Fix that too. We are green. Here is all the relevant code:

class XFlatFile(XImplementation):
    def __init__(self, file_path, fields, scope_set=None):
        self.file_path = file_path
        self.fields = fields
        field_def = self.fields[-1]
        self.record_length = field_def[-1]
        self.scope_set = scope_set

    def __iter__(self):
        def lots():
            n = 1
            while True:
                yield n, n
                n += 1
        if self.scope_set:
            return XFlatFileIterator(self, iter(self.scope_set))
        else:
            return XFlatFileIterator(self, lots())

class XFlatFileIterator:
    def __init__(self, flat_file, generator):
        self.file = flat_file
        self.scope_gen = generator

    def __iter__(self):
        return self

    def __next__(self):
        _element, scope = next(self.scope_gen)
        element_tuple = (rec := self.file.element_at(scope), scope)
        if rec is None:
            raise StopIteration
        else:
            return element_tuple

We are green. We may want more robust tests, but we read two records and there’s every reason to suppose that they are the correct two, because we can see the code. Commit: XFlatFile can be created with optional scope_set, in which case it will return only the indicated records.

Reflection

With just a few small missteps, notably where I am not quite clear on the handling of generators vs iterators, This went swimmingly. Does that mean that my limited planning move quickly to coding take small steps guided by tests approach is a good one?

Well, it works quite often for me, like roughly always. But I have decades of experience programming and over two decades of trying to work with very loose planning. So it’s not inconceivable that I am getting fairly good at it. We cannot conclude that anything like this is a good idea for you.

I can tell you that I hang out with a lot of people who work this way and who believe in it. I can also tell you that they are very experienced, which really helps. I can also tell you that they are confident, and I am confident, that most developers would prosper more by working in small steps with just enough forethought and with great care about testing and about code quality.

So am I selling this idea? Not really: at least no one seems to be paying me for it. I am showing you what I do, telling you what I think as best I can, and showing you what happens.

It’s up to you to decide what you should do.

Summary

This is a big deal!

What we have now, in the XFlatFile implementation, is a view of a flat file, a view that can return any subset of the file, under the control of a scope set of integers. So far, we have just used it to return the first two records of our flat file.

But if we gave it the set { 3737, 193193 }, it would return those records.

This means that we can cache any sub-collection of the set that we want to, and reproduce those records as needed.

This very small set of changes opens the door to some very significant opportunities for optimization by caching small sets. We’ll explore some of those.

And … there are important questions before us.

Most significant, perhaps, is that we have plugged this notion directly into XFlatFile. That means that only sets implemented on flat files can have the ability to do this kind of scope selection. That suggests to me that this isn’t quite the structure that we want.

There is also the fact that the operation re_scope has the ability to pull elements at random out of a set. And we did not use re_scope here. This, too, suggests that we don’t have quite the structure that we want.

What is this “structure that we want”? Well, I don’t know. I’m in the same misty forest that you are. But it kind of seems like instead of the set of integers being inside the flat file, being used, they should be outside, telling the flat file what to read. I’m not even sure what I mean by that.

I used to think of one way of implementing XST as a sort of series of operations such that each operation could be asked for a record, and when it was tugged on to get a record, it would possibly tug on operations that it held on to, processing enough of those records to produce an output record, and then stopping.

We don’t really have that, quite, although since our sets are iterable, we can sometimes next them. But it seems to me that most of our operations … hey, this may be important … our operations are not sets. They produce sets, so that when you do an intersect, the operation will do the whole job before it returns.

Maybe … this may be important … maybe we want our functions to be sets. Maybe … I wonder … this is what Childs was talking about in one of the obscure papers I have.

What we have today is already very powerful when it comes to possible optimizations by caching results. But there may be even more power lurking here.

I’m having great fun. I wish you were as well, but I fear that I’ve lost most of my readers. I hope not, but I freely grant that set theory is not quite as exciting to most people as it might be.

I’ll see you next time, I hope!