FAFO on GitHub

We make some progress on our flat files, but progress is slow and there are a lot of words here. Best skim or skip?

Hello, friends!

It’s still Friday AM, and I am ready to get back to it. Our mission, before an import mistake confused me for over an hour, was to provide a way to build XSet instances with alternate implementations, not just XFrozen. We have in mind our XFlat and XFlat file implementations, which are currently under development in tests.

I got in trouble, because I wanted to know whether an instance of a concrete implementation like XFrozen was an instance of XImplementation, or whether I needed to check it another way, because I wanted to protect the XSet constructor by checking to make sure an actual XImplementation was provided. Let’s continue that thread now, but I have a notion that needs checking.

    def test_isinstance(self):
        xf = XFrozen(frozenset())
        assert isinstance(xf, XFrozen)
        the_set = XSet.from_tuples([('joe', 'first'), ('smith', 'last')])
        assert isinstance(the_set.implementation, XFrozen)
        assert isinstance(the_set.implementation, XImplementation)

The test passes, as one would hope. Therefore in XSet.__init__, we can do this:

    def __init__(self, an_implementation):
        assert isinstance(an_implementation, XImplementation)
        self.implementation = an_implementation

However … something came to my mind this morning. Let’s write a test to check it.

    def test_list_is_implementation(self):
        rec = [('smith', 'last'), ('sinjin', 'first')]
        seq_set = XSet(rec)
        assert seq_set.includes('smith', 'last')
        assert seq_set.excludes('st. john', 'first')

With our check for XImplementation removed, that test passes. A list of tuples like that works as an XSet, at least to some degree. I’m not sure what to think about that. Anyway I’m not going to allow it. I recast the test and write another one.

    def test_list_is_implementation(self):
        # this is why we require an XImplementation
        # ramifications of this are not at all clear.
        rec = [('smith', 'last'), ('sinjin', 'first')]
        seq_set = XSet.from_tuples(rec)
        seq_set.implementation = rec  #  just jam it in there
        assert seq_set.includes('smith', 'last')
        assert seq_set.excludes('st. john', 'first')

    def test_list_not_allowed(self):
        rec = [('smith', 'last'), ('sinjin', 'first')]
        with pytest.raises(AssertionError):
            seq_set = XSet(rec)

Time to commit. What have we wrought here? Commit: XSet creation requires an XImplementation. Class methods enhanced and revised.

Another Weird Idea

As I was trying to convince myself not to get up yet, I was thinking about string slices. When we create the fields from a flat file, we slice the base record to pull out the fields. If we ignore the fact that we trim, I was wondering whether a string slice creates a string or is a view of the underlying string. Strings are immutable, so in principle a slice could be a view. I doubt that they are, because they would then hold on to the base string indefinitely and that could be bad.

The weird idea induces me to write this test:

    def test_slice(self):
        a = 'abcdefghi'
        d = a[3:6]
        assert d == "def"
        assert isinstance(d, str)
        s = slice(3, 6)
        d2 = a[s]
        assert d2 == 'def'

I did a bit of reading about slices, which led me to the slice object, which represents a slicing operation. That’s possibly interesting. We might put slice instances in our symbol table, which might save a bit of arithmetic in the code. We might find them useful in our flat file code as well.

Which we should get back to.

Continuing XFlatFile

What is the status of our flat file tests and code? We have a number of tests working up to the big finale, mostly testing how to do unpacking and other elementary bits of the solution. And we have created the job_db file of 1000 records, and we have this test:

    def test_x_flatfile(self):
        path = expanduser('~/Desktop/job_db')
        fields = XFlat.fields(('last', 12, 'first', 12, 'job', 12, 'pay', 8))
        ff = XFlatFile(path, fields)
        assert ff.record_length == 44
        ff_set = XSet.from_tuples(())
        ff_set.implementation = ff
        count = 0
        for record in ff_set:
            count += 1
        assert count == 1000

Ah yes … I recall that the iterator, which I think is about all we have in XFlatFile, doesn’t return a scope for the records. I think we’d like it to return the record number, treating the flat file as if it were an ordered n_tuple of strings. (So the record number will start with 1, not 0 as one might expect.)

Improve the test:

    def test_x_flatfile(self):
        path = expanduser('~/Desktop/job_db')
        fields = XFlat.fields(('last', 12, 'first', 12, 'job', 12, 'pay', 8))
        ff = XFlatFile(path, fields)
        assert ff.record_length == 44
        ff_set = XSet.from_tuples(())
        ff_set.implementation = ff
        count = 0
        record_number_sum = 0
        for record, record_number in ff_set:
            count += 1
            record_number_sum += record_number
        assert count == 1000
        assert record_number_sum == 500500

It’s failing because there is nothing to store into record_number:

>       for record, record_number in ff_set:
E       ValueError: too many values to unpack (expected 2)

Not the best-phrased message ever, but that’s what it means. Fix the iterator:

class XFlatFileIterator:
    def __init__(self, flat_file):
        self.file = flat_file
        self.index = 0

    def __iter__(self):
        return self

    def __next__(self):
        rec = self.file.get_record(self.index)
        if rec == '':
            raise StopIteration
        else:
            self.index += 1
            flat = XFlat(self.file.fields, rec)
            flat_set = XSet.from_tuples(())
            flat_set.implementation = flat
            return (flat_set, self.index)

The change was to the last line, to return the tuple with the record and the index, which happens, conveniently, to be incremented.

Test runs, because. as everyone knows, the sum of the integers from 1 to 1000 is 500500.

We don’t need to jam the implementation in there any more, we can do this:

    def __next__(self):
        rec = self.file.get_record(self.index)
        if rec == '':
            raise StopIteration
        else:
            self.index += 1
            flat_set = XSet(XFlat(self.file.fields, rec))
            return flat_set, self.index

We don’t need the parens to return two things. PyCharm suggested removing them.

We have a similar opportunity in the test:

    def test_x_flatfile(self):
        path = expanduser('~/Desktop/job_db')
        fields = XFlat.fields(('last', 12, 'first', 12, 'job', 12, 'pay', 8))
        ff = XFlatFile(path, fields)
        assert ff.record_length == 44
        ff_set = XSet(ff)
        count = 0
        record_number_sum = 0
        for record, record_number in ff_set:
            count += 1
            record_number_sum += record_number
        assert count == 1000
        assert record_number_sum == 500500

Green. Commit: creating flats with direct calls to XSet creation.

I think we shouldn’t expand the path outside the XFlatFile. Change the test.

    def test_x_flatfile(self):
        path = '~/Desktop/job_db'
        fields = XFlat.fields(('last', 12, 'first', 12, 'job', 12, 'pay', 8))
        ff = XFlatFile(path, fields)
        ...

class XFlatFile(XImplementation):
    def __init__(self, file_path, fields):
        self.file = expanduser(file_path)
        self.fields = fields
        field_def = self.fields[-1]
        self.record_length = field_def[-1]

Green. Commit: pass relative path to XFlatFile, it does expanduser.

I think I’m going to regret that decision, as we’ll see in a moment. But that’s OK, we’ll change it if we want to.

We have most of the required methods still to implement:

class XFlatFile(XImplementation):
    def __contains__(self, item):
        pass

    def __iter__(self):
        return XFlatFileIterator(self)

    def __hash__(self):
        pass

    def __repr__(self):
        pass

Let’s do __repr__ because it’s going to irritate me.

    def __repr__(self):
        return f'XflatFile({self.file})'

I should have written my test first, but here it is:

    def test_repr(self):
        path = '~/Desktop/job_db'
        fields = XFlat.fields(('last', 12, 'first', 12, 'job', 12, 'pay', 8))
        ff = XFlatFile(path, fields)
        assert repr(ff) == 'XFlatFile(~/Desktop/job_db)'

Test fails:

Expected :'XFlatFile(~/Desktop/job_db)'
Actual   :'XflatFile(/Users/ron/Desktop/job_db)'

Fix it.

class XFlatFile(XImplementation):
    def __init__(self, file_path, fields):
        self.file_path = file_path
        self.fields = fields
        field_def = self.fields[-1]
        self.record_length = field_def[-1]

    def get_record(self, index):
        seek_address = index*self.record_length
        with open(expanduser(self.file_path), "r") as f:
            f.seek(seek_address)
            rec = f.read(self.record_length)
        return rec

Curiously, the test still fails, and it says:

Expected :'XFlatFile(~/Desktop/job_db)'
Actual   :'XflatFile(~/Desktop/job_db)'

Ah. Lower case, in the repr, my bad.

    def __repr__(self):
        return f'XFlatFile({self.file_path})'

Not a big deal but I’m glad I wrote the test. Commit: repr implemented.

What about __hash__? I guess we could hash the file name and symbol table.

    def __hash__(self):
        return hash((self.file_path, self.fields))

Those are immutable, as is the whole set, so that’ll probably do. I am not going to test that. I may regret that, but I don’t expect to.

Note
This would have been a perfect place to stop.

What about contains?

XFlatFile is a collection of XFlat records. I’m not clear how to test this. I think I’m tiring.

I start with this to see what comes out:

    def test_ff_contains(self):
        path = '~/Desktop/job_db'
        fields = XFlat.fields(('last', 12, 'first', 12, 'job', 12, 'pay', 8))
        ff = XFlatFile(path, fields)
        ff_iter = iter(ff)
        print(next(ff_iter))
        assert False

The print is:

(XSet(XFlat('jeffries    ron         serf            9000')), 1)

Let’s disassemble that and then try to find it.

I change the test a bit:

    def test_ff_contains(self):
        path = '~/Desktop/job_db'
        fields = XFlat.fields(('last', 12, 'first', 12, 'job', 12, 'pay', 8))
        ff = XFlatFile(path, fields)
        flat_set = XSet(ff)
        flat_iter = iter(flat_set)
        flat_rec, index = next(flat_iter)
        assert flat_rec.includes('jeffries', 'last')
        assert flat_set.includes(flat_set, index)

It’s really hard keeping sets vs implementations clear in my mind. Ideally, we rarely think of the implementations. The idea is to do them, stuff them into XSets as appropriate and go on about our business, thinking in terms of XSet, not anything beneath them.

Anyway the test above fails on the last assert, saying

>       assert flat_set.includes(flat_set, index)
E       assert False
E        +  where False = <bound method XSet.includes of XSet(XFlatFile(~/Desktop/job_db))>(XSet(XFlatFile(~/Desktop/job_db)), 1)
E        +    where <bound method XSet.includes of XSet(XFlatFile(~/Desktop/job_db))> = XSet(XFlatFile(~/Desktop/job_db)).includes

There’s that horrid message. One of these days I need to figure out how to get something better.

I’m going in. Let’s extract that last value and then step in to see what the includes is doing.

No. I’m too tired. I have this coded but I’m not at sure it’s right:

    def test_ff_contains(self):
        path = '~/Desktop/job_db'
        fields = XFlat.fields(('last', 12, 'first', 12, 'job', 12, 'pay', 8))
        ff = XFlatFile(path, fields)
        flat_set = XSet(ff)
        flat_iter = iter(flat_set)
        flat_rec, index = next(flat_iter)
        assert flat_rec.includes('jeffries', 'last')
        result = flat_set.includes(flat_set, index)
        assert result

    def __contains__(self, item):
        de, ds = item
        for e,s in self:
            if de == e and ds == s:
                return True
        return False

And do we really want to iterate a thousand-record file looking for a value?

Anyway, this article has become terribly long and I need a break.

Summary

Final commit with test failure commented out: improvements. see test_ff_contains to pick up what’s next..

We made good progress, albeit with more words and time than I might have anticipated. We do have a rather complex thing going on here, with a flat file that returns a specialized flat record, and we have it properly iterating and such. I suspect we could run a select on it or a restrict and have it work. We should try that.

Contains, if we are to do it, really does need to iterate the set, so it’ll be a bit caveat emptor there, but if we’re implementing set theory we have to make all the things work all the time.

We’ll return to this when I’m fresh. For now, my eyes and brain are tired.

Note to Self
Do we really need our XImplementation subclasses to know all those methods? Should some of them be built into XSet and only call the implementation as needed? Remember that idea with the Try?
Big Lesson
Pay better attention to tiring. Take a break rather than work tired, confused, slammed, whipped, befuddled.

We did well this morning. We’ll do well again soon.

See you next time!