Flat

Let’s get started on a flat-file-focused form of set. We make a bit of progress.

Hello, friends!

Our objective, as I envision it, is a new XImplementation class, XFlatRecord. This implementation will have as its data a string of characters and a symbol table describing the fields (and ultimately types) of the record inside.

We’ll begin with some tests.

class TestXFlat:
    def test_slicing(self):
        record = 'Jeffries    Ronald      Boss      '
        first = record[12:24]
        assert first == 'Ronald      '

I first asserted ‘Donald’, so I know it could fail. What we have “learned” here is that when we slice up our flat records, our slices will want to be in terms of “from, up-to”, not “from, length”. As card-carrying pythonistas, we knew that, of course.

However, in terms of defining the fields of a set, I think we’d like to provide field name and length (and perhaps type in a future version) and have the table built by the program, because field name and length is something a human can understand and adding up numbers, not so much. Another test:

    def test_make_symbol_table(self):
        def make_symbols(names_and_lengths):
            it = iter(names_and_lengths)
            by_two = zip(it, it)
            start = 0
            field_definitions = []
            for symbol, length in by_two:
                field_definitions.append((symbol, start, start+length))
                start += length
            return field_definitions
        info = ("last", 12, "first", 10, "job", 8)
        symbols = make_symbols(info)
        s1, s2, s3 = symbols
        assert s1 == ("last", 0, 12)
        assert s2 == ("first", 12, 22)
        assert s3 == ("job", 22, 30)

The results are as one would like. The trick with the zip deserves a bit of explanation. I freely grant that I found it on the internet.

It creates an iterator over the input sequence, named it. Then it zips that iterator with itself. Since both arguments are the same iterator, zip fetches item #0 from it, then item #1, and returns those as a pair, then #2 and #3, and so on. After that, we just loop over the pairs and compute the start and end for each field.
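Here is the pairing trick on its own, as a small sketch outside the test. The names flat, pairs, and odd_pairs are mine, just for illustration. One thing to watch: if the input has an odd number of items, the leftover item is silently dropped.

```python
# Pair up consecutive items by zipping one iterator with itself.
flat = ("last", 12, "first", 10, "job", 8)
it = iter(flat)
# zip pulls from `it` twice per pair, so each pair consumes two items.
pairs = list(zip(it, it))
print(pairs)  # [('last', 12), ('first', 10), ('job', 8)]

# With an odd number of items, the final unpaired item is lost.
odd = iter(("last", 12, "first"))
odd_pairs = list(zip(odd, odd))
print(odd_pairs)  # [('last', 12)]
```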

So that was interesting. We might ask ourselves whether we really want to provide the symbol info as one big list or whether we want to provide it field by field. I suppose in the most common case, we’ll be taking a set and pulling out just some of the fields, and reconstituting them into a new set.

We’ll consider the basic symbol table problem to be in hand and see about pulling out some fields from a record.

    def test_unpack(self):
        record = 'Jeffries    Ronald      Boss      '
        symbols = (("last", 0, 12), ("first", 12, 24), ("job", 24, 36))
        fields = field_set(record, symbols)
        assert fields.includes("Jeffries    ", "last")
        assert fields.includes("Ronald      ", "first")
        assert fields.includes("Boss        ", "job")

I “merely” need to write field_set.

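Well, and I need to get the test right. The record was too short: the job field was padded to only 10 characters, so the slice from 24 to 36 came back shorter than the 12-character string the test expected. Here’s the working version: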

    def test_unpack(self):
        def field_set(record, symbols):
            result = []
            for name, start, finish in symbols:
                entry = (record[start:finish], name)
                result.append(entry)
            return XSet(result)
        record = 'Jeffries    Ronald      Boss        '
        symbols = (("last", 0, 12), ("first", 12, 24), ("job", 24, 36))
        fields = field_set(record, symbols)
        assert fields.includes("Jeffries    ", "last")
        assert fields.includes("Ronald      ", "first")
        assert fields.includes("Boss        ", "job")

So that’s one way. Can we do a comprehension? Indeed we can:

    def test_unpack(self):
        def field_set(record, symbols):
            result = ((record[start:finish], name) for name, start, finish in symbols)
            return XSet(result)
        record = 'Jeffries    Ronald      Boss        '
        symbols = (("last", 0, 12), ("first", 12, 24), ("job", 24, 36))
        fields = field_set(record, symbols)
        assert fields.includes("Jeffries    ", "last")
        assert fields.includes("Ronald      ", "first")
        assert fields.includes("Boss        ", "job")

So. That’s two really nice steps toward a flat record set. I think we’ll stop here and sum up, as the distraction level has risen too high for good concentration.

Summary

We see here that we can take a reasonable symbol table definition, name and length, convert it to a convenient form, and use it to unpack a record and return its fields as a conventional XSet. That much is, I think, sufficient to allow us to read and process a flat file.
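To make the flat file claim a bit more concrete, here is a rough sketch of how reading might go, assuming the file arrives as one long string of 36-character records. The helper names records and unpack are hypothetical, the second record is invented sample data, and I use plain dicts rather than XSet so that the sketch stands alone.

```python
def records(data, record_length):
    """Yield successive fixed-length records from one long string."""
    for start in range(0, len(data), record_length):
        yield data[start:start + record_length]


def unpack(record, symbols):
    """Return a dict mapping each field name to its fixed-width value."""
    return {name: record[start:finish] for name, start, finish in symbols}


symbols = (("last", 0, 12), ("first", 12, 24), ("job", 24, 36))

# Build the sample data with ljust so every field is exactly 12 wide.
values = ("Jeffries", "Ronald", "Boss", "Doe", "Jane", "Coder")
data = "".join(value.ljust(12) for value in values)

people = [unpack(record, symbols) for record in records(data, 36)]
for person in people:
    print(person)
```

Whether the real thing should produce XSets lazily, one record at a time, is a decision for later.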

There will be concerns, in particular dealing with fixed-length strings versus the more convenient short strings that we might program with or put in input fields. The user, as well as the programmer, would like to say "Boss" instead of "Boss        ".
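Two obvious ways to bridge that gap are to strip the stored field before comparing, or to pad the short string out to the field length. This is only a sketch of the two options, with hypothetical helper names field_matches and pad. Nothing here is decided.

```python
def field_matches(stored, wanted):
    """Compare a fixed-width field to a short string, ignoring trailing spaces."""
    return stored.rstrip(" ") == wanted.rstrip(" ")


def pad(value, length):
    """Pad a short string with trailing spaces to the fixed field length."""
    return value.ljust(length)


stored = "Boss" + " " * 8  # a 12-character field as it sits in the record

print(field_matches(stored, "Boss"))  # True
print(pad("Boss", 12) == stored)      # True
```

Note that ljust does not truncate, so a value longer than the field would need separate handling.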

As we get more experience flinging these fields around, we’ll surely want to include a type to which to cast them. It’s possible that we’ll want a specialized class for strings that compares without trailing spaces. I’m not entirely comfortable putting specialized classes that deep in the system, inside the tuples, but maybe there’s a decent way to convert things at the last minute or something like that.
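One possible shape for that last-minute conversion, purely as a sketch: add a converter to each symbol table entry and apply it as the field is sliced out. The four-element entries, the unpack_typed name, and the "pay" field are all assumptions of mine for illustration, not anything we have built.

```python
def unpack_typed(record, typed_symbols):
    """Slice each field out of the record and convert it on the way out."""
    return {
        name: convert(record[start:finish])
        for name, start, finish, convert in typed_symbols
    }


# Each entry: (name, start, finish, converter). Any callable will do.
typed_symbols = (
    ("last", 0, 12, str.rstrip),  # drop the trailing padding
    ("pay", 12, 20, int),         # int() tolerates surrounding spaces
)

record = "Jeffries".ljust(12) + "12345".rjust(8)
fields = unpack_typed(record, typed_symbols)
print(fields)  # {'last': 'Jeffries', 'pay': 12345}
```

This keeps the record itself as plain characters and leaves the tuples free of special classes, at the cost of converting every time we unpack.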

For now … a decent start at flat records. I call it a win!

See you next time!