FAFO 16: Project

FAFO on GitHub

Let’s do project, as in projection, as preparation for trying a generator approach to set expressions. This part should be easy.

Hello, friends!

In a little while, possibly tomorrow, I plan to try to use Python generators to build up set expressions. For that to make a bit of sense, I’d like to have another operation in addition to restrict. The project operation produces a set of records from an input set, each record including only the “fields” whose scopes appear in a control set. We’ll have it work like this:
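Before the real test, here is the idea with no XSet or Atom at all. This is only a sketch, assuming a record is a frozenset of plain (element, scope) pairs; the name project_pairs is made up for illustration and is not part of FAFO.

```python
# Plain-tuple stand-in: a record is a frozenset of (element, scope) pairs.
def project_pairs(records, field_names):
    """Keep, in each record, only the pairs whose scope is in field_names."""
    wanted = set(field_names)
    return {
        frozenset((element, scope) for element, scope in record if scope in wanted)
        for record in records
    }


ron = frozenset({("jeffries", "last"), ("ron", "first"), ("boss", "job")})
hill = frozenset({("hill", "last"), ("geepaw", "first"), ("serf", "job")})
ron_name = frozenset({("jeffries", "last"), ("ron", "first")})

result = project_pairs({ron, hill}, ("first", "last"))
print(ron_name in result)
```

Because the result is a set, two input records that differ only in the fields we drop would collapse into a single projected record.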

    def test_project(self):
        ron = XSet([Atom("jeffries", "last"), Atom("ron", "first"), Atom("boss", "job")])
        ron_name = XSet([Atom("jeffries", "last"), Atom("ron", "first")])
        chet = XSet([Atom("chet", "first"), Atom("hendrickson", "last"), Atom("boss", "job")])
        chet_name = XSet([Atom("chet", "first"), Atom("hendrickson", "last")])
        hill = XSet([Atom("hill", "last"), Atom("geepaw", "first"), Atom("serf", "job")])
        personnel = XSet.classical_set([ron, chet, hill])
        fields = XSet.classical_set(("first", "last"))
        result = personnel.project(fields)
        assert result.includes(ron_name)
        assert result.includes(chet_name)

I think that’s what I want. Now let’s see if we can implement it.

The following works but even I cannot understand it and I just wrote it.

    def project(self, other) -> Self:
        projected = []
        for record_atom in self.contents:
            new_list = []
            for atom_scope in other.contents:
                for atom_fields in record_atom.element.contents:
                    if atom_scope.element == atom_fields.scope:
                        new_list.append(atom_fields)
            new_rec = XSet(new_list)
            projected.append(new_rec)
        return XSet.classical_set(projected)

I wonder if we can unpack element and scope directly in the for statement. Let me try it. First, since this runs, let’s commit: initial project.

Here’s what may be a slight improvement:

    def project(self, field_selector: Self) -> Self:
        projected = []
        for record_element, record_scope in self.contents:
            new_atoms = []
            for desired_field_name, _ in field_selector.contents:
                for field, field_name in record_element.contents:
                    if desired_field_name == field_name:
                        new_atoms.append(Atom(field, field_name))
            new_rec = XSet(new_atoms)
            projected.append(new_rec) # should retain input scope?
        return XSet.classical_set(projected)

We are assuming a classical set of records, both in our test and here. I suspect that, in principle, we should retain whatever scope a record may have, unless we explicitly use an operation to renumber or remove them.
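Retaining the record’s scope might look something like the following. It is a sketch only: Atom and XSet here are minimal stand-ins written for this example, not the real FAFO classes, the null scope is assumed to be None, and project_keeping_scope is a hypothetical name.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    # Stand-in for the article's Atom: an element paired with a scope.
    element: object
    scope: object


class XSet:
    # Minimal stand-in for the article's XSet; only what this sketch needs.
    NULL = None  # assumption: the null scope is represented here by None

    def __init__(self, atoms):
        self.contents = frozenset(atoms)

    @classmethod
    def classical_set(cls, elements):
        return cls(Atom(element, cls.NULL) for element in elements)

    def __eq__(self, other):
        return isinstance(other, XSet) and self.contents == other.contents

    def __hash__(self):
        return hash(self.contents)

    def project_keeping_scope(self, field_selector):
        # Hypothetical variant: each projected record keeps its input scope.
        wanted = {name for name, _ in field_selector.contents}
        return XSet(
            Atom(XSet(a for a in record.contents if a.scope in wanted), record_scope)
            for record, record_scope in self.contents
        )


ron = XSet([Atom("jeffries", "last"), Atom("ron", "first"), Atom("boss", "job")])
ron_name = XSet([Atom("jeffries", "last"), Atom("ron", "first")])
personnel = XSet([Atom(ron, 1)])  # the record sits at scope 1, not at null
fields = XSet.classical_set(("first", "last"))

result = personnel.project_keeping_scope(fields)
print(Atom(ron_name, 1) in result.contents)
```

The only difference from the version above is that the projected record is wrapped with record_scope instead of being handed to classical_set.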

Let’s commit this: slightly improve code, and then extract a method:

    def project(self, field_selector: Self) -> Self:
        projected = []
        for record_element, record_scope in self.contents:
            new_rec = self.project_one_record(record_element, field_selector)
            projected.append(new_rec) # should retain input scope?
        return XSet.classical_set(projected)

    def project_one_record(self, record_element, field_selector):
        new_atoms = []
        for desired_field_name, _ in field_selector.contents:
            for field, field_name in record_element.contents:
                if desired_field_name == field_name:
                    new_atoms.append(Atom(field, field_name))
        new_rec = XSet(new_atoms)
        return new_rec

That’s not really much better, is it? Let’s push on a bit. Wait … can I use Python’s in operator here? No, not quite.

What are we trying to do here? We have some record in hand, with a set of fields scoped by field names, and we want to return all the fields in the record whose scope appears in field_selector (as an element).

Hmm … for every field name … collect from this record all the atoms whose scope is in the field selector, and we know they are in there at null … at least let’s suppose that we know that.
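If we really do suppose the field names sit in the selector at null scope, a membership test becomes possible: build the atom we expect and ask whether it is in there. Here is a sketch under that assumption, with a stand-in Atom, None standing in for the null scope, and bare frozensets in place of XSet contents.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    # Stand-in for the article's Atom: an element paired with a scope.
    element: object
    scope: object


NULL = None  # assumption: the null scope is represented here by None


def project_one_record(record_contents, selector_contents):
    # Keep each atom whose scope appears in the selector as an element at null.
    return frozenset(
        atom for atom in record_contents
        if Atom(atom.scope, NULL) in selector_contents
    )


selector = frozenset({Atom("first", NULL), Atom("last", NULL)})
ron = frozenset({Atom("jeffries", "last"), Atom("ron", "first"), Atom("boss", "job")})

ron_name = project_one_record(ron, selector)
print(sorted(atom.scope for atom in ron_name))  # ['first', 'last']
```

This trades the inner loop for a hashed lookup, but it quietly ignores any field name stored in the selector at a non-null scope.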

Let’s turn that loop around:

    def project_one_record(self, record_element, field_selector):
        new_atoms = []
        for field, field_name in record_element.contents:
            for desired_field_name, _ in field_selector.contents:
                if desired_field_name == field_name:
                    new_atoms.append(Atom(field, field_name))
        new_rec = XSet(new_atoms)
        return new_rec

Is that more amenable to improvement? Tried a comprehension with no luck. Let’s roll back to before that extract method.

    def project(self, field_selector: Self) -> Self:
        projected = []
        for record_element, record_scope in self.contents:
            new_atoms = []
            for desired_field_name, _ in field_selector.contents:
                for field, field_name in record_element.contents:
                    if desired_field_name == field_name:
                        new_atoms.append(Atom(field, field_name))
            new_rec = XSet(new_atoms)
            projected.append(new_rec) # should retain input scope?
        return XSet.classical_set(projected)

We’re green. We have the new method. It’s just that we hate it. We can probably use it to work on our generator idea as it stands, maybe.

Try the same extract again, fresh mind:

    def project(self, field_selector: Self) -> Self:
        projected = []
        for record_element, record_scope in self.contents:
            new_rec = self.project_one_record(record_element, field_selector)
            projected.append(new_rec) # should retain input scope?
        return XSet.classical_set(projected)

    def project_one_record(self, record_element, field_selector):
        new_atoms = []
        for desired_field_name, _ in field_selector.contents:
            for field, field_name in record_element.contents:
                if desired_field_name == field_name:
                    new_atoms.append(Atom(field, field_name))
        new_rec = XSet(new_atoms)
        return new_rec

Convert top method to comprehension, inline one temp:

    def project(self, field_selector: Self) -> Self:
        projected = [self.project_one_record(record_element, field_selector)
                     for record_element, record_scope in self.contents]
        return XSet.classical_set(projected)

    def project_one_record(self, record_element, field_selector):
        new_atoms = []
        for desired_field_name, _ in field_selector.contents:
            for field, field_name in record_element.contents:
                if desired_field_name == field_name:
                    new_atoms.append(Atom(field, field_name))
        return XSet(new_atoms)

Can I make that second method use a comprehension and if I do will I be sorry?

    def project(self, field_selector: Self) -> Self:
        projected = [self.project_one_record(record_element, field_selector)
                     for record_element, record_scope in self.contents]
        return XSet.classical_set(projected)

    def project_one_record(self, record_element, field_selector):
        new_atoms = [Atom(field, field_name)
                     for field, field_name in record_element.contents
                     for desired_field_name, _ in field_selector.contents
                     if desired_field_name == field_name]
        return XSet(new_atoms)

Not too awful. Commit: tidying with comprehensions.

Let’s reflect and sum up.

ReflectoSummary

I ran into a lot of trouble at first, because it is hard to keep track of when I need to unwind an atom, and when I need to create one. Atoms are not helping me, as the programmer, even though they are helpful, almost essential, to making the sets work like sets at the bottom.

I finally had to print out what I was creating and compare it visually to what I wanted, before I realized what was going on. After that, the method came together pretty easily.

I’d like to find a way of constructing a set that hews more closely to the math, and that doesn’t require us to create atoms explicitly. The Python dictionary key-value notation would be good, except that a dictionary can’t hold duplicate keys (a later entry silently replaces an earlier one), and you can’t just say “key”:“value” on its own … that thing is not an object, unfortunately.

We could use simple tuples and cast them to Atom inside. Or we could use simple tuples everywhere and always unwind them explicitly wherever we use them, as in “for element, scope in tuples:”.
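The first of those options, casting tuples to Atom inside, could be sketched like this. Atom and XSet are again stand-ins for this example only, and from_tuples is a hypothetical constructor, not something FAFO has today.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    # Stand-in for the article's Atom: an element paired with a scope.
    element: object
    scope: object


class XSet:
    # Minimal stand-in for the article's XSet; only what this sketch needs.
    def __init__(self, atoms):
        self.contents = frozenset(atoms)

    @classmethod
    def from_tuples(cls, pairs):
        # Hypothetical constructor: cast each (element, scope) pair to an Atom.
        return cls(Atom(element, scope) for element, scope in pairs)


ron = XSet.from_tuples([("jeffries", "last"), ("ron", "first"), ("boss", "job")])
print(sorted(scope for element, scope in ron.contents))  # ['first', 'job', 'last']

# Unlike a dict, nothing stops two elements from sharing one scope.
names = XSet.from_tuples([("ron", "first"), ("ronald", "first")])
print(len(names.contents))  # 2
```

The caller never types Atom, and the for element, scope unpacking still works on the contents.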

I don’t know. This function is a good start, and it will probably serve OK for an experiment with generators and set expressions. But it has highlighted some issues in the design, which we’d be wise to sort out sooner rather than later.

I’m going to pause here and will see you again at about 3 AM.