Calculations

We’ve been working on calculated fields. Let’s see about getting them built into sets. Yucch! I touched a debugger.

Hello, again, friends!

The lexer, parser, and expression code are fairly solid, and even produce some almost-useful error messages when used incorrectly. They could use some refactoring, but the code isn’t bad. The facility is only exposed to the programmer, like me, not to actual humans, so we’ll live with cryptic error messages for now. We need to make progress and the error messages are not slowing us down.

Foreshadowing: The error messages aren’t perfect and if they had been better, a sticking point below would not have been a sticking point at all.

Our mission regarding expressions includes at least two notions:

Provide a way to produce a set based on an input set, containing new field values that are computed expressions in terms of the existing fields.
Provide a way to produce the same set as a “view” of an existing set, that is, without storing everything in another physical set.

A somewhat interesting question is which of these to do first. #1 is more basic and seems that it might be simpler. But if we had #2, then #1 would be trivial. On the gripping hand, #2 is probably a bit more difficult.

I think we’ll try #2, but we’ll be ready to fall back quickly if we get in trouble.

Let’s try to write a test. Suppose we have a set of records, each of which include a salary field and a bonus field. We want a set containing those fields plus a new field, total_pay with the sum of those values. And we want the set to be virtual. You can process it any way you like but while the set can produce all its records, it does not contain them, it only contains the input set.

The test may not need to specify that the result is virtual: we’ll see if it seems necessary.

I think we’ll have a new test file and class for this.

class TestExpressionSets:
    def test_total_pay(self):
        p1 = XSet.from_tuples((('joe', 'name'),('10000', 'salary'), ('2345', 'bonus')))
        p2 = XSet.from_tuples((('sam', 'name'),('50301', 'salary'), ('4020', 'bonus')))
        personnel = XSet.n_tuple((p1, p2))
        expression = 'total_pay = salary + bonus'
        result = personnel.calculate([expression])
        r1 = result[1]
        tp1 = r1['total_pay']
        assert tp1 == '12345'
        r2 = result[2]
        tp2 = r2['total_pay']
        assert tp2 == '54321'

I think that test should get us started. It may be a bit large. If so, we’ll do some smaller tests, but that’s where we’re going.

I’d like to have a smaller one already, but what could it be?

I think we are creating a new kind of XImplementation, and that the set result above will have an XImplementation that holds a list of calculations, and an XSet.

So let’s see if we can write a test for that:

    def test_x_calculation(self):
        expression = 'total_pay = salary + bonus'
        data = XSet.n_tuple(["a", "b"])
        x_calc = XCalculation(data, [expression])
        assert x_calc.data == data
        assert expression in x_calc.expressions

That should drive out a bit of XCalculation. We’ll install it in the test file to begin with.

We put in the basic stuff and let PyCharm produce dummy methods, which immediately teaches us something:

class XCalculation(XImplementation):
    def __init__(self, base_set, expressions):
        self.base = base_set
        self.expressions = expressions
        
    def __iter__(self):
        pass

    def __hash__(self):
        pass

    def __len__(self):
        pass

    def __repr__(self):
        pass

It’ll be easy enough to do hash, len and repr. What about the iter, though? We need it to create and yield records. Let’s do the others and then iter.

    def __hash__(self):
        return hash(self.base)

    def __len__(self):
        return len(self.base)

    def __repr__(self):
        return f'XCalc({[self.expressions]}'

I have those two tests failing. Need to get one to go green soon. Oh. All i have to do is to fix the test to refer to base:

    def test_x_calculation(self):
        expression = 'total_pay = salary + bonus'
        data = XSet.n_tuple(["a", "b"])
        x_calc = XCalculation(data, [expression])
        assert x_calc.base == data
        assert expression in x_calc.expressions

So that has gone green. Let me try iter right offhand. If I can’t get it, I’ll try something smaller.

Let’s have a test for iter.

    def test_x_calculation_iter(self):
        p1 = XSet.from_tuples((('joe', 'name'),('10000', 'salary'), ('2345', 'bonus')))
        p2 = XSet.from_tuples((('sam', 'name'),('50301', 'salary'), ('4020', 'bonus')))
        personnel = XSet.n_tuple((p1, p2))
        expression = 'total_pay = salary + bonus'
        x_calc = XCalculation(personnel, [expression])
        for e, s in x_calc:
            if s == 1:
                assert e['name'] == 'joe'

This should let me do this much:

    def __iter__(self):
        for e, s in self.base:
            yield e, s

That just returns the records of the base set. So we make the test more difficult:

    def test_x_calculation_iter(self):
        p1 = XSet.from_tuples((('joe', 'name'),('10000', 'salary'), ('2345', 'bonus')))
        p2 = XSet.from_tuples((('sam', 'name'),('50301', 'salary'), ('4020', 'bonus')))
        personnel = XSet.n_tuple((p1, p2))
        expression = 'total_pay = salary + bonus'
        x_calc = XCalculation(personnel, [expression])
        for e, s in x_calc:
            if s == 1:
                assert e['name'] == 'joe'
                assert e['total_pay'] == '12345'

Fails, of course. Should get a None if I’m not mistaken.

Expected :'12345'
Actual   :None

Right. Now how can we construct a new record with all the stuff that’s in this record, plus our new stuff?

LOL. I found a bug in the lexer: ‘total_pay’ gets lexed into ‘total pay’. Fix the test:

    def test_x_calculation_iter(self):
        p1 = XSet.from_tuples((('joe', 'name'),('10000', 'salary'), ('2345', 'bonus')))
        p2 = XSet.from_tuples((('sam', 'name'),('50301', 'salary'), ('4020', 'bonus')))
        personnel = XSet.n_tuple((p1, p2))
        expression = 'totalpay = salary + bonus'
        x_calc = XCalculation(personnel, [expression])
        for e, s in x_calc:
            if s == 1:
                assert e['name'] == 'joe'
                assert e['totalpay'] == '12345'

And this is my iter:

    def __iter__(self):
        for record, s in self.base:
            calculated = []
            for expr in self.expressions:
                rpn = Parser(expr).rpn()
                calc = Expression(None, rpn)
                value = calc.result(record)
                calculated.append((value, calc.scope()))
            all_results = XSet.from_tuples(calculated)
            full_record = record.union(all_results)
            yield full_record, s

This is horrible but it has a very desirable property: it actually works. The test passes. Enhance the test a bit:

    def test_x_calculation_iter(self):
        p1 = XSet.from_tuples((('joe', 'name'),('10000', 'salary'), ('2345', 'bonus')))
        p2 = XSet.from_tuples((('sam', 'name'),('50301', 'salary'), ('4020', 'bonus')))
        personnel = XSet.n_tuple((p1, p2))
        expression = 'totalpay = salary + bonus'
        x_calc = XCalculation(personnel, [expression])
        for e, s in x_calc:
            if s == 1:
                assert e['name'] == 'joe'
                assert e['totalpay'] == '12345'
            elif s == 2:
                assert e['name'] == 'sam'
                assert e['totalpay'] == '54321'

Let’s pause to reflect and taste our chai.

What do we see in that __iter__? Well, first of all, we should lex the input better, but we’ll get there. Note made. Second, we should parse once and save the Expression instances. I think the solution of unioning the sets is fairly nice.

We have a half-way decent test. I feel confident that the code processes more than one expression, but we should probably extend the test to try two.

    def test_x_calculation_iter(self):
        p1 = XSet.from_tuples((('joe', 'name'),('10000', 'salary'), ('2345', 'bonus')))
        p2 = XSet.from_tuples((('sam', 'name'),('50301', 'salary'), ('4020', 'bonus')))
        personnel = XSet.n_tuple((p1, p2))
        total = 'totalpay = salary + bonus'
        double = 'double = bonus * 2'
        x_calc = XCalculation(personnel, [total, double])
        for e, s in x_calc:
            if s == 1:
                assert e['name'] == 'joe'
                assert e['totalpay'] == '12345'
                assert e['double'] == '4690'
            elif s == 2:
                assert e['name'] == 'sam'
                assert e['totalpay'] == '54321'
                assert e['double'] == '8040'

That works just fine.

Let’s refactor. First let’s do the expressions in __init__. I don’t think there are any tools in PyCharm to do that one.

Damn, I wish I had committed. I messed something up. Oh well, undo until green. Commit: XCalculation tests run.

Now what I was trying to do was to move the creation of the Expressions up into init. Let’s try again.

class XCalculation(XImplementation):
    def __init__(self, base_set, expressions):
        self.base = base_set
        self.expressions = []
        for expr in expressions:
            rpn = Parser(expr).rpn()
            calc = Expression(None, rpn)
            self.expressions.append(calc)

    def __iter__(self):
        for record, s in self.base:
            calculated = []
            for calc in self.expressions:
                value = calc.result(record)
                calculated.append((value, calc.scope()))
            all_results = XSet.from_tuples(calculated)
            full_record = record.union(all_results)
            yield full_record, s

This doesn’t work. Again. The error is:

Expected :'54321'
Actual   :"operator/operand mismatch: ['totalpay', 'salary', 'bonus', '+', '=']"

That’s telling me that the Expression didn’t strip the assignment.

Confession: I debugged this. I stepped it until I finally recognized what is going on. It is somewhat arcane. I don’t fully understand it yet but I can write a test that should fail. It’s about using the same expression twice.

    def test_twice(self):
        text = 'pay = salary + bonus'
        rpn = Parser(text).rpn()
        print()
        print("rpn", rpn)
        record = XSet.from_tuples((('10000', 'salary'), ('2345', 'bonus')))
        assert record.get('salary') == '10000'
        expr = Expression('ignored', rpn)
        assert expr.scope() == 'pay'
        result = expr.result(record)
        assert result == '12345'
        result = expr.result(record)
        assert result == '12345'

The second assert fails:

Expected :'12345'
Actual   :"operator/operand mismatch: ['pay', 'salary', 'bonus', '+', '=']"

What is wrong? I think what’s wrong is that we have consumed all our tokens and on the second time through we have no calculation to do. I think the message is poor and that we can improve it thus:

    def result(self, record):
        stack = []
        if not self._tokens:
            return 'Empty expression'
        while self._tokens:
            ...

Yes. The test above gets the Empty expression error. Now we want to use a copy of the tokens, not the real ones, so that we can reuse them.

    def result(self, record):
        stack = []
        working_tokens = [t for t in self._tokens]
        if not working_tokens:
            return 'Empty expression'
        while working_tokens:
            token = working_tokens.pop()
            if token.kind == 'literal':
                stack.append(token.value)
            elif token.kind == 'scope':
                scope = token.value
                value = record.get(scope)
                if not value:
                    return f'Record has no scope: {scope}'
                stack.append(value)
            elif token.kind == 'operator':
                try:
                    arg_1 = self.to_number(stack.pop())
                    arg_2 = self.to_number(stack.pop())
                except IndexError:
                    return f'Too many operators: {self._cached_tokens}'
                res = self.execute_operation(token, arg_1, arg_2)
                stack.append(str(res))
            else:
                return f'Unrecognized token: {token}'
        if len(stack) != 1:
            return f'operator/operand mismatch: {self._cached_tokens}'
        return stack.pop()

Reflection: I don’t quite see how I could have thought of that test until this happened, but it is easy to see that if the test had been written, I’d have saved a lot of debugging time. But even though the expressions were clearly intended to be used over and over, I don’t think it would have occurred to me to test calling the same one twice.; Maybe if you had been pairing with me, you would have thought of it. My loss.

Now the only failing test is the one I mentioned above:

    def test_x_calculation(self):
        expression = 'totalpay = salary + bonus'
        data = XSet.n_tuple(["a", "b"])
        x_calc = XCalculation(data, [expression])
        assert x_calc.base == data
        assert expression in x_calc.expressions

The expression isn’t in there any more, it has been thrashed, trashed, bashed, eaten, and consumed. Other tests work. We just needed this test to drive out the class. Remove it.

We are green. And tired. Commit: XCalculation(XImplementation) creates virtual records from input set and set of expressions.

I’m about whipped but let’s clean up that code just a bit.

    def __iter__(self):
        for record, s in self.base:
            calculated = []
            for calc in self.expressions:
                value = calc.result(record)
                calculated.append((value, calc.scope()))
            all_results = XSet.from_tuples(calculated)
            full_record = record.union(all_results)
            yield full_record, s
~~

Rename `s`:

~~~python
    def __iter__(self):
        for record, record_scope in self.base:
            calculated = []
            for calc in self.expressions:
                value = calc.result(record)
                calculated.append((value, calc.scope()))
            all_results = XSet.from_tuples(calculated)
            full_record = record.union(all_results)
            yield full_record, record_scope

Extract method:

    def __iter__(self):
        for record, record_scope in self.base:
            all_results = self.create_calculated_values(record)
            full_record = record.union(all_results)
            yield full_record, record_scope

    def create_calculated_values(self, record):
        calculated = []
        for calc in self.expressions:
            value = calc.result(record)
            calculated.append((value, calc.scope()))
        all_results = XSet.from_tuples(calculated)
        return all_results

Rename variable:

    def __iter__(self):
        for record, record_scope in self.base:
            calculated_values = self.create_calculated_values(record)
            full_record = record.union(calculated_values)
            yield full_record, record_scope

Make a comprehension:

    def create_calculated_values(self, record):
        calculated = [(calc.result(record), calc.scope()) for calc in self.expressions]
        all_results = XSet.from_tuples(calculated)
        return all_results

Strictly speaking, I could be committing on each of these steps, and because I know that I tend to take bigger steps than I might, I probably “should” be. Commit: refactoring. Why don’t I make a detailed commit message? Two reasons: 1) that would slow me down, and B) if someone wants to see the refactoring they can look.

Inline:

    def create_calculated_values(self, record):
        calculated = [(calc.result(record), calc.scope()) for calc in self.expressions]
        return XSet.from_tuples(calculated)

Commit: refactoring.

Here’s our entire XCalculation class, for a final look:

class XCalculation(XImplementation):
    def __init__(self, base_set, expressions):
        self.base = base_set
        self.expressions = []
        for expr in expressions:
            rpn = Parser(expr).rpn()
            calc = Expression(None, rpn)
            self.expressions.append(calc)

    def __iter__(self):
        for record, record_scope in self.base:
            calculated_values = self.create_calculated_values(record)
            full_record = record.union(calculated_values)
            yield full_record, record_scope

    def create_calculated_values(self, record):
        calculated = [(calc.result(record), calc.scope()) for calc in self.expressions]
        return XSet.from_tuples(calculated)

    def __hash__(self):
        return hash(self.base)

    def __len__(self):
        return len(self.base)

    def __repr__(self):
        return f'XCalc({[self.expressions]}'

We could crunch that __init__ down, but I’m tired and if I did I would take it too far. Don’t believe me? Hold my chai:

class XCalculation(XImplementation):
    def __init__(self, base_set, expressions):
        self.base = base_set
        self.expressions = [Expression(None, Parser(expr).rpn()) for expr in expressions]

And now that I’ve gone that far: Commit: refactoring unto oblivion.

Actually, that’s pretty nifty, if you’re into comprehensions, which if you are a True Pythonista you surely are.

Let’s sum up.

Summary

We saw today the kind of issue that arises when we use very simple lexing and parsing. We tend to get odd things like failure to accept underbar as a letter. I’ve made a note to fix that up.

The worst thing here is that although I had the code for the XCalculation correct, the Expressions were not reusable, because they ate their tokens on the first execution. The fix was easy. Discovering the need took a lot of stepping, more than was ideal.

How much stepping is ideal? Well, zero. I made the mistake a week or so ago of learning a bit about the PyCharm debugger. Prior to that the most I would do would be a few prints, but there are cases where the debugger is actually a faster way to find a defect if the step is a bit large.

The real bug is that the step is too large, and I think I’d like myself better if I never resorted to the debugger.

Aside: I’m not sure why I have this prejudice. In Smalltalk one basically programs in the debugger. Write a test calling for something that doesn’t exist yet. Run it. The debugger pops up saying “where’s that at?” and, get this, you type it in and say ‘continue’ and the program continues just as if the thing had always been there!!; There is nothing else like it in the world, that I know of. It’s marvelous. So possibly I shouldn’t be so hard on myself about using the debugger. It’s just that when I get to the debugger frame of mind in other languages, the real issue is usually that I’ve taken too big a step, either with too difficult a test or, worst times, with no test at all. But I digress.

In this case, I don’t know how long it would have taken me to think of the test that ran the same expression twice. I stepped through the thing about a time and a half, and I got a glimmer: somehow the expression was changing. At the time I wrote the confession above, I still didn’t know what it was, I just knew it was changing. Then, with the pause for the confession, I saw a glimmer of the issue and we were quickly back on track.

As we see in the code, the implementation of a virtual calculation is trivial. I had thought, but didn’t even think long enough to mention it, that we might need a single-record kind of XImplementation, kind of like XFlat, but the yield capability in __iter__ let us build the records on the fly.

?: I wonder if we can get rid of XFlat class the same way. We have no real use for it. Hmm

It remains to make an XSet with these virtual calculations in it. I have a test marked skip for that: it’s the one we wrote initially. And if we need a physical set for some reason, it will be easy to make one. Trust me on that.

We’ll proceed a bit further down this path next time, but I am quite pleased with this morning, even if I did dirty my fingers on the debug button.

What do you think? How hard should we try to work in such small steps that we never need the debugger? When will it pay off and when will it not?

See you next time!