Values from Records

FAFO on GitHub

Let’s fetch values from XST records for our Expressions. I expect this to go smoothly. It does. We discuss small steps. No, smaller than that.

Hello, friends!

Our expressions are getting fairly robust, with error messages and all. I have some notes for additional things that may cause errors, and we should probably put in a general scheme for recovering, but while that can be seen as “important”, I don’t think it’ll be very “interesting”. We are here to find things that are interesting. We’re not building a product, we are learning about building products by working (playing?) around.

Mind you, there is a lot of learning to be had in bullet-proofing things, and I freely grant that I’m not covering all the topics a professional developer needs to know. I’m covering the ones that I find interesting, and the ones where I feel at least somewhat qualified to speak.

As Wittgenstein put it: “Whereof one cannot speak, thereof one must be silent.” I’m pretty sure he had me and security in mind when he said that. Probably me and some other topics as well.

Anyway, when we see an expression like pay=salary+bonus as an input to our parsing and expression handling, we intend that salary and bonus are scopes (fields) in an input record, and pay is a new field, possibly virtual, that contains the indicated sum. Let’s write a test conveying that, and make it work.

    def test_record(self):
        text = 'pay = salary + bonus'
        rpn = Parser(text).rpn()
        record = XSet.from_tuples((('10000', 'salary'), ('2345', 'bonus')))
        expr = Expression('ignored', rpn)
        assert expr.scope() == 'pay'
        result = expr.result(record)
        assert result == '12345'

I think this should do the job. I get the expected red bar. I’m not sure what to expect from the result: I kind of think it might raise an unhandled exception.

Expected :'12345'
Actual   :"Too many operators: ['pay', 'salary', 'bonus', '+', '=']"

Ah, that will be because we don’t handle the scope tokens at all:

    def result(self, record):
        stack = []
        while self._tokens:
            token = self._tokens.pop()
            if token.kind == 'literal':
                stack.append(token.value)
            elif token.kind == 'operator':
                try:
                    arg_1 = self.to_number(stack.pop())
                    arg_2 = self.to_number(stack.pop())
                except IndexError:
                    return f'Too many operators: {self._cached_tokens}'
                res = self.execute_operation(token, arg_1, arg_2)

                stack.append(str(res))
        if len(stack) != 1:
            return f'operator/operand mismatch: {self._cached_tokens}'
        return stack.pop()

You may notice that I’ve renamed some variables there, and extracted the actual execution of the operation. I did that last night when you weren’t watching. I think we should first improve this code by adding an else clause that reports an error when it meets a token it doesn’t recognize:

    def result(self, record):
        stack = []
        while self._tokens:
            token = self._tokens.pop()
            if token.kind == 'literal':
                stack.append(token.value)
            elif token.kind == 'operator':
                try:
                    arg_1 = self.to_number(stack.pop())
                    arg_2 = self.to_number(stack.pop())
                except IndexError:
                    return f'Too many operators: {self._cached_tokens}'
                res = self.execute_operation(token, arg_1, arg_2)
                stack.append(str(res))
            else:
                return f'unrecognized token: {token}'
        if len(stack) != 1:
            return f'operator/operand mismatch: {self._cached_tokens}'
        return stack.pop()

I expect that result as my failure, and I get it:

Expected :'12345'
Actual   :'unrecognized token: Token(scope, bonus, None)'

Should capitalize ‘un’.

N.B.: Coulda shoulda committed here. No biscuit!

But now let’s deal with the scope token.

REDACTED: I will spare you some debugging, but I wasted a lot of time finding a defect in the code that strips the assignment out of the tokens and puts the name in the Expression. I need a test for that.

    def test_expression_gets_scope(self):
        text = 'four = 3 + 1'
        rpn = Parser(text).rpn()
        tokens = [t.value for t in rpn]
        assert tokens == ['four', '3', '1', '+', '=']
        expr = Expression('wrong', rpn)
        assert expr.scope() == 'four'
        adjusted_tokens = [t.value for t in expr._tokens]
        assert adjusted_tokens == ['+', '1', '3']

The answer comes back as ['+', '1']. Not so good.

The fix is here:

    def handle_assignment(self):
        if self._tokens:
            initial_token = self._tokens[0]
            if initial_token.is_assignment():
                final_token = self._tokens[-1]
                self._scope = final_token.value
                self._tokens = self._tokens[1:-2]

That final -2 should be -1, because a slice’s end index means “up to”, not “up to and including”.

    def handle_assignment(self):
        if self._tokens:
            initial_token = self._tokens[0]
            if initial_token.is_assignment():
                final_token = self._tokens[-1]
                self._scope = final_token.value
                self._tokens = self._tokens[1:-1]
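
To see the off-by-one in isolation, here is a tiny sketch using plain strings in place of Token objects. It assumes, as the test above suggests, that the Expression holds the RPN reversed, with the assignment first and the target name last. The variable names are made up for the illustration.

```python
# Plain strings stand in for Token objects; this is an illustration only.
rpn = ['four', '3', '1', '+', '=']
reversed_tokens = rpn[::-1]  # ['=', '+', '1', '3', 'four']

# A slice's end index is exclusive, so -2 drops the '3' as well as the name.
too_short = reversed_tokens[1:-2]
# Ending at -1 drops only the final name.
just_right = reversed_tokens[1:-1]

print(too_short)   # ['+', '1']
print(just_right)  # ['+', '1', '3']
```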

N.B.: Coulda shoulda committed here. Again, bad Ron, no biscuit.

And to make our expression test run, we have this in result:

    def result(self, record):
        stack = []
        while self._tokens:
            token = self._tokens.pop()
            if token.kind == 'literal':
                stack.append(token.value)
            elif token.kind == 'scope':
                scope = token.value
                value = record.get(scope)
                stack.append(value)
            elif token.kind == 'operator':
                try:
                    arg_1 = self.to_number(stack.pop())
                    arg_2 = self.to_number(stack.pop())
                except IndexError:
                    return f'Too many operators: {self._cached_tokens}'
                res = self.execute_operation(token, arg_1, arg_2)
                stack.append(str(res))
            else:
                return f'Unrecognized token: {token}'
        if len(stack) != 1:
            return f'operator/operand mismatch: {self._cached_tokens}'
        return stack.pop()

The addition is just the elif token.kind == 'scope' bit.
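
If you’d like to run the idea without the rest of the project, here is a stripped-down sketch. It is not the real Expression class: Token is a hypothetical namedtuple, a plain dict stands in for the XSet, and the only operator it knows is addition.

```python
from collections import namedtuple

# Hypothetical stand-in for the real Token class: just a kind and a value.
Token = namedtuple('Token', ['kind', 'value'])


def evaluate(tokens, record):
    """Evaluate reversed RPN tokens, looking up scope tokens in record."""
    tokens = list(tokens)  # copy, so popping does not consume the caller's list
    stack = []
    while tokens:
        token = tokens.pop()
        if token.kind == 'literal':
            stack.append(token.value)
        elif token.kind == 'scope':
            # The new part: fetch the field's value from the record.
            stack.append(record.get(token.value))
        elif token.kind == 'operator':  # only '+' in this sketch
            arg_1 = int(stack.pop())
            arg_2 = int(stack.pop())
            stack.append(str(arg_1 + arg_2))
    return stack.pop()


# 'salary bonus +' in RPN, stored reversed so pop() yields salary first.
tokens = [Token('operator', '+'), Token('scope', 'bonus'), Token('scope', 'salary')]
record = {'salary': '10000', 'bonus': '2345'}  # a dict standing in for an XSet
print(evaluate(tokens, record))  # 12345
```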

We are green. Commit: expressions can refer to fields in records (scopes in XSets).

Now let’s test what happens if there is no such field, and produce a decent message.

    def test_scope_not_in_record(self):
        text = 'pay = salary + bogus'
        rpn = Parser(text).rpn()
        print()
        print("rpn", rpn)
        record = XSet.from_tuples((('10000', 'salary'), ('2345', 'bonus')))
        assert record.get('salary') == '10000'
        expr = Expression('ignored', rpn)
        assert expr.scope() == 'pay'
        result = expr.result(record)
        assert result == 'Record has no scope: bogus'

The initial failure is this:

>           return int(string)
E           TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'

src/expressions.py:63: TypeError

During handling of the above exception, another exception occurred:

We need to bullet-proof our to_number, which looks like this:

    @staticmethod
    def to_number(string):
        try:
            return int(string)
        except (ValueError, TypeError):
            return float(string)

It failed in the retry, because it got a None. We’ll work to make sure that doesn’t happen, but this needs to do something sensible. I think we’ll have it return zero.

    @staticmethod
    def to_number(string):
        try:
            return int(string)
        except (ValueError, TypeError):
            try:
                return float(string)
            except (ValueError, TypeError):
                return 0
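
Here is that conversion as a standalone function, so it can be poked at outside the class. The inputs are just ones picked to exercise each branch.

```python
def to_number(string):
    """Convert to int if possible, else float, else fall back to zero."""
    try:
        return int(string)
    except (ValueError, TypeError):
        try:
            return float(string)
        except (ValueError, TypeError):
            return 0


print(to_number('2345'))   # 2345
print(to_number('3.5'))    # 3.5
print(to_number(None))     # 0
print(to_number('bogus'))  # 0
```

The None case is the one to watch: a missing value quietly becomes zero, so a sum involving it comes out as just the other operand.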

Now our message is:

Expected :'Record has no scope: bogus'
Actual   :'10000'

N.B.: Commit was possible here.

Already I don’t like returning zero. But for now:

    def result(self, record):
        stack = []
        while self._tokens:
            token = self._tokens.pop()
            if token.kind == 'literal':
                stack.append(token.value)
            elif token.kind == 'scope':
                scope = token.value
                value = record.get(scope)
                if value is None:
                    return f'Record has no scope: {scope}'
                stack.append(value)
            elif token.kind == 'operator':
                try:
                    arg_1 = self.to_number(stack.pop())
                    arg_2 = self.to_number(stack.pop())
                except IndexError:
                    return f'Too many operators: {self._cached_tokens}'
                res = self.execute_operation(token, arg_1, arg_2)
                stack.append(str(res))
            else:
                return f'Unrecognized token: {token}'
        if len(stack) != 1:
            return f'operator/operand mismatch: {self._cached_tokens}'
        return stack.pop()

We are green. Commit: Diagnose missing field in expression.

Reflection

This has gone well, except for my diversion caused by the incorrect AND UNTESTED slice when extracting the assignment from the rpn.

The actual work we had to do was little more than adding a few lines to the result method. Much of what we did was ancillary to the main task, improving error handling.

Speaking of the result method, it’s getting a bit long and messy. In particular, there are all those early returns, bearing error messages. I think what we need, roughly, is:

Reduce the while loop to call a single method;
Put that call in a try/except, expecting a specialized exception type;
In the called method, raise the exception for errors instead of returning;
Return only if everything seems to have gone OK;
When the exception occurs, return its message as the result, as we do now with our interleaved returns.
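
For readers who want a picture of that shape, a rough sketch might look like the following. ExpressionError, process, the namedtuple Token, and the dict standing in for an XSet are all made up for illustration, and only addition is supported. This is not the refactoring itself, just one way the list above could play out.

```python
from collections import namedtuple

# Hypothetical stand-in for the real Token class.
Token = namedtuple('Token', ['kind', 'value'])


class ExpressionError(Exception):
    """Hypothetical specialized exception carrying a user-facing message."""


def process(token, stack, record):
    """Handle one token, raising ExpressionError instead of returning a message."""
    if token.kind == 'literal':
        stack.append(token.value)
    elif token.kind == 'scope':
        value = record.get(token.value)
        if value is None:
            raise ExpressionError(f'Record has no scope: {token.value}')
        stack.append(value)
    elif token.kind == 'operator':  # only '+' in this sketch
        try:
            arg_1 = int(stack.pop())
            arg_2 = int(stack.pop())
        except IndexError:
            raise ExpressionError('Too many operators') from None
        stack.append(str(arg_1 + arg_2))
    else:
        raise ExpressionError(f'Unrecognized token: {token}')


def result(tokens, record):
    """Run every token through process; an ExpressionError becomes the result."""
    tokens = list(tokens)  # copy, so popping does not consume the caller's list
    stack = []
    try:
        while tokens:
            process(tokens.pop(), stack, record)
        if len(stack) != 1:
            raise ExpressionError('operator/operand mismatch')
    except ExpressionError as error:
        return str(error)
    return stack.pop()


record = {'salary': '10000', 'bonus': '2345'}  # a dict standing in for an XSet
good = [Token('operator', '+'), Token('scope', 'bonus'), Token('scope', 'salary')]
bad = [Token('operator', '+'), Token('scope', 'bogus'), Token('scope', 'salary')]
print(result(good, record))  # 12345
print(result(bad, record))   # Record has no scope: bogus
```

The loop body shrinks to one call, and every error path leaves through the same except clause.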

That’s more than I want to do this morning. We’ll save that for another session. I’ve never even created a specialized exception in Python, so there’s some elementary learning to be done.

We continue to take small steps toward our current goal, to allow calculations to take place in XSets, with current expectations being that we’ll have the option of creating a new set with calculated values in it, or some kind of “virtual” set that returns all the fields of an input XSet, plus calculated fields, as if they were really there in the input set.

Small Steps

When members of my cabal speak to teams about small steps, some concerns arise frequently. These include:

But our stories are huge, they can’t be done in small steps;
We already use small steps, having sliced our stories as thin as we can;
Our stories include changes to more than one layer in the software;
We already use the smallest possible steps.

I think these all boil down to the last one. And the last one, well, anyone who says that is almost certainly mistaken. They are probably doing the smallest steps they could think of, but that’s not quite the same as the smallest steps possible.

This morning I did two commits: diagnose missing field in expression, and expressions can refer to fields in records.

The “diagnose” commit included two changes, one of two lines, and one of four. Six lines. It could have been done in two steps. In fact when I did it, I took two steps: I just didn’t remember to commit.

The “expressions can refer” commit included:

one line to fix the slicing when extracting the assignment;
four lines to handle looking up field values;
two lines to diagnose unhandled token types;
one line to add an exception to an except statement.

Four changes. Eight lines. Each of those could have been done as a separate commit.

I did two commits, could have done six. The largest change size was four lines. I do not at this writing know how to get those big changes down to fewer lines than that.

This is why we don’t believe that anyone, anywhere, including our own damn selves, is really taking the smallest steps possible.

But but but …

Your tests run in 1.5 seconds. Ours take too long to be committing that often!

That, my dears, is not a problem with small steps. And it’s not 1.5 seconds, it’s 140 milliseconds.

No, to be fair, we do accept that a large project’s tests may require a long time to run, and yes, when that’s the case, we feel that we cannot commit our code, and given that, we feel that it’s OK to do enough work to justify taking a break while the tests run.

And next thing they know, they’re playin’ for money in a pinch-back suit and list’nin to some big out-a-town Jasper hearin’ him tell about horse-race gamblin’. Well, maybe not, but even if they’re not taking the first big steps on the road to the depths of deg-ra-Day …

Seriously, odds are that they, and I, and possibly even you, would benefit from finding ways to take smaller steps. No, much smaller than that. Much smaller.

Summary

We’ve done well. Over a small number of sessions, we’ve built up a facility that can parse expressions and compile them to an executable form that can fetch values from records. We’ve gone in small steps, though we could have gone in even smaller steps.

And we found, with difficulty, a defect that would have been found instantly had we ever tested that code.

It never ceases to amaze me how much of all this comes down to taking small steps supported by tests telling us first, that something isn’t as we want it, and then, soon after, that it is.

Of course, as we spoke about yesterday, there is a great deal of thinking going on, in the creation of those tests, and the crafting of the code that passes them. Thinking is good. We should do a lot of it. For best results, if we take small steps, we can back that thinking up immediately with tangible results showing which thoughts were accurate and which were … not so accurate.

Good results. See you next time!