Interpreter

We can parse a simple expression. Let’s sketch the expression interpreter. Bear gives us a nip, but we have progress.

We’re not finished with parsing at all. We need to deal with parentheses and perhaps other constructs, and the code is far from what I’d consider clear. But I think we should try interpreting the result, so that we’ll have a sense of both sides of the parse-execute issue.

We’re to the point where this expression:

a123 + 37*b17 - c*37.5

Parses down to this list, in Reverse Polish Notation:

['a123', '37', 'b17', '*', 'c', '37.5', '*', '-', '+']

Performing the indicated calculation, if I’m not mistaken, comes down to this:

Go through the list from left to right. If the next item is an operand (name or number), push its value onto a stack. If the item is an operator, execute that operator against the top items on the stack, and push the result back onto the stack. When you’re done, the answer is on the top of the stack.

So let’s try that, with some tests.

    def test_interpret(self):
        rpn = [ '37', '5', '+']
        result = interpret(rpn)
        assert result == 42

I do suppose that we’ll put parsing and interpreting into some objects, but for now I’m just coding them in open functions.

I swear that I just typed this in and the test passed:

def interpret(rpn):
    stack = []
    r = rpn[::-1]
    while r:
        item = r.pop()
        if item == '+':
            stack.append(to_number(stack.pop()) + to_number(stack.pop()))
        else:
            stack.append(item)
    return stack.pop()

def to_number(string):
    try:
        return int(string)
    except ValueError:
        return float(string)

Let’s do a harder one:

    def test_interpret_2(self):
        expr = '10 * 2 + 10 * 2 + 2'
        lexed = lex(expr)
        assert lexed == ['10', '*', '2', '+', '10', '*', '2', '+', '2']
        rpn = dij(lexed)
        assert rpn == ['10', '2', '*', '10', '2', '*', '2', '+', '+']
        result = interpret(rpn)
        assert result == 42

This fails for want of multiplication.

def interpret(rpn):
    stack = []
    r = rpn[::-1]
    while r:
        item = r.pop()
        if item == '+':
            stack.append(to_number(stack.pop()) + to_number(stack.pop()))
        elif item == '*':
            stack.append(to_number(stack.pop()) * to_number(stack.pop()))
        else:
            stack.append(item)
    return stack.pop()

And we pass. Let’s commit: prototype lexer, parse, interpreter coming along nicely.

Reflection

I think we have the basic shape of lexing, parsing, and interpreting in place, just about enough to begin to see how to make some objects here. I think we should put a bit of emphasis on what our interpreter would like to receive, because we will presumably parse seldom and interpret often. And anyway, it’s nice to provide things that our users want.

We can’t be too precise about this, because we don’t know quite how this will all plug in. We can guess that there will be a new kind of XImplementation, perhaps XCalculation, which will contain a number of “field definitions”, a field name (scope) and an RPN. The XCalculation will also have an attached XSet with the input records. As it spins out its records, it will produce an output that is, or appears to be, records with all the fields of the attached XSet plus the new calculated fields.

We see already in the shape of the interpret function above that each kind of thing in the rpn has an associated action. A constant puts itself on the stack. A name (not implemented yet) will fetch a value from the current record and put that on the stack. An operator will pop one or more items, process them, and push a result back on the stack.

It would be nice if interpret just turned into something like this:

while rpn:
    rpn.pop().execute(stack, current_record)
return  stack.pop()

Or perhaps even something like this:

while rpn:
    rpn.pop()(stack, current_record)
return  stack.pop()

If that latter form, the rpn has callable things in it, possibly lambdas.

Let’s see if we can mash interpret to sort of work that way. I think we’ll want to move the behavior I create here around but I am reluctant to break my current tests while I’m just trying to learn some things.

def interpret(rpn):
    stack = []
    r = rpn[::-1]
    executable_r = make_executable(r)
    while executable_r:
        executable_r.pop()(stack)
    return stack.pop()

This of course breaks both of my interpreter tests. I try this:

def make_executable(rpn):
    exec = []
    op = None
    for r in rpn:
        if r == '+':
            exec.append(lambda stack: stack.append(stack.pop() + stack.pop()))
        elif r == '*':
            exec.append(lambda stack: stack.append(stack.pop() * stack.pop()))
        else:
            exec.append(lambda stack: stack.append(to_number(r)))
    return exec

And I get really interesting results:

Expected :42
Actual   :74
Expected :42
Actual   :210

I see right away that 74 is two times 37, the first value in the simple test (37 + 5). The 210 is far from obvious but presumably when I fix the first issue we’ll be good to go.

I’ve done some printing and debugging. I do not understand what is happening, but what I’m seeing is that the first constant is being used over and over, instead of the new ones that are put in by the lambda.

I should roll back. I do not.

A bit more printing and testing shows me that this lambda:

        else:
            rn = to_number(r)
            exec.append(lambda stack: stack.append(rn))
            print('push number', rn)

is always bound to the first occurrence of rn (37 in our first test). So each time a new value should be pushed, we push the first value we ever saw. Clearly I do not understand something. Now roll back.

This time, not rolling back when it first hit me that I “should” was OK: I zeroed in a bit more on the issue. Of course, we’ll never know what I might have done if I’d tried again. If I’d used lambda again, surely there’d be no joy. But if I had done smaller steps, or not used lambda? We’ll never know.

I decide the try a simple lambda before I close out. This test fails:

    def test_lambda(self):
        v = 2
        l1 = lambda: 2*v
        v = 3
        assert l1() == 4

The result is 6 not 4. The lambda is bound to the variable, not to the value.

I’m not quite sure how things unwind in the make_executable that I wrote above, but clearly something similar to this is happening, with the lambda in that case bound to the same constant.

Well. What to do? We have an interpret that works, so we can build from there. For this morning, however, we’ll stop and think a bit more about just how to do what we need to do. I think that what we’ll want is a somewhat more intelligent object to push onto the original rpn stack. We’ll pick this up, either this afternoon or tomorrow.

I am a bit disappointed, but glad to know something that won’t work, and interested to figure out what I do not understand about Python lambda.

See you next time!