Expressions

FAFO on GitHub

I think I’ll break down and do expressions. I’m thinking a simple Dijkstra-style parser will do the job for us. Let’s see if we can find a simple way to develop one.

I learned the Dijkstra “shunting yard algorithm” ages ago, and I think it will serve to provide at least a simple form of expression calculation for us. It’s possible to evaluate expressions directly with the algorithm, or to produce Reverse Polish Notation (RPN) or an abstract syntax tree.

We’ll do a rough-and-ready version of all this. The point is just to get enough calculation done to see how it fits into our scheme. We can improve the parsing as needed.

Let’s begin by breaking up an expression into tokens.

    def test_lexing(self):
        expr = 'a123 + 37*b17 - c*37.5'
        no_space = expr.replace(' ', '')
        rx = '([^a-zA-Z0-9.])'
        lexed = re.split(rx, no_space)
        expected = ['a123', '+', '37', '*', 'b17', '-', 'c', '*', '37.5']
        assert lexed == expected

Remove all spaces, and then anything that isn’t alphanumeric plus dot is an operator. Yes, we would accept 123abc as a variable. Not gonna worry about it. Yes, scopes can be numbers. Not going to worry about that, this will work for alphanumeric scopes only. Probably.

Let’s see how this algorithm works.

    def test_dijkstra_1(self):
        lexed = ['a123', '+', '37']
        rpn = dij(lexed)
        assert rpn == ['a123', '37', '+']

    def test_dijkstra_2(self):
        lexed = ['a123', '+', '37', '*', 'b17', '-', 'c', '*', '37.5']
        rpn = dij(lexed)
        assert rpn == ['a123', '37', 'b17', '*', 'c', '37.5', '*', '-', '+']

I actually wrote test 2 first. And then typed in the python from the wikipedia page, minus parens and functions, just to get going:

def dij(lexed: list):
    priority = { '+':1, '-':1, '*': 2}
    stack = []
    output = []
    expr = lexed[::-1]
    while expr:
        token = expr.pop()
        if token[0].isalpha():
            output.append(token)
        elif token[0].isnumeric():
            output.append(token)
        else:
            o1 = token
            p1 = priority[o1]
            while stack and stack[-1] != '(' and priority[stack[-1]] > p1:
                o2 = stack.pop()
                output.append(o2)
            stack.append(o1)
    while stack:
        output.append(stack.pop())
    return output

That’s passing the test. So as our next trick, not to be accomplished today, we’ll want to interpret the RPN. And very likely we should improve it for our convenience. We’ll see how that goes.

Summary

We have a quick and dirty scheme that will convert a carefully formed expression to RPN, and it seems straightforward enough to interpret the RPN in the context of a record to evaluate and store an expression. Our parser does not currently deal with the name of the variable to be stored into. Easy enough to do, I should think. We can just rip anything up through an initial “=” off the front of the string. We’re not concerned at this point about malformed expressions.

What I want to explore is whether we can create a view of a set that calculates the expression on the fly, producing views of the individual records, that will then be able to be processed further as needed.

And after that, who knows?

I think this went well. Copying off of Dijkstra’s paper worked particularly well. We’ll refactor a bit soon enough.

See you next time!