FAFO on GitHub

We’ll take some steps along the expression path. I’m slightly questioning part of what we have. The path is zig-zag but leads to a good place.

Hello, Friends!

I’m a little troubled by this morning’s tests, which make the assumption that what the Expression executes is a list of functions. I do think that’s what we want, but the code has not told us that yet. Let’s do some more TDDing and see what comes out.

    def test_make_rpn(self):
        text = '21 * 2'
        rpn = Parser(text).rpn()
        assert rpn == ['21', '2', '*']

I want the code to tell me what the right little objects are to do this job. And I want to drive out the code with tests. I have to make some assumptions, but I expect them to be wrong. This is going to require some test revision, but I don’t really see another way, other than to just code up the whole thing without tests, and that seems certain to lead me into trouble.

So we’ll go with that test.

class Parser:
    def __init__(self, expression_string):
        self._expr = expression_string

    def rpn(self):
        return ['21', '2', '*']

Green. Commit: initial Parser, fake it till you make it.

I decided there was more to code than I wanted to take on just then, so I faked the answer. Not that I couldn’t do what was needed: I just didn’t want to.

Let’s write a lexing test:

    def test_lexing(self):
        text = '21 * 2'
        lex = Parser(text).lex()
        assert lex == [ '21', '*', '2']

Easily done; we have one we can rip off.

    def lex(self):
        no_space = self._expr.replace(' ', '')
        rx = '([^a-zA-Z0-9.])'
        return re.split(rx, no_space)
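Because the pattern is a capturing group, re.split keeps the separators in its output, which is what makes this one-liner serve as a lexer. A quick standalone check, on the test input and on a harder one:

```python
import re

# The lexing regex: split on anything that isn't a letter, digit, or dot.
# The capturing parentheses make re.split keep the separators.
rx = '([^a-zA-Z0-9.])'
print(re.split(rx, '21*2'))           # ['21', '*', '2']
print(re.split(rx, 'pay*1.1+bonus'))  # ['pay', '*', '1.1', '+', 'bonus']
```

Note that the dot inside the character class is what keeps '1.1' together as a single item.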

So let’s make the parser parse the lexed input. I’ll change the code and break my rpn test.

    def rpn(self):
        lexed = self.lex()
        result = self.parse(lexed)
        return result

I am going for ‘parse’, at least a simple version. It’s easy enough. Shall we fake it? No, let’s not.

    def parse(self, forward_lexed):
        stack = []
        result = []
        lexed = forward_lexed[::-1]
        while lexed:
            item = lexed.pop()
            if item in ['*']:
                stack.append(item)
            else:
                result.append(item)
        while stack:
            result.append(stack.pop())
        return result

Green, very sparse parse, but the main shape of things is in there. Commit: minimal parse.

I don’t like what we have. Let’s see what I dislike:

  • The parse method could be static;
  • Items are going to be checked individually to see if they are operators, right in the middle of parsing;
  • Priority is going to have to be looked up.

I conclude that lexing needs to be more helpful and to create more info. We’ll start with a little sequence, but we really want a tiny object, I think.

We can refactor and otherwise change this code, because we have tests that can break. I think I want a new method that will make tokens out of the lexed list.

    def parse(self, forward_lexed):
        stack = []
        result = []
        lexed = forward_lexed[::-1]
        tokens = self.make_tokens(lexed)
        while tokens:
            item = tokens.pop()
            if item in ['*']:
                stack.append(item)
            else:
                result.append(item)
        while stack:
            result.append(stack.pop())
        return result

And:

    def make_tokens(self, lexed_list):
        return [self.make_token(item) for item in lexed_list]

    def make_token(self, string_item):
        if string_item in ['*', '/']:
            return ('operator', string_item, 2)
        elif string_item in ['+', '-']:
            return ('operator', string_item, 1)
        elif string_item[0].isalpha():
            return ('scope', string_item, None)
        else:
            return ('literal', string_item, None)

I pulled that out of the air, and really need to test it. My bad.

This passes:

    def test_token_sequence(self):
        parser = Parser('')
        token = parser.make_token('*')
        assert token == ('operator', '*', 2)

Also, test_make_rpn fails:

Expected :['21', '2', '*']
Actual   :[('literal', '21', None), ('operator', '*', 2), ('literal', '2', None)]

No surprise there, fix up the test:

    def test_make_rpn(self):
        text = '21 * 2'
        rpn = Parser(text).rpn()
        assert rpn == [('literal', '21', None), ('literal', '2', None), ('operator', '*', 2)]

And the parsing can be improved a bit:

    def parse(self, forward_lexed):
        stack = []
        result = []
        lexed = forward_lexed[::-1]
        tokens = self.make_tokens(lexed)
        while tokens:
            item = tokens.pop()
            kind, value, priority = item
            if kind == 'operator':
                stack.append(item)
            else:
                result.append(item)
        while stack:
            result.append(stack.pop())
        return result

Green. Commit: initial cut at tokens.

I’m making good time but don’t know where I am. Let’s slow down, have a snack, and reflect.

I think these little tokens are a step in the right direction. I also think their specific contents are questionable. And, finally, never send a system object to do an application object’s job. Let’s make a tiny class for these guys. We’ll change that test above to drive out some tiny token behavior:

    def test_make_rpn(self):
        text = '21 * 2'
        rpn = Parser(text).rpn()
        assert rpn[0].kind == 'literal'
        assert rpn[1].kind == 'literal'
        assert rpn[2].kind == 'operator'

I would prefer to call kind type, but type is a built-in Python name, so let’s not. The test fails.

Change this:

    def make_token(self, string_item):
        if string_item in ['*', '/']:
            return Token('operator', string_item, 2)
        elif string_item in ['+', '-']:
            return Token('operator', string_item, 1)
        elif string_item[0].isalpha():
            return Token('scope', string_item, None)
        else:
            return Token('literal', string_item, None)

That drives out this:

class Token:
    def __init__(self, kind, value, priority):
        self.kind = kind
        self.value = value
        self.priority = priority

Things break. I was hoping they wouldn’t. Oh, it’s the unpack:

>           kind, value, priority = item
E           TypeError: cannot unpack non-iterable Token object

We’ll not unpack it:

    def parse(self, forward_lexed):
        stack = []
        result = []
        lexed = forward_lexed[::-1]
        tokens = self.make_tokens(lexed)
        while tokens:
            item = tokens.pop()
            if item.kind == 'operator':
                stack.append(item)
            else:
                result.append(item)
        while stack:
            result.append(stack.pop())
        return result

And another test still failing:

    def test_token_sequence(self):
        parser = Parser('')
        token = parser.make_token('*')
        assert token == ('operator', '*', 2)

Right, needs fixing:

    def test_token_sequence(self):
        parser = Parser('')
        token = parser.make_token('*')
        assert token.kind == 'operator'
        assert token.value == '*'
        assert token.priority == 2

Green. Commit: new Token object.

I think we’ll try another rpn test, with a plus and a times.

    def test_make_harder_rpn(self):
        text = 'pay * 1.1 + bonus'
        rpn = Parser(text).rpn()
        values = [t.value for t in rpn]
        assert values == ['pay', '1.1', '*', 'bonus', '+' ]

I think that’s what we want, and it makes a decently-reading test as well. Now if we can only make it work.

    def parse(self, forward_lexed):
        stack = []
        result = []
        lexed = forward_lexed[::-1]
        tokens = self.make_tokens(lexed)
        while tokens:
            item = tokens.pop()
            if item.kind == 'operator':
                while stack and stack[-1].priority > item.priority:
                    result.append(stack.pop())
                stack.append(item)
            else:
                result.append(item)
        while stack:
            result.append(stack.pop())
        return result

We are green. Commit: operators times and divide have higher priority than plus and minus. only times and plus actually tested. Patience advised.
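To watch that priority comparison at work, here is the same loop traced standalone on ‘pay * 1.1 + bonus’, with (kind, value, priority) tuples standing in for Tokens so it runs on its own:

```python
# Shunting-yard trace on 'pay * 1.1 + bonus', tuples in place of Tokens.
tokens = [('scope', 'pay', None), ('operator', '*', 2),
          ('literal', '1.1', None), ('operator', '+', 1),
          ('scope', 'bonus', None)]
stack, result = [], []
for item in tokens:
    kind, value, priority = item
    if kind == 'operator':
        # pop any higher-priority operators before this one goes on
        while stack and stack[-1][2] > priority:
            result.append(stack.pop())
        stack.append(item)
    else:
        result.append(item)
while stack:
    result.append(stack.pop())
print([value for _, value, _ in result])  # → ['pay', '1.1', '*', 'bonus', '+']
```

The ‘+’ forces the higher-priority ‘*’ off the stack before taking its place, which is exactly what puts the multiplication first in the output.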

We need a test for divide and subtract, and we need parentheses, and we are well behind on actual evaluation of the expression.

Evaluating will break at least one test. What we want to do is to pass a parser output to Expression and have it come up with the answer. Let’s write such a test, one that could work right now.

I think this is close to what we want:

    def test_round_trip(self):
        text = '10 * 2 * 2 + 2'
        rpn = Parser(text).rpn()
        result = Expression('Ignored', rpn).result(None)
        assert result == '42'

We need to redo result() to expect tokens and deal with them, not to expect executable things. At least not yet.

class Expression:
    def __init__(self, scope, operations):
        self._scope = scope
        self._operations = operations[::-1]

    def result(self, record):
        stack = []
        while self._operations:
            op = self._operations.pop()
            if op.kind == 'literal':
                stack.append(op.value)
            elif op.kind == 'operator':
                # the second operand is on top of the stack, so pop it first
                op2 = self.to_number(stack.pop())
                op1 = self.to_number(stack.pop())
                match op.value:
                    case '+':
                        res = op1 + op2
                    case '-':
                        res = op1 - op2
                    case '*':
                        res = op1 * op2
                    case '/':
                        res = op1 / op2
                    case _:
                        res = f'Unknown operator {op.value}'

                stack.append(str(res))
        return stack.pop()
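The stack discipline in result() can be traced standalone on ‘10 * 2 * 2 + 2’, whose RPN is 10 2 * 2 * 2 +. This sketch uses plain (kind, value) tuples rather than the class, just to illustrate:

```python
# RPN evaluation by hand: literals push, operators pop two and push one.
rpn = [('literal', '10'), ('literal', '2'), ('operator', '*'),
       ('literal', '2'), ('operator', '*'), ('literal', '2'), ('operator', '+')]
stack = []
for kind, value in rpn:
    if kind == 'literal':
        stack.append(value)
    else:
        right = int(stack.pop())  # the second operand is on top
        left = int(stack.pop())
        stack.append(str(left * right if value == '*' else left + right))
answer = stack.pop()
print(answer)  # → '42'
```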

The round trip test is green. Two others now fail, but they should. Let’s see whether to fix them or remove them and move on.

    def test_returns_constant(self):
        op = lambda self, stack, record: stack.append('42')
        record = None
        expression = Expression('Answer', [op])
        assert expression.scope() == 'Answer'
        assert expression.result(record) == '42'

    def test_rpn(self):
        op21 = lambda self, stack, record: stack.append('21')
        op2 = lambda self, stack, record: stack.append('2')
        def times(self, stack, number):
            op1 = self.to_number(stack.pop())
            op2 = self.to_number(stack.pop())
            stack.append(str(op1 * op2))

        operations = [op21, op2, times]
        expression = Expression('Answer', operations)
        assert expression.result(None) == '42'

Let me see if I can fix up the first one. If I can, the second should be easy as well.

    def test_returns_constant(self):
        op = Token('literal', '42', None)
        record = None
        expression = Expression('Answer', [op])
        assert expression.scope() == 'Answer'
        assert expression.result(record) == '42'

That goes green. Sweet!

    def test_rpn(self):
        op21 = Token('literal', '21', None)
        op2 = Token('literal', '2', None)
        times = Token('operator', '*', 2)
        operations = [op21, op2, times]
        expression = Expression('Answer', operations)
        assert expression.result(None) == '42'

Green. Sweet. Commit: Expression now operates on Tokens, Round trip complete.

Let’s reflect.

Reflection

So that went in a kind of zig-zag path, but it never felt like the wrong path. It was just that the rocks in the stream were not in a straight line across. It was a bit of a journey of discovery and yet the tests held up fairly well, without difficult revisions.

Could we have done better along the way? Perhaps, but it went so smoothly that I don’t think we can complain. Yes, we could have predicted that we’d need a Token, but the path we took made it pretty clear what the Token should look like, and we only lived with the intermediate sequence for a very short time.

We’ll take a look at the code and see what refactoring we might do. Perhaps we can move a little of the match-case logic over to lexing.

A larger question, of course, has to do with how far we wish to take this exercise. We certainly need parentheses, and a case could be made for unary minus. I’m not sure how the shunting yard handles that.
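For what it’s worth, one common shunting-yard treatment of unary minus — a sketch of the usual trick, not something this code commits to — is to reclassify ‘-’ as a distinct high-priority token when it appears at the start of the expression or right after another operator or an open parenthesis. The ‘neg’ kind and the priorities here are hypothetical:

```python
# Reclassify '-' as unary based on what came before it.
def classify(lexed):
    tokens = []
    prev = None
    for item in lexed:
        if item == '-' and (prev is None or prev in ['*', '/', '+', '-', '(']):
            tokens.append(('neg', item, 3))  # unary minus binds tightest
        elif item in ['*', '/', '+', '-']:
            tokens.append(('operator', item, 2 if item in ['*', '/'] else 1))
        else:
            tokens.append(('literal', item, None))
        prev = item
    return tokens

print(classify(['-', '2', '*', '3'])[0])  # → ('neg', '-', 3)
print(classify(['2', '-', '3'])[1])       # → ('operator', '-', 1)
```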

We also need to plug a record in and fetch a value from it, but that will be trivial. Less trivial will be creating a set that calculates fields on the fly. That will be interesting, I think. I’m wondering about it, particularly because we’ll have an XImplementation that is holding on to an XSet. In a sense, XSets are a design layer above XImplementations, but that may just be a bug in my mind. Wouldn’t be the only one.

Summary

We’ve done a few sessions of experimentation with converting expression strings to RPN and evaluating them, and we have a few cooperating objects sharing the load in what seems like a reasonable division of responsibilities. We’ll be extending them, and we’ll be looking to see whether the responsibilities need to be rebalanced.

We have lost the character of the spike, where the RPN was functions that we just called. If we could make Token callable, or something like that, we might get an improvement, but it’s not bad as it stands.
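One way to get that character back without losing the token data might be to make the token itself callable, carrying its operation along. A rough sketch of the idea, with hypothetical names, not a commitment:

```python
import operator

# Hypothetical: a token that knows how to execute itself against a stack.
class CallableToken:
    def __init__(self, kind, value, priority, operation=None):
        self.kind = kind
        self.value = value
        self.priority = priority
        self._operation = operation

    def __call__(self, stack):
        if self._operation is None:  # literal: push own value
            stack.append(self.value)
        else:                        # operator: apply to the top two
            right = int(stack.pop())
            left = int(stack.pop())
            stack.append(str(self._operation(left, right)))

rpn = [CallableToken('literal', '21', None),
       CallableToken('literal', '2', None),
       CallableToken('operator', '*', 2, operator.mul)]
stack = []
for token in rpn:
    token(stack)
answer = stack.pop()
print(answer)  # → '42'
```

Expression.result would then shrink to the loop at the bottom: just call each token on the stack and pop the answer.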

But so far, it looks pretty decent! I’ll include the source below, so that it won’t bother you but will be there if you want to scan it.

See you next time!


import re


class Expression:
    def __init__(self, scope, operations):
        self._scope = scope
        self._operations = operations[::-1]

    def result(self, record):
        stack = []
        while self._operations:
            op = self._operations.pop()
            if op.kind == 'literal':
                stack.append(op.value)
            elif op.kind == 'operator':
                # the second operand is on top of the stack, so pop it first
                op2 = self.to_number(stack.pop())
                op1 = self.to_number(stack.pop())
                match op.value:
                    case '+':
                        res = op1 + op2
                    case '-':
                        res = op1 - op2
                    case '*':
                        res = op1 * op2
                    case '/':
                        res = op1 / op2
                    case _:
                        res = f'Unknown operator {op.value}'

                stack.append(str(res))
        return stack.pop()

    def scope(self):
        return self._scope

    def to_number(self, string):
        try:
            return int(string)
        except ValueError:
            return float(string)


class Parser:
    def __init__(self, expression_string):
        self._expr = expression_string

    def make_tokens(self, lexed_list):
        return [self.make_token(item) for item in lexed_list]

    def make_token(self, string_item):
        if string_item in ['*', '/']:
            return Token('operator', string_item, 2)
        elif string_item in ['+', '-']:
            return Token('operator', string_item, 1)
        elif string_item[0].isalpha():
            return Token('scope', string_item, None)
        else:
            return Token('literal', string_item, None)

    def parse(self, forward_lexed):
        stack = []
        result = []
        lexed = forward_lexed[::-1]
        tokens = self.make_tokens(lexed)
        while tokens:
            item = tokens.pop()
            if item.kind == 'operator':
                while stack and stack[-1].priority > item.priority:
                    result.append(stack.pop())
                stack.append(item)
            else:
                result.append(item)
        while stack:
            result.append(stack.pop())
        return result

    def rpn(self):
        lexed = self.lex()
        result = self.parse(lexed)
        return result

    def lex(self):
        no_space = self._expr.replace(' ', '')
        rx = '([^a-zA-Z0-9.])'
        return re.split(rx, no_space)


class Token:
    def __init__(self, kind, value, priority):
        self.kind = kind
        self.value = value
        self.priority = priority

    def __repr__(self):
        return f'Token({self.kind}, {self.value}, {self.priority})'