Make it [More] Right
I think our little Forth is working as intended so far. Let’s improve the code. First, some thoughts on the original Forth(s).
Back in History
Forth’s claim to fame was that it was fast and compact, and that it was easy to bootstrap onto a new processor, because it had the ability to define words using actual machine code. In those days, every bit of speed and size you could squeeze out paid off. Machines today provide three or four orders of magnitude more speed and storage.
Sometimes senior (ultra-senior?) [old] computer folks will deplore the fact that while machines are a zillion times more capable, nothing actually happens any faster, because the space and time are all consumed by huge operating systems blah blah etc etc. They have a point, but on the other hand, your tiny little Forth-running computer wasn’t running my editor, my PyCharm, my FTP client, Slack, and 17 browser tabs, while streaming The Bridge.
Here Chez Ron
We are concerned with changing code¹. We are concerned with the process of starting with nothing, very soon having something, evolving that something in sometimes surprising, certainly unanticipated directions, while keeping the code flexible and ready for further change. And we’re concerned with the fact that I find programming to be fun and diverting, in a time when many of us need more fun and diversion. So we are perfectly happy to write a Forth in Python, an interpreter on top of an interpreter, because it amuses us.
And while we do want our code to be reasonably efficient and compact, we are more concerned with making it easy to understand, easy to change, and expressive.
What we have this morning is, I believe, a Forth implementation that is pretty close to what we need. I think it can be used as we intend, as a scripting language for our robot world’s bots. I think it might even be useful for other things, with a few tweaks.
But what we also have this morning is code that, while it works, and while we (I at least) believe it is capable of being extended as we need … it is not particularly good code.
Let me rephrase that: the current code is not even up to my standards. I don’t know that my betters would ever consider my best code to be very good. I do know that I don’t recall ever having written code that couldn’t be further improved. I think it was either Leonardo da Vinci or Paul Valéry who said “A computer program is never finished, only abandoned”. Someone like that. Something like that.
So today, we’ll examine our code and [try to] improve it. Maybe even for a few days. There is probably a lot of mediocre code opportunity here.
Browsing
Since all our changes have been focused on the Forth class, and because it’s really the only largish thing we have, we’ll be focusing on that class. I bet we’ll find opportunities to improve other classes, and perhaps even opportunity to write some new ones.
forth.py is about 232 lines now, so I won’t print it here for you to scan. I’ll do the scanning and pull out chunks that strike me as interesting and perhaps in need of improvement.
process
I find the experimental `process` method that never went anywhere. It had two tests. I wire those up to use the current `compile` and they run. I remove `process`. Commit.
definitions
There are now six methods for defining primary words. Most of them are static, and with a little work they all could be. Each of them just stuffs definitions into the lexicon. The lexicon is just a list: if we had a Lexicon class, it would likely be an improvement. In fact, recall that we have an issue with redefining a Forth word. An object supporting lexicon behavior would likely help with that.

I’d like to get the primary definitions out of the way, but they’re not really a problem.
init
Looking at the primary word definitions drew my attention to the init:
```python
class Forth:
    def __init__(self):
        self.active_words = []
        self.compile_stack = Stack()
        self.lexicon = []
        self.define_primaries()
        self.rest_iter = None
        self.return_stack = Stack()
        self.stack = Stack()
        self.tokens = None
        self.token_index = 0
```
I don’t think we use `rest_iter` any more. PyCharm agrees, as do the tests. Remove, commit.
I notice `tokens` and `token_index`. Even if we hadn’t just done it yesterday, this is a dead giveaway that those two variables relate to each other. It turns out that their relationship is expressed in the very next method, not because it really belongs there, but because I wrote it there for convenience:
```python
def next_token(self):
    if self.token_index >= len(self.tokens):
        return None
    token = self.tokens[self.token_index]
    self.token_index += 1
    return token
```
So a list of tokens and the index into them is trying to be a thing. We’ll defer that concern for now.
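If we ever build it, I’d expect roughly this shape. This is a sketch, not today’s code; the class name and methods are my guesses at what we’d want:

```python
class TokenStream:
    # Owns the token list and the index that currently live in Forth.
    def __init__(self, tokens):
        self.tokens = tokens
        self.index = 0

    def has_tokens(self):
        return self.index < len(self.tokens)

    def next_token(self):
        # Same contract as Forth.next_token: None when exhausted.
        if self.index >= len(self.tokens):
            return None
        token = self.tokens[self.index]
        self.index += 1
        return token
```

Let’s look at something that might be more fruitful: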
```python
def compile(self, text):
    new_text = re.sub(r'\(.*?\)', ' ', text)
    self.tokens = new_text.split()
    self.token_index = 0
    while self.token_index < len(self.tokens):
        word_list = self.compile_word_list()
        word = SecondaryWord('', word_list)
        word.do(self)

def compile_word_list(self):
    word_list = []
    while True:
        token = self.next_token()
        if token is None:
            copy = word_list[:]
            word_list.clear()
            return copy
        if token in [':', ';', 'IF', 'THEN', 'ELSE', 'BEGIN', 'UNTIL', 'DO', 'LOOP']:
            self.compile_action_word(token, word_list)
        elif (definition := self.find_word(token)) is not None:
            word_list.append(definition)
        elif (num := self.compile_number(token)) is not None:
            definition = self.find_word('*#')
            word_list.append(definition)
            word_list.append(num)
        else:
            raise SyntaxError(f'Syntax error: "{token}" unrecognized')
        if self.compile_stack.is_empty():
            copy = word_list[:]
            word_list.clear()
            return copy
```
By my lights, code is not supposed to look like that. It’s jagged and bumpy, and hard to figure out what is going on. The early return in `compile_word_list` isn’t helping. The `while True` isn’t terribly helpful either, even if there were only one way out.
We exit in two cases. If there is no token to process, we exit immediately. If, after processing the token, the compile stack is empty, we exit.
I think it’s likely that if we run out of tokens and the compile stack is not empty, it’s an error, one that we don’t detect, although doubtless something will go wrong soon …
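If we ever want to detect that case, a guard at the top of the loop might look something like this sketch (not a change I’m making today, and the error message is invented):

```python
token = self.next_token()
if token is None:
    if not self.compile_stack.is_empty():
        # e.g. ': FOO 1 2 +' with no closing ';'
        raise SyntaxError('unexpected end of definition')
    return word_list  # nothing more to compile
```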
I remove the check for token being None and all the tests still run. Fine, keep it out. Now we have this:
```python
def compile_word_list(self):
    word_list = []
    while True:
        token = self.next_token()
        if token in [':', ';', 'IF', 'THEN', 'ELSE', 'BEGIN', 'UNTIL', 'DO', 'LOOP']:
            self.compile_action_word(token, word_list)
        elif (definition := self.find_word(token)) is not None:
            word_list.append(definition)
        elif (num := self.compile_number(token)) is not None:
            definition = self.find_word('*#')
            word_list.append(definition)
            word_list.append(num)
        else:
            raise SyntaxError(f'Syntax error: "{token}" unrecognized')
        if self.compile_stack.is_empty():
            copy = word_list[:]
            word_list.clear()
            return copy
```
I try this:
```python
def compile_word_list(self):
    word_list = []
    while True:
        token = self.next_token()
        if token in [':', ';', 'IF', 'THEN', 'ELSE', 'BEGIN', 'UNTIL', 'DO', 'LOOP']:
            self.compile_action_word(token, word_list)
        elif (definition := self.find_word(token)) is not None:
            word_list.append(definition)
        elif (num := self.compile_number(token)) is not None:
            definition = self.find_word('*#')
            word_list.append(definition)
            word_list.append(num)
        else:
            raise SyntaxError(f'Syntax error: "{token}" unrecognized')
        if self.compile_stack.is_empty():
            break
    copy = word_list[:]
    word_list.clear()
    return copy
```
Extract a method for the `num` case:
```python
def compile_word_list(self):
    word_list = []
    while True:
        token = self.next_token()
        if token in [':', ';', 'IF', 'THEN', 'ELSE', 'BEGIN', 'UNTIL', 'DO', 'LOOP']:
            self.compile_action_word(token, word_list)
        elif (definition := self.find_word(token)) is not None:
            word_list.append(definition)
        elif (num := self.compile_number(token)) is not None:
            self.append_number(num, word_list)
        else:
            raise SyntaxError(f'Syntax error: "{token}" unrecognized')
        if self.compile_stack.is_empty():
            break
    copy = word_list[:]
    word_list.clear()
    return copy
```
Next, I think I want to extract those last three lines, but there is a related larger concern: look at these two together:
```python
def compile(self, text):
    new_text = re.sub(r'\(.*?\)', ' ', text)
    self.tokens = new_text.split()
    self.token_index = 0
    while self.token_index < len(self.tokens):
        word_list = self.compile_word_list()
        word = SecondaryWord('', word_list)
        word.do(self)

def compile_word_list(self):
    word_list = []
    while True:
        token = self.next_token()
        if token in [':', ';', 'IF', 'THEN', 'ELSE', 'BEGIN', 'UNTIL', 'DO', 'LOOP']:
            self.compile_action_word(token, word_list)
        elif (definition := self.find_word(token)) is not None:
            word_list.append(definition)
        elif (num := self.compile_number(token)) is not None:
            self.append_number(num, word_list)
        else:
            raise SyntaxError(f'Syntax error: "{token}" unrecognized')
        if self.compile_stack.is_empty():
            break
    copy = word_list[:]
    word_list.clear()
    return copy
```
The `compile` method doesn’t want a word list, it wants a Word to execute. So `compile_word_list` would be more helpful if it were `compile_a_word`. Like this:
```python
def compile(self, text):
    new_text = re.sub(r'\(.*?\)', ' ', text)
    self.tokens = new_text.split()
    self.token_index = 0
    while self.token_index < len(self.tokens):
        self.compile_a_word().do(self)

def compile_a_word(self):
    word_list = []
    while True:
        token = self.next_token()
        if token in [':', ';', 'IF', 'THEN', 'ELSE', 'BEGIN', 'UNTIL', 'DO', 'LOOP']:
            self.compile_action_word(token, word_list)
        elif (definition := self.find_word(token)) is not None:
            word_list.append(definition)
        elif (num := self.compile_number(token)) is not None:
            self.append_number(num, word_list)
        else:
            raise SyntaxError(f'Syntax error: "{token}" unrecognized')
        if self.compile_stack.is_empty():
            break
    copy = word_list[:]
    word_list.clear()
    return SecondaryWord('nameless', copy)
```
I think we could get rid of the word list and create the secondary right out of the box, but we’ll let this ride for now. Still, there is no longer any need for the rigmarole around copying the list: we’re returning from the method, so next time around we’ll create a new list rather than clearing the old one.
```python
def compile(self, text):
    new_text = re.sub(r'\(.*?\)', ' ', text)
    self.tokens = new_text.split()
    self.token_index = 0
    while self.token_index < len(self.tokens):
        self.compile_a_word().do(self)

def compile_a_word(self):
    word_list = []
    while True:
        token = self.next_token()
        if token in [':', ';', 'IF', 'THEN', 'ELSE', 'BEGIN', 'UNTIL', 'DO', 'LOOP']:
            self.compile_action_word(token, word_list)
        elif (definition := self.find_word(token)) is not None:
            word_list.append(definition)
        elif (num := self.compile_number(token)) is not None:
            self.append_number(num, word_list)
        else:
            raise SyntaxError(f'Syntax error: "{token}" unrecognized')
        if self.compile_stack.is_empty():
            break
    return SecondaryWord('nameless', word_list)
```
Now I don’t like that long list of keywords. Let’s make it a class member.
```python
class Forth:
    action_tokens = [':', ';', 'IF', 'THEN', 'ELSE', 'BEGIN', 'UNTIL', 'DO', 'LOOP']

    def compile_a_word(self):
        word_list = []
        while True:
            token = self.next_token()
            if token in self.action_tokens:
                self.compile_action_word(token, word_list)
            elif (definition := self.find_word(token)) is not None:
                word_list.append(definition)
            elif (num := self.compile_number(token)) is not None:
                self.append_number(num, word_list)
            else:
                raise SyntaxError(f'Syntax error: "{token}" unrecognized')
            if self.compile_stack.is_empty():
                break
        return SecondaryWord('nameless', word_list)
```
I like that better. Commit.
Would `find_word` be better named `definition`? Or `find_definition`? Would `compile_number` be better named `number` or `literal`? Would `compile_action_word` be better named `action` or `take_action`? `perform_action`?
We’ll let those concerns bubble for a while, see if they bother us enough to make a change. But there’s this:
```python
def append_number(self, num, word_list):
    definition = self.find_word('*#')
    word_list.append(definition)
    word_list.append(num)
```
Every time we parse a number we look up the definition of `*#`. That seems kind of wasteful. But I think we’ll address that fairly well later.
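If the lookup ever showed up as a cost, one obvious move would be to find the word once and remember it. A sketch, not something I’m doing now; `literal_word` is a name I just made up:

```python
# In __init__, after define_primaries() has filled the lexicon:
self.literal_word = self.find_word('*#')

# Then append_number needn't search at all:
def append_number(self, num, word_list):
    word_list.append(self.literal_word)
    word_list.append(num)
```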
We spoke above about the lexicon being a simple list, and of the fact (bug? defect?) that we need to deal with words being redefined, and that we might possibly want the Forth word FORGET, which forgets the current definition and exposes a previous one if there is one. (It also forgets everything else defined after the word that we FORGET.)
I think that one of these first days we’ll try a Lexicon object that embodies a dictionary, which will make lookup O(1) instead of O(N) for some small N. That will make me feel OK about doing the find on every literal.
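To make the idea concrete, here is roughly what I have in mind. All of this is speculative, including the names, the assumption that words know their own names, and the way FORGET might hang off a history list:

```python
class Lexicon:
    # Sketch: dict-backed lookup makes find_word O(1).
    def __init__(self):
        self.words = {}     # name -> word
        self.history = []   # (name, word) pairs, oldest first, for FORGET

    def append(self, word):
        # a redefinition shadows the old entry; history still remembers it
        self.history.append((word.name, word))
        self.words[word.name] = word

    def find_word(self, name):
        return self.words.get(name)
```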
I might inline this, however:
```python
def append_number(self, num, word_list):
    word_list.append(self.find_word('*#'))
    word_list.append(num)
```
And I did. One more thing that I just noticed:
```python
def i_word(self):
    limit, index = self.return_stack[-1]
    self.stack.push(index)

def star_do(self):
    start = self.stack.pop()
    limit = self.stack.pop()
    self.return_stack.push((limit, start))

def star_loop(self):
    jump = self.next_word()
    limit, index = self.return_stack.pop()
    index += 1
    if index < limit:
        self.return_stack.push((limit, index))
        self.active_word.skip(jump)
```
If we were to push `(start, limit)` instead of `(limit, start)`, this code could be more compact. The decision is local to these three methods. So:
```python
def i_word(self):
    index, limit = self.return_stack[-1]
    self.stack.push(index)

def star_do(self):
    start = self.stack.pop()
    limit = self.stack.pop()
    self.return_stack.push((start, limit))

def star_loop(self):
    jump = self.next_word()
    index, limit = self.return_stack.pop()
    index += 1
    if index < limit:
        self.return_stack.push((index, limit))
        self.active_word.skip(jump)
```
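To convince myself the reordering changes nothing visible, a test in the style we’ve been using might look like this (the word COUNT and the exact assertions are mine):

```python
def test_do_loop_pushes_indices():
    forth = Forth()
    forth.compile(': COUNT 5 0 DO I LOOP ;')
    forth.compile('COUNT')
    # star_do pops start then limit, so 5 0 runs the index from 0 to 4,
    # and I pushes each index onto the data stack as it goes
    assert [forth.stack.pop() for _ in range(5)] == [4, 3, 2, 1, 0]
```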
Commit. And then … I was going to do this:
```python
def star_do(self):
    self.return_stack.push(((self.stack.pop()), (self.stack.pop())))
```
Yes, more compact. But too cryptic. The explaining variable names are too valuable to toss out. Revert that.
OK, we were down to something trivial and then didn’t do it. Time to stop. There’s a larger change coming tomorrow, but let’s sum up for now.
Summary
We found a few dregs, things that weren’t used, or that didn’t need to be used, and removed them. Every little bit we can delete is something we won’t have to look at, look over, overlook, or wonder about later.
The `compile` and `compile_word_list` methods became `compile` and `compile_a_word`, and went from 32 lines to 22. (There are a couple of new subordinate methods, but chez Ron, “out of sight, out of mind” is a real design strategy. If we have a meaningful name like `append_number`, we don’t worry about the details, so they are out of our face unless we are interested in them.)
The `compile` method actually wanted a SecondaryWord, so instead of having it create one, we had `compile_a_word` take that responsibility, and that’s its only responsibility, even though it has to work pretty hard to do that job. `compile` doesn’t care: it asks for a word and gets one. It’s all good.
I think we might benefit from some more small objects, including
- A Lexicon with faster lookup, the ability to handle redefinitions and possibly FORGET. It might even offload the definition of our PrimaryWords, getting it out of the Forth class.
- A token provider that would let us remove the indexing logic from right up front in the Forth class.
There is another naked list in the Forth class, `active_words`, which is actually a stack and could be implemented with one.
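If we make that change, I’d expect it to look something like this. A sketch only: the `active_word` property is my guess at how the list is read, based on `star_loop` using `self.active_word`:

```python
class Forth:
    def __init__(self):
        ...
        # Assuming the existing Stack class, which already supports
        # push, pop, and [-1] indexing (return_stack uses all three):
        self.active_words = Stack()  # was a bare list

    @property
    def active_word(self):
        # star_loop calls self.active_word.skip(jump)
        return self.active_words[-1]
```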
All those are for another day. Tomorrow we’ll deal with a much messier situation, the action words handler.
Small changes add up to code that is easier to understand and to deal with. Perhaps there are important things that cannot be done in small steps … but the more I look, the more I seem to find ways to get to better code without those huge delays.
See you next time!
¹ H/T GeePaw Hill