Forth: 0 IF THIS

The Repo on GitHub

Today I believe we’ll actually get IF-THEN working. But first, I want to mention an odd thinking mistake that I made. (Result: Good changes, no work on IF. Yet.)

Weird Thinking

When you read about how to implement a language like Forth, there’s a lot of very interesting machine code involved, because both speed and space were quite precious back in the day. So possibly that’s what caused me to do this odd thing.

We have our lexicon, which is a list of words, each with a string name. PrimaryWords, the ones that are the foundation of all words, also have a lambda expression that defines the world. The lambda will do the work directly where possible, but owing to the limitations of lambdas in Python, sometimes they’ll call a method on Forth.

In Forth, secondary words are made up of other words, primaries or other secondaries. Because Forth is as simple as it is, basically just an RPN calculator with pretensions, a secondary word is just a linear list of calls to other words. In a real Forth, I think each secondary would amount to a compact table of very short calls to other words. In the Forth we’re writing here, we can’t really just branch to random code and so I decided that a SecondaryWord would be a list of the indices of the words its calls. So when we execute a Secondary, it goes like this:

class SecondaryWord:
    def do(self, forth):
        forth.begin(self)
        lexicon = forth.lexicon
        self.pc = 0
        while self.pc < len(self.word_indices):
            lexicon[self.next_word()].do(forth)
        forth.end()

    def next_word(self):
        word =  self.word_indices[self.pc]
        self.pc += 1
        return word

We index through the word’s word_indices member, then fetch the lexicon entry at that index and execute its do.

Readers may recall the comment in the compile method from yesterday:

class Forth:
    def compile(self, text):
        # why don't we just store the word in the list, it's no larger than the index
        words = text.split()
        ...

A typical SecondaryWord isn’t huge but it has a string containing its name, and it has a list of other words, generally only a few, because long words in Forth are hard to write.

But here’s the odd thing. I chose to use the indices because they are short. I didn’t put the referenced Words themselves in the Secondary because they were too long! Now in a real Forth, that’s a valid concern, and the Secondary consists of short calls to the subordinate words. But this is the 21st century, and an integer object reference is no shorter than a reference to any other object (as far as we know).

But I was thinking “List of words in secondary, no, words are too big, use their index in the table.”

That’s just a very odd mistake in thinking, it seems to me. Save space by storing a subscript into a table instead of the object from the table. Weird.

Let’s see about fixing this. No time like the present, but I think it may be a bit tricky, not in the code but in the tests. We’re on a fresh commit, so let’s just try it and see what breaks.

The action is here:

class Forth:
    def compile_word_list(self, rest):
        word_list = []
        for word in rest:
            if (ix := self.find_word_index(word)) is not None:
                word_list.append(ix)
            elif (num := self.compile_number(word)) is not None:
                ix = self.find_word_index('*#')
                word_list.append(ix)
                word_list.append(num)
            else:
                raise SyntaxError(f'Syntax error: "{word}" unrecognized')
        return word_list

We want to append the words themselves to the list, not the index. We have a find_word method, though we’ll want to improve it if we keep this idea.

I rename the temp:

    def compile_word_list(self, rest):
        word_list = []
        for word in rest:
            if (definition := self.find_word_index(word)) is not None:
                word_list.append(definition)
            elif (num := self.compile_number(word)) is not None:
                definition = self.find_word_index('*#')
                word_list.append(definition)
                word_list.append(num)
            else:
                raise SyntaxError(f'Syntax error: "{word}" unrecognized')
        return word_list

Then change the two calls to find_word_index:

    def compile_word_list(self, rest):
        word_list = []
        for word in rest:
            if (definition := self.find_word(word)) is not None:
                word_list.append(definition)
            elif (num := self.compile_number(word)) is not None:
                definition = self.find_word('*#')
                word_list.append(definition)
                word_list.append(num)
            else:
                raise SyntaxError(f'Syntax error: "{word}" unrecognized')
        return word_list

Three tests fail, but we’re not done causing chaos yet. We need to fix do in the secondary:

    def do(self, forth):
        forth.begin(self)
        lexicon = forth.lexicon
        self.pc = 0
        while self.pc < len(self.word_indices):
            self.next_word().do(forth)
        forth.end()

    def next_word(self):
        word =  self.word_indices[self.pc]
        self.pc += 1
        return word

We’ll want to rename that word_indices when this works. Still just three tests failing. I had expected worse or better. Some kind of change. Let’s see what they are worried about.

    def find_word(self, word):
>       return self.lexicon[self.find_word_index(word)]
E       TypeError: list indices must be integers or slices, not NoneType

This confuses me but let’s see what find_word_index is up to.

    def find_word_index(self, word):
        lex = self.lexicon
        for i in range(len(lex)):
            if lex[i].name == word:
                return i
        return None

We didn’t find a name we were looking for. Let’s just print.

did not find UNKNOWN_WORD

I’ll patch find_word for now, not really a good idea, but quick.

    def find_word(self, word):
        index = self.find_word_index(word)
        if index is None:
            return None
        return self.lexicon[index]

Down to one failure.

    def test_lit_hand_compiled(self):
        f = Forth()
        # s = ': 3 DUP +'
        lit = f.find_word_index('*#')
        dup = f.find_word_index('DUP')
        plus = f.find_word_index('+')
        indices = [lit, 3, dup, plus]
        sw = SecondaryWord('TEST', indices)
        sw.do(f)
        assert f.stack.pop() == 6

Right, that’s incorrectly hand-compiled now. We require:

    def test_lit_hand_compiled(self):
        f = Forth()
        # s = ': 3 DUP +'
        lit = f.find_word('*#')
        dup = f.find_word('DUP')
        plus = f.find_word('+')
        indices = [lit, 3, dup, plus]
        sw = SecondaryWord('TEST', indices)
        sw.do(f)
        assert f.stack.pop() == 6

Works. Let’s sort out those finds before we commit this little change. No. We’re green, therefore commit: converted to use words in secondary, not indices.

class Forth:
    def find_word(self, word):
        for definition in self.lexicon:
            if definition.name == word:
                return definition
        return None

That’s better. The index version was no longer used and is removed. Green. Commit: tidying

Reflection

So, a nice little improvement, removal of a method, simplification of some code, which happens to be pretty much the main loop of Forth. But let’s talk about find_word there in the Forth class. What is it doing? It’s searching the lexicon. Whose job should it be to search the lexicon? That’s right: searching the lexicon should be handled by the lexicon. Why isn’t it? Here’s why:

class Forth:
    def __init__(self):
        self.stack = Stack()
        self.lexicon = []
        self.define_primaries()
        self.active_words = []

The lexicon is a vanilla list. We should make our find method shorter. A favorite trick of some Pythonistas would be this:

    def find_word(self, word):
        return next(filter(lambda d: d.name == word, self.lexicon), None)

Commit: “improve” find_word.

That’s all well and good, but the Lexicon should really be helping us more than it is. I’ll make a card for that. But there is one more thing we should do, here:

class SecondaryWord:
    def __init__(self, name, word_indices):
        self.name = name
        self.word_indices = word_indices
        self.pc = 0

word_indices isn’t the right name, as we noted above. It’s the actual words now.

class SecondaryWord:
    def __init__(self, name, word_list):
        self.name = name
        self.words = word_list
        self.pc = 0

In this method, PyCharm has vagued the word lexicon:

    def do(self, forth):
        forth.begin(self)
        lexicon = forth.lexicon
        self.pc = 0
        while self.pc < len(self.words):
            self.next_word().do(forth)
        forth.end()

That tells me that we’re not using it and someone (me) should remove the line. Done. Commit: tidying.

Let’s break here. This took up more lines than I had anticipated.

Summary

An odd brain thing caused me to write code with a level of indirection that provided no advantage and just obfuscated the code, We fixed that, and a few related little things. We note the feature envy that suggests that the Lexicon might want to be a first-class object.

We leave the code a bit better. I’ll close this article and start a new one. IF-THEN for sure. See you there!