The Repo on GitHub

I think I see, vaguely, what needs to be done to get our Forth to accept any Forth string and execute it. I am hopeful, if not entirely confident. Spoiler: It works!!!

Well, I should say “if not entirely short-term confident”. Medium term, I’m almost always confident that we’ll find a way. I am not what you could call confident that today’s attempt will succeed. We’ll find out.

Intuitively, what needs to happen is that when we are processing our list of tokens, we maintain two states, compiling and executing. Normally, we are executing, and in that state, we’ll pull one token from the list, look it up, and return it, where our outer loop will execute it. Some words, like ‘:’ and ‘IF’, enter compiling mode, and in that mode, they build up a list of words from the input tokens, until they find the matching ‘;’ or ‘THEN’. When the match is found, they can return a list of words, or, in the case of ‘:’ and ‘; ‘, they store away a word in the lexicon and return … what? I think they might return nothing, or they might consume the next word locally and carry on.

The compiling mode has to stack. Inside a loop there can be an IF, and so on.

Let’s think in a bit more detail, about an IF-THEN inside a DO-LOOP. Suppose we have something like this:

10 0 DO I 5 > IF 1 ELSE 0 THEN LOOP

I think that code will loop 10 tines from 0 through 9 and will put a one on the stack if the loop index is greater than five, otherwise zero. So the stack will end up as ‘0 0 0 0 0 0 1 1 1 1’. Not that we care, we’re here to talk about how that series of tokens should be compiled. It goes something like this:

  • See 10, put *# 10 into the output, return it, execute it, pushing 10 onto the stack;
  • See 0, put *# 0 into the output, return it, execute it, pushing 0 onto the stack;
  • See DO, enter compile mode, push ‘DO’ and the current location onto the compile stack, put *DO in the output, and carry on:
  • …. See I, put its code in the output;
  • …. See 5, put its code in the output;
  • …. See >, put its code in the output;
  • …. See IF, push ‘IF’ and the location onto the compile stack, put *IF in the output
  • …. …. See 1, put its code in the output;
  • …. …. etc etc
  • …. …. See THEN, use the stack info to patch the code in the output.
  • …. …. POP the compile stack (but we’re still in compile mode because of the DO)
  • ….. See LOOP, put the loop-ending code into the output, pop the stack
  • No longer in compile mode, return all that stuff we just compiled;
  • Execute it.

I find that hard to think about even step by step. Perhaps it’s more clear to you, but owing to the inadequacy of my explanation, perhaps it is not. Let’s look now at the current code and see how we might change it, or write different code, to do what we need.

    def compile(self, text):
        new_text = re.sub(r'\(.*?\)', ' ', text)
        words = new_text.split()
        match words:
            case ':', defining, *rest, ';':
                word_list = self.compile_word_list(rest)
                word = SecondaryWord(defining, word_list)
                self.lexicon.append(word)
                return word
            case _:
                raise SyntaxError(f'Syntax error: "{text}". Missing : or ;?')

    def compile_word_list(self, rest):
        word_list = []
        self.rest_iter = iter(rest)
        while word := next(self.rest_iter, None):
            if word in [':', ';','IF', 'THEN', 'ELSE', 'BEGIN', 'UNTIL', 'DO', 'LOOP']:
                self.compile_action_word(word, word_list)
            elif (definition := self.find_word(word)) is not None:
                word_list.append(definition)
            elif (num := self.compile_number(word)) is not None:
                definition = self.find_word('*#')
                word_list.append(definition)
                word_list.append(num)
            else:
                raise SyntaxError(f'Syntax error: "{word}" unrecognized')
        return word_list

That’s how things work now. We also sketched this yesterday:

    def process(self, word_string):
        to_compile = ': XYZZY ' + word_string + ' ;'
        code = self.compile(to_compile)
        to_execute = self.find_word('XYZZY')
        to_execute.do(self)

There, what I did was to take the Forth statement and compile a word with a made-up name and then look it up and execute it. That works, and would handle IF and DO OK, but could not handle a colon-semicolon definition.

Belay that, I’m sorry we looked at it. Retained for narrative completeness.

I almost regret having the PrimaryWord and SecondaryWord classes. Why? I’m not sure. It almost feels like if we had just written code that dealt with a list of words, things would be simpler. And I need simpler to deal with this, so that I don’t have to keep too many stacks in my head.

However, our compile_word_list method does return a list of words, and it is currently the top level that wraps them into a SecondaryWord and stores them, because it knows it is compiling a definition.

Let’s posit two notions:

  1. The word list that comes back from compile_word_list is intended to be executed. Therefore, our compile method will wrap whatever comes back into an anonymous SecondaryWord, and the execute it by sending do to it.
  2. The colon-semi will be handled inside compile_word_list, taking the then-current word list, making a SecondaryWord, and storing it in the lexicon.

Turns out that I’m wrong here.

For that to work, I think, we’ll have to save and restore the entire working word list on the compile stack. No problem, we can do that. We may need to make it a member variable. We’ll probably make a bit of a mess but we know we can refactor. Should be OK.

Ah. I think that gives us a way forward without breaking too much.

What if we go ahead and implement colon and semi inside compile_word_list. That won’t break anything because so far, we are dealing with colon definitions outside, in compile. Then we’ll find a way to un-wire that fixed point and use the new colon-semi code.

We’ll need tests along the way and all that but it might be a way forward.

Turns out that I’m right here.

I might be wrong about needing to save and restore the word list. Let’s wait and see.

We currently just have the beginning of the new colon-semi code, here:

    def compile_action_word(self, word, word_list):
        match word:
            case ':':
                definition_name = next(self.rest_iter)
                self.compile_stack.push((':', definition_name))
            case ';':
                pass
            case 'IF':
                self.compile_conditional('*IF', word_list)
            ...

So if we even did see a colon, we’d push that onto the compile stack, with the name of the word being defined, next(self.rest_iter). Everything will proceed, adding more to the word list until we hit the semicolon. Ah. We want to wrap up the definition, and that means that we need to remove from the output word_list, all the words that have gone into the word list while the colon was active. Ah, but wait, there’s more.

When we encounter a colon definition, shouldn’t the output word list be empty at that point? Let’s require that.

    def compile_action_word(self, word, word_list):
        match word:
            case ':':
                if word_list:
                    raise SyntaxError(f'Syntax error: "{word_list}" not empty')
                definition_name = next(self.rest_iter)
                self.compile_stack.push((':', definition_name))
            ...

That may be righteous but even if it isn’t, we won’t be handling it, so the error will drive out whatever we have to do if there is some legitimate way for that to happen.

Now we can, in the semi-colon handler, define our word and empty the word list. I think I can do the define part.

    def compile_action_word(self, word, word_list):
        match word:
            case ':':
                if word_list:
                    raise SyntaxError(f'Syntax error: "{word_list}" not empty')
                definition_name = next(self.rest_iter)
                self.compile_stack.push((':', definition_name))
            case ';':
                key, definition_name = self.compile_stack.pop()
                if key != ':':
                    raise SyntaxError(f'Syntax error: ; without :')
                word = SecondaryWord(definition_name, word_list)
                self.lexicon.append(word)

Now can we empty the word list from here? We can send it clear. That could work. No. If we clear it, we’ll be clearing the one in the definition, object = object. We need a copy of the list in the object.

            case ';':
                key, definition_name = self.compile_stack.pop()
                if key != ':':
                    raise SyntaxError(f'Syntax error: ; without :')
                word = SecondaryWord(definition_name, word_list[:])
                self.lexicon.append(word)
                word_list.clear()

Yes, boys, girls, and others, a_list[:] makes a shallow copy of the list. I learned that the other day when I ran across some code that did that.

I actually think that those colon and semicolon bits will work. Let’s try a test. This will be seriously weird: I’m going to try calling the compile_action_word method directly. We’ll creep up on it with a lot of printing.

    def test_new_colon(self):
        f = Forth()
        s = ': ADD5 5 + ;'
        rest = s.split()
        f.rest_iter = iter(rest)
        colon = next(f.rest_iter)
        word_list = []
        f.compile_action_word(colon, word_list)
        stacked = f.compile_stack[0]
        assert stacked == (':', 'ADD5')

I have to fudge around a lot to set up all the members correctly but after we’ve compiled the colon, the compile stack is good. We can’t put any words in vie compilation but I can insert some directly. I have this ugly test, but the beauty of it is that it works:

    def test_new_colon(self):
        f = Forth()
        s = ': ADD5 5 + ;'
        rest = s.split()
        f.rest_iter = iter(rest)
        colon = next(f.rest_iter)
        word_list = []
        f.compile_action_word(colon, word_list)
        stacked = f.compile_stack[0]
        assert stacked == (':', 'ADD5')
        five = next(f.rest_iter)
        assert five == '5'
        lit = f.find_word('*#')
        word_list.append(lit)
        word_list.append(5)
        plus_sign = next(f.rest_iter)
        assert plus_sign == '+'
        plus = f.find_word('+')
        word_list.append(plus)
        f.compile_action_word(';', word_list)
        assert word_list == []
        add5 = f.find_word('ADD5')
        f.stack.push(37)
        add5.do(f)
        assert f.stack.pop() == 42

The test above fudges the insides of the word ADD5, but leaves the colon and semi to be done by our new code in compile_action_word, and then looks up the word and executes it, adding 5 to 37 to get, of course, 42.

Note that the test above checks word_list after compiling the semicolon, to be sure that it is empty. I put that in just before I wrote this note, but pasted it into the previous display of the test to save us all looking at the before-after.

Let’s reflect.

Reflection

I think we can be confident that the handlers for colon and semicolon, in compile_action_word, work as advertised. I believe we should check the compile stack for being empty in the colon handler: you can’t be defining a word inside defining a word, nor can you define one inside an if, at least by my lights.

class Forth:
        ...
            case ':':
                if self.compile_stack.is_not_empty():
                    raise SyntaxError(f'Syntax error: nested word definition')
                if word_list:
                    raise SyntaxError(f'Syntax error: "{word_list}" not empty')
                definition_name = next(self.rest_iter)
                self.compile_stack.push((':', definition_name))
            ...

class Stack:
    def is_not_empty(self):
        return len(self.stack) > 0

Now what needs to happen? Contents may include:

  • Remove the special ‘:’, ‘;’ handling in compile (or replace compile entirely).
  • Arrange for compile_word_list to return the word list whenever the compile stack is empty.
  • Arrange for compile to wrap the word list it gets back in a SecondaryWord and execute it.

Let’s commit what we have: colon and semi work in compile_action_word.

I think we have to do all three of those things before everything works again … if it even does then. Here’s compile_word_list:

    def compile_word_list(self, rest):
        word_list = []
        self.rest_iter = iter(rest)
        while word := next(self.rest_iter, None):
            if word in [':', ';','IF', 'THEN', 'ELSE', 'BEGIN', 'UNTIL', 'DO', 'LOOP']:
                self.compile_action_word(word, word_list)
            elif (definition := self.find_word(word)) is not None:
                word_list.append(definition)
            elif (num := self.compile_number(word)) is not None:
                definition = self.find_word('*#')
                word_list.append(definition)
                word_list.append(num)
            else:
                raise SyntaxError(f'Syntax error: "{word}" unrecognized')
        return word_list

We need to loop forever, I think, returning from inside.

    def compile_word_list(self, rest):
        word_list = []
        self.rest_iter = iter(rest)
        while True:
            word = next(self.rest_iter, None)
            if word is None:
                copy = word_list[:]
                word_list.clear()
                return copy
            if word in [':', ';','IF', 'THEN', 'ELSE', 'BEGIN', 'UNTIL', 'DO', 'LOOP']:
                self.compile_action_word(word, word_list)
            elif (definition := self.find_word(word)) is not None:
                word_list.append(definition)
            elif (num := self.compile_number(word)) is not None:
                definition = self.find_word('*#')
                word_list.append(definition)
                word_list.append(num)
            else:
                raise SyntaxError(f'Syntax error: "{word}" unrecognized')
            if self.compile_stack.is_empty():
                copy = word_list[:]
                word_list.clear()
                return copy

I have 15 tests failing, but am not surprised. I am filled with uncertainty, but not surprise. I’ll go ahead and do what I said above needed to be done. I change compile from this:

    def compile(self, text):
        new_text = re.sub(r'\(.*?\)', ' ', text)
        words = new_text.split()
        match words:
            case ':', defining, *rest, ';':
                word_list = self.compile_word_list(rest)
                word = SecondaryWord(defining, word_list)
                self.lexicon.append(word)
                return word
            case _:
                raise SyntaxError(f'Syntax error: "{text}". Missing : or ;?')

To this:

    def compile(self, text):
        new_text = re.sub(r'\(.*?\)', ' ', text)
        words = new_text.split()
        word_list = self.compile_word_list(words)
        word = SecondaryWord('', word_list)
        word.do(self)

Right, now 17 tests fail. Ah … we have totally changed the semantics of compile. It used to return a word and then we called do. Now we have executed the word, if I’m not mistaken. So this test:

    def test_lit_compiled(self):
        f = Forth()
        f.compile(': TEST 3 4 + ;').do(f)
        assert f.stack.pop() == 7

Needs to be changed to this:

    def test_lit_compiled(self):
        f = Forth()
        f.compile('3 4 +')
        assert f.stack.pop() == 7

However, it still fails. This time we should learn something useful.

>       assert f.stack.pop() == 7
E       assert 3 == 7

Ah. Of course. Now compile executed the 3 and returned. It needs to eat all the tokens, not just quit after the first return. I’ll put compile back the way it was. and try again from there.

I think we have to make the list of tokens a member, so that compile_word_list and compile can both see it, with compile word list doing the work, but compile calling again and again until the tokens are gone.

And I think we should do a better job of keeping notions of word and token separate. Anyway, trying again:

    def compile(self, text):
        new_text = re.sub(r'\(.*?\)', ' ', text)
        self.tokens = new_text.split()
        self.token_index = 0
        while self.token_index < len(self.tokens):
            word_list = self.compile_word_list()
            word = SecondaryWord('', word_list)
            word.do(self)

    def compile_word_list(self):
        word_list = []
        while True:
            token = self.next_token()
            if token is None:
                copy = word_list[:]
                word_list.clear()
                return copy
            if token in [':', ';','IF', 'THEN', 'ELSE', 'BEGIN', 'UNTIL', 'DO', 'LOOP']:
                self.compile_action_word(token, word_list)
            elif (definition := self.find_word(token)) is not None:
                word_list.append(definition)
            elif (num := self.compile_number(token)) is not None:
                definition = self.find_word('*#')
                word_list.append(definition)
                word_list.append(num)
            else:
                raise SyntaxError(f'Syntax error: "{token}" unrecognized')
            if self.compile_stack.is_empty():
                copy = word_list[:]
                word_list.clear()
                return copy

Yeah, right, 28 failing. Let’s hope this is easy. It appears that the test_lit works. Maybe we just have to recast all the tests to the new format.

But this fails:

    def test_rot(self):
        forth = Forth()
        forth.stack.extend([0, 1, 2, 3, 4, 5, 6])
        forth.find_word('ROT').do(forth)
        assert forth.stack == [0, 1, 2, 3, 5, 6, 4]

That doesn’t even use our new code.

    def define_primaries(self):
>       lex = self.lexicon
E       AttributeError: 'Forth' object has no attribute 'lexicon'

Ah, I messed up the init. Now “only” 19 failing. Let’s hope for something trivial.

Some of them are syntax error checkers that are getting different errors. I’ll pass over those for now. Ah here is a place I didn’t change:

    case ':':
        if self.compile_stack.is_not_empty():
            raise SyntaxError(f'Syntax error: nested word definition')
        if word_list:
            raise SyntaxError(f'Syntax error: "{word_list}" not empty')
        definition_name = next(self.rest_iter)
        self.compile_stack.push((':', definition_name))

That should be:

    case ':':
        if self.compile_stack.is_not_empty():
            raise SyntaxError(f'Syntax error: nested word definition')
        if word_list:
            raise SyntaxError(f'Syntax error: "{word_list}" not empty')
        definition_name = self.next_token()
        self.compile_stack.push((':', definition_name))

That fixes a couple.

    def test_compile_if(self):
        f = Forth()
        s = ': TEST IF DUP + THEN ;'
        test_word = f.compile(s)
        assert test_word.words[1] == 0
        test_word.words[1] = 2
        f.stack.extend([5, 0])
        test_word.do(f)
        assert f.stack.pop() == 5
        f.stack.extend([5, 1])
        test_word.do(f)
        assert f.stack.pop() == 10

This needs recasting for sure. We don’t return a word from compile any more.

    def test_compile_if(self):
        f = Forth()
        s = ': TEST IF DUP + THEN ;'
        f.compile(s)
        test_word = f.find_word('TEST')
        print(test_word.words)
        f.stack.extend([5, 0])
        test_word.do(f)
        assert f.stack.pop() == 5
        f.stack.extend([5, 1])
        test_word.do(f)
        assert f.stack.pop() == 10

Passes. Let’s see how far recasting tests will take us. With any luck, that’s all we’ll need.

This one taught me something:

    def test_if_nested(self):
        f = Forth()
        s = ': TEST 200 100 1 1 IF 5 SWAP IF DUP THEN THEN + ;'
        f.compile(s)
        test_word = f.find_word('TEST')
        test_word.do(f)
        assert f.stack.pop() == 10
        assert f.stack == [200, 100]

        f = Forth()
        s = ': TEST 200 100 0 1 IF 5 SWAP IF DUP THEN THEN + ;'
        f.compile(s)
        test_word = f.find_word('TEST')
        test_word.do(f)
        assert f.stack.pop() == 105

        f = Forth()
        s = ': TEST 200 100 0 IF 5 SWAP IF DUP THEN THEN + ;'
        f.compile(s)
        test_word = f.find_word('TEST')
        test_word.do(f)
        assert f.stack.pop() == 300

I had to recreate the Forth instance for each sub-test. Why? Because we search the lexicon from first to last and therefore find the first definition of TEST not the last. Make a card for that one, and move on. So far none of them have found a flaw in what we have as the basic handling,

Recasting the tests to use the new protocol fixes each of them. Zero problems found other than calling correctly nothing inside Forth object. One test is still failing:

    def test_syntax_error(self):
        f = Forth()
        s = '; SQUARE DUP + ;'
        with pytest.raises(SyntaxError) as e:
            f.compile(s)
        assert (str(e.value) ==
                'Syntax error: "; SQUARE DUP + ;". Missing : or ;?')

This is a syntax error with ; before colon. We need to improve that handler:

    case ';':
        key, definition_name = self.compile_stack.pop()
        if key != ':':
            raise SyntaxError(f'Syntax error: ; without :')
        word = SecondaryWord(definition_name, word_list[:])
        self.lexicon.append(word)
        word_list.clear()

We enter with the compile stack empty. Better check that.

    case ';':
        if self.compile_stack.is_empty():
            raise SyntaxError(f'Syntax error: ; without :')
        key, definition_name = self.compile_stack.pop()
        if key != ':':
            raise SyntaxError(f'Syntax error: ; without :')
        word = SecondaryWord(definition_name, word_list[:])
        self.lexicon.append(word)
        word_list.clear()

Change the test to expect that message, and we are green.

I did remove that long invasive test of the colon and semicolon, since they are well tested by the other tests and the scaffolding one would have needed revision to deal with the token list instead of the word iterator. The whole team agreed with removing it.

Commit: New compile includes execute, passing all tests.

We’ll stop here. I am three(!) hours in. This was a bit tedious and I am tried. It was stressful, more than I’d like, because the necessary changes broke so many tests. Even though they were all easy to fix up, I can feel in my next that I was stressing. Or maybe just had bad posture: my eyes are a blit blurry this morning.

Anyway we’ll have a nice reflective summary now, if you please.

Summary

My basic plan held water! I am pleased. It’s doing what it needs to do!

We have a bug / feature request: Deal properly with redefining the same word in the same session. Forth’s “real” behavior is that you can redefine a word, and the old users get the old code and anyone using the word after the redefinition gets the new one. Should be easy enough. We’ll also look at Forth’s FORGET word when we deal with that. FORGET drops a word from the lexicon, and also any words defined after that word, whether they use the word we FORGET or not Caveat emptor, I guess.

I think it was wise to build and test the inside colon and semicolon handler first, because the tests gave us confidence that they were doing the right thing, so they weren’t part of our worries. That left the odd handling of the word list and token list, which did take two tries to get right. Pardon, I should have said “two tries to get working”.

I am not going to claim that the current code is “right”. It works solidly, at least when fed legitimate Forth. But I am far fro convinced that it is well-factored and as clear as we would like it to be. It seems too complicated to me just now. And our newer words, ‘:’, ‘;’, ‘DO’ and ‘LOOP’ are not as nicely factored as the earlier words were … and the existing methods that those older ones use do not quite fit the newer ones.

So I think we have some refactoring to do between compile and compile_word_list, and refactoring to do inside compile_action_word, before we can be entirely proud of our work. It’s working, but the code is not righteous. Yet.

But making code better, that’s what we do. And what we will do, in our upcoming sessions. See you then!