Forth: Is / Is Not

The Repo on GitHub

We need many things in our little Forth thing. Let’s try to get clear on what we’re doing here, and what we’re not. Bad forth code slows me down. But we’re doing OK.

Our two tiny classes, PrimaryWord and SecondaryWord, can execute some Forth-like code that runs a few levels deep in the return stack. I am confident that it will handle anything made up of calls to existing primary and secondary words. But there are things we surely need:

We have no conditionals. Forth has IF-THEN and IF-ELSE-THEN to handle conditional logic;
We have no looping. Forth has DO-LOOP, DO-+LOOP, BEGIN-UNTIL, BEGIN-AGAIN, BEGIN-WHILE-REPEAT, and you can exit a loop with LEAVE;
We have no compiler. I hand-compiled my SecondaryWord instances, and I got most of them wrong at least once. We need machine help.
We have no interactive mode. Forth allows you to type words into a REPL sort of thing.

Forth implementations, lookin’ inta history back, were quite tightly coded, using some of the most arcane and evil tricks we used to use in assembler. Ah, those were the days. I think our needs here are different. We want most of the full power of Forth available in our Robot DSL, perhaps without dynamic compilation or similar weirdness. But we do not need to do things the way that Moore and Loeliger did them back last century. We want good code, efficient code, but code that makes sense in our situation. Good, clean, efficient, clear, object-oriented code.

So our language wants to be very Forth-like, and our implementation doesn’t need to be historically accurate.

Design, Thinking

Right. No compiler, no handling of conditionals or loops. Which shall we undertake today? Or is there some other thing?

I think the compiler is relatively easy. We read words, look them up in the lexicon, append their indices to a list, append the result to the lexicon as a Secondary Word.

I feel less confident about conditionals and looping, although we do have a working version in our earlier tests, where we had a full interpreter thing running, working straight from the string words. So while I don’t quite see it, I am not terribly worried.

There is a small risk that we would do the compiler and then be unable to implement the conditionals or looping. That would waste the compiler effort (and everything else for the past week or so). We could reduce the potential loss by doing conditionals and looping first. But it’ll be a serious pain to test with no compiler.

We’ll do the compiler first.

Where should things live? Currently we have a stack that is a naked global list, and a lexicon that is also a naked global list. Generally speaking, our coding standards here chez Ron would call for those to be instances of some useful classes. Should those instances be global? Good question. Should they be Singletons? Another good question, the answer to which is almost always “no, not really”.

If the stack were a class of our own, we could implement methods like dup and swap directly, which would make the PrimaryWord code turn into a single method call in many cases, which would be pleasant. We should try that.

The lexicon clearly needs searching, but so far nothing else comes to mind.

One more idea slips into my mind: the list of word indices in the SecondaryWord’s do. We currently just iterate the words:

class SecondaryWord:
    def do(self):
        for word_index in self.code:
            lexicon[word_index].do()

The execution of a conditional or loop comes down to changing the index of the next word to be looked at in the current list. So instead of just doing a for over the code, we’re going to need a running index of some kind, and a way for our words to be able to set that index. Now, yes, YAGNI, but we’re doing design thinking here. Possibly we’ll discover a need for an object better than list to hold the word indices in a SecondaryWord.

A Tentative Plan

With these possible objects in mind, let’s work on compilation from a list of words, and we’ll see when and if the code we write invites us to build some of those objects we’ve just thought about. None of them is so clear in my mind as to suggest even one method that we’ll really surely want. We’ll let the code tell us.

Compilation

In Forth, we could define our method HYPSQ like this:

: HYPSQ SQUARE SWAP SQUARE + ;

The colon begins a definition, is followed by the word being defined, followed by the words making up the definition, followed by a semicolon.

Let’s retain the punctuation, although I’m not sure at this moment whether we’ll really need them.

Let’s write a test. I’m going to make a new file for compiler tests. I think I’ll be glad I did. We’ll see.

class TestCompile:
    def test_compiler(self):
        clear()
        w_plus = PrimaryWord('+', lambda: stack.append(stack.pop() + stack.pop()))
        w_dup = PrimaryWord('DUP', lambda: stack.append(stack[-1]))
        w_swap = PrimaryWord('SWAP', lambda: stack.extend([stack.pop(), stack.pop()]))
        w_times = PrimaryWord('*', lambda: stack.append(stack.pop() * stack.pop()))
        w_sqrt = PrimaryWord('SQRT', lambda: stack.append(math.sqrt(stack.pop())))
        lexicon.extend([w_plus, w_dup, w_swap, w_times, w_sqrt])
        s = ': SQUARE DUP * ;'
        forth_compile(s, lexicon)
        index = find_word('SQUARE')
        word = lexicon[index]
        assert word.name == 'SQUARE'
        assert word.code == [1, 3]

Quite a bit of setup, but we want a lexicon to work from. the test passes, using this code:

def forth_compile(text, lex):
    words = iter(text.split())
    word = next(words)
    if word != ':':
        raise SyntaxError
    defining = next(words)
    word_list = []
    while (word := next(words)) != ';':
        index = find_word(word)
        word_list.append(index)
    lexicon.append(SecondaryWord(defining, word_list))

def find_word(word):
    for i in range(len(lexicon)):
        if lexicon[i].name == word:
            return i

This is pretty fragile. I forgot the ‘;’ in the defining string and the result is an exception running off the end of the list. But it does compile the correct code. Let’s try something better:

def forth_compile(text, lex):
    words = text.split()
    match words:
        case ':', defining, *rest, ';':
            word_list = []
            for word in rest:
                index = find_word(word)
                if index is not None:
                    word_list.append(index)
            lexicon.append(SecondaryWord(defining, word_list))
        case _:
            pass

Better. We can improve that:

def forth_compile(text, lex):
    words = text.split()
    match words:
        case ':', defining, *rest, ';':
            word_list = [ix for word in rest if (ix := find_word(word)) is not None]
            lexicon.append(SecondaryWord(defining, word_list))
        case _:
            pass

def find_word(word):
    for i in range(len(lexicon)):
        if lexicon[i].name == word:
            return i

The match-case parses the list for us, eating the colon and semi, breaking out the first word as the name we’re defining, and the rest as the rest.

There are some rough edges here. In particular, if we pass in a word that is not defined in the lexicon, it will just be ignored. Maybe that’s good enough for now.

Let’s extend our test just for fun.

    def test_compiler(self):
        clear()
        w_plus = PrimaryWord('+', lambda: stack.append(stack.pop() + stack.pop()))
        w_dup = PrimaryWord('DUP', lambda: stack.append(stack[-1]))
        w_swap = PrimaryWord('SWAP', lambda: stack.extend([stack.pop(), stack.pop()]))
        w_times = PrimaryWord('*', lambda: stack.append(stack.pop() * stack.pop()))
        w_sqrt = PrimaryWord('SQRT', lambda: stack.append(math.sqrt(stack.pop())))
        lexicon.extend([w_plus, w_dup, w_swap, w_times, w_sqrt])
        s = ': SQUARE DUP * ;'
        forth_compile(s, lexicon)
        index = find_word('SQUARE')
        word = lexicon[index]
        assert word.name == 'SQUARE'
        assert word.code == [1, 3]
        hypsq = '; HYPSQ SQUARE SWAP SQUARE + ;'
        forth_compile(hypsq, lexicon)
        hyp = ': HYP HYPSQ SQRT ;'
        forth_compile(hyp, lexicon)
        index = find_word('HYP')
        word = lexicon[index]
        stack.extend([3, 4])
        word.do()
        assert stack.pop() == 5

That wasn’t as fun as it might be. Doesn’t work. Why? Because we have a lexicon defined in two files, and our objects all treat lexicon as global and so the Words aren’t looking at the same lexicon as the compiler.

I’m going to move the Words, lexicon and stack into their own file for now, leaving the globals global.

ARRGH: I’ve messed up the imports somehow. Got to step back. And cleverly have not committed all morning.

OK, I am at a semi-stable point. But there’s an import issue and I’m not sure how to deal with it. The SecondaryWord has direct access to lexicon, which is defined in the same text file as the class.

Python imports are weird. I try to avoid doing anything clever with them, because they basically recompile the code that you import. So, however it happens, I’m getting lexicon and stack in two places.

Nothing I try resolves the issue. I think I have to pass the lexicon into the do command. OK, let’s try that.

My big test is still failing. Best guess, the forth is wrong. Do some smaller tests.

Ha! Look closely at the definition of HYPSQ:

hypsq = '; HYPSQ SQUARE SWAP SQUARE + ;'

Starts with ‘;’, not ‘:’. Fixed, the test runs. Need a diagnostic out of the compiler:

    def test_syntax_error(self):
        s = '; SQUARE DUP + ;'
        with pytest.raises(SyntaxError):
            forth_compile(s, lexicon)

And

def forth_compile(text, lex):
    words = text.split()
    match words:
        case ':', defining, *rest, ';':
            word_list = [ix for word in rest if (ix := find_word(word)) is not None]
            lexicon.append(SecondaryWord(defining, word_list))
        case _:
            raise SyntaxError

Let’s raise an error on an undefined word as well.

    def test_undefined_word(self):
        s = ': SQUARE DUMB + ;'
        with pytest.raises(ValueError) as e:
            forth_compile(s, lexicon)
        assert str(e.value) == 'cannot find word "DUMB"'=

And

def find_word(word):
    for i in range(len(lexicon)):
        if lexicon[i].name == word:
            return i
    raise ValueError(f'cannot find word "{word}"')

Let’s extend the syntax error to include the input. Like this:

    def test_syntax_error(self):
        s = '; SQUARE DUP + ;'
        with pytest.raises(SyntaxError) as e:
            forth_compile(s, lexicon)
        assert str(e.value) == 'Syntax error: "; SQUARE DUP + ;". Missing : or ;?'

def forth_compile(text, lex):
    words = text.split()
    match words:
        case ':', defining, *rest, ';':
            word_list = [ix for word in rest if (ix := find_word(word)) is not None]
            lexicon.append(SecondaryWord(defining, word_list))
        case _:
            raise SyntaxError(f'Syntax error: "{text}". Missing : or ;?')

Let’s reflect. I think we’re done for the morning.

Reflection

My longest delays the past couple of days have been caused by getting Forth code wrong. I’ve compiled it wrong by hand, and typed it in erroneously. Because I am uncertain about my actual code, I’ve spent too much time looking there and only finally noticing something like the semicolon where a colon should be. These errors were exacerbated by my having not put in even a rudimentary raise when errors were detected.

On the bright side, I did manage to put in some messages which will probably help with future errors of the same kind.

Also on the bright side, the “compile” is short and sweet, as you’d think it would be.

Not so good is the fact that leaving stack and lexicon as globals in some random file is not going to hold water. I think we’ll want to do something like create a ForthEnvironment that contains a lexicon, stack, and probably the compile method. I’m not totally clear on how that should work. Our tests will tell us, as we discover that they are hard to write.

Speaking of tests, I noticed that the compile function accepts a lexicon and then does not use it.

Let’s change that right now: it should help with our global issues.

def forth_compile(text, lex):
    words = text.split()
    match words:
        case ':', defining, *rest, ';':
            word_list = [ix for word in rest if (ix := find_word(word, lex)) is not None]
            lex.append(SecondaryWord(defining, word_list))
        case _:
            raise SyntaxError(f'Syntax error: "{text}". Missing : or ;?')

def find_word(word, lex):
    for i in range(len(lex)):
        if lex[i].name == word:
            return i
    raise ValueError(f'cannot find word "{word}"')

Better. I think the imports are still odd. We should create our lexicons locally to the tests, most likely.

Summary

There was a bit of slippage, mostly due to syntax errors in my Forth code, and partly due to observing the issue with the global lexicon and trying to “fix” it with fancy imports, when the real solution is to pass the lexicon to the objects that need it. Bad Ron, no biscuit.

But the compiler works and creates lexicon entries that we can execute.

I think tomorrow we’ll use the compiler to allow us an immediate execution mode. Maybe we’ll even do a tiny REPL.

This is our sixth session and we have interpreted some Forth code one way, then compiled it by hand and interpreted it that way, and then compiled it with code and interpreted it that way. This is very similar to progress! I am pleased with progress. I can imagine doing better, and I’ve certainly done worse. So this is about nominal, and nominal is good.

See you next time!