Forth Clarifying

The Repo on GitHub

A bit of reading and thinking is beginning to give me a sense of direction. Let’s talk about it, and maybe implement a little something. This is how I work: read, think, code, rinse, repeat.

I have skim-read a lot of Loeliger’s Threaded Interpretive Languages, which is about the internals of a very Forth-like language. The book is focused on implementing the threaded interpretive language in assembler, or binary, aimed at small hardware where no useful software may exist. It is facing constraints of small memory, perhaps 4096 bytes, and slow processors. These are not constraints that concern us today, and we’ll address the construction of our Forth-like language using today’s facilities, which include Python and objects, and which do not include the ability to branch to an arbitrary location in a data structure made up of part data and part code. I almost regret that we can’t do that sort of thing. Almost.

The glass through which I’m looking is less dark than it was. Let me try to express my current understanding. You are my rubber duck, or my rubber duck is you. Expressing this will help me solidify and clarify my ideas.

Forth statements are simply a list of “words”. Internally, words can be primary or secondary. I sometimes say “primitive” for primary.

Primary words are defined and implemented by the author of the Forth, in this case YT. They are written in the language of the machine in hand, in this case Python. Secondary words can be defined by the author, and many will be. Others can be defined by the user of the Forth language, and in fact their entire job comes down to defining words that make up the DSL they want.

A secondary word’s definition consists entirely of a list of words. It never contains any code, although, of course, the words it lists may themselves be either secondary, made of other lists of words, or primary, made of code. In our tests currently, we have both cases.

This is a primary word:

    elif word == 'SWAP':
        # ( x1 x2 -- x2 x1 )
        x2, x1 = stack.pop(), stack.pop()
        stack.append(x2); stack.append(x1)

Its definition is those three Python statements. Code.

This is a secondary word:

    elif word == 'HYPOTENUSE':
        s = 'SQUARE SWAP SQUARE + SQRT'
        interpret(s)

However, as I’ve implemented things so far, the secondary word is also code: it isn’t just the list … yet. Early days. But you see the point, the one is just code grubbing around in the stack, the other is just a list of words to be processed.

Our current scheme is calling interpret recursively. Given this:

def interpret(s):
    print(f"interpret {s}")
    words = s.split()
    for word in words:
        interpret_word(word)
    print(f'end interpret {s}')

    def test_hyp(self):
        clear()
        s = '3 4 HYPOTENUSE'
        interpret(s)
        assert stack.pop() == 5

We get this print (I added the indentation manually, since I just thought of it):

interpret 3 4 HYPOTENUSE
    interpret SQUARE SWAP SQUARE + SQRT
        interpret DUP *
        end interpret DUP *
        interpret DUP *
        end interpret DUP *
    end interpret SQUARE SWAP SQUARE + SQRT
end interpret 3 4 HYPOTENUSE

So, recursive. Now back in the days of yore, Loeliger and Moore and those folks built their Forth systems to maintain a return stack manually and just branch to whatever it held, so their code would have looped doing the same things we saw here, and there would be pushes and pops to the return stack where we see calls and returns from interpret. Same thing, different mechanism. In those days we liked to get our hands dirty, pushing the bits around with our fingers. Dark times, in some ways, but great fun.

Now Loeliger and those folx built their word definitions very cryptically. Loeliger, for example, would store the name of a word HYPOTENUSE as 0xAHYP, with the 0xA = 10 being the number of letters in “hypotenuse”, and HYP being the first three. Pity the fool who also had the word HYPOTHESES, because those two words were going to be treated as the same. Hilarity would surely have ensued. But no one would use a word like that anyway because who can afford ten characters of data when you only have 4096 to play with?

So in a long table of tightly packed bytes, the ancient craftsmen would store a few characters for the name, a tiny bit of other information, and then either the actual code for a primary word or a series of 2-byte pointers back into the table for a secondary. If that’s not bad enough, and I don’t have all the details in my head yet so this is only approximate, the word pointed to in the table would either point to the real code for a primary, or to a patch of code in the main loop that would push the current location to the stack and then start spinning through the following words, now guaranteed to be pointers, branching through those, pushing the location, until they reached the last one which would branch to a patch of code that popped the stack and carried on.

This is hard to think about, so if your head is spinning, it’s not your fault, and if it isn’t, please send me an email explaining what I just said so that I can understand it.

The upshot of all this is that interpreting a secondary word executed only a handful of instructions that pushed a location onto a return stack, and picked up running in a new sublist. Almost magically, sooner or later you would hit an actual primary and something would happen to the “real” stack, which got the work done.

There will be additional magic for things like “if”. Let’s see if we can do an IF in our current little language.

I test and implement ‘>’:

    def test_greater(self):
        clear()
        s = '3 4 >'
        interpret(s)
        assert stack.pop() == 1
        s = '4 4 >'
        interpret(s)
        assert stack.pop() == 0

    elif word == '>':
        top, lower = stack.pop(), stack.pop()
        stack.append(1 if top > lower else 0)

Now a test for IF-THEN:

    def test_if_then(self):
        clear()
        s = '0 3 4 > IF 600 + THEN 400 +'
        interpret(s)
        assert stack.pop() == 1000
        clear()
        s = '0 4 4 > IF 600 + THEN 400 +'
        interpret(s)
        assert stack.pop() == 400

The definition if IF-THEN is, pop the stack and if the result is non-zero (true) do what’s inside the IF-THEN otherwise carry on after the THEN. Possibly we can implement this. But not easily: we have no good way to consume values from the list. Let me do something truly awful. This is just a spike after all.

We define a new global:

stack = []
skipping = False

In interpret word we do this:

def interpret_word(word):
    global skipping
    if word == 'THEN':
        skipping = False
        return
    if skipping:
        return
    if is_number(word):
        stack.append(int(word))
    ...

    elif word == 'IF':
        if stack.pop() == 0:
            skipping = True

And the test runs.

Reflection

Now that may seem a bit ugly, and it is, but in fact the fundamental operation of IF-THEN relies on the ability to consume words from the current list without executing them. If we were using the list other than in a for statement, we could have eaten all the unused values right there in the code for ‘IF’, but that option isn’t quite available to us as things stand.

Again, this is just experimentation, so we aren’t worried about niceties, we are here to figure out how all this stuff might work.

Summary

A bit of code tells us that our current scheme can handle the conditionals of Forth. I do not believe that it will handle looping, but I’d wager quite a bit that we can make a small adjustment and make loops work.

Our current scheme is arguably costly: it actually splits a secondary into a list of strings and interprets the words, going through the if cascade every time. The threaded code idea would convert that into little more than ticking through a thread of indices. I think we’ll do something similar, perhaps like this:

Looking Forward

I think we’ll create two Word objects with the same protocol, including at least the method execute or operate or do. One of those Word objects will be PrimaryWord, and it will contain a function pointer, and its do method will be self.function() or thereabouts.

The other Word object will be SecondaryWord. It will contain a list of Words, and its do method will be — if I’m not mistaken and at this point I could be:

for word in self.words:
    word.do()

It can’t really be quite that simple, because of the IF-THEN and looping. But I think it will be essentially that simple, except that perhaps we have to tick through the words with an index that can be set backward and forward by the words … and I think that we’ll have to check some words explicitly.

My recollection is that in the literature, some words are “immediate”, meaning that when the interpreter encounters them, some special action is taken out of the usual sequence.

One possibility is that our do method will take a parameter or two, so that words can affect what the interpreter loops do. We’ll have to figure that out. And, unless I miss my guess, we will.

I recall that the Forth implementations I’ve been skimming have an “immediate” flag on some words, which I think is used for exactly this kind of thing. Whether I can find a reference to that or not, I suspect we’ll need something of that kind.

You may recall me mentioning that Bill Wake said that the structure looks like the Composite Pattern. We have come to exactly that conclusion at this point, with a leaf word being a pointer to code and a composite word being a list of words.

It’s coming along! Ideas are gelling. This is how I do this, a combination of trying code, studying ideas, and thinking. See you next time!