FAFO

The Robot World Repo on GitHub
The Forth Repo on GitHub

I’m interested in reducing my two classes, Primary and SecondaryWord to a single object that will feel more like a classical Forth word. This will require some thinking and doubtless some experimentation. Result: I am pleasantly gobsmacked!

My Vague Understanding …

A classical Forth is “threaded”, a term used to describe code whose components are connected more by direct branches than by function calls. We do not have that ability any more in most languages, and that’s a good thing. But let’s think about what those old Forths did, to see whether we can give ours a bit more of that flavor, and whether, having tried it, we like it.

A classical Forth had some known locations in memory that were central to its operation. One of those might be called next_word, or probably something like 0x8013 in assembly language. A Forth word could be “primary” or “secondary”. A primary word consisted of a few lines of binary machine code. This code was not called, oh no, that would be inefficient. Instead, it would be branched to from somewhere inside next_word. And at the end of that code, the last instruction would be a branch back to next_word. Works like a call and return without all that messy stack management or whatever your subroutine call did in those days.

It was also fast. The overhead for executing that word would consist of the cost of the two branch instructions and whatever next_word had to do. We’ll come back to that. But classical Forths were fast, back in the day, much faster than a classical interpreter. It’s not their fault that my laptop is about a thousand times faster than computers were in those days.

So a primary word would be something like

instr 1
instr 2
instr 3
...
jmp next_word

How would next_word work? Well, still vaguely, it would have a program counter, pc, containing the address of the current word being executed. So it might look like this:

next_word:
increment pc
jmp *pc

If our program was just one big list of word addresses that pc pointed at, code like that would run the program with an overhead of just three instructions per word, the increment, jump in, and jump out.

But, of course, our program isn’t just one big list of addresses. We have definitions like this:

: DOUBLE DUP + ;
: QUAD DOUBLE DOUBLE ;

In the words above, DOUBLE would be a list of primaries, DUP and +. DOUBLE is a “secondary” word, made up of would-be calls to other words, rather than assembler instructions. And QUAD, also a “secondary” consists of a list of would-be calls to other secondary words.

So, given that next_word is just going to branch into whatever address it points at, what can we put into the definitions of DOUBLE and QUAD that will let things work?

Let’s try to figure it out¹.

In a primary word, we just jump in and back out. In a secondary word, we really need a subroutine kind of behavior. Suppose we were to take our two definitions above, and the code

25 QUAD .

Conceptually that has to do this, with indentation showing call and return and a notion of the program counter’s contents and behavior:

action      pc
--------------
push 25     input 0
QUAD        input 1
  DOUBLE    quad 0
    DUP     double 0
    +       double 1
            double return
  DOUBLE    quad 1
    DUP     double 0
    +       double 1
            double return
            quad return
.           input 2

We begin to get the picture. Each secondary word needs to end with some kind of code that causes it to “return”, so that the program counter can be returned to the next location in the caller. So now we can see that the sequence needs to be something like this:

action      pc
--------------
push 25     input 0
QUAD        input 1
            QUAD INIT
  DOUBLE    quad 0
            DOUBLE INIT
    DUP     double 0
    +       double 1
            DOUBLE RETURN
  DOUBLE    quad 1
            DOUBLE INIT
    DUP     double 0
    +       double 1
            DOUBLE RETURN
            QUAD RETURN
.           input 2

So if, when we compile a secondary word, we always start it with some standard init code, something about pushing down the program counter, and we always end it with code that pops the program counter, it would all be good.

I am fairly certain of this much: the secondary words all end with a standard branch to a different known location, which we can call pop_pc. The code at that location would restore the program counter and then just fall into next_word, and the process continues.

I am slightly less certain of this: I think that each secondary word probably had to contain the code to save the program counter. Why? Because otherwise we would have to check a flag in the word to see if it was secondary and then push the program counter up at the interpreter level. That would work but then you’d have that checking overhead for all the words, including all the primaries, increasing overhead.

So, I think, the sequence in classical Forth is this:

action      pc
--------------
push 25     input 0
QUAD        input 1
              SAVE PC
  DOUBLE      quad 0
                SAVE PC
    DUP         double 0
    +           double 1
                jmp pop_pc
  DOUBLE      quad 1
                SAVE PC
    DUP         double 0
    +           double 1
              jmp pop_pc
            jmp pop_pc
.           input 2

I’ve tried adding some additional indentation to help us see which word is active and where the jumps are coming from.

OK. Let’s pop up a level and see what we have.

In classical Forth primary word will be compiled as a series of machine instructions ending with a branch to next_word. A secondary word will be compiled as a series of instructions that adjust the return stack, followed by a series of word addresses, followed by a branch to the code that pops the return stack, returning attention to the word that called this one.

In our Forth here, a primary word is an instance of PrimaryWord, and it contains a function that is called when we do the word:

class PrimaryWord:
    def __init__(self, name, code, immediate=False):
        self.name = name
        self.code = code
        self.immediate = immediate

    def do(self, forth):
        self.code(forth)

A secondary word is a list of words, with a corresponding do:

class SecondaryWord:
    def __init__(self, name, word_list, immediate=False):
        self.name = name
        self.words = word_list
        self.immediate = immediate
        self.pc = 0

    def do(self, forth):
        forth.begin(self)
        self.pc = 0
        while self.pc < len(self.words):
            w =  self.next_word()
            w.do(forth)
        forth.end()

    def next_word(self):
        word =  self.words[self.pc]
        self.pc += 1
        return word

We can see that the equivalent things happen. We just do a primary word, and in a secondary word we step through its contents one after another, doing them. We have a polymorphic do here, so if the word we find is primary, we just do it, and if it is secondary we find ourselves recurring in SecondaryWord. The SecondaryWord itself acts as the program counter stack.

Reflection

Now the fact is that I like this structure just fine. It does the job, it encapsulates the essential difference between primary and secondary words, and it relies on Python’s fundamental object orientation to handle the program counter issues.

My starting mission this morning was to see whether we could combine those two classes into one, following a style a bit more like classical Forth.

Let’s try. I am just Fooling Around with ideas here. I don’t know what we need but the thinking so far begins to put some boundaries on what we might do.

We’re positing just one Word class. Therefore, it must consist of a list of … something … because a secondary word is a list of words.

A primary word, I guess, would have to be a one-element list, and when we execute a word, we must always do the same thing. What is that “same thing”? Let me move over to a test file and sketch some code.

    def test_primary_word(self):
        self.init()
        stack.append(2)
        word = Word()
        word.append(lambda: stack.append(stack.pop() * 2))
        word.do()
        assert stack.pop() == 4

This is a sketch of a primary word that multiplies stack top by 2. No reason, just seemed possible.

I can make it work:

stack = []

class Word:
    def __init__(self):
        self.code = []

    def append(self, func):
        self.code.append(func)

    def do(self):
        self.code[0]()

I’m wondering now what would happen if we just called do on each element of the list. This means I’m kind of wondering if a PrimaryWord is just a SecondaryWord with only one element and a special lambda.

I’m just following my nose.

    def test_more_than_one_word(self):
        self.init()
        stack.append(2)
        word = Word()
        word.append(lambda: stack.append(stack.pop() * 2))
        word.append(lambda: stack.append(stack.pop() * 2))
        word.do()
        assert stack.pop() == 8

Fails, of course but what if we just looped:

class Word:
    def do(self):
        for f in self.code:
            f()

Both tests are green. What about words calling words?

    def test_word_calls_word(self):
        self.init()
        stack.append(2)
        double = Word()
        double.append(lambda: stack.append(stack.pop() * 2))
        quad = Word()
        quad.append(double.do)
        quad.append(double.do)
        quad.do()
        assert stack.pop() == 8

That passes. Note that quad’s function list is a series of references to double.do. Since double is a Word, we know that it has a function do and we pass it to the secondary word quad as the lambda.

We could do it this way as well:

    def test_word_calls_word(self):
        self.init()
        stack.append(2)
        double = Word()
        double.append(lambda: stack.append(stack.pop() * 2))
        quad = Word()
        quad.append(double.do)
        quad.append(lambda: double.do())
        quad.do()
        assert stack.pop() == 8

I think that the first way is best: the second just adds another call with no benefit. That might change in a real implementation: we’re just spiking here. In a real implementation we might find the lambda more convenient.

Reflection

I am frankly surprised. This is simpler than I was thinking it would be. There are some unresolved issues that will make things at least somewhat more complicated, in particular the jumps, skips, and patching that we do for loops and conditionals.

In essence, I think, what I’ve discovered with 300 lines of thinking and writing, is that a primary word is a special case of a secondary word—in our scheme.

I am not only surprised: I am incredulous. I do not believe it can be this simple.

A way of verifying this idea comes to mind. If this is correct, then we should be able to define our PrimaryWords as secondary instead.

I can test that readily I should think. Yes, but what I try doesn’t work:

        dup_lambda = lambda f: f.stack.dup()
        dup_word = SecondaryWord('DUP', [dup_lambda])
        self.append(dup_word)

I tried to define DUP as a secondary word, using the same lambda as the primary uses. But the two do methods differ:

class PrimaryWord:
    def do(self, forth):
        self.code(forth)

class SecondaryWord:
    def do(self, forth):
        forth.begin(self)
        self.pc = 0
        while self.pc < len(self.words):
            w =  self.next_word()
            w.do(forth)
        forth.end()

So we tried to call do on our lambda, which was not on.

        while self.pc < len(self.words):
            w =  self.next_word()
>           w.do(forth)
E           AttributeError: 'function' object has no attribute 'do'

However, I think this hack would work:

    def do(self, forth):
        forth.begin(self)
        self.pc = 0
        while self.pc < len(self.words):
            w =  self.next_word()
            try:
                w(forth)
            except Exception:
                w.do(forth)
        forth.end()

I just try to call the thing and if that raises, I try doing it. All the tests go green.

Now a better scheme would be to change compilation so that all the secondaries are callable, so that we don’t need the try/except. And of course with the current hack, I really should work out what that exception is. Details, technicalities.

We’re green. I’m going to commit: experimental secondary word defines DUP. Fascinating!

I am tired and hungry and need an iced chai latte. Let’s sum up.

Summary

What have we here? We Fooled Around, and we Found Out. In a good way this time.

We have four lines of code in do that seem to make it possible to define all our words, primary or secondary, using a single class, presently called SecondaryWord. We can almost certainly reduce those four lines back down to one by compiling secondaries a bit differently.

I suspect that the me of decades ago, who was far more clever than the current one, albeit a lot less wise, might have seen almost immediately that PrimaryWord and SecondaryWord could be just one thing. That younger Ron would have done all the hundreds of lines of thinking here in a few moments or minutes and would have then spent some amount of time making it work.

Would he have completed the task sooner? Better? I don’t even know. What I do know is that all the thinking in this article had to go into that solution, one way or another. Reaching it in an intuitive leap seems better, but the fact is, it’s all the same stuff. We see some things immediately. We find some things by hard work.

Or, like today, we think a bit about how things might want to be, and then, with that as a framework, we write three quick tests, discover something nice, try an experiment that fails, and then type in the three additional lines that make it work.

And yet, I remain a bit astounded² at how simple and sweet this discovery is. A very interesting and pleasing outcome. I hope you can enjoy it as much as I can. See you next time!

Added in Post:: I wonder … will it suffice to make Word callable? In Python that’s easy … and it just might work!

This is not naked “figuring out” from scratch. You know that I’ve been reading about how classical Forths worked. But I’m not trying to reproduce what I’ve read. I’m trying to think deeply about what must happen, because we need to implement something similar, if we can. ↩
This article also includes a small linguistic experiment discovering how many different ways I can express my amazement at how simple this has turned out to be. I am truly astonished at this fine little result. ↩