Back 'n Forth

The Robot World Repo on GitHub
The Forth Repo on GitHub

I’ve been busy programming in another universe, and fretting on this one’s decline. It’s time to get back into this Forth diversion from the Robot World, itself a diversion from something or other. Brief reflection on the olden days.

The most recent Forth-focused article was 15 days ago. As you can imagine, I have no idea what was going on way back then, no sense of what to do next, and absolutely no recollection of how it all works. It’s a lot like working with legacy code, really. I’m sure, though, that I’ll quickly come up to speed. I used to know the fellow who wrote this stuff.

A quick review of the two prior Forth articles tells me that we were at a fairly stable stopping point. We had just implemented CREATE ... DOES>, and had shortened the Lexicon setup by pushing more of the separate def functions for PrimaryWord instances into lambda. We do still have plenty of hand-coded words, but I think that’s in the nature of this beast. Let’s discuss that a bit.

Historically …

In a real old-school Forth, they would not likely have more than an assembler for the machine they were working on. In some cases they might not even have that: they’d hand-assemble the code. Yes, younglings, in those days we actually knew how to write programs by writing them out in binary. Remember, only a few decades before that, we were writing documents by pressing wedge marks into clay, so to us, this seemed perfectly normal. Anyway …

In those days, these ancient warriors would hand-code a minimal Forth system in assembler or binary and then “bootstrap” it. Using hand-written words like CREATE-DOES>, they’d define new words, even some of the most elementary words. They’d use their CREATE-DOES> to define other words by using the comma operator to punch numbers into the lexicon, such that those numbers, when executed, performed the elementary operations. I am told that after adding only a few primitive words, and writing a very small interpreter, everything else in a Forth system could be implemented using the system itself. I myself have never done that: I have used old Forth systems but never built one from scratch. I have, however, written small amounts of code in binary, octal, or hex. It was fun, in its way.

Spooling forward …

Today, we have different languages and tools, and so we do not, and generally cannot, punch code into memory. We have compilers, objects, functions, lambdas. We are building our Forth out of those components. So, while in ancient times the Forth builder would use a lot of CREATE-DOES> and similar words to punch code into memory, today we use our PrimaryWord class to build a word that holds a function that is to be called. An old Forth would instead have a word with code that was branched to after pushing an address onto the call stack. That code exited by branching to a known location where the address was popped, resulting in a call and return from the word.

Same-same, just different affordances.

An old-school Forth programmer would be building up their tool set as they went. They used the tool to build the tool. Very cool, really. But we do not have to do that, and in some regards we cannot do that. There are echoes of that old style in our current program. We do have a few words that are defined in Forth. For example:

class Lexicon:
    def define_secondaries(self, forth):
        forth.compile(': CONSTANT CREATE , DOES> @ ;')
        forth.compile(': VARIABLE CREATE ;')
        forth.compile(': *DO SWAP >R >R ;')
        forth.compile(': I R@ ;')

Those four words, CONSTANT, VARIABLE, *DO, and I, are defined in Forth! Whee! And two of them, *DO and I, are pretty deep in the guts of making Forth work.

It seems likely to me that we could, if we chose, implement a few more primitive words, ones that manipulate our compile stack and other internal variables, and then use those words to elevate some of our existing words from primary to secondary, that is, from being programmed in Python to being programmed in Forth. Were we to do that, and at some point we might, we should learn a lot about how old-school Forth was implemented. But that’s not why we’re doing this. Or, at least, it’s not the official reason.

And what is that reason?

The official reason for this Forth is to provide a little language for robot programmers to program their robots, so that our robot world can accept that language and execute the logic all on the world side. Why do we want that? Well, it’s complicated, but it seemed like a good idea at the time.

So, we should probably keep at least one eye on the ball and work toward putting this Forth into the robot world and using it to define some robot behavior. We’re kind of honor-bound to get at least one round trip out of the thing.

A “Plan” Emerges

And I think that points to what we might want to do next. My rough notion of how it will work in Robot World is that the player will give the world a string or file or some textish thing, and the world will accept it as a Forth program, compile it, and then call EXECUTE or some such agreed word, running the player’s robots.

So we need a way to provide a big batch of Forth code. I sort of think we almost have that with our compile method on Forth, but we need to think a bit about how to make that useful.

What does that method do?

class Forth:
    def compile(self, text):
        new_text = re.sub(r'\(.*?\)', ' ', text)
        self.tokens = new_text.split()
        self.token_index = 0
        while self.token_index < len(self.tokens):
            self.compile_a_word().do(self)

We start off by viciously removing everything in between parentheses, as comments. A “real” Forth would have a built-in word for left paren, and that word would just munch munch munch everything up to and including the right paren. In our case, having snipped out the parenthesized comments, we then split out the tokens, which are character strings separated by white space. And then we compile_a_word and then do it.
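To see what those first two steps do to a line of Forth, here is a small standalone sketch that applies the same regex and split to a sample string. The helper name `tokenize` is mine, purely for illustration; it is not a method on our Forth class.

```python
import re


def tokenize(text):
    # Same two steps as Forth.compile: drop ( ... ) comments, then split on whitespace.
    without_comments = re.sub(r'\(.*?\)', ' ', text)
    return without_comments.split()


print(tokenize(': DOUBLE ( n -- 2n ) 2 * ;'))
# prints [':', 'DOUBLE', '2', '*', ';']
```

One thing to keep in mind: in Python's `re`, `.` does not match a newline by default, so a comment that spans more than one line would not be removed by this pattern.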

To date, we use compile only in tests, except for the four secondary words discussed above.

If our Robot World users were behaving themselves, I think they would mostly limit themselves to sending us a series of colon definitions, culminating with a definition of EXECUTE, which we would call as agreed, to operate their robots in our world.

But in principle, they could do anything. They could send us this:

: EXECUTE BEGIN 666 0 UNTIL ;

Then, when the Robot World called EXECUTE, bad things would happen. The stack would, after a very long time, overflow.

Working with that example results in this new test:

    def test_destroy_world(self):
        f = Forth()
        f.compile(': EXECUTE BEGIN 666 0 UNTIL ;')
        with pytest.raises(ValueError) as e:
            f.compile(' EXECUTE ' )
        assert str(e.value) == 'Stack is full'

And this new code in stack:

class Stack:
    def __init__(self, limit=100):
        self.stack = []
        self.limit = limit

    def _check_limit(self):
        if len(self.stack) > self.limit:
            raise ValueError("Stack is full")

    def extend(self, items):
        self.stack.extend(items)
        self._check_limit()

    def push(self, item):
        self.stack.append(item)
        self._check_limit()

Commit: add stack limit. It would take a very long time for the test above to actually consume all the Python memory.
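Here is a tiny standalone run of that limit, with the relevant parts of the class copied in so the sketch runs on its own. The limit of 3 is just an assumption to keep the example short.

```python
class Stack:
    # Copied from the class above so this sketch is self-contained.
    def __init__(self, limit=100):
        self.stack = []
        self.limit = limit

    def _check_limit(self):
        if len(self.stack) > self.limit:
            raise ValueError("Stack is full")

    def push(self, item):
        self.stack.append(item)
        self._check_limit()


s = Stack(limit=3)
for n in range(3):
    s.push(n)           # three pushes fit within the limit
try:
    s.push(99)          # the fourth push trips the check
except ValueError as e:
    message = str(e)
print(message, len(s.stack))
# prints: Stack is full 4
```

Note that the check runs after the append, so the offending item is already on the stack when the exception is raised. That seems harmless if the program is being abandoned anyway.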

So, what have we learned? Well, we have a hint that we’ll want some bullet-proofing built in. And there are other issues: it is an absolute certainty that users will give us programs that loop infinitely, or that contain errors of other kinds. What about an incomplete definition, for example?

    def test_partial_definition(self):
        f = Forth()
        f.compile(': FOO 444 222 +')

This test fails with a very obscure message:

    def compile_number(self, word):
        try:
>           return int(word)
E           TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'

I think this means that next_token returned None here:

    def compile_a_word(self):
        self.word_list = []
        while True:
            token = self.next_token()
            if (definition := self.find_word(token)) is not None:
                if definition.immediate:
                    definition.do(self)
                else:
                    self.word_list.append(definition)
            elif (num := self.compile_number(token)) is not None:
                self.append_number(num, self.word_list)
            else:
                raise SyntaxError(f'Syntax error: "{token}" unrecognized')
            if self.compile_stack.is_empty():
                break
        return SecondaryWord('nameless', self.word_list)

    def next_token(self):
        if self.token_index >= len(self.tokens):
            return None
        token = self.tokens[self.token_index]
        self.token_index += 1
        return token.upper()

It seems to me that running out of tokens is always an error. Nonetheless, let’s deal with it in compile_a_word rather than raise an exception further down. I’ll add to the test:

    def test_partial_definition(self):
        f = Forth()
        with pytest.raises(ValueError) as e:
            f.compile(': FOO 444 222 +')
        assert str(e.value) == 'Unexpected end of input'

And …

    def compile_a_word(self):
        self.word_list = []
        while True:
            token = self.next_token()
            if token is None:
                raise ValueError('Unexpected end of input')
            if (definition := self.find_word(token)) is not None:
                ...

Commit: raise value error if compile runs out of input.

There is still the very pernicious possibility of an infinite loop, such as:

BEGIN 0 UNTIL

I suppose we could put a limit on the number of words that a player can execute. I have no idea what that limit should be. Maybe we only allow some small number of milliseconds in Forth before you call a World operation. And maybe you only get some finite number of World operations per EXECUTE.
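One way to picture the word-count idea is a counter that gets charged once per step and raises when the budget runs out. The sketch below is not our Forth: `BudgetedRunner`, `StepLimitExceeded`, and the limit of 1000 are all hypothetical names and numbers, chosen only to show the shape of the thing.

```python
class StepLimitExceeded(Exception):
    """Raised when a program uses up its allotted number of steps."""


class BudgetedRunner:
    # Hypothetical helper: charges one step per pass through a loop body.
    def __init__(self, max_steps=1000):
        self.max_steps = max_steps
        self.steps = 0

    def step(self):
        self.steps += 1
        if self.steps > self.max_steps:
            raise StepLimitExceeded(f'Step limit {self.max_steps} exceeded')

    def run_until(self, body, done):
        # Mimics BEGIN ... UNTIL: run body, test done(), repeat until done() is true.
        while True:
            self.step()
            body()
            if done():
                return


runner = BudgetedRunner(max_steps=1000)
try:
    # Like BEGIN 0 UNTIL: the exit test is never true.
    runner.run_until(lambda: None, lambda: False)
except StepLimitExceeded as e:
    message = str(e)
print(message)
# prints: Step limit 1000 exceeded
```

In a real version, the charge would presumably happen wherever a word gets executed, and a time-based limit would need a clock check instead of a counter.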

Reflection

Out of the blue, I’ve decided to stop. I’ve learned a bit and clearly need to think a bit about the real needs, rather than just poking around. I do think we’ve closed at least one important door with the stack limit.

It seems clear enough that our existing compile operation can deal with as large a string as the user might need to provide but I’m a little concerned with the end game. Look at the compile again:

    def compile(self, text):
        new_text = re.sub(r'\(.*?\)', ' ', text)
        self.tokens = new_text.split()
        self.token_index = 0
        while self.token_index < len(self.tokens):
            self.compile_a_word().do(self)

It’s that do at the end that concerns me. When we provide a well-formed definition to compile, the compiler accumulates words into a word list until it compiles ;, which is “immediate”. It pops the compile stack, moves the current word list into a SecondaryWord, thus defining it, and clears the word list.

Then compile_a_word may discover that the compile stack is empty, which it should be because nesting colon definitions is not allowed (or should not be). So compile_a_word will return a “nameless” word with an empty word list … which the compile method will execute, doing nothing.

That’s kind of scary, isn’t it? In a way, that is the nature of Forth. If you do something wrong, it will just break. A “real” Forth would simply return to the input prompt, probably after printing a question mark, possibly prefixed by a terse message.

In our case, we’ll need to return some kind of error to the user. Something terse, I imagine, ending in a question mark. Kind of in the spirit of the thing, know what I mean?
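A minimal sketch of what that might look like: wrap the compile call, catch the exceptions we raise, and hand back a short message ending in a question mark. The function name `compile_and_report` and the `FakeForth` stand-in are hypothetical; the real Forth class would take the stand-in's place.

```python
class FakeForth:
    # Stand-in for the real Forth class, with just enough behavior
    # to raise the "ran out of input" error discussed above.
    def compile(self, text):
        stripped = text.strip()
        if stripped.startswith(':') and not stripped.endswith(';'):
            raise ValueError('Unexpected end of input')


def compile_and_report(forth, text):
    # Returns 'ok' on success, or a terse message ending in '?' on failure.
    try:
        forth.compile(text)
    except (ValueError, SyntaxError) as e:
        return f'{e} ?'
    return 'ok'


print(compile_and_report(FakeForth(), ': FOO 444 222 +'))
# prints: Unexpected end of input ?
print(compile_and_report(FakeForth(), ': FOO 444 222 + ;'))
# prints: ok
```

Catching only ValueError and SyntaxError is a choice made here because those are the exceptions our code raises so far; anything else would still propagate.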

But there are a lot of question marks in my mind about how this will all come together at the Robot World level. The originally-contemplated joint project has turned into a solo thing, and I see almost no prospect of anyone really providing a little Forth program to run the bots around, other than Yours Truly.

Here are some things that might be upcoming steps:

Integrate this Forth into Robot World, providing a new message containing Forth code;
Provide Forth words in World, that drive the current bot, analogous to the existing commands;
Write little Forth scripts and learn what we need from trying to make them work.

That’s pretty vague, perhaps even by my standards. Sounds a lot like Fool Around and Find Out.

That’s probably OK … my purpose here is to face problems, slice them down to size, build code that solves them, and see what we can learn from doing that. So it doesn’t matter much what I work on, although I do like it to make some sense in a larger context.

Summary

What has happened today, you might ask? The answer is that we have immersed ourselves in the old code and old thought space, and then we considered things that seemed to come up when thinking about “make this work in Robot World”. We tested a few possible errors and covered them with exception handling. We enhanced the stack to have a limit, and we modified the compile operation to deal a bit more explicitly with running out of input unexpectedly.

All good things, kind of right on the sharp edge of what we’re supposedly working on.

I think that’s OK. See you next time!