Frank Admission

The Robot World Repo on GitHub
The Forth Repo on GitHub

At this writing I lack a feeling of direction. I’ll muse briefly on the causes of that feeling, then try to find something interesting to do. We explore, change a few things, get a few ideas, then wander off. Pride in our work.

Society Crumbling: The current social situation is surely part of what’s making me feel directionless. The bad guys seem to be on the ascendant. Hatred of our fellow humans seems to be a strong force just now, and it seems to be winning. That is, by my lights, Very Wrong, and I fear that we have a long and rough path ahead of us. That makes programming for fun seem even less valuable than it usually is, and it’s not working as well as usual as a distraction.
Another Program: I’ve been working on another programming problem, one that I am not writing about, and it has consumed a lot of my programming time and programming mind. It’s no more important than the Forth / Robot stuff, but has been consuming time and energy.
Divergent Possibilities: Plugging Forth into the Robot World would be interesting and perhaps valuable if it was going to become a product or toy that people would play with. That’s not going to happen. Even the original small group that thought it would be an interesting little project have drawn away. So I’m feeling a bit disappointed and lonely, and wondering what the point is.; And … Forth is kind of an interesting little language, and I’m feeling a small but real temptation to turn it into a little scripting language on my Mac. Not that I need one … but I’m kind of getting the Forth vibe, the sense of how this little thing is quite capable. That possibility diverges a bit from the Robot application, and since I don’t see yet what parts might apply to both, I’m kind of vacillating.
Forth-like or Not Forth-like?: As I read about the historical Forth implementations, and their intense focus on tight code, self-bootstrapping style, and as I read about some of the strange Forth versions that are out there, I feel less good about the version we’re working on here. It’s just a feeling, but it’s as if there is too much mechanism here for a language as simple as Forth. I feel that it might be too big, too complicated. There might be simpler, better ways. There might be smaller code that was still at least as clear as what we’ve got.; Just a kind of feeling that what we have here isn’t quite the thing.
Pride?: Is it a pride thing? Perhaps it is. Am I proud of this program? Would I set it forth (n.p.i.) as a good example of something? Has the process of implementation been a good example? Am I learning interesting things, showing interesting things in the articles?; Ow. That kind of stings. I think I’m onto something here. That’s almost important enough to work on.

Pride in Our Work

When the job sucks, and most of them do, I think that doing the work well, so that we can take pride in it, is a good way to proceed. We have to spend hours at it: let’s spend them well.

Much of what I write about goes to spending programming time well. We test and implement in small steps, so that progress is almost always forward. We make the code clear, which we can take pride in immediately, and we make future work as pleasant as we can manage. And I can imagine that my readers might occasionally get a good idea about something that will make their lives a bit better.

Aside: My brother GeePaw Hill reminds me that our feelings about “everything else” are affecting our feelings about everything these days. So maybe I’m just down. But even so … I think my sense of code is good enough to make me think we can do better.

So if we’re not feeling proud, it might behoove us to look around and see what we can spiff up. (We don’t get as many chances to use “behoove” as we might. Should look into that as well.)

Looking Around

One way of assessing a program is to look at the tests. We have three test files, test_initial, test_python, and test_compile. test_python is about 100 lines of random tests of how to do or say things in Python, testing how lambdas are written or checking list indexing, things that I just like to have concrete tests for, to focus my mind. I almost never look at them ever again and they never break.

The test_initial file is a bit under 200 lines and is the initial spike Forth interpreter that I coded up to get a sense of how to do it. It doesn’t include colon definitions and is a pure interpreter with no lexicon, just patches of code that compare the word string and execute code if it’s theirs.

For a bit of a look at how that works, you can check out the early articles in this series, such as this one.

And test_compile is over 400 lines of very small tests showing that all the Forth words actually work. A typical test looks like this:

    def test_direct_harder(self):
        f = Forth()
        s = (': SUM + ; '
             ' 3 4 SUM')
        f.compile(s)
        assert f.stack.pop() == 7

Later tests are checking more complicated situations:

    def test_variable(self):
        f = Forth()
        f.compile('VARIABLE FOO 1 ALLOT')
        f.compile('VARIABLE BAR 1 ALLOT')
        f.compile('VARIABLE BAZ 1 ALLOT')
        f.compile('666 FOO !')
        f.compile('777 BAR !')
        f.compile('888 BAZ !')
        f.compile('BAZ @ BAR @ FOO @')
        assert f.stack.stack == [888, 777, 666]

I think the tests are good. If something should be bothering me, I don’t think it’s the tests.

We have just five source files, forth.py, lexicon.py, stack.py, heap.py, and word.py.

The stack and heap ones just implement the Forth stacks (I think we use a couple or three instances) and heap. They provide the core pushing and popping methods and, for stack, most of the primitives, ROT and DUP and the like.

Hmm …

A “real” forth would have the stack manipulation code built into the primitive words for ROT and DUP and the like. We have, say, this:

        self.pw('DROP',  lambda f: f.stack.pop())
        self.pw('DUP',   lambda f: f.stack.dup())
        self.pw('OVER',  lambda f: f.stack.over())
        self.pw('ROT',   lambda f: f.stack.rot())

That’s supported by this:

class Stack:
    def pop(self):
        return self.stack.pop()

    def dup(self):
        self.push(self.stack[-1])

    def over(self):
        self.push(self.stack[-2])

    def rot(self):
        stack = self.stack
        stack[-1], stack[-2], stack[-3]  = stack[-3], stack[-1], stack[-2]

The Stack class is just a thin cover over a Python list.

It would be more like classical Forth if our code for, say, ROT, looked like this:

def rot(f):
        stack = f.stack
        stack[-1], stack[-2], stack[-3] = stack[-3], stack[-1], stack[-2]

self.pw('ROT', rot)

Or, if it were allowed:

self.pw(
    'ROT',
    lambda f: 
        stack = f.stack
        stack[-1], stack[-2], stack[-3] = stack[-3], stack[-1], stack[-2])

Then we’d have the actual code right in the word. But Python won’t let us do a multi-line lambda. A Python lambda is a statement. Life is tough, I guess.

I think I’m OK with the stack and heap and how we wire up the primary words. I think this looks pretty nearly OK, for example:

    def define_arithmetic(self):
        self.pw('-',  lambda f: f.stack.push(f.stack.swap_pop() - f.stack.pop()))
        self.pw('%', lambda f: f.stack.push(f.stack.swap_pop() % f.stack.pop()))
        self.pw('MOD', lambda f: f.stack.push(f.stack.swap_pop() % f.stack.pop()))
        self.pw('/',  lambda f: f.stack.push(f.stack.swap_pop() // f.stack.pop()))
        self.pw('+',  lambda f: f.stack.push(f.stack.pop() + f.stack.pop()))
        self.pw('*',  lambda f: f.stack.push(f.stack.pop() * f.stack.pop()))
        self.pw('1+', lambda f: f.stack.push(f.stack.pop() + 1))
        self.pw('1-', lambda f: f.stack.push(f.stack.pop() - 1))

    def define_comparators(self):
        self.pw('=',  lambda f: f.stack.push(1 if f.stack.pop() == f.stack.pop() else 0))
        self.pw('>',  lambda f: f.stack.push(1 if f.stack.pop() > f.stack.pop() else 0))
        self.pw('<',  lambda f: f.stack.push(1 if f.stack.pop() < f.stack.pop() else 0))
        self.pw('>=', lambda f: f.stack.push(1 if f.stack.pop() >= f.stack.pop() else 0))
        self.pw('<=', lambda f: f.stack.push(1 if f.stack.pop() <= f.stack.pop() else 0))

Now I’m not sure how those would be defined in classical Forth. Certainly they’d use either a colon definition or a CREATE-DOES>. They would be defined using the Forth language: mine here are defined using Python language.

Classical Forth would probably define, say, “+” with a DOES> sequence that amounted to the assembly code to pop the stack to an accumulator, add the stack top, replace the stack top with the accumulator. That would be done, I think, with a series of numbers and comma operators, where the numbers are the op codes and address parts of the instructions to do the work. Primitives like ‘+’ would all terminate with a branch to the global address of the next-word code, so you don’t even get a function call on a primitive, you just branch into it and it comes back to next-word.

If our purpose is to get Forth functionality, what we have now seems OK to me. If our purpose is to do it in the classical Forth style, we’re a bit further off the mark here. I think my priorities are probably 80-20 get function vs classical style. I do want to discover and apply the classical ideas but the point is to have a convenient little language, whether for the robots or for myself.

So far so good. What we’ve looked at so far seems OK to me. It’s pretty clean, pretty well organized, easy enough to understand. Nothing amazing, but workmanlike, I’d say.

I am not so positive about the more complex compiling and branching words. They certainly work, but the code doesn’t always leap out at me as obvious. I think the worst of them is if-else-then:

    def _define_if_else_then(self):
        def _compile_conditional(forth, word_to_compile, word_list):
            forth.compile_stack.push(len(word_list) + 1)
            word_list.append(forth.find_word(word_to_compile))
            word_list.append(0)

        def _patch_the_skip(forth, skip_adjustment, word_list):
            patch_loc = forth.compile_stack.pop()
            last_loc = len(word_list) + skip_adjustment
            word_list[patch_loc] = last_loc - patch_loc

        def _if(forth):
            _compile_conditional(forth,'*IF', forth.word_list)

        def _else(forth):
            _patch_the_skip(forth, 1, forth.word_list)
            _compile_conditional(forth, '*ELSE', forth.word_list)

        def _then(forth):
            _patch_the_skip(forth, -1, forth.word_list)

        self.pw('IF',   _if,   immediate=True)
        self.pw('ELSE', _else, immediate=True)
        self.pw('THEN', _then, immediate=True)

We’ll put those words on the list for a deeper look. Possibly inlining the code and then refactoring would make sense? There is duplication covered by the two functions _compile_conditional and _patch_the_skip. It might be OK as it stands. The +1 -1 parameter in _patch_the_skip seems questionable. We might be able to do better. I wonder if reversing the order of the two calls in _else might work? A quick experiment says no. Needs a deeper look.

OK, then. There’s the forth.py file itself.

I notice this:

    def next_token(self):
        if self.token_index >= len(self.tokens):
            return None
        token = self.tokens[self.token_index]
        self.token_index += 1
        return token.upper()

This is more Pythonic, I think:

    def next_token(self):
        try:
            token = self.tokens[self.token_index]
        except IndexError:
            return None
        self.token_index += 1
        return token.upper()

Commit: tidying.

If the tokens were some kind of a stream that returned token or None, that might be nicer. I don’t know if there is such an object, but we could write one. Might be worth a look someday. No big deal anyway.

The basic loop isn’t bad:

    def compile(self, text):
        new_text = re.sub(r'\(.*?\)', ' ', text)
        self.tokens = new_text.split()
        self.token_index = 0
        while self.token_index < len(self.tokens):
            self.compile_a_word().do(self)

    def compile_a_word(self):
        self.word_list = []
        while True:
            token = self.next_token()
            if token is None:
                raise ValueError('Unexpected end of input')
            if (definition := self.find_word(token)) is not None:
                if definition.immediate:
                    definition.do(self)
                else:
                    self.word_list.append(definition)
            elif (num := self.compile_number(token)) is not None:
                self.append_number(num, self.word_list)
            else:
                raise SyntaxError(f'Syntax error: "{token}" unrecognized')
            if self.compile_stack.is_empty():
                break
        return SecondaryWord('nameless', self.word_list)

I suspect compile_number should be renamed to parse_number, and maybe append_number would be better as compile_literal. That’s just off the cuff, though. And not terribly valuable, even if worth doing.

OK, it is worth doing, so do it.

            elif (num := self.parse_number(token)) is not None:
                self.compile_literal(num, self.word_list)

Committed.

Those two methods look very much like the equivalent code in a classical Forth. I would prefer a more closed form with fewer conditionals, a Composed Method sort of thing, but I don’t see anything offhand that looks like a step in the right direction.

Reflection

A decent open-eyed scan of the whole program doesn’t turn up anything to really dislike. Maybe it’s just my mood.

One thing that I would kind of like would be to be able to have only a single kind of word, rather than the Primary / Secondary distinction. Instead of having the two classes with their separate implementations of do, we might be able to put the distinction right into the functions we place in the two kinds of words.

In a classical Forth, you just branch into the word. If it is “primary”, the code executes and branches back to the next-word code. If it is “secondary” the code at the beginning of the word pushes return information on the stack and, I think, branches to next-word, which will then find the next thing in the current word, which will either execute and come back to next-word, or push and come back … sooner or later, a secondary word comes to its end, where it will branch to code that pops the return stack and goes to next-word, which will pick up wherever it was in the preceding word, rinse repeat.

It’s downright magical.

I think I’d like to experiment until we get something similar. It can’t be exactly that: we don’t have the ability to branch into arbitrary code. But maybe it can be more similar than it is. That would be pleasant.

Summary

For now … I am not suddenly filled with joy, but there’s nothing to really hate about this program. We’ll continue to follow our nose, see where we wind up.

See you next time! Good luck in this new world we find ourselves in!