Forth?

The Repo on GitHub

I’m going to take a hard right turn and see about giving the game a ‘scripting language’. I am proposing a Forth dialect. This could get weird. (Oops, it’s Forth not FORTH. Sorry, Chuck.)

The Scheme

When we were envisioning this program back last year, Bryan brought up the idea of a bot programming language embedded in the server, so that players could program their bots in that language. I thought that would be difficult. I still do. But as difficult goes, the Forth language has a lot going for it. It is very simple; it has been used for process control kinds of things; there’s lots of information about how to implement it, including at least three offerings in Python that I stumbled across last night.

Not that I’m going to port in one of those implementations, oh no, that would not be my way. What I propose to do is to learn about Forth in small bits, and to study how it’s implemented as little as possible, and then as abstractly as possible, i.e. not what it looks like in Python, but more like “and then we read tokens until we find a semicolon, and we stuff them into a frammis”.

So we’ll proceed by learning about what Forth is like, what its words and “syntax” are, how it is interpreted, how it is “compiled”. We’ll probably just implement whatever primitives we seem to need, although there are some “well-known” ways of minimizing the primitives that need to be written, which is handy when you’re implementing Forth from scratch on some new processor.

Forth

This description is improvised. It reflects my current understanding of Forth. I last used it some time before 1986. Unless I used it after then, but I don’t think so. ANd I spent about ten minutes looking at an introduction to Forth last night, which went too slowly and I became bored quickly.

Forth statements are strings of characters separated by one or more spaces. There is (almost?) no special punctuation. I think that, done correctly, there is none, but as we’ll see, certain punctuation characters do appear to be special. I think that perhaps they are not. We’ll find out.

Forth is a stack-based language. Generally speaking it processes elements by pushing them onto the stack, with operations removing them from the stack, and providing results by pushing them back onto the stack. It is a postfix language: consider this sequence:

3 4 +

This sequence pushes 3 onto the stack, then pushes 4 onto the stack. It then executes the “word” +, which removes the 4 from the stack, removes the 3 from the stack, adds them getting 7, and pushes 7 back onto the stack. The top of the stack is shown on the right. We’ll see that there is a standard notation for describing what an operation does, something like this:

(x1 x2 -- x1+x2)

That comment, as part of a word definition, means that the word takes the top two stack elements and replaces them with the sum.

And, by the way, Forth calls its operations “words”. As we’ll see, many of them look like words, which is part of its simplicity and elegance.

Forth recognizes numbers as words, and the associated operation for all numbers is to push that number onto the stack. I happened to look up strings last night and learned that ‘s”’, is used in string definition. If we had this input:

7734 s" Hello, world!"

The 7734 would be recognized as a number word and it would push 7734 onto the stack. The s” is a word that will skip the space after it and then read characters up to another quote mark. I think that it then pushes the address of the first character of the string onto the stack and then the length. I guess the string has been sequestered somewhere. Possibly, since presumably the line containing the string has been read into memory, the address is just the address within the input line(!!!)

!!!: That’s an interesting thought. If the program is just a bunch of text in memory, the string will be there and we can just point to it. Hm!

I think it’s time for a bit of experimentation. I’m going to create a new PyCharm project for Forth, and I’ll make a repo for it soon, though perhaps not today.

import pytest


class TestInitial:
    def test_hookup(self):
        assert True

Green. We could commit.

Let’s try an experiment:

    def test_stack_plus(self):
        stack = []
        def plus():
            x1, x2 = stack.pop(), stack.pop()
            stack.append(x1 + x2)

        def number(n):
            stack.append(n)

        number(3)
        number(4)
        plus()
        assert stack.pop() == 7

We have two “words”, plus and number. number pushes its argument onto the stack. plus pops off two numbers, adds them, and pushes the result back onto the stack.

The test works just fine. What is interesting is that PyCharm proposed the stack.pop() calls in plus, and the appends. Its suggestions are often quite prescient, and I do not have the “AI” turned on. Nor do I propose to turn it on.

Let’s pull those functions out so that we can write more tests with them and with functions like them.

stack = []

def clear():
    global stack
    stack = []

def plus():
    x1, x2 = stack.pop(), stack.pop()
    stack.append(x1 + x2)

def number(n):
    stack.append(n)

class TestInitial:
    def test_stack_plus(self):
        clear()
        number(3)
        number(4)
        plus()
        assert stack.pop() == 7

Green. Commit: early tests. No, not yet. PyCharm needs to be told which files to commit and which not, and I don’t know the answers. I’ll research that later.

Another test. Let’s see what happens when we try to pop things that aren’t there.

    def test_stack_insufficient(self):
        clear()
        number(3)
        with pytest.raises(IndexError):
            plus()
        assert len(stack) == 0

Let’s see if we can interpret a little string.

    def test_interpret_string(self):
        clear()
        s = '3 4 +'
        interpret(s)
        assert stack.pop() == 7

This fails for want of interpret. My plan is to split the line, check the elements first to see if they are numbers and then to see if they are ‘+’.

I implement interpret thus:

def interpret(s):
    words = s.split()
    for word in words:
        interpret_word(word)

def interpret_word(word):
    if is_number(word):
        stack.append(int(word))
    elif word == '+':
        plus()

def is_number(s):
    try:
        int(s)
        return True
    except ValueError:
        return False

Green. Let’s implement subtraction, we seem to be on a roll.

    def test_minus(self):
        clear()
        s = '100 49 -'
        interpret(s)
        assert stack.pop() == 51

And:

def interpret_word(word):
    if is_number(word):
        stack.append(int(word))
    elif word == '+':
        plus()
    elif word == '-':
        x1, x2 = stack.pop(), stack.pop()
        stack.append(x2 - x1)

Note that we subtract x1 from x2, since x1 is the first thing popped.

I think we’re done for the morning. Let’s reflect and sum up.

muS | Sum

Reflect and sum, get it? Get it?

What have we learned here? Well, we see that we can manage a stack fairly easily as a Python list. It even supports pop, though Python in its wisdom does not support push. We’ll clearly need some kind of try:except: to deal with stack underflow in a real implementation. Should be no biggie, and there will perhaps be other exceptions that could arise.

I wonder how a “real” Forth, implemented on some tiny computer, would deal with this. Did they have exceptions you could catch? Was there some standard known location you could branch to to recover? Interesting … but we have a real power tool here with Python.

It really does look as if we can “just” split the line on spaces, do some special checking for numeric types, and then just switch off to looking up words and executing them. We’ll be defining new words, beyond whatever primitives we define like ‘+’ and ‘-‘, so there will be a table of some kind. Each new word will be defined in terms of existing words. So far, everything is on the stack, and each word is duty-bound to meet its stack-change definition ( x1, x2 == x2 - x1). Does that mean that we don’t have to have a return stack? Can we just branch to the next word definition? No … I think we’ll have a location in the program (list of words) and when we do a word, it will have to return to the point where we invoked that word. So perhaps each non-primitive word is called with the location where we called it, and returns that location?

I don’t know. Something like that. Must experiment and perhaps do a bit of reading. Next time I’ll be sure to tell you what I’ve studied. I want to keep my mind as empty as possible, because much of the fun is in figuring out ways to do these things, not just copying a known solution. (I admit that I have a notion of “fun” that some would consider to be “odd”. So be it.)

For today, we have written a tiny Forth-like interpreter. Whee! See you next time!