Happy New Year!

The Repo on GitHub

And so, Forth. Here chez Ron, it’s another morning with Python and Forth. Some new year wishes within. Read on. (Spoiler: As often happens, things do not unfold as I anticipated. No worries.)

Happy New Year: I wish us all, the grand all, a happy, safe, healthy, all good things new year, and many to come. I will do what I can to make the wish come true. If we all, the grand all, were to do the same, the wish can come true.

Plan

Today, we’ll refactor a bit, to make our little Forth system look more like we like things to look. But first I want to muse a bit on my chat with GeePaw Hill last night.

Musing

In his youth, Hill worked a lot with Forth, including implementing it on raw hardware. I have never done that, but I have used Forth, studied it in the past and in the now, and I have written assembler for those same old machines, so I have a decent understanding of what a real old-time Forth is like. Generally speaking, a Forth is built up from essentially nothing but raw hardware, often not even an assembler. It was built on very small machines, so it was tight, compact, and used every trick to minimize both space and time.

So it is really quite anomalous to build Forth in Python, which is itself built with C, which is itself built mostly with C and somewhat with assembler, on top of an operating system that consumes 4 gigabytes of memory all on its own. An early-days Forth computer might have a few kilobytes of memory … or even less!

Still, we have a rationale for doing it: controlling bots in our “game”, with user-provided bot control code. And we don’t need much of a rationale anyway, You are talking to a man who has implemented Spacewar! and Space Invaders just for the sake of doing it.

So here we are, implementing Forth in Python. What are some of the desiderata for this implementation, in the light of the new year?

Desiderata

I’d like the code to meet my somewhat vague standards:

it should be well-tested, primarily driven by TDD tests;
it should consist mostly of small classes with small numbers of small methods;
most of the code should be easy to understand without a lot of digging, through the use of meaningful names and subordinate objects that carry out clearly discernible duties.

In the “spirit” of Forth, I would also like this program to be as compact and efficient as it can be, in the light of the general desires above.

And the truly vague desire: I would like the program to be implemented “like” Forth, in that we can find in it many of the same ideas that make up early-days Forth. This notion is not clear to me. It’s going to be one of those IKIWISI ideas: I’ll Know It When I See It.

Refactoring

For now, we can work on bringing the code up to Chez Ron Standards, small clear classes with small clear methods. There’s plenty to do. Our subject will be the Forth class, in forth.py, which is at this writing 235 lines long.

Hill commented last night that that wasn’t bad, but then remembered that he has been working on a project with an 8000 line file containing a single class. I think it’s bad, because it tempts me to open the project hierarchy widget in PyCharm, so that I can click on a method name to get to it. Horrors! Everything should be in view or no more than a page up or down away. Two at most, and easy to find at that.

As I mentioned yesterday, there are some classifications of methods in Forth class:

type	count
lexicon	7
immediate	9
star	5
i j k	3
other	13

The lexicon ones are the methods that define my PrimaryWords. The immediate ones are the individual code (methods) for immediate words like IF and THEN and colon. The star ones are code for the “secret” words that Forth uses as the run-time code for things like IF.

Today I propose to move the immediates to their own class, and then, time permitting, to do the same with the lexicon. In that case, I think we’ll need to create a somewhat more complex object than we’ll need for the immediates. Let’s get to it.

Immediates

There are nine methods like these two:

    def imm_begin(self):
        self.compile_stack.push(('BEGIN', len(self.word_list)))

    def imm_colon(self):
        if self.compile_stack.is_not_empty():
            raise SyntaxError(f'Syntax error: nested word definition')
        if self.word_list:
            raise SyntaxError(f'Syntax error: "{self.word_list}" not empty')
        definition_name = self.next_token()
        self.compile_stack.push((':', definition_name))

They are referenced exactly once each, in the definition of the corresponding PrimaryWord:

        lex.append(PrimaryWord(':', lambda f: f.imm_colon(), immediate=True))
        ...
        lex.append(PrimaryWord('BEGIN', lambda f: f.imm_begin(), immediate=True))

I propose a very simple set of changes: New class Immediate, containing all those methods. Forth will create an instance of Immediate, called imm, and the lexicon definitions will say f.imm.imm_begin where they now say f.imm_begin.

Grr. As I describe this idea, it becomes clear that the Immediate methods will have to forward everything they do back to the Forth object, resulting in something like this:

class Immediate:
    def imm_colon(self):
        if self.forth.compile_stack.is_not_empty():
            raise SyntaxError(f'Syntax error: nested word definition')
        if self.forth.word_list:
            raise SyntaxError(f'Syntax error: "{self.word_list}" not empty')
        definition_name = self.forth.next_token()
        self.forth.compile_stack.push((':', definition_name))

Or at least:

class Immediate:
    def imm_colon(self):
        forth = self.forth
        if forth.compile_stack.is_not_empty():
            raise SyntaxError(f'Syntax error: nested word definition')
        if forth.word_list:
            raise SyntaxError(f'Syntax error: "{word_list}" not empty')
        definition_name = forth.next_token()
        forth.compile_stack.push((':', definition_name))

Maybe that’s not too awful. Or, I have yet another idea. Let’s try it with colon since we have it here in front of us. This is an experiment that we may keep or may toss out.

class Immediate:

    def imm_colon(self, forth):
        if forth.compile_stack.is_not_empty():
            raise SyntaxError(f'Syntax error: nested word definition')
        if forth.word_list:
            raise SyntaxError(f'Syntax error: "{forth.word_list}" not empty')
        definition_name = forth.next_token()
        forth.compile_stack.push((':', definition_name))

I’m proposing passing the forth instance as a parameter. Now I need to have an instance to use:

class Forth:
    def __init__(self):
        self.active_words = []
        self.compile_stack = Stack()
        self.imm = Immediate()  # <====
        self.lexicon = []
        self.define_primaries()
        self.return_stack = Stack()
        self.stack = Stack()
        self.tokens = None
        self.token_index = 0
        self.word_list = None

And need to use it:

    def define_immediates(self, lex):
        lex.append(PrimaryWord(':', lambda f: f.imm.imm_colon(f), immediate=True))

We are green. I comment out the old method in forth. Why didn’t I delete it? Because I want to think and I don’t want to think about Git.

So, after taking PyCharm’s advice about static, we have this:

class Immediate:

    @staticmethod
    def imm_colon(forth):
        if forth.compile_stack.is_not_empty():
            raise SyntaxError(f'Syntax error: nested word definition')
        if forth.word_list:
            raise SyntaxError(f'Syntax error: "{forth.word_list}" not empty')
        definition_name = forth.next_token()
        forth.compile_stack.push((':', definition_name))

I am not as impressed with this idea as I was when I hadn’t really thought about it. We could just as well move the methods into a separate file and import them as is.

Let’s belay this idea, roll back, and reflect.

Reflection

I want to think more deeply about the imm methods. Some of them are quite short, and some are long:

    def imm_colon(self):
        if self.compile_stack.is_not_empty():
            raise SyntaxError(f'Syntax error: nested word definition')
        if self.word_list:
            raise SyntaxError(f'Syntax error: "{self.word_list}" not empty')
        definition_name = self.next_token()
        self.compile_stack.push((':', definition_name))

    def imm_do(self):
        self.compile_stack.push(('DO', len(self.word_list)))
        self.word_list.append(self.find_word('*DO'))

If they were all short, I don’t think I’d mind them so much. Most of the long ones are long because they are checking the stacks and lists and such. Now the original Forth implementations had a tendency to just say “?” when something went wrong. We’d like to do better than that, but what if we were to take out all the error checking and see what happens. Let’s try that as an experiment.

If I remove all those checks, only one test fails. It fails popping an empty stack, which is surely a common error in Forth. Those checks are not tested. Let’s remove them, and as errors arise, deal with them some other way. We have one to play with right now.

The imm methods are now a bit less messy and therefore a bit less irritating:

    def imm_begin(self):
        self.compile_stack.push(('BEGIN', len(self.word_list)))

    def imm_colon(self):
        definition_name = self.next_token()
        self.compile_stack.push((':', definition_name))

    def imm_do(self):
        self.compile_stack.push(('DO', len(self.word_list)))
        self.word_list.append(self.find_word('*DO'))

    def imm_else(self):
        self.patch_the_skip(['*IF'], 1, self.word_list)
        self.compile_conditional('*ELSE', self.word_list)

    def imm_if(self):
        self.compile_conditional('*IF', self.word_list)

    def imm_loop(self):
        key, jump_loc = self.compile_stack.pop()
        loop = self.find_word('*LOOP')
        self.word_list.append(loop)
        self.word_list.append(jump_loc - len(self.word_list))

    def imm_semi(self):
        key, definition_name = self.compile_stack.pop()
        word = SecondaryWord(definition_name, self.word_list[:])
        self.lexicon.append(word)
        self.word_list.clear()

    def imm_then(self):
        self.patch_the_skip(['*IF', '*ELSE'], -1, self.word_list)

    def imm_until(self):
        key, jump_loc = self.compile_stack.pop()
        until = self.find_word('*UNTIL')
        self.word_list.append(until)
        self.word_list.append(jump_loc - len(self.word_list) - 1)

We have one test failing:

    def test_syntax_error(self):
        f = Forth()
        s = '; SQUARE DUP + ;'
        with pytest.raises(SyntaxError) as e:
            f.compile(s)
        assert (str(e.value) ==
                'Syntax error: ; without :')

It’s now getting a different error:

tests/forth.py:53: in <lambda>
    lex.append(PrimaryWord(';', lambda f: f.imm_semi(), immediate=True))
tests/forth.py:156: in imm_semi
    key, definition_name = self.compile_stack.pop()
    def pop(self):
>       return self.stack.pop()
E       IndexError: pop from empty list

I think what we need to do, if we’re going to go this way, is to put a try up in compile and when we get an exception, report some information about the situation. We’ll save that for another day. For now, we’ll fix this one syntax error test and wait to see what happens over time.

    def test_syntax_error(self):
        f = Forth()
        s = '; SQUARE DUP + ;'
        with pytest.raises(IndexError) as e:
            f.compile(s)
        assert (str(e.value) ==
                'pop from empty list')

We are green. I choose to commit: removing all error checking from immediate words.

And, while we’re less than 90 minutes in, this has gone weirdly enough that I think we’ll finish up here with a reflective summary or perhaps a summary reflection.

Summary

So. Offloading the immediate methods to a separate class kind of lost its appeal to me. It is certainly doable but it just didn’t seem tasty. Looking at the methods themselves showed that they did a lot of error checking that was not tested, and perhaps was not called for. If we do need some ongoing integrity checks, we’ll look for a better way of doing them than listing them in line. Perhaps they can be made declarative in some way or at least embodied in a one-line call to a suitable checker, maybe that won’t be so bad.

Removing the error checking is of course “dangerous”. But not very dangerous, because all the error checks were fatal anyway. So what we will lose will be precision in the error messages, not any particular safety.

On the brighter side, I am getting a glimmer of what might be an idea. It might possibly lead us to one or more separate classes like Immediate .. but with a more meaningful reason to exist. Something about a virtual machine comes to mind. I don’t know, just musing at this point.

There are smaller refactoring steps to be taken: there is similarity here that could be exploited. We may well try to create more similarity and then exploit it better.

In a real Forth, if a word is to be predefined as a primary, the definition would consist of a series of calls to a very small number of basic words that would allow for pasting actual machine code into a word’s definition. We aren’t going to use Python’s eval, and its lambda is weaker than we’d like, but perhaps we can come up with something that keeps the word’s code and the word definition together. Words like these are quite nice in that regard:

    def define_comparators(lex):
        lex.append(PrimaryWord('=', lambda f: f.stack.push(1 if f.stack.pop() == f.stack.pop() else 0)))
        lex.append(PrimaryWord('>', lambda f: f.stack.push(1 if f.stack.pop() > f.stack.pop() else 0)))
        lex.append(PrimaryWord('<', lambda f: f.stack.push(1 if f.stack.pop() < f.stack.pop() else 0)))
        lex.append(PrimaryWord('>=', lambda f: f.stack.push(1 if f.stack.pop() >= f.stack.pop() else 0)))
        lex.append(PrimaryWord('<=', lambda f: f.stack.push(1 if f.stack.pop() <= f.stack.pop() else 0)))

All the code is there in the definition. It’s just that some of our definitions aren’t acceptable to Python’s weak lambda. But maybe we can get closer to that ideal.

It’s going to take some experimentation. If and when I get a good (or tempting) idea, you’ll be the second to know. And meanwhile we can make some smaller but still valuable improvements.

So: Tried an idea and didn’t like the result enough to continue. If we had continued, maybe we’d like it better, but while it would off-load some methods, the resulting code just didn’t please me. No point replacing one dissatisfaction with another. We’ll see if we can come up with an idea that increases satisfaction overall.

An odd morning. Sometimes it do be like that. I’m content: we did the right thing given what we could see. See you soon!