This Will Be FINE!
Unless I am mistaken, which is often the case, this morning’s refactoring is going to be fine, F-I-N-E FINE. And I do not mean the meme, I mean really fine.
Our Story Begins
def compile_a_word(self):
    word_list = []
    while True:
        token = self.next_token()
        if token in self.action_tokens:
            self.compile_action_word(token, word_list)
        elif (definition := self.find_word(token)) is not None:
            word_list.append(definition)
        elif ...
        ...
    return SecondaryWord('nameless', word_list)
def compile_action_word(self, word, word_list):
    match word:
        case ':':
            if self.compile_stack.is_not_empty():
                raise SyntaxError(f'Syntax error: nested word definition')
            if word_list:
                raise SyntaxError(f'Syntax error: "{word_list}" not empty')
            definition_name = self.next_token()
            self.compile_stack.push((':', definition_name))
        case ';':
            if self.compile_stack.is_empty():
                raise SyntaxError(f'Syntax error: ; without :')
            key, definition_name = self.compile_stack.pop()
            if key != ':':
                raise SyntaxError(f'Syntax error: ; without :')
            word = SecondaryWord(definition_name, word_list[:])
            self.lexicon.append(word)
            word_list.clear()
        case 'IF':
            self.compile_conditional('*IF', word_list)
        case 'THEN':
            self.patch_the_skip(['*IF', '*ELSE'], -1, word_list)
        case 'ELSE':
            self.patch_the_skip(['*IF'], 1, word_list)
            self.compile_conditional('*ELSE', word_list)
        case 'BEGIN':
            self.compile_stack.push(('BEGIN', len(word_list)))
        case 'UNTIL':
            key, jump_loc = self.compile_stack.pop()
            if key != 'BEGIN':
                raise SyntaxError(f'UNTIL without BEGIN')
            until = self.find_word('*UNTIL')
            word_list.append(until)
            word_list.append(jump_loc - len(word_list) - 1)
        case 'DO':
            self.compile_stack.push(('DO', len(word_list)))
            word_list.append(self.find_word('*DO'))
        case 'LOOP':
            key, jump_loc = self.compile_stack.pop()
            if key != 'DO':
                raise SyntaxError(f'LOOP without DO')
            loop = self.find_word('*LOOP')
            word_list.append(loop)
            word_list.append(jump_loc - len(word_list))
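(One aside before we judge it: compile_stack, used throughout, is just a push/pop stack. Its class isn’t shown in this article; here’s a minimal sketch of the interface the code above assumes, a thin wrapper over a Python list:)

class Stack:
    # minimal sketch: just the operations compile_action_word relies on
    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        return self.items.pop()

    def is_empty(self):
        return not self.items

    def is_not_empty(self):
        return bool(self.items)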
That compile_action_word method simply will not do. It’s hard to find the token we care about amid all that mess. A slightly lesser concern is that the individual cases have strange similarities and differences, and it’s hard to tell which differences are germane and which are just cosmetic.
We must fix this. Well, we choose to fix this. There is no must. I’m not some ancient evil grandfather or other relative or “authority” who tells you what you must do, lest you lose your honor or little badge or something. I don’t even tell myself what I must do. I choose to do things based on my best judgement at the time, fortified by ideas and advice from all over. We choose to fix this.
Stepping back, we note that the tokens in this method are the ones that take action immediately, during the compilation process, as opposed to the words that are just plunked into the word we’re compiling. What happened, as I recall it, was that when it was time to do the first one of those, ‘IF’, I just decided to check for it with a literal compare and then do what was needed. Naturally, I had to do THEN the same way. After that, it just growed1.
Now yesterday at about this time, I was planning to refactor compile_action_word, probably by inlining all the current sub-methods and then making the branches look more alike and then factoring better sub-methods back out. That was because that is what I might generally do when faced with a big match-case switch like that.
The match-case here is just a dispatcher from a name like ‘IF’ to the code for the if notion, and so on. But we already have a dispatcher. (In fact, we have at least two. We’ll come back to that.)
Our lexicon is a collection of named “words”, which are looked up when being compiled, and which are called during Forth execution, using the do method. (I think we’ll come back to that, as well.)
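(find_word does that lookup. Its body isn’t shown in this article; a minimal sketch, assuming the lexicon is a simple list searched by name:)

def find_word(self, name):
    # linear search of the lexicon; None if the name is unknown
    for word in self.lexicon:
        if word.name == name:
            return word
    return None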
Last night, I was thinking about a completely different approach to Forth, which we might explore at some future time. Somehow it came back to me that Forth had the notion of “immediate” words, which were executed immediately during compilation, as we do with ours, but that the original Forth treated those just like any other words, except that they had an “immediate” bit set, which told the compiler to run them, not just tuck them into the word being built.
You probably get the idea now. We’ll add an “immediate” flag, we’ll look up the input token in the lexicon, and if we find it, we’ll check its immediate bit. If it is set, we’ll call it; otherwise we’ll add it to our growing word. Each branch of our match-case will turn into a method called by the word, as are our other PrimaryWords.
(Strictly speaking, most of our primary words are expressed directly in the word instance, as a lambda, but that lambda can invoke any method if it wants to.)
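(For instance, definitions along these lines, hypothetical words rather than ones from our actual lexicon, show both styles:)

# hypothetical definitions showing the two lambda styles
lex.append(PrimaryWord('DUP', lambda f: f.stack.append(f.stack[-1])))  # work done inline
lex.append(PrimaryWord('SWAP', lambda f: f.swap()))                    # defers to a method on Forth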
We’ll proceed roughly like this:
- Provide an immediate flag in our Word classes. Default it to False.
- Modify compile_a_word to look for immediates and distinguish them from non-immediates.
- Pick one token to be the first one to use the new scheme.
- Make it work.
- Rinse, repeat until all are done.
- See what further refactoring is desirable.
Let’s get to it
class PrimaryWord:
    def __init__(self, name, code, immediate=False):
        self.name = name
        self.code = code
        self.immediate = immediate


class SecondaryWord:
    def __init__(self, name, word_list, immediate=False):
        self.name = name
        self.words = word_list
        self.immediate = immediate
        self.pc = 0
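(A quick sanity test, mine rather than one from the article’s suite, that the new flag defaults correctly:)

def test_immediate_flag_defaults_to_false():
    # hypothetical test: words are not immediate unless we say so
    plain = PrimaryWord('DUP', lambda f: None)
    marked = PrimaryWord('IF', lambda f: None, immediate=True)
    assert plain.immediate is False
    assert marked.immediate is True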
Green. Commit: working toward immediate words.
Now in compile_a_word, we start with this:
def compile_a_word(self):
    word_list = []
    while True:
        token = self.next_token()
        if token in self.action_tokens:
            self.compile_action_word(token, word_list)
        elif (definition := self.find_word(token)) is not None:
            word_list.append(definition)
        elif (num := self.compile_number(token)) is not None:
            self.append_number(num, word_list)
        else:
            raise SyntaxError(f'Syntax error: "{token}" unrecognized')
        if self.compile_stack.is_empty():
            break
    return SecondaryWord('nameless', word_list)
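(compile_number and append_number aren’t shown in this article. For orientation, a plausible minimal sketch of the first, assuming integer literals only:)

def compile_number(self, token):
    # sketch: return the integer value, or None if the token isn't a number
    try:
        return int(token)
    except ValueError:
        return None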
Now we “know” that we’re going to be moving one token at a time from being handled in compile_action_word to an immediate word. I think we’d be wise to reverse the order of the first two ifs, so that the new word will automatically take precedence. Otherwise we’d have to remember to remove the token from action_tokens.
def compile_a_word(self):
    word_list = []
    while True:
        token = self.next_token()
        if (definition := self.find_word(token)) is not None:
            word_list.append(definition)
        elif token in self.action_tokens:
            self.compile_action_word(token, word_list)
        elif (num := self.compile_number(token)) is not None:
            self.append_number(num, word_list)
        else:
            raise SyntaxError(f'Syntax error: "{token}" unrecognized')
        if self.compile_stack.is_empty():
            break
    return SecondaryWord('nameless', word_list)
Green. Commit ditto.
Now let’s modify the code where we have found the definition, to detect the immediate case. I don’t know what to code there yet, so I’ll just raise an exception for now. I want to get the rough code in place where we can see it.
def compile_a_word(self):
    word_list = []
    while True:
        token = self.next_token()
        if (definition := self.find_word(token)) is not None:
            if definition.immediate:
                raise SyntaxError('not supported')
            else:
                word_list.append(definition)
        elif ...
Fine. Commit ditto.
Now we need to pick a word from compile_action_word to be our lab rat2.
def compile_action_word(self, word, word_list):
    match word:
        case ':':
            ...
        case ';':
            ...
        case 'IF':
            self.compile_conditional('*IF', word_list)
        case 'THEN':
            self.patch_the_skip(['*IF', '*ELSE'], -1, word_list)
        ...
IF is pretty clearly our baby: it has a simple call. And now we encounter our first glitch: this code, and pretty much every one of the immediates, needs access to the word_list that we are building up.
I can think of two options: rework the Word classes to accept another parameter, the word_list, which most of them will ignore but the immediates will not … or, promote the word list to a member variable in Forth class, which will give them all access to it.
The first way is arguably better in that it keeps the Forth class more narrow. But it would involve changing all the words all at once. Let’s not. We’ll choose option two.
def compile_a_word(self):
    word_list = []
    while True:
        token = self.next_token()
        if (definition := self.find_word(token)) is not None:
            if definition.immediate:
                raise SyntaxError('not supported')
            else:
                word_list.append(definition)
        elif token in self.action_tokens:
            self.compile_action_word(token, word_list)
        elif (num := self.compile_number(token)) is not None:
            self.append_number(num, word_list)
        else:
            raise SyntaxError(f'Syntax error: "{token}" unrecognized')
        if self.compile_stack.is_empty():
            break
    return SecondaryWord('nameless', word_list)
That becomes …
def compile_a_word(self):
    self.word_list = []
    while True:
        token = self.next_token()
        if (definition := self.find_word(token)) is not None:
            if definition.immediate:
                raise SyntaxError('not supported')
            else:
                self.word_list.append(definition)
        elif token in self.action_tokens:
            self.compile_action_word(token, self.word_list)
        elif (num := self.compile_number(token)) is not None:
            self.append_number(num, self.word_list)
        else:
            raise SyntaxError(f'Syntax error: "{token}" unrecognized')
        if self.compile_stack.is_empty():
            break
    return SecondaryWord('nameless', self.word_list)
Green, commit.
Now I think we can create an immediate word that will work. I add this to the definitions:
lex.append(PrimaryWord('IF', lambda f: f.imm_if(), immediate=True))
Some tests break. They all have to do with IF. This might not be a surprise. We need the method imm_if:
def imm_if(self):
    self.compile_conditional('*IF', self.word_list)
And of course if I had looked at the tests I’d have been reminded of this:
    if (definition := self.find_word(token)) is not None:
        if definition.immediate:
>           raise SyntaxError('not supported')
E           SyntaxError: not supported
But I believe we know what we have to do now:
def compile_a_word(self):
    self.word_list = []
    while True:
        token = self.next_token()
        if (definition := self.find_word(token)) is not None:
            if definition.immediate:
                definition.do(self)
            else:
                self.word_list.append(definition)
        ...
And we are green. Commit: IF converted to a PrimaryWord.
We can, if we wish, remove the case from the switch, and remove the token from the list of action words. We’ll do that: it is a sensible, orderly way to proceed.
Let’s briefly reflect. We may have learned something.
Reflection
Well, the good news, of course, is that this is rather clearly going to work. But there is more. If we look at the details of how we did it, we created a new method, named imm_if, that contained the exact same code as the ‘IF’ case in compile_action_word.
Therefore, if we just extract each case’s code to a method with a name like imm_if, we will be perfectly set up to define the new words and have them take up the job.
Let’s try one that way.
case 'THEN':
    self.patch_the_skip(['*IF', '*ELSE'], -1, word_list)
Extract method:
case 'THEN':
    self.imm_then(word_list)
Define the word:
lex.append(PrimaryWord('THEN', lambda f: f.imm_then(), immediate=True))
Doesn’t quite work. It’s not really an extract method, because we want our imm method not to take the parameter. I think PyCharm might help with that. For THEN, I have this:
def imm_then(self):
    self.patch_the_skip(['*IF', '*ELSE'], -1, self.word_list)
Nearly good. Try another simple one.
case 'BEGIN':
    self.compile_stack.push(('BEGIN', len(word_list)))
def imm_begin(self):
    self.compile_stack.push(('BEGIN', len(self.word_list)))
lex.append(PrimaryWord('BEGIN', lambda f: f.imm_begin(), immediate=True))
OK, this is pretty much rote now. I’ll just do the rest and check back in here if (when) something goes wrong.
Ha. Done. Here’s what’s left of the match-case. I added the default just so it would compile, so I could show it to you:
def compile_action_word(self, word, word_list):
    match word:
        case _:
            pass
The new lexicon-defining method is this:
def define_immediates(self, lex):
    lex.append(PrimaryWord(':', lambda f: f.imm_colon(), immediate=True))
    lex.append(PrimaryWord(';', lambda f: f.imm_semi(), immediate=True))
    lex.append(PrimaryWord('IF', lambda f: f.imm_if(), immediate=True))
    lex.append(PrimaryWord('ELSE', lambda f: f.imm_else(), immediate=True))
    lex.append(PrimaryWord('THEN', lambda f: f.imm_then(), immediate=True))
    lex.append(PrimaryWord('BEGIN', lambda f: f.imm_begin(), immediate=True))
    lex.append(PrimaryWord('UNTIL', lambda f: f.imm_until(), immediate=True))
    lex.append(PrimaryWord('DO', lambda f: f.imm_do(), immediate=True))
    lex.append(PrimaryWord('LOOP', lambda f: f.imm_loop(), immediate=True))
I remove the action word list, the check for action words, and the now unused and empty compile_action_word method. Commit: all action words converted to immediate.
This will be a great place to pause and sum up. There are a few topics to touch upon, and there is still some detailed refactoring that we might choose to do later.
Reflection
We have replaced that nasty compile_action_word with the define_immediates method above, which runs once and for all at the time we create a Forth instance. Now it is true that the code for each case still exists: it’s broken out into the individual imm_xyz methods. And at least a couple of those could use a little refactoring. There is a certain vague sense of similarity here, for example:
def imm_loop(self):
    key, jump_loc = self.compile_stack.pop()
    if key != 'DO':
        raise SyntaxError(f'LOOP without DO')
    loop = self.find_word('*LOOP')
    self.word_list.append(loop)
    self.word_list.append(jump_loc - len(self.word_list))


def imm_until(self):
    key, jump_loc = self.compile_stack.pop()
    if key != 'BEGIN':
        raise SyntaxError(f'UNTIL without BEGIN')
    until = self.find_word('*UNTIL')
    self.word_list.append(until)
    self.word_list.append(jump_loc - len(self.word_list) - 1)
We know how to deal with that, and surely we will, in due time.
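(Just to gesture at the shape such a refactoring might take, a hedged sketch, not code we have actually written: the duplicated pop-check-append dance could become one helper, parameterized by the key, the star word, and the extra offset:)

def compile_backward_jump(self, expected_key, star_name, extra_offset):
    # hypothetical helper: the shared body of imm_loop and imm_until
    key, jump_loc = self.compile_stack.pop()
    if key != expected_key:
        raise SyntaxError(f'{star_name[1:]} without {expected_key}')
    self.word_list.append(self.find_word(star_name))
    self.word_list.append(jump_loc - len(self.word_list) + extra_offset)

def imm_loop(self):
    self.compile_backward_jump('DO', '*LOOP', 0)

def imm_until(self):
    self.compile_backward_jump('BEGIN', '*UNTIL', -1)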
We have not reduced the line count of the file; in fact we have probably increased it by a few lines, mostly blank. The Forth class is now 238 lines counting white space, and it has 37 methods counting __init__. Average method length is about 5.5 lines, not too shabby, but 37 is a lot. Maybe we can do something about that. Here is how those 37 break down:
| type | count |
|---|---|
| lexicon | 7 |
| immediate | 9 |
| star | 5 |
| i j k | 3 |
| other | 13 |
“Star” methods are primary word methods that can’t be expressed using Python’s weak lambda. And “i j k” are the primary word methods for fetching loop indices. We only have ‘I’ so far, but J and K will come along.
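(The “weak lambda” complaint is just that a Python lambda is limited to a single expression. Anything needing statements gets a def; a hypothetical illustration, not our actual code:)

# fits in a lambda: a single expression
lex.append(PrimaryWord('DUP', lambda f: f.stack.append(f.stack[-1])))

# won't fit in a lambda: assignments are statements, so we need a def
def star_swap(f):
    top = f.stack.pop()
    second = f.stack.pop()
    f.stack.append(top)
    f.stack.append(second)

lex.append(PrimaryWord('SWAP', lambda f: star_swap(f)))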
I do think 37 methods is a bit much, and we’ll look to see whether we can ship some of them off to helper classes or something.
I was going to speak about dispatching, and the do method. I mentioned that we have at least two ways of dispatching. There are at least five that come to mind:
- If statements;
- Match-case statements;
- The lexicon and its lambdas, used this morning to our delight;
- Python’s __call__ capability;
- Python’s built-in class/method dispatching.
The “two” that I had in mind up above were the lexicon, __call__, and method dispatching3. We converted from match-case to the lexicon, which uses lambdas to either do what needs to be done, or to call a method that needs to be called.
When a SecondaryWord is being executed, we call do on each of its contained Words, until finally we wind down and call do on a PrimaryWord, which actually does something and then returns. It is an odd characteristic of Forth that the SecondaryWords never actually do anything: everything that happens is down to what the PrimaryWords do. Anyway, do calls the code of the PrimaryWord, which is the lambda inside it. So there is a call and a call. I think we can probably save one level of calling there, by using Python’s __call__ capability, and I think we can possibly save more than that, if we can get Python’s method dispatch involved sooner.
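(To make the __call__ idea concrete, a hedged sketch, my speculation rather than anything we have written, of what the change might look like:)

class PrimaryWord:
    def __init__(self, name, code, immediate=False):
        self.name = name
        self.code = code
        self.immediate = immediate

    def __call__(self, forth):
        # word(forth) now works directly, with no do() in between
        self.code(forth)

Callers would then write word(forth) instead of word.do(forth). Another route, since code is already a callable, would be to invoke definition.code(self) directly, which genuinely drops one frame from the chain. We will see.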
Anyway, the sole point here is that there are quite a few different dispatch methods. Today we changed from if statements to a table lookup and a do method invoking a lambda. The result was more compact code that is no less clear, and quite likely more clear, than what we had.
I feel pretty sure that we can take a link out of that call chain, perhaps even more than one. We’ll consider that another day. Probably next year.
Summary
By folding our very messy compile_action_word into the lexicon, we have managed to remove that method completely, and to process all the immediate words by simply looking them up in the lexicon and calling them. This is quite nifty, in my view.
It does have an aspect that some folx will not prefer: it used to be that all the immediate actions were inside that one messy method. At least we knew where they all were, and I know that some people like things that way. I myself do not. I prefer to look at code only when it is in my “thought path”, so that large constructions like the old compile_action_word are just in the way, if there is a faster, more direct way to find what I want, and to avoid the code that is not germane.
For example, if I want to know how ‘ELSE’ works, this line:
lex.append(PrimaryWord('ELSE', lambda f: f.imm_else(), immediate=True))
… points to imm_else, and Command+B takes me there:
def imm_else(self):
    self.patch_the_skip(['*IF'], 1, self.word_list)
    self.compile_conditional('*ELSE', self.word_list)
I can explore further or not, but in either case I’m not distracted or confused by all the other lines that used to surround that code, where I might finally find, among the 44 lines that surround it:
case 'IF':
    self.compile_conditional('*IF', word_list)
case 'THEN':
    self.patch_the_skip(['*IF', '*ELSE'], -1, word_list)
case 'ELSE':
    self.patch_the_skip(['*IF'], 1, word_list)
    self.compile_conditional('*ELSE', word_list)
case 'BEGIN':
    self.compile_stack.push(('BEGIN', len(word_list)))
It does take a while to come to prefer tiny methods and you may or may not have reached that point. You may freely choose never to reach it, though I hope you’ll try the idea, and all kinds of ideas, because once in a while we can make our lives better with simple changes to what we do.
And that, in my view, is what a good practitioner of our craft does: continually try new things, keeping the ones that help and dropping the ones that do not.
Today, we converted a bunch of code to data, and broke a large method into 9 tiny ones. This pleases me, and I hope that it pleases or at least entertains you.
Wishes
I hope your new year is a happy one, that things don’t go as badly as it currently seems they might, and that, in any case, you’ll join me here for more programming fun.
See you next year!