An Idea

The Robot World Repo on GitHub
The Forth Repo on GitHub

#StopTheCoup! We’re working toward an INCLUDE word to read Forth from a file, where we’d save the definitions for whatever app we’re writing. I think we need an inversion of control.

Note: This article got very long. There is a rest stop in the middle. If you choose to rest, when you come back, click here to continue.

As things stand now, we are driving our Forth by sending it things to do, via its compile method, or that method’s subordinate, process_line.

class Forth:
    def compile(self, text):
        try:
            self.process_line(text)
        except Exception as e:
            msg = str(e)
            if msg == 'Unexpected end of input':
                return '...'
            else:
                self.abend()
                return f'{e} ?'
        return 'ok'

    def process_line(self, text):
        self.input_line = re.sub(r'\(.*?\)', ' ', text)
        while self.input_line:
            self.process_token(self.next_token())
        if self.compilation_state:
            raise ValueError('Unexpected end of input')

Our main, for example, looks like this:

if __name__ == '__main__':
    forth = Forth()
    prompt = 'Forth> '
    while True:
        result = forth.compile(input(prompt))
        if forth.compilation_state:
            prompt = '...'
        else:
            print(result)
            prompt = 'Forth> '

And tests use either compile or process_line:

    def test_conditionals(self):
        f = Forth()
        f.process_line(' 1 1 = ')
        assert f.stack.pop() == f.true
        f.process_line(' 2 1 = ')
        assert f.stack.pop() == f.false
        ...

Now, this works well enough, and it seems clear that when we provide INCLUDE, to read from a file, we can read the file and push lines into compile or process_line, and make everything work. But there’s a problem, and, I think, an opportunity.

A classical Forth is in charge of the computer. It isn’t told what to do, it demands a token (from somewhere) and executes (or compiles) based on that token. Then it gets another. Forever, until it is interrupted or gets a special token that tells it to stop.

I think that we should invert the control in our Forth, making the token-processing loop the top of things, and making the token providers subordinate to the demand for tokens.

Notice the noun!: See me saying “token providers”? That idea, that noun, is a strong clue to what we need to do: we need to have a TokenProvider class (or group of classes, StringTokenProvider, FileTokenProvider, KeyboardTokenProvider) and give one or more of those to our Forth’s greedy token-getting loop.

As we think or talk about things to do, we can often help our design take shape by noting the noun phrases and verb phrases we use in our mind or on paper. Nouns may be hinting at objects, and verbs at methods.
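To make the noun a little more concrete, here is a minimal sketch of what "a token provider" could mean in Python's duck-typing terms. The method names has_tokens and next_token, the ListTokenProvider class, and the drain function are placeholders chosen for illustration, not code from the Forth repo.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TokenProvider(Protocol):
    """Hypothetical protocol: anything the token loop can pull tokens from."""

    def has_tokens(self) -> bool: ...

    def next_token(self) -> str: ...


class ListTokenProvider:
    """Illustrative provider backed by a plain list of tokens."""

    def __init__(self, tokens):
        self._tokens = list(tokens)

    def has_tokens(self) -> bool:
        return bool(self._tokens)

    def next_token(self) -> str:
        # Upper-case the token on the way out.
        return self._tokens.pop(0).upper()


def drain(provider: TokenProvider) -> list:
    """Stand-in for the greedy loop: keep demanding tokens until none remain."""
    seen = []
    while provider.has_tokens():
        seen.append(provider.next_token())
    return seen
```

Note that ListTokenProvider never inherits from TokenProvider. It conforms simply by having the two methods, which is the property that lets different kinds of provider be swapped in and out of the same loop.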

We’d like to do this in small steps, never breaking Forth: that almost goes without saying, except that I like to emphasize it for myself every chance I get, because I am prone to wild leaps of the imagination, and wild leaps in code often lead to long screaming plummeting into the void. Small steps, Ron, small steps.

Let’s look at how our Forth loop works now, and how it gets its tokens. We just changed that yesterday, but that doesn’t mean that my memory of what it does is perfect: it never is.

class Forth:
    def process_line(self, text):
        self.input_line = re.sub(r'\(.*?\)', ' ', text)
        while self.input_line:
            self.process_token(self.next_token())
        if self.compilation_state:
            raise ValueError('Unexpected end of input')

    def process_token(self, token):
        definition = self.get_definition(token)
        if not self.compilation_state or definition.immediate:
            definition(self)
        else:
            self.append_word(definition)

    def next_token(self):
        trimmed = self.input_line.strip()
        index = trimmed.find(' ')
        if index == -1:
            token, self.input_line = trimmed.upper(), ''
        else:
            token, self.input_line = trimmed[:index].upper(), trimmed[index+1:].strip()
        return token

process_line contains the token-greedy code: it keeps calling next_token until input_line goes empty, at which point the loop exits and, if we are still in compilation state, raises an exception. Looking at compile above, we’re reminded that that method turns the exception into a return string, or returns ok. We’ll need to deal with that sort of thing somewhere. For now, let’s keep after the token provider notion.

Let’s imagine a TokenProvider object with two methods: has_tokens and next_token. The loop in process_line would be changed just a bit:

class Forth:
    def process_line(self, text):
        self.input_line = re.sub(r'\(.*?\)', ' ', text)
        while self.provider.has_tokens():
            self.process_token(self.provider.next_token())
        if self.compilation_state:
            raise ValueError('Unexpected end of input')

Let’s make that work. We can use a trick, setting self.provider to self:

class Forth:
    def __init__(self):
        self.abend()
        self.lexicon = Lexicon()
        self.lexicon.define_primaries(self)

    # noinspection PyAttributeOutsideInit
    def abend(self):
        self.active_words = Stack()
        self.compile_stack = Stack()
        self.compilation_state = False
        self.c_stack_top = None
        self.heap = Heap()
        self.input_line = ''
        self.provider = self  # <===
        self.return_stack = Stack()
        self.stack = Stack()
        self.word_list = []
        self.stack.clear()

Then we implement has_tokens on Forth:

class Forth:
    def has_tokens(self):
        return self.input_line 

Now we can make process_line feel that it is in control:

    def process_line(self, text):
        self.input_line = re.sub(r'\(.*?\)', ' ', text)
        while self.provider.has_tokens():
            self.process_token(self.provider.next_token())
        if self.compilation_state:
            raise ValueError('Unexpected end of input')

We’re now using Forth as its own provider. We’re green. Let’s commit: moving toward token provider model.

What are we doing here? I’m sure there’s a pattern with a name for what we just did, although it’s the sort of thing we can do more readily in a duck-typing language like Python than we could in Java (where everything is harder) or Kotlin (where you’d better conform).

Now, we could TDD a little class, StringProvider. And I guess we should, although I think we could just as easily create it and plug it in.

I’m going to take the risky path without additional tests for it. No, I’ve talked myself out of it. Why? Because while we can clearly do this easy one, later providers will need to conform to the same protocol, and having tests set up for that protocol will be valuable. So here we go:

class TestProviders:
    def test_hookup(self):
        assert False

Red. Hooked up. Now a real test:

class TestProviders:
    def test_string_provider(self):
        provider = StringProvider('abc def ghi')
        assert provider.has_tokens()
        assert provider.next_token() == 'ABC'
        assert provider.has_tokens()
        assert provider.next_token() == 'DEF'
        assert provider.has_tokens()
        assert provider.next_token() == 'GHI'

And I basically cut and paste to get this class:

class StringProvider:
    def __init__(self, text):
        self.input_line = text

    def has_tokens(self):
        return self.input_line

    def next_token(self):
        trimmed = self.input_line.strip()
        index = trimmed.find(' ')
        if index == -1:
            token, self.input_line = trimmed.upper(), ''
        else:
            token, self.input_line = trimmed[:index].upper(), trimmed[index+1:].strip()
        return token

My test is of course green. Let’s move StringProvider to prod and use it. First, though, commit: StringProvider works in test.

Moved. Commit again: move StringProvider to source side.

Now use it in Forth:

    def __init__(self):
        self.abend()
        self.lexicon = Lexicon()
        self.lexicon.define_primaries(self)

    # noinspection PyAttributeOutsideInit
    def abend(self):
        self.active_words = Stack()
        self.compile_stack = Stack()
        self.compilation_state = False
        self.c_stack_top = None
        self.heap = Heap()
        self.provider = StringProvider('')
        self.return_stack = Stack()
        self.stack = Stack()
        self.word_list = []
        self.stack.clear()

That breaks about 50 or 60 tests, no surprise there. But now:

    def process_line(self, text):
        clean_line = re.sub(r'\(.*?\)', ' ', text)
        self.provider = StringProvider(clean_line)
        while self.provider.has_tokens():
            self.process_token(self.provider.next_token())
        if self.compilation_state:
            raise ValueError('Unexpected end of input')

    def next_token(self):
        return self.provider.next_token()

The next_token method is required because some words want the next token and ask their Forth instance for it.
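As a rough illustration of why that forwarding method matters, here is a heavily reduced sketch. MiniForth, ListProvider, the run method, and the VARIABLE-like word are all hypothetical stand-ins rather than the real classes. The point is only that a word which needs the token following it asks Forth, and Forth passes the request on to whatever provider is current.

```python
from collections import deque


class ListProvider:
    """Illustrative stand-in for StringProvider, fed pre-split tokens."""

    def __init__(self, tokens):
        self.tokens = deque(tokens)

    def has_tokens(self):
        return bool(self.tokens)

    def next_token(self):
        return self.tokens.popleft()


class MiniForth:
    """Hypothetical, heavily reduced Forth: just enough to show the delegation."""

    def __init__(self):
        self.provider = ListProvider([])
        self.names = []
        self.words = {'VARIABLE': self.define_variable}

    def next_token(self):
        # Words ask Forth, and Forth forwards the request to its provider.
        return self.provider.next_token()

    def define_variable(self):
        # A parsing word: it consumes the token that follows it as a name.
        self.names.append(self.next_token())

    def run(self, provider):
        self.provider = provider
        while self.provider.has_tokens():
            self.words[self.provider.next_token()]()


forth = MiniForth()
forth.run(ListProvider(['VARIABLE', 'X', 'VARIABLE', 'Y']))
print(forth.names)  # ['X', 'Y']
```

Because the word goes through forth.next_token rather than holding its own reference to a provider, it keeps working no matter which provider the loop is using at that moment.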

We are green. Commit: Forth using provider logic in process_line.

However, although process_line thinks it is in charge, it is really still running subordinate to compile.

I’m not sure yet how we’ll sort that out. Instead of dealing with that directly, let’s look at main and see how we might give it a KeyboardProvider. (I am starting to think that we should create Forth with a provider, or at least call it with one. We’ll find our way.)

Note: I want to talk about this “finding our way” thing. Yes, I surely could draw a couple of pictures, scribble some notes, mumble to my friends, and figure out just how all the providers will be assembled and just how the Forth object should deal with this new situation. And what I came up with would be pretty close to right, and we’d sort out any quibbles as we implement the idea.

I choose not to do that “big” design, though of course I’m always thinking. Instead, I take small steps that seem to me to be “in the direction” of where we want to be. I seem always to find small steps that keep the program working, and I don’t usually find myself with a lot of changes to make to get somewhere where I might have gone directly.

For me, this approach works very nicely. I keep most of my detailed ideas in the code, as soon as I can see how to make them even somewhat concrete, and I use the less detailed ideas as guides as to where to go.

Working with main will, I think, drive out a keyboard provider and give us a sense of where that can be provided. I think we’ll find that we want to refactor process_line … and ultimately, perhaps, to eliminate that method entirely, other than perhaps as a testing method.

Let’s see what happens.

Here’s main again:

if __name__ == '__main__':
    forth = Forth()
    prompt = 'Forth> '
    while True:
        result = forth.compile(input(prompt))
        if forth.compilation_state:
            prompt = '...'
        else:
            print(result)
            prompt = 'Forth> '

I think that what we need here is a KeyboardProvider that will work like that loop, deferring the looping down to Forth’s token-getting loop. We’ll have some rigmarole to sort out around the prompting, perhaps, but we will have the forth instance to look at. It’s just that our KeyboardProvider will know too much for a while.

Let’s guess what main will look like given a KeyboardProvider. Something like this:

if __name__ == '__main__':
    forth = Forth()
    forth.main_loop(KeyboardProvider(forth))

You can see why I imagine that we might create Forth with a provider. Anyway, let’s see about writing that class. Can we TDD this thing? We’ll have to fake its input method somehow, since we can’t be going to the real keyboard during a unit test.

We’ll try. I am tempted just to code it up and make it work. But we do have that nice test file, so we can at least try:

    def test_keyboard_provider(self):
        provider = KeyboardProvider()
        provider.input_line = 'abc def ghi'
        assert provider.has_tokens()
        assert provider.next_token() == 'ABC'
        assert provider.has_tokens()
        assert provider.next_token() == 'DEF'
        assert provider.has_tokens()
        assert provider.next_token() == 'GHI'
        assert provider.has_tokens() # always ready

Same test except that KeyboardProvider always has tokens. I get about this far with the implementation and have an idea:

class KeyboardProvider:
    def __init__(self):
        self.input_line = ''

    def has_tokens(self):
        return True

    def next_token(self):

I think that KeyboardProvider would like to use StringProvider to do its work. And quite possibly, all the other providers will feel that way as well. Let’s try it …

class KeyboardProvider:
    def __init__(self):
        self.provider = StringProvider('')

    def has_tokens(self):
        return True

    def next_token(self):
        if not self.provider.has_tokens():
            self.set_line(input('Forth>'))
        return self.provider.next_token()

    def set_line(self, line):
        self.provider = StringProvider(line)

The test is green with the change to use set_line:

    def test_keyboard_provider(self):
        provider = KeyboardProvider()
        provider.set_line('abc def ghi')
        assert provider.has_tokens()
        assert provider.next_token() == 'ABC'
        ...

I’ll move the class to source and then commit: KeyboardProvider passes tests.
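For what it's worth, another way to keep the real keyboard out of a unit test is to inject the function that reads a line. What follows is only a sketch under assumptions: InjectedKeyboardProvider, its read_line parameter, and the scripted helper are hypothetical names. It also splits lines with str.split instead of delegating to StringProvider, and it skips blank lines, both of which differ from the class above.

```python
class InjectedKeyboardProvider:
    """Hypothetical variant of KeyboardProvider whose line source is injected."""

    def __init__(self, read_line=input):
        self.read_line = read_line  # any callable: takes a prompt, returns a line
        self.pending = []           # tokens left over from the current line

    def has_tokens(self):
        return True  # a keyboard is always willing to supply more

    def next_token(self):
        # Keep asking for lines until one actually contains a token.
        while not self.pending:
            self.pending = self.read_line('Forth>').upper().split()
        return self.pending.pop(0)


def scripted(lines):
    """Build a fake read_line that replays canned lines, ignoring the prompt."""
    source = iter(lines)
    return lambda prompt: next(source)


provider = InjectedKeyboardProvider(scripted(['abc def', '', 'ghi']))
tokens = [provider.next_token() for _ in range(3)]
print(tokens)  # ['ABC', 'DEF', 'GHI']
```

In production the default read_line=input leaves behavior at the keyboard unchanged, while a test can hand in any callable that returns canned lines.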

Interim Summary (Rest Stop)

This would be a good place for you to pause, if you’re tired. The article will continue after this summary. I’ll put a link to the continuation up at the top so you can find your way back if you do decide to break.

We have done a couple of nice moves here. One was to set up the Forth class as its own Provider, which allowed us to modify the class to expect a Provider, at first just itself, but then, quickly a StringProvider class. In two simple moves we got the provider in place. I am proud of that first move. I’ll pause a moment to bask in your admiration for that simple but strong move.

More important, we now have a working but incomplete KeyboardProvider. In what follows, we’ll be dealing with using it in main, getting it to prompt better, and then to use string results from forth to report ok or errors. That will entail changing where and how we deal with exceptions. And we don’t quite get done, leaving a few tests skipped until next time, because I get tired and know it would be unwise to continue without a break.

So if you wish, take a break now. Otherwise …

Continuing

Now how can we plug this new thing into main and test that by hand?

if __name__ == '__main__':
    forth = Forth()
    prompt = 'Forth> '
    while True:
        result = forth.compile(input(prompt))
        if forth.compilation_state:
            prompt = '...'
        else:
            print(result)
            prompt = 'Forth> '

Ah, I forgot that I want to pass a Forth to the KeyboardProvider. We’ll deal with that along the way.

if __name__ == '__main__':
    forth = Forth()
    provider = KeyboardProvider(forth)
    forth.main_loop(provider)

This is red for lack of a parameter on KeyboardProvider, and no main_loop method in Forth.

class KeyboardProvider:
    def __init__(self, forth=None):
        self.provider = StringProvider('')
        self.forth = forth

That will hold for now. And we refactor this:

    def process_line(self, text):
        clean_line = re.sub(r'\(.*?\)', ' ', text)
        self.provider = StringProvider(clean_line)
        while self.provider.has_tokens():
            self.process_token(self.provider.next_token())
        if self.compilation_state:
            raise ValueError('Unexpected end of input')

Via Extract Method, to this:

    def process_line(self, text):
        clean_line = re.sub(r'\(.*?\)', ' ', text)
        self.provider = StringProvider(clean_line)
        self.main_loop()

    def main_loop(self):
        while self.provider.has_tokens():
            self.process_token(self.provider.next_token())
        if self.compilation_state:
            raise ValueError('Unexpected end of input')

We’re green and could commit this. Let’s do: refactoring to get main loop.

But we want main_loop to take the provider as a parameter, though it still needs to be a member variable as well.

    def process_line(self, text):
        clean_line = re.sub(r'\(.*?\)', ' ', text)
        provider = StringProvider(clean_line)
        self.main_loop(provider)

    def main_loop(self, provider):
        self.provider = provider
        while self.provider.has_tokens():
            self.process_token(self.provider.next_token())
        if self.compilation_state:
            raise ValueError('Unexpected end of input')

Green. Commit: main_loop takes provider and saves it.

Let’s see what main does now.

/Users/ron/PycharmProjects/FORTH/.venv/bin/python /Users/ron/PycharmProjects/FORTH/main.py 
Forth> 2 2 + .
4 Forth>: c 32 - 5 * 9 / . ;
Forth>34 c
1 Forth>-40 c
-40 Forth>

That’s good. However, when I type a bare return, I get chaos:

Traceback (most recent call last):
  File "/Users/ron/PycharmProjects/FORTH/main.py", line 9, in <module>
    forth.main_loop(provider)
  File "/Users/ron/PycharmProjects/FORTH/source/forth.py", line 105, in main_loop
    self.process_token(self.provider.next_token())
  File "/Users/ron/PycharmProjects/FORTH/source/forth.py", line 110, in process_token
    definition = self.get_definition(token)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ron/PycharmProjects/FORTH/source/forth.py", line 122, in get_definition
    raise SyntaxError(f'Syntax error: "{token}" unrecognized')
SyntaxError: Syntax error: "" unrecognized

Process finished with exit code 1

I am not exactly sad about this. It was clear that we need to do something about quitting. However, we need to understand just what happened and to decide what to do instead.

When we just hit return, we get an empty line, and so the token we pull from it will also be empty.
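To see exactly where that empty token comes from, here is the StringProvider from earlier, copied into a standalone snippet and fed an empty line. The variable names in the demonstration at the bottom are just for illustration.

```python
class StringProvider:
    def __init__(self, text):
        self.input_line = text

    def has_tokens(self):
        return self.input_line

    def next_token(self):
        trimmed = self.input_line.strip()
        index = trimmed.find(' ')
        if index == -1:
            token, self.input_line = trimmed.upper(), ''
        else:
            token, self.input_line = trimmed[:index].upper(), trimmed[index+1:].strip()
        return token


# A bare return gives the provider an empty line ...
empty = StringProvider('')
empty_token = empty.next_token()
print(repr(empty_token))  # ''

# ... and a line of nothing but spaces behaves the same way, even though
# has_tokens() is truthy for it before the call.
blank = StringProvider('   ')
blank_claims_tokens = bool(blank.has_tokens())
blank_token = blank.next_token()
print(blank_claims_tokens, repr(blank_token))  # True ''
```

The KeyboardProvider only checks has_tokens before asking for a new line, not after, so whatever the fresh line yields, including the empty string, goes straight to the lexicon lookup.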

I suppose we could define a Word with an empty string as its name. Shall we try that?

    self.pw('', lambda f: None)

If this works, we’ll just get another Forth prompt in response to an empty line. We have good news and bad news:

Forth> 2 3 * .
6 Forth>
Forth> whee!
Traceback (most recent call last):
  File "/Users/ron/PycharmProjects/FORTH/main.py", line 9, in <module>
    forth.main_loop(provider)
  File "/Users/ron/PycharmProjects/FORTH/source/forth.py", line 105, in main_loop
    self.process_token(self.provider.next_token())
  File "/Users/ron/PycharmProjects/FORTH/source/forth.py", line 110, in process_token
    definition = self.get_definition(token)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/ron/PycharmProjects/FORTH/source/forth.py", line 122, in get_definition
    raise SyntaxError(f'Syntax error: "{token}" unrecognized')
SyntaxError: Syntax error: "WHEE!" unrecognized

Process finished with exit code 1

The good news: the blank line worked. The bad news: we would prefer a somewhat less abrupt reaction to an error.

We have a somewhat odd approach to errors, since we cannot just branch to error in Python. We throw exceptions, which are currently handled up in compile:

    def compile(self, text):
        try:
            self.process_line(text)
        except Exception as e:
            msg = str(e)
            if msg == 'Unexpected end of input':
                return '...'
            else:
                self.abend()
                return f'{e} ?'
        return 'ok'

We’re moving toward compile and process_line either going away or at worst, setting up providers for the main loop. Our current testing assumes that compile returns a string result, and that process_line will raise exceptions, which we actually check in various tests.

Let’s move toward always returning the string results, not the exceptions. That may require us to fiddle a bunch of tests. Let’s make the main work by fielding inside main_loop and then see what we see.

    def main_loop(self, provider):
        self.provider = provider
        try:
            while self.provider.has_tokens():
                self.process_token(self.provider.next_token())
            if self.compilation_state:
                raise ValueError('Unexpected end of input')
        except Exception as e:
            self.abend()
            return f'{e}? '
        return 'ok'

Ten tests are breaking, no surprise, but first let’s see if this works for main.

Forth> 7 9 * .
63 Forth>
Forth>whee!

Process finished with exit code 0

No. We want to stay in the loop! I don't instantly see my way out. Revert.
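The reason the process exits is worth spelling out: with the try wrapped around the whole while loop, the first exception leaves the loop for good, main_loop returns, and main has nothing more to do. A small sketch of the difference, using made-up function names and a plain list of tokens in place of a provider:

```python
def run_try_outside(tokens, process):
    """Handler wraps the whole loop: the first error ends the loop."""
    handled = []
    try:
        for token in tokens:
            process(token)
            handled.append(token)
    except ValueError as error:
        return handled, f'{error} ?'
    return handled, 'ok'


def run_try_inside(tokens, process):
    """Handler wraps one token: an error is recorded and the loop carries on."""
    handled, result = [], 'ok'
    for token in tokens:
        try:
            process(token)
        except ValueError as error:
            result = f'{error} ?'
        else:
            handled.append(token)
            result = 'ok'
    return handled, result


def process(token):
    # Hypothetical stand-in for process_token: one token is unrecognized.
    if token == 'WHEE!':
        raise ValueError(f'"{token}" unrecognized')


tokens = ['7', 'WHEE!', '9']
print(run_try_outside(tokens, process))  # (['7'], '"WHEE!" unrecognized ?')
print(run_try_inside(tokens, process))   # (['7', '9'], 'ok')
```

With the handler inside the loop, the token after the bad one still gets processed, which is the behavior we want from an interactive prompt.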

I go in an odd direction:

class KeyboardProvider:
    def next_token(self):
        if not self.provider.has_tokens():
            prompt = 'Forth>'
            if self.forth.compilation_state:
                prompt = '...'
            self.set_line(input(prompt))
        return self.provider.next_token()

I want to get a Forth> prompt in the usual case, and ... while a definition is still missing its semicolon.

Forth>2 2 + .
4 Forth>: c 32 - 5 * 9 /
...;
Forth>332 c .
166 Forth>32 c .
0 Forth>bye

Process finished with exit code 0

So that worked as expected. Let’s have our main loop set a forth.result, either OK or a message, but carry on in any case.

class Forth:
    def main_loop(self, provider):
        self.provider = provider
        while self.provider.has_tokens():
            self.result = 'ok'
            try:
                self.process_token(self.provider.next_token())
            except Exception as e:
                self.result = f'{e}'
        if self.compilation_state:
            raise ValueError('Unexpected end of input')

class KeyboardProvider:
    def next_token(self):
        if not self.provider.has_tokens():
            print(self.forth.result)
            prompt = 'Forth>'
            if self.forth.compilation_state:
                prompt = '...'
            self.set_line(input(prompt))
        return self.provider.next_token()

I’m not sure that final exception belongs there. We’ll check main now.

ok
Forth>2 2 +
ok
Forth>.
4 ok
Forth>: c 32 - 5 * 9 / ;
ok
Forth>32 c .
0 ok
Forth>
ok
Forth>whee
ok
Forth>I did not expect that
ok
Forth>

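We’re stepping on the result: the loop sets result back to ok at the top of each pass, before next_token gets a chance to print it. Code should be this: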

    def main_loop(self, provider):
        self.provider = provider
        while self.provider.has_tokens():
            try:
                self.process_token(self.provider.next_token())
            except Exception as e:
                self.result = f'{e}'
            else:
                self.result = 'ok'
        if self.compilation_state:
            raise ValueError('Unexpected end of input')

That gives me what I was looking for:

ok
Forth>2 2 + .
4 ok
Forth>whee
Syntax error: "WHEE" unrecognized
Forth>wahoo!
Syntax error: "WAHOO!" unrecognized
Forth>33 9 + .
42 ok
Forth>

Wow, this is getting super long. We should wrap up. But we have borked tests. Here’s a typical one:

    def test_br_target(self):
        f = Forth()
        test = ': TEST BR_TARGET ;'
        result = f.compile(test)
        result = f.compile('TEST')
        assert (result ==
                'branch not patched in '
                ': TEST BR_TARGET ; ?')

Could this be as simple as changing compile to return the result?

    def compile(self, text):
        self.process_line(text)
        return self.result

More tests are failing, but why?

Here’s the result from the one above:

Expected :'branch not patched in : TEST BR_TARGET ; ?'
Actual   :'branch not patched in : TEST BR_TARGET ;'

I think we can fix that:

    def main_loop(self, provider):
        self.provider = provider
        while self.provider.has_tokens():
            try:
                self.process_token(self.provider.next_token())
            except Exception as e:
                self.result = f'{e} ?'
            else:
                self.result = 'ok'
        if self.compilation_state:
            raise ValueError('Unexpected end of input')

Added space question mark to the result. The BR_TARGET test passes. Still lots of fails, though.

Many of them are failures to raise an exception. I think we’ll plan to fix those up. But some are different:

    def test_clears_compilation_state(self):
        f = Forth()
        msg = f.compile(': foo bar ;')
        assert msg == 'Syntax error: "BAR" unrecognized ?'
        assert f.compilation_state == False

We returned ‘ok’ from that one.

I am somewhat fried and should take a break. And this article is longer than any two or three should be.

Calling abend fixes some of the issues:

    def main_loop(self, provider):
        self.provider = provider
        while self.provider.has_tokens():
            try:
                self.process_token(self.provider.next_token())
            except Exception as e:
                self.result = f'{e} ?'
                self.abend()
            else:
                self.result = 'ok'
        if self.compilation_state:
            raise ValueError('Unexpected end of input')

Let’s see what the rest of the errors are. If they are all raises, I’m prepared to skip them and commit, and clean them up next time.

They are all failures to raise, except one:

    def test_safe_compile_needs_more_input(self):
        f = Forth()
        result = f.compile(': FOO 42 ')  # no semicolon
        assert result == '...'

That one is raising an exception, the one at the end of main_loop.

I need to sum up and break, if only to show some kind of mercy about article length. There are two ways to go: I can skip these tests and commit as is, or I can keep the branch open. I ask the team and we decide to skip the tests and come back to them next time.

Done, commit: new main loop, skipping six tests marked main loop.

Summary

I was sweeping right along, and I think what we have is pretty decent. The KeyboardProvider’s handling of errors seems nearly good, and we should be able to convert all the raises tests to check results instead. Since no one wants an exception anyway, that will be an overall improvement.

A stronger or less wise person would fix those tests right now. I’m over two hours in and am tired, and I do not do good work when tired, so I am going to do the wise thing and rest.

Was it legitimate to commit with those tests broken? I think so, because the operational tests all run and the Forth prompt runs pretty much as intended. Any work that needs to be done defining words or the like can proceed pretty safely. A bit of extra care would need to be used in case of exceptions, which are somewhat swallowed right now. That does need sorting out. But I think it will be, quite soon, in the next article or two.

I am satisfied with progress, and a little disappointed to leave a few laces untied. Imperfect, but we always are.

#StopTheCoup! See you next time!