Toward REPL
We need to change our little Forth so that it can ‘compile’ and execute any Forth code, not just compile definitions. I’m expecting trouble and will regroup in that case. Results: I do a little refactoring and decide to regroup.
A “real” Forth consumes code tokens, executing them as it goes. Some tokens are special, and their execution consists of consuming following tokens, up to some marker, and compiling their definitions into a list, which may be saved away or executed immediately, depending what is going on. Frankly, I find the process more than a little difficult to think about. This is surely going to make it no easier to implement.
I’m not sure, at this writing, how we’ll even know we have what I’m trying to do. We’ll try to write some tests, and I rather suspect that our existing tests might need to be changed.
As part of warming up, let me try to describe how a “real” Forth works, at an abstract level. Note that this description is surely flawed, but expressing our understanding is part of figuring out how to implement what we need to implement.
Forth has two modes, which we’ll call “execute” and “compile”. Forth processes a series of “tokens”, strings of characters, separated by blanks. In execute mode, as soon as a token is read, it is looked up in the list of all words. When its definition is found, the code for that word is executed. Some of the words are “primary”, and when they are executed, they perform some small operation, such as adding the top two numbers on the stack and pushing the result back to the stack. Some words are secondary, consisting of a list of other words. When in execute mode, this list is executed by executing each of the words in the list, which may be primary or secondary.
Some words put Forth into compile mode. In this mode, Forth is building a secondary word, or at least the list that makes it up, and if proceeds by reading tokens, looking them up, and putting the result into the list. Sooner or later, it encounters a matching word that leaves compile mode. It finishes up the list it was working on and goes back into execute mode.
But wait, it gets worse. When in compile mode, some words are “immediate”. They get executed immediately, even in compile mode, because what they do is to compile stuff into the word list. Words like IF-ELSE-THEN and DO-LOOP are,I believe, examples of these.
I emphasize that all of the above is a flawed description. It is the description I can give, in a small number of words, based on my reading of Loeliger and reading the code of the Forth on py4fun, which is not a classic Forth implementation.
Let’s try to write a test that should run when we get this working.
def test_direct_execute(self):
f = Forth()
f.process('3 4')
assert f.stack.stack == [3, 4]
I chose a new name, process
, since I’ve already named something compile
. We will sort out names later.
We wish that this would work:
def process(self, word_string):
code = self.compile(word_string)
code.do(self)
It will not, because our compile
won’t accept anything that isn’t a definition beginning with ‘:’ and ending with ‘;’.
I am just fiddling here, and will surely roll back a few times as we work. What if we did this:
def process(self, word_string):
to_compile = ': XYZZY ' + word_string + ' ;'
code = self.compile(to_compile)
to_execute = self.find_word('XYZZY')
to_execute.do(self)
We enclose whatever we’re given in a colon definition defining XYZZY and then look up XYZZY and execute it. Our test passes.
That is interesting but it isn’t really what we want. What we really want is immediate execution. But this does offer some possibilities. Let’s try a harder test using this code:
def test_direct_harder(self):
f = Forth()
s = (': SUM + ;'
'3 4 SUM')
f.process(s)
assert f.stack.pop() == 7
This does not work, because our compiler doesn’t recognize colon and semicolon as indicating that it should compile, and anyway, we’re already compiling a definition and I’m not sure a definition inside a definition is even meaningful.
OK, nothing for it, we’re going to have to think harder. Or, possibly, start over, writing process
more or less from scratch, using parts of the existing Forth class as we go. Or, even worse, start a new class. Or still worse, start a new project. All of these are possible. But moving forward from here is my preferred approach, if it will work.
Counting the new process
function, the Forth class is just under 200 lines. That’s a lot for one class. A lot of that is just word definitions. Fifty lines of direct calls to append a definition to the lexicon, plus 34 lines of functions defining words that couldn’t be done in-line in lambdas.
Maybe we’d do well to refactor what we have. We could certainly have a class that defined the lexicon, especially if we made it a first-class object instead of the simple list that it is now. As for the words whose definitions are methods the 34 lines, they could be spun out also but all they really want to do is to talk to the stack and the word list that is being built up.
Moving the lexicon setup out would at least reduce clutter.
But there is also this clutter:
def compile(self, text):
new_text = re.sub(r'\(.*?\)', ' ', text)
words = new_text.split()
match words:
case ':', defining, *rest, ';':
word_list = self.compile_word_list(rest)
word = SecondaryWord(defining, word_list)
self.lexicon.append(word)
return word
case _:
raise SyntaxError(f'Syntax error: "{text}". Missing : or ;?')
def compile_word_list(self, rest):
word_list = []
for word in rest:
if word == 'IF':
self.compile_conditional('*IF', word_list)
elif word == 'THEN':
self.patch_the_skip(['*IF', '*ELSE'], -1, word_list)
elif word == 'ELSE':
self.patch_the_skip(['*IF'], 1, word_list)
self.compile_conditional('*ELSE', word_list)
elif word == 'BEGIN':
self.stack.push(('BEGIN', len(word_list)))
elif word == 'UNTIL':
key, jump_loc = self.stack.pop()
if key != 'BEGIN':
raise SyntaxError(f'UNTIL without BEGIN')
until = self.find_word('*UNTIL')
word_list.append(until)
word_list.append(jump_loc - len(word_list) - 1)
elif word == 'DO':
self.stack.push(('DO', len(word_list)))
word_list.append(self.find_word('*DO'))
elif word == 'LOOP':
key, jump_loc = self.stack.pop()
if key != 'DO':
raise SyntaxError(f'LOOP without DO')
loop = self.find_word('*LOOP')
word_list.append(loop)
word_list.append(jump_loc - len(word_list))
elif (definition := self.find_word(word)) is not None:
word_list.append(definition)
elif (num := self.compile_number(word)) is not None:
definition = self.find_word('*#')
word_list.append(definition)
word_list.append(num)
else:
raise SyntaxError(f'Syntax error: "{word}" unrecognized')
return word_list
That’s pretty nasty as well. Let’s belay the execution concern and refactor a bit. I think it will pay off with a bit of luck. It usually does.
Let’s replace that batch of if-elif
for the special words with a single check and a call to a method that deals with the dispatch.
def compile_word_list(self, rest):
word_list = []
for word in rest:
if word in ['IF', 'THEN', 'ELSE', 'BEGIN', 'UNTIL', 'DO', 'LOOP']:
self.compile_action_word(word, word_list)
elif (definition := self.find_word(word)) is not None:
word_list.append(definition)
elif (num := self.compile_number(word)) is not None:
definition = self.find_word('*#')
word_list.append(definition)
word_list.append(num)
else:
raise SyntaxError(f'Syntax error: "{word}" unrecognized')
return word_list
def compile_action_word(self, word, word_list):
if word == 'IF':
self.compile_conditional('*IF', word_list)
elif word == 'THEN':
self.patch_the_skip(['*IF', '*ELSE'], -1, word_list)
elif word == 'ELSE':
self.patch_the_skip(['*IF'], 1, word_list)
self.compile_conditional('*ELSE', word_list)
elif word == 'BEGIN':
self.stack.push(('BEGIN', len(word_list)))
elif word == 'UNTIL':
key, jump_loc = self.stack.pop()
if key != 'BEGIN':
raise SyntaxError(f'UNTIL without BEGIN')
until = self.find_word('*UNTIL')
word_list.append(until)
word_list.append(jump_loc - len(word_list) - 1)
elif word == 'DO':
self.stack.push(('DO', len(word_list)))
word_list.append(self.find_word('*DO'))
elif word == 'LOOP':
key, jump_loc = self.stack.pop()
if key != 'DO':
raise SyntaxError(f'LOOP without DO')
loop = self.find_word('*LOOP')
word_list.append(loop)
word_list.append(jump_loc - len(word_list))
I’ve marked that last test to skip and am green with that one ignored. We’ll do some improvement with the likelihood that it will make our current effort easier. Commit: refactor out action words.
I think we’ll change that action method to use match-case
as it may look a bit better.
def compile_action_word(self, word, word_list):
match word:
case 'IF':
self.compile_conditional('*IF', word_list)
case 'THEN':
self.patch_the_skip(['*IF', '*ELSE'], -1, word_list)
case 'ELSE':
self.patch_the_skip(['*IF'], 1, word_list)
self.compile_conditional('*ELSE', word_list)
case 'BEGIN':
self.stack.push(('BEGIN', len(word_list)))
case 'UNTIL':
key, jump_loc = self.stack.pop()
if key != 'BEGIN':
raise SyntaxError(f'UNTIL without BEGIN')
until = self.find_word('*UNTIL')
word_list.append(until)
word_list.append(jump_loc - len(word_list) - 1)
case 'DO':
self.stack.push(('DO', len(word_list)))
word_list.append(self.find_word('*DO'))
case 'LOOP':
key, jump_loc = self.stack.pop()
if key != 'DO':
raise SyntaxError(f'LOOP without DO')
loop = self.find_word('*LOOP')
word_list.append(loop)
word_list.append(jump_loc - len(word_list))
Not a fantastic improvement, I freely grant. But it’s a bit less cluttered.
Now for something speculative but with a reason. We know that we’re working toward our Forth having a compile mode and an execute mode. I’m going to steal an idea from the py4fun forth, and create a separate stack for pushing the words like ‘BEGIN’ and such. It will be called the compile_stack. Later, we’ll be able to know whether we’re compiling by checking to see if there is anything on that stack.
That’s the plan. I’m sure enough to be willing to make the change, since it will be simple and otherwise harmless.
It was quite simple:
class Forth:
def __init__(self):
self.stack = Stack()
self.return_stack = Stack()
self.compile_stack = Stack()
self.lexicon = []
self.define_primaries()
self.active_words = []
def compile_action_word(self, word, word_list):
match word:
case 'IF':
self.compile_conditional('*IF', word_list)
case 'THEN':
self.patch_the_skip(['*IF', '*ELSE'], -1, word_list)
case 'ELSE':
self.patch_the_skip(['*IF'], 1, word_list)
self.compile_conditional('*ELSE', word_list)
case 'BEGIN':
self.compile_stack.push(('BEGIN', len(word_list)))
case 'UNTIL':
key, jump_loc = self.compile_stack.pop()
if key != 'BEGIN':
raise SyntaxError(f'UNTIL without BEGIN')
until = self.find_word('*UNTIL')
word_list.append(until)
word_list.append(jump_loc - len(word_list) - 1)
case 'DO':
self.compile_stack.push(('DO', len(word_list)))
word_list.append(self.find_word('*DO'))
case 'LOOP':
key, jump_loc = self.compile_stack.pop()
if key != 'DO':
raise SyntaxError(f'LOOP without DO')
loop = self.find_word('*LOOP')
word_list.append(loop)
word_list.append(jump_loc - len(word_list))
In the above, BEGIN, UNTIL, DO, and LOOP use the new stack. In addition IF, THEN, and ELSE use it in the methods they call:
def patch_the_skip(self, expected, skip_adjustment, word_list):
key, patch_loc = self.compile_stack.pop()
if key not in expected:
raise SyntaxError(f'malformed IF-ELSE-THEN, found: "{key}"')
last_loc = len(word_list) + skip_adjustment
word_list[patch_loc] = last_loc - patch_loc
def compile_conditional(self, word_to_compile, word_list):
self.compile_stack.push((word_to_compile, len(word_list) + 1))
word_list.append(self.find_word(word_to_compile))
word_list.append(0)
We are greenish (that one test is skipped), so commit: special words now use compile_stack.
It seems to me that we’d like ‘:’ and ‘;’ to be in the special list. How does compile work now?
def compile(self, text):
new_text = re.sub(r'\(.*?\)', ' ', text)
words = new_text.split()
match words:
case ':', defining, *rest, ';':
word_list = self.compile_word_list(rest)
word = SecondaryWord(defining, word_list)
self.lexicon.append(word)
return word
case _:
raise SyntaxError(f'Syntax error: "{text}". Missing : or ;?')
It kind of cheats, pulling off the defining word and so on. And I think that in compile_word_list
, we just loop over the words, which won’t quite do.
def compile_word_list(self, rest):
word_list = []
for word in rest:
if word in ['IF', 'THEN', 'ELSE', 'BEGIN', 'UNTIL', 'DO', 'LOOP']:
self.compile_action_word(word, word_list)
elif (definition := self.find_word(word)) is not None:
word_list.append(definition)
elif (num := self.compile_number(word)) is not None:
definition = self.find_word('*#')
word_list.append(definition)
word_list.append(num)
else:
raise SyntaxError(f'Syntax error: "{word}" unrecognized')
return word_list
What are we up to here?
Good question, thanks. We’re looking to define the colon and semicolon as special words and move them into our match-case. For that to work, we’ll have to be able to look at the next word in the word list, since that is the name of the word we want to define.
Oh, and I think I almost see something we can do to get our immediate execution. Suppose that at the end of compile_word_list
, we checked to see if we are in compile mode. If we are not, we return the word list immediately. Otherwise we keep looping.
rest
is a list. We could make an iterator and then use next
to get the next word whenever we wish.
That seems like a decent next small step. Let’s try it. But what does next
do when the list is empty? We can provide a default, otherwise it would raise an exception.
def compile_word_list(self, rest):
word_list = []
rest_iter = iter(rest)
while word := next(rest_iter, None):
if word in [':', ';','IF', 'THEN', 'ELSE', 'BEGIN', 'UNTIL', 'DO', 'LOOP']:
self.compile_action_word(word, word_list)
elif (definition := self.find_word(word)) is not None:
word_list.append(definition)
elif (num := self.compile_number(word)) is not None:
definition = self.find_word('*#')
word_list.append(definition)
word_list.append(num)
else:
raise SyntaxError(f'Syntax error: "{word}" unrecognized')
return word_list
That works. I think we need the iterator to be a member variable, but we’ll hold off on that for now, until we really need it. Commit: use iterator to um iterate th words.
Now let’s see about defining how colon and semicolon should work inside compile_word_list
:
def compile_action_word(self, word, word_list):
match word:
case ':':
pass
case ';':
pass
case 'IF':
self.compile_conditional('*IF', word_list)
OK, clearly at least part of what colon wants to do is to read the next word and start compiling it. And it will put itself on the compile_stack
, so that semicolon can check it and finish up the job. And … do we need to call f.compile
recursively here?
I think we do. If we were doing real threaded code we might just pick up from the stacked words but I’m not sure about that.
OK, so colon will get the name of the word being defined, and maybe just compile a word list. Let’s write some code and see what we think.
def compile_action_word(self, word, word_list):
match word:
case ':':
new_word = next(self.rest_iter)
self.compile_stack.push((':', new_word))
It seems to me that we need to create a new word list at this point, into which to compile our new word. Maybe this will all sort out somehow but I don’t think a recursive call to either compile
or compile_action_word
will quite do the trick.
This code is never executed. I’m going to commit and leave it here as a reminder, and then reflect and likely stop. Commit: begin sketch of colon word.
Reflection and Stopping
Either my brain can’t get there from here, or my code can’t. I’m not sure, but I suspect both. We’ll stop here, and pick up next time, presumably tomorrow, when we’ll do a spike.
The spike will be a small version of a Forth system that has the colon and semicolon words as words, or at least cases in a match like the one we have here. and that does immediate execution in the style we want.
When we get far enough with that, we’ll have a better understanding of how to proceed. Options include
- Declare the spike to be a good start and continue;
- Learn from the spike and refactor the existing Forth accordingly;
- Learn from the spike and build a new Forth from scratch, reusing what we can;
- Give up and pretend this never happened.
We do have our little initial experiment, which was basically an immediate interpreter. It might be a better basis for what we want than what we have now. We’ll consider that possibility as well: it might be the spike I’m talking about here. I’ll list it below.
For now, we’ll take a break. See you next time!
stack = []
skipping = False
def clear():
global stack, skipping
stack = []
skipping = False
def plus():
# ( x1 x2 -- x1+x2 )
x2, x1 = stack.pop(), stack.pop()
stack.append(x1 + x2)
def number(n):
stack.append(n)
def interpret(s):
print(f"interpret {s}")
words = s.split()
for word in words:
interpret_word(word)
print(f'end interpret {s}')
def interpret_word(word):
global skipping
if word == 'THEN':
skipping = False
return
if skipping:
return
if is_number(word):
stack.append(int(word))
elif word == '+':
plus()
elif word == '-':
# ( x1 x2 == x1 - x2
x2, x1 = stack.pop(), stack.pop()
stack.append(x1 - x2)
elif word == '*':
x2, x1 = stack.pop(), stack.pop()
stack.append(x1 * x2)
elif word == 'SQRT':
stack.append(math.sqrt(stack.pop()))
elif word == 'SWAP':
# ( x1 x2 -- x2 x1 )
x2, x1 = stack.pop(), stack.pop()
stack.append(x2); stack.append(x1)
elif word == 'DUP':
x1 = stack.pop()
stack.append(x1); stack.append(x1)
elif word == '.':
print(stack.pop())
elif word == '>':
top, lower = stack.pop(), stack.pop()
stack.append(1 if top > lower else 0)
elif word == 'IF':
if stack.pop() == 0:
skipping = True
elif word == 'NEGATE':
s = '0 SWAP -'
interpret(s)
elif word == 'SQUARE':
interpret('DUP *')
elif word == 'HYPOTENUSE':
s = 'SQUARE SWAP SQUARE + SQRT'
interpret(s)
else:
raise ValueError(f'Invalid word {word}')
def is_number(s):
try:
int(s)
return True
except ValueError:
return False