Forth: Still IF

The Repo on GitHub

The main story is still IF. Ron has what may be a good idea. We’ll divert for literals. Lots of thinking written down here. A bit of code; a bit of success.

Last Episode

When we left you last time, we had a huge update hanging. Well, huge by our standards. We were in the middle of working on IF, and I got tired and was in danger of losing the thread entirely. We have a primitive *IF that accepts a parameter, a newly added feature. The *IF pops stack and if not true, tells the active word to skip forward the number of words in the *IF parameter. This works, if only we could get the right number into the parameter.

We added the “immediate” option to PrimaryWord instances. During compilation, if we parse an immediate word, instead of putting its address into the active word’s list, we execute the word immediately. This seems a bit exotic but it’s really just the same as having a case statement in the compiler for all the words we might find where we want to do something more complex than just look them up and add them to the list.

We have but one “immediate”, namely IF. It compiles a *IF into the current word. The intention is that soon, real soon, we’ll have the immediate IF save its location on a stack, and the THEN, which will also be immediate, will pop tat stack and pass the skip distance to the parameter in the *IF. In short, we’ll patch the code to know how far to skip.

All of this, and more, is hanging in a huge—well, larger than I’m comfortable with—update. This morning, we need to decide whether to roll back, which would probably be the wise thing to do, or to push forward until we work a state that both works and seems nearly on the right track.

TL;DR Begins Here: What follows is a long consideration of approaches to literals in the code, with digressions into thoughts about what is in the SecondaryWord, a list of lexicon indices, or functions, or what. You should feel free to skim, read, skip, or bail out. But after this musing is over, I roll all the past couple of days back and do literals in a rather nice fashion.

Furthermore … it would really aid my testing if the compiler could handle literals, at least integers. As things stand, we feed the stack manually, as with the stack.extend here:

    def test_over(self):
        f = Forth()
        f.stack.extend([4, 2])
        f.find_word('OVER').do(f)
        assert f.stack == [4, 2, 4]

It’s barely acceptable to do it that way when testing just one word. but now I want to test some IF words and I’d really like to have numbers. So I have sort of started with that.

What’s so hard about that, you ask? Well, bunkie, it goes like this:

When we encounter a number during compilation, such as in

: ADD3 3 + ;

The 3 needs to be pushed onto the stack, so that the + can add it to whatever is under it. We are compiling the new SecondaryWord ‘ADD3’, and, today, that object has just a list of the indices of the words in the lexicon that make up the word. So, as things stand, we would need a word in the lexicon that pushes 3 onto the stack.

Well, we could do that. But then we would add every constant in the program to the lexicon, which seems like a bit much. We could find them and reuse them, of course, but still, seems like not the thing to do.

I could be wrong about that. It could be the thing to do. But there is another possibility, and I was working on it alone. What if we could put the number right into the active word’s list, like this:

LITERAL 3 +

The word LITERAL would by definition pick up the next word, which would contain the actual 3, and push it. As things stand, the word would then tick its program counter index and try to execute 3, which would be bad, but if the LITERAL also gave the index a little nudge, the word would skip over the 3, carrying right on to the + as we intended.

So that’s one way. I think FORTH has a magic word with essentially that meaning, and it’s probably named “’” or something. Maybe “,”.

But I had this idea this morning, and while I was half asleep it seemed tasty enough that I got up and here we are.

Every object in Python is the same length. I think of it as being “one word” but in fact it’s probably eight or a dozen bytes, but the point is, an integer is the same size as, say, a function pointer.

So, the idea is, what if instead of a list of integers pointing to words in the lexicon, the SecondaryWord had a list of functions that it called? Mostly, those functions would be saying “do the lexicon entry number 17”, but they could say other things entirely.

And what if those functions all returned the number of cells to skip? The common one would return 1. Uncommon ones could return 2 or 5 or eleventy-seven. That would allow us to put any kind of useful item in the program, so long as it got skipped over at the right time.

I’m not sure that this is a good idea but it seems simple and yet quite flexible. I want to try it.

I’m getting really close to rolling the code back and beginning again, starting with literals, using this new idea. But not yet. I want to express a concern, for my own benefit if not yours.

I’ll do this with a test, so that we can see the issue:

    def test_function_list(self):
        stack = []
        def plus(word, location):
            stack.append(stack.pop() + stack.pop())
            return 1

        def lit(word, location):
            stack.append(word[location + 1])
            return 2

        my_word = [lit, 3, lit, 4, plus]
        index = 0
        while index < len(my_word):
            function = my_word[index]
            increment = function(my_word, index)
            index += increment
        assert stack[0] == 7

This test runs. The function lit looks in the location after its own, and pushes that value, returning 2 so that we skip that value.

So that’s OK, I think. What about this:

    def test_function_list_lambda(self):
        # this version assumes only functions in the list
        stack = []

        my_list = [lambda : stack.append(3),
                   lambda : stack.append(4),
                   lambda : stack.append(stack.pop() + stack.pop())]

        for w in my_list:
            w()
        assert stack[0] == 7

Wow, that is strikingly simple. Of course in our scheme we need to pass in the Forth instance. We currently pass the word being executed as well, because our lexicon includes words that can be executed directly in the lexicon code, and words that call methods on PrimaryWord, although the only ones that do that are our new IF and *IF experiments.

But this test shows the issue that concerns me:

    def test_function_list_lambda_temp(self):
        # this version assumes only functions in the list
        stack = []

        my_list = []
        word = 3
        my_list.append(lambda : stack.append(word))
        word = 4
        my_list.append(lambda : stack.append(word))
        my_list.append(lambda : stack.append(stack.pop() + stack.pop()))

        for w in my_list:
            w()
        assert stack[0] == 7

This fails! I predict that it will say 8, not 7.

tests/test_python.py:52 (TestPython.test_function_list_lambda_temp)
8 != 7

Expected :7
Actual   :8

Why? Because the lambda is bound to the temp word in this particular call to the function test_function_list_lambda_temp, and the final value of word is 4, so both of the first two lambdas push 4.

Sigh. This is intricate, isn’t it? But the upshot is that while functions in the SecondaryWord may be more useful than just indices, we won’t be able to just stuff arbitrary convenient lambda expressions in that refer back to values we’ve just read from the definition or the like. The binding of the lambda won’t let that work as we might think.

The first scheme, though, where the functions in the word return the number of cells by which to increment the index … that does work, and I think it will work OK. But let’s try to break that test as we did with the other.

    def test_function_list(self):
        stack = []
        def plus(word, location):
            stack.append(stack.pop() + stack.pop())
            return 1

        def lit(word, location):
            stack.append(word[location + 1])
            return 2

        my_word = []
        word = 3
        my_word.append(lit)
        my_word.append(word)
        word = 4
        my_word.append(lit)
        my_word.append(word)
        my_word.append(plus)
        index = 0
        while index < len(my_word):
            function = my_word[index]
            increment = function(my_word, index)
            index += increment
        assert stack[0] == 7

This one does work. The reason is that when we say word = 3 and then append it, we append the object 3, not a pointer to the word variable. In the lambda the code is bound to the variable not to its value.

TL;DR Ends: We are just about to get down to work now. Not that thinking isn’t work. Sometimes thinking is what we need.

Decision Time

Are we going to press on, or roll back?

Press on
Roll back

What should be in the SecondaryWord’s list?

lexicon indices
Function pointers
?

How shall we do literals?

Store the value in the word, preceded by an operator that fetches it and pushes it to the list, skipping the value somehow
Store the value in the lexicon, perhaps as a call to a PrimaryWord with its parameter equal to the value, which would allow a constant skip of one.
?

I’m going to roll back. I’ll roll back judiciously, retaining tests but not the code. Then I’ll see which tests to retain, which to remove, which to rewrite.

ROLLBACK!

Bam! Rolled back forth.py and word.py.

Nine tests fail.

Many of them are failing because they are passing the Forth instance to the words and they no longer expect that. Some of them are probably asking for PrimaryWords no longer present. First the parameter.

    def define_primaries(self):
        lex = self.lexicon
        lex.append(PrimaryWord('DROP', lambda f: f.stack.pop()))
        lex.append(PrimaryWord('DUP', lambda f: f.stack.dup()))
        lex.append(PrimaryWord('OVER', lambda f: f.stack.over()))
        lex.append(PrimaryWord('ROT', lambda f: f.stack.rot()))
        lex.append(PrimaryWord('SWAP', lambda f: f.stack.swap()))
        lex.append(PrimaryWord('+', lambda f: f.stack.push(f.stack.pop() + f.stack.pop())))
        lex.append(PrimaryWord('-', lambda f: f.stack.push(f.stack.swap_pop() - f.stack.pop())))
        lex.append(PrimaryWord('*', lambda f: f.stack.push(f.stack.pop() * f.stack.pop())))
        lex.append(PrimaryWord('/', lambda f: f.stack.push(f.stack.swap_pop() / f.stack.pop())))
        lex.append(PrimaryWord('SQRT', lambda f: f.stack.push(math.sqrt(f.stack.pop()))))

Six now run, three still fail. Two are about IF. I’ll just … well, skip them, they might be useful later.

The remaining test is this:

    def test_compiler_hyp(self):
        f = Forth()
        f.compile(': SQUARE DUP * ;')
        f.compile(': HYPSQ SQUARE SWAP SQUARE + ;')
        f.compile(': HYP HYPSQ SQRT ;')
        f.stack.extend([3, 4])
        f.find_word('HYP').do(f)
        assert f.stack.pop() == 5
        assert f.active_words == []

This test relies on Forth understanding begin and end, which got lost in the rollback because I failed to commit it a couple of days ago.

class SecondaryWord:
    def do(self, forth):
        forth.begin(self)
        lexicon = forth.lexicon
        self.ix = 0
        while self.ix < len(self.word_indices):
            word_index = self.word_indices[self.ix]
            lexicon[word_index].do(forth)
            self.ix += 1
        forth.end()

We implement those:

class Forth:
    def __init__(self):
        self.stack = Stack()
        self.lexicon = []
        self.define_primaries()
        self.active_words = []

    def begin(self, word):
        self.active_words.append(word)

    def end(self):
        self.active_words.pop()

And we are green. Commit: save point. begin and end provided. Two tests skipped until possibly used later.

Some Forward Progress

OK, let’s do a literal first. But how? Let’s try one way and if that doesn’t work, give up entirely. No, wait, think of and try another way. Yeah, that’s it.

Let’s assume that there will be a new Primary, “LIT”. Here’s a test:

    def test_lit_hand_compiled(self):
        f = Forth()
        # s = ': 3 DUP +'
        lit = f.find_word_index('LIT')
        dup = f.find_word_index('DUP')
        plus = f.find_word_index('+')
        indices = [lit, 3, dup, plus]
        sw = SecondaryWord('TEST', indices)
        sw.do(f)
        assert f.stack.pop() == 6

This fails for want of lit, which is not yet defined.

[NSFW COMMMENTARY REDACTED!!!!]

My feet won’t reach the ground from here. I don’t see how to do this. Maybe defer it to Forth, that’s all I have access to.

    lex.append(PrimaryWord('LIT', lambda f: f.literal()))

OK maybe this:

class Forth:
    def literal(self):
        active_word = self.active_words[-1]
        pc = active_word.pc
        value = active_word[pc+1]
        self.stack.push(value)
        active_word.pc += 1

I am positing a value on the active word, pc, for program counter. It is not present but if it were, it would be pointing into the word right at the LIT operator so adding 1 to it would point at the literal value and if we were to increment it, it would cause the active word to skip over it.

Now in SecondaryWord we have this:

    def do(self, forth):
        forth.begin(self)
        lexicon = forth.lexicon
        self.ix = 0
        while self.ix < len(self.word_indices):
            word_index = self.word_indices[self.ix]
            lexicon[word_index].do(forth)
            self.ix += 1
        forth.end()

Ha! ix is pc. Let’s rename it here.

    def do(self, forth):
        forth.begin(self)
        lexicon = forth.lexicon
        self.pc = 0
        while self.pc < len(self.word_indices):
            word_index = self.word_indices[self.pc]
            lexicon[word_index].do(forth)
            self.pc += 1
        forth.end()

However, despite my fond hopes, the test still fails. Why?

    def literal(self):
        active_word = self.active_words[-1]
        pc = active_word.pc
>       value = active_word[pc+1]
E       TypeError: 'SecondaryWord' object is not subscriptable

Well, that’s true enough. For now, we’ll cheat.

class Forth:
    def literal(self):
        active_word = self.active_words[-1]
        pc = active_word.pc
        value = active_word.word_indices[pc+1]
        self.stack.push(value)
        active_word.pc += 1

The test passes! I withdraw my former cursing: this isn’t even half bad, really.

Let’s see if we can be a bit less invasive here. SecondaryWord has a leftover skip method:

    def skip(self, n):
        self.pc += n

Our ignored tests are using it but no one else is. Why not have SecondaryWord help us out though? Our literal method has serious feature envy going.

Why not this:

class Forth:
    @property
    def active_word(self):
        return self.active_words[-1]

    def literal(self):
        self.stack.push(self.active_word.next_word())

This fails, of course, because next_word is Wishful Thinking but sometimes wishes come true:

    def next_word(self):
        self.pc += 1
        return self.word_indices[self.pc]

Green. I am liking this. Let’s review do:

    def do(self, forth):
        forth.begin(self)
        lexicon = forth.lexicon
        self.pc = 0
        while self.pc < len(self.word_indices):
            word_index = self.word_indices[self.pc]
            lexicon[word_index].do(forth)
            self.pc += 1
        forth.end()

    def next_word(self):
        self.pc += 1
        return self.word_indices[self.pc]

Wouldn’t it be nice if we could use next_word in do? But it increments at the wrong time.

What if first, we change the do to increment before calling the word. Then pc will be pointing at the next word already.

    def do(self, forth):
        forth.begin(self)
        lexicon = forth.lexicon
        self.pc = 0
        while self.pc < len(self.word_indices):
            word_index = self.word_indices[self.pc]
            self.pc += 1
            lexicon[word_index].do(forth)
        forth.end()

    def next_word(self):
        word =  self.word_indices[self.pc]
        self.pc += 1
        return word

Green. So therefore:

    def do(self, forth):
        forth.begin(self)
        lexicon = forth.lexicon
        self.pc = 0
        while self.pc < len(self.word_indices):
            word_index = self.next_word()
            lexicon[word_index].do(forth)
        forth.end()

    def next_word(self):
        word =  self.word_indices[self.pc]
        self.pc += 1
        return word

Or, just because we can:

    def do(self, forth):
        forth.begin(self)
        lexicon = forth.lexicon
        self.pc = 0
        while self.pc < len(self.word_indices):
            lexicon[self.next_word()].do(forth)
        forth.end()

I wish I could avoid that temp in next_word but I am so used to post-indexing that I really wouldn’t want to try pre-indexing here.

Anyway this is fine. We have a new PrimaryWord, “LIT” that precedes a literal. Now excuse me while I find out what the right word name is for that. It is “*#”, per Loeliger. We’ll try that for a while and see if we hate it or love it.

    def test_lit_hand_compiled(self):
        f = Forth()
        # s = ': 3 DUP +'
        lit = f.find_word_index('*#')
        dup = f.find_word_index('DUP')
        plus = f.find_word_index('+')
        indices = [lit, 3, dup, plus]
        sw = SecondaryWord('TEST', indices)
        sw.do(f)
        assert f.stack.pop() == 6

class Forth:        
    def define_primaries(self):
        lex = self.lexicon
        lex.append(PrimaryWord('DROP', lambda f: f.stack.pop()))
        lex.append(PrimaryWord('DUP', lambda f: f.stack.dup()))
        lex.append(PrimaryWord('*#', lambda f: f.literal()))
        lex.append(PrimaryWord('OVER', lambda f: f.stack.over()))
        lex.append(PrimaryWord('ROT', lambda f: f.stack.rot()))
        lex.append(PrimaryWord('SWAP', lambda f: f.stack.swap()))
        lex.append(PrimaryWord('+', lambda f: f.stack.push(f.stack.pop() + f.stack.pop())))
        lex.append(PrimaryWord('-', lambda f: f.stack.push(f.stack.swap_pop() - f.stack.pop())))
        lex.append(PrimaryWord('*', lambda f: f.stack.push(f.stack.pop() * f.stack.pop())))
        lex.append(PrimaryWord('/', lambda f: f.stack.push(f.stack.swap_pop() / f.stack.pop())))
        lex.append(PrimaryWord('SQRT', lambda f: f.stack.push(math.sqrt(f.stack.pop()))))

Wow, another long article. And it is now three hours in. Lots of thinking, just a bit of doing.

But I think that I like what we have. We can now store a value in the SecondaryWord, so long as it is prefixed by a word that will consume it. I think we can make our IF and *IF work with that.

Let’s sum up.

Summary

I found the thinking and listing of options useful. One reason why is that it built up a greater understanding in me that rolling back was desirable. There were too many options for what to do next, and the code wasn’t quite ready for any of them. So without my really recognizing the moment, I was finally ready to ditch all of yesterday and the afternoon before and start there.

Starting with the ‘*#’ is going to pay off, I believe. We have arranged things so that the SecondaryWord can have any object in its list, so long as a word ahead of it consumes it before we try to execute it. I believe this gives us two viable options for the IF-THEN: we can just save the skip count in the cell after the *IF and patch it there, or, we can find the *IF itself (at pc-1 mind you) and set its parameter. Since the literal scheme exists and works, we may decide to do away with the parameter. although it’s not inconceivable that we’ll have use for it.

Aside: I have to wonder whether we could have compiled a different LIT with the value in that word parameter. I don’t think we could define it in a lambda, however, as we do all our other words. It might be worth experimenting with that in a test. You could make a case that storing random stuff in the SecondaryWord’s list is a hack. It’s a Forth-like hack, but a hack nonetheless.

But the code is actually quite nice:

class Forth:
    def literal(self):
        self.stack.push(self.active_word.next_word())

class SecondaryWord:
    def do(self, forth):
        forth.begin(self)
        lexicon = forth.lexicon
        self.pc = 0
        while self.pc < len(self.word_indices):
            lexicon[self.next_word()].do(forth)
        forth.end()

    def next_word(self):
        word =  self.word_indices[self.pc]
        self.pc += 1
        return word

So a case can be made that literal is a one-liner. Nice, however you count it. The change to SecondaryWord makes sense, and it provides the ability to embed literals and other valued items right in the word definition.

I feel good about this. Took a long time to get here, but I think we’re in a good place. We might want to move back to passing the word parameter to the lambda but at this point I don’t see the value: we can get at everything we need via forth. And, Python’s limited lambda capability is probably protecting me from myself, since I can’t really do anything too fancy.

Like this, for example:

class Forth:
    lex.append(PrimaryWord('*#', lambda f: f.stack.push(f.active_word.next_word())))

That actually works. So does this:

class Forth:
    lex.append(PrimaryWord('*#', lambda f: f.stack.push(f.next_word())))

    def next_word(self):
        return self.active_word.next_word()

That’s actually rather good. Commit: implement, then simplify *#, add next_word. I’m glad we had this little chat.

Today I added some test code and about four lines of production code. But a very powerful four lines they were.

See you next time!