Location Patching

The Robot World Repo on GitHub
The Forth Repo on GitHub

If I recall correctly, it’s time to start patching branch locations in CASE. Let’s see how we might do that. We get a really nice start, despite nearly a week of forgetting everything we knew.

Lookin inta histry back

It has been a long time, almost a week, since I worked on Forth. I will need retraining. My recollection is that we were working on the CASE words, and I think we were just about ready to plug in the branch passes. Like any one-pass assembler or compiler, Forth keeps track of forward references and fills them in after the reference has been defined.

We use a “compile stack” for that purpose. I think that currently, when building a word-list, words like IF will push the index of the word that needs to be patched on to the compile stack, and THEN will pop the index off the stack and patch the location with the current index in the list. At one stage in the development, we pushed a structured object that included info about what word had pushed the info onto that stack, so that IF would push an if-item and THEN would check to be sure that what was on the stack was an if-item (or an else-item, which is also legitimate).

For reasons that escape me, I removed that structure. I think now we’re just pushing a location. That’s fine if the input program is well formed, but will lead to undiagnosed chaos if it isn’t.

So part of the plan is to (re)create a little structured object to go on the compile stack, and to do some checking to be sure that we have the right one. The Forth standard refers to the structure on the compile stack as “sys”, as in “comp-sys”, “if-sys” and such. I propose to name our class that serves that purpose “Sys”.

The current code for handling the patching is a bit ad hoc, by which I mean “messy”, and we’ll try to get things better organized as part of this effort.

Finally, I think we had a new style of test that actually checked the compiled code to see if it is as intended. We’ll start our code review there.

Current Tests and Code

The first thing I do is run the tests. The second is check to see if there are any changes in the code that are not yet committed. I find that there are. This is always irritating, although, since the tests are green, I could just commit and move on. I am not that trusting, so I’ll check the diff. I see a nice new test:

    def test_br_target(self):
        f = Forth()
        test = ': TEST BR_TARGET ;'
        result = f.compile(test)
        result = f.compile('TEST')
        assert (result ==
                'branch not patched in '
                ': TEST BR_TARGET ; ?')

And in Lexicon, corresponding to this:

    f.word_list.append(f.find_word('BR_TARGET'))

    def _br_target(f):
        msg = f'branch not patched in {f.active_word}'
        raise NotImplementedError(msg)

And when we compile OF, one of the CASE words:

    def _of(f):
        f.word_list.append(f.find_word('OVER'))
        f.word_list.append(f.find_word('='))
        f.word_list.append(f.find_word('0BR'))
        f.word_list.append(f.find_word('BR_TARGET'))
        f.word_list.append(f.find_word('DROP'))

We used to compile a zero after a branch instruction like 0BR, and patched the zero. Now we put the BR_TARGET word there instead, which will give us a diagnostic if we ever try to execute one, but, I think more to the point, will let our patching code be sure that it’s pointing to a word slot that actually needs patching. This will help us avoid the inevitable off-by-one (or more) errors as we code up the patching.

I think we can commit this happily: added BR_TARGET handling in prep for branch patching.

Finally, there is this little test fragment:

    def test_for_discussion(self):
        f = Forth()
        test = (': TEST'
                '  2 CASE'
                '    1 OF 111 ENDOF'
                '    2 OF 222 ENDOF'
                '  ENDCASE '
                ';')
        expected_stack = [222]
        f.process_line(test)
        w = f.find_word('TEST')
        # assert str(w) == ''

I wonder why I called it “for discussion”. Anyway let’s uncomment the assertion and let it fail to see what it produces.

The string that isn’t empty is this:

(': TEST *# 2 *# 1 OVER = 0BR BR_TARGET DROP *# 111 BR BR_TARGET *# 2 OVER = '
 '0BR BR_TARGET DROP *# 222 BR BR_TARGET DROP ;')

Let me format that a bit more like a forth word:

: TEST 
    *# 2 
        *# 1 OVER = 0BR BR_TARGET 
            DROP *# 111 
            BR BR_TARGET 
        *# 2 OVER = 0BR BR_TARGET 
            DROP *# 222 
            BR BR_TARGET 
        DROP 
;

I am pretty sure, p about 0.87, that that’s the actual code we want. A review of our most recent Forth article Small Case work confirms that way back then I thought that we “just” need to hook up the targets and we’ll be good.

Here’s a table from two Forth articles back:

label	stack	op
	2 1	OVER
	2 1 2	= (false)
	2 0	0BR a (done)
		DROP (skipped)
		*# 111 (skipped)
		BR x (skipped)
a	2 2	OVER
	2 2 2	= (true)
	2 -1	0BR b (skipped)
	2	DROP
	_	*# 222
	222	br x
b	222	DROP (skipped)
x	222	(result)

That’s showing us where we want the patched addresses to be. We see that in the case of the OF words, each occurrence of OF wants to branch to the end, after the final DROP. So we know that our “Sys” for CASE will need a list of locations needing patching. With that in mind, let’s write a few tests and drive out Sys.

I write a pretty huge test, certainly not minimal:

    def test_new_ sys(self):
        sys = Sys('CASE')
        assert sys.name == 'CASE'
        assert sys.locations == []

I’ll write the test in the test file and then immediately move it out.

class Sys:
    def __init__(self, name):
        self.name = name
        self.locations = []

It occurs to me that sys is the name of a key Python import, and I probably shouldn’t really have a file named sys. Let’s rename to something else. Anything will do, really, though I’d like it to be short (and sweet). Grr, make something up, Ron. CompileInfo.

Let’s wait for further testing on CompileInfo until we put it into use, since we want the class’s features to be what the users want, not what we might just invent. I expect it’ll just be adding and getting the list but we’ll see.

I think we’ll try doing the OF first. Here is its definition now:

    def _of(f):
        f.word_list.append(f.find_word('OVER'))
        f.word_list.append(f.find_word('='))
        f.word_list.append(f.find_word('0BR'))
        f.word_list.append(f.find_word('BR_TARGET'))
        f.word_list.append(f.find_word('DROP'))

We want to put an OF CompileInfo on the stack, pointing to the BR_TARGET, that is, containing the index of the BR_TARGET in the word list. That index will be the length of the word list prior to the append of the target, so how about this:

    def _of(f):
        f.word_list.append(f.find_word('OVER'))
        f.word_list.append(f.find_word('='))
        f.word_list.append(f.find_word('0BR'))
        info = CompileInfo('OF')
        info.add_target(len(f.word_list))
        f.word_list.append(f.find_word('BR_TARGET'))
        f.word_list.append(f.find_word('DROP'))

OK, add that method, because tests of OF are failing.

    def add_target(self, location: int):
        self.locations.append(location)

Green. We could commit but I don’t think this is actually working. I wonder what GeePaw Hill would do. Ah, he would have a test that still fails. I guess I could write one. I choose to proceed differently, un-stacking this info. We have this for ENDOF:

    def _endof(f):
        f.word_list.append(f.find_word('BR'))
        f.word_list.append(f.find_word('BR_TARGET'))

The unconditional branch there is the one that branches to the end of the CASE. So we want the 0BR to branch to the location after the ENDOF, like this:

Oh. I forgot to push the compile info onto the stack. I’ll do that now as well.

    def _of(f: Forth):
        f.word_list.append(f.find_word('OVER'))
        f.word_list.append(f.find_word('='))
        f.word_list.append(f.find_word('0BR'))
        info = CompileInfo('OF')
        info.add_target(len(f.word_list))
        f.compile_stack.push(info)
        f.word_list.append(f.find_word('BR_TARGET'))
        f.word_list.append(f.find_word('DROP'))

    def _endof(f: Forth):
        f.word_list.append(f.find_word('BR'))
        f.word_list.append(f.find_word('BR_TARGET'))
        info = f.compile_stack.pop()
        location = info.locations[0]
        f.word_list[location] = len(f.word_list)

This code is far too messy but first make it work, then make it right. Let’s check that test that shows the compiled string:

(': TEST *# 2 *# 1 OVER = 0BR 10 DROP *# 111 BR BR_TARGET *# 2 OVER = 0BR 19 '
 'DROP *# 222 BR BR_TARGET DROP ;') != ''

We have plugged in values of 10 and 19, and they are in the right place, right after the 0BR words. Are they pointing to the right place?

*# 2 *# 1 OVER = 0BR 10 DROP *# 111 BR BR_TARGET *# 2 OVER = 0BR 19
0  1 2  3  4   5  6   7  8   9  10  11  12       13     15    17

It appears that they are not. The 10 at location 7 should be pointing to location 13, the beginning of the second OF.

I’m actually rather surprised, and I don’t know what happened. The only thing I can imagine is that the word list doesn’t contain what I think it does, that it is somehow short of some items. Let me just dump the word list. I think I can make a fake word and use the word’s print. I “enhance” the print to show the address and get this:

: testing
*# 2
*# 1
OVER
=
0BR
10
DROP
*# 111
BR
BR_TARGET
 ;

Ah. I get it. I think we’ve talked about this before, and I’m sure I’ve thought about it. We presently compile a literal, such as 1, into *# 1. But we do not append those two items into the current word. Instead we compile a NEW word that contains those two. So my correct count on the print should be this:

 : TEST *# 2 *# 1 OVER = 0BR 10 DROP *# 111 BR BR_TARGET *# 2 OVER = 0BR 19
        0    1    2    3 4   5   6    7      8   9        10!

In short, our location patching is correct. Let’s see if we can write a test for that that’s somehow better than looking at the listing. But we can make that work at least as an approval test:

    def test_for_discussion(self):
        f = Forth()
        test = (': TEST'
                '  2 CASE'
                '    1 OF 111 ENDOF'
                '    2 OF 222 ENDOF'
                '  ENDCASE '
                ';')
        expected_stack = [222]
        f.process_line(test)
        w = f.find_word('TEST')
        expected = (
            ': TEST *# 2 *# 1 OVER = 0BR 10 '
            'DROP *# 111 BR BR_TARGET *# 2 OVER = 0BR 19 '
            'DROP *# 222 BR BR_TARGET DROP ;')
        assert str(w) == expected

That’s certainly better than nothing. And what we can do for the next phase is make the prediction of the value that should be plugged into the remaining `BR_TARGET’ occurrences.

Ah. I just rubber-ducked myself into an idea for a test. We’ll look for patterns in the word list and see if they are right. Let’s work up a test based on what we have now. Maybe we could say something like this:

    def test_pattern(self):
        f = Forth()
        test = (': TEST'
                '  2 CASE'
                '    1 OF 111 ENDOF'
                '    2 OF 222 ENDOF'
                '  ENDCASE '
                ';')
        expected_stack = [222]
        f.process_line(test)
        w = f.find_word('TEST')
        branch_location = w.index('0BR') + 1
        target_location = w.index('0BR', branch_location) - 3
        assert w.words[branch_location] == target_location

This assumes some kind of index method on Word that finds a word with the name indicated, starting at the optional location provided. Let’s see if I can get those indices manually to show that the idea might work.

    def test_pattern(self):
        f = Forth()
        test = (': TEST'
                '  2 CASE'
                '    1 OF 111 ENDOF'
                '    2 OF 222 ENDOF'
                '  ENDCASE '
                ';')
        expected_stack = [222]
        f.process_line(test)
        w = f.find_word('TEST')
        # branch_location = w.index('0BR') + 1
        branch_location = 4 + 1
        # target_location = w.index('0BR', branch_location) - 3
        target_location = 13 - 3
        assert w.words[branch_location] == target_location

4 is the location of the first 0BR and 13 the second. So let’s try to build index in Word.

Reverse the comments out of the test and put the index calls back so we have a fail to turn green.

class Word:
    def index(self, word_name, start=0):
        for i, word in enumerate(self.words[start:]):
            if type(word) == Word and word.name == word_name or word == word_name:
                return i + start
        return None

We slice off the start, search for a match (and we can deal with a non-word if we encounter one, no test for that, sorry) and return the index (adjusted for the slice). Our test passes:

    def test_pattern(self):
        f = Forth()
        test = (': TEST'
                '  2 CASE'
                '    1 OF 111 ENDOF'
                '    2 OF 222 ENDOF'
                '  ENDCASE '
                ';')
        expected_stack = [222]
        f.process_line(test)
        w = f.find_word('TEST')
        branch_location = w.index('0BR') + 1
        # branch_location = 4 + 1
        target_location = w.index('0BR', branch_location) - 3
        # target_location = 13 - 3
        assert w.words[branch_location] == target_location

We’ll surely want to enhance the pattern idea before we’re done but, for today, I think we’re done. We’re green, and I think this is going to work just fine. Let’s commit: initial implementation of CompileInfo and use in OF.

I’m tired and my eyes are as dry as something really dry. Let’s sum up.

Summary

We punched out a simple CompileInfo object and used it in OF to patch up the branch locations that OF uses to skip out if the OF does not apply. We have two tests, one that uses an approved string dump of the statement compiled, and one that looks for key elements in the word list and checks them for correctness. The latter test would continue to work even if we changed the contents of the OF or the details of its compilation.

The location patching code works but is not “right”. It just pulls out a value from the CompileInfo’s list and jams it into the word list. We’ll want to find a smoother way to do that, but it’s not all that bad as it stands, just a bit too free with its fingers in other people’s instance variables.

Surely the pattern-finding code, index will want to be enhanced as well. We’ll let that be driven by need.

And, finally, neither 0BR nor BR are doing anything yet, so they’ll have to be coded to fetch the patched value and do the necessary branching. We have code in IF that does that. We’ll steal its idea … and then quite likely use our new 0BR to implement IF. We’ll see, that’s just my guess, which is probably good but is not guaranteed.

This is good progress, and I feel confident in committing it. We’ll pick up where we left off next time. I hope you’ll join me then!