Code Review

The Repo on GitHub

Now that CREATE-DOES> works, and I am reasonably confident that I have it right, let’s review the code and check out the concerns left open last time. A perfect morning. (Added: P.S.

I have two notes on one small sticky note on my keyboard tray. One says “[?]” and the other says “DOES> semantics?”.

The former refers to my uncertainty about whether this code produces a new list on every call:

    def _create(forth):
        name = forth.next_token()
        lit = forth.find_word('*#')
        word = SecondaryWord(name, [lit, len(forth.heap)])
        forth.lexicon.append(word)

We’ll test that.

    def test_new_list_every_time(self):
        def make_list():
            return [1, 2]

        list_1 = make_list()
        list_1.append(3)
        list_2 = make_list()
        assert list_2 == [1, 2]
        assert list_1 is not list_2

Was it odd to write a test for that? If we’d been sitting together, you might have assured me that, no, we always get a new list from a literal except in the case of a default argument. And I might have believed you, and you might have been right.

Or I could have looked on the Internet or read a Python book to find out. Instead, I wrote actual code that really finds out. And now, because the default idea came up, I’m going to write yet another test.

    def test_default_list_is_not_new(self):
        def put_3_in_list(x=[]):
            x.append(3)
            return x

        list_1 = put_3_in_list()
        assert list_1 == [3]
        list_2 = put_3_in_list()
        assert list_2 == [3, 3]
        list_2.append(5)
        list_3 = put_3_in_list()
        assert list_3 == [3, 3, 5, 3]

For me, that test, plus the first one, really brings it home to me what is going on. I am more likely to remember it, I think, than if we had just discussed it or I had read it. And, even if I forget, I can always find this test or write it again.

For me, it’s best to resolve questions about what the language does by writing explicit tests.

As for the second question, it referred to the fact that the Forth Standard says that DOES> replaces the semantics of the most recent word, and the uses I’ve seen of it seem to be extending the semantics. That is certainly what our current implementation does, and if it didn’t, the CONSTANT definition would not work.

I have not completed my research on that question and will look into it further.

To the Code Cave, Robin!

I think it is past time to create a little object to embody the heap. We have code all around that is looking at it:

class Lexicon:
    def _at(forth):
        index = forth.stack.pop()
        forth.stack.push(forth.heap[index])

    def _put(forth):
        index = forth.stack.pop()
        value = forth.stack.pop()
        forth.heap[index] = value

    def _allot(forth):
        forth.heap.extend([0]*forth.stack.pop())

    def _comma(forth):
        forth.heap.append(forth.stack.pop())

    def _variable(forth):
        name = forth.next_token()
        value = len(forth.heap)
        literal = forth.find_word('*#')
        word = SecondaryWord(name, [literal, value])
        forth.lexicon.append(word)

    def _create(forth):
        name = forth.next_token()
        lit = forth.find_word('*#')
        word = SecondaryWord(name, [lit, len(forth.heap)])
        forth.lexicon.append(word)

That should be enough references to convince us that there’s an object trying to be born here, though I do think we’ll replace our primary word for VARIABLE with a secondary when we can.

Now we could just use our existing Stack class as a heap. But why not have one that does just what we want, no more and no less?

We could TDD a new Heap class but instead let’s assume that we have one and change all these existing references to say what we’d like to say. They’ll break, and they will drive out the new Heap class for us.

I think that what I’ll do is create an empty Heap class, use it, and then our tests will discover all the uses above.

class Heap:
    def __init__(self):
        pass

class Forth:
class Forth:
    def __init__(self):
        self.active_words = []
        self.compile_stack = Stack()
        self.heap = Heap()
        ...

Note: What follows is a very nice sequence of choosing a broken test, looking at the line that breaks, changing it to refer to the Heap in terms we like, and then sometimes adding a new method to Heap. I recommend reading at skimming speed, or skipping down to the Summary. Maybe read a few and then skip. Or read from the end to the beginning: I’m not your boss. (But I don’t think reading from the end to the beginning is going to work out well for you.)

13 tests break. Now we can just follow our nose to the various uses of the heap that need improvement.

Here’s one:

    def test_rudimentary_heap(self):
        f = Forth()
        f.compile('9 ALLOT')
        f.compile('666 4 !')
        assert f.heap[4] == 666
        f.compile('4 @')
        assert f.stack.pop() == 666

It’s failing here:

    def _allot(forth):
>       forth.heap.extend([0]*forth.stack.pop())
E       AttributeError: 'Heap' object has no attribute 'extend'

Clearly we should leave it up to the heap how to allot things, so we change that to:

    def _allot(forth):
        forth.heap.allot(forth.stack.pop())

And let’s do a little work in Heap:

class Heap:
    def __init__(self):
        self._heap = []

    def allot(self, n):
        self._heap.extend([0] * n)

Still 13 fails, but now:

    def _put(forth):
        index = forth.stack.pop()
        value = forth.stack.pop()
>       forth.heap[index] = value
E       TypeError: 'Heap' object does not support item assignment

Seems to me that we should have the Heap accept the two popped values, so:

    def _put(forth):
        index = forth.stack.pop()
        value = forth.stack.pop()
        forth.heap.put(index, value)

We could inline those, but we’re here to pay attention to Heap.

class Heap:
    def put(self, index, value):
        self._heap[index] = value

Only 12 failing, woot!

The next failure is in a test:

>       assert f.heap[4] == 666
E       TypeError: 'Heap' object is not subscriptable

We have @ in Forth and _at as the primitive. Let’s require our tests to use an ‘at’ method on the heap. We want to keep it narrow.

class Heap:
    def at(self, index):
        return self._heap[index]

Use that in the test. Now the error is:

    def _at(forth):
        index = forth.stack.pop()
>       forth.stack.push(forth.heap[index])
E       TypeError: 'Heap' object is not subscriptable

Right now it’s telling us to fix the _at function.

    def _at(forth):
        index = forth.stack.pop()
        forth.stack.push(forth.heap.at(index))

Only 10 failing now, we’re nearly done.

    def _variable(forth):
        name = forth.next_token()
>       value = len(forth.heap)
E       TypeError: object of type 'Heap' has no len()

Ah this is interesting. What we want here is the next address in the heap. Let’s ask for that.

    def _variable(forth):
        name = forth.next_token()
        address = forth.heap.next_available()
        literal = forth.find_word('*#')
        word = SecondaryWord(name, [literal, address])
        forth.lexicon.append(word)

class Heap:
    def next_available(self):
        return len(self._heap)

Only 8 failing. This is fun, has a kind of rhythm to it.

    def test_comma(self):
        f = Forth()
>       assert len(f.heap) == 0
E       TypeError: object of type 'Heap' has no len()

Test needs revision to refer to next_avaiable, and with that we get this:

    def _comma(forth):
>       forth.heap.append(forth.stack.pop())
E       AttributeError: 'Heap' object has no attribute 'append'

I think we’ll give our heap the method we want, namely comma:

    def _comma(forth):
        forth.heap.comma(forth.stack.pop())

class Heap:
    def comma(self, value):
        self.allot(1)
        self._heap[-1] = value

Still 8 failing, though. I am a bit surprised. Oh, it’s just an indexed reference in the test that needed to be changed:

    def test_comma(self):
        f = Forth()
        assert f.heap.next_available() == 0
        f.compile('33 ,')
        assert f.heap.next_available() == 1
        assert f.heap.at(0) == 33

Down to seven now.

    def _create(forth):
        name = forth.next_token()
        lit = forth.find_word('*#')
>       word = SecondaryWord(name, [lit, len(forth.heap)])
E       TypeError: object of type 'Heap' has no len()

    def _create(forth):
        name = forth.next_token()
        lit = forth.find_word('*#')
        word = SecondaryWord(name, [lit, forth.heap.next_available()])
        forth.lexicon.append(word)

Down to 3. Tests needed revision. Here’s an example:

    def test_compile_create_does(self):
        f = Forth()
        f.compile('1 , 2 , 3 ,')
        s = ': CONSTANT CREATE , DOES> @ ;'
        f.compile(s)
        expected = f.heap.next_available()
        f.compile('2025 CONSTANT YEAR')
        assert f.heap.at(expected) == 2025
        f. compile('YEAR')
        assert f.stack.pop() == 2025

Needed to use at, not []. After adjusting all three tests, we are green.

Commit: convert to use new Heap object.

I want to double-check all references to heap. PyCharm will help with its usual um charm. Just bringing up the reference lets me read each line and verify that they all use proper parlance.

Let’s do a bit of refactoring though.

    def _at(forth):
        index = forth.stack.pop()
        forth.stack.push(forth.heap.at(index))

Inline:

    def _at(forth):
        forth.stack.push(forth.heap.at(forth.stack.pop()))

Then there’s this:

    def _put(forth):
        index = forth.stack.pop()
        value = forth.stack.pop()
        forth.heap.put(index, value)

I don’t feel comfortable inlining that one, it would be too cryptic. And based on that reasoning, I think I’ll undo the inlining on the other one.

And these two:

    def _allot(forth):
        forth.heap.allot(forth.stack.pop())

    def _comma(forth):
        forth.heap.comma(forth.stack.pop())

I think they should have Explaining Temporary Variables also:

    def _allot(forth):
        number_to_allot = forth.stack.pop()
        forth.heap.allot(number_to_allot)

    def _comma(forth):
        value_to_store = forth.stack.pop()
        forth.heap.comma(value_to_store)

I don’t do that often with a one-liner but sometimes they’re just a bit obscure. I think this is better.

These three are very similar:

    def _constant(forth):
        literal = forth.find_word('*#')
        name = forth.next_token()
        value = forth.stack.pop()
        word = SecondaryWord(name, [literal, value])
        forth.lexicon.append(word)

    def _variable(forth):
        name = forth.next_token()
        address = forth.heap.next_available()
        literal = forth.find_word('*#')
        word = SecondaryWord(name, [literal, address])
        forth.lexicon.append(word)

    def _create(forth):
        name = forth.next_token()
        lit = forth.find_word('*#')
        word = SecondaryWord(name, [lit, forth.heap.next_available()])
        forth.lexicon.append(word)

Let’s make them more similar and see what we get.

        def _constant(forth):
            value = forth.stack.pop()
            literal = forth.find_word('*#')
            name = forth.next_token()
            word = SecondaryWord(name, [literal, value])
            forth.lexicon.append(word)

        def _variable(forth):
            address = forth.heap.next_available()
            literal = forth.find_word('*#')
            name = forth.next_token()
            word = SecondaryWord(name, [literal, address])
            forth.lexicon.append(word)

        def _create(forth):
            address = forth.heap.next_available()
            literal = forth.find_word('*#')
            name = forth.next_token()
            word = SecondaryWord(name, [literal, address])
            forth.lexicon.append(word)

I edited each of those separately, extracting the temp, renaming the literal temp, and moving the lines around. And voila! we note that the last two are identical and the first is nearly so

We could combine them in the Python. But CREATE is more primitive than VARIABL and CONSTANT: CREATE is the word that creates things pointing to the heap. So let’s define VARIABLE and CONSTANT in terms of CREATE. We have already done CONSTANT in tests.

    def _define_create_does(self):
        def _create(forth):
            address = forth.heap.next_available()
            literal = forth.find_word('*#')
            name = forth.next_token()
            word = SecondaryWord(name, [literal, address])
            forth.lexicon.append(word)

        def _does(forth):
            forth.active_word.copy_to_latest(forth.lexicon)

        self.pw('DOES>', _does)
        self.pw('CREATE', _create)

# and elsewhere:

        forth.compile(': CONSTANT CREATE , DOES> @ ;')
        forth.compile(': VARIABLE CREATE ;')

And green. Commit: convert CONSTANT and VARIABLE to colon definitions.

Enough already. This article is long … but everything in it was straightforward. Let’s sum up.

Summary

The New Heap

The bulk of the work today was creating the simple object Heap, to serve as our heap storage. It just has a few methods: allot, at, comma, next_avaiable and put. Those are custom-made to be exactly what the user methods want to say. Its internals are a secret known only to you and me, and if and when we want diagnostics for the only possible error, accessing outside the heap, we’ll know where to put them.

We created the Heap class, used it in its single application, in Forth class, and simply ticked through the broken tests, which led us directly to everything that needed changing.

There wasn’t that much. If this were a huge application and we had waited ages to make this change, it would have been more difficult, and we might have had to provide some special handling in Heap for things like subscript access, even though we’d rather not have it. So we shouldn’t leave creating new helpful objects too long.

On the other hand, if we had decided on day one that we needed a Heap object, we would have done only as much work as we have already done, minus all the changes to the methods that did the old access to the heap knowing it as a list. I suspect that if one had the rule never to use a primitive object like list or sequence or dictionary, one would do better. Sometimes it wouldn’t pay off, but usually it would, and we would save time and probably move faster because the objects are more helpful.

Something to think about anyway.

Explaining Variable Names

I think it’s noteworthy that I decided against using the most compact code that PyCharm and I could devise, using the Explaining Variable Name pattern to clarify what the little helper methods are doing. It might be a bit slower and it is certainly a line longer every time we do it, but it makes reading the code much easier. And we read it more often than we write it.

A Perfect Morning

So lots of code shown here but the work was all relaxed, test-driven, and quite rhythmic and mellow.

Every day should be so pleasant. See you next time!

P.S.

I just discovered that I had a small TDD’d Heap class. I had entirely forgotten that I had done it. It was six days ago, so why would I remember. A quick review this morning told me that it was a bit odd, aimed at keeping the names of the items in the heap. That won’t do: they have to be words in the lexicon, and they have to represent numeric addresses. As an experiment, as billed, it was OK. The one we did today fits our needs better.

Live and learn.