Variables Emerge

The Repo on GitHub

Interesting though compiled immediate words might be, I decide that variables are more important. I experiment with a somewhat smart heap and then, suddenly, realize how much we can do with a very rudimentary heap. I think we’re onto something good!

If we were to build this capability, I think it would require just a couple of new primary words. Then, with perhaps some other small features, it would be possible to build compile-time words in Forth instead of in Python. That would permit users, if we had any users, to build their own compile-time words, so that Hill, for example, could build the CASE-ESAC words that he thinks he might want if he were, contrary to any realistic expectations, ever to use this Forth.

My main argument in favor of doing it is that it will be amusing, possibly of value, and will help me understand how to produce a better Forth implementation. The main argument against, I suppose, is that there might be something more valuable that’s worth doing. Let’s explore that possibility.

Forth typically offers named variables, names that point to values, so that you can set X to 5 and later on look at X and find the 5. And it often offers named constants, variables that don’t vary after they are first set. It’s quite difficult to write even a moderately complicated Forth program without variables. Variables and possibly constants are probably more valuable than the ability to define one’s own immediate words.

Offhand, those are the only Forth capabilities where I’d make the case that they’re more important than the immediate secondary notion. And they actually have a bit of a relationship to that problem.

Let’s think about and experiment with the notion of variables.

In Forth, if we were to say VARIABLE X, a single cell of memory would be allocated, and the word X would be defined so as to put the address of that cell on the stack. Then, Forth provides words @ and !. X @ places the contents of X on the stack, accessing the variable’s value. And X ! stores whatever is in the second position on the stack into the address on top. So 3 X ! sets X to 3.

Of course, you can add to any value on the stack, so that

X 1 + @

returns whatever value is in the cell after the variable X. Even worse

666 X + 1 !

stores 666 into the cell following X, which up until now was probably something you cared about.

For your convenience, Forth typically allows you to allocate more than one cell for a variable, perhaps something like

VARIABLE X 5 ALLOT
VARIABLE Y

Which will allocate 6 cells for X, the original one and 5 more. Now you can’t destroy the universe quite so easily unless you do

666 X 6 + !

And poof! there goes your Y variable again.

Now we could probably do better, since we have the power of Python at our disposal. Perhaps we would implement a Variable class and create an instance of Variable when the user uses VARIABLE, and the Variable class might have a small list of items in it and it might have a limit on how long that list is, and two methods at and bang that set the values in the list but do not let anything vicious happen.

Or we could allocate variables as pointers (indices) into one big list, implement @ and ! appropriately, and let people store 666 into their Y variable by accident or on purpose if they want to.

What would X and Y be in that case? Here’s an idea.

Suppose our variable memory, the heap, is an object containing a dictionary of name to index and a big list. And suppose that we define the word X to put that index on the stack (somehow) and define @ and ! to use those values as indexes into the list part of the heap.

Note: What follows is a bit of heap experimentation. It’s not what we wind u doing but it does inform our thinking. Would it have been better to skip right to the next better idea? Sure, in retrospect. But we got there bu this short learning experiment.

I think I like this idea. Let’s try to write some tests for it.

class TestHeap:
    def test_heap_access(self):
        heap = Heap(10)
        assert heap.at(5) == 0
        heap.bang(5, 666)
        assert heap.at(5) == 666
        with        pytest.raises(IndexError):
            heap.bang(10, 666)

That test passes with this Heap class:

class Heap:
    def __init__(self, limit):
        self.limit = limit
        self.data = [0] * self.limit

    def at(self, index):
        return self.data[index]

    def bang(self, index, value):
        self.data[index] = value

OK, now the idea was that the Heap would have a dictionary of names and indices. Maybe index and length, to allow for arrays.

So let’s add tests for that. I think we’ll probably get tangled up in our own feet here but we’ll see.

    def test_named_access(self):
        heap = Heap(10)
        heap.allot('X', 1)
        assert heap.at_name('X') == 0
        heap.bang_name('X', 666)
        assert heap.at_name('X') == 666

Thinking ahead to adding to the index, I’m not sure what we’ll do about that. But for now:

    def test_named_access(self):
        heap = Heap(10)
        heap.allot('X', 1)
        assert heap.at_name('X') == 0
        heap.bang_name('X', 666)
        assert heap.at_name('X') == 666

And that passes with these added methods:

    def allot(self, name, count):
        self.names[name] = (self.next, count)
        self.next += count
        if self.next > self.limit:
            self.data.extend([0] * (self.next - self.limit))

    def at_name(self, name):
        return self.at(self.names[name][0])

    def bang_name(self, name, value):
        self.bang(self.names[name][0], value)

I haven’t tested the code that extends the list, but I was there, so I wrote something that might work. Let’s have that test. I think it should be like this:

    def test_extension(self):
        heap = Heap(4)
        heap.allot('X', 2)
        heap.allot('Y')
        heap.allot('Z')
        heap.allot('EXTRA')
        heap.bang('EXTRA', 666)
        assert len(heap.data) == 5
        assert heap.limit == 4

This is failing and I’ll know the reason why in a moment. (I do know that I haven’t updated limit but that’s not the error.) Oh, I said bang, not bang_name. Let’s rename, making the at and bang names available to the public.

With those changes my test passes and it shouldn’t. I am not using the limit at all. Let’s just remove it for now until we need it.

This passes:

    def test_extension(self):
        heap = Heap(4)
        heap.allot('X', 2)
        heap.allot('Y')
        heap.allot('Z')
        heap.allot('EXTRA')
        heap.bang('EXTRA', 666)
        assert len(heap.data) == 5
        assert heap.at('EXTRA') == 666

With this class:

class Heap:
    def __init__(self, number_of_cells):
        self.data = [0] * number_of_cells
        self.names = {}
        self.next = 0

    def _at(self, index):
        return self.data[index]

    def _bang(self, index, value):
        self.data[index] = value

    def allot(self, name, count=1):
        self.names[name] = (self.next, count)
        self.next += count
        if self.next >= len(self.data):
            self.data.extend([0] * (self.next - len(self.data)))

    def at(self, name):
        return self._at(self.names[name][0])

    def bang(self, name, value):
        self._bang(self.names[name][0], value)

Maybe we should allocate more than one cell at a time there in the allot. Let’s allocate, oh, ten. Change the test:

    def test_extension(self):
        heap = Heap(4)
        heap.allot('X', 2)
        heap.allot('Y')
        heap.allot('Z')
        heap.allot('EXTRA')
        heap.bang('EXTRA', 666)
        assert len(heap.data) == 14
        assert heap.at('EXTRA') == 666

With:

class Heap:
    def allot(self, name, count=1):
        self.names[name] = (self.next, count)
        self.next += count
        if self.next >= len(self.data):
            self.data.extend([0] * 10)

OK, I think that’s good enough for now. Commit: initial Heap running in test.

Note: We file this away under learning, and with whatever we’ve learned in mind, think about how we might use a heap if we had one. Ideally, we’d have started here. But we have to start where we are, not where we’re going. It took a step to get here, but we now have a lot of useful notions of a smarter heap. And now, we get the insight that gives us a spark of real joy.

Now that we have this thing, how can we use it? The classical syntax, I think, would be

VARIABLE X 1 ALLOT
VARIABLE Y

Where the 1 ALLOT allots 1 MORE cell to X. That’s going to be tricky unless we were to require the ALLOT word as we require THEN. I think that wouldn’t be Forthright. (By analogy to Pythonic?) Let’s belay the concern about ALLOT for now.

When we use a variable, it would be like this:

VARIABLE X
X 42 !
X @ 42 = IF THIS_SHOULD_HAPPEN ELSE THIS_SHOULDNT THEN

So the question becomes, how do we implement @ and !, and what do they expect to find on the stack?

Furthermore if X has more than one cell, we need to be able to say

X 1 + 43 !

And expect to find 43 in what we would call X[1].

So what is X, and what are @ and !?

A Miracle Occurs!: What comes out now is a key idea: let the variable put a simple integer on the stack, and let the @ and ! deal with the integers. This is obvious once we see it! Nice!

Could we allow variables to be simple integers, and then translate them into locations in the heap via @ and !?

OK, not with the Heap we just created but suppose that our Heap had a symbol table:

name	index	len
X	0	2
Y	2	1
Z	3	1
E	4	1

Wait!: Never mind how we get the index. It’s what we do with the index that counts!

And suppose that VARIABLE Y was somehow defined so that it pushed 2 onto the stack, its index in the table above.

And suppose that @ “just” takes any old integer off the stack and yanks that value out of the heap’s list, and that ! “just” just jams the second value on the stack into the heap list at the index at the top of the stack.

Let’s spike an example of this. We’ll give Forth a list called “heap”, of some convenient length for now.

class Forth:
    def __init__(self):
        self.active_words = []
        self.compile_stack = Stack()
        self.heap = [0] * 10

And then define @ and !:

        def _at(forth):
            index = forth.stack.pop()
            forth.stack.push(forth.heap[index])

        def _put(forth):
            index = forth.stack.pop()
            value = forth.stack.pop()
            forth.heap[index] = value

        self.append(PrimaryWord('@', _at))
        self.append(PrimaryWord('!', _put))

I didn’t write a test for those yet, but here goes:

    def test_rudimentary_heap(self):
        f = Forth()
        f.compile('666 4 !')
        assert f.heap[4] == 666
        f.compile('4 @')
        assert f.stack.pop() == 666

Passes. Commit: rudimentary heap as list.

Two more tests, also passing:

    def test_rudimentary_heap_arithmetic(self):
        f = Forth()
        f.compile('666 4 !')
        f.compile('1 3 + @')
        assert f.stack.pop() == 666

    def test_rudimentary_heap_overflow(self):
        f = Forth()
        with pytest.raises(IndexError):
            f.compile('666 10 !')

Note that final test. It shows that we can’t destroy the world by indexing out of the heap. If we do, we’ll get an exception that we will deal with at the top of Forth one of these first days.

Commit: additional rudimentary heap tests.

I think we’re at a lovely stopping point. Let’s sum up.

Summary

We started by considering priorities and it seemed that variables are more valuable than the ability to define immediate words using Forth. So then we played with the idea of a Heap that understood string names and had some simple limits built in. That has a bit of value in that it gave us a feeling for how we might use limits.

But then, a miracle occurred. It came to me that the @ and ! could treat an integer on the stack as the index of a value in the heap. (This is exactly what Forth actually does, I think, although it might actually be allocating in the lexicon in some versions.)

From that little insight comes a very rudimentary but extensible notion for access to variables.

What I can’t explain is why it wasn’t obvious all along, but I guess I just have too much experience with objects, where we aren’t really able to just reach inside them and grab things. Anyway, for some reason it wasn’t obvious until it was obvious. This is the way with ideas, isn’t it?

This is going to open some doors, I’m sure. I’m pleased. Thanks for joining me, see you next time!