It’s still the weekend, so I’m going to follow Peter Norvig’s Python LISP implementation for a while and see where it takes me.

Besides, this is my computer rig and my house and my web site and I can do what I want to.

We’re working from Peter Norvig’s article. I think I’ll follow along more or less exactly, that is, doing the Lispy Calculator first. I figure we can follow the Python implementation pretty closely, but I’ll write whatever thoughts come to mind here in the article, as I always do.

I gather from the article that we’re doing pretty much the Scheme version of LISP, though I haven’t looked it all up to work out the differences between dialects.

Language 1: Lispy Calculator

We’re going to write a smaller-than-LISP calculator language that can say things like this:

(define r 10)
(* pi (* r r))

As Peter has it, we’re going to need variable references, which are symbols, constant literals, which are numbers, a conditional (if test yes no), a definition (define symbol expression), and a procedure call (proc arg arg …).
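
Just to fix the shape of those in my head, here’s a guess at how each form might look once parsed into Lua tables. This is jumping ahead a bit: the real representation gets settled below, and symbols turn out to be a little more than plain strings.

-- Hypothetical parsed forms, assuming lists become Lua tables and,
-- just for this sketch, symbols are written as plain strings:
local definition  = { "define", "r", 10 }                      -- (define r 10)
local conditional = { "if", { ">", "r", 5 }, "big", "small" }  -- (if (> r 5) big small)
local call        = { "*", "pi", { "*", "r", "r" } }           -- (* pi (* r r))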

The plan is to parse and then eval. Parse produces an “abstract syntax tree”, and eval evaluates it. Our XPR language from the past few articles has an abstract syntax tree that I’ve hand-crafted, and it evaluates that tree. The Norvig version of the tree is simpler than mine.

Peter calls his shot in the article. I’ll call mine with tests (and text).

I begin a new Codea project, LISP, and hook it into WorkingCopy. Commit: initial commit.

Now begin with a test.

        _:test("parse", function()
            local prog = "(begin (define r 10) (* pi (* r r)))"
            local expect = { "begin", { "define", "r", 10},{"*", "pi", { "*", "r", "r"}}}
            local parsed = LISP:parse(prog)
            _:expect(parsed[1]).is(expect(1)
        end)

This is rather a large test, but since I’m following Peter rather than inventing, I’m hoping it’ll be OK. Obviously I’m going to need something to compare nested tables, but for now I can make do by disassembling the tables in parallel. Anyway, I’m just starting with the first item. Let’s in fact do two, just to make it harder. Let’s also not overload the word expect, and use correct syntax. That should make the job easier:

        _:test("parse", function()
            local prog = "(begin (define r 10) (* pi (* r r)))"
            local expected = { "begin", { "define", "r", 10},{"*", "pi", { "*", "r", "r"}}}
            local parsed = LISP:parse(prog)
            _:expect(parsed[1]).is(expected[1])
            _:expect(parsed[2][1]).is(expected[2][1])
        end)

OK. This should error not knowing LISP. As I look at it now, I’m not sure whether the class notation is quite what I want. Let’s assume yes and sort it out later.

2: parse -- Tests:21: attempt to index a nil value (global 'LISP')

I do this:

LISPCalc = class()

LISP = LISPCalc()

I prefer that even when there will be only one of something (such as LISP), it be an instance, not a class. It just seems to work out better for me. In this case, it’ll work out better because we’re going to need some LISP-related storage sooner or later.
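
To make that concrete, here’s a purely hypothetical sketch of the kind of instance storage I have in mind. Nothing like this exists in the project yet, and the names are made up:

-- Hypothetical only: if LISPCalc later grows an environment, along the
-- lines of Norvig's Env, per-instance storage might look like this.
function LISPCalc:init()
    self.env = {}  -- somewhere to keep global definitions, say
end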

Now it should fail not knowing parse:

1: parse -- Tests:17: attempt to call a nil value (method 'parse')

And it does.

function LISPCalc:parse(prog)
    return {}
end

That will fail unhappily, probably not finding begin.

1: parse  -- Actual: nil, Expected: begin
1: parse -- Tests:19: attempt to index a nil value (field 'integer index')

The second error is of course the nested comparison.

So this is good. Now back to Peter.

Peter indicates what his representations are for the Scheme objects. We’ll use the Lua equivalents, I imagine. Then he reveals that we need to “tokenize” the program before parsing it. If I had looked ahead in the reading, I’d have written a different test first, the one for tokenize. We’ll do that now, and set our parse test to ignore.

        _:test("tokenize", function()
            local prog = "(begin (define r 10) (* pi (* r r)))"
            local correct = { "(", "begin", "(", "define", "r", "10", ")", "(", "*", "pi", "(", "*", "r", "r", ")", ")", ")"  }
            local tokens = LISP:tokenize(prog)
            for i,token in ipairs(tokens) do
                _:expect(token).is(correct[i])
            end
        end)

That’s hard to read but I think it’s what we want. I decided to iterate over the tokens list because it’ll be shorter until it’s correct, and otherwise I’ll get umpty messages. Need a better testing tool here. Run it, expect fail on tokenize.

2: tokenize -- Tests:25: attempt to call a nil value (method 'tokenize')

Implement. Peter uses two replaces and a split. I’ll have to look up what split does but it looks like it splits a string into an array, on whitespace.

His replaces make sure there are spaces around all the parens. We can do this:

function LISPCalc:tokenize(prog)
    local spaced = prog:gsub('%(',' ( '):gsub('%)',' ) ')
    return self:split(spaced)
end

My tests tell me that the substitution worked, but I’ve not managed to create a split pattern that finds anything.

Let me write a test just for that.

        _:test("split", function()
            local input = " a bb ccc "
            local split = LISP:split(input)
            _:expect(split[1]).is("a")
            _:expect(split[2]).is("bb")
            _:expect(split[3]).is("ccc")
        end)

That’ll fail. My current incorrect version tends to return nothing.

3: split  -- Actual: nil, Expected: a
3: split  -- Actual: nil, Expected: bb
3: split  -- Actual: nil, Expected: ccc

Darn patterns. It seems to me that my pattern wants to be whitespace of arbitrary length, non-whitespace ditto, whitespace ditto, with the non-whitespace captured.

I have this, which does not work … ah! Reading the test to explain it to you, I found an extra * in the pattern. This seems to work:

function LISPCalc:split(str)
    local result = {}
    local pat = "%s*(%g*)%s*"
    for w in str:gmatch(pat) do
        table.insert(result,w)
    end
    return result
end

Now I can unignore the tokenize test and I expect it should pass.

Tokenize says OK ten times and then says:

2: tokenize  -- Actual: , Expected: nil

The actual must be an empty string. I think that’s because I allow a zero-length capture in the split. Let’s not do that.

    local pat = "%s*(%g+)%s*"

Tests pass. Commit: tokenize works.
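
For my own peace of mind, here’s a tiny check of why that one-character change matters, assuming Lua 5.3’s string.gmatch, which is what Codea gives us. With %g* every part of the pattern can match nothing, so gmatch hands back one final empty capture at the end of the string; %g+ insists on at least one printable character, so it doesn’t.

-- A sketch, not project code: count the captures each pattern produces.
local function captures(pat, s)
    local result = {}
    for w in s:gmatch(pat) do
        table.insert(result, w)
    end
    return result
end

local spaced = " ( define r 10 ) "
print(#captures("%s*(%g*)%s*", spaced))  -- 6: the five tokens plus a trailing ""
print(#captures("%s*(%g+)%s*", spaced))  -- 5: just ( define r 10 )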

Now back to Peter Norvig. He says:

Our function parse will take a string representation of a program as input, call tokenize to get a list of tokens, and then call read_from_tokens to assemble an abstract syntax tree. read_from_tokens looks at the first token; if it is a ‘)’ that’s a syntax error. If it is a ‘(‘, then we start building up a list of sub-expressions until we hit a matching ‘)’. Any non-parenthesis token must be a symbol or number. We’ll let Python make the distinction between them: for each non-paren token, first try to interpret it as an int, then as a float, and if it is neither of those, it must be a symbol.

We’ll try to follow his code function for function, writing in Lua instead of Python. This seems wise, because I’m talking to a Lua compiler, not a Python compiler.

I can unignore my parse test:

        _:test("parse", function()
            local prog = "(begin (define r 10) (* pi (* r r)))"
            local expected = { "begin", { "define", "r", 10},{"*", "pi", { "*", "r", "r"}}}
            local parsed = LISP:parse(prog)
            _:expect(parsed[1]).is(expected[1])
            _:expect(parsed[2][1]).is(expected[2][1])
        end)

That needs beefing up but it’ll do for now. Test will fail with wrong answer.

1: parse  -- Actual: nil, Expected: begin

I think this is because my parse isn’t up to snuff:

function LISPCalc:parse(prog)
    return {}
end

I’m really going to just transcribe Peter’s code into Lua directly, so long as I can.

function LISPCalc:parse(prog)
    return self:readFromTokens(self:tokenize(prog))
end

Should fail on read …

3: parse -- LISPCalc:8: attempt to call a nil value (method 'readFromTokens')

I like to run the tests because they tell me whether I’m tracking where I think I am.

I quickly observe that Python has a pop function. Lua does not. I’ll just write that out longhand for now.

OK, I’ve just typed this monster in. I am not comfortable. Also I will need to create the class (struct) for Symbol. Let’s see what the test does.

4: parse -- LISPCalc:16: bad argument #1 to 'remove' (table expected, got string)

Oh my bad. This is the line:

    tokens = table.remove(tokens,1)

Turns out that table.remove modifies the table in place (why wouldn’t it? insert does) and returns the removed value, so I shouldn’t be assigning the result back into tokens. So pop == remove.
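
If I ever want the Python reading back, a pop is just a thin wrapper over table.remove. This is only a sketch; the real code simply calls table.remove(tokens, 1) inline.

-- Sketch only: a Python-style pop of the first element.
local function pop(list)
    return table.remove(list, 1)  -- removes in place, returns the removed value
end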

4: parse -- LISPCalc:35: attempt to call a nil value (global 'Symbol')

Well, we are deep in now. Create Symbol class:

Symbol = class()

function Symbol:init(token)
    self.token = token
end

4: parse -- LISPCalc:25: attempt to call a table value

Hmn. What’s that?

            for _i, tok in t do
                table.insert(L,t)
            end

Somebody forgot to say ipairs.

4: parse -- attempt to index a number value

That’s not helpful at all. I don’t know where it came from. I kind of suspect that it’s back in the test.

A print tells me that it isn’t.

I’m going to toss some prints into the parsing. That’s useful because of the final thing it prints:

return atom 	10
4: parse -- attempt to index a number value

Everything else said things like “return atom begin, Symbol begin”. So what’s going on here?

Ah. It’s my transcription of this:

        while tokens[0] != ')':
            L.append(read_from_tokens(tokens))

I thought that would be a table coming back. I think it’s just a token, so my code should be:

        while tokens[1] ~= ")" do
            local t = self:readFromTokens(tokens)
            table.insert(L,t)
        end

I’m going to have to pause, if not stop. The eggs are in the pan.

I have a new and exciting error:

4: parse  -- Actual: table: 0x2827eec40, Expected: begin
4: parse  -- Actual: table: 0x2827ec8c0, Expected: define

I think I’ll dump whatever came out.

Breakfast etc. intervenes …

I’m back. My print of what’s going in looks nearly right. What comes out is tables three levels down at least.

The bug is in the test. I’m checking for strings. I need to expect symbols. Here’s the test and code that works so far:

        _:test("parse", function()
            local prog = "(begin (define r 10) (* pi (* r r)))"
            local expected = { "begin", { "define", "r", 10},{"*", "pi", { "*", "r", "r"}}}
            local parsed = LISP:parse(prog)
            _:expect(parsed[1].token).is(expected[1])
            _:expect(parsed[2][1].token).is(expected[2][1])
        end)

Symbol = class()

function Symbol:init(token)
    self.token = token
end

function Symbol:__tostring()
    return "Symbol "..self.token
end

function Symbol:__eq(aSymbol)
    return self.token == aSymbol.token
end

LISPCalc = class()

LISP = LISPCalc()

function LISPCalc:parse(prog)
    return self:readFromTokens(self:tokenize(prog))
end

function LISPCalc:readFromTokens(tokens)
    if #tokens == 0 then
        error("Unexpected EOF")
    end
    local token = table.remove(tokens,1)
    if token == "(" then
        local L = {}
        while tokens[1] ~= ")" do
            local t = self:readFromTokens(tokens)
            table.insert(L,t)
        end
        table.remove(tokens,1) -- remove the )
        return L
    elseif token == ")" then
        error("Syntax error unexpected )")
    else
        return self:atom(token)
    end
end

function LISPCalc:atom(token)
    -- return number or symbol
    return tonumber(token) or Symbol(token)
end

function LISPCalc:tokenize(prog)
    local spaced = prog:gsub('%(',' ( '):gsub('%)',' ) ')
    return self:split(spaced)
end

function LISPCalc:split(str)
    local result = {}
    local pat = "%s*(%g+)%s*"
    for w in str:gmatch(pat) do
        table.insert(result,w)
    end
    return result
end
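
One small difference from Norvig: his atom tries int, then float, then Symbol, while mine leans on tonumber for both numeric cases. A quick check of what that gives us, assuming Lua 5.3 as in Codea:

-- Sketch: tonumber covers both integer and float literals.
print(math.type(tonumber("10")))    --> integer
print(math.type(tonumber("3.14")))  --> float
print(tonumber("pi"))               --> nil, so atom falls through to Symbol("pi")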

The test is just checking two elements. How do you even test these things to be sure they are equal all the way down?

By intention:

        _:test("parse", function()
            local prog = "(begin (define r 10) (* pi (* r r)))"
            local expected = { "begin", { "define", "r", 10},{"*", "pi", { "*", "r", "r"}}}
            local parsed = LISP:parse(prog)
            _:expect(parsed[1].token).is(expected[1])
            _:expect(parsed[2][1].token).is(expected[2][1])
            compare(parsed,expected)
        end)

function compare(p,e)
    for i,e in ipairs(e) do
        if type(e) == "table" then
            _:expect(type(p[i])).is("table")
            compare(p[i],e)
        elseif type(e) == "string" then
            _:expect(p[i]).is(Symbol(e))
        else
            _:expect(p[i]).is(e)
        end
    end
end

Test is green. Commit: Calculator parse works.
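
For the record, here’s roughly what parse gives back for the smaller example, assuming the Symbol and LISPCalc code above (the Symbol lines print via __tostring):

-- Sketch: poking at the result of a small parse.
local ast = LISP:parse("(define r 10)")
print(ast[1])   --> Symbol define
print(ast[2])   --> Symbol r
print(ast[3])   --> 10
print(#ast)     --> 3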

This is enough for today and in the summary, I’ll tell you why.

Summary

If you know Python 3, you may have detected that I do not. In particular, I’ve come to realize that there is real meaning to these bits of Python from Norvig’s article:

Symbol = str              # A Scheme Symbol is implemented as a Python str
Number = (int, float)     # A Scheme Number is implemented as a Python int or float
Atom   = (Symbol, Number) # A Scheme Atom is a Symbol or Number
List   = list             # A Scheme List is implemented as a Python list
Exp    = (Atom, List)     # A Scheme expression is an Atom or List
Env    = dict             # A Scheme environment (defined below) 
                          # is a mapping of {variable: value}

And these markers on the functions:

def parse(program: str) -> Exp:
    "Read a Scheme expression from a string."
    return read_from_tokens(tokenize(program))

def read_from_tokens(tokens: list) -> Exp:
    "Read an expression from a sequence of tokens."
    if len(tokens) == 0:
        raise SyntaxError('unexpected EOF')
    token = tokens.pop(0)
    if token == '(':
        L = []
        while tokens[0] != ')':
            L.append(read_from_tokens(tokens))
        tokens.pop(0) # pop off ')'
        return L
    elif token == ')':
        raise SyntaxError('unexpected )')
    else:
        return atom(token)

def atom(token: str) -> Atom:
    "Numbers become numbers; every other token is a symbol."
    try: return int(token)
    except ValueError:
        try: return float(token)
        except ValueError:
            return Symbol(token)

Now that I look at those, they appear to be some kind of definition of a type sort of thing, saying that an Atom is a Symbol or a Number and a Number is an int or a float. Before I go any further, I want to do a quick scan of the entire Python 3 literature, to make sure that I understand what’s going on there. My suspicion is that it’s just doing a bit of type-checking for the intrepid Python programmers, and that I can ignore it (at my peril). But I’d like to be sure.

The second more important reason is that today is Sunday and I have important news programs, reading, and napping to do.

But this is fun. I’m going to continue into the week until I have this little Scheme / LISP thing working. I’m sure it’ll pay off somehow.

In fun and mockery if nothing else. See you then!