XST 42: LISP, Again
It’s still the weekend, so I’m going to follow Peter Norvig’s Python LISP Implementation a while and see where it takes me.
Besides, this is my computer rig and my house and my web site and I can do what I want to.
We’re working from Peter Norvig’s article. I think I’ll follow along more or less exactly, that is, doing the Lispy Calculator first. I figure we can follow the Python implementation pretty closely, but I’ll write whatever thoughts come to mind here in the article, as I always do.
I gather from the article that we’re doing pretty much the Scheme version of LISP, though I haven’t looked it all up to work out the differences between dialects.
Language 1: Lispy Calculator
We’re going to write a smaller-than-LISP calculator language that can say things like this:
(define r 10)
(* pi (* r r))
As Peter has it, we’re going to need variable references, which are symbols, constant literals, which are numbers, a conditional (if test yes no), a definition (define symbol expression), and a procedure call (proc arg arg …).
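Just to picture all five of those at once, here’s the sort of toy program I’d expect the calculator to handle eventually. This is my own made-up example; whether > actually winds up in the environment remains to be seen.
(define r 10)
(if (> r 5) (* pi (* r r)) 0)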
The plan is to parse and then eval. Parse produces an “abstract syntax tree”, and eval evaluates it. Our XPR language from the past few articles has a hand-crafted abstract syntax tree, which it evaluates. The Norvig version of the tree is simpler than mine.
Peter calls his shot in the article. I’ll call mine with tests (and text).
I begin a new Codea project, LISP, and hook it into WorkingCopy. Commit: initial commit.
Now begin with a test.
_:test("parse", function()
    local prog = "(begin (define r 10) (* pi (* r r)))"
    local expect = { "begin", { "define", "r", 10},{"*", "pi", { "*", "r", "r"}}}
    local parsed = LISP:parse(prog)
    _:expect(parsed[1]).is(expect(1)
end)
This is rather a large test, but since I’m following Peter rather than inventing, I’m hoping it’ll be OK. Obviously I’m going to need something to compare nested tables, but for now I can make do by disassembling the tables in parallel. Anyway, I’m just starting with the first item. Let’s in fact check two, just to make the test a bit more demanding. Let’s also not overload the word expect, and use correct syntax. That should make the job easier:
_:test("parse", function()
    local prog = "(begin (define r 10) (* pi (* r r)))"
    local expected = { "begin", { "define", "r", 10},{"*", "pi", { "*", "r", "r"}}}
    local parsed = LISP:parse(prog)
    _:expect(parsed[1]).is(expected[1])
    _:expect(parsed[2][1]).is(expected[2][1])
end)
OK. This should error not knowing LISP. As I look at it now, I’m not sure whether the class notation is quite what I want. Let’s assume yes and sort it out later.
2: parse -- Tests:21: attempt to index a nil value (global 'LISP')
I do this:
LISPCalc = class()
LISP = LISPCalc()
I prefer that even when there will be only one (such as LISP) that it be an instance, not a class. It just seems to work out better for me. In this case, it’ll work out better because we’re going to need some LISP-related storage sooner or later.
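The kind of LISP-related storage I have in mind is something like this, purely hypothetical for now; nothing like it exists in the project yet:
-- hypothetical: an instance gives us a place to hang interpreter state,
-- such as an environment table, when we eventually need one
function LISPCalc:init()
    self.env = { pi = math.pi }
end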
Now it should fail not knowing parse:
1: parse -- Tests:17: attempt to call a nil value (method 'parse')
And it does.
function LISPCalc:parse(prog)
    return {}
end
That will fail unhappily, probably not finding begin.
1: parse -- Actual: nil, Expected: begin
1: parse -- Tests:19: attempt to index a nil value (field 'integer index')
The second error is of course the nested comparison.
So this is good. Now back to Peter.
Peter indicates what his representations are for the Scheme objects. We’ll use the Lua equivalents, I imagine. Then he reveals that we need to “tokenize” the program before parsing it. If I had looked ahead in the reading, I’d have written a different test first, the one for tokenize. We’ll do that now, and set our parse test to ignore.
_:test("tokenize", function()
    local prog = "(begin (define r 10) (* pi (* r r)))"
    local correct = { "(", "begin", "(", "define", "r", "10", ")", "(", "*", "pi", "(", "*", "r", "r", ")", ")", ")" }
    local tokens = LISP:tokenize(prog)
    for i,token in ipairs(tokens) do
        _:expect(token).is(correct[i])
    end
end)
That’s hard to read, but I think it’s what we want. I decided to iterate over the tokens list because it’ll be shorter until it’s correct, and otherwise I’ll get umpty messages. Need a better testing tool here. Run it, expecting a failure on tokenize.
2: tokenize -- Tests:25: attempt to call a nil value (method 'tokenize')
Implement. Peter uses two replaces and a split. I’ll have to look up what split does but it looks like it splits a string into an array, on whitespace.
His replaces make sure there are spaces around all the parens. We can do this:
function LISPCalc:tokenize(prog)
    local spaced = prog:gsub('%(',' ( '):gsub('%)',' ) ')
    return self:split(spaced)
end
My tests tell me that the substitution worked, but I’ve not managed to create a split pattern that finds anything.
Let me write a test just for that.
_:test("split", function()
    local input = " a bb ccc "
    local split = LISP:split(input)
    _:expect(split[1]).is("a")
    _:expect(split[2]).is("bb")
    _:expect(split[3]).is("ccc")
end)
That’ll fail. My current incorrect version tends to return nothing.
3: split -- Actual: nil, Expected: a
3: split -- Actual: nil, Expected: bb
3: split -- Actual: nil, Expected: ccc
Darn patterns. It seems to me that my pattern wants to be whitespace of arbitrary length, non-whitespace ditto, whitespace ditto, with the non-whitespace captured.
I have this, which does not work … ah! Reading the test to explain it to you, I found an extra * in the pattern. This seems to work:
function LISPCalc:split(str)
    local result = {}
    local pat = "%s*(%g*)s*"
    for w in str:gmatch(pat) do
        table.insert(result,w)
    end
    return result
end
Now I can unignore the tokenize test and I expect it should pass.
Tokenize says OK ten times and then says:
2: tokenize -- Actual: , Expected: nil
The actual must be an empty string. I think that’s because I allow a zero-length capture in the split. Let’s not do that.
local pat = "%s*(%g+)s*"
Tests pass. Commit: tokenize works.
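As a side note, here’s my understanding of why the %g* version picked up a stray empty string: with a star, the capture is allowed to succeed on nothing at all. A tiny sketch, runnable on its own:
-- with %g* the pattern can also match with an empty capture at the end
-- of the string; with %g+ it must capture at least one visible character
for w in (" a bb "):gmatch("%s*(%g*)s*") do print("["..w.."]") end
-- prints [a], [bb], and then []
for w in (" a bb "):gmatch("%s*(%g+)s*") do print("["..w.."]") end
-- prints [a] and [bb] only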
Now back to Peter Norvig. He says:
Our function parse will take a string representation of a program as input, call tokenize to get a list of tokens, and then call read_from_tokens to assemble an abstract syntax tree. read_from_tokens looks at the first token; if it is a ‘)’ that’s a syntax error. If it is a ‘(‘, then we start building up a list of sub-expressions until we hit a matching ‘)’. Any non-parenthesis token must be a symbol or number. We’ll let Python make the distinction between them: for each non-paren token, first try to interpret it as an int, then as a float, and if it is neither of those, it must be a symbol.
We’ll try to follow his code function for function, writing in Lua instead of Python. This seems wise, because I’m talking to a Lua compiler, not a Python compiler.
I can unignore my parse test:
_:test("parse", function()
    local prog = "(begin (define r 10) (* pi (* r r)))"
    local expected = { "begin", { "define", "r", 10},{"*", "pi", { "*", "r", "r"}}}
    local parsed = LISP:parse(prog)
    _:expect(parsed[1]).is(expected[1])
    _:expect(parsed[2][1]).is(expected[2][1])
end)
That needs beefing up but it’ll do for now. Test will fail with wrong answer.
1: parse -- Actual: nil, Expected: begin
I think this is because my parse isn’t up to snuff:
function LISPCalc:parse(prog)
    return {}
end
I’m really going to just transcribe Peter’s code into Lua directly, so long as I can.
function LISPCalc:parse(prog)
    return self:readFromTokens(self:tokenize(prog))
end
Should fail on read …
3: parse -- LISPCalc:8: attempt to call a nil value (method 'readFromTokens')
I like to run the tests because they tell me whether I’m tracking where I think I am.
I quickly observe that Python has a pop function. Lua does not. I’ll just write that out longhand for now.
OK, I’ve just typed this monster in. I am not comfortable. Also I will need to create the class (struct) for Symbol. Let’s see what the test does.
4: parse -- LISPCalc:16: bad argument #1 to 'remove' (table expected, got string)
Oh my bad. This is the line:
tokens = table.remove(tokens,1)
Turns out that table.remove modifies the table in place (why wouldn’t it? insert does) and returns the removed value, so the fix is to assign the result to a new local token, not back into tokens. In other words, pop == remove.
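For the record, if I ever want something that reads like Python’s pop, a one-line wrapper would do. Just a sketch; I’m not adding it to the project yet:
-- hypothetical helper: Python's tokens.pop(0), spelled in Lua
local function pop(t)
    return table.remove(t, 1) -- removes the first element and returns it
end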
4: parse -- LISPCalc:35: attempt to call a nil value (global 'Symbol')
Well, we are deep in now. Create Symbol class:
Symbol = class()

function Symbol:init(token)
    self.token = token
end
4: parse -- LISPCalc:25: attempt to call a table value
Hmn. What’s that?
for _i, tok in t do
    table.insert(L,t)
end
Somebody forgot to say ipairs.
4: parse -- attempt to index a number value
That’s not helpful at all. I don’t know where it came from. I kind of suspect that it’s back in the test.
A print tells me that it isn’t.
I’m going to toss some prints into the parsing. That’s useful because of the final thing it prints:
return atom 10
4: parse -- attempt to index a number value
Everything else said things like “return atom begin, Symbol begin”. So what’s going on here?
Ah. It’s my transcription of this:
while tokens[0] != ')':
    L.append(read_from_tokens(tokens))
I thought a whole table of items would be coming back, to be iterated over. In fact each call returns just one expression to append, so my code should be:
while tokens[1] ~= ")" do
    local t = self:readFromTokens(tokens)
    table.insert(L,t)
end
I’m going to have to pause, if not stop. The eggs are in the pan.
I have new and exciting error:
4: parse -- Actual: table: 0x2827eec40, Expected: begin
4: parse -- Actual: table: 0x2827ec8c0, Expected: define
I think I’ll dump whatever came out.
Breakfast etc. intervenes …
I’m back. My print of what’s going in looks nearly right. What comes out is tables three levels down at least.
The bug is in the test. I’m checking for strings. I need to expect symbols. Here’s the test and code that works so far:
_:test("parse", function()
    local prog = "(begin (define r 10) (* pi (* r r)))"
    local expected = { "begin", { "define", "r", 10},{"*", "pi", { "*", "r", "r"}}}
    local parsed = LISP:parse(prog)
    _:expect(parsed[1].token).is(expected[1])
    _:expect(parsed[2][1].token).is(expected[2][1])
end)

Symbol = class()

function Symbol:init(token)
    self.token = token
end

function Symbol:__tostring()
    return "Symbol "..self.token
end

function Symbol:__eq(aSymbol)
    return self.token == aSymbol.token
end

LISPCalc = class()
LISP = LISPCalc()

function LISPCalc:parse(prog)
    return self:readFromTokens(self:tokenize(prog))
end

function LISPCalc:readFromTokens(tokens)
    if #tokens == 0 then
        error("Unexpected EOF")
    end
    local token = table.remove(tokens,1)
    if token == "(" then
        local L = {}
        while tokens[1] ~= ")" do
            local t = self:readFromTokens(tokens)
            table.insert(L,t)
        end
        table.remove(tokens,1) -- remove the )
        return L
    elseif token == ")" then
        error("Syntax error unexpected )")
    else
        return self:atom(token)
    end
end

function LISPCalc:atom(token)
    -- return number or symbol
    return tonumber(token) or Symbol(token)
end

function LISPCalc:tokenize(prog)
    local spaced = prog:gsub('%(',' ( '):gsub('%)',' ) ')
    return self:split(spaced)
end

function LISPCalc:split(str)
    local result = {}
    local pat = "%s*(%g+)s*"
    for w in str:gmatch(pat) do
        table.insert(result,w)
    end
    return result
end
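As a quick sanity check on the shape of the output, here’s a usage sketch, assuming the code above is loaded. The comments show what I’d expect to see:
local parsed = LISP:parse("(begin (define r 10) (* pi (* r r)))")
print(parsed[1])    -- Symbol begin, courtesy of Symbol:__tostring
print(parsed[2][1]) -- Symbol define
print(parsed[2][3]) -- 10, a plain Lua number
print(#parsed[3])   -- 3: the elements *, pi, and the inner (* r r)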
The test is just checking two elements. How do you even test these things to be sure they are equal all the way down?
By intention:
_:test("parse", function()
    local prog = "(begin (define r 10) (* pi (* r r)))"
    local expected = { "begin", { "define", "r", 10},{"*", "pi", { "*", "r", "r"}}}
    local parsed = LISP:parse(prog)
    _:expect(parsed[1].token).is(expected[1])
    _:expect(parsed[2][1].token).is(expected[2][1])
    compare(parsed,expected)
end)

function compare(p, e)
    for i, item in ipairs(e) do
        if type(item) == "table" then
            _:expect(type(p[i])).is("table")
            compare(p[i], item)
        elseif type(item) == "string" then
            _:expect(p[i]).is(Symbol(item))
        else
            _:expect(p[i]).is(item)
        end
    end
end
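One thing this version doesn’t catch, I notice, is extra elements in parsed beyond what expected lists. If that ever worries me, a slightly stricter variant could assert the lengths as well. A sketch only, not committed:
function compareStrict(p, e)
    _:expect(#p).is(#e) -- no extra or missing elements allowed
    for i, item in ipairs(e) do
        if type(item) == "table" then
            _:expect(type(p[i])).is("table")
            compareStrict(p[i], item)
        elseif type(item) == "string" then
            _:expect(p[i]).is(Symbol(item))
        else
            _:expect(p[i]).is(item)
        end
    end
end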
Test is green. Commit: Calculator parse works.
This is enough for today, and in the summary I’ll tell you why.
Summary
If you know Python 3, you may have detected that I do not. In particular, I’ve come to realize that there is real meaning to these bits of Python from Norvig’s article:
Symbol = str # A Scheme Symbol is implemented as a Python str
Number = (int, float) # A Scheme Number is implemented as a Python int or float
Atom = (Symbol, Number) # A Scheme Atom is a Symbol or Number
List = list # A Scheme List is implemented as a Python list
Exp = (Atom, List) # A Scheme expression is an Atom or List
Env = dict # A Scheme environment (defined below)
# is a mapping of {variable: value}
And to these markers on the functions:
def parse(program: str) -> Exp:
    "Read a Scheme expression from a string."
    return read_from_tokens(tokenize(program))

def read_from_tokens(tokens: list) -> Exp:
    "Read an expression from a sequence of tokens."
    if len(tokens) == 0:
        raise SyntaxError('unexpected EOF')
    token = tokens.pop(0)
    if token == '(':
        L = []
        while tokens[0] != ')':
            L.append(read_from_tokens(tokens))
        tokens.pop(0) # pop off ')'
        return L
    elif token == ')':
        raise SyntaxError('unexpected )')
    else:
        return atom(token)

def atom(token: str) -> Atom:
    "Numbers become numbers; every other token is a symbol."
    try: return int(token)
    except ValueError:
        try: return float(token)
        except ValueError:
            return Symbol(token)
Now that I look at those, they appear to be some kind of type definition, saying that an Atom is a Symbol or a Number, and a Number is an int or a float. Before I go any further, I want to do a quick scan of the entire Python 3 literature, to make sure that I understand what’s going on there. My suspicion is that it’s just doing a bit of type-checking for the intrepid Python programmers, and that I can ignore it (at my peril). But I’d like to be sure.
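For what it’s worth, here’s my guess at how those aliases line up with the Lua representation I’ve been building. This is my mapping, not Peter’s:
-- my working mapping, subject to revision:
-- Scheme Symbol -> an instance of my Symbol class (wrapping a string)
-- Scheme Number -> an ordinary Lua number (tonumber covers int and float)
-- Scheme Atom   -> a Symbol or a number
-- Scheme List   -> a Lua table used as an array
-- Scheme Exp    -> an Atom or a List
-- Scheme Env    -> a Lua table used as a dictionary of variable = value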
The second, more important reason is that today is Sunday and I have important news programs, reading, and napping to do.
But this is fun. I’m going to continue into the week until I have this little Scheme / LISP thing working. I’m sure it’ll pay off somehow.
In fun and mockery if nothing else. See you then!