More work on our nascent Making App. Dave1707 has some interesting tricks in his programs. And we have a fine small example of refactoring ‘legacy’ code.

I enjoyed yesterday’s initial foray into Dave1707’s program for listing the classes and functions in a project, and after over 180 dungeon articles, maybe I deserve a break. I’m sure that you do. So we’ll proceed with a bit of tool work, building up things we consider useful, and trying to learn a bit about programming, legacy code, and such, as we go.

Dave posted another interesting program yesterday. This one displays two columns of information. The left column shows, for each tab in the project, the classes that that tab defines. The right column shows, for each tab, the classes it uses (in the sense of creating an instance).

We may look at that program in the near future, and if not, we’ll certainly be borrowing some ideas from it. What strikes me most about Dave’s programs is how much information he was able to wring out of the program text, and the very simple things he did to get the info.

I like to think of myself as favoring simple solutions that work, and I would like to point out that I haven’t tried to build little programs like his on my own, so it is possible that I’d have come up with simple ideas like Dave’s. It is also possible, since I have a long and storied background in compilers and such, that I’d have embarked on a long and fruitless journey trying to parse Lua.

I am pretty sure I would’t have done that, because it would be difficult–given my lack of practice in such things–and so, lazy creature that I am, I would not have started without a simpler idea. I might well have come up with these ideas, but now I don’t have to. I can just report what Dave did.

What Dave Did

Well, if you’re not going to parse the actual statements, it’s clear what you do, right? You look for patterns in the code, and you deal with them simply, so that if the programmer has written the code in the way we expect, our program will get good answers.

An example: To declare a class in Lua, as all now know, we type something like:

Monster = class()

We might have a superclass, in which case we might type:

Monster = class(Entity)

As we’ll see in Dave’s program that we’re working with now, Dave wrote a simple find to detect that kind of statement:

            s1,e1=string.find(str1,"class")
            if s1~=nil then
                s2,e2=string.find(str1,")")
                if s2~= nil then
                    str1=removeSpaces(string.sub(str1,1,e2))                            
                    table.insert(tab,extra.."    "..str1)
                    b1=e2+1
                end

If he finds the word “class”, then he finds a right paren. If he finds that, he removes the contents of the line after the paren, removes all the spaces from the front of the line and puts it into the output table with the necessary spacing to line up properly.

This technique can fail. In fact, it mistakenly produces this line in the dungeon program:

    _:describe("Monsters class", function()

The line contains the string “class” and a right paren, so it is printed by Dave’s program. If I don’t like that, I can do something better.

And in the other program, that we’re not going to look at today unless we do, Dave did two things better.

First, he searched for "class(". That would reject my case above. OK, that was obvious, and as soon as you see it picking up the word “class” in strings, you beef up the checks a bit. But he did something else that I hadn’t thought of, and it is elegant in its simplicity.

Dave removed all the spaces from the input before he checked it. What does that do? Well, it means that the patterns you check for can be simpler and more robust. You can look for “=class(“, for example, and be sure you’ll find it if the programmer has typed anything at all reasonable.

Now, clearly you can’t always work with all the spaces removed, so he generally makes a de-spaced copy but retains the one with the spaces. And he has a function that removes all spaces but the last one, which will normalize the look of things even if the programmer has multi-spaced sometimes.

These ideas are obvious, aren’t they? At least they are obvious now, even if they weren’t a few minutes ago. But it is a real delight to read some code and find something interesting and then have the light go on: “Oh! That gives us a guaranteed form! Nice!”

This reminds me of something else. In the case of programs intended for others, or ourselves of the future, to read, it might be a courtesy to write the code so that interesting aspects are easy to discover.

This is not a knock on Dave: he’s writing little programs to entertain himself, and he writes them incrementally, expanding them in place, until they do what he wants. And he’s successful at it: his programs generally work properly. And he knows a lot about Codea: he’s often the one who answers questions on the forum, including my questions. So Dave’s just fine.

And–when I write code, I try to write it so that it will be as clear as I can make it. I do that for a number of reasons:

  • I mostly write code to share programming ideas, not just solutions;
  • The ideas I want to share have to do with how we modularize code, how we refactor it, how we make it expressive;
  • I’ve found that when I look at a program a few months–or a few minutes–after I’ve written it, I often need all the help I can get figuring out what is going on.

And that leads us to what I’m doing with Dave’s first program.

What Are You Doing, Anyway?

I’m trying to get this program under control. By under control, I mean, under my control. I am the new owner of this program, its trainer. I am its master, in the same way that my cat is the master of me and my wife. The cat wants us to do what the cat wants: I want this program to do what I want.

So when I read this program, I read it in the presence of tests, which I write to verify things, and to be sure that I haven’t broken things. And when I understand some bit of it, I try to pull that bit out into a separate function or method, so that its meaning can be more visible next time through.

So let’s do some of that today. We’ll see if I can figure out what I mean by the above. And, you know what? I bet I can.

Where we are now is that the program shows three buttons, to run the tests, to make a “golden master”, and to run the report on the dungeon program, “D2”. Here’s a picture, showing the three buttons and the result of pressing the CodeaUnit Runner button:

buttons

The tests so far are trivial. One is the hookup test. I’ll remove that right now. Now I have exactly one test:

        _:test("Sample", function()
            project = "Making"
            font("Courier")
            textMode(CORNER)
            dy=0
            tab={}   
            local str = readProjectTab("Sample")
            local result = processTab(str,"Sample")
            tab = result
            local tabString = table.concat(result, "\n")
            local tstString = readLocalData("tab")
            _:expect(tabString).is(tstString)
        end)

This test has some weird stuff in it, like the display stuff. That should be moved out. I jammed it in there yesterday to make the display work. We’ll move it in a moment.

Then we set up some globals, dy, and tab. That’s necessary if the display is to work. As Dave wrote it, the program relies on those globals in the draw code, so we need to preserve them. Most of Dave’s code referred to the tab global, but I changed that yesterday, after pulling out the processTab function:

function processTab(str,b)
    local tab = {}
    local extra = "                    "
    table.insert(tab,"\n")
    table.insert(tab,"\n")
    table.insert(tab,extra..project.." : "..b)
...

    end
    return tab
end

Now that code doesn’t mess with the globals, which I think is a good thing, but I didn’t have to change it to refer to a new variable, I just gave it a nice local one.

So. Yesterday we got the program barely under test, using the notion of a golden copy of the desired output, which we compare, string for string, with the current output. So if we break something as we work, we’ll get an error.

Let’s break something on purpose, to see it work. I think I’ll break the comment code:

        sc1,ec1=string.find(str1,"%-%-%[%[")  
        if sc1 ~= nil then
            startComment=true
        end            
        --sc2,ec21=string.find(str1,"%-%-%]%]")  
        sc2,ec21=string.find(str1,"%]%]")  
        if sc2 ~= nil then
            startComment=false
        end  

This code turns off parsing inside –[[–]] brackets. Suppose we break that:

        sc1,ec1=string.find(str1,"%-%-%-%[%[")  

I added an extra %-. Run the tests, get long error message:

1: Sample  -- Actual: 



                    Making : Sample
                            function sample(a,b,c)
                        Barker = class()
                            function Barker:init(dog, name)
                            function Barker:speak()
                            function Barker:woof()
                        Porker = class()
                            function Porker:init(piggie, piggieName)
                            function shouldIgnore()
                            function nothing(value), Expected: 



                    Making : Sample
                            function sample(a,b,c)
                        Barker = class()
                            function Barker:init(dog, name)
                            function Barker:speak()
                            function Barker:woof()
                        Porker = class()
                            function Porker:init(piggie, piggieName)
                            function nothing(value)

We quickly(?) see the function shouldIgnore() in the Actual. We look at the Sample file and see:

--[[
function shouldIgnore()
end
--]]

Pretty good clue what we’ve broken. Not great, but good.

Now that we have this bug, let’s see if we can write a simpler test, to make the defect easier to see.

With some difficulty, I come up with this test:

        _:test("multi-line comments ignored", function()
            local str = "--[[\nfunction foo()\n--]]"
            local result = processTab(str, "Multi")
            _:expect(#result).is(3)
            _:expect(result[1]).is("\n")
            _:expect(result[2]).is("\n")
            _:expect(result[3]).is("                    Making : Multi")
        end)

The basic idea was simple: a string including multi-line comment, with something inside that we’d normally parse but now should skip. But the output? What should it be?

I just printed what cam out in result and then edited the test to expect everything but the line with foo in it. That required me to insert the “\n” in the first two (empty) lines, and to copy umpteen blanks into the final line.

But now the test runs after I fix the defect. Neat. And we learn a few things. The code we pulled out creates the header for the report. That’s just going to get in our way. So we’d like to factor that out from our smaller tests.

Second, we’re going to need an easier way to deal with these newlines and such.

Aside: why are there newlines anyway? He’s displaying these lines on the screen, one at a time. Do we even need them? Well, we kind of like them in our golden output, so maybe we’ll continue to deal with them for now.

Let’s refactor, pulling out that header bit. The function starts off like this:

function processTab(str,b)
    local tab = {}
    local extra = "                    "
    table.insert(tab,"\n")
    table.insert(tab,"\n")
    table.insert(tab,extra..project.." : "..b)
    b1=1
    s=1

And goes on for decades. Keep calm, we’re working on it. Let’s see, what’s a good thing to do here?

Let’s try the Composed Function pattern. That pattern says that when a function does more than one “thing”, we can pull the things out into separate functions and call them one after another in a replacement for the messy function.

First let’s Extract Function for that first header bit:

function processTabHeader(tab,extra,b)
    table.insert(tab,"\n")
    table.insert(tab,"\n")
    table.insert(tab,extra..project.." : "..b)
end

function processTab(str,b)
    local tab = {}
    local extra = "                    "
    processTabHeader(tab,extra,b)
    b1=1

I expect this to work. And the test runs. I also see the prospect for a better name than “b”. That’s the tab name.

function processTabHeader(tab,extra,tabName)
    table.insert(tab,"\n")
    table.insert(tab,"\n")
    table.insert(tab,extra..project.." : "..tabName)
end

I also think we should be passing in the project (name) as a parameter, but we’re on a mission toward Composed Function here. Can we rename the “b” in the larger function? Is it even there? Let’s not worry, let’s just extract.

function processTab(str,b)
    local tab = {}
    local extra = "                    "
    processTabHeader(tab,extra,b)
    processTabContents(str, tab, extra)
    return tab
end

function processTabContents(str, tab, extra)
    b1=1
    s=1
    comment=false
    startComment=false
    while true do
        s,e=string.find(str,"\n",s)
        if s==nil then
            break
        end
        s=e+1                        
        str1=string.sub(str,b1,e)   
        
        sc1,ec1=string.find(str1,"%-%-%[%[")  
        if sc1 ~= nil then
            startComment=true
        end            
        --sc2,ec21=string.find(str1,"%-%-%]%]")  
        sc2,ec21=string.find(str1,"%]%]")  
        if sc2 ~= nil then
            startComment=false
        end  
        
        comment=false
        sc3,ec3=string.find(str1,"%-%-")            
        if sc3 ~= nil then
            comment=true
            sc4,ec4=string.find(str1,"class")
            if sc4~=nil and sc4<sc3 then
                comment=false
            end
            sc5,ec5=string.find(str1,"function")
            if sc5~=nil and sc5<sc3 then
                comment=false
            end
        end  
        
        process=true
        if comment or startComment then
            process=false
        end 
        
        if process then
            s1,e1=string.find(str1,"class")
            if s1~=nil then
                s2,e2=string.find(str1,")")
                if s2~= nil then
                    str1=removeSpaces(string.sub(str1,1,e2))                            
                    table.insert(tab,extra.."    "..str1)
                    b1=e2+1
                end
            else
                s2,e2=string.find(str1,"function ")
                if s2~=nil then
                    s3,e3=string.find(str1,")")
                    if s3~= nil then
                        table.insert(tab,extra.."        "..string.sub(str1,s2,e3))
                        b1=e3+1
                    end
                end
            end
            if s==nil then
                break
            end
        end
        if e~=nil then
            b1=e+1
        end
    end
end

Test runs. Extract worked. Note that I had to pass in “str”, the tab contents. Again we become aware of better names that we might want to use. With a stronger replace function, I would be more inclined to do it, but I don’t quite trust Codea’s.

Now, we can simplify our new test. From:

        _:test("multi-line comments ignored", function()
            local str = "--[[\nfunction foo()\n--]]"
            local result = processTab(str, "Multi")
            _:expect(#result).is(3)
            _:expect(result[1]).is("\n")
            _:expect(result[2]).is("\n")
            _:expect(result[3]).is("                    Making : Multi")
        end)

To this:

        _:test("multi-line comments ignored", function()
            local str = "--[[\nfunction foo()\n--]]"
            local tab = {}
            processTabContents(str, tab, "useless")
            _:expect(#tab).is(0)
        end)

Now we just expect the processTabContents to produce nothing. If we put the bad code back in, we get this:

2: multi-line comments ignored  -- Actual: 1, Expected: 0

Nice. Tells us exactly what’s wrong.

We should commit this: improving tests and refactoring out processTabContents.

We don’t really love having to pass that table in and then fill it. It’s obvious in the test that it’s strange. We init it, call the function, and then check the function’s side effect. Meh.

It would be better to have the function return the table that represents its processing of the tab. We’d have to append the results back into the final report table. Oddly, Lua doesn’t conveniently provide that capability. The reason is that sometimes tables are arrays and sometimes they aren’t, so you have to know what you’re dealing with before trying to mush tables together.

Let’s refactor so that our processTabContents builds and returns its own table.

function processTabContents(str, extra)
    local tab = {}
    b1=1
    s=1
...
    end
    return tab
end

And now we need to change the caller:

function processTab(str,b)
    local tab = {}
    local extra = "                    "
    processTabHeader(tab,extra,b)
    local tabReport = processTabContents(str, extra)
    for i,l in ipairs(tabReport) do
        table.insert(tab,l)
    end
    return tab
end

We should really change the header to work locally as well. Let’s do: our tests are solid enough for this.

function processTabHeader(extra,tabName)
    local tab = {}
    table.insert(tab,"\n")
    table.insert(tab,"\n")
    table.insert(tab,extra..project.." : "..tabName)
    return tab
end

And the caller:

function processTab(str,b)
    local extra = "                    "
    local tab = processTabHeader(tab,extra,b)
    local tabReport = processTabContents(str, extra)
    for i,l in ipairs(tabReport) do
        table.insert(tab,l)
    end
    return tab
end

Test finds the bug: I forgot to remove tab from the call to processTabHeader.

function processTab(str,b)
    local extra = "                    "
    local tab = processTabHeader(extra,b)
    local tabReport = processTabContents(str, extra)
    for i,l in ipairs(tabReport) do
        table.insert(tab,l)
    end
    return tab
end

Tests run. Commit: convert to use local tables.

We have some duplication:

    for i,l in ipairs(tabReport) do
        table.insert(tab,l)
    end

And:

        for i,line in ipairs(tabData) do
            table.insert(tab,line)
        end

The latter one is in the main report method that is used to produce real output. But it’s the same thing, innit.

I have a feeling this will be more commonly useful, although we may find it only temporarily so. We want a function that appends a table to another table. Should we do it in place, or pass in a table to modify? The latter will save memory as the table grows. But maybe it’s better. We’ll try it that way.

function appendTable(result, newData)
    for i,line in ipairs(newData) do
        table.insert(result,line)
    end
end

And use it here:

function report()
    project="D2"
    font("Courier")
    textMode(CORNER)
    dy=0
    tab={}   
    r=listProjectTabs(project)
    local tabData
    for a,b in pairs(r) do
        str=readProjectTab(project..":"..b)
        tabData = processTab(str,b)
        appendTable(tab,tabData)
    end
    table.insert(tab,"\n")
    table.insert(tab,"\n")
    table.insert(tab,"END of "..project)
end

And here:

function processTab(str,b)
    local extra = "                    "
    local tab = processTabHeader(extra,b)
    local tabReport = processTabContents(str, extra)
    appendTable(tab,tabReport)
    return tab
end

And the tests run. Commit: Add appendTable utility function.

Let’s do some more tidying. What looks interesting? Let me expose you to the whole processTabContents, which should tell you why I’m interested in more simplification:

function processTabContents(str, extra)
    local tab = {}
    b1=1
    s=1
    comment=false
    startComment=false
    while true do
        s,e=string.find(str,"\n",s)
        if s==nil then
            break
        end
        s=e+1                        
        str1=string.sub(str,b1,e)   
        
        sc1,ec1=string.find(str1,"%-%-%[%[")  
        if sc1 ~= nil then
            startComment=true
        end            
        --sc2,ec21=string.find(str1,"%-%-%]%]")  
        sc2,ec21=string.find(str1,"%]%]")  
        if sc2 ~= nil then
            startComment=false
        end  
        
        comment=false
        sc3,ec3=string.find(str1,"%-%-")            
        if sc3 ~= nil then
            comment=true
            sc4,ec4=string.find(str1,"class")
            if sc4~=nil and sc4<sc3 then
                comment=false
            end
            sc5,ec5=string.find(str1,"function")
            if sc5~=nil and sc5<sc3 then
                comment=false
            end
        end  
        
        process=true
        if comment or startComment then
            process=false
        end 
        
        if process then
            s1,e1=string.find(str1,"class")
            if s1~=nil then
                s2,e2=string.find(str1,")")
                if s2~= nil then
                    str1=removeSpaces(string.sub(str1,1,e2))                            
                    table.insert(tab,extra.."    "..str1)
                    b1=e2+1
                end
            else
                s2,e2=string.find(str1,"function ")
                if s2~=nil then
                    s3,e3=string.find(str1,")")
                    if s3~= nil then
                        table.insert(tab,extra.."        "..string.sub(str1,s2,e3))
                        b1=e3+1
                    end
                end
            end
            if s==nil then
                break
            end
        end
        if e~=nil then
            b1=e+1
        end
    end
    return tab
end

That’s a lot to take in, which is kind of my point. And the variables … there’s a pattern here, I think, mostly about b and e meaning beginning and end, and the numbers having to do with the next one we want to think about.

I am tempted to rename some of them. And I am temped to do some extractions. Our golden test provides a lot of confidence. If we do encounter problems with it, we can enhance the “Sample” tab and add cases to it. And, of course, we’ll write simpler tests when we think of them and when we extract things of interest.

That bit at the top:

    while true do
        s,e=string.find(str,"\n",s)
        if s==nil then
            break
        end
        s=e+1                        
        str1=string.sub(str,b1,e)   

What that’s doing, I believe, is checking to see whether the input string, str has a newline in it. If it does not, it breaks the loop. If it does, we pull off str1, which is the substring from b1–whatever that is–up to and including the newline.

Way down at the bottom of the loop, we find:

        if e~=nil then
            b1=e+1
        end

That’ll set b1 to the character after the newline we scanned at the top.

This code is looping over lines of the input string, one at a time.

I think I know another way to do that: gmatch. Let’s write a test to see whether that does what we need:

        _:test("gmatch", function()
            local str = [[line 1
            line 2
            line 3
            ]]
            local tab = {}
            for l in str:gmatch("[^\r\n]+") do
                table.insert(tab,l)
            end
            _:expect(tab[1]).is("line 1")
        end)

I’m glad I wrote that test because my first match pattern didn’t work. This one does.

Extend the test for more lines:

        _:test("gmatch", function()
            local str = [[line 1
line 2
line 3
]]
            local tab = {}
            for l in str:gmatch("[^\r\n]+") do
                table.insert(tab,l)
            end
            _:expect(tab[1]).is("line 1")
            _:expect(tab[2]).is("line 2")
            _:expect(tab[3]).is("line 3")
        end)
    end)

Note that I had to tab the inner lines over. Lua doesn’t trim leading blanks from multi-line comments. I guess that’s fair.

Now I think we can use our gmatch in the program:

function processTabContents(str, extra)
    local tab = {}
    b1=1
    s=1
    comment=false
    startComment=false
    for str1 in str:gmatch("[^\r\n]+") do
        
        sc1,ec1=string.find(str1,"%-%-%[%[")  
...
            if s==nil then
                break
            end
        end
    end
    return tab
end

The tests run. You’ll note that I removed that bit about b at the bottom of the loop. But what’s that s==nil? That’s a check, redundant, I think, for being done. Anyway, we’re not using s any more so we can delete that and the init for it.

Tests work. Commit: use gmatch instead of looping substringing the input string.

What else? Our work here may never be done, but I’d like to get closer to something I can take in with a few glances.

This code here:

        sc1,ec1=string.find(str1,"%-%-%[%[")  
        if sc1 ~= nil then
            startComment=true
        end            
        --sc2,ec21=string.find(str1,"%-%-%]%]")  
        sc2,ec21=string.find(str1,"%]%]")  
        if sc2 ~= nil then
            startComment=false
        end  

That sets startComment to true if we’re on a line containing “–[[”, and to false if we’re on a line containing “]]”. This may be moderately dangerous: I just used ]] to end a string rather than a comment. But we’re not improving this program, we are refactoring it. Let’s Extract Function here commentState.

I try this:

function processTabContents(str, extra)
    local tab = {}
    b1=1
    comment=false
    startComment=false
    for str1 in str:gmatch("[^\r\n]+") do
        
        startComment = commentState(str1)
        
...

function commentState(str1)
    local startComment
    sc1,ec1=string.find(str1,"%-%-%[%[")  
    if sc1 ~= nil then
        startComment=true
    end            
    --sc2,ec21=string.find(str1,"%-%-%]%]")  
    sc2,ec21=string.find(str1,"%]%]")  
    if sc2 ~= nil then
        startComment=false
    end
    return startComment
end

But my tests fail, including this one:

2: multi-line comments ignored -- Report:100: attempt to concatenate a table value (local 'extra')

Better check that test, it apparently wants another parm.

Oh: I forgot to convert it. It has been passing spuriously.

        _:test("multi-line comments ignored", function()
            local str = "--[[\nfunction foo()\n--]]"
            local tab = processTabContents(str, "useless")
            _:expect(#tab).is(0)
        end)

It still fails, but now with a message that I prefer:

2: multi-line comments ignored  -- Actual: 1, Expected: 0

So this simple change didn’t work. Normally I’d revert, and I still might, but let’s first see about testing the commentState method. I’ll remove the call from the reporter first, so the other tests run.

Ah. I see the bug. commentState has long endurance. The code as written can set it true, or false … or leave it alone.

My function needs to take the state as an input.

function commentState(startComment, str1)
    -- may change or just return input
    sc1,ec1=string.find(str1,"%-%-%[%[")  
    if sc1 ~= nil then
        startComment=true
    end            
    --sc2,ec21=string.find(str1,"%-%-%]%]")  
    sc2,ec21=string.find(str1,"%]%]")  
    if sc2 ~= nil then
        startComment=false
    end
    return startComment
end

Now I can use it. And I finish the test I was writing …

        _:test("commentState", function()
            local cs = false
            cs = commentState(cs, "foo--[[bar")
            _:expect(cs).is(true)
            cs = commentState(cs, "mumble")
            _:expect(cs).is(true)
            cs = commentState(cs, "]] baz")
            _:expect(cs).is(false)
        end)

All the tests pass. I hate the gmatch one because of the multi-line string being so ugly. Change it:

        _:test("gmatch", function()
            local str = "line 1\nline 2\nline 3\n"
            local tab = {}
            for l in str:gmatch("[^\r\n]+") do
                table.insert(tab,l)
            end
            _:expect(#tab).is(3)
            _:expect(tab[1]).is("line 1")
            _:expect(tab[2]).is("line 2")
            _:expect(tab[3]).is("line 3")
        end)

Tests run. Commit: Extract commentState function.

Let’s pause and reflect.

Reflection

We’ve been whipping along fairly rapidly, slowly pecking away at this big long function, extracting meaningful bits. We could keep doing that. There’s plenty of grist for our mill, whatever that means.

This:

        comment=false
        sc3,ec3=string.find(str1,"%-%-")            
        if sc3 ~= nil then
            comment=true
            sc4,ec4=string.find(str1,"class")
            if sc4~=nil and sc4<sc3 then
                comment=false
            end
            sc5,ec5=string.find(str1,"function")
            if sc5~=nil and sc5<sc3 then
                comment=false
            end
        end  

That’s just asking for similar treatment to our commentState function: it’s checking for single-line comments (coming before keywords class and function).

Even this:

        process=true
        if comment or startComment then
            process=false
        end 

There must be a one-liner waiting in there somewhere.

Taking a more broad look at the function, what can I spot?

Well, first I see that we are nearly done. Take a look:

function processTabContents(str, extra)
    local tab = {}
    b1=1
    comment=false
    startComment=false
    for str1 in str:gmatch("[^\r\n]+") do
        
        startComment = commentState(startComment, str1)
        
        comment=false
        sc3,ec3=string.find(str1,"%-%-")            
        if sc3 ~= nil then
            comment=true
            sc4,ec4=string.find(str1,"class")
            if sc4~=nil and sc4<sc3 then
                comment=false
            end
            sc5,ec5=string.find(str1,"function")
            if sc5~=nil and sc5<sc3 then
                comment=false
            end
        end  
        
        process=true
        if comment or startComment then
            process=false
        end 
        
        if process then
            s1,e1=string.find(str1,"class")
            if s1~=nil then
                s2,e2=string.find(str1,")")
                if s2~= nil then
                    str1=removeSpaces(string.sub(str1,1,e2))                            
                    table.insert(tab,extra.."    "..str1)
                    b1=e2+1
                end
            else
                s2,e2=string.find(str1,"function ")
                if s2~=nil then
                    s3,e3=string.find(str1,")")
                    if s3~= nil then
                        table.insert(tab,extra.."        "..string.sub(str1,s2,e3))
                        b1=e3+1
                    end
                end
            end
        end
    end
    return tab
end

There’s that comment blob, then the little process flag setting thing, then one use of the flag, and the code that actually looks for our keywords and produces output.

A few extracts from now, this thing will be down to a few lines.

There is another matter, however. Every one of those little s and e and b variables is global. In a one-shot program, maybe that’s OK, though as an old hand with globals, it makes me nervous. But if this thing is going to grow up to be a tool, perhaps even a tool that we can build into our app, it can’t be firing random globals out there. So we’ll need to bring that under control as well.

In general, it appears that the little globals are mostly created and used immediately, and then not looked at again, There may be exceptions, and inspection is pretty much the only way to deal with it.

Or is it?

Let’s try something. I feel sure we can do this:

function commentState(startComment, str1)
    -- may change or just return input
    local sc1,ec1,sc2,ec21
    sc1,ec1=string.find(str1,"%-%-%[%[")  
    if sc1 ~= nil then
        startComment=true
    end            
    --sc2,ec21=string.find(str1,"%-%-%]%]")  
    sc2,ec21=string.find(str1,"%]%]")  
    if sc2 ~= nil then
        startComment=false
    end
    return startComment
end

I just declared all those guys local and ran the tests. Works. Commit: made some globals local in commentState.

Are you getting the sense of how much these simple tests are helping me? I can just do a change, run the tests and if they run, I’m confident that things worked. This is why we like these tests, the micro ones and the large golden one.

I think that one can make blocks with do...end not just to enclose loops, but any time you want to. So we could use them to declare locals and have them go out of scope later.

Consider this:

        comment=false
        sc3,ec3=string.find(str1,"%-%-")            
        if sc3 ~= nil then
            comment=true
            sc4,ec4=string.find(str1,"class")
            if sc4~=nil and sc4<sc3 then
                comment=false
            end
            sc5,ec5=string.find(str1,"function")
            if sc5~=nil and sc5<sc3 then
                comment=false
            end
        end  

Are those variables all logically local? Let’s do this to find out.

        do
            local sc3,ec3,sc4,ec4,sc5,ec5
            comment=false
            sc3,ec3=string.find(str1,"%-%-")            
            if sc3 ~= nil then
                comment=true
                sc4,ec4=string.find(str1,"class")
                if sc4~=nil and sc4<sc3 then
                    comment=false
                end
                sc5,ec5=string.find(str1,"function")
                if sc5~=nil and sc5<sc3 then
                    comment=false
                end
            end  
        end

This will make those six variables local inside the block and make them go out of scope afterward. If someone is using those values, something is sure to break.

The tests don’t break. Whee, we just eliminated six globals without even extracting anything.

Let’s look at this:

        process=true
        if comment or startComment then
            process=false
        end 
        
        if process then
...

Looks to me like we want to process unless comment or startComment is true. Lua doesn’t have unless but we can do this:

        local process = not comment and not startComment
        
        if process then

Tests run. Let’s inline the local variable:

        if not comment and not startComment then
            s1,e1=string.find(str1,"class")
            if s1~=nil then
                s2,e2=string.find(str1,")")
                if s2~= nil then
                    str1=removeSpaces(string.sub(str1,1,e2))                            
                    table.insert(tab,extra.."    "..str1)
                    b1=e2+1
                end
            else
                s2,e2=string.find(str1,"function ")
                if s2~=nil then
                    s3,e3=string.find(str1,")")
                    if s3~= nil then
                        table.insert(tab,extra.."        "..string.sub(str1,s2,e3))
                        b1=e3+1
                    end
                end
            end
        end

How about localizing those variables?

        if not comment and not startComment then
            local s1,e1,s2,e2, s3,e3
            s1,e1=string.find(str1,"class")
            if s1~=nil then
                s2,e2=string.find(str1,")")
                if s2~= nil then
                    str1=removeSpaces(string.sub(str1,1,e2))                            
                    table.insert(tab,extra.."    "..str1)
                    b1=e2+1
                end
            else
                s2,e2=string.find(str1,"function ")
                if s2~=nil then
                    s3,e3=string.find(str1,")")
                    if s3~= nil then
                        table.insert(tab,extra.."        "..string.sub(str1,s2,e3))
                        b1=e3+1
                    end
                end
            end
        end
    end

What’s up with that b1 stuff? Didn’t we get rid of it without our tests failing? It was used here:

        if e~=nil then
            b1=e+1
        end

And there were other places where b1 was incremented. However, I can find no place where it is referenced.

Am I committed? If so, I’m going to revert back to the very beginning and check, because I’m curious.

I needed to commit: removed more globals in process area.

The checkout showed me that the only use of b1 was in the code that selected the next line to parse. My new gmatch approach breaks the input on newlines. Could the b1 code have done something different?

Yes, I think it could. The code there in process sets b1 to the first character after the right paren following the “class()” or “function foo(a,b,c)” . Could there be something following those that is important enough not to skip?

I can think of one thing:

function foo(bar) --[[ 
    local function mumble()
        doMumbleStuff()
    end
--]]
   doMoreStuff()

If some hideous demon from hell were to start a multi-line comment on the end of a line that is matched, we might in principle process some lines we shouldn’t.

It could happen. We can generate a test to show the case, but I am making the command decision to lose that “feature”. But let me update Sample:

I add this to the Sample tab:

function bizarreComment() --[[
    local function shouldNotBeFound()
    end
    --]]
end

Run the tests expecting a fail on the golden, and expecting the output to include “shouldNotBeFound”.

The test passes! Which means it did not find the bizarreComment function either. I think it should have found that. I’l remove the bizarre bit and try again.

Right, it fails now, showing the new function correctly reported:

1: Sample  -- Actual: 



                    Making : Sample
                            function sample(a,b,c)
                        Barker = class()
                            function Barker:init(dog, name)
                            function Barker:speak()
                            function Barker:woof()
                        Porker = class()
                            function Porker:init(piggie, piggieName)
                            function nothing(value)
                            function bizarreComment(), Expected: 



                    Making : Sample
                            function sample(a,b,c)
                        Barker = class()
                            function Barker:init(dog, name)
                            function Barker:speak()
                            function Barker:woof()
                        Porker = class()
                            function Porker:init(piggie, piggieName)
                            function nothing(value)

Save that as golden. Tests now run again, bizarre was found. Now what’s up with the comment? First try this:

function bizarreComment() -- trailing comment
end

That works OK.

Now how about the other kind of comment?

function bizarreComment() --[[ trailing comment --]]
end

That still passes.

function bizarreComment() --[[ 
    trailing comment --]]
end

Curiously, this causes the program not to find the function bizarreComment.

This tells me that my commentState isn’t working right. Unless, of course, the original program wouldn’t have handled this properly either. Let’s find out.

I’ll have to revert and reproduce this test, but that’s fine.

Good news. When I revert to the original testable version, before my changes, it, too, does not see that function. We have not introduced a new defect.

I think I’ll put in the bizarre case to document how it works.

function bizarreCommentNotFound() --[[ comments aren't that clever.
    trailing comment --]]
end

OK, good. I’m relieved, but not too surprised. I was quite sure that the changes we made couldn’t have broken anything, but if the b1 code had worked as it may have been intended, we’d have had an interesting difference.

Let’s remove b1.

Tests still run. Commit: add bizarreComment sample and remove use of b1 variable.

It’s nearly lunch time. I’ve been having so much fun I lost track of time. Shall we do one more? I say yes. Let’s do the other comment stuff:

        do
            local sc3,ec3,sc4,ec4,sc5,ec5
            comment=false
            sc3,ec3=string.find(str1,"%-%-")            
            if sc3 ~= nil then
                comment=true
                sc4,ec4=string.find(str1,"class")
                if sc4~=nil and sc4<sc3 then
                    comment=false
                end
                sc5,ec5=string.find(str1,"function")
                if sc5~=nil and sc5<sc3 then
                    comment=false
                end
            end  
        end

Does this one have the same weird persistence of the other one? Honestly I don’t think it does. What it does do is notice a – comment after “class” or “function” and not treat that line as a comment.

Are you wondering why that didn’t deal with the –[[? That’s because there’s that other flag, and it doesn’t check for being past “class” or “function”. Odd but true.

Anyway, we can extract this:

function oneLineCommentState(str1)
    local sc3,ec3,sc4,ec4,sc5,ec5
    local comment=false
    sc3,ec3=string.find(str1,"%-%-")            
    if sc3 ~= nil then
        comment=true
        sc4,ec4=string.find(str1,"class")
        if sc4~=nil and sc4<sc3 then
            comment=false
        end
        sc5,ec5=string.find(str1,"function")
        if sc5~=nil and sc5<sc3 then
            comment=false
        end
    end
    return comment
end

And use it:

function processTabContents(str, extra)
    local tab = {}
    comment=false
    startComment=false
    for str1 in str:gmatch("[^\r\n]+") do
        
        startComment = commentState(startComment, str1)
        comment = oneLineCommentState(str1)
        
        if not comment and not startComment then

And test. Test runs. Commit: inline oneLineCommentState.

Nice.

Our processTabContents function now fits on one screen of my iPad:

function processTabContents(str, extra)
    local tab = {}
    comment=false
    startComment=false
    for str1 in str:gmatch("[^\r\n]+") do
        
        startComment = commentState(startComment, str1)
        comment = oneLineCommentState(str1)
        
        if not comment and not startComment then
            local s1,e1,s2,e2, s3,e3
            s1,e1=string.find(str1,"class")
            if s1~=nil then
                s2,e2=string.find(str1,")")
                if s2~= nil then
                    str1=removeSpaces(string.sub(str1,1,e2))                            
                    table.insert(tab,extra.."    "..str1)
                end
            else
                s2,e2=string.find(str1,"function ")
                if s2~=nil then
                    s3,e3=string.find(str1,")")
                    if s3~= nil then
                        table.insert(tab,extra.."        "..string.sub(str1,s2,e3))
                    end
                end
            end
        end
    end
    return tab
end

I call that a good thing. Extracting those last bits is just a tiny bit tricky, so we’ll do that another time, probably tomorrow. But you never know. Let’s sum up.

Summary

We’ve had a good look at what we can do when we have even just a few tests that we can rely on. And the “golden” test is particularly useful when dealing with a legacy situation, because it keeps an eye on the big picture of the total output of the program.

It can be–usually is–difficult interpreting the output of a test like that, because we have two immense strings that aren’t equal. So, often, we wind up writing smaller tests, and diffing the two strings, and so on. If we find ourselves having to do that often, however, we need to make sure we are plugging in our test in the right spot, and that we’re refactoring carefully enough.

We mostly proceed after that by extracting small bits that make sense together, and making them work. Often we need to patch the bits back together, as we did when we decided to return a table rather than edit a global, or an outer table. (That change changed performance, as well, and it’s something we’d want to be aware of in some situations.)

We were plagued by many globals, and there may still be some with us. We were able to localize some inside our functions, and I was happy to try the do-end block, that let me localize some variables without doing an extract.

But the main thing to notice is that once we’ve done the work to get our first golden test in place, we can make changes freely and be confident that we haven’t broken anything.

Note, however, that my work today could have broken something, thatb1 stuff. Had the original program been using b1 as may have been intended, my change, which resulted in b1 not being used, could have changed the results. And–my Sample tab did not contain an example that would have detected that defect.

So I got a bit lucky there. On the other hand, when I went to remove the b1, I saw the issue by checking history, and was able to verify whether or not we had made a change to the program behavior. If we had, I think I’d have accepted it and said “Don’t do that” to anyone who started a block comment on the end of a function’s first line. Sheesh, some people’s kids.

All in all, a good demo of how we can bring a program under our control, and how we can move in small steps to make code that we don’t like, more lovable.

See you next time!


Making.zip