Maker App: Codea Project Stats

In the interest of variety, let’s do some work for the makers here. Let’s see how we can collect some interesting statistics from Codea programs.

It’s not like I’m getting paid for this stuff. I even turned off my Ko-Fi; it was too depressing. So when an idea comes to me, there’s no reason not to follow it. Today’s idea is to create a Codea App that provides statistics on any Codea project.

Possible statistics that might be interesting include the number of:

tabs
classes
methods (per class?)
non-method functions
test suites
tests
test “expect” calls

As with all things, I plan to build this up incrementally. I do have a general idea about how it should work. We’ll fetch all the tabs from the project, read each tab, apply patterns to identify the things we want to count, and count them.
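To make that plan a bit more concrete, here is a minimal sketch in plain Lua. It assumes a small in-memory table standing in for what Codea’s listProjectTabs and readProjectTab would supply, and the three patterns are illustrative guesses, not a finished pattern dictionary. It leans on the fact that string.gsub’s second result is the number of matches it replaced, which makes it a handy counter.

```lua
-- Hypothetical in-memory project: a stand-in for listProjectTabs/readProjectTab.
local project = {
  { name = "Main",       contents = "function setup()\nend\n" },
  { name = "SampleCode", contents = "SampleCode = class()\nSubclassOfSample = class(SampleCode)\n" },
  { name = "Tests",      contents = '_:test("one", function()\n  _:expect(1).is(1)\n  _:expect(2).is(2)\nend)\n' },
}

-- Illustrative patterns only; "%(" escapes the parenthesis.
local patterns = {
  classes = "%w+%s*=%s*class%(%w*%)",
  tests   = "_:test%(",
  expects = "_:expect%(",
}

-- gsub returns the new string and the number of matches; "%0" leaves the text unchanged.
local function countMatches(text, pattern)
  local _, count = text:gsub(pattern, "%0")
  return count
end

local totals = { tabs = #project }
for statName, pattern in pairs(patterns) do
  totals[statName] = 0
  for _, tab in ipairs(project) do
    totals[statName] = totals[statName] + countMatches(tab.contents, pattern)
  end
end

print(totals.tabs, totals.classes, totals.tests, totals.expects)   --> 3  2  1  2
```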

Codea can read any tab of any project in your space. The contents of the tab are returned as a single string.

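Codea’s Lua has pattern matching built into its strings. Lua patterns aren’t full regular expressions, but they cover much of the same ground. The functions include find, match, gmatch, and gsub. The first three return captures if the pattern includes any; gsub passes them along to its replacement.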

My general plan is that we’ll have a collection of patterns that we look for, and that we’ll use captures from those patterns to pull out any interesting information if we need it. I have in mind that we’ll be wise to define some objects that get created along the way, but just what they’ll look like, I’m not sure … maybe there’ll be a class for each type of match, and they’ll all contain the captures, and all have some standard method like “doIt” or “process” that tells them to do their thing.

We’ll discover that as we go. Or we’ll discover something else. One never knows, do one? Let’s get started.

Getting Started

Naturally I plan to test-drive this baby, and equally naturally, I plan to point it at the D2 Dung project. In the fullness of time, we’ll give it a small user interface that lets you type in the project you want, or select it somehow, and then say “go” or “carry on” or “let’s do it”, and, well, do it. We’ll start without the interface, because I think it’s boring.

We’re not really parsing the program, so there are things we won’t be able to do, and a sufficiently nasty formatting might confuse the program. The rules of the game are: “Don’t Do That”.

I think we’ll start by counting classes. Why? I don’t know, just seems like about the right one to do. I start with the standard first test:

-- RJ 20211021
-- Project Statistics Tests

function testProjectStatistics()
    CodeaUnit.detailed = true

    _:describe("Project Statistics", function()

        _:before(function()
        end)

        _:after(function()
        end)

        _:test("HOOKUP", function()
            _:expect("Foo", "testing fooness").is("Bar")
        end)

    end)
end

Fails. Fix it:

        _:test("HOOKUP", function()
            _:expect("Bar", "testing fooness").is("Bar")
        end)

I was briefly tempted to divert to getting stats on tests rather than on classes. I’m torn. I have a test, so it is relatively easy to get test data. But the answers will be changing all the time. Better not. We’ll create a separate tab that we’ll use for test data.

That tab will be full of weird pattern testing stuff and may not compile. If that becomes an issue, we’ll put interesting things inside comments. I think I’ve done that in the past for something I was writing that did some of these same things.

Creating a new class gives me a nice sample class:

-- Sample Code
-- RJ 20211021

SampleCode = class()

function SampleCode:init(x)
    -- you can accept and set parameters here
    self.x = x
end

function SampleCode:draw()
    -- Codea does not automatically call this method
end

function SampleCode:touched(touch)
    -- Codea does not automatically call this method
end

Nice. Let’s write a test. I don’t know what any of these functions return so we’ll be doing a lot of discovery tests like this:

        _:test("Read SampleCode tab from this project", function()
            list = listProjectTabs()
            _:expect(list[1]).is("foo")
        end)

I expect that to fail with interesting information.

2: Read SampleCode tab from this project  -- Actual: Main, Expected: foo

I was hoping the tab name would include the name of the current project, because I wanted to include that in the tab reading from the beginning. But I know the name of the project; it is “PSprojectStats”. Let’s try that as the key to reading our test tab.

Here’s the test I wrote. I wasn’t sure about the 13.

        _:test("Read SampleCode tab from this project", function()
            tab = readProjectTab("PSprojectStats:SampleCode")
            pattern = "--%s(SampleCode)"
            start,stop,capture = tab:find(pattern)
            _:expect(start).is(1)
            _:expect(stop).is(13)
            _:expect(capture).is("SampleCode")
        end)

However, the test results surprised me:

Read SampleCode tab from this project  -- Actual: 31, Expected: 1
Read SampleCode tab from this project  -- Actual: 41, Expected: 13
Read SampleCode tab from this project  -- OK

I really don’t understand why we got 31 and 41 at all. The difference, 10, is the length of “SampleCode”, so that could have something to do with the capture. Of course at this point I don’t really know what comes back from string.find, and the Codea docs are of limited help. (The Lua manual says it returns the start and end indices of the whole match, followed by any captures.) However, I do think that dash is probably a magic character and needs to be escaped. Let’s try that. Lua patterns escape with %.

        _:test("Read SampleCode tab from this project", function()
            tab = readProjectTab("PSprojectStats:SampleCode")
            pattern = "%-%-%s(SampleCode)"
            start,stop,capture = tab:find(pattern)
            _:expect(start).is(1)
            _:expect(stop).is(13)
            _:expect(capture).is("SampleCode")
        end)

New result: this pattern matches nothing. After longer than I care to admit, I notice that in the SampleCode tab there is a space in the comment line at the top, “Sample Code”. Removing that gives me a passing test. We’re on the way.
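In hindsight the dash really is magic: in Lua patterns “-” is the lazy “zero or more” quantifier, so “--” means “zero or more dashes, as few as possible”, which is happy to match no dashes at all. Here is a plain-Lua reproduction, assuming the tab text is exactly the sample class shown above, with a string literal standing in for readProjectTab. Under that assumption, position 31 is the newline of the blank line, and 32 through 41 is the SampleCode that starts the class line.

```lua
-- Stand-in for the SampleCode tab, still with the space in "Sample Code".
local tab = "-- Sample Code\n-- RJ 20211021\n\nSampleCode = class()\n"

-- Unescaped: "--" can match zero dashes, so one whitespace plus "SampleCode" is enough.
local s1, e1, c1 = tab:find("--%s(SampleCode)")
print(s1, e1, c1)                          --> 31  41  SampleCode

-- Escaped: the dashes are literal, and "Sample Code" no longer matches.
print(tab:find("%-%-%s(SampleCode)"))      --> nil

-- With the space removed, the escaped pattern matches the first line.
local fixed = "-- SampleCode\n-- RJ 20211021\n\nSampleCode = class()\n"
print(fixed:find("%-%-%s(SampleCode)"))    --> 1  13  SampleCode
```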

Let’s find a class.

        _:test("Find a class", function()
            local pattern = "%s*(%S*)%s*=%s*class%(%)"
            start,stop,capture = tab:find(pattern)
            _:expect(capture).is("SampleCode")
        end)

That pattern, if I’m not mistaken, is spaces, non-spaces, spaces, equal, spaces, “class()”, where spaces means zero or more whitespace characters. That is a fairly decent pattern for any reasonable class definition. Oh, except it won’t recognize a class that inherits from another.

Let’s create one and search for it.

SubclassOfSample = class(SampleCode)

I realize that “not space” isn’t a really great pattern for names, so I convert the test to use %w+. I’ve moved the tab creation out into the before function. Let me show you the whole setup so far.

-- RJ 20211021
-- Project Statistics Tests

local tab

function testProjectStatistics()
    CodeaUnit.detailed = true

    _:describe("Project Statistics", function()

        _:before(function()
            tab = readProjectTab("PSprojectStats:SampleCode")
        end)

        _:after(function()
        end)

        _:test("HOOKUP", function()
            _:expect("Bar", "testing fooness").is("Bar")
        end)
        
        _:test("Read SampleCode tab from this project", function()
            local pattern = "%-%-%s*(SampleCode)"
            local start,stop,capture = tab:find(pattern)
            _:expect(start).is(1)
            _:expect(stop).is(13)
            _:expect(capture).is("SampleCode")
        end)
        
        _:test("Find a class", function()
            local pattern = "%s*(%w+)%s*=%s*class%(%)"
            local start,stop,capture = tab:find(pattern)
            _:expect(capture).is("SampleCode")
        end)
        
        _:test("Find a subclass", function()
            local pattern = "%s*(%w+)%s*=%s*class%(%w+%)"
            local start,stop,capture = tab:find(pattern)
            _:expect(capture).is("SubclassOfSample")
        end)
        
    end)
end

The tests run. And I note that I’ve created the first pattern that I would really like to use, there in that last test. I also note that it takes me approximately N tries to get a pattern right, for N larger than any currently tried integer. So we need an object to hold on to our patterns. It doesn’t need to be much more than a table, but you’ll recall that I’ve recently resolved not to use raw primitives so much. So let’s create a handy little class.

I’m not sure how I want this to work, but I’m pretty sure that I don’t want to create a method for every stored pattern. No, I do want that. I first wrote this:

PatternDictionary = class()

function PatternDictionary:init()
    local tab = {}
    self.tab = tab
    tab.classDefinition = "%s*(%w+)%s*=%s*class%(%w+%)"
end

function PatternDictionary:definitions()
    return self.tab
end

But no. We’ll make this a class-only object and by golly a method per stored pattern. Rewrite:

PatternDictionary = class()

function PatternDictionary:init()
end

function PatternDictionary:classDefinition()
    return "%s*(%w+)%s*=%s*class%(%w+%)"
end

Fix the test that uses it:

        _:before(function()
            pd = PatternDictionary()
            tab = readProjectTab("PSprojectStats:SampleCode")
        end)

        _:test("Find a subclass", function()
            local pattern = pd:classDefinition()
            local start,stop,capture = tab:find(pattern)
            _:expect(capture).is("SubclassOfSample")
        end)

Tests all run. I think I’ll use the classDefinition in the first class test.

Ah. Before I even do this, I realize that I have an issue. My class finder requires a word inside the class() parens. We really want to match either form, with or without a superclass. But as soon as I make the first class test use this pattern, it will fail.

3: Find a class  -- Actual: SubclassOfSample, Expected: SampleCode

Then I fix the pattern:

function PatternDictionary:classDefinition()
    return "%s*(%w+)%s*=%s*class%(%w*%)"
end

The “%w+” is now “%w*”. And the second test fails:

4: Find a subclass  -- Actual: SampleCode, Expected: SubclassOfSample

Since the pattern matches either, we’ll have to be a bit more clever to test both cases with the given pattern. We can always provide specific strings. Let’s do that for the second test.

        _:test("Find a subclass", function()
            tab = " SubclassOfSample = class(Sample)"
            local pattern = pd:classDefinition()
            local start,stop,capture = tab:find(pattern)
            _:expect(capture).is("SubclassOfSample")
        end)

OK, that’ll do.
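Specific strings also make it easy to pin down the pattern’s edges. Here is a small table-driven check in plain Lua, a sketch rather than CodeaUnit tests. The last two cases are ones the pattern gets wrong: spaces inside the parens defeat %w*, and %w does not include the underscore, so only part of a name like Snake_Case is captured. Under the rules of the game, those fall under “Don’t Do That”.

```lua
-- The classDefinition pattern from PatternDictionary, exercised on specific strings.
local classDefinition = "%s*(%w+)%s*=%s*class%(%w*%)"

-- Each case: a line of source and the class name we expect to capture (nil means no match).
local cases = {
  { "SampleCode = class()",              "SampleCode" },
  { " SubclassOfSample = class(Sample)", "SubclassOfSample" },
  { "Thing = class( Base )",             nil },     -- %w* allows no spaces inside the parens
  { "Snake_Case = class()",              "Case" },  -- %w excludes "_", so only part of the name is captured
}

for _, case in ipairs(cases) do
  local source, expected = case[1], case[2]
  local captured = source:match(classDefinition)
  print(string.format("%-36s -> %s", source, tostring(captured)))
  assert(captured == expected, source)
end
```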

I’ve been ripping along pretty fast here. Let’s slow down a moment and think about what we have so far.

One issue here is that the pattern functions return a more or less arbitrary number of items: start and stop, then however many captures the pattern happens to have. We have no general way to know how many captures there might be in a given pattern, much less how many matches.

It’s about time for an object. I think I’ll start by trying table.pack on the result of a pattern match. That turns out not to be ideal. This is better:

        _:test("Pack the return", function()
            tab = "<123,456,789>"
            local pattern = "<(%d*),(%d*),(%d*)>"
            local result = {tab:find(pattern)}
            _:expect(#result).is(5)
            _:expect(result[1]).is(1)
            _:expect(result[2]).is(13)
            _:expect(result[3]).is("123")
            _:expect(result[4]).is("456")
            _:expect(result[5]).is("789")
        end)

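Enclosing the call in table braces constructs the table from the returned values. table.pack also adds a field n, the number of values, which seems useless to me here; it only earns its keep when some of the values might be nil, since # is unreliable on a table with holes.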
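A quick plain-Lua check of those claims, using the same string and pattern as the test above. The last lines show the case where n matters: a failed find returns a single nil, which table.pack counts and # does not.

```lua
local text = "<123,456,789>"
local pattern = "<(%d*),(%d*),(%d*)>"

-- Table braces: every value find returns becomes an array element.
local result = { text:find(pattern) }
print(#result, result[1], result[2], result[3], result[5])   --> 5  1  13  123  789

-- table.pack collects the same values and records how many there were in n.
local packed = table.pack(text:find(pattern))
print(packed.n)                                              --> 5

-- n matters when values can be nil: a failed find returns a single nil.
local miss = table.pack(("no digits here"):find(pattern))
print(miss.n, #{ ("no digits here"):find(pattern) })         --> 1  0
```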

It also turns out that if we were to pass the result of our find straight to an object constructor, all the returned values would arrive as separate arguments, but I think we are going to prefer the table format.
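Here’s what passing find’s results straight to a constructor looks like, sketched in plain Lua. MatchResult is a hypothetical class name, and setmetatable stands in for Codea’s class(). Because the find call is the last argument, all of its results arrive as separate parameters, and the constructor gathers the captures with a vararg (...).

```lua
-- Hypothetical MatchResult "class"; setmetatable stands in for Codea's class().
local MatchResult = {}
MatchResult.__index = MatchResult

-- find's results arrive as separate arguments: start, stop, then the captures.
function MatchResult.new(start, stop, ...)
  return setmetatable({ start = start, stop = stop, captures = { ... } }, MatchResult)
end

function MatchResult:capture(index)
  return self.captures[index]
end

local r = MatchResult.new(("<123,456,789>"):find("<(%d*),(%d*),(%d*)>"))
print(r.start, r.stop, r:capture(1), r:capture(3))   --> 1  13  123  789
```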

Heads Up For a Moment

Time to think a bit. I’m pretty sure we’re going to want some kind of smarts wrapped around the output of our find, and it may be time to start creating that object. In addition, I’d like to think about match and gmatch and compare them to find to see if they might be more useful, sometimes or always.

It turns out that match returns just the captures, unlike find, which returns the start and stop indices of the match as well as the captures.
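A small side-by-side in plain Lua, on a subclass line like the one in SampleCode. It also shows that with no captures at all, match returns the whole matched text instead.

```lua
local line = "SubclassOfSample = class(SampleCode)"
local pattern = "(%w+)%s*=%s*class%((%w*)%)"

-- find: start and end of the whole match, then each capture.
print(line:find(pattern))           --> 1  36  SubclassOfSample  SampleCode

-- match: just the captures.
print(line:match(pattern))          --> SubclassOfSample  SampleCode

-- No captures in the pattern: match returns the whole matched text.
print(line:match("class%(%w*%)"))   --> class(SampleCode)
```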

The gmatch function returns an iterator, so you can get all the matches in a large string. That will save us from having to use the stop value to restart the search after each match.
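For comparison, here is a sketch of collecting every class name both ways in plain Lua. With find we carry the position forward by hand through its optional third argument, init; gmatch keeps track of it for us. The three-line source string is made up for the example.

```lua
local source = "A = class()\nB = class(A)\nC = class(B)\n"
local pattern = "(%w+)%s*=%s*class%(%w*%)"

-- With find, each search restarts just past the previous match.
local viaFind = {}
local init = 1
while true do
  local start, stop, name = source:find(pattern, init)
  if not start then break end
  table.insert(viaFind, name)
  init = stop + 1
end

-- With gmatch, the iterator remembers where it left off.
local viaGmatch = {}
for name in source:gmatch(pattern) do
  table.insert(viaGmatch, name)
end

print(table.concat(viaFind, ","), table.concat(viaGmatch, ","))   --> A,B,C  A,B,C
```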

Let’s try that.

        _:test("Find all classes", function()
            local pattern = pd:classDefinition()
            local iter = tab:gmatch(pattern)
            local names = {}
            for k,v in iter do
                table.insert(names,k)
            end
            _:expect(names[1]).is("SampleCode")
            _:expect(names[2]).is("SubclassOfSample")
        end)

So that’s nice. The v values that come back seem to be nil. I’m not sure if they are ever anything else. The Codea docs don’t seem to say.

I found this description in the Lua 5.3 Reference Manual:

string.gmatch (s, pattern)

Returns an iterator function that, each time it is called, returns the next captures from pattern (see §6.4.1) over the string s. If pattern specifies no captures, then the whole match is produced in each call. As an example, the following loop will iterate over all the words from string s, printing one per line:

     s = "hello world from Lua"
     for w in string.gmatch(s, "%a+") do
       print(w)
     end

The next example collects all pairs key=value from the given string into a table:

     t = {}
     s = "from=world, to=Lua"
     for k, v in string.gmatch(s, "(%w+)=(%w+)") do
       t[k] = v
     end

If that works, which I’m about to test, what would a triple match return?

It is returning k,v = from,world, then to,Lua. I’m going to try a triple match.

Ah. Here’s what I wrote:

        _:test("Lua 5.3 Ref example", function()
            t = {}
            s = "from=world=ron, to=Lua=jeffries"
            for k, v,w in string.gmatch(s, "(%w+)=(%w+)=(%w+)") do
                print(k,v,w)
                t[k] = v
            end
        end)

When I wrote only k,v, I saw only the first two captures. This example would be better like this:

        _:test("Lua 5.3 Ref example", function()
            local s = "from=world=ron, to=Lua=jeffries"
            for m1,m2,m3 in string.gmatch(s, "(%w+)=(%w+)=(%w+)") do
                print(m1,m2,m3)
            end
        end)

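There’s nothing special about the names k and v as regards the match: they are just the loop variables that receive the captures, in order. Of course you might be thinking of keys and values in the pattern, which is presumably why the manual uses them.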

I complete the test like this:

        _:test("Lua 5.3 Ref example", function()
            local s = "from=world=ron, to=Lua=jeffries"
            local names = {}
            for m1,m2,m3 in string.gmatch(s, "(%w+)=(%w+)=(%w+)") do
                table.insert(names, {m1,m2,m3})
            end
            _:expect(names[2][3]).is("jeffries")
        end)

So gmatch’s iterator returns all the captures from each successful match. Is there a way to get those as a table? I suspect not, unless we did something with a very long list m1..m66. If there is a way, it’s deeper in the bag of tricks than I can reach.

So, I do like gmatch, as we should be able to use it to collect all the information of interest from a given tab, in one call for each pattern.
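As it happens, the trick may be shallower than it looks. gmatch hands back an ordinary function, and nothing stops us from calling it ourselves inside table braces, the same way we wrapped find earlier. Here is a minimal sketch in plain Lua; allMatches is a hypothetical helper name, not anything Codea provides.

```lua
-- Hypothetical helper: each match's captures become one small array.
local function allMatches(s, pattern)
  local results = {}
  local nextMatch = s:gmatch(pattern)   -- gmatch returns a plain function
  while true do
    local captures = { nextMatch() }    -- all captures of the next match, however many
    if #captures == 0 then break end    -- the iterator returns nothing when it runs out
    table.insert(results, captures)
  end
  return results
end

local found = allMatches("from=world=ron, to=Lua=jeffries", "(%w+)=(%w+)=(%w+)")
print(#found, found[1][1], found[2][3])   --> 2  from  jeffries
```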

A Plan Sort of Thing …

I imagine this program, in the fullness of time, producing a report sort of like this:

Project: D2

Tab GameRunner
    Class GameRunner
        init
        createLevel
        createMonsters
...

Tab Tests
    test "Test Suite One"
        13 tests
        27 expects
         1 throws

Probably what we want will change, and we’ll probably even want at least some options to turn things on and off. And I have no idea how to produce a report from Codea anyway: I suppose we’ll format a string and write it to a file, although Lua does have require files to bring in various system capabilities. Anyway, that’s for another day.

The report above makes me envision a class Tab that contains member variables like classes and tests. Those variables would be instances, or collections of instances, of other classes like Class or Method.

I think I want a new test suite for this part, separate from my simple pattern tests. I’ll put it in a new tab.

        _:test("Tabs class", function()
            local names = {"SampleCode"}
            local tabs = Tabs(names)
            _:expect(tabs[1]:tabName()).is("SampleCode")
        end)

What I have in mind here is a Tabs class that takes a table of names and returns a table of Tab instances, one for each name. I should probably be concerned about the collection being a pure table, but this is about all the complexity I can handle in one bite.

I decided while writing that the Tabs should be an object holding the collection, like this:

Tabs = class()

function Tabs:init(names)
    local tabs ={}
    for _i,name in ipairs(names) do
        table.insert(tabs, Tab(name))
    end
    self.tabs = tabs
end

function Tabs:at(index)
    return self.tabs[index]
end

So I recoded the test:

        _:test("Tabs class", function()
            local names = {"SampleCode"}
            local tabs = Tabs(names)
            _:expect(tabs:at(1):tabName()).is("SampleCode")
        end)

This is one of the reasons we do TDD. It causes us to use the classes we write, and as we write them and test them, there’s a feedback loop there that helps us make them more testable, and better objects as well.

Test should fail looking for Tab, I hope.

1: Tabs class -- TabTests:30: attempt to call a nil value (global 'Tab')

We’ll have a new class:

Tab = class()

function Tab:init(name)
    self.name = name
end

function Tab:tabName()
    return self.name
end

I expect the test to pass. Not surprised when it does. Now let’s see about a method on Tab, to get the number of classes in the Tab.

        _:test("Tabs class", function()
            local names = {"SampleCode"}
            local tabs = Tabs(names)
            local tab = tabs:at(1)
            _:expect(tab:tabName()).is("SampleCode")
            _:expect(tab:numberOfClasses()).is(2)
        end)

You’ll notice that I broke out the tab variable, as I’ll be talking to it a bit here.

Fail will be on numberOfClasses. It is. I could use “fake it till you make it” here, but I think I’ll just code this baby up.

function Tab:numberOfClasses()
    return #self:classes()
end

function Tab:classes()
    if not self.classes then
        self:getClasses()
    end
    return self.classes
end

I’m supposing a member variable classes that is lazy-initialized. Why lazy? Because I don’t want to do the analysis until it’s called for. Not much more difficult and seems less wasteful. This frill may bite me. We’ll see.
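One way this frill can bite is already in the code above: the method and the member are both called classes. Here is a minimal plain-Lua sketch of why that fails, assuming Codea’s class() works the usual way, with instances finding their methods through the metatable’s __index; setmetatable stands in for class() here.

```lua
-- Plain-Lua stand-in for a Codea class: instances find methods through __index.
local Tab = {}
Tab.__index = Tab

function Tab.new(name)
  return setmetatable({ name = name }, Tab)
end

-- The method and the intended member share the name "classes".
function Tab:classes()
  if not self.classes then   -- finds the method Tab.classes, which is truthy
    self.classes = {}        -- so this lazy initialization never runs
  end
  return self.classes        -- returns the method itself, not a table
end

local tab = Tab.new("SampleCode")
print(type(tab:classes()))   --> function

-- Taking its length fails, much as #self:classes() would in numberOfClasses.
local ok = pcall(function() return #tab:classes() end)
print(ok)                    --> false
```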

Of course we have no getClasses, and of course I’ve named a method and a member the same. After some renaming and coding:

function Tab:classes()
    if not self.classTable then
        self:getClassTable()
    end
    return self.classTable
end

function Tab:getClassTable()
    self.classTable = {}
end

That’s the wrong answer and the test agrees.

1: Tabs class  -- Actual: 0, Expected: 2

Let’s do our thing.

I have in the back of my mind that we need to deal with the project name here. I think we should make Tabs know the project name. I’ll revise the tests a bit.

Ah, but there’s an issue. I’ve been using the SampleCode tab in this very project. It’s time to break out a separate sample project, SampleStatsProject, to test against, so that we don’t start analyzing this program before we’re ready. That would surely confuse me.

        _:test("Tabs class", function()
            local tabs = Tabs("SampleStatsProject")
            local tab = tabs:at(2)
            _:expect(tab:tabName()).is("SampleCode")
            _:expect(tab:numberOfClasses()).is(2)
        end)

I left the Main in SampleStatsProject, since all Codea programs will have one. So we look at tab 2.

We’re going to have an issue with reading the tab, but it should be simple enough to deal with. Let’s see. Let’s pass the full name to the Tab object.

function Tabs:init(projectName)
    self.projectName = projectName
    local names = listProjectTabs(projectName)
    local tabs = {}
    for _i,name in ipairs(names) do
        table.insert(tabs, Tab(self.projectName..":"..name))
    end
    self.tabs = tabs
end

No, that’ll make the name test break.

1: Tabs class  -- Actual: SampleStatsProject:SampleCode, Expected: SampleCode

Let’s pass the project and tab name separately.

function Tab:init(project,name)
    self.project = project
    self.name = name
end

OK, we’re back to not having the right number of classes, because getClassTable doesn’t do anything.

And I’m interrupted. I have a Zoom session at noon.

Back later …

Much Later …

It is 0500 next day (Friday). I am awake because I messed up my back somehow at about 1 AM and am now sitting almost comfortably in my Aeron at the computer. I’ll try to write and code. Maybe it’ll take my mind off the owwie.

Where were we? I’m not sure. What do the tests say? Oh, and I really ought to get this thing into Working Copy.

1: Tabs class -- TabTests:61: attempt to call a nil value (method 'getTabs')

That’s nice. It’s always helpful to stop on a broken test: it gives us something to do when we come back. I’ve put it into Working Copy, committed with the test broken. Now let’s see what we’ve got here.

Our test:

        _:test("Tabs class", function()
            local tabs = Tabs("SampleStatsProject")
            local tab = tabs:at(2)
            _:expect(tab:tabName()).is("SampleCode")
            _:expect(tab:numberOfClasses()).is(2)
        end)

The error is here, where we’re about to get the tabs:

function Tab:getClassTable()
    local tabs = self:getTabs()
end

We “just” need to write that method.

function Tab:getTabs()
    return listProjectTabs(self.project)
end

That won’t give us the class table but let’s see what the test says.

1: Tabs class -- TabTests:50: attempt to get length of a nil value

The error is here. I’m beginning to remember what’s going on.

function Tab:numberOfClasses()
    return #self:classes()
end

We’re lazy-initing the class list. We need to get all the classes from all the tabs. We’ll need to do a gmatch to do that. I’m torn between going back to the other test suite and writing up a gmatch there, and just pushing forward. Let’s commit as a save point (“save point test still broken”) and push forward.

Now we need to complete this method:

function Tab:getClassTable()
    local tabs = self:getTabs()
end

I get this far:

function Tab:getClassTable()
    local classes = {}
    local tabs = self:getTabs()
    for _i,tab in ipairs(tabs) do
        local tabClasses = self:getClassesInTab(tab)
        for 
    end
end

I don’t really know what’s going to come back from getClassesInTab, so I can’t really write the rest of that method. Let’s write a new test for that method. I’ll tie this function off so it’ll compile and we’ll work more directly on the InTab one.

        _:test("Classes in a Tab", function()
            local tabs = Tabs("SampleStatsProject")
            local tab = tabs:at(2)
            local tabClasses = Tabs:classesInTab("SampleCode")
            _:expect(#tabClasses).is(2)
        end)

I’m not sure what comes back but there are two class definitions in that tab, so 2 seems a good thing to check for.

I’m ignoring the other test for now and dealing with this one:

1: Classes in a Tab -- TabTests:19: attempt to call a nil value (method 'classesInTab')

Now we get to write that.

The test is wrong; it should be:

        _:test("Classes in a Tab", function()
            local tabs = Tabs("SampleStatsProject")
            local tab = tabs:at(2)
            local tabClasses = tab:classesInTab("SampleCode")
            _:expect(#tabClasses).is(2)
        end)

That’s a method on tab, not on Tabs. Moving right along … I come up with this:

function Tab:classesInTab(tabName)
    local tabContents = readProjectTab(self.project..":"..tabName)
    local pattern = PatternDictionary():classDefinition()
    local matchResult = tabContents:gmatch(pattern)
    local classes = {}
    for name in matchResult do
        table.insert(classes,name)
    end
    return classes
end

I “know” there’s only one capture in the classDefinition pattern, so I only expect to get one thing back from each match, the name. Let’s see what the test says.

The “Classes in a Tab” test runs: it did find two items. The other test doesn’t, saying:

2: Tabs class -- TabTests:78: attempt to call a nil value (method 'getClassesInTab')

I named the method wrong. I think get is better, because it’s doing the work. I rename the method to getClassesInTab and fix the test that calls it.

New error:

2: Tabs class -- TabTests:88: attempt to get length of a nil value

Failing code is:

function Tab:numberOfClasses()
    return #self:classes()
end

As we should have expected, because the getClassTable method is still incomplete:

function Tab:getClassTable()
    local classes = {}
    local tabs = self:getTabs()
    for _i,tab in ipairs(tabs) do
        local tabClasses = self:getClassesInTab(tab)
        --for 
    end
end

Now we know we have a collection of names and we just add them to the larger collection.

function Tab:getClassTable()
    local classes = {}
    local tabs = self:getTabs()
    for _i,tab in ipairs(tabs) do
        local tabClasses = self:getClassesInTab(tab)
        for _i, name in ipairs(tabClasses) do
            table.insert(classes,name)
        end
    end
    return classes
end

I could have stored the table into the member variable here, but I decided to do this instead:

function Tab:classes()
    if not self.classTable then
        self.classTable = self:getClassTable()
    end
    return self.classTable
end

I did that because I think a method called get should return a thing, and should not in general have a side effect of storing something.

I think the tests may run now. And they do. Commit: can get class table for a project.
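Pulling the pieces together, here is a runnable plain-Lua sketch of the lazy class table, under loudly stated assumptions: listProjectTabs and readProjectTab are stand-ins backed by an in-memory project, setmetatable stands in for Codea’s class(), and the reads counter is hypothetical instrumentation that shows the second call uses the cached table. The field the if statement tests must be spelled exactly like the field it assigns; a mismatch such as classtable versus classTable silently turns the cache off.

```lua
-- Stand-ins for Codea's listProjectTabs and readProjectTab, backed by an in-memory project.
local fakeProjects = {
  SampleStatsProject = {
    order = { "Main", "SampleCode" },
    tabs = {
      Main = "function setup()\nend\n",
      SampleCode = "SampleCode = class()\nSubclassOfSample = class(SampleCode)\n",
    },
  },
}

local function listProjectTabs(project)
  return fakeProjects[project].order
end

local function readProjectTab(key)   -- key looks like "Project:Tab"
  local project, tabName = key:match("^(.-):(.*)$")
  return fakeProjects[project].tabs[tabName]
end

local classDefinition = "%s*(%w+)%s*=%s*class%(%w*%)"

-- setmetatable stands in for Codea's class().
local Tab = {}
Tab.__index = Tab

function Tab.new(project, name)
  -- reads is hypothetical instrumentation: how many tabs have been read so far.
  return setmetatable({ project = project, name = name, reads = 0 }, Tab)
end

function Tab:getClassesInTab(tabName)
  self.reads = self.reads + 1
  local contents = readProjectTab(self.project .. ":" .. tabName)
  local classes = {}
  for name in contents:gmatch(classDefinition) do
    table.insert(classes, name)
  end
  return classes
end

-- As in the article, the Tab gathers class names from every tab in its project.
function Tab:getClassTable()
  local classes = {}
  for _, tabName in ipairs(listProjectTabs(self.project)) do
    for _, name in ipairs(self:getClassesInTab(tabName)) do
      table.insert(classes, name)
    end
  end
  return classes
end

-- Lazy initialization: the field tested is the field assigned, so the result is cached.
function Tab:classes()
  if not self.classTable then
    self.classTable = self:getClassTable()
  end
  return self.classTable
end

function Tab:numberOfClasses()
  return #self:classes()
end

local tab = Tab.new("SampleStatsProject", "SampleCode")
local first = tab:numberOfClasses()
local second = tab:numberOfClasses()
print(first, second, tab.reads)   --> 2  2  2
```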

I’ve had enough for this difficult morning, and the article is long enough as well. Let’s look back and sum up.

Summary

We have a very limited start on things. We can read all the tabs of a project and extract the class names defined in them. (We haven’t tested more than one tab with classes, but I’m sure it’ll work. I’ll test further soon.)

However, our class table right now is just a table of names. I suspect we’d like to have it be a table of some kind of object, maybe ClassInfo, each one containing lots of info about the class, not just its name. Number of methods, or the actual method names, things like that. That should be relatively easy to do, though it is going to break the tests a bit.

All in all, we have a good start. We don’t have anything we could show to a business-side customer, but this is Making App, not Shipping App, and we can certainly demonstrate to our programmer colleagues what we have working.

For now, the pain of the day is sufficient.

See you next time!