We continue working on our little ‘Making App’, because it’s kind of fun. Today, I foresee a ‘major refactoring’.

As I was reflecting on what we have in the stats project so far, and on what we probably want, I began, as one does, to think about the data structure. I find this kind of thinking valuable, because the shape of one’s program relates strongly to the shape of the data. And, of course, we want the shape of the program to relate strongly to the program’s purpose, which is in this case, about the shape of the output.

This means that we’d like the shape of the data to support, or at least not resist, the shape of the output.

Right now, the program is keeping the information gleaned from each tab of the project in a Tab object. Inside that object are a collection of class descriptions, and a collection of method descriptions.

This is almost, but not quite, reflecting the shape of the program as seen laid out among its tabs. However, “not quite” is accurate. A Lua program can have a class in one tab, and some or all of the class’s methods in another tab. So, while we may want tab info in the output, we probably do not want method definitions isolated to any given tab.

This line of thinking leads me to want a different organization, something like this:

  • A Project object for each project, containing,
  • A collection of each kind of thing we tally up, such as:
    • Classes defined
    • Methods defined
    • Functions defined

For each of the kinds of things we tally up, we would like the thing’s info to include everything important about it, including, I posit, the tab in which we found it.

In other words, I’m proposing a flat one-level structure containing collections of everything we like.

As I say that, I think of something simpler to build and equally comprehensive. What if we just had one big collection containing, will’e nill’e, class definition objects, method definition objects, function definition objects, and so on?

Processing such a collection would almost always involve first selecting from it, to get whatever one wanted to report on. So each element would need to know its “kind” as well as other info.

On the face of it, this seems less “efficient”, since we’d be trekking through the big collection all the time. On the other hand, it is more flexible, in that it is indifferent to whatever query one is making. (Arguably, “indifferent” means “equally bad” here, but we’ll stick with “indifferent” for now.)

Note that we’re making a decision which is analogous to the ones we make when we define a database structure or file structure. (Don’t even make me think of converting this program to some kind of XML or JSON …) Often, we’re given a database with a given structure, and have no choice. Even then, we often benefit from thinking in terms of the structure our program would like to process. Why? At least two reasons:

First, the structures we want represent exactly the queries we want against the database, and knowing our queries is important.

Second, processing data in memory is often (nearly always?) faster than processing it in the database, and the amount of memory we have in most programming situations is almost obscene, which means that we can read in a great deal of information and process it quickly in memory. We get better control over what we get, and generally faster performance.

Be that as it may, I do not intend to store my project information on Oracle, and thinking about the data has led me to this conclusion:

We’ll try keeping all our objects together in one big collection, which we’ll query as needed to get the output we want. If nothing else, this will be educational.

What Must We Do?

I think we must begin by assessing our tests, to see which ones can continue to exist as they are, and which ones need modification–and which ones need replacement entirely.

And we’ll surely now need to put tab information into every object we create.

A Temptation

Some readers, if there are any, will be thinking now of creating a simple flat “record” rather than class instances, perhaps a keyed table for each item, containing

  • kind - class, method, etc
  • tab - tab it was found in
  • major name - e.g. class name for a method
  • minor name - specific function / method name
  • args - if any
  • mumble, foo, baz - whatever else we need

We’re not going to do that, and I’ll tell you why. When we create dumb data, and the picture above is about as dumb as data gets, we drive out procedural code. Imagine the case/switch/if-else statements it will take to format one of those records. Nasty, procedural hobbitses. We’re OO programmers here, Jim.

Our various classes will help us. I’m sure of that. And they’ll all need to respond to the same protocol. We’ll deal with that somehow. I even have a vague idea for how, using an abstract superclass. But that’s for later.

For now, let’s see about flattening this baby out.

Flattening the Baby

I assure you, no actual babies will be harmed in the course of this series. Stand down, cancel the march of mothers.

I’ll include all the test code here, since we haven’t reviewed it for a while. The first test tab in the program, “Tests”, appears to me to be all lower-level pattern checking and can be left, I believe, as is:

-- RJ 20211021
-- Project Statistics Tests

local tab
local pd

function testProjectStatistics()
    CodeaUnit.detailed = true

    _:describe("Project Statistics", function()

        _:before(function()
            pd = PatternDictionary()
            tab = readProjectTab("SampleStatsProject:SampleCode")
        end)

        _:after(function()
        end)

        _:test("HOOKUP", function()
            _:expect("Bar", "testing fooness").is("Bar")
        end)
        
        _:test("Read SampleCode tab from this project", function()
            local pattern = "%-%-%s*(SampleCode)"
            local start,stop,capture = tab:find(pattern)
            _:expect(start).is(1)
            _:expect(stop).is(13)
            _:expect(capture).is("SampleCode")
        end)
        
        _:test("Find a class", function()
            local pattern = pd:classDefinition()
            local start,stop,capture = tab:find(pattern)
            _:expect(capture).is("SampleCode")
        end)
        
        _:test("Find a subclass", function()
            tab = " SubclassOfSample = class(Sample)"
            local pattern = pd:classDefinition()
            local start,stop,capture = tab:find(pattern)
            _:expect(capture).is("SubclassOfSample")
        end)
        
        _:test("Pack the return", function()
            tab = "<123,456,789>"
            local pattern = "<(%d*),(%d*),(%d*)>"
            local result = {tab:find(pattern)}
            _:expect(#result).is(5)
            _:expect(result[1]).is(1)
            _:expect(result[2]).is(13)
            _:expect(result[3]).is("123")
            _:expect(result[4]).is("456")
            _:expect(result[5]).is("789")
        end)
        
        _:test("Find all classes", function()
            local pattern = pd:classDefinition()
            local iter = tab:gmatch(pattern)
            local names = {}
            for k,v in iter do
                table.insert(names,k)
            end
            _:expect(names[1]).is("SampleCode")
            _:expect(names[2]).is("SubclassOfSample")
        end)
        
        _:test("Lua 5.3 Ref example", function()
            t = {}
            s = "from=world=ron, to=Lua=jeffries"
            local names = {}
            for m1,m2,m3 in string.gmatch(s, "(%w+)=(%w+)=(%w+)") do
                table.insert(names, {m1,m2,m3})
            end
            _:expect(names[2][3]).is("jeffries")
        end)
        
        _:test("Find a method", function()
            local pattern = pd:methodDefinition()
            local start,stop,className,methodName, args = tab:find(pattern)
            _:expect(className,"className").is("SampleCode")
            _:expect(methodName,"methodName").is("init")
            _:expect(args,"args").is("x")
        end)
        
    end)
end

The other tab with Tests also includes some real class definitions. I often build new classes right in the Tab that tests them. It saves tabbing about and helps me keep things compact. We’ll just review the tests first. Here they are:

-- RJ 20211021
-- Tab Tests
local pd

function testTabs()
    CodeaUnit.detailed = true
    
    _:describe("Tabs", function()
        
        _:before(function()
        end)
        
        _:after(function()
        end)
        
        _:test("Classes in a Tab", function()
            local tabs = Tabs("SampleStatsProject")
            local tab = tabs:at(2)
            local tabClasses = tab:getClassesInTab("SampleCode")
            _:expect(#tabClasses).is(2)
        end)
        
        _:test("Tabs class", function()
            local tabs = Tabs("SampleStatsProject")
            local tab = tabs:at(2)
            _:expect(tab:tabName()).is("SampleCode")
            _:expect(tab:numberOfClasses()).is(2)
        end)
        
        _:test("Methods", function()
            local tabs = Tabs("SampleStatsProject")
            local tab = tabs:at(2)
            local methodCollection = tab:methods()
            _:expect(methodCollection:count()).is(5)
            local methodsInSampleCode = methodCollection:methodsForClass("SampleCode")
            _:expect(methodsInSampleCode:count()).is(4)
        end)
        
    end)
end

The good news about changing our data structure on day three is that there aren’t very many tests, and presumably not much code, dedicated to the old way. This is why we think all the time, not just at the beginning, when it’s too soon to know, and at the end, when it’s too late to do anything about it.

So all these tests need at least some modification. We have classes now–I’m tempted to turn the program loose on itself–including MethodDefinition and MethodCollection, but classes are stored as raw names.

I think the thing to do is probably to delete all the existing code and rewrite the tests, but there’s code in there that’s worth retaining. At least I think there is. If it were easier to suck code out of WorkingCopy, I’d do the delete, but if it’s easy I don’t know how. We’ll change the tests and then try extra hard to remove things we don’t need.

So … the pattern is, we want a ProgramDescription object that contains a big collection of MethodDefinition, ClassDefinition, FunctionDefinition, whatever objects. Each of those objects must also know its tab name as well as whatever else it knows.

So this test, for example:

        _:test("Classes in a Tab", function()
            local tabs = Tabs("SampleStatsProject")
            local tab = tabs:at(2)
            local tabClasses = tab:getClassesInTab("SampleCode")
            _:expect(#tabClasses).is(2)
        end)

The corresponding test in our new scheme might look like this:

        _:test("Classes in a Tab", function()
            local desc = ProjectDescription("SampleStatsProject")
            local tabClasses = desc:getClassesInTab("SampleCode")
            _:expect(#tabClasses).is(2)
        end)

Now this is a simple test with some big implications, namely that method getClassesInTab, but I think we’re up to it. I could rewrite the others now, but I’m not sure I like this protocol. We’ll learn as we go.

Run the test, expecting complaints about ProjectDescription.

1: Classes in a Tab -- TabTests:17: attempt to call a nil value (global 'ProjectDescription')

Write ProjectDescription class. I can see two ways to go. One is to create the class to go through all the tabs creating the master collection I have in mind. The other is to use the existing Tabs class and other innards. The latter will need unwinding but should get our tests running sooner.

ProjectDescription =  class()

function ProjectDescription:init(projectName)
    self.projectName = projectName
end

I’m going to try really hard not to write any code that isn’t implied by the test, but I’ve already gone too far by saving the project name. I have no test that requires that.

Run tests:

1: Classes in a Tab -- TabTests:18: attempt to call a nil value (method 'getClassesInTab')

No surprise there.

function ProjectDescription:init(projectName)
    self.projectName = projectName
    self.tabs = Tabs(projectName)
end

function ProjectDescription:getClassesInTab(tabName)
    local tabs = self.tabs:tabNamed(tabName)
    return tab:classes()
end

There is no tabNamed method. That’s the error I expect.

1: Classes in a Tab -- TabTests:49: attempt to call a nil value (method 'tabNamed')

I love it when a plan comes together.

function Tabs:tabNamed(tabName)
    for _i,tab in self.tabs do
        if tab:isNamed(tabName) then
            return tab
        end
    end
    return nil
end

I elected not to rip the guts out of the Tab but to ask it if it is named whatever. I expect the error to show no such method.

1: Classes in a Tab -- TabTests:66: attempt to call a table value

Forgot ipairs. Again.

function Tabs:tabNamed(tabName)
    for _i,tab in ipairs(self.tabs) do
        if tab:isNamed(tabName) then
            return tab
        end
    end
    return nil
end
1: Classes in a Tab -- TabTests:67: attempt to call a nil value (method 'isNamed')

Back on track. Tests are wonderful.

function Tab:isNamed(tabName)
    return self.name == tabName
end

Let’s see what happens now.

1: Classes in a Tab -- TabTests:50: attempt to index a nil value (global 'tab')

Uh oh.

function ProjectDescription:getClassesInTab(tabName)
    local tabs = self.tabs:tabNamed(tabName)
    return tab:classes()
end

Yes. I meant local tab. With that in place, our new test runs.

Now this is interesting. If it continues to be this easy, we can get all our tests running at the high level of ProjectDescription without having to change the internal structures. Then we’ll be able to modify them any way we wish, so long as we retain the protocol in ProjectDescription.

Let’s revamp this:

        _:test("Tabs class", function()
            local tabs = Tabs("SampleStatsProject")
            local tab = tabs:at(2)
            _:expect(tab:tabName()).is("SampleCode")
            _:expect(tab:numberOfClasses()).is(2)
        end)

This test is really imposing a protocol on the class Tab, which we’re planning not to have. I think we ignore it. Which reminds me, why didn’t I commit that running test? Commit: First ProjectDescription test.

Looking at both the remaining unmodified tests, I don’t like what I see:

        _:test("Tabs class", function()
            local tabs = Tabs("SampleStatsProject")
            local tab = tabs:at(2)
            _:expect(tab:tabName()).is("SampleCode")
            _:expect(tab:numberOfClasses()).is(2)
        end)
        
        _:test("Methods", function()
            local tabs = Tabs("SampleStatsProject")
            local tab = tabs:at(2)
            local methodCollection = tab:methods()
            _:expect(methodCollection:count()).is(5)
            local methodsInSampleCode = methodCollection:methodsForClass("SampleCode")
            _:expect(methodsInSampleCode:count()).is(4)
        end)

These just don’t reflect the new design. They are all too tab-focused. Even the first one is, because it is testing a Tab object that I plan to get rid of.

I am considering two ways forward. One is to blow away or ignore these tests and write new ones, and make ProjectDescription the way I want it to be. The other is to preserve these tests, providing a Tab object that I otherwise would not have, but to create it from the flat data structure.

The latter is something we could do it we had a lot of code relying on Tab, but we only have these three tests.

Let’s go ahead, TDD out ProjectDescription, and then probably thank these initial three for their service and remove them.

TDD ProjectDescription

I think ideally, I’d like to have the first test in its old form. Yes, OK, let’s revert out what we just did and start over.

I undo today’s commit, and then revert. Existing (old) tests run. New test:

I get this far …

        _:test("ProjectDescription methods", function()
            local desc = ProjectDescription("SampleStatsProject")
            local methods = desc:methods()
            _:expect
        end)

What should this thing return? I think it has to be some kind of smart collection, so let’s assume that.

        _:test("ProjectDescription methods", function()
            local desc = ProjectDescription("SampleStatsProject")
            local methods = desc:methods()
            _:expect(methods:count()).is(4)
        end)

In this case we have a MethodsCollection object so let’s suppose we’ll use it if it feels right. Test will fail looking for ProjectDescription.

4: ProjectDescription methods -- TabTests:40: attempt to call a nil value (global 'ProjectDescription')
ProjectDescription = class()

Error, as expected:

4: ProjectDescription methods -- TabTests:41: attempt to call a nil value (method 'methods')

This is a bit more code than I’d like to write, but I had templates to copy.

function ProjectDescription:init(projectName)
    self.projectName = projectName
    self.tabs = listProjectTabs(projectName)
    self.methods = self:getMethods()
end

function ProjectDescription:getMethods()
    local methods = {}
    for _i,tabName in ipairs(self.tabs) do
        local longName = self.projectName..":"..tabName
        self:addMethodsFromTab(longName, methods)
    end
    return MethodCollection(methods)
end

function ProjectDescription:addMethodsFromTab(longTabName, methods)
    local tabContents = readProjectTab(longTabName)
    local pattern = PatternDictionary:methodDefinition()
    local matchResult = tabContents:gmatch(pattern)
    for className,methodName,args in matchResult do
        local method = MethodDefinition(className,methodName,args)
        table.insert(methods, method)
    end
end

I’m not sure what to expect. Run the tests.

4: ProjectDescription methods -- TabTests:41: attempt to call a table value (method 'methods')

Perfect. We have no method yet named methods.

        _:test("ProjectDescription methods", function()
            local desc = ProjectDescription("SampleStatsProject")
            local methods = desc:methods()
            _:expect(methods:count()).is(4)
        end)

So we write it.

function ProjectDescription:methods()
    return self.methods
end

I rather hope that this will pass, but I think it probably fails on ‘count’ because I’m not yet returning the MethodsCollection.

4: ProjectDescription methods -- TabTests:41: attempt to call a table value (method 'methods')

Dammit! Again I name a method the same as the name of a member variable.

function ProjectDescription:init(projectName)
    self.projectName = projectName
    self.tabs = listProjectTabs(projectName)
    self.methodCollection = MethodCollection(self:getMethods())
end

function ProjectDescription:methods()
    return self.methodCollection
end

I think the addition of the MethodCollection conversion was righteous because the test requires it. Let’s see what happens now:

4: ProjectDescription methods  -- Actual: 0, Expected: 4

OK what have I done here? Well, hell. I was already wrapping the answer. Let’s undo that second wrapping. I outsmarted myself.

function ProjectDescription:init(projectName)
    self.projectName = projectName
    self.tabs = listProjectTabs(projectName)
    self.methodCollection = self:getMethods()
end
4: ProjectDescription methods  -- Actual: 5, Expected: 4

That is in fact the correct answer. The test is wrong. With the 5 in place, no surprise, it passes. Commit: ProjectDescription class, creates full MethodCollection.

I find myself wanting more and more to remove the old tests and the code they imply. Do you mind? I didn’t think so.

Big delete there. The remaining tests run. Commit: remove obsolete Tabs and Tab classes.

That was a delete of 120 lines. I hope I’m not getting paid by number of lines added, it’ll be a bad morning.

I have in mind an extension to our test. We would like to be able to get the methods for a given class. Let’s extend the methods function to do that.

        _:test("ProjectDescription methods", function()
            local desc = ProjectDescription("SampleStatsProject")
            local methods = desc:methods()
            _:expect(methods:count()).is(5)
            local sampleCodeMethods = desc:methods("SampleCode")
            _:expect(sampleCodeMethods:count()).is(4)
        end)

I’m positing that we have an optional parameter, the name of the class. I think this is a bit naff, but I want to do it anyway, just to see what it looks like. Naturally we get this failure:

1: ProjectDescription methods  -- Actual: 5, Expected: 4

We code:

function ProjectDescription:methods(className)
    if className then
        return self.methodCollection:methodsForClass(className)
    else
        return self.methodCollection
    end
end

Test runs. We already had that method on MethodCollection:

function MethodCollection:methodsForClass(className)
    local result = {}
    for _i,meth in ipairs(self.methodDefinitions) do
        if meth.className == className then
            table.insert(result, meth)
        end
    end
    return MethodCollection(result)
end

Let’s improve this by pushing the possibly nil class name down:

function ProjectDescription:methods(className)
    return self.methodCollection:methodsForClass(className)
end

function MethodCollection:methodsForClass(className)
    local result = {}
    for _i,meth in ipairs(self.methodDefinitions) do
        if not className or meth.className == className then
            table.insert(result, meth)
        end
    end
    return MethodCollection(result)
end

Test runs. I think that’ll do for now. Commit: ProjectDescription methods function can return all methods or a specific class.

However. We know we’re going to want the tab name in every one of our objects like the MethodDefinition class. Or will we? Yes, we could add a tab field to MethodDefinition. Or we could wrap all the various definitions in a Tab container of some kind. Could do the same with the class.

I’m torn. The flat idea is simple, because flat is simple. A Tab instance containing a MethodDescription, not so much.

Another possibility comes to mind. We are presently treating the names, method names, class names, tab names, all as strings. Is that a case of primitive obsession? Should those values be wrapped in tiny TabName, MethodName, ClassName instances?

I’m not sure. It seems at this moment like too much. But we’ve already passed a string down about three or four levels, which isn’t quite cool.

Let’s try an experiment. (This is weird. I feel as if I’m about to reinvent the Tab class that I just removed. But in fact I think it’ll be quite different, if it is even a good idea at all.

Let’s assume that the ProjectDescription has a collection, not of tab names, like it has now, but a collection of Tab instances, which are little more than a thing containing a name, but that will probably do more quite soon.

This is a refactoring. We’ve really only got the one test but I’m sure it’ll break readily enough.

function ProjectDescription:init(projectName)
    self.projectName = projectName
    self.tabs = listProjectTabs(projectName)
    self.methodCollection = self:getMethods()
end

I want self.tabs here to be an array of Tab instances.

function ProjectDescription:getProjectTabs(projectName)
    local tabNames = listProjectTabs(projectName)
    local result = {}
    for _i, tabName in ipairs(tabNames) do
        table.insert(result,Tab(projectName,tabName))
    end
    return result
end

And we need the Tab class:

Tab = class()

function Tab:init(projectName, tabName)
    self.projectName = projectName
    self.tabName = tabName
    self.longName = projectName..":"..tabName
end

I ran ahead a bit. I could run ahead more but let’s let the test break.

1: ProjectDescription methods -- TabTests:47: attempt to concatenate a table value (local 'tabName')

Not unexpected:

function ProjectDescription:getMethods()
    local methods = {}
    for _i,tabName in ipairs(self.tabs) do
        local longName = self.projectName..":"..tabName
        self:addMethodsFromTab(longName, methods)
    end
    return MethodCollection(methods)
end

function ProjectDescription:addMethodsFromTab(longTabName, methods)
    local tabContents = readProjectTab(longTabName)
    local pattern = PatternDictionary:methodDefinition()
    local matchResult = tabContents:gmatch(pattern)
    for className,methodName,args in matchResult do
        local method = MethodDefinition(className,methodName,args)
        table.insert(methods, method)
    end
end

Tabs are now objects. Now we can push that behavior back down into them if we wish. This does seem to be moving us back where we were, but I think we’ll find that we’re OK.

function ProjectDescription:getMethods()
    local methods = {}
    for _i,tab in ipairs(self.tabs) do
        self:addMethodsFromTab(tab, methods)
    end
    return MethodCollection(methods)
end

function ProjectDescription:addMethodsFromTab(tab, methods)
    local tabContents = tab:contents()
    local pattern = PatternDictionary:methodDefinition()
    local matchResult = tabContents:gmatch(pattern)
    for className,methodName,args in matchResult do
        local method = MethodDefinition(className,methodName,args)
        table.insert(methods, method)
    end
end

function Tab:contents()
    return readProjectTab(self.longName)
end

Test. Test runs. Commit: ProjectDescription tabs table is array of Tab instances that can return tab contents.

It’s Sunday, and I can sense that breakfast and our Standard Sunday Morning Sunday Morning is about to begin. Let’s sum up.

Summary

We’ve changed direction toward a higher-level object, ProjectDescription, intended to contain (or compute) collections of methods, classes, functions, whatever. So far it has a collection of barely alive Tab objects that know their project and tab name, and can return tab contents.

In the ProjectDescription we parse out methods and return a MethodCollection of all the methods in the project, and we can select by class name, but nothing else.

We’re not really as far along as we were when we started, but we are moving in a direction that we like. If this were an ordinary working day, we’d wind up where we were with a better (well more desired) structure.

Since it’s a short day, we’re not quite there yet.

Looking forward, I think we’ll replace the MethodCollection with a more general kind of collection, but I could be wrong about that. We’ll probably proceed by creating a ClassCollection for the classes, and then observing duplication.

And I am getting tempted to do my queries with a little structure like

query = {tab=tabName, class=className, ...}

We’ll see about that. Maybe I’ll invent relational databases. It wouldn’t be the first time I’ve done that.

For today, two steps back, one step forward, to a better place. At least I think it’s a better place.

And, by the way, that’s not an uncommon thing when we do incremental evolutionary design. We’re discovering better ways to write the program, not assuming at the outset that we know what’s right.

See you next time for more discovery!