Alistair found some ancient estimate for a KWIC program and challenged folks to implement it. I’m picking up the challenge.

0800

Alistair Cockburn tweeted:

twitter: programming challenge. He says a good programmer could do this in a week or two (of normal 8-hour working days, so 40-80 hours). How long does it take you? Start the timer the instant you start doodling and sketching! - Don’t cheat! I want to see the times, please :)

The spec is:

kwic

I thought about this briefly last night before bed, and briefly before I got up. No pencil / paper, maybe an hour.

0802 Let’s get to work

I create a new Codea project with CodeaUnit available. Now some thought.

We’re going to have a big array of “lines”, each with some words in it. There’s one of these “lines” for each word in each line of input. If the input is “hello, dolly”, there will be two LineArrays, one for each word.

I’m going to ignore punctuation. In fact, for now, I’m going to ignore text. I’d better write a test.

0811

        _:test("Arrays from starting array of words", function()
            local input = { "some", "random", "words", "in", "a", "row" }
            local array = makeLines(input)
            _:expect(#array).is(6)
            _:expect(array[1]:key()).is("some")
            _:expect(array[2]:key()).is("random")
        end)

I’m supposing that makeLines creates an array of little objects that hold the “rotated” line. I don’t plan to rotate the line. Watch:

0817

Kwic = class()

function Kwic:makeLines(words)
    local result = {}
    for i = 1, #words do
        table.insert(result, Kwic(words,i))
    end
    return result
end

function Kwic:init(anArrayOfWords, index)
    self.words = anArrayOfWords
    self.index = index
end

function Kwic:key()
    return self.words[self.index]
end

Test passes.

OK, now what? I guess I want to create a few of these, jam them all together, and sort them down. I’ll write a new test.

0822

        _:test("Corpus from array of arrays of words", function()
            local textLines = {
            { "the", "local", "boy", "made", "good", "in", "the", "big", "city" },
            { "soon", "he", "came", "back", "to", "the", "village", "for", "his", "girl" }
            }
            local corpus = Kwic:makeCorpus(textLines)
            _:expect(#corpus).is(19)
        end)

I counted the words and got 19. I hope that’s right.

0824

function Kwic:makeCorpus(lineArray)
    local corpus = {}
    for i,words in ipairs(lineArray) do
        local lines = Kwic:makeLines(words)
        table.concat(corpus,lines)
    end
    return lines
end

I expect the test to run. But no, that’s not how you append tables. concat builds a string. We need:

function Kwic:makeCorpus(lineArray)
    local corpus = {}
    for i,words in ipairs(lineArray) do
        local lines = Kwic:makeLines(words)
        table.move(lines,1,#lines, #corpus, corpus)
    end
    return corpus
end

I noticed I wasn’t returning the right thing, fixed that too. Now maybe this works. However, no:

3: Corpus from array of arrays of words  -- Actual: 17, Expected: 19

That’s curious. Two less than the number of lines? No idea.

My call to table.move isn’t working. Screw that:

function Kwic:makeCorpus(lineArray)
    local corpus = {}
    for i,words in ipairs(lineArray) do
        local lines = Kwic:makeLines(words)
        for i,l in ipairs(lines) do
            table.insert(corpus, l)
        end
    end
    return corpus
end

0835 test runs

Now we have the array in corpus, we need to sort it. To sort it, we need to provide comparison functions on the Kwic instances that are in there. I’ve never done that but how hard could it be.

function Kwic:makeCorpus(lineArray)
    local corpus = {}
    for i,words in ipairs(lineArray) do
        local lines = Kwic:makeLines(words)
        for i,l in ipairs(lines) do
            table.insert(corpus, l)
        end
    end
    table.sort(corpus, function(a,b) return a:key() < b:key() end)
    return corpus
end

I think this is probably nearly good. Forgive me but I’m going to print the corpus to decide about the next test.

0842

This looks right, but you can’t see the printout. Now I’ll need to display it on screen. I think this doesn’t count against my time.

0849

output

That output looks pretty decent. Are we done? Not quite. We need to convert text lines to word arrays.

        _:test("string to array of words", function()
            local string = "the local boy made good in the big city"
            local words = Kwic:makeWords(string)
            _:expect(words[1]).is("the")
            _:expect(words[9]).is("city")
        end)

I looked this up:

0855

function Kwic:makeWords(aString)
    local words = {}
    for w in aString:gmatch("%w+") do
        table.insert(words, w)
    end
    return words
end

Now we need a method to start from the story and produce our output. I’m not going to TDD that, I’m going to run it from Main:

function setup()
    if CodeaUnit then 
        codeaTestsVisible(true)
        runCodeaUnitTests() 
    end
    
    local textLines = {
    "the local boy made good in the big city" ,
    "soon he came back to the village for his girl",
    "and they lived happily ever after with their pet aardvark and zebra"
    }
    corpus = Kwic:makeCorpusFromStrings(textLines)
end

And:

function Kwic:makeCorpusFromStrings(stringArray)
    local lineArray = {}
    for i,line in ipairs(stringArray) do
        local words = self:makeWords(line)
        table.insert(lineArray, words)
    end
    return self:makeCorpus(lineArray)
end

And the result:

aardvark to zebra

Time is 0905.

Just over an hour for the code and article to this point.

Let’s reflect:

Reflection

All went smoothly. I never did figure out the table.move trick, I just went ahead and moved it by myself.

If there is cleverness in what I did, it is in not rotating the array of words, just fetching the correct word as the key to sort on.

I think a standard KWIC index usually puts the sorted words in the middle of the page, and the words preceding in the line to the left, following to the right. That wasn’t in the spec AFAIK.

The code isn’t ideal: it doesn’t quite line up as a pipeline, and now that I have it I can see that I’d like it to work that way.

Testing helped a lot, and found a few typos, plus I quickly learned that I didn’t know how table.move works.

I looked up the word to array function, but it’s a trivial regex trick that any reasonable person here in the 21st century would probably know. It wouldn’t be that hard to write it out longhand.

I didn’t deal with upper-lower case, and since upper case and lower case letters don’t sort together, we should probably tolower all the keys before returning them. Trivial improvement.

I just have one question: who should I bill for 40 to 80 hours for this thing?

Here’s all the code and tests. You can see that I didn’t do the niceties of naming the test something sensible, and so on. I was under pressure to get done in less than two weeks. What can I say?


-- CUBase

function setup()
    if CodeaUnit then 
        codeaTestsVisible(true)
        runCodeaUnitTests() 
    end
    
    local textLines = {
    "the local boy made good in the big city" ,
    "soon he came back to the village for his girl",
    "and they lived happily ever after with their pet aardvark and zebra"
    }
    corpus = Kwic:makeCorpusFromStrings(textLines)
end

function draw()
    --if CodeaUnit then showCodeaUnitTests() end
    background(40,40,40)
    fill(255)
    textMode(CORNER)
    local line = 1000
    for i,kwic in ipairs(corpus) do
        local key = kwic:key()
        local txt = kwic:text()
        text(key, 100, line)
        text(txt, 200, line)
        line = line - 25
    end
end

-- RJ 20210411
-- KWIC tests

function testYOURTESTNAMEHERE()
    CodeaUnit.detailed = true

    _:describe("YOUR_TEST_NAME_HERE", function()

        _:before(function()
        end)

        _:after(function()
        end)

        _:test("HOOKUP", function()
            _:expect("Bar", "testing fooness").is("Bar")
        end)
        
        _:test("Arrays from starting array of words", function()
            local input = { "some", "random", "words", "in", "a", "row" }
            local array = Kwic:makeLines(input)
            _:expect(#array).is(6)
            _:expect(array[1]:key()).is("some")
            _:expect(array[2]:key()).is("random")
        end)
        
        _:test("Corpus from array of arrays of words", function()
            local textLines = {
            { "the", "local", "boy", "made", "good", "in", "the", "big", "city" },
            { "soon", "he", "came", "back", "to", "the", "village", "for", "his", "girl" }
            }
            local corpus = Kwic:makeCorpus(textLines)
            _:expect(#corpus).is(19)
            for i,kw in ipairs(corpus) do
                local str = table.concat(kw.words, " ")
                print(kw:key()," -> ", str)
            end
        end)
        
        _:test("string to array of words", function()
            local string = "the local boy made good in the big city"
            local words = Kwic:makeWords(string)
            _:expect(words[1]).is("the")
            _:expect(words[9]).is("city")
        end)

    end)
end


Kwic = class()

function Kwic:makeCorpusFromStrings(stringArray)
    local lineArray = {}
    for i,line in ipairs(stringArray) do
        local words = self:makeWords(line)
        table.insert(lineArray, words)
    end
    return self:makeCorpus(lineArray)
end

function Kwic:makeCorpus(lineArray)
    local corpus = {}
    for i,words in ipairs(lineArray) do
        local lines = Kwic:makeLines(words)
        for i,l in ipairs(lines) do
            table.insert(corpus, l)
        end
    end
    table.sort(corpus, function(a,b) return a:key() < b:key() end)
    return corpus
end

function Kwic:makeLines(words)
    local result = {}
    for i = 1, #words do
        table.insert(result, Kwic(words,i))
    end
    return result
end

function Kwic:makeWords(aString)
    local words = {}
    for w in aString:gmatch("%w+") do
        table.insert(words, w)
    end
    return words
end

function Kwic:init(anArrayOfWords, index)
    self.words = anArrayOfWords
    self.index = index
end

function Kwic:key()
    return self.words[self.index]
end

function Kwic:text()
    return table.concat(self.words, " ")
end