XST 47: An anti-strawberry?

We’re taught to ‘make it work; make it right; make it fast’. We’re taught that if it’s hard to test, it’s not right. Let’s explore. TL;DR: No strawberry here?

I’m having trouble finding strawberries, the bite-sized ideas that are immediately helpful and that are difficult to water down to the point where they are no longer of value. Today we’re going to look for at least one. I don’t know if we’ll succeed.

Spoiler: I set out to find some code that wasn’t “right”, that was making testing difficult. Then I thought I might have a nice little strawberry article about making code right. What happened surprised me: I found a somewhat serious error in the extended set theory program. Here’s that story:

In the past couple of days, my colleagues have said some things that are driving this exploration.

Make it work; make it right; make it fast;
If it’s hard to test, it’s not right;
If you have a tested operation on a thing and you want a collection of things that have that operation applied to them, you can basically just use map, because map can’t possibly break.

That last one is kind of long, but I didn’t want to boil it down.

GeePaw Hill talks about really wanting tests for all his branching logic, and not wanting much else. This is similar to the last notion above. If map works, and the function works, you’re done. So test the function, because map is known to work.

I’m going to start with some code that is hard to test, see whether it is not right, see whether the branching logic / map ideas can help me, and see whether there is a tiny lesson to be had. I’ve had to write some fairly gnarly tests in the Extended Set Theory program, so we’ll start there.

XSet2

I open Working Copy, my git client, on XSet2, and before I even open Codea, Working Copy tells me there is a file to commit. Someone has been careless. I open Codea. Let’s see what’s up.

It turns out that I added some comments in the XData file. Maybe we’ll look at them later. All 63 tests are green. We’re good to go.

I’m looking for tests that are hard to write, to see what I can learn. The ones I remember are tests where I had to set up a lot of data. They are basically “acceptance tests” on set operations. There may be other interesting ones. I’ll look around and pick something to start with. Here’s one:

        _:test("Intersect", function()
            local A = XSet()
            A:addAt("Jeffries", "last")
            A:addAt("Ron", "first")
            _:expect(A:card(), "#A").is(2)
            local B = XSet()
            B: addAt("Ron", "first")
            _:expect(B:card(), "#B").is(1)
            local C = A:intersect(B)
            _:expect(C:hasAt("Ron", "first")).is(true)
            _:expect(C:card()).is(1)
            _:expect(C:isNull()).is(false)
        end)

This is essentially the first test I wrote of a set operation. It is typical of early tests. Later ones could have been a bit easier, after the fromTable method was created:

        _:test("Intersect Shorthand", function()
            local A = XSet:fromTable{last="Jeffries",first="Ron"}
            _:expect(A:card(), "#A").is(2)
            local B = XSet():fromTable{first="Ron"}
            _:expect(B:card(), "#B").is(1)
            local C = A:intersect(B)
            _:expect(C:hasAt("Ron", "first")).is(true)
            _:expect(C:card()).is(1)
            _:expect(C:isNull()).is(false)
        end)

Now this could be simpler if we didn’t check the cardinality (size) of the inputs. Let’s do that.

        _:test("Intersect Shorthand", function()
            local A = XSet:fromTable{last="Jeffries",first="Ron"}
            local B = XSet():fromTable{first="Ron"}
            local C = A:intersect(B)
            _:expect(C:hasAt("Ron", "first")).is(true)
            _:expect(C:card()).is(1)
            _:expect(C:isNull()).is(false)
        end)

The check for isNull shouldn’t really be needed. I must have had some fear about creating a set with contents and still finding it null. Let’s make a card for that and remove the check.

        _:test("Intersect Shorthand", function()
            local A = XSet:fromTable{last="Jeffries",first="Ron"}
            local B = XSet():fromTable{first="Ron"}
            local C = A:intersect(B)
            _:expect(C:hasAt("Ron", "first")).is(true)
            _:expect(C:card()).is(1)
        end)

Those last two expectations check that we only got one element in our new set and that it is Ron^first. If we had a set equality function we could rewrite the test this way:

        _:test("Intersect with Set equality", function()
            local A = XSet:fromTable{last="Jeffries",first="Ron"}
            local B = XSet():fromTable{first="Ron", last="Hendrickson"}
            local C = A:intersect(B)
            local E = XSet():fromTable{first="Ron"}
            _:expect(C==E).is(true)
        end)

Note that I made the test more stringent by adding an element to B that should not occur in the intersection. And we do have set equality now, so I can use it. The tests are still green.

What Have We Learned?

Well. I set out to find a test that was telling me that the objects weren’t helping me. Instead I found a test that wasn’t as clean as it could be, probably at least in part because it was an early one. We should talk about what we should do when that happens.

First of all, we rarely remove a test. Last night, Chet reminded us that on C3 (the first XP project, for those following along at home), we had the rule that anyone could add a test, but that it required a team decision to remove a test. I think that’s a good way to lean. If someone feels the need for a test, we all just go “Yes” and add it. But if a test seems unneeded, we need to be pretty careful, since it could be the last thread holding things together.

Here I have three intersect tests, the original that I showed you, the “shorthand” one, and now this one. I think this one covers the first two and as a team I agree we can remove them.

There is one more intersect test:

        _:test("Null Intersect", function()
            local A = XSet()
            A:addAt("Jeffries", "last")
            A:addAt("Ron", "first")
            _:expect(A:card(), "#A").is(2)
            local B = XSet()
            B: addAt("Ron", "notFirst")
            _:expect(B:card(), "#B").is(1)
            local C = A:intersect(B)
            _:expect(C:card()).is(0)
            _:expect(C:isNull()).is(true)
        end)

That one is a bit belt and suspenders, but I can imagine a way of implementing intersect that would let it find a result when it shouldn’t. I have a very strong imagination for bad code. So I think I’ll clean it rather than remove it. First phase:

        _:test("Null Intersect", function()
            local A = XSet:fromTable{last="Jeffries",first="Ron"}
            local B = XSet:fromTable{notFirst="Ron"}
            local C = A:intersect(B)
            _:expect(C:card()).is(0)
            _:expect(C:isNull()).is(true)
        end)

Now about that double check for card and isNull. Let’s look at the code for those.

function XSet:isNull()
    return self==NULL or not self:exists(function(e,s) return true end)
end

NULL.card = function() return 0 end

function XSet:card()
    return self.data.card and self.data:card() or self:reduce(0, function(r,e,s) return r+1 end)
end

function XTuple:card()
    return #self.contents
end

function CSVSet:card()
    local pat = ".-\n"
    local rep = "X"
    local nls = self.data:gsub(pat,rep)
    return nls:len()
end

Note: We’re not close to finding a strawberry. What we’re doing is discovering infelicities in the test code and improving them. Maybe there’s a strawberry there, but fact is, I’m working set theory now, not tiny techniques that make good strawberries.

This doesn’t satisfy me. There are set types, the Tuple and CSVSet that implement card but not isNull. We’re not using either of those. Let’s look at intersect:

function XSet:intersect(anXSet)
    return self:select(function(e,s) return anXSet:hasAt(e,s) end)
end

That’s clearly right. Select the elements of the current set which are also in the other set. What does select do, just to check?

function XSet:select(predicate)
    return self:reduce(XSet(), function(r,e,s)
        return predicate(e,s) and r:addAt(e,s) or r
    end)
end

function XSet:reduce(first,f)
    local result = first
    for e,s in self:elements() do
        result = f(result, e,s)
    end
    return result
end

We’ve traced this all the way down, and I think two things are clear: 1) it’s correct and 2) it will apparently return an XSet with no contents, but that is not identical to NULL.

This leads me to digress. Is NULL even useful? Certainly the property is useful, but the special set? We have all this code:

XSet.NULL = XSet()
NULL = XSet.NULL
NULL.addAt = function() return NULL end
NULL.card = function() return 0 end
NULL.elements = function() return pairs({}) end
NULL.hasAt = function() return false end

NULL is a set that is empty and stays empty, because you can’t add anything to it. But what use, if any, does it have?

Ah, with the help of the code, I remember. The classical set

{a,b,c}

is represented in XST as

{a^NULL, b^NULL, c^NULL}

So we need it for that purpose, perhaps. I don’t think we’ve been very careful in that area and we don’t really care, in general, whether a set is classical or not. But I can imagine a test that we could write that might be interesting.

Note: Coming at this situation with fresh eyes, I think of a test that might not work. When that happens, the best thing we can do is to write that test absolutely right now.

        _:test("Classical Set with exponent not NULL", function()
            local A = XSet()
            A:addAt("foo",NULL)
            local N = XSet()
            local B = XSet()
            B:addAt("foo",N)
            _:expect(A==B).is(true)
        end)

This test fails. That’s why we wrote it: to become certain that we have a problem. The two sets are not equal. The empty set N that I created is making it fail. I think we actually have a bug here, but I’m not quite sure what it is.

Let’s double check equality:

function XSet:__eq(something)
    return self:isXSet(something) and self:isSubset(something) and something:isSubset(self)
end

So we may have an issue with subset.

function XSet:isSubset(other)
    -- return whether self is a subset of other
    -- for every element e,s of self
    --   is e element-s of other
    return self:every(function(e,s) 
        return other:hasAt(e,s) 
    end)
end

So it comes down to hasAt:

function XSet:hasAt(element, scope)
    if self.data.hasAt then
        return self.data:hasAt(element,scope)
    else
        return self:exists(function(e,s) return element==e and scope==s end)
    end
end

Let’s test a bit and find the issue:

        _:test("Null sets are equal", function()
            local N1 = XSet()
            local N2 = XSet()
            _:expect(N1==N2).is(true)
            _:expect(N1==NULL).is(true)
            _:expect(NULL==N1).is(true)
        end)

I kind of expected that to fail, but it didn’t. So let’s check isSubset.

        _:test("isSubset involving NULL", function()
            local A = XSet()
            A:addAt("foo",NULL)
            local N = XSet()
            local B = XSet()
            B:addAt("foo",N)
            _:expect(B:isSubset(A),"B ss A").is(true)
            _:expect(A:isSubset(B),"A ss B").is(true)
        end)

Both these expectations fail. So the problem is in isSubset. Reviewing:

function XSet:isSubset(other)
    -- return whether self is a subset of other
    -- for every element e,s of self
    --   is e element-s of other
    return self:every(function(e,s) 
        return other:hasAt(e,s) 
    end)
end

So clearly the hasAt must not be comparing correctly on the sets, and that’s really kind of a big deal.

I think I’ve been assuming for a long time that the scopes of elements can only be strings (or maybe numbers but there’s code here that seems to be converting numbers to strings). And I think hasAt assumes that it can just index:

function XSet:hasAt(element, scope)
    if self.data.hasAt then
        return self.data:hasAt(element,scope)
    else
        return self:exists(function(e,s) return element==e and scope==s end)
    end
end

function XGeneric:hasAt(element, scope)
    local tab = self.contents[scope]
    if not tab then return false end
    if not XSet:isXSet(element) then
        return tab[element] == true
    else
        for e,_true in pairs(tab) do
            if e==element then
                return true
            end
        end
        return false
    end
end

Right. We just index into contents using the scope. That’s not OK if scopes can be sets. This is trouble, big trouble.

Let’s first figure out what this is doing. Then we’ll add tests to make it do what we want.

Let’s review the XGeneric data structure.

The set {a^x, b^x,c^y} is represented in XGeneric contents as a table of tables:

contents = { x={a=true,b=true}, y={c=true} }

We index into contents with scope, to find the table at that scope, then index into that table at the element to see if we get true. If we do, the element is there. But if the element we’re checking for is a set, we have to check what we find in contents to see if it is equal:

Fetch the contents table indexed by scope (this is the bug). The contents table is the set of all elements at that scope. There can be more than one such element.
Return false if there is no such table.
If the element we’re checking for is not a set, return whether the table contains it explicitly, i.e. tab[element]==true.
Otherwise, check each member of the table to see whether it equals the set element. We use ==, so if all this worked it would recur and get the right answer. But it’s messy isn’t it? And it doesn’t work.

We have some failing tests. We could probably make them work by a simple patch to make all null sets be set to NULL during addition. That’s not good enough, but let’s try it to be sure we understand what’s going on.

function XGeneric:addAt(element, scope)
    assert(scope~=nil,"nil scope "..tostring(element))
    if self:hasAt(element,scope) then return end
    if not self.contents[scope] then
        self.contents[scope] = {}
    end
    self.contents[scope][element] = true
    return self
end

We patch thusly:

function XGeneric:addAt(element, scope)
    assert(scope~=nil,"nil scope "..tostring(element))
    if XSet:isXSet(scope) and scope:isNull() then scope = NULL end
    if self:hasAt(element,scope) then return end
    if not self.contents[scope] then
        self.contents[scope] = {}
    end
    self.contents[scope][element] = true
    return self
end

If the scope is a set and empty, we replace it with NULL. Then it’ll be found in the fetch. This tells me we’re on the right track, but it’s not really good enough.

If we’re going to allow sets as scopes, we can create the same problem by creating two sets that are equal but not identical and then use them as scopes. We can break subset that way as well.

Is that unclear? Let’s do it.

        _:test("Extended Sets with set scopes", function()
            local R1 = XSet:fromTable{name="Ron"}
            local R2 = XSet:fromTable{name="Ron"}
            local S = XSet()
            S:addAt("foo",R1)
            _:expect(S:hasAt("foo",R2)).is(true)
        end)

This test fails, despite R1 equaling R2. Let’s express that, by the way:

        _:test("Extended Sets with set scopes", function()
            local R1 = XSet:fromTable{name="Ron"}
            local R2 = XSet:fromTable{name="Ron"}
            _:expect(R1,"equality").is(R2)
            local S = XSet()
            S:addAt("foo",R1)
            _:expect(S:hasAt("foo",R2), "has").is(true)
        end)

The is passes and the has does not:

46: Extended Sets with set scopes has -- Actual: false, Expected: true

Now back to hasAt:

function XGeneric:hasAt(element, scope)
    local tab = self.contents[scope]
    if not tab then return false end
    if not XSet:isXSet(element) then
        return tab[element] == true
    else
        for e,_true in pairs(tab) do
            if e==element then
                return true
            end
        end
        return false
    end
end

This is a poorly factored function anyway, isn’t it? Let’s refactor just a bit:

function XGeneric:hasAt(element, scope)
    local tab = self:findContents(scope)
    if not tab then return false end
    if not XSet:isXSet(element) then
        return tab[element] == true
    else
        for e,_true in pairs(tab) do
            if e==element then
                return true
            end
        end
        return false
    end
end

function XGeneric:findContents(aScope)
    return self.contents[aScope]
end

We just extracted a method findContents. All the tests still run except for my broken one. Now let’s improve findContents:

function XGeneric:findContents(aScope)
    if not XSet:isXSet(aScope) then
        return self.contents[aScope]
    end
    for sc,el in pairs(self.contents) do
        if sc == aScope then return el end
    end
    return nil
end

Tests are all green. We’ve fixed the bug. Let’s refactor to make this code more nearly right.

function XGeneric:findContents(aScope)
    if not XSet:isXSet(aScope) then
        return self.contents[aScope]
    else
        return self:findLongWay(aScope)
    end
end

function XGeneric:findLongWay(aScope)
    for sc,el in pairs(self.contents) do
        if sc == aScope then return el end
    end
end

That’s more expressive. Let’s look at the top level method as well:

function XGeneric:hasAt(element, scope)
    local tab = self:findContents(scope)
    if not tab then return false end
    if not XSet:isXSet(element) then
        return tab[element] == true
    else
        for e,_true in pairs(tab) do
            if e==element then
                return true
            end
        end
        return false
    end
end

That last bit needs extracting:

function XGeneric:hasAt(element, scope)
    local tab = self:findContents(scope)
    if not tab then return false end
    if not XSet:isXSet(element) then
        return tab[element] == true
    else
        return self:searchTabForElement(tab,element)
    end
end

function XGeneric:searchTabForElement(tab,element)
    for e,_true in pairs(tab) do
        if e==element then
            return true
        end
    end
    return false
end

Given that tab is a raw Lua table, that’s close to the best we can do. Now as I look at this code, it’s pretty complicated, and I think it could fail. That said, we have a lot of tests that rely on hasAt working, so we can be indirectly confident that it works. Unfortunately, we just discovered cases where it didn’t work, so we should probably consider whether we can do better.

However, we’re green, so let’s get a long-overdue commit in place: Improve tests for intersect. Test and fix use of sets as scopes.

Heads Up!

It’s time to lift my head up from the muck and think about where we are and what we might do.

We set out to make a couple of tests more clear and followed our nose to a deep bug relating to sets as scopes. (I still think I have never really fully committed to that idea, but we had implemented enough of the idea to have a bug in it.

We did a good thing, and wrote small tests to track down and isolate the problem, then fixed the problem and saw the tests run. So I am pretty confident that we’ve improved the system.

But this code isn’t obviously correct, I suggest:

function XGeneric:hasAt(element, scope)
    local tab = self:findContents(scope)
    if not tab then return false end
    if not XSet:isXSet(element) then
        return tab[element] == true
    else
        return self:searchTabForElement(tab,element)
    end
end

function XGeneric:searchTabForElement(tab,element)
    for e,_true in pairs(tab) do
        if e==element then
            return true
        end
    end
    return false
end

function XGeneric:findContents(aScope)
    if not XSet:isXSet(aScope) then
        return self.contents[aScope]
    else
        return self:findLongWay(aScope)
    end
end

function XGeneric:findLongWay(aScope)
    for sc,el in pairs(self.contents) do
        if sc == aScope then return el end
    end
end

Let’s rename that search to tableHas and see if that’s better:

function XGeneric:hasAt(element, scope)
    local tab = self:findContents(scope)
    if not tab then return false end
    if not XSet:isXSet(element) then
        return tab[element] == true
    else
        return self:tableHas(tab,element)
    end
end

function XGeneric:tableHas(tab,element)
    for e,_true in pairs(tab) do
        if e==element then
            return true
        end
    end
    return false
end

Still too much going on at the beginning. In looking to improve that, I notice this:

function XGeneric:findLongWay(aScope)
    for sc,el in pairs(self.contents) do
        if sc == aScope then return el end
    end
end

That should be returning nil, and it does but sort of accidentally. But let’s be more clear in there as well. We are searching the contents, so el is always a table. Let’s make that clear, and return a nil.

function XGeneric:findLongWay(aScope)
    for sc,tab in pairs(self.contents) do
        if sc == aScope then return tab end
    end
    return nil
end

Now look again at the top:

function XGeneric:hasAt(element, scope)
    local tab = self:findContents(scope)
    if not tab then return false end
    if not XSet:isXSet(element) then
        return tab[element] == true
    else
        return self:tableHas(tab,element)
    end
end

What if we were to return an empty table instead of nil, and not check tab?

Wouldn’t we drop right through finding nothing? The check for tab is an optimization, not a requirement. Let’s try.

function XGeneric:hasAt(element, scope)
    local tab = self:findContents(scope)
    if not XSet:isXSet(element) then
        return tab[element] == true
    else
        return self:tableHas(tab,element)
    end
end

function XGeneric:tableHas(tab,element)
    for e,_true in pairs(tab) do
        if e==element then
            return true
        end
    end
    return false
end

function XGeneric:findContents(aScope)
    if not XSet:isXSet(aScope) then
        return self.contents[aScope] or {} -- <--
    else
        return self:findLongWay(aScope)
    end
end

function XGeneric:findLongWay(aScope)
    for sc,tab in pairs(self.contents) do
        if sc == aScope then return tab end
    end
    return {} -- <--
end

We needed to return a table from both the find methods, thus the or in findContents. Tests are green. Still too many ifs to make me happy.

Is that first if in hasAt just an optimization now? Can we remove the if?

function XGeneric:hasAt(element, scope)
    local tab = self:findContents(scope)
    return self:tableHas(tab,element)
end

Tests pass. Can’t we do the other search the long way as well?

function XGeneric:hasAt(element, scope)
    local tab = self:findContents(scope)
    return self:tableHas(tab,element)
end

function XGeneric:tableHas(tab,element)
    for e,_true in pairs(tab) do
        if e==element then
            return true
        end
    end
    return false
end

function XGeneric:findContents(aScope)
        return self:findLongWay(aScope)
end

function XGeneric:findLongWay(aScope)
    for sc,tab in pairs(self.contents) do
        if sc == aScope then return tab end
    end
    return {}
end

Tests pass. Consolidate upward.

function XGeneric:hasAt(element, scope)
    local tab = self:findContents(scope)
    return self:tableHas(tab,element)
end

function XGeneric:tableHas(tab,element)
    for e,_true in pairs(tab) do
        if e==element then
            return true
        end
    end
    return false
end

function XGeneric:findContents(aScope)
    for sc,tab in pairs(self.contents) do
        if sc == aScope then return tab end
    end
    return {}
end

The findContents name isn’t good. It’s really findContentsAtScope or:

function XGeneric:hasAt(element, scope)
    local tab = self:findContentsAt(scope)
    return self:tableHas(tab,element)
end

function XGeneric:findContentsAt(aScope)
    for sc,tab in pairs(self.contents) do
        if sc == aScope then return tab end
    end
    return {}
end

function XGeneric:tableHas(tab,element)
    for e,_true in pairs(tab) do
        if e==element then
            return true
        end
    end
    return false
end

Meh. Better but not great. How about this:

function XGeneric:hasAt(element, scope)
    local elements = self:elementsAtScope(scope)
    return self:elementsInclude(elements,element)
end

function XGeneric:elementsAtScope(aScope)
    for sc,tab in pairs(self.contents) do
        if sc == aScope then return tab end
    end
    return {}
end

function XGeneric:elementsInclude(tab,element)
    for e,_true in pairs(tab) do
        if e==element then
            return true
        end
    end
    return false
end

That’s better. This will be better yet:

function XGeneric:elementsAtScope(aScope)
    for scope,elementsTable in pairs(self.contents) do
        if scope == aScope then return elementsTable end
    end
    return {}
end

What I’d really like to be able to do is defer these issues to the table. But let’s commit this and then think about that. Commit: refining XGeneric:hasAt.

Feature Envy

We’ve clearly got some feature envy going on here. Instead of table helping us, we’re writing code to look at the elements. Now in our own XSets, we have select, every and so on. It would be nice to have those on table but Lua tables aren’t that smart.

We could fix that. But I think that is a bit larger undertaking than what we should do now.

We’ll save that for another day, perhaps tomorrow.

Let’s try to figure out what happened today.

Summary

I was initially thinking that tests are hard to write in the XSet2 program, and that we might learn something about the code by looking at what’s hard to do. Instead, I discovered some tests that could be written more simply now than they initially were.

The message there is a mixed one, I think. The initial tests were tentative so I checked a lot of details as I went along, to be sure that the inputs had the right elements and so on. But the system is stronger now, so I wrote a simpler test, which seemed to cover the older more complicated ones, so I permitted myself to remove the old ones.

The result there is that the tests for that feature, intersect, are simpler and easier to understand. That’s a win for readability, and I didn’t think the older tests showed any useful history.

That got me looking around, and I uncovered a slightly subtle defect, that a set with NULL scopes would not show as equal to the same set with other empty sets not identical to NULL, which can in fact exist.

That led to a fix in the generic hasAt, followed by some refactoring. I pushed the refactoring a few times, resulting in three simpe methods where I had four or five more complex ones. The new methods do not take shortcuts that the old ones took. In particular, they’ll search a table even when it could be directly indexed. That will definitely be slower, though I can’t detect it in my tests. I prefer the more expressive code.

And in that “final” code, we detect feature envy, specifically that tables aren’t helping us out with methods like every, map and reduce. That seems like a bigger bite than I want to take today, or at least bigger than I want to write up. I actually did spike a lot of the table functions, and we’ll explore that tomorrow.

So, odd morning. I set out to find a nice simple case of a test telling me the code needed a small but tasty improvement, and instead … what I just described.

Strawberries are hard to find. Ways to improve the code, not so much, they’re all around us.

Huh. See you next time!