XST 7: Design

How do you design a thing like this, Brain? Same as everything else? Or not?

Pretty much every decent design has layers, I figure. User interface, calculation, file access. It’s not just an onion kind of layers, but there are isolated bits of capability that are essentially subordinate to others. When we mix different kinds of things together, it seems to work for a while and then it begins to get messy and troublesome, until, too often, we can’t make progress through the jungle that we ourselves have built.

So we create abstractions. Files are just a cover on disk blocks. Objects are just a cover on tables and naming conventions, and so on. We do best, I reckon, when we use abstractions that have been worked out by our betters. I could write a file system, indeed I have done. But at the end of the day my time is probably better spent providing something that my customers can appreciate. Unless my customers are people who buy file systems.

It is a fair question to ask what I think I’m doing in this series. Surely I’m not proposing to write an information management language in Lua for the iPad. Even I couldn’t be so naive as to think that would be a best-selling money-maker. So what is the point?

Avoid Reality: Part of the point of these articles, frankly, is to avoid reality. I can only take so much reality, and there’s a lot of it today, counting the death of the planet, the deathly illness of democracy, the deathly illness, and, come to think of it, I’m pretty darn old.
Change Reality: A small part of what I do here is to attempt to change reality as I see it. My goals there are small. Mostly I just want to change the number of developers and projects that I see, that seem to me to be in trouble for socio-technical reasons. Mostly here, I write technical, but sometimes socio. I hope to provide ideas that people will find interesting enough that they’ll think about them, and then do something that makes their lives a bit better.
Exercise Brain: I find programming to be an enjoyable brain exercise. I just like it. I like setting problems that are just a bit larger than a day or a week, and working with them. I feel that it helps to keep me mentally sharp.
Work the Clay: Code is my clay. I shape it, move it around, work it until it takes on a shape that appeals to me. As I do that, I think about what shapes appeal to me, why they appeal, and how to get to a more appealing shape without tossing out the whole blob and starting over.
Bête Noire: I became fascinated with extended set theory back last century, and have tried working with it multiple times. It combines math and programming in a way that appeals to me. And, for purposes of these articles, it brings up lots of seemingly small concerns that all have to fit together smoothly.

OK, so I have reasons, or at least a rationale. Where might we go with this thing? I’m honestly not sure, though I suppose as long as we include “might” in the sentence, we might:

Show how a single small set of operations can support more than one internal data structure.
Move from a set of operations that our program can call to some kind of expression-building app that lets us query or otherwise process information. A little user-focused database language, perhaps.
Have separate set-theoretic layers, information and storage, each providing set-theoretic benefits.

Then there are the things that I never set out to do, but the inevitable learnings that come from not having enough tests, or the right tests, so that strange defects arise, confusing and embarrassing me.

I think Dave Childs might argue that set theory can provide a more reliable implementation, because the operations are formally defined, and with a few primitives that just work, everything else is provably correct. That’s as may be, but I can tell you as a programmer that there’s a big difference between theory and practice. I do think well-understood operations help, but we see in our code so far that there are places for mistakes to be made.

Maybe a second set-theoretic storage layer would help with that. Maybe we’ll find out.

Let’s get to it.

Today …

Today we’ll start by reviewing some code, to see whether it seems robust enough and factored enough. I have some concerns.

Last week, I created the set iterator, elements(), which lets us run a block of code over every element of a set. And we have a few bits of code that do that:

function XSet:exists(booleanFunction)
    for element,scope in self:elements() do
        if booleanFunction(element, scope) then
            return true
        end
    end
    return false
end

This function applies a boolean function to every element of a set until the function returns true (telling us that there exists an element where the function is true), or until there are none left, in which case it returns false.

We also have the function iterate:

function XSet:iterate(f)
    for element, scope in self:elements() do
        f(element, scope)
    end
end

This might be better named apply. It applies a function to every element of the set.

We use iterate in union, I believe:

function XSet:union(anXSet)
    local result = XSet()
    local add = function(element, scope) result:addAt(element,scope) end
    self:iterate(add)
    anXSet:iterate(add)
    return result:lock()
end

You may recall that we modified addAt so that it does not add duplicates to a set, at some expense.

We also have restrict, which troubles me somehow:

function XSet:restrict(B)
    -- return all elements (a) of self such that
    --  there exists a record (b) in B such that
    --      b:intersect(a) equals b 
    local result = XSet()
    local function doesBexist(a,s)
        if B:doesIntersectMatchExist(a,s) then
            result:addAt(a,s)
        end
    end
    self:iterate(doesBexist)
    return result
end

function XSet:doesIntersectMatchExist(a)
    -- return true if some element b of this set 
    --    has an intersect match with a 
    --    i.e. b:intersect(a) == b
    local function checkIntersects(b,s)
        return b:intersect(a):equals(b)
    end
    return self:exists(checkIntersects)
end

Somehow this seems to me to be too many levels. Too many notes. The comment at the top of restrict seems simpler than the code, and the code is not an obvious translation of the comment.

Frequent readers may recall that I’ve tried at least three or four times to write restrict, just trying to make it work. Perhaps more. That could be a sign that I’m not very bright, but I choose to interpret it as a sign that my objects and methods aren’t helping me as much as they could. Restrict is actually a pretty simple operation, and it should be possible to express it more simply.

Hey. We’ve been saying

a:intersect(b):equals(b)

Is that the same as b:isSubset(a)? I think it is. If it is, does it help?

Nota Bene

We see here an issue with set theory: we see what may be a different way of expressing the same question. To be certain that it is equivalent, we need to do some math. We need to prove a theorem, which most of us here at my house last did in the early 1960’s.

I do have some tests, so I can at least try this:

function XSet:doesIntersectMatchExist(a)
    -- return true if some element b of this set 
    --    has an intersect match with a 
    --    i.e. b:intersect(a) == b
    local function checkIntersects(b,s)
        -- return b:intersect(a):equals(b)
        return b:isSubset(a)
    end
    return self:exists(checkIntersects)
end

The tests run. This is a good sign, and I also tried scribbling out a partial proof, which at least didn’t convince me I was wrong.

Let’s do some improving:

function XSet:doesIntersectMatchExist(a)
    -- return true if some element b of this set 
    --    has an intersect match with a 
    --    i.e. b:intersect(a) == b
    --    i.e. b:isSubset(a)
    return self:exists(function(b,s) return b:isSubset(a) end)
end

Let’s rename the method. What is a good word of phrase for “has an element that is a subset of this set”?

function XSet:hasSubsetElement(a)
    -- return true if some element b of this set 
    --    is a subset of a
    --    i.e. b:isSubset(a)
    return self:exists(function(b,s) return b:isSubset(a) end)
end

function XSet:restrict(B)
    -- return all elements (a) of self such that
    --  there exists a record (b) in B such that
    --      b:intersect(a) equals b 
    local result = XSet()
    local function doesBexist(a,s)
        if B:hasSubsetElement(a,s) then
            result:addAt(a,s)
        end
    end
    self:iterate(doesBexist)
    return result
end

Test: All run. Commit: simplify restrict using hasSubsetElement.

Let’s rename that internal function:

function XSet:restrict(B)
    -- return all elements (a) of self such that
    --  there exists a record (b) in B such that
    --      b:intersect(a) equals b 
    local result = XSet()
    local function copyIfBMatches(a,s)
        if B:hasSubsetElement(a,s) then
            result:addAt(a,s)
        end
    end
    self:iterate(copyIfBMatches)
    return result
end

Test, commit: rename internal function

I wonder if I’d like that better with the function inline:

function XSet:restrict(B)
    -- return all elements (a) of self such that
    --  there exists a record (b) in B such that
    --      b:subset(a)
    local result = XSet()
    self:iterate(function(a,s)
        if B:hasSubsetLement(a,s) then
            result:addAt(a,s)
        end
    end)
    return result
end

Spelling matters: hasSubsetElement. Test commit: inline selection function in restrict.

But iterate just loops over the records. We can rewrite:

function XSet:restrict(B)
    -- return all elements (a) of self such that
    --  there exists a record (b) in B such that
    --      b:subset(a)
    local result = XSet()
    for a,s in self:elements() do
        if B:hasSubsetElement(a,s) then
            result:addAt(a,s)
        end
    end
    return result
end

I think that’s better. The iterate just made things more opaque, I think. Test, commit: drop use of iterate in restrict.

Let’s see what other uses of iterate we have. This seems like it would always make things better.

function XSet:intersect(anXSet)
    local result = XSet()
    local cont = anXSet.contents
    local added = false
    anXSet:iterate(function(element,scope)
        if self:hasAt(element,scope) then
            result:addAt(element, scope)
            added = true
        end
    end)
    return added and result:lock() or XSet.NULL
end

We can do just the same here:

function XSet:intersect(anXSet)
    local result = XSet()
    local cont = anXSet.contents
    local added = false
    for element,scope in anXSet:elements() do
        if self:hasAt(element,scope) then
            result:addAt(element, scope)
            added = true
        end
    end
    return added and result:lock() or XSet.NULL
end

This works. Commit: inline iterate in intersect.

Looking at the method above I notice that it does rather an odd thing: instead of iterating itself, it iterates the other set. That is surely correct, but why not do the more obvious thing:

function XSet:intersect(anXSet)
    local result = XSet()
    local cont = anXSet.contents
    local added = false
    for element,scope in self:elements() do
        if anXSet:hasAt(element,scope) then
            result:addAt(element, scope)
            added = true
        end
    end
    return added and result:lock() or XSet.NULL
end

Tests run. Commit: simplify intersect.

We were looking for calls to iterate.

There’s this debug one:

function XSet:display(message)
    if true then return end
    result = message or ""
    local function d(e,s)
        result = result.." "..e.."@"..s..", "
    end
    self:iterate(d)
end

Might as well change it. Curiously, I note that it doesn’t actually do anything. I think it’s suppose to print, if we comment out the if.

function XSet:display(message)
    if true then return end
    result = message or ""
    for e,s in self:elements() do
        result = result.." "..e.."@"..s..", "
    end
    print(result)
end

The idea was that I could sprinkle display around and then turn it off. Anyway, test and commit: display uses elements not iterate.

The remaining use of iterate is here:

function XSet:union(anXSet)
    local result = XSet()
    local add = function(element, scope) result:addAt(element,scope) end
    self:iterate(add)
    anXSet:iterate(add)
    return result:lock()
end

We could write this out longhand, and if we do, we’ll have even more duplicated code. We see that both calls to iterate simply copy all the elements of a set into the result set. Let’s code that:

function XSet:union(anXSet)
    local result = XSet()
    self:copyInto(result)
    anXSet:copyInto(result)
    return result:lock()
end

function XSet:copyInto(anXSet)
    for e,s in self:elements() do
        anXSet:addAt(e,s)
    end
end

Tests run. Commit: recode union to use copyInto.

Let’s step back and think about what we’ve just done.

Lookin Inta Histry Back

We have changed several instances of code like this:

local function doSomething(e,s)
    blah()
    mumble()
end
self:iterate(doSmething)

Into code like this:

for e,s in self:elements() do
    blah()
    mumble()
end

The code is uniformly shorter (by one line) and is less obscure, since Codea’s lambda equivalent is pretty wordy.

However, doing an explicit for loop rather than iterate seems a bit less abstract. Certainly it relies on the fact that we can loop over the elements of a set with a for loop, where iterate allowed us to hide that mechanism and let it be anything we want.

I think we gain a bit here, in that the code is less obscure, and lose a little, in that it is a bit more tired to the particular implementation. That said, it seems to me that many if not most set operations do “loop” over their elements.

Still, I’m not entirely comfortable. We have removed a few lines of code and while the result may be less “abstract”, I think it’s at least equally clear. We can’t quite remove iterate yet, because it’s used in the tests. We can fix that. Let’s do.

Oh, I found an actual use:

function XSet:card()
    local card = 0
    self:iterate(function(e,s) card = card+1 end)
    return card
end

function XSet:card()
    local card = 0
    for e,s in self:elements() do
        card = card+1
    end
    return card
end

Silly me, I shouldn’t have jumped to conclusions about where the uses were:

function XSet:isSubset(other)
    local flag = true
    local function find(e,s)
        if other:hasntAt(e,s) then
            flag = false
        end
    end
    self:iterate(find)
    return flag
end

function XSet:isSubset(other)
    local flag = true
    for e,s in self:elements() do
        if other:hasntAt(e,s) then
            flag = false
        end
    end
    return flag
end

Test. Commit: removed all uses of iterate and the function itself.

That last operation, however, offers an opportunity, I think. First a comment:

function XSet:isSubset(other)
    -- return whether self is a subset of other
    -- for every element e,s of self
    --   is e element-s of other
    local flag = true
    for e,s in self:elements() do
        if other:hasntAt(e,s) then
            flag = false
        end
    end
    return flag
end

Then, I notice we can use the little-used break statement here:

function XSet:isSubset(other)
    -- return whether self is a subset of other
    -- for every element e,s of self
    --   is e element-s of other
    local flag = true
    for e,s in self:elements() do
        if other:hasntAt(e,s) then
            flag = false
            break
        end
    end
    return flag
end

Can we do the same in exists? Ah, in essence we do:

function XSet:exists(booleanFunction)
    for element,scope in self:elements() do
        if booleanFunction(element, scope) then
            return true
        end
    end
    return false
end

Let’s write every:

function XSet:every(booleanFunction)
    for element,scope in self:elements() do
        if not booleanFunction(element,scope) then
            return false
        end
    end
    return true
end

Sorry, I got on a roll there and wrote that without a test. But I do have one in mind, right here:

function XSet:isSubset(other)
    -- return whether self is a subset of other
    -- for every element e,s of self
    --   is e element-s of other
    return self:every(function(e,s) return other:hasAt(e,s) end)
end

Tests run. Commit: implement every, use in isSubset.

Retro again.

Retro

OK, I like every and exists, and I don’t like iterate. They seem so close to the same yet I like one and not the other. Let’s try a comparison. Put iterate back:

function XSet:iterate(f)
    for e,s in self:elements() do
        f(e,s)
    end
end

We have this:

function XSet:intersect(anXSet)
    local result = XSet()
    local cont = anXSet.contents
    local added = false
    for element,scope in self:elements() do
        if anXSet:hasAt(element,scope) then
            result:addAt(element, scope)
            added = true
        end
    end
    return added and result:lock() or XSet.NULL
end

We could write it this way:

function XSet:intersect(anXSet)
    local result = XSet()
    local cont = anXSet.contents
    local added = false
    self:iterate(function(element,scope)
        if anXSet:hasAt(element,scope) then
            result:addAt(element, scope)
            added = true
        end
    end)
    return added and result:lock() or XSet.NULL
end

What if we renamed iterate:

function XSet:elementsDo(f)
    for e,s in self:elements() do
        f(e,s)
    end
end

function XSet:intersect(anXSet)
    local result = XSet()
    local cont = anXSet.contents
    local added = false
    self:elementsDo(function(element,scope)
        if anXSet:hasAt(element,scope) then
            result:addAt(element, scope)
            added = true
        end
    end)
    return added and result:lock() or XSet.NULL
end

Is that better than the for loop, or not? Let’s try the shorter names e and s, which we conventionally use and which are therefore “OK”.

function XSet:intersect(anXSet)
    local result = XSet()
    local added = false
    self:elementsDo(function(e,s)
        if anXSet:hasAt(e,s) then
            result:addAt(e, s)
            added = true
        end
    end)
    return added and result:lock() or XSet.NULL
end

I removed that local cont thing, which was unused. An appendix perhaps.

It all comes down to a closing paren on the end and the difference between these two lines:

self:elementsDo(function(e,s)

and

for e,s, in self:elements() do

Honestly, it’s so close in my mind. I kind of prefer the elementsDo form, but I am sure that if I had any readers, some of them would prefer the for. I hope they tweet me their opinions.

I think we are coming close to another set of abstractions, though.

Look at intersect again:

function XSet:intersect(anXSet)
    local result = XSet()
    local added = false
    self:elementsDo(function(e,s)
        if anXSet:hasAt(e,s) then
            result:addAt(e, s)
            added = true
        end
    end)
    return added and result:lock() or XSet.NULL
end

We recently wrote this function:

function XSet:copyInto(anXSet)
    for e,s in self:elements() do
        anXSet:addAt(e,s)
    end
end

What if we had this:

function XSet:intersect(anXSet)
    local result = XSet()
    local added = self:copyIntoWhere(result, function(e,s)
        return anXSet:hasAt(e,s)
    end)
    return added and result:lock() or XSet.NULL
end

function XSet:copyIntoWhere(result, booleanFunction)
    local added = false
    self:elementsDo(function(e,s)
        if(booleanFunction(e,s)) then
            result:addAt(e,s)
            added = true
        end
    end)
    return added
end

The new copyIntoWhere copies all the elements where the supplied boolean function is true, and returns true if any were copied. It’s only a tiny bit tricky and look how much it simplified intersect.

Well, I say simplified. We can perhaps do better.

Let’s try this:

function XSet:intersect(anXSet)
    return self:elementsSuchThat(function(e,s)
        return anXSet:hasAt(e,s)
    end)
end

function XSet:elementsSuchThat(booleanFunction)
    local result = XSet()
    local added = self:copyIntoSuchThat(result, booleanFunction)
    return added and result:lock() or XSet.NULL
end

function XSet:copyIntoSuchThat(result, booleanFunction)
    local added = false
    self:elementsDo(function(e,s)
        if(booleanFunction(e,s)) then
            result:addAt(e,s)
            added = true
        end
    end)
    return added
end

I like that definition of intersect. The other two functions are rather internal, so they won’t be things that users have to think about. Tests run. Commit: implement elementsSuchThat and copyIntoSuchThat and use in intersect.

Now you may be asking yourself, “Self, it seems like he’s kind of circling here. Why doesn’t he make up his mind?”

And the answer is:

It is not by mind alone that I set my ideas in motion. It is by the code itself that the mind acquires understanding, the code acquires clarity, the tests become a warning. It is by mind and code together that I set my ideas in motion.

Or, well, maybe “Because the code tells me things that my marination of the code does not.”

Let’s sum up.

Summary

Again, the big lesson is that even the plans of three hours ago are just a vague picture of what we actually do when we get to the code face. The code tells us what we have done, and what we might do. We work the code. We don’t just dig out more code. We’re not trying to make a pile of code, we’re trying to make a sensible structure of code.

The smaller lesson, or discovery, is that we are getting closer to methods that seem consistent with the language of set theory, and since that’s our intended design model, that’s a good thing.

We have a method exists that tells us whether any element has a given property. We have every that tells us whether all elements have a property. We have elementsDo that applies a function to all the elements of a set, and we have for e,s in elements do, which does the same with a “lower level” statement.

I think that elementsDo is more general and certainly it is more abstract. It doesn’t require that we have the ability to iterate a set with for. That said, we do have that ability, and it’s kind of useful.

I remain uncertain whether one of these is better than the other and we should converge on it, or whether both should exist.

I think the recent learning of how to write an iterator using coroutines means that we can always make the for syntax work, and of course if it works, then iterate works. It’ll come down to which is more comfortable.

We might well come down to having some standard functions and predicates to use everywhere, rather than making them quite so local. I’m inclined to push that way, to see how mathematical we can make the code look.

Normally, that would be very questionable, but here, we are actually trying to implement mathematics, so it makes more sense than usual to accept the math form.

That said, we might consider this:

function XSet:elementsSuchThat(predicate)
    local result = XSet()
    local added = self:copyIntoSuchThat(result, predicate)
    return added and result:lock() or XSet.NULL
end

function XSet:copyIntoSuchThat(result, predicate)
    local added = false
    self:elementsDo(function(e,s)
        if(predicate(e,s)) then
            result:addAt(e,s)
            added = true
        end
    end)
    return added
end

They are called predicates in math.

I note that I’m moving closer and closer to single-letter variables. This is common in math, of course, and that’s why I’m doing it, mostly unconsciously. This flies in the face of our usual advice about meaningful names, but one wants to argue that in math, we conventionally pick single letters and use them consistently. Here we’ve got e for element and s for scope.

We’ll see if we mind it. I myself rather like it.

Overall

Overall, the code is getting smaller and tighter. We’re seeing capabilities coming out of the fog that do things we find generally useful. We’re not just imagining this product on paper or in our head (though I am thinking about it a lot). We’re pushing the pieces around to see how they best fit together.

Would we have the luxury of doing this on a “real” project? Perhaps not. Perhaps to the detriment of the project, or the detriment of the people working on it. (I did get um encouraged to move on, when I protected the set theory project longer than the executives were willing to wait. That said, I protected it long enough to let the team get it working. Maybe that was a good thing.)

Here, I’d really like to discover the best way I can to implement the set operations, to get a sense of how useful a real set theory project might be. Why? We’ve been over that.

Because I want to. And for you? There are lessons in here about your own work, every day. Some, I’ll draw out. Some, you just have to watch me and pick them up.

Tweets more than welcome. Email too, if you wish. Or even the Agile Mentoring slack.

I hope to see you around!