XST 7: Design
How do you design a thing like this, Brain? Same as everything else? Or not?
Pretty much every decent design has layers, I figure. User interface, calculation, file access. It’s not just an onion kind of layers, but there are isolated bits of capability that are essentially subordinate to others. When we mix different kinds of things together, it seems to work for a while and then it begins to get messy and troublesome, until, too often, we can’t make progress through the jungle that we ourselves have built.
So we create abstractions. Files are just a cover on disk blocks. Objects are just a cover on tables and naming conventions, and so on. We do best, I reckon, when we use abstractions that have been worked out by our betters. I could write a file system, indeed I have done. But at the end of the day my time is probably better spent providing something that my customers can appreciate. Unless my customers are people who buy file systems.
It is a fair question to ask what I think I’m doing in this series. Surely I’m not proposing to write an information management language in Lua for the iPad. Even I couldn’t be so naive as to think that would be a best-selling money-maker. So what is the point?
- Avoid Reality
- Part of the point of these articles, frankly, is to avoid reality. I can only take so much reality, and there’s a lot of it today, counting the death of the planet, the deathly illness of democracy, the deathly illness, and, come to think of it, I’m pretty darn old.
- Change Reality
- A small part of what I do here is to attempt to change reality as I see it. My goals there are small. Mostly I just want to change the number of developers and projects that I see, that seem to me to be in trouble for socio-technical reasons. Mostly here, I write technical, but sometimes socio. I hope to provide ideas that people will find interesting enough that they’ll think about them, and then do something that makes their lives a bit better.
- Exercise Brain
- I find programming to be an enjoyable brain exercise. I just like it. I like setting problems that are just a bit larger than a day or a week, and working with them. I feel that it helps to keep me mentally sharp.
- Work the Clay
- Code is my clay. I shape it, move it around, work it until it takes on a shape that appeals to me. As I do that, I think about what shapes appeal to me, why they appeal, and how to get to a more appealing shape without tossing out the whole blob and starting over.
- Bête Noire
- I became fascinated with extended set theory back last century, and have tried working with it multiple times. It combines math and programming in a way that appeals to me. And, for purposes of these articles, it brings up lots of seemingly small concerns that all have to fit together smoothly.
OK, so I have reasons, or at least a rationale. Where might we go with this thing? I’m honestly not sure, though I suppose as long as we include “might” in the sentence, we might:
- Show how a single small set of operations can support more than one internal data structure.
- Move from a set of operations that our program can call to some kind of expression-building app that lets us query or otherwise process information. A little user-focused database language, perhaps.
- Have separate set-theoretic layers, information and storage, each providing set-theoretic benefits.
Then there are the things that I never set out to do, but the inevitable learnings that come from not having enough tests, or the right tests, so that strange defects arise, confusing and embarrassing me.
I think Dave Childs might argue that set theory can provide a more reliable implementation, because the operations are formally defined, and with a few primitives that just work, everything else is provably correct. That’s as may be, but I can tell you as a programmer that there’s a big difference between theory and practice. I do think well-understood operations help, but we see in our code so far that there are places for mistakes to be made.
Maybe a second set-theoretic storage layer would help with that. Maybe we’ll find out.
Let’s get to it.
Today …
Today we’ll start by reviewing some code, to see whether it seems robust enough and factored enough. I have some concerns.
Last week, I created the set iterator, elements()
, which lets us run a block of code over every element of a set. And we have a few bits of code that do that:
function XSet:exists(booleanFunction)
for element,scope in self:elements() do
if booleanFunction(element, scope) then
return true
end
end
return false
end
This function applies a boolean function to every element of a set until the function returns true (telling us that there exists an element where the function is true), or until there are none left, in which case it returns false.
We also have the function iterate
:
function XSet:iterate(f)
for element, scope in self:elements() do
f(element, scope)
end
end
This might be better named apply
. It applies a function to every element of the set.
We use iterate
in union
, I believe:
function XSet:union(anXSet)
local result = XSet()
local add = function(element, scope) result:addAt(element,scope) end
self:iterate(add)
anXSet:iterate(add)
return result:lock()
end
You may recall that we modified addAt
so that it does not add duplicates to a set, at some expense.
We also have restrict
, which troubles me somehow:
function XSet:restrict(B)
-- return all elements (a) of self such that
-- there exists a record (b) in B such that
-- b:intersect(a) equals b
local result = XSet()
local function doesBexist(a,s)
if B:doesIntersectMatchExist(a,s) then
result:addAt(a,s)
end
end
self:iterate(doesBexist)
return result
end
function XSet:doesIntersectMatchExist(a)
-- return true if some element b of this set
-- has an intersect match with a
-- i.e. b:intersect(a) == b
local function checkIntersects(b,s)
return b:intersect(a):equals(b)
end
return self:exists(checkIntersects)
end
Somehow this seems to me to be too many levels. Too many notes. The comment at the top of restrict
seems simpler than the code, and the code is not an obvious translation of the comment.
Frequent readers may recall that I’ve tried at least three or four times to write restrict
, just trying to make it work. Perhaps more. That could be a sign that I’m not very bright, but I choose to interpret it as a sign that my objects and methods aren’t helping me as much as they could. Restrict is actually a pretty simple operation, and it should be possible to express it more simply.
Hey. We’ve been saying
a:intersect(b):equals(b)
Is that the same as b:isSubset(a)
? I think it is. If it is, does it help?
Nota Bene
We see here an issue with set theory: we see what may be a different way of expressing the same question. To be certain that it is equivalent, we need to do some math. We need to prove a theorem, which most of us here at my house last did in the early 1960’s.
I do have some tests, so I can at least try this:
function XSet:doesIntersectMatchExist(a)
-- return true if some element b of this set
-- has an intersect match with a
-- i.e. b:intersect(a) == b
local function checkIntersects(b,s)
-- return b:intersect(a):equals(b)
return b:isSubset(a)
end
return self:exists(checkIntersects)
end
The tests run. This is a good sign, and I also tried scribbling out a partial proof, which at least didn’t convince me I was wrong.
Let’s do some improving:
function XSet:doesIntersectMatchExist(a)
-- return true if some element b of this set
-- has an intersect match with a
-- i.e. b:intersect(a) == b
-- i.e. b:isSubset(a)
return self:exists(function(b,s) return b:isSubset(a) end)
end
Let’s rename the method. What is a good word of phrase for “has an element that is a subset of this set”?
function XSet:hasSubsetElement(a)
-- return true if some element b of this set
-- is a subset of a
-- i.e. b:isSubset(a)
return self:exists(function(b,s) return b:isSubset(a) end)
end
function XSet:restrict(B)
-- return all elements (a) of self such that
-- there exists a record (b) in B such that
-- b:intersect(a) equals b
local result = XSet()
local function doesBexist(a,s)
if B:hasSubsetElement(a,s) then
result:addAt(a,s)
end
end
self:iterate(doesBexist)
return result
end
Test: All run. Commit: simplify restrict using hasSubsetElement
.
Let’s rename that internal function:
function XSet:restrict(B)
-- return all elements (a) of self such that
-- there exists a record (b) in B such that
-- b:intersect(a) equals b
local result = XSet()
local function copyIfBMatches(a,s)
if B:hasSubsetElement(a,s) then
result:addAt(a,s)
end
end
self:iterate(copyIfBMatches)
return result
end
Test, commit: rename internal function
I wonder if I’d like that better with the function inline:
function XSet:restrict(B)
-- return all elements (a) of self such that
-- there exists a record (b) in B such that
-- b:subset(a)
local result = XSet()
self:iterate(function(a,s)
if B:hasSubsetLement(a,s) then
result:addAt(a,s)
end
end)
return result
end
Spelling matters: hasSubsetElement
. Test commit: inline selection function in restrict
.
But iterate
just loops over the records. We can rewrite:
function XSet:restrict(B)
-- return all elements (a) of self such that
-- there exists a record (b) in B such that
-- b:subset(a)
local result = XSet()
for a,s in self:elements() do
if B:hasSubsetElement(a,s) then
result:addAt(a,s)
end
end
return result
end
I think that’s better. The iterate
just made things more opaque, I think. Test, commit: drop use of iterate
in restrict
.
Let’s see what other uses of iterate
we have. This seems like it would always make things better.
function XSet:intersect(anXSet)
local result = XSet()
local cont = anXSet.contents
local added = false
anXSet:iterate(function(element,scope)
if self:hasAt(element,scope) then
result:addAt(element, scope)
added = true
end
end)
return added and result:lock() or XSet.NULL
end
We can do just the same here:
function XSet:intersect(anXSet)
local result = XSet()
local cont = anXSet.contents
local added = false
for element,scope in anXSet:elements() do
if self:hasAt(element,scope) then
result:addAt(element, scope)
added = true
end
end
return added and result:lock() or XSet.NULL
end
This works. Commit: inline iterate
in intersect
.
Looking at the method above I notice that it does rather an odd thing: instead of iterating itself, it iterates the other set. That is surely correct, but why not do the more obvious thing:
function XSet:intersect(anXSet)
local result = XSet()
local cont = anXSet.contents
local added = false
for element,scope in self:elements() do
if anXSet:hasAt(element,scope) then
result:addAt(element, scope)
added = true
end
end
return added and result:lock() or XSet.NULL
end
Tests run. Commit: simplify intersect
.
We were looking for calls to iterate
.
There’s this debug one:
function XSet:display(message)
if true then return end
result = message or ""
local function d(e,s)
result = result.." "..e.."@"..s..", "
end
self:iterate(d)
end
Might as well change it. Curiously, I note that it doesn’t actually do anything. I think it’s suppose to print, if we comment out the if.
function XSet:display(message)
if true then return end
result = message or ""
for e,s in self:elements() do
result = result.." "..e.."@"..s..", "
end
print(result)
end
The idea was that I could sprinkle display
around and then turn it off. Anyway, test and commit: display
uses elements
not iterate
.
The remaining use of iterate
is here:
function XSet:union(anXSet)
local result = XSet()
local add = function(element, scope) result:addAt(element,scope) end
self:iterate(add)
anXSet:iterate(add)
return result:lock()
end
We could write this out longhand, and if we do, we’ll have even more duplicated code. We see that both calls to iterate simply copy all the elements of a set into the result set. Let’s code that:
function XSet:union(anXSet)
local result = XSet()
self:copyInto(result)
anXSet:copyInto(result)
return result:lock()
end
function XSet:copyInto(anXSet)
for e,s in self:elements() do
anXSet:addAt(e,s)
end
end
Tests run. Commit: recode union
to use copyInto
.
Let’s step back and think about what we’ve just done.
Lookin Inta Histry Back
We have changed several instances of code like this:
local function doSomething(e,s)
blah()
mumble()
end
self:iterate(doSmething)
Into code like this:
for e,s in self:elements() do
blah()
mumble()
end
The code is uniformly shorter (by one line) and is less obscure, since Codea’s lambda equivalent is pretty wordy.
However, doing an explicit for
loop rather than iterate
seems a bit less abstract. Certainly it relies on the fact that we can loop over the elements of a set with a for loop, where iterate
allowed us to hide that mechanism and let it be anything we want.
I think we gain a bit here, in that the code is less obscure, and lose a little, in that it is a bit more tired to the particular implementation. That said, it seems to me that many if not most set operations do “loop” over their elements.
Still, I’m not entirely comfortable. We have removed a few lines of code and while the result may be less “abstract”, I think it’s at least equally clear. We can’t quite remove iterate
yet, because it’s used in the tests. We can fix that. Let’s do.
Oh, I found an actual use:
function XSet:card()
local card = 0
self:iterate(function(e,s) card = card+1 end)
return card
end
function XSet:card()
local card = 0
for e,s in self:elements() do
card = card+1
end
return card
end
Silly me, I shouldn’t have jumped to conclusions about where the uses were:
function XSet:isSubset(other)
local flag = true
local function find(e,s)
if other:hasntAt(e,s) then
flag = false
end
end
self:iterate(find)
return flag
end
function XSet:isSubset(other)
local flag = true
for e,s in self:elements() do
if other:hasntAt(e,s) then
flag = false
end
end
return flag
end
Test. Commit: removed all uses of iterate
and the function itself.
That last operation, however, offers an opportunity, I think. First a comment:
function XSet:isSubset(other)
-- return whether self is a subset of other
-- for every element e,s of self
-- is e element-s of other
local flag = true
for e,s in self:elements() do
if other:hasntAt(e,s) then
flag = false
end
end
return flag
end
Then, I notice we can use the little-used break
statement here:
function XSet:isSubset(other)
-- return whether self is a subset of other
-- for every element e,s of self
-- is e element-s of other
local flag = true
for e,s in self:elements() do
if other:hasntAt(e,s) then
flag = false
break
end
end
return flag
end
Can we do the same in exists
? Ah, in essence we do:
function XSet:exists(booleanFunction)
for element,scope in self:elements() do
if booleanFunction(element, scope) then
return true
end
end
return false
end
Let’s write every
:
function XSet:every(booleanFunction)
for element,scope in self:elements() do
if not booleanFunction(element,scope) then
return false
end
end
return true
end
Sorry, I got on a roll there and wrote that without a test. But I do have one in mind, right here:
function XSet:isSubset(other)
-- return whether self is a subset of other
-- for every element e,s of self
-- is e element-s of other
return self:every(function(e,s) return other:hasAt(e,s) end)
end
Tests run. Commit: implement every
, use in isSubset
.
Retro again.
Retro
OK, I like every
and exists
, and I don’t like iterate
. They seem so close to the same yet I like one and not the other. Let’s try a comparison. Put iterate
back:
function XSet:iterate(f)
for e,s in self:elements() do
f(e,s)
end
end
We have this:
function XSet:intersect(anXSet)
local result = XSet()
local cont = anXSet.contents
local added = false
for element,scope in self:elements() do
if anXSet:hasAt(element,scope) then
result:addAt(element, scope)
added = true
end
end
return added and result:lock() or XSet.NULL
end
We could write it this way:
function XSet:intersect(anXSet)
local result = XSet()
local cont = anXSet.contents
local added = false
self:iterate(function(element,scope)
if anXSet:hasAt(element,scope) then
result:addAt(element, scope)
added = true
end
end)
return added and result:lock() or XSet.NULL
end
What if we renamed iterate
:
function XSet:elementsDo(f)
for e,s in self:elements() do
f(e,s)
end
end
function XSet:intersect(anXSet)
local result = XSet()
local cont = anXSet.contents
local added = false
self:elementsDo(function(element,scope)
if anXSet:hasAt(element,scope) then
result:addAt(element, scope)
added = true
end
end)
return added and result:lock() or XSet.NULL
end
Is that better than the for loop, or not? Let’s try the shorter names e and s, which we conventionally use and which are therefore “OK”.
function XSet:intersect(anXSet)
local result = XSet()
local added = false
self:elementsDo(function(e,s)
if anXSet:hasAt(e,s) then
result:addAt(e, s)
added = true
end
end)
return added and result:lock() or XSet.NULL
end
I removed that local cont thing, which was unused. An appendix perhaps.
It all comes down to a closing paren on the end
and the difference between these two lines:
self:elementsDo(function(e,s)
and
for e,s, in self:elements() do
Honestly, it’s so close in my mind. I kind of prefer the elementsDo
form, but I am sure that if I had any readers, some of them would prefer the for
. I hope they tweet me their opinions.
I think we are coming close to another set of abstractions, though.
Look at intersect
again:
function XSet:intersect(anXSet)
local result = XSet()
local added = false
self:elementsDo(function(e,s)
if anXSet:hasAt(e,s) then
result:addAt(e, s)
added = true
end
end)
return added and result:lock() or XSet.NULL
end
We recently wrote this function:
function XSet:copyInto(anXSet)
for e,s in self:elements() do
anXSet:addAt(e,s)
end
end
What if we had this:
function XSet:intersect(anXSet)
local result = XSet()
local added = self:copyIntoWhere(result, function(e,s)
return anXSet:hasAt(e,s)
end)
return added and result:lock() or XSet.NULL
end
function XSet:copyIntoWhere(result, booleanFunction)
local added = false
self:elementsDo(function(e,s)
if(booleanFunction(e,s)) then
result:addAt(e,s)
added = true
end
end)
return added
end
The new copyIntoWhere
copies all the elements where the supplied boolean function is true, and returns true if any were copied. It’s only a tiny bit tricky and look how much it simplified intersect.
Well, I say simplified. We can perhaps do better.
Let’s try this:
function XSet:intersect(anXSet)
return self:elementsSuchThat(function(e,s)
return anXSet:hasAt(e,s)
end)
end
function XSet:elementsSuchThat(booleanFunction)
local result = XSet()
local added = self:copyIntoSuchThat(result, booleanFunction)
return added and result:lock() or XSet.NULL
end
function XSet:copyIntoSuchThat(result, booleanFunction)
local added = false
self:elementsDo(function(e,s)
if(booleanFunction(e,s)) then
result:addAt(e,s)
added = true
end
end)
return added
end
I like that definition of intersect. The other two functions are rather internal, so they won’t be things that users have to think about. Tests run. Commit: implement elementsSuchThat
and copyIntoSuchThat
and use in intersect
.
Now you may be asking yourself, “Self, it seems like he’s kind of circling here. Why doesn’t he make up his mind?”
And the answer is:
It is not by mind alone that I set my ideas in motion. It is by the code itself that the mind acquires understanding, the code acquires clarity, the tests become a warning. It is by mind and code together that I set my ideas in motion.
Or, well, maybe “Because the code tells me things that my marination of the code does not.”
Let’s sum up.
Summary
Again, the big lesson is that even the plans of three hours ago are just a vague picture of what we actually do when we get to the code face. The code tells us what we have done, and what we might do. We work the code. We don’t just dig out more code. We’re not trying to make a pile of code, we’re trying to make a sensible structure of code.
The smaller lesson, or discovery, is that we are getting closer to methods that seem consistent with the language of set theory, and since that’s our intended design model, that’s a good thing.
We have a method exists
that tells us whether any element has a given property. We have every
that tells us whether all elements have a property. We have elementsDo
that applies a function to all the elements of a set, and we have for e,s in elements do
, which does the same with a “lower level” statement.
I think that elementsDo
is more general and certainly it is more abstract. It doesn’t require that we have the ability to iterate a set with for
. That said, we do have that ability, and it’s kind of useful.
I remain uncertain whether one of these is better than the other and we should converge on it, or whether both should exist.
I think the recent learning of how to write an iterator using coroutines means that we can always make the for
syntax work, and of course if it works, then iterate
works. It’ll come down to which is more comfortable.
We might well come down to having some standard functions and predicates to use everywhere, rather than making them quite so local. I’m inclined to push that way, to see how mathematical we can make the code look.
Normally, that would be very questionable, but here, we are actually trying to implement mathematics, so it makes more sense than usual to accept the math form.
That said, we might consider this:
function XSet:elementsSuchThat(predicate)
local result = XSet()
local added = self:copyIntoSuchThat(result, predicate)
return added and result:lock() or XSet.NULL
end
function XSet:copyIntoSuchThat(result, predicate)
local added = false
self:elementsDo(function(e,s)
if(predicate(e,s)) then
result:addAt(e,s)
added = true
end
end)
return added
end
They are called predicates
in math.
I note that I’m moving closer and closer to single-letter variables. This is common in math, of course, and that’s why I’m doing it, mostly unconsciously. This flies in the face of our usual advice about meaningful names, but one wants to argue that in math, we conventionally pick single letters and use them consistently. Here we’ve got e for element and s for scope.
We’ll see if we mind it. I myself rather like it.
Overall
Overall, the code is getting smaller and tighter. We’re seeing capabilities coming out of the fog that do things we find generally useful. We’re not just imagining this product on paper or in our head (though I am thinking about it a lot). We’re pushing the pieces around to see how they best fit together.
Would we have the luxury of doing this on a “real” project? Perhaps not. Perhaps to the detriment of the project, or the detriment of the people working on it. (I did get um encouraged to move on, when I protected the set theory project longer than the executives were willing to wait. That said, I protected it long enough to let the team get it working. Maybe that was a good thing.)
Here, I’d really like to discover the best way I can to implement the set operations, to get a sense of how useful a real set theory project might be. Why? We’ve been over that.
Because I want to. And for you? There are lessons in here about your own work, every day. Some, I’ll draw out. Some, you just have to watch me and pick them up.
Tweets more than welcome. Email too, if you wish. Or even the Agile Mentoring slack.
I hope to see you around!