XST: Theory to the Rescue
A Rank Idea
Remember that hack that makes us say “element.contents.contents” sometimes, when we really mean “element.contents”? The issue is in this code:
def record(scope) record_contents = @contents[scope*@record_length,@record_length] XSet.new(1,record_contents) end
The basic thing wrong with this is that we unconditionally put the “record” in an XSet wrapper. When the record in question is in fact a record from our outer relational-style sets, that’s correct. But when it’s a letter from someone’s last name, an element of our inner sets, it’s not correct. We’ve talked about creating a different subclass or some other solution to handle this issue.
I was reading a bunch of XST articles (and boy are my arms tired) because I was sure there was a word in there to describe a set all of whose elements are atomic. I didn’t find it, so I wrote Dave Childs a note and asked him. He responded by reminding me of something I should have known, and probably could have found in the articles, namely rank. The rank of a set is, informally, the deepest degree of nesting in the set. The rank of a set of atoms is 1. The rank of a set of sets, the highest being of rank N, is N+1. And, if all the elements of a set have the same rank, then their rank is the rank of the set, minus one.
Our mission, should we decide to accept it, is to use the concept of rank to fix this hack. While we’re at it, I think we should change the name of that method from :record to :element. Along the way, let’s talk a bit about the probable impact of theory on our code and vice versa.
Fix the Name
My plan is to rename the :record method, and other occurrences of the word “record”, to refer to “element”, which is more in line with the XST terminology. (We might revisit this decision as we get closer to the external interface of our final program, but for now we are in the internals, and as we are trying to implement an math theory, it makes sense to me to use terminology from the domain.) So:
def element(scope) element_contents = @contents[scope*@record_length,@record_length] XSet.new(1,element_contents) end
I’d like to rename @record_length too, but it’s a member variable and that would trigger two kinds of errors, so I’ll hold off on that. I’ll continue to rename local variables as we find things neededing to be changed. The tests will tell me what’s next. Most of them whine about this line:
1) Error: test_detect(TC_MyTest): NoMethodError: undefined method `record' for #<XSet:0x2b1c150> ./restricttest.rb:94:in `each' def each for scope in record_range yield ScopedElement.new(record(scope), scope) end end
Simple fix. I’d also like to change record_range, I think, but again, not now. Put it on the list and code:
def each
for scope in record_range
yield ScopedElement.new(element(scope), scope)
end
end
Tests now mostly whine about :subset?, which looks like this:
def subset?(larger_set) record_range.all? { | scope | larger_set.contains?(record(scope), scope) } end
That should be:
def subset?(larger_set)
record_range.all? { | scope |
larger_set.contains?(element(scope), scope)
}
end
Now, quelle surprise, it’s :contains:
def contains?(a_record, scope) record(scope).contents == a_record.contents end
Which becomes:
def contains?(a_set, scope) element(scope).contents == a_set.contents end
This fixes all the tests except the three that refer directly to the :record method and which need to be changed to refer to :element, resulting in:
def test_record_bytes input = XSet.new(1,"abcdef") assert_equal("b", input.element(1).contents) end def test_record assert_equal("132 ", @five_records.element(2).contents) end def test_match select = XSet.new(1,"1") assert(@five_records.match(@five_records.element(2), select.element(0))) end
Tests all run. The method is renamed. That would be easier with a refactoring tool, but it took much longer to write about it than it took to do it. Now, with your permission, I’ll rename @record_length and anything else I notice, and the program in toto (and your little dog too) looks like this:
class TC_MyTest < Test::Unit::TestCase def setup name_data = "Jeffries Ron Hendrickson ChetAnderson Ann Johnson Lee " @name_set = XSet.new(16, name_data) @five_element_set = XSet.new(4, "123 234 132 342 abc ") end def test_cardinality assert_equal(5, @five_element_set.cardinality) end def test_element_bytes input = XSet.new(1,"abcdef") assert_equal("b", input.element(1).contents) end def test_element_range assert_equal(0...5, @five_element_set.element_range) end def test_element_extraction assert_equal("132 ", @five_element_set.element(2).contents) end def test_match select = XSet.new(1,"1") assert(@five_element_set.match(@five_element_set.element(2), select.element(0))) end def test_restrict select = XSet.new(1,"1") expected = "123 132 " result = @five_element_set.restrict(select) assert_equal(expected,result.contents) end def test_name_restrict select_data = "HendricksonJeffries " select = XSet.new(11, select_data) expected = "Jeffries Ron Hendrickson Chet" result = @name_set.restrict(select) assert_equal(expected, result.contents) end def test_single_selection select_data = "Jeffries Jeffries " select = XSet.new(11, select_data) expected = "Jeffries Ron " result = @name_set.restrict(select) assert_equal(expected, result.contents) end def test_each_using_scope ann = "" @name_set.each do | scope_element | if (scope_element.scope==2) ann = scope_element.element.contents end end assert_equal("Anderson Ann ", ann) end def test_detect chet_scope_element = @name_set.detect { | scope_element | scope_element.element.contents.include? "Chet" } assert_equal("Hendrickson Chet", chet_scope_element.element.contents) end # def test_firstname_restrict # name_data = "Jeffries Ron Hendrickson ChetAnderson Ann Johnson Lee " # input = XSet.new(16, name_data) # select_data = "Ron Lee " # select = XSet.new(4, select_data) # expected = "Jeffries Ron Johnson Lee " # result = input.restrict(select) # assert_equal(expected, result.contents) # end end class XSet include Enumerable attr_reader :contents def initialize(element_length, contents) @element_length = element_length @contents = contents end def each for scope in element_range yield ScopedElement.new(element(scope), scope) end end def restrict(selector) result_contents = "" each do | scoped_element | if selector.matches(scoped_element.element) result_contents << scoped_element.element.contents end end XSet.new(@element_length, result_contents) end def matches(an_element) any? { | scoped_element | match(an_element, scoped_element.element) } end def match(my_element, selector_element) selector_element.subset?(my_element) end def element(scope) element_contents = @contents[scope*@element_length,@element_length] XSet.new(1,element_contents) end def subset?(larger_set) element_range.all? { | scope | larger_set.contains?(element(scope), scope) } end def contains?(a_set, scope) element(scope).contents == a_set.contents end def element_range 0...cardinality end def cardinality @contents.length / @element_length end end class ScopedElement attr_reader :element, :scope def initialize(element, scope) @element = element @scope = scope end end
Better, and it only took 35 minutes counting writing the article thus far.
Fix the Implementation
For my next trick, I plan to introduce the concept of “rank” to the existing code, and then use it to untangle the extra set bracketing introduced in reading elements (individual characters) from records. We can do this with new tests for :rank before using it elsewhere. Let’s do that first:
def test_rank assert_equal(2, @name_set.rank) end
That should be enough to force the code:
def rank 2 end
Whee, it runs. Let’s write another test to make it right. The rank of a set of plain flat records like ours is 2, and the rank of the record (still a set) whould be one. This test should suffice:
def test_element_rank element = @name_set.element(2) assert_equal(1, element.rank) end
Perfect. 1 expected but was 2. The code in question is here:
def element(scope) element_contents = @contents[scope*@element_length,@element_length] XSet.new(1,element_contents) end
Hmm, now this is a bit tricky. Right now, we pass through this code twice, once to pull records from our original XSets (of rank two). Those records should be of rank one. We also use that code to process XSets of rank one (our records) and the result there should be that we return a character, not an XSet.
We’ll just take the first step, which I think for now will be to record a rank of two when we create the record set, then in :element, set the rank directly. This much should make the existing tests run, but not the new one:
def initialize(element_length, contents, rank=2) @element_length = element_length @contents = contents @rank = rank end def rank @rank end
Sure enough, as predicted. Just the “expected 1” error comes out. Now to make the inner set have rank 1.
def element(scope) element_contents = @contents[scope*@element_length,@element_length] XSet.new(1,element_contents, self.rank-1) end
This in fact makes the test run. But I am concerned about that 1 in the record-length parameter. That seems wrong, somehow. Let me think …
We are implementing XSets, right now, as arrays of bytes that represent collections of fixed-length records. We are also using the same code to represent each record as an XSet of its own. This is an interim state in the code: is it right or wrong? Well, seeing a record as a set is OK: they are sets. What are the elements of the set that is a record? Characters. So, for now, the code is, I think, OK.
Longer term, as we add more set operations, we’ll need to keep an eye on this.
Theory or Practice: A General Concern
I’ve built a lot of database code in my life, and a fair amount with a set-theoretic focus. There are some tricky forces that act on you. On the one hand, we could imagine building a relational system in a style that created record objects and then table objects, with the record and table objects each having their own kinds of methods. We might find commonalities, and refactor them out.
On the other hand, set theory says “everything is a set”. With that hat on, we envision code that somehow works “for all sets”. An issue with that is that sets can be extremely complex and curly. Imagine throwing a bunch of unions and intersections and such at a whole bunch of different tables and records from a large database. We could get a set so big we couldn’t lift it.
When I’ve gone that way in the past, it seemed to lead to self-describing data, and that seems to lead away from what I’m working on here, which is the ability to process that data in big sweeps over arrays. Speculating about how the design may evolve, I imagine that we’ll have our rectangular sets somehow include information about thigns like their record lengths and ranks, and that each operation will creat new sets that know different things about their record lengths and ranks and other characteristics.
We are sitting on the edge where I’ve often been with XST and database: the theory/practice boundary. We’ll just try not to slide too fast, so it won’t hurt too badly, and we’ll see whether we can keep things clean enough to make progress.
So far, so good.
Back to the Swamp
Or is it back to the alligator? Either way, we have a two-part purpose here. Part one was to add the notion of rank to our sets. Part two was to use that notion to make elements of our rows be characters, not sets containing characters. We want to do that for two reasons. First of all, it just seems right. We put characters in, we want them out. A set containing one character doesn’t help anyone.
The second reason is a bit more set-theoretical. We have some “long” string coming in, like “abcdwxyz”. We mean to interpret that as a set of two records, each 4 characters long. In set theory, spelled all the way out, that could be:
{ { a0, b1, c2, d3 }, { w0, x1, y2, z3 } }
or possibly:
{ { a0, b1, c2, d3 }0, { w0, x1, y2, z3 }1 }
If that’s the theoretical picture, then the elements of the inner sets are in fact characters, not sets containing characters. Our code isn’t correctly modeling that decision. That’s what we’re here to fix. The plan is to use rank to do it. If rank is 2, return a set. If rank is 1, return an element. In this case, a character. We may wish to extend to use the element_length value to decide how much to extract.
Let’s see what we can make happen. This is the core of our problem, the :element method:
def element(scope) element_contents = @contents[scope*@element_length,@element_length] XSet.new(1,element_contents, self.rank-1) end
The element contents value is good, and in fact that confirms that we should be setting the length to 1 in our new set, so the extraction will work when we come around again. So we just need to modulate that XSet behavior, based on rank. Let’s see if there are some tests to change, and then use those to drive the code.
Turns out there isn’t a test for that yet. There is this one, which I think I initially thought addressed this issue:
def test_element_bytes input = XSet.new(1,"abcdef") assert_equal("b", input.element(1).contents) end
This test is good as it stands, though we could of course interrogate the returned XSet more. But since our XSet is a relation of one-byte records, it is appropriate for what comes back to be an XSet, whose contents are “b”. Let’s write a new test.
def test_record_bytes johnson = @name_set.element(3); assert_equal("J", johnson.element(0)) end
We expect this to fail now with an XSet instead of a character. Then, we hope that changing element to check rank will make this work. Let’s see … test fails as expected. Now with luck, it’s a simple change.
I’ve been saying “hope” and “with luck”. That tells me that my confidence is low, and that tells me that maybe I need more tests … or maybe the design isn’t as clean as it should be. First, though, we’ll see if this test can be made to run in the obvious way. I’m just going to code it in line:
def element(scope) element_contents = @contents[scope*@element_length,@element_length] if (@rank > 1) return XSet.new(1,element_contents, self.rank-1) else return element_contents end end
In Ruby, the word “return” is optional in this usage, because the last thing you compute is always returned. But I prefer to make it explicit as I’ve shown here. I haven’t run the test yet … here goes …
There’s good news, and bad news. The good news is that our new test ran. The bad news is that four other tests broke:
def test_match select = XSet.new(1,"1") assert(@five_element_set.match(@five_element_set.element(2), select.element(0))) end def test_name_restrict select_data = "HendricksonJeffries " select = XSet.new(11, select_data) expected = "Jeffries Ron Hendrickson Chet" result = @name_set.restrict(select) assert_equal(expected, result.contents) end def test_restrict select = XSet.new(1,"1") expected = "123 132 " result = @five_element_set.restrict(select) assert_equal(expected,result.contents) end def test_single_selection select_data = "Jeffries Jeffries " select = XSet.new(11, select_data) expected = "Jeffries Ron " result = @name_set.restrict(select) assert_equal(expected, result.contents) end
Apparently my hopes and fears, rather than full confidence, were warranted. The errors are all saying the same thing, that :contents is being sent to strings rather than to XSets. Looking at those tests, they should clearly be seeing sets. The code is returning a string instead. I think the issue is that > 1 should be > 0:
def element(scope)
element_contents = @contents[scope*@element_length,@element_length]
if (@rank > 0)
return XSet.new(1,element_contents, self.rank-1)
else
return element_contents
end
end
Grr. That fixes the newly broken tests but breaks the new test.
def test_record_bytes johnson = @name_set.element(3); assert_equal("J", johnson.element(0)) end
That test sure looks right to me. Let’s beef the test up a bit, see if we can see what I’m missing.
def test_record_bytes
johnson = @name_set.element(3);
assert_equal(2, @name_set.rank)
assert_equal(1, johnson.rank)
assert_equal("J", johnson.element(0))
end
I feel sure those numbers are correct, which makes me think that I wasn’t right about changing that > after all. Run the test.
The ranks are right. The return is not. One more try to think this out, then a break. I should just take the break but my pair isn’t here to drag my fingers off the keyboard. After one more look at the code, I think I’ll take that break. My car needs a place to stay for the winter, and I need a donut. I’ll be right back …
Much better. The good thing about a break is that it can remind you to do what you know is best. I’ve put the rank > 1 back, because it’s right. Then instead of panicing, I’ll actually look at the four tests that break. They’re all breaking in :contains, in :subset?. Let’s do ourselves the courtesy of looking at that before changing something else.
def subset?(larger_set) element_range.all? { | scope | larger_set.contains?(element(scope), scope) } end def contains?(a_set, scope) element(scope).contents == a_set.contents end
I find myself wishing I could just go into the debugger and look around to see what’s going on. However, in a long time working with Ruby I never learned the debugger and I’m not going to start now. Might print something if I need to. Let’s reason a moment.
The subset method is asking whether the record of the smaller set, e.g. <Jeffries> is a subset of the larger set, e.g. <Jeffries Ron>. It does this by asking whether the larger set contains each element of the smaller. But the call is unwinding its input an extra level … those calls to contents. The code should compare the two elements directly … and that first parameter is an element, not a set. (Did I mention changing that? I was probably wrong.)
def contains?(an_element, scope) element(scope) == an_element end
And the tests run. There are some lessons here, for sure!
What Lessons?
Pay attention to our fears. When I feel my confidence waning, that’s evidence that I should proceed with caution. I should take smaller steps, write more tests, and not jump to conclusions. The same might be true for you.
Listen to what the tests are really saying. I wrote that simple if while in a state of fear. I had reasoned correctly, but when a bunch of tests broke, I jumped to the conclusion that my line of code was wrong and tried to convince myself that it was wrong. It wasn’t. That line of reasoning was wasteful. I’d have done better to follow where the tests pointed to see what was wrong. The change was very simple and appropriate: there was a hack in the system because of the extra level of set nesting and since we removed that level, the hack needed to come out.
Maybe we don’t need a debugger after all. If I had gone into the debugger even on this problem, I’d have probed around and sooner or later found the problem. I don’t think I’d have found it sooner, even with something this simple. I really have never learned to use the Ruby debugger, though I admit I do sort of know how to use the C# one. If you ever see me in the debugger, whack me up upside the head and tell me to write a test. Thank you sir, may I have another?
If you are writing an article, mistakes are a good thing. That’s the best part of all this. While I was out putting the 3M3RTX3 car in its winter quarters, I was reflecting about how cool it is that when I screw up, I can just tell you about it and try to draw a lesson from it. It helps me not be defensive, keeps me calm, and helps me learn. I hope that sometimes it helps you, as well. Maybe the lesson is to remember your mistakes and share them with your pals. “You’ll never believe the dumb thing I did yesterday.” We’ll all be better off.
What's Next?
Our injection of the theoretical notion of “rank” has worked out just as we hoped it would. It lets our innermost sets consist of atoms instead of being wrapped in extra set brackets, and it resulted in a minor cleanup in the :contains? method, bringing us just a bit closer to theory, with somewhat better code overall.
We still need to review all the code for a match between theory and practice, and I’d like to do a bit more summing up on that issue, after looking at the program. Then we’ll need to set direction.
All that can be for next time; this is enough for now.