There's this hack where we return the elements of a set as if they were sets. Nothing like theory to fix a hack, is there? Naturally there are some stumbles along the way, mostly involving the tests. Remember who you're dealing with.

A Rank Idea

Remember that hack that makes us say “element.contents.contents” sometimes, when we really mean “element.contents”? The issue is in this code:

    def record(scope)
      record_contents = @contents[scope*@record_length,@record_length]
      XSet.new(1,record_contents)
    end

The basic thing wrong with this is that we unconditionally put the “record” in an XSet wrapper. When the record in question is in fact a record from our outer relational-style sets, that’s correct. But when it’s a letter from someone’s last name, an element of our inner sets, it’s not correct. We’ve talked about creating a different subclass or some other solution to handle this issue.

I was reading a bunch of XST articles (and boy are my arms tired) because I was sure there was a word in there to describe a set all of whose elements are atomic. I didn’t find it, so I wrote Dave Childs a note and asked him. He responded by reminding me of something I should have known, and probably could have found in the articles, namely rank. The rank of a set is, informally, the deepest degree of nesting in the set. The rank of a set of atoms is 1. The rank of a set of sets, the highest being of rank N, is N+1. And, if all the elements of a set have the same rank, then their rank is the rank of the set, minus one.

Our mission, should we decide to accept it, is to use the concept of rank to fix this hack. While we’re at it, I think we should change the name of that method from :record to :element. Along the way, let’s talk a bit about the probable impact of theory on our code and vice versa.

Fix the Name

My plan is to rename the :record method, and other occurrences of the word “record”, to refer to “element”, which is more in line with the XST terminology. (We might revisit this decision as we get closer to the external interface of our final program, but for now we are in the internals, and as we are trying to implement an math theory, it makes sense to me to use terminology from the domain.) So:

    def element(scope)
      element_contents = @contents[scope*@record_length,@record_length]
      XSet.new(1,element_contents)
    end

I’d like to rename @record_length too, but it’s a member variable and that would trigger two kinds of errors, so I’ll hold off on that. I’ll continue to rename local variables as we find things neededing to be changed. The tests will tell me what’s next. Most of them whine about this line:

  1) Error:
test_detect(TC_MyTest):
NoMethodError: undefined method `record' for #<XSet:0x2b1c150>
    ./restricttest.rb:94:in `each'

    def each
      for scope in record_range
        yield ScopedElement.new(record(scope), scope)
      end
    end

Simple fix. I’d also like to change record_range, I think, but again, not now. Put it on the list and code:

    def each
      for scope in record_range
        yield ScopedElement.new(element(scope), scope)
      end
    end

Tests now mostly whine about :subset?, which looks like this:

    def subset?(larger_set)
      record_range.all? { | scope |
        larger_set.contains?(record(scope), scope)
      }
    end

That should be:

    def subset?(larger_set)
      record_range.all? { | scope |
        larger_set.contains?(element(scope), scope)
      }
    end

Now, quelle surprise, it’s :contains:

    def contains?(a_record, scope)
      record(scope).contents == a_record.contents
    end

Which becomes:

    def contains?(a_set, scope)
      element(scope).contents == a_set.contents
    end

This fixes all the tests except the three that refer directly to the :record method and which need to be changed to refer to :element, resulting in:

    def test_record_bytes
      input = XSet.new(1,"abcdef")
      assert_equal("b", input.element(1).contents)
    end

    def test_record
      assert_equal("132 ", @five_records.element(2).contents)
    end

    def test_match
      select = XSet.new(1,"1")
      assert(@five_records.match(@five_records.element(2), select.element(0)))    
    end

Tests all run. The method is renamed. That would be easier with a refactoring tool, but it took much longer to write about it than it took to do it. Now, with your permission, I’ll rename @record_length and anything else I notice, and the program in toto (and your little dog too) looks like this:

  class TC_MyTest < Test::Unit::TestCase

    def setup
      name_data = "Jeffries    Ron Hendrickson ChetAnderson    Ann Johnson     Lee "
      @name_set = XSet.new(16, name_data)
      @five_element_set = XSet.new(4, "123 234 132 342 abc ")
    end

    def test_cardinality
      assert_equal(5, @five_element_set.cardinality)
    end

    def test_element_bytes
      input = XSet.new(1,"abcdef")
      assert_equal("b", input.element(1).contents)
    end

    def test_element_range
      assert_equal(0...5, @five_element_set.element_range)
    end

    def test_element_extraction
      assert_equal("132 ", @five_element_set.element(2).contents)
    end

    def test_match
      select = XSet.new(1,"1")
      assert(@five_element_set.match(@five_element_set.element(2), select.element(0)))    
    end

    def test_restrict
      select = XSet.new(1,"1")
      expected = "123 132 "
      result = @five_element_set.restrict(select)
      assert_equal(expected,result.contents)
    end

    def test_name_restrict
      select_data = "HendricksonJeffries   "
      select = XSet.new(11, select_data)
      expected = "Jeffries    Ron Hendrickson Chet"
      result = @name_set.restrict(select)
      assert_equal(expected, result.contents)
    end

    def test_single_selection
      select_data = "Jeffries   Jeffries   "
      select = XSet.new(11, select_data)
      expected = "Jeffries    Ron "
      result = @name_set.restrict(select)
      assert_equal(expected, result.contents)
    end

    def test_each_using_scope
      ann = ""
      @name_set.each do 
        | scope_element | 
        if (scope_element.scope==2) 
          ann = scope_element.element.contents 
        end
      end
      assert_equal("Anderson    Ann ", ann)
    end

    def test_detect
      chet_scope_element = @name_set.detect { 
        | scope_element | 
        scope_element.element.contents.include? "Chet" }
      assert_equal("Hendrickson Chet", chet_scope_element.element.contents)
    end

#    def test_firstname_restrict
#      name_data = "Jeffries    Ron Hendrickson ChetAnderson    Ann Johnson     Lee "
#      input = XSet.new(16, name_data)
#      select_data = "Ron Lee "
#      select = XSet.new(4, select_data)
#      expected = "Jeffries    Ron Johnson     Lee "
#      result = input.restrict(select)
#      assert_equal(expected, result.contents)
#    end
  end

   class XSet
     include Enumerable
     attr_reader :contents

    def initialize(element_length, contents)
      @element_length = element_length
      @contents = contents
    end

    def each
      for scope in element_range
        yield ScopedElement.new(element(scope), scope)
      end
    end

    def restrict(selector)
      result_contents = ""
      each do | scoped_element |
        if selector.matches(scoped_element.element)
          result_contents << scoped_element.element.contents
        end
      end
      XSet.new(@element_length, result_contents)
    end

    def matches(an_element)
      any? { | scoped_element |
        match(an_element, scoped_element.element)
      }
    end

    def match(my_element, selector_element)
      selector_element.subset?(my_element)
    end

    def element(scope)
      element_contents = @contents[scope*@element_length,@element_length]
      XSet.new(1,element_contents)
    end

    def subset?(larger_set)
      element_range.all? { | scope |
        larger_set.contains?(element(scope), scope)
      }
    end

    def contains?(a_set, scope)
      element(scope).contents == a_set.contents
    end

    def element_range
      0...cardinality
    end

    def cardinality
      @contents.length / @element_length
    end
  end

  class ScopedElement
    attr_reader :element, :scope
    def initialize(element, scope)
      @element = element
      @scope = scope
    end
  end

Better, and it only took 35 minutes counting writing the article thus far.

Fix the Implementation

For my next trick, I plan to introduce the concept of “rank” to the existing code, and then use it to untangle the extra set bracketing introduced in reading elements (individual characters) from records. We can do this with new tests for :rank before using it elsewhere. Let’s do that first:

    def test_rank
      assert_equal(2, @name_set.rank)
    end

That should be enough to force the code:

    def rank
      2
    end

Whee, it runs. Let’s write another test to make it right. The rank of a set of plain flat records like ours is 2, and the rank of the record (still a set) whould be one. This test should suffice:

    def test_element_rank
      element = @name_set.element(2)
      assert_equal(1, element.rank)
    end

Perfect. 1 expected but was 2. The code in question is here:

    def element(scope)
      element_contents = @contents[scope*@element_length,@element_length]
      XSet.new(1,element_contents)
    end

Hmm, now this is a bit tricky. Right now, we pass through this code twice, once to pull records from our original XSets (of rank two). Those records should be of rank one. We also use that code to process XSets of rank one (our records) and the result there should be that we return a character, not an XSet.

We’ll just take the first step, which I think for now will be to record a rank of two when we create the record set, then in :element, set the rank directly. This much should make the existing tests run, but not the new one:

    def initialize(element_length, contents, rank=2)
      @element_length = element_length
      @contents = contents
      @rank = rank
    end

    def rank
      @rank
    end

Sure enough, as predicted. Just the “expected 1” error comes out. Now to make the inner set have rank 1.

    def element(scope)
      element_contents = @contents[scope*@element_length,@element_length]
      XSet.new(1,element_contents, self.rank-1)
    end

This in fact makes the test run. But I am concerned about that 1 in the record-length parameter. That seems wrong, somehow. Let me think …

We are implementing XSets, right now, as arrays of bytes that represent collections of fixed-length records. We are also using the same code to represent each record as an XSet of its own. This is an interim state in the code: is it right or wrong? Well, seeing a record as a set is OK: they are sets. What are the elements of the set that is a record? Characters. So, for now, the code is, I think, OK.

Longer term, as we add more set operations, we’ll need to keep an eye on this.

Theory or Practice: A General Concern

I’ve built a lot of database code in my life, and a fair amount with a set-theoretic focus. There are some tricky forces that act on you. On the one hand, we could imagine building a relational system in a style that created record objects and then table objects, with the record and table objects each having their own kinds of methods. We might find commonalities, and refactor them out.

On the other hand, set theory says “everything is a set”. With that hat on, we envision code that somehow works “for all sets”. An issue with that is that sets can be extremely complex and curly. Imagine throwing a bunch of unions and intersections and such at a whole bunch of different tables and records from a large database. We could get a set so big we couldn’t lift it.

When I’ve gone that way in the past, it seemed to lead to self-describing data, and that seems to lead away from what I’m working on here, which is the ability to process that data in big sweeps over arrays. Speculating about how the design may evolve, I imagine that we’ll have our rectangular sets somehow include information about thigns like their record lengths and ranks, and that each operation will creat new sets that know different things about their record lengths and ranks and other characteristics.

We are sitting on the edge where I’ve often been with XST and database: the theory/practice boundary. We’ll just try not to slide too fast, so it won’t hurt too badly, and we’ll see whether we can keep things clean enough to make progress.

So far, so good.

Back to the Swamp

Or is it back to the alligator? Either way, we have a two-part purpose here. Part one was to add the notion of rank to our sets. Part two was to use that notion to make elements of our rows be characters, not sets containing characters. We want to do that for two reasons. First of all, it just seems right. We put characters in, we want them out. A set containing one character doesn’t help anyone.

The second reason is a bit more set-theoretical. We have some “long” string coming in, like “abcdwxyz”. We mean to interpret that as a set of two records, each 4 characters long. In set theory, spelled all the way out, that could be:

{ { a0, b1, c2, d3 }, { w0, x1, y2, z3 } }

or possibly:

{ { a0, b1, c2, d3 }0, { w0, x1, y2, z3 }1 }

If that’s the theoretical picture, then the elements of the inner sets are in fact characters, not sets containing characters. Our code isn’t correctly modeling that decision. That’s what we’re here to fix. The plan is to use rank to do it. If rank is 2, return a set. If rank is 1, return an element. In this case, a character. We may wish to extend to use the element_length value to decide how much to extract.

Let’s see what we can make happen. This is the core of our problem, the :element method:

    def element(scope)
      element_contents = @contents[scope*@element_length,@element_length]
      XSet.new(1,element_contents, self.rank-1)
    end

The element contents value is good, and in fact that confirms that we should be setting the length to 1 in our new set, so the extraction will work when we come around again. So we just need to modulate that XSet behavior, based on rank. Let’s see if there are some tests to change, and then use those to drive the code.

Turns out there isn’t a test for that yet. There is this one, which I think I initially thought addressed this issue:

    def test_element_bytes
      input = XSet.new(1,"abcdef")
      assert_equal("b", input.element(1).contents)
    end

This test is good as it stands, though we could of course interrogate the returned XSet more. But since our XSet is a relation of one-byte records, it is appropriate for what comes back to be an XSet, whose contents are “b”. Let’s write a new test.

    def test_record_bytes
      johnson = @name_set.element(3);
      assert_equal("J", johnson.element(0))
    end

We expect this to fail now with an XSet instead of a character. Then, we hope that changing element to check rank will make this work. Let’s see … test fails as expected. Now with luck, it’s a simple change.

I’ve been saying “hope” and “with luck”. That tells me that my confidence is low, and that tells me that maybe I need more tests … or maybe the design isn’t as clean as it should be. First, though, we’ll see if this test can be made to run in the obvious way. I’m just going to code it in line:

    def element(scope)
      element_contents = @contents[scope*@element_length,@element_length]
      if (@rank > 1)
        return XSet.new(1,element_contents, self.rank-1)
      else
        return element_contents
      end
    end

In Ruby, the word “return” is optional in this usage, because the last thing you compute is always returned. But I prefer to make it explicit as I’ve shown here. I haven’t run the test yet … here goes …

There’s good news, and bad news. The good news is that our new test ran. The bad news is that four other tests broke:

    def test_match
      select = XSet.new(1,"1")
      assert(@five_element_set.match(@five_element_set.element(2), select.element(0)))    
    end

    def test_name_restrict
      select_data = "HendricksonJeffries   "
      select = XSet.new(11, select_data)
      expected = "Jeffries    Ron Hendrickson Chet"
      result = @name_set.restrict(select)
      assert_equal(expected, result.contents)
    end

    def test_restrict
      select = XSet.new(1,"1")
      expected = "123 132 "
      result = @five_element_set.restrict(select)
      assert_equal(expected,result.contents)
    end

    def test_single_selection
      select_data = "Jeffries   Jeffries   "
      select = XSet.new(11, select_data)
      expected = "Jeffries    Ron "
      result = @name_set.restrict(select)
      assert_equal(expected, result.contents)
    end

Apparently my hopes and fears, rather than full confidence, were warranted. The errors are all saying the same thing, that :contents is being sent to strings rather than to XSets. Looking at those tests, they should clearly be seeing sets. The code is returning a string instead. I think the issue is that > 1 should be > 0:

    def element(scope)
      element_contents = @contents[scope*@element_length,@element_length]
      if (@rank > 0)
        return XSet.new(1,element_contents, self.rank-1)
      else
        return element_contents
      end
    end

Grr. That fixes the newly broken tests but breaks the new test.

    def test_record_bytes
      johnson = @name_set.element(3);
      assert_equal("J", johnson.element(0))
    end

That test sure looks right to me. Let’s beef the test up a bit, see if we can see what I’m missing.

    def test_record_bytes
      johnson = @name_set.element(3);
      assert_equal(2, @name_set.rank)
      assert_equal(1, johnson.rank)
      assert_equal("J", johnson.element(0))
    end

I feel sure those numbers are correct, which makes me think that I wasn’t right about changing that > after all. Run the test.

The ranks are right. The return is not. One more try to think this out, then a break. I should just take the break but my pair isn’t here to drag my fingers off the keyboard. After one more look at the code, I think I’ll take that break. My car needs a place to stay for the winter, and I need a donut. I’ll be right back …

Much better. The good thing about a break is that it can remind you to do what you know is best. I’ve put the rank > 1 back, because it’s right. Then instead of panicing, I’ll actually look at the four tests that break. They’re all breaking in :contains, in :subset?. Let’s do ourselves the courtesy of looking at that before changing something else.

    def subset?(larger_set)
      element_range.all? { | scope |
        larger_set.contains?(element(scope), scope)
      }
    end

    def contains?(a_set, scope)
      element(scope).contents == a_set.contents
    end

I find myself wishing I could just go into the debugger and look around to see what’s going on. However, in a long time working with Ruby I never learned the debugger and I’m not going to start now. Might print something if I need to. Let’s reason a moment.

The subset method is asking whether the record of the smaller set, e.g. <Jeffries> is a subset of the larger set, e.g. <Jeffries Ron>. It does this by asking whether the larger set contains each element of the smaller. But the call is unwinding its input an extra level … those calls to contents. The code should compare the two elements directly … and that first parameter is an element, not a set. (Did I mention changing that? I was probably wrong.)

    def contains?(an_element, scope)
      element(scope) == an_element
    end

And the tests run. There are some lessons here, for sure!

What Lessons?

Pay attention to our fears. When I feel my confidence waning, that’s evidence that I should proceed with caution. I should take smaller steps, write more tests, and not jump to conclusions. The same might be true for you.

Listen to what the tests are really saying. I wrote that simple if while in a state of fear. I had reasoned correctly, but when a bunch of tests broke, I jumped to the conclusion that my line of code was wrong and tried to convince myself that it was wrong. It wasn’t. That line of reasoning was wasteful. I’d have done better to follow where the tests pointed to see what was wrong. The change was very simple and appropriate: there was a hack in the system because of the extra level of set nesting and since we removed that level, the hack needed to come out.

Maybe we don’t need a debugger after all. If I had gone into the debugger even on this problem, I’d have probed around and sooner or later found the problem. I don’t think I’d have found it sooner, even with something this simple. I really have never learned to use the Ruby debugger, though I admit I do sort of know how to use the C# one. If you ever see me in the debugger, whack me up upside the head and tell me to write a test. Thank you sir, may I have another?

If you are writing an article, mistakes are a good thing. That’s the best part of all this. While I was out putting the 3M3RTX3 car in its winter quarters, I was reflecting about how cool it is that when I screw up, I can just tell you about it and try to draw a lesson from it. It helps me not be defensive, keeps me calm, and helps me learn. I hope that sometimes it helps you, as well. Maybe the lesson is to remember your mistakes and share them with your pals. “You’ll never believe the dumb thing I did yesterday.” We’ll all be better off.

What's Next?

Our injection of the theoretical notion of “rank” has worked out just as we hoped it would. It lets our innermost sets consist of atoms instead of being wrapped in extra set brackets, and it resulted in a minor cleanup in the :contains? method, bringing us just a bit closer to theory, with somewhat better code overall.

We still need to review all the code for a match between theory and practice, and I’d like to do a bit more summing up on that issue, after looking at the program. Then we’ll need to set direction.

All that can be for next time; this is enough for now.