I did an offline experiment to figure out what to do about this :each thing that returns two items instead of the usual one. Based on that, we'll create a new little object, wire it in, then clean up the tests a bit. Join us ...

A Spike While You Weren't Looking

While you were away, I did some experiments with the iterator. As you recall, it returns two values, the element in question and its scope. It is necessary to return both, because both are important in XST: they are what keep simple sets and ordered sets from collapsing into each other in awkward ways. My original concern was that Ruby’s :inject method would probably not work as expected, and I was right [1]. Apparently the implementation of :inject does something clever internally that loses the second part of my yield, the scope.

I experimented with a few approaches, such as returning an array. I don’t really like passing naked objects like arrays around, as anything worth structuring is worth giving a name to. I also tried building a little object, and liked that better. In spiking around, I just went ahead and modified the code however I wanted. My plan now is to revert to where we left off in the previous article, and begin to build something that matters.

Constant Readers may recall the code manager that Chet and I built for Adventures in C#. It saves every changed version of any file in the project directory. The result in this case is that I just looked at the article date, grabbed the file whose date was just earlier than that, and copied it back to the project directory.

It would be nice to have a tool for that, but I revert files so seldom that I’ve never gotten around to it, since an ideal tool would have a GUI and that way lies more hassle than fun. Maybe someone will give me one for Christmas.

Is This "Starting Over"?

I commented in an earlier article that though folks report that they TDD along and then suddenly the next test makes them “start over”, rewriting masses of code, I don’t often have that experience. Is this starting over? I claim not, and it’s my article. The “spike” or experiment is a time-honored approach to learning something about how to program something, and it is best undertaken with intent to throw the code away and keep the learning. That’s what I was doing while you were away, just trying things. So, no, this isn’t the same as what I understood the N-th Test Makes You Start Over phenomenon to be.

The learning in question was this:

  • I verified that :inject would not work with my current :each method.
  • I tried returning an array but decided that if it's worth a structure, it's worth a class.
  • I tried creating a little class and it worked pretty well.
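
To make the array-versus-class point concrete, here is a minimal sketch. The name Pair is hypothetical and uses Ruby's Struct as a shortcut; it is not the class the article goes on to build, and the record is a plain string purely for illustration.

```ruby
# Pair is a hypothetical stand-in for "a little class", built with Struct.
Pair = Struct.new(:element, :scope)

record = "Anderson    Ann "
scope = 2

# Naked array: the caller has to remember which slot holds what.
as_array = [record, scope]
array_scope = as_array[1]

# Named object: the caller asks for what it wants by name.
as_object = Pair.new(record, scope)
object_scope = as_object.scope
```

Both forms carry the same two values. The difference is only in how readable the calling code is, which is the reason given above for preferring a class.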

I think that creating a little class is the best solution, though I’m not entirely comfortable with it for theoretical reasons. That is, while an element has a scope inside a set, it does not have one outside. So in a sense, exporting this little class is exporting information outside the XSet world that might not want to move out. But one way or another, people need information about elements and scopes, so my guess is it is the best choice.

Let’s look at what we might do, and decide what’s next.

Stories Needing Attention

Some of the things that need doing include:

  • Improve :each to allow things like :inject to work. Probably just means putting a small class in place and making the tests work.
  • The handling of contents and intersection is still limping. The iteration over each individual row of our relation is returning an XSet containing a record consisting of just one character. This probably requires two different classes of XSet, one known to be a relation, and one known to be a vector of characters.
  • We need more set operations, to fill in the abilities of the code and discover any lurking problems.
  • We need to address additional non-relational set structures.
  • We need to address storing and retrieving sets on the disk.

After some thought, I think I’ll proceed with the :each improvement, then choose some more setops, or improve the relational classing thing. The former is more fun, the latter probably more important. For now, anyway, we’ll just fix up :each.

Improving :each

Our current implementation of each is this:

    def each
      for scope in record_range
        yield record(scope), scope
      end
    end

We return two objects, the set element and its scope, and while that mostly works, it doesn’t work everywhere. The plan is to return the element and scope wrapped up in a little object. We could TDD that into existence but since we have tests for each, and code using each, I think we can instead just refactor. I’ll call the new class ScopedElement, and just assume that it exists. That should break at least one test!

    def each
      for scope in record_range
        yield ScopedElement.new(record(scope), scope)
      end
    end

That breaks nicely, since there is no class ScopedElement. I’ll fix that and try again:

  class ScopedElement
  end

That fails with “Wrong number of arguments in initialize”. Notice that I’m proceeding in the same style as in TDD, just letting the computer tell me what’s wrong. So long as it’s something simple, it’s just another red bar, though so far, the tests haven’t run. Let’s make enough changes to make them run this time, but some will fail.

  class ScopedElement
    def initialize(element, scope)
    end
  end

“9 tests, 6 assertions, 1 failures, 3 errors”. Not bad, five out of nine tests are already running. The errors are all the same:

  2) Error:
test_name_restrict(TC_MyTest):
NoMethodError: undefined method `subset?' for #<ScopedElement:0x2ae9f60>
    ./restricttest.rb:115:in `match'
    ./restricttest.rb:110:in `matches'
    ./restricttest.rb:109:in `any?'
    ./restricttest.rb:109:in `each'

The problem will be on line 109, in the :any?, because :each now yields a single object where the block given to :any? expects a pair. The code is:

    def matches(a_record)
      any? { |selector_record, selector_scope |
        match(a_record, selector_record)
      }
    end

As expected. We need to expect a ScopedElement here, and then unwrap it. I’ll code:

    def matches(a_record)
      any? { | scoped_element |
        match(a_record, scoped_element.element)
      }
    end

I decided to call the method that returns the element part :element, which makes more sense than :record. We need to implement:

  class ScopedElement
    attr_reader :element, :scope
    def initialize(element, scope)
      @element = element
      @scope = scope
    end
  end
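
As an aside, here is a rough sketch of why yielding one object should suit :inject and its relatives. TinySet is a hypothetical stand-in for XSet, written only for this illustration: it includes Enumerable (as XSet presumably does, given the :any? and :all? calls in the traces) and keeps its records as plain strings rather than objects with a contents method.

```ruby
class ScopedElement
  attr_reader :element, :scope

  def initialize(element, scope)
    @element = element
    @scope = scope
  end
end

# TinySet is a hypothetical stand-in for XSet, not the article's class.
# Its records are plain strings rather than objects with a contents method.
class TinySet
  include Enumerable

  def initialize(record_length, contents)
    @record_length = record_length
    @contents = contents
  end

  # Yield exactly one object per record, so every Enumerable method
  # sees the element and its scope together.
  def each
    record_count = @contents.length / @record_length
    (0...record_count).each do |scope|
      record = @contents[scope * @record_length, @record_length]
      yield ScopedElement.new(record, scope)
    end
  end
end

set = TinySet.new(4, "123 234 132 ")

# :inject receives the whole ScopedElement as its second block argument.
summary = set.inject("") do |memo, scoped_element|
  memo + "#{scoped_element.scope}:#{scoped_element.element.strip} "
end

# :any? receives the same kind of object.
has_234 = set.any? { |scoped_element| scoped_element.element == "234 " }
```

Because only one object travels through the yield, nothing depends on how a particular Enumerable method treats extra yielded values.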

Now the tests blow up with messages like this:

  2) Error:
test_name_restrict(TC_MyTest):
NoMethodError: undefined method `contains?' for #<ScopedElement:0x2ae9ee8>
    ./restricttest.rb:125:in `subset?'
    ./restricttest.rb:124:in `all?'
    ./restricttest.rb:124:in `each'
    ./restricttest.rb:124:in `all?'
    ./restricttest.rb:124:in `subset?'
    ./restricttest.rb:115:in `match'

The relevant code is:

    def restrict(selector)
      result_contents = ""
      each do |my_record, ignored_scope |
        if selector.matches(my_record)
          result_contents << my_record.contents
        end
      end
      XSet.new(@record_length, result_contents)
    end

We need to accommodate a ScopedElement here:

    def restrict(selector)
      result_contents = ""
      each do | scoped_element |
        if selector.matches(scoped_element.element)
          result_contents << scoped_element.element.contents
        end
      end
      XSet.new(@record_length, result_contents)
    end

That makes all the tests work except for test_each, which was failing differently from all the others. Let’s reflect on what has just happened. We changed the return value from :each, and just let the tests point us to all the places where we used :each or its derivatives like all? or any?, and plugged in the relevant changes. Typical refactoring situation: change the code, follow the tests. I feel good about this. Now what about that test_each?

  1) Failure:
test_each(TC_MyTest) [./restricttest.rb:67]:
<"Anderson    Ann "> expected but was
<"">.

Based on the test:

    def test_each
      name_data = "Jeffries    Ron Hendrickson ChetAnderson    Ann Johnson     Lee "
      input = XSet.new(16, name_data)
      ann = ""
      input.each do 
        | record, index | 
        if (index==2) 
          ann = record.contents 
        end
      end
      assert_equal("Anderson    Ann ", ann)
      chet, chet_index = input.detect { | record, index | record.contents.include? "Chet" }
      assert_equal("Hendrickson Chet", chet.contents)
    end

This is straightforward: again we have an each situation, so we need to recode the test to use our new feature. In retrospect, it might have been better to recode this test first, to drive out the insertion of ScopedElement, but I didn’t think of that. We’re here now. Let’s just fix it. The key point of this test is that scope 2 holds Ann Anderson’s record. Since that was confusing to at least one reader, I’ll try to make it more explicit in the test that the scope is important.

    def test_each_using_scope
      name_data = "Jeffries    Ron Hendrickson ChetAnderson    Ann Johnson     Lee "
      input = XSet.new(16, name_data)
      ann = ""
      input.each do 
        | scope_element | 
        if (scope_element.scope==2) 
          ann = scope_element.element.contents 
        end
      end
      assert_equal("Anderson    Ann ", ann)
      chet_scope_element = input.detect { 
        | scope_element | 
        scope_element.element.contents.include? "Chet" }
      assert_equal("Hendrickson Chet", chet_scope_element.element.contents)
    end

The tests run! Note that I renamed the method to :test_each_using_scope. The use of “each do” there in the search for ann is weak code, because :detect, as shown below in the chet test, is better Ruby. But we wrote that test to test our initial implementation of :each. Better leave it that way. But let’s break out the second test into a test of its own:

    def test_each_using_scope
      name_data = "Jeffries    Ron Hendrickson ChetAnderson    Ann Johnson     Lee "
      input = XSet.new(16, name_data)
      ann = ""
      input.each do 
        | scope_element | 
        if (scope_element.scope==2) 
          ann = scope_element.element.contents 
        end
      end
      assert_equal("Anderson    Ann ", ann)
    end

    def test_detect
      name_data = "Jeffries    Ron Hendrickson ChetAnderson    Ann Johnson     Lee "
      input = XSet.new(16, name_data)
      chet_scope_element = input.detect { 
        | scope_element | 
        scope_element.element.contents.include? "Chet" }
      assert_equal("Hendrickson Chet", chet_scope_element.element.contents)
    end
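
As an aside on the remark that :detect is better Ruby than “each do”, the two styles might compare roughly as follows. This is only a sketch under assumptions: a plain Array of Struct-based stand-ins replaces the XSet, and each element is a bare string rather than a record with a contents method.

```ruby
# Struct-based stand-in for the article's ScopedElement class.
ScopedElement = Struct.new(:element, :scope)

records = ["Jeffries    Ron ", "Hendrickson Chet",
           "Anderson    Ann ", "Johnson     Lee "]

# A plain Array plays the part of the XSet here.
input = records.each_with_index.map do |record, scope|
  ScopedElement.new(record, scope)
end

# The "each do" style: visit every element and remember the match.
ann = ""
input.each do |scoped_element|
  ann = scoped_element.element if scoped_element.scope == 2
end

# The :detect style: return the first element the block accepts.
found = input.detect { |scoped_element| scoped_element.scope == 2 }
ann_via_detect = found.element
```

Both arrive at the same record. The :detect form says what is wanted and stops at the first match, while the “each do” form exercises :each directly, which is what the original test was written to do.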

Tests run and that’s a bit more clear. Constant Reader CW asked whether there isn’t getting to be too much duplication in the tests, and I agree that there is. I allow quite a bit more duplication in tests than I would in production code, but in this case it’s going too far. I think the trick might be to pull out the duplicate inits, at least these repeated tests on the name data. After that, the tests look like this:

  class TC_MyTest < Test::Unit::TestCase

    def setup
      name_data = "Jeffries    Ron Hendrickson ChetAnderson    Ann Johnson     Lee "
      @name_set = XSet.new(16, name_data)
      @five_records = XSet.new(4, "123 234 132 342 abc ")
    end

    def test_cardinality
      assert_equal(5, @five_records.cardinality)
    end

    def test_record_bytes
      input = XSet.new(1,"abcdef")
      assert_equal("b", input.record(1).contents)
    end

    def test_record_range
      assert_equal(0...5, @five_records.record_range)
    end

    def test_record
      assert_equal("132 ", @five_records.record(2).contents)
    end

    def test_match
      select = XSet.new(1,"1")
      assert(@five_records.match(@five_records.record(2), select.record(0)))    
    end

    def test_restrict
      select = XSet.new(1,"1")
      expected = "123 132 "
      result = @five_records.restrict(select)
      assert_equal(expected,result.contents)
    end

    def test_name_restrict
      select_data = "HendricksonJeffries   "
      select = XSet.new(11, select_data)
      expected = "Jeffries    Ron Hendrickson Chet"
      result = @name_set.restrict(select)
      assert_equal(expected, result.contents)
    end

    def test_single_selection
      select_data = "Jeffries   Jeffries   "
      select = XSet.new(11, select_data)
      expected = "Jeffries    Ron "
      result = @name_set.restrict(select)
      assert_equal(expected, result.contents)
    end

    def test_each_using_scope
      ann = ""
      @name_set.each do 
        | scope_element | 
        if (scope_element.scope==2) 
          ann = scope_element.element.contents 
        end
      end
      assert_equal("Anderson    Ann ", ann)
    end

    def test_detect
      chet_scope_element = @name_set.detect { 
        | scope_element | 
        scope_element.element.contents.include? "Chet" }
      assert_equal("Hendrickson Chet", chet_scope_element.element.contents)
    end

#    def test_firstname_restrict
#      name_data = "Jeffries    Ron Hendrickson ChetAnderson    Ann Johnson     Lee "
#      input = XSet.new(16, name_data)
#      select_data = "Ron Lee "
#      select = XSet.new(4, select_data)
#      expected = "Jeffries    Ron Johnson     Lee "
#      result = input.restrict(select)
#      assert_equal(expected, result.contents)
#    end
  end

That’s a bit cleaner. I put the @name_set and @five_records sets into setup, and used the @five_records set in a few tests where another set was being used. I decided to leave the commented test alone, because it’s just there to remind me of something that I might want to address, the ability to do a restrict that doesn’t start at byte zero.

Reflection

Good stuff. We capitalized on our learning from the offline spike, adding a little object to encapsulate element and scope. Then we cleaned up the tests, making the world a better place. That’s enough for now … more next time!

Thanks for tuning in!


  1. I’m going to experiment with tagging method names with a colon, signifying that they are symbols, to set them off in the text without a lot of glitz. Let me know if that’s a bad idea.