Just for a little something to do, let's add iteration capability to our XSets. [Added: Clarification of something that threw at least one reader off.]

To "each" His Own

It says here that if you implement “each” and mix in Enumerable, you’ll get all those nifty collection methods. Let’s see how hard it is.

“Begin with a test.” (See footnote 1.)

    def test_each
      name_data = 
        "Jeffries    Ron Hendrickson ChetAnderson    Ann Johnson     Lee "
      input = XSet.new(16, name_data)
      ann = ""
      input.each do 
        | record, index | 
        if (index==2) 
          ann = record.contents 
        end
      end
      assert_equal("Anderson    Ann ", ann)
    end

This should get us going, though it’s a bit awkward. I’ll iterate over all the records, and when I get to number 2, save its contents, expecting that it will be the Ann Anderson record. (0, 1, 2, 3, get it?) Notice that my each is going to yield not one, but two values. That’s because with XST, I need to know both the value of the element and its scope. See footnote 2 for more on this. We’ll see whether that gets us in trouble.
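To see why record number 2 should be Ann's, it helps to picture the data as one string cut into fixed-width, 16-character records. Here is a minimal sketch using nothing but string slicing. The helper name record_at and the two constants are assumptions made for illustration, not part of XSet.

```ruby
# The name data from the test: four records, each exactly 16 characters wide.
NAME_DATA = "Jeffries    Ron Hendrickson ChetAnderson    Ann Johnson     Lee "
RECORD_LENGTH = 16

# Hypothetical helper: slice out the record at a given zero-based scope.
def record_at(data, record_length, scope)
  data[scope * record_length, record_length]
end

# How many whole records the string holds.
def record_count(data, record_length)
  data.length / record_length
end
```

Under those assumptions, record_at(NAME_DATA, RECORD_LENGTH, 2) picks out the characters starting at offset 32, which is the Ann Anderson record the test expects.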

The other things that got me in trouble were remembering that it’s “each do”, not just “each”, getting enough “end” statements in place, and realizing that I had to initialize the “ann” variable outside the loop if I wanted to see it in the assert. Only took a few moments. And I’m sure in another week or so, I’ll get my Ruby chops back.
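The point about initializing “ann” outside the loop is easy to trip over, so here is a small sketch of the scoping rule, kept separate from the XSet code. The method and variable names are made up for the illustration.

```ruby
# Returns the element at position 2, captured from inside a block.
def third_element(values)
  found = nil                        # exists before the block, so the block shares it
  values.each_with_index do |value, index|
    found = value if index == 2
  end
  found
end

# Reports whether a name first assigned inside a block is visible after it.
def block_local_visible_outside?
  [1, 2, 3].each do |value|
    inside_only = value              # first assigned here: local to the block
  end
  inside_only                        # unknown in this scope, so Ruby raises NameError
  true
rescue NameError
  false
end
```

The first method works because the variable already exists when the block is created. The second returns false, which is the behavior that forces the initialization in the test.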

The test fails, as expected. Here’s an implementation:

     def each
       for scope in record_range
         yield record(scope), scope
       end
     end

That seems easy enough. Loop over the scopes, return the corresponding record, and its scope. And the test runs!

Now for a sufficiently complex set, we’ll need a more complex each, but that will come in a kind of XSet class that we do not as yet have. But recall that { a⁰, b⁰ } is a perfectly good XSet. We’ll have to deal with that someday, when we get a class that works that way.

Now let’s try some other iterators:

    def test_each
      name_data = "Jeffries    Ron Hendrickson ChetAnderson    Ann Johnson     Lee "
      input = XSet.new(16, name_data)
      ann = ""
      input.each do 
        | record, index | 
        if (index==2) 
          ann = record.contents 
        end
      end
      assert_equal("Anderson    Ann ", ann)
      chet, chet_index = input.detect { | record, index | record.contents.include? "Chet" }
      assert_equal("Hendrickson Chet", chet.contents)
    end

We just add to our test to try the “detect” method. I expect this to fail, because I haven’t mixed in Enumerable yet … and it does, undefined method. So we’ll mix it in:

   class XSet
     include Enumerable
     attr_reader :contents
     ...
   end

The test runs as written. I glossed over one little glitch, though, which is that I forgot that chet_index variable the first time. Having the iterator return two values may turn out to be confusing. I don’t see much of an alternative, though, so for now I’ll just keep an eye on it.
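The chet_index glitch is worth seeing in isolation. The following is a minimal sketch, assuming a stand-in class (PairYielder, a hypothetical name) whose each yields two values the way XSet's does. It is not the XSet code itself.

```ruby
# Hypothetical stand-in for XSet: each yields an item and its position.
class PairYielder
  include Enumerable

  def initialize(items)
    @items = items
  end

  def each
    @items.each_with_index do |item, index|
      yield item, index              # two values per iteration
    end
  end
end

# Find the first name containing the given fragment.
def find_name(names, fragment)
  # detect hands back both yielded values packed into one array.
  names.detect { |name, _index| name.include?(fragment) }
end
```

With names = PairYielder.new(%w[Ron Chet Ann Lee]), the call find_name(names, "Chet") returns the pair ["Chet", 1]. Writing chet, chet_index = find_name(names, "Chet") unpacks it. Assigning to a single variable leaves you holding the whole array, which is the mistake that leaving out the second variable produces.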

Cool, iterators work. I could play with a few more, but for right now, I don’t see the need …

Let's Improve Our Code

What I do see is that I should be able to improve the code inside XSet to make use of the new iterator. Let’s take a look. Well, the biggie is this one:

     def restrict(selector)
       result_contents = ""
       for scope in record_range
         for selector_scope in selector.record_range
           if match(scope, selector, selector_scope)
             result_contents << record(scope).contents
             break
           end
         end
       end
       XSet.new(@record_length, result_contents)
     end

Pit Capitain was grousing about this one earlier. We should be able to use some iterators now. We want to visit each input record, and in the inner loop, we’re just trying to find out whether there is a match. That could be a call for any?, might it not? Let me try that:

     def restrict(selector)
       result_contents = ""
       for scope in record_range
         if selector.any? { 
           |selector_record, selector_scope |
           match(scope, selector, selector_scope)
         }
           result_contents << record(scope).contents
         end
       end
       XSet.new(@record_length, result_contents)
     end

That works! But it’s not pretty. Let’s extract a method, if we can, to encapsulate the meaning of that if:

     def restrict(selector)
       result_contents = ""
       for scope in record_range
         if any_selector_matches(scope, selector)
           result_contents << record(scope).contents
         end
       end
       XSet.new(@record_length, result_contents)
     end

     def any_selector_matches(scope, selector)
       selector.any? { 
         |selector_record, selector_scope |
         match(scope, selector, selector_scope)
       }
     end
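If you want to convince yourself that any? really does the job of the inner loop with its break, here is a sketch on plain strings and arrays. The method names are hypothetical, and start_with? merely stands in for match.

```ruby
# Inner loop with an explicit break, in the style of the original restrict.
def matches_with_loop?(candidate, selectors)
  matched = false
  for selector in selectors
    if candidate.start_with?(selector)
      matched = true
      break                          # stop at the first hit
    end
  end
  matched
end

# The same question asked with any?, which also stops at the first hit.
def matches_with_any?(candidate, selectors)
  selectors.any? { |selector| candidate.start_with?(selector) }
end
```

Both methods give the same answer for any candidate and selector list. The any? version simply says what is wanted instead of spelling out the flag and the break.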

Now there are two things on my list. One is to get rid of the “for scope” in the restrict. But the other is the way that match works:

     def match(scope, match_set, match_scope)
       record1 = self.record(scope)
       record2 = match_set.record(match_scope)
       record2.subset?(record1)
     end

Match is fetching the records every time. That’s wasteful and not terribly clear. We should be able to do better than that. I think that if I get rid of the “for scope”, that will cause me to pass the record itself into match. Let’s see:

     def restrict(selector)
       result_contents = ""
       each do |my_record, ignored_scope |
         if any_selector_matches(my_record, selector)
           result_contents << my_record.contents
         end
       end
       XSet.new(@record_length, result_contents)
     end

     def any_selector_matches(my_record, selector)
       selector.any? { 
         |selector_record, selector_scope |
         match(my_record, selector, selector_scope)
       }
     end

     def match(my_record, match_set, match_scope)
       record2 = match_set.record(match_scope)
       record2.subset?(my_record)
     end

I tried that. All the tests run, except for test_match. It’s blowing up by sending contains? to a Fixnum, specifically a 2. Huh. test_match says:

    def test_match
       input = XSet.new(4, "123 234 132 342 ")
       select = XSet.new(1,"1")
       assert(input.match(2, select, 0))    
    end

Ha, yes. Refactoring error: I didn’t change the test_match call to match to reflect that match now takes a record. The record it wants is, of course, record 2. Change the test:

    def test_match
       input = XSet.new(4, "123 234 132 342 ")
       select = XSet.new(1,"1")
       assert(input.match(input.record(2), select, 0))    
    end

And it runs! Good deal. But I’m still not happy with match. It’s taking one record in, and one set/scope combination. That’s not symmetric. Why not just pass the record? I’m going to have to change the test again to do that, and when I make that test work, other tests are going to break. I think I’ll do it the other way around, programming with intent to make just test_match break again:

     def restrict(selector)
       result_contents = ""
       each do |my_record, ignored_scope |
         if any_selector_matches(my_record, selector)
           result_contents << my_record.contents
         end
       end
       XSet.new(@record_length, result_contents)
     end

     def any_selector_matches(my_record, selector)
       selector.any? { 
         |selector_record, selector_scope |
         match(my_record, selector_record)
       }
     end

     def match(my_record, selector_record)
       selector_record.subset?(my_record)
     end

Works as planned. The test_match fails for wrong number of arguments. We change:

    def test_match
       input = XSet.new(4, "123 234 132 342 ")
       select = XSet.new(1,"1")
       assert(input.match(input.record(2), select, 0))    
    end

To this:

    def test_match
       input = XSet.new(4, "123 234 132 342 ")
       select = XSet.new(1,"1")
       assert(input.match(input.record(2), select.record(0)))    
    end

This is looking pretty good. Let’s look at the whole XSet class:

   class XSet
     include Enumerable
     attr_reader :contents

     def initialize(record_length, contents)
       @record_length = record_length
       @contents = contents
     end

     def each
       for scope in record_range
         yield record(scope), scope
       end
     end

     def restrict(selector)
       result_contents = ""
       each do |my_record, ignored_scope |
         if any_selector_matches(my_record, selector)
           result_contents << my_record.contents
         end
       end
       XSet.new(@record_length, result_contents)
     end

     def any_selector_matches(my_record, selector)
       selector.any? { |selector_record, selector_scope |
         match(my_record, selector_record)
       }
     end

     def match(my_record, selector_record)
       selector_record.subset?(my_record)
     end

     def record(scope)
       record_contents = @contents[scope*@record_length,@record_length]
       XSet.new(1,record_contents)
     end

     def subset?(larger_set)
       record_range.all? { | scope |
         larger_set.contains?(record(scope), scope)
       }
     end

     def contains?(a_record, scope)
       record(scope).contents == a_record.contents
     end

     def record_range
       0...cardinality
     end

     def cardinality
       @contents.length / @record_length
     end
   end

A review of all the code in a class is always a good way to notice things. I’m troubled by this:

     def restrict(selector)
       result_contents = ""
       each do |my_record, ignored_scope |
         if any_selector_matches(my_record, selector)
           result_contents << my_record.contents
         end
       end
       XSet.new(@record_length, result_contents)
     end

     def any_selector_matches(my_record, selector)
       selector.any? { |selector_record, selector_scope |
         match(my_record, selector_record)
       }
     end

The issue is that any_selector_matches is a “utility method”. It doesn’t access any of the member variables of the class, and the only message it sends to self is match, which is itself a utility. What’s the problem? The behavior is living on the wrong object. We should ask the selector, an XSet, whether it matches our record:

     def restrict(selector)
       result_contents = ""
       each do |my_record, ignored_scope |
         if selector.matches(my_record)
           result_contents << my_record.contents
         end
       end
       XSet.new(@record_length, result_contents)
     end

     def matches(a_record)
       any? { |selector_record, selector_scope |
         match(a_record, selector_record)
       }
     end

Much better. Instead of asking ourselves whether the selector set matches our record, we ask the selector set whether it matches our record. That’s an improvement. I’m a little troubled now by the similarity of names in methods “matches” and “match”. I’ll reflect on that later. My tests are green, and I’m tired. Time for a rest. Let’s reflect, then get out of here.

Reflection

We started with a small implementation, with an eye to improving the code. We wrote a test for “each”, implemented it, and then also tested “detect”. Very fine.

Then we looked for ways to use our new ability to iterate cleanly to improve the existing code. That led us to improve the restrict method profoundly, from this:

     def restrict(selector)
       result_contents = ""
       for scope in record_range
         for selector_scope in selector.record_range
           if match(scope, selector, selector_scope)
             result_contents << record(scope).contents
             break
           end
         end
       end
       XSet.new(@record_length, result_contents)
     end

Ultimately all the way to this:

     def restrict(selector)
       result_contents = ""
       each do |my_record, ignored_scope |
         if selector.matches(my_record)
           result_contents << my_record.contents
         end
       end
       XSet.new(@record_length, result_contents)
     end

Along the way, we used the any? iterator, extracted a method to clean up restrict, and simplified restrict further by using the iterator “each”. Then we cleaned up the calling sequence in match, improving it from this:

     def match(scope, match_set, match_scope)
       record1 = self.record(scope)
       record2 = match_set.record(match_scope)
       record2.subset?(record1)
     end

To this:

     def match(my_record, selector_record)
       selector_record.subset?(my_record)
     end

All in all, a noticeable improvement. Perhaps most interesting is that I didn’t get into any deep trouble anywhere along the way. The tests kept me safe, and I remembered to write a new one when I needed it.
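For anyone who wants to run the end state, here is a sketch that assembles the final methods shown above into one class. The method bodies follow the listings in this article, lightly condensed. The restrict check in the note after the code is an assumption built from the test_match data, since the article does not show its own restrict test.

```ruby
class XSet
  include Enumerable
  attr_reader :contents

  def initialize(record_length, contents)
    @record_length = record_length
    @contents = contents
  end

  # Yield each record together with its scope (its position in the set).
  def each
    record_range.each { |scope| yield record(scope), scope }
  end

  # Keep the records of this set that at least one selector record matches.
  def restrict(selector)
    result_contents = String.new     # mutable buffer for the kept records
    each do |my_record, _scope|
      result_contents << my_record.contents if selector.matches(my_record)
    end
    XSet.new(@record_length, result_contents)
  end

  def matches(a_record)
    any? { |selector_record, _scope| match(a_record, selector_record) }
  end

  def match(my_record, selector_record)
    selector_record.subset?(my_record)
  end

  def record(scope)
    XSet.new(1, @contents[scope * @record_length, @record_length])
  end

  def subset?(larger_set)
    record_range.all? { |scope| larger_set.contains?(record(scope), scope) }
  end

  def contains?(a_record, scope)
    record(scope).contents == a_record.contents
  end

  def record_range
    0...cardinality
  end

  def cardinality
    @contents.length / @record_length
  end
end
```

With input = XSet.new(4, "123 234 132 342 ") and selector = XSet.new(1, "1"), calling input.restrict(selector).contents keeps the two records whose first character is "1", giving "123 132 ".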

Quite enough for one session. Ricia should be home soon, and it’s time for a little rest and then some dinner. See you next time!


  1. A little mantra that I try to remember to say before I start programming. It helps me remember … to begin with a test.

  2. Constant Reader CW observed that the use of the value 2 in test_each seems redundant, as does returning the scope from the each method. He proposed simplifying the code accordingly. That turns out not to be appropriate. The key difference between Classical Set Theory and Extended Set Theory is that elements of XSets have “scopes”, the little superscripts on top of the elements, as in, for example, { a⁰, b¹, c² }. The set {x¹} does not equal the set {x²}. In particular, the relations I’m working with at the moment are intended to be vectors (n-tuples) of records, and as such, each record needs its own unique scope index, starting from zero.
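The role of scopes can be sketched with Ruby's standard Set by modeling each scoped element as an [element, scope] pair. That modeling choice, and the helper names scoped_set and as_vector, are assumptions made for illustration only. They are not how the XSet class in this article stores its data.

```ruby
require "set"

# Build a set whose members are [element, scope] pairs.
def scoped_set(*pairs)
  Set.new(pairs)
end

# Treat a list of records as a vector: each record gets its own scope,
# counting up from zero.
def as_vector(records)
  Set.new(records.each_with_index.map { |record, scope| [record, scope] })
end
```

Under this model, scoped_set(["x", 1]) and scoped_set(["x", 2]) are different sets even though the element is the same, and as_vector(%w[a b]) differs from as_vector(%w[b a]) because the scopes record the order. That is why each hands the scope back along with the record.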