Our previous work left me thinking about whether there would be lots of redundant Scope Transform bytes needed in all the records. I chatted with Chet about it, and thought about it a bit more, and I think we're OK. This article will bring you up to date on the thinking. No new code, but I will include the complete listings as of now just to keep you up to speed.

The Big Picture

Chet and I were chatting at lunch today about what’s going on here, and about the difficulty of building a “framework” without specific stories. What I usually recommend with respect to a framework is that we write a real application and factor out everything that looks like a framework.

We’re looking at a different situation here. We represent some company with a technical invention (Extended Set Theory, in this case), and we’re trying to figure out two things simultaneously: whether we can use it effectively and what product to build with it. Right now, though these articles have gone on for days, there’s still probably less than two days’ real programming in the code, so it’s not like we’re over-investing in up-front work, even if you don’t take into account that we actually have running code.

This whole project can be thought of as “Research and Development”. An important aspect of that is that while the research is going on … so is the development. R&D more commonly means Research … and then after a long time … Development.

But enough philosophy. I’m here to kill some alligators.

Mapping Considerations

Yesterday (and Saturday night), we did that little ShiftedRecord object, to explore how bytes might be slud[1] over to line up with bytes in other parts of other records. It wasn’t hard to make the test work, and we might not even be far away from being able to make this one work:

#    def test_firstname_restrict
#      name_data = "Jeffries    Ron Hendrickson ChetAnderson    Ann Johnson     Lee "
#      input = XSet.new(16, name_data)
#      select_data = "Ron Lee "
#      select = XSet.new(4, select_data)
#      expected = "Jeffries    Ron Johnson     Lee "
#      result = input.restrict(select)
#      assert_equal(expected, result.contents)
#    end

We can’t quite make it work, because the test as written has no way to indicate that we intend the “Ron” and “Lee” to line up with byte 12 rather than 0. But we have a decent technical start on the underlying implementation. No hurry on that, and we’re getting closer.
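
To make the missing piece concrete, here is a minimal sketch of what "lining up with byte 12" could look like on plain strings. The helpers matches_at? and restrict_at are hypothetical names invented for this illustration; they are not part of XSet, and the offset is passed in by hand, which is exactly the information the test above has no way to express yet.

```ruby
# Hypothetical helpers, not part of the article's XSet or ShiftedRecord.

# True if `fragment` appears in `record` starting at byte `offset`.
def matches_at?(record, offset, fragment)
  record[offset, fragment.length] == fragment
end

# Keep only the fixed-length records that match any fragment at `offset`.
def restrict_at(contents, record_length, offset, fragments)
  records = contents.scan(/.{#{record_length}}/m)
  records.select { |record|
    fragments.any? { |fragment| matches_at?(record, offset, fragment) }
  }.join
end

name_data = "Jeffries    Ron Hendrickson ChetAnderson    Ann Johnson     Lee "
result = restrict_at(name_data, 16, 12, ["Ron ", "Lee "])
# result => "Jeffries    Ron Johnson     Lee "
```

The offset 12 is the same number the ShiftedRecord test uses. The open question is where that number should live, which is what the rest of this article chews on.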

I was thinking today about the ShiftedRecord object and some issues with it. The ShiftedRecord has at least one serious drawback. It seems to imply that the offset and length are right there as part of the data. (That’s not required by the object, but it is certainly the way I was thinking and the way I described it.) As I discussed in the preceding article, including all those offsets and lengths would be redundant. It would also constitute duplication and it would be repetitive[2]. In particular, our plain flat string implementation has an implied ScopeTransform of “identity”, i.e. [0, 1, … ] or { 0^0, 1^1, … }. We wouldn’t want to have to put an indicator of that in every record: it would be wasteful[3].

So I was thinking. There are some sets where every record has the same identity Scope Transform. There are others where every record has the same non-identity transform. And there are some where each record has its own transform … and surely some in between. Therefore …

What we might want is a single kind of set that could support all these notions. It would include two separate parts: a data part (a string or a slice of one) and a map. The string slice might change as we increment forward record by record, as in :each. The map would change never, seldom, or all the time, depending on the needs of the set.

Speculating just a bit further, the Scope Transform map might change based on a map-changing strategy:

  • Flat Unmapped Set: always identity Scope Transform;
  • Flat Mapped Set: always some constant Scope Transform;
  • Each Record Unique: reset Scope Transform on every record.
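
Just to show that the three strategies above could hang together, here is a rough sketch. Every class name in it (MappedSet, IdentityMapStrategy, ConstantMapStrategy, PerRecordMapStrategy) is made up for this illustration and appears nowhere in the current code; it is one possible shape under those assumptions, not a design decision.

```ruby
# Hypothetical sketch: none of these classes exist in the article's code.
# A set is a data part (a flat string) plus a strategy that says which
# scope each byte of a given record belongs to.

# Flat Unmapped Set: identity map, computed on demand, nothing stored.
class IdentityMapStrategy
  def map_for(_record_index, length)
    (0...length).to_a
  end
end

# Flat Mapped Set: one constant map shared by every record.
class ConstantMapStrategy
  def initialize(map)
    @map = map
  end

  def map_for(_record_index, _length)
    @map
  end
end

# Each Record Unique: a separate map is looked up for every record.
class PerRecordMapStrategy
  def initialize(maps)
    @maps = maps
  end

  def map_for(record_index, _length)
    @maps[record_index]
  end
end

class MappedSet
  def initialize(element_length, contents, strategy = IdentityMapStrategy.new)
    @element_length = element_length
    @contents = contents
    @strategy = strategy
  end

  # Returns [scope, byte] pairs for one record.
  def scoped_bytes(record_index)
    slice = @contents[record_index * @element_length, @element_length]
    map = @strategy.map_for(record_index, @element_length)
    map.zip(slice.chars)
  end
end

flat = MappedSet.new(4, "Ron Lee ")
shifted = MappedSet.new(4, "Ron Lee ", ConstantMapStrategy.new([12, 13, 14, 15]))
flat.scoped_bytes(1)     # => [[0, "L"], [1, "e"], [2, "e"], [3, " "]]
shifted.scoped_bytes(1)  # => [[12, "L"], [13, "e"], [14, "e"], [15, " "]]
```

In this sketch the data string is identical in both sets; only the strategy object differs, and the identity case stores no map at all.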

Hey! Isn’t this YAGNI?? Well, no. As we’ve discussed, YAGNI was created to keep us from building things before their time, not to keep us from thinking. Thinking is good. We’re just chatting here. In fact, there’s value to a limited amount of speculation about what we might do – it gives us confidence. There is a big difference between knowing no way to do something and knowing one way. There is a lesser difference, but an important one, in knowing a few good ways to do something. When we know how we might do something, we’ve moved from “might be impossible” to “might be ugly”. That’s a big step.

In this case, it’s a bigger step. The thinking has helped me to resolve a concern that was growing in my mind, about whether there need to be overhead bytes packed into all the records. The sketch of an idea described here tells me that we can probably have no overhead at all, in most sets, and have descriptive overhead only where we need it, in sets of complex structure.

Now, the footnotes, then the code for reference. See you next time!


  1. Slud: past tense of slide, according to Dizzy Dean.

  2. This sort of thing is what I use in lieu of humor. My apologies.

  3. See 2.

Appendix: Current Code

  require 'test/unit'

  class TC_MyTest < Test::Unit::TestCase

    def setup
      name_data = "Jeffries    Ron Hendrickson ChetAnderson    Ann Johnson     Lee "
      @name_set = XSet.new(16, name_data)
      @five_element_set = XSet.new(4, "123 234 132 342 abc ")
    end

    def test_cardinality
      assert_equal(5, @five_element_set.cardinality)
    end

    def test_one_byte_record
      input = XSet.new(1,"abcdef")
      assert_equal("b", input.element(1).element(0))
    end

    def test_record_bytes
      johnson = @name_set.element(3)
      assert_equal(2, @name_set.rank)
      assert_equal(1, johnson.rank)
      assert_equal("J", johnson.element(0))
    end

    def test_element_range
      assert_equal(0...5, @five_element_set.element_range)
    end

    def test_element_extraction
      assert_equal("132 ", @five_element_set.element(2).contents)
    end

    def test_restrict
      select = XSet.new(1,"1")
      expected = "123 132 "
      result = @five_element_set.restrict(select)
      assert_equal(expected,result.contents)
    end

    def test_name_restrict
      select_data = "HendricksonJeffries   "
      select = XSet.new(11, select_data)
      expected = "Jeffries    Ron Hendrickson Chet"
      result = @name_set.restrict(select)
      assert_equal(expected, result.contents)
    end

    def test_single_selection
      select_data = "Jeffries   Jeffries   "
      select = XSet.new(11, select_data)
      expected = "Jeffries    Ron "
      result = @name_set.restrict(select)
      assert_equal(expected, result.contents)
    end

    def test_each_using_scope
      ann = ""
      @name_set.each do 
        | scope_element | 
        if (scope_element.scope==2) 
          ann = scope_element.element.contents 
        end
      end
      assert_equal("Anderson    Ann ", ann)
    end

    def test_detect
      chet_scope_element = @name_set.detect { 
        | scope_element | 
        scope_element.element.contents.include? "Chet" }
      assert_equal("Hendrickson Chet", chet_scope_element.element.contents)
    end

    def test_rank
      assert_equal(2, @name_set.rank)
    end

    def test_element_rank
      element = @name_set.element(2)
      assert_equal(1, element.rank)
    end

    def test_shifted_record
      r = XSet.new(1, "Hendrickson Chet", 1)
      chet = ShiftedRecord.new(12,4,"Chet")
      ron = ShiftedRecord.new(12,4,"Ron ")
      assert(chet.subset?(r), "Chet sought but not found")
      assert(!ron.subset?(r), "Ron incorrectly found")
    end

#    def test_firstname_restrict
#      name_data = "Jeffries    Ron Hendrickson ChetAnderson    Ann Johnson     Lee "
#      input = XSet.new(16, name_data)
#      select_data = "Ron Lee "
#      select = XSet.new(4, select_data)
#      expected = "Jeffries    Ron Johnson     Lee "
#      result = input.restrict(select)
#      assert_equal(expected, result.contents)
#    end
  end

  class ShiftedRecord
    def initialize(offset, length, string)
      @offset = offset
      @length = length
      @string = string
    end

    def subset? set
      each do  | se |
        if ( set.element(se.scope) != se.element ) 
          return false
        end
      end
      return true
    end

    def each
      for index in 0...@length
        yield ScopedElement.new(@string[index,1], index+@offset)
      end
    end
  end

  class XSet
    include Enumerable
    attr_reader :contents

    def initialize(element_length, contents, rank=2)
      @element_length = element_length
      @contents = contents
      @rank = rank
    end

    def each
      for scope in element_range
        yield ScopedElement.new(element(scope), scope)
      end
    end

    def restrict(selector)
      matching_scopes = []
      each do | scoped_element |
        if selector.matches(scoped_element)
          matching_scopes << scoped_element.scope
        end
      end
      ScopeTransform.new(self, matching_scopes)
    end

    def matches(a_scoped_element)
      any? { | scoped_element |
        match(a_scoped_element, scoped_element)
      }
    end

    def match(my_scoped_element, selector_scoped_element)
      selector_scoped_element.element.subset?(my_scoped_element.element)
    end

    def subset?(larger_set)
      element_range.all? { | scope |
        larger_set.contains?(element(scope), scope)
      }
    end

#    def subset? set
#      each do  | se |
#        if ( set.element(se.scope) != se.element ) 
#          return false
#        end
#      end
#      return true
#    end

    def element(scope)
      element_contents = @contents[scope*@element_length,@element_length]
      if (@rank > 1)
        return XSet.new(1,element_contents, self.rank-1)
      else
        return element_contents
      end
    end

    def contains?(an_element, scope)
      element(scope) == an_element
    end

    def element_range
      0...cardinality
    end

    def cardinality
      @contents.length / @element_length
    end

    def rank
      @rank
    end
  end

  class ScopedElement
    attr_reader :element, :scope
    def initialize(element, scope)
      @element = element
      @scope = scope
    end

    def to_s
      "SE#{@scope}=>#{@element}"
    end
  end

  class ScopeTransformTest < Test::Unit::TestCase
    def test_select_two_records
      input = XSet.new(4, "1111222233334444")
      trans = ScopeTransform.new(input, [ 1, 2 ])
      assert_equal(input.element(1).contents, trans.element(0).contents)
      assert_equal(input.element(2).contents, trans.element(1).contents)
    end

    def test_reverse_two_records
      input = XSet.new(4, "1111222233334444")
      trans = ScopeTransform.new(input, [ 3, 1 ])
      assert_equal(input.element(3).contents, trans.element(0).contents)
      assert_equal(input.element(1).contents, trans.element(1).contents)      
    end
  end

  class ScopeTransform
    def initialize(set, array)
      @base_set = set
      @map = array
    end

    def element(scope)
      @base_set.element(@map[scope])
    end

    def contents
      result_string = ""
      @map.each do | scope |
        result_string << @base_set.element(scope).contents
      end
      result_string
    end
  end

  class HashExperiment < Test::Unit::TestCase  
    def test_hash
      h = { :LastName=>"Jeffries", :FirstName=>"Ron" }
      assert_equal("Jeffries", h[:LastName])
      s = [ { :LastName=>"Jeffries", :FirstName=>"Ron" },
            { :LastName=>"Hendrickson", :FirstName=>"Chet" } ]
      assert_equal("Chet", s[1][:FirstName])
    end

    def test_mixed_set
      s = [ { :LastName=>"Jeffries", :FirstName=>"Ron" },
            { :Age=>35 } ]
      assert_equal( 35, s[1][:Age])
    end
  end