Some cleanup efforts lead us to a theoretical idea, and to a successful spike. Out of chaos, order.

Examining the Code

I didn’t include a full listing in the previous article, because I didn’t review the full code base at that time. Here’s one now:

  class TC_MyTest < Test::Unit::TestCase

    def setup
      name_data = 
        "Jeffries    Ron Hendrickson ChetAnderson    Ann Johnson     Lee "
      @name_set = XSet.new(16, name_data)
      @five_element_set = XSet.new(4, "123 234 132 342 abc ")
    end

    def test_cardinality
      assert_equal(5, @five_element_set.cardinality)
    end

    def test_one_byte_record
      input = XSet.new(1,"abcdef")
      assert_equal("b", input.element(1).contents)
    end

    def test_record_bytes
      johnson = @name_set.element(3);
      assert_equal(2, @name_set.rank)
      assert_equal(1, johnson.rank)
      assert_equal("J", johnson.element(0))
    end

    def test_element_range
      assert_equal(0...5, @five_element_set.element_range)
    end

    def test_element_extraction
      assert_equal("132 ", @five_element_set.element(2).contents)
    end

    def test_match
      select = XSet.new(1,"1")
      assert(@five_element_set.match(@five_element_set.element(2), select.element(0)))    
    end

    def test_restrict
      select = XSet.new(1,"1")
      expected = "123 132 "
      result = @five_element_set.restrict(select)
      assert_equal(expected,result.contents)
    end

    def test_name_restrict
      select_data = "HendricksonJeffries   "
      select = XSet.new(11, select_data)
      expected = "Jeffries    Ron Hendrickson Chet"
      result = @name_set.restrict(select)
      assert_equal(expected, result.contents)
    end

    def test_single_selection
      select_data = "Jeffries   Jeffries   "
      select = XSet.new(11, select_data)
      expected = "Jeffries    Ron "
      result = @name_set.restrict(select)
      assert_equal(expected, result.contents)
    end

    def test_each_using_scope
      ann = ""
      @name_set.each do 
        | scope_element | 
        if (scope_element.scope==2) 
          ann = scope_element.element.contents 
        end
      end
      assert_equal("Anderson    Ann ", ann)
    end

    def test_detect
      chet_scope_element = @name_set.detect { 
        | scope_element | 
        scope_element.element.contents.include? "Chet" }
      assert_equal("Hendrickson Chet", chet_scope_element.element.contents)
    end

    def test_rank
      assert_equal(2, @name_set.rank)
    end

    def test_element_rank
      element = @name_set.element(2)
      assert_equal(1, element.rank)
    end

#    def test_firstname_restrict
#      name_data = "Jeffries    Ron Hendrickson ChetAnderson    Ann Johnson     Lee "
#      input = XSet.new(16, name_data)
#      select_data = "Ron Lee "
#      select = XSet.new(4, select_data)
#      expected = "Jeffries    Ron Johnson     Lee "
#      result = input.restrict(select)
#      assert_equal(expected, result.contents)
#    end
  end

   class XSet
     include Enumerable
     attr_reader :contents

    def initialize(element_length, contents, rank=2)
      @element_length = element_length
      @contents = contents
      @rank = rank
    end

    def each
      for scope in element_range
        yield ScopedElement.new(element(scope), scope)
      end
    end

    def restrict(selector)
      result_contents = ""
      each do | scoped_element |
        if selector.matches(scoped_element.element)
          result_contents << scoped_element.element.contents
        end
      end
      XSet.new(@element_length, result_contents)
    end

    def matches(an_element)
      any? { | scoped_element |
        match(an_element, scoped_element.element)
      }
    end

    def match(my_element, selector_element)
      selector_element.subset?(my_element)
    end

    def element(scope)
      element_contents = @contents[scope*@element_length,@element_length]
      if (@rank > 1)
        return XSet.new(1,element_contents, self.rank-1)
      else
        return element_contents
      end
    end

    def subset?(larger_set)
      element_range.all? { | scope |
        larger_set.contains?(element(scope), scope)
      }
    end

    def contains?(an_element, scope)
      element(scope) == an_element
    end

    def element_range
      0...cardinality
    end

    def cardinality
      @contents.length / @element_length
    end

    def rank
      @rank
    end
  end

  class ScopedElement
    attr_reader :element, :scope
    def initialize(element, scope)
      @element = element
      @scope = scope
    end
  end

First I’ll just look around and see what I see. First thing is this:

    def restrict(selector)
      result_contents = ""
      each do | scoped_element |
        if selector.matches(scoped_element.element)
          result_contents << scoped_element.element.contents
        end
      end
      XSet.new(@element_length, result_contents)
    end

    def matches(an_element)
      any? { | scoped_element |
        match(an_element, scoped_element.element)
      }
    end

    def match(my_element, selector_element)
      selector_element.subset?(my_element)
    end

I notice that we pull the element part out of one ScopedElement in the :restrict method, and the other one in the :matches method. That seems asymmetrical. It’s also making me a little nervous that we have the notion of “element” meaning the value part of the ScopedElement, and the notion of the ScopedElement itself. Those words are too close together for comfort. But, I think they’re called “element” and “scope” in Childs’s work, so maybe I’ll have to live with it. In any case, that asymmetry bothers me. How about this:

    def restrict(selector)
      result_contents = ""
      each do | scoped_element |
        if selector.matches(scoped_element)
          result_contents << scoped_element.element.contents
        end
      end
      XSet.new(@element_length, result_contents)
    end

    def matches(a_scoped_element)
      any? { | scoped_element |
        match(a_scoped_element, scoped_element)
      }
    end

    def match(my_scoped_element, selector_scoped_element)
      selector_scoped_element.element.subset?(my_scoped_element.element)
    end

That at least moves all the .element stuff to the same place, at the bottom of the nest. All is good except that test_match doesn’t run, because now :match expects ScopedElements:

    def test_match
      select = XSet.new(1,"1")
      assert(@five_element_set.match(@five_element_set.element(2), select.element(0)))    
    end

It turns out, of course, that the method :element returns a raw element, not an element wrapped up with a scope. This may be a problem but it’s not what we’re here to do. We’ll adjust the test.

    def test_match
      select = XSet.new(1,"1")
      ignored = 999
      five_elt = ScopedElement.new(@five_element_set.element(2), ignored)
      select_elt = ScopedElement.new(select.element(0), ignored)
      assert(@five_element_set.match(five_elt, select_elt))    
    end

That runs. The test is pretty weird, but it’s very invasive. It’s also redundant: we just wrote it to drive the development of the match code. Quite possibly we should delete the test altogether, because match is tested by the restrict tests, and it’s not logically public anyway. Might be coming up on time to make some methods private, though that isn’t usually my style. Anyway, not right now, we’re here to look at the code.

First, though, I’ve talked myself into deleting that test. Poof, gone!

Strings, Sets, and Slices

One thing that doesn’t look quite right to me is this code:

    def restrict(selector)
      result_contents = ""
      each do | scoped_element |
        if selector.matches(scoped_element)
          result_contents << scoped_element.element.contents
        end
      end
      XSet.new(@element_length, result_contents)
    end

    def matches(a_scoped_element)
      any? { | scoped_element |
        match(a_scoped_element, scoped_element)
      }
    end

    def match(my_scoped_element, selector_scoped_element)
      selector_scoped_element.element.subset?(my_scoped_element.element)
    end

    def subset?(larger_set)
      element_range.all? { | scope |
        larger_set.contains?(element(scope), scope)
      }
    end

    def element(scope)
      element_contents = @contents[scope*@element_length,@element_length]
      if (@rank > 1)
        return XSet.new(1,element_contents, self.rank-1)
      else
        return element_contents
      end
    end

    def contains?(an_element, scope)
      element(scope) == an_element
    end

What bugs me isn’t entirely clear. Let’s think about it. The restrict operator itself is computing the contents of the result set, which it is doing by appending strings together. That’s potentially very inefficient, as the string will be copied and reallocated time and again. I’m also a bit troubled because we’re computing the contents way down at the string level and that isn’t as abstract as one might like.

Neither of these is a big concern, because these are very early days, and because we are, after all, experimenting with byte manipulation. But still it bothers me a bit.

Down toward the middle of all this, we have our call to subset?, which is certainly formally correct but is more abstract than the match operations that go before. The subset? method iterates over the would-be subset, and checks each element one at a time to see if the two sets match. The element operation itself creates a string (containing one character) and returns it. Ultimately the strings are compared. The code reads clearly enough, and we probably can’t reduce the number of objects created if we want to compare the characters one at a time. While it’s true that a generic subset? method would compare elements one at a time, we know we have strings here, and we could do better.

At the same time, I’m rather proud of the way that design dropped out of the eaches, anys, and alls. Certainly one way to go would be to move a bit back toward the byte-manipulation approach, since this set’s physical representation is definitely OneBigString. Now that the tests are in place, we could do that. But as I have thought about this, another idea comes to mind.

Scope Transformation

One of the operations of Extended Set Theory is called “Scope Transform”. There are two variations, Scope Transform, and Constituent Scope Transform. The math for these is too difficult to type into HTML, but the concept is pretty easy. Suppose we had two sets, like these:

S = { Jeffriesname, Pinckneytown } T = { PersonNamename, Locationtown }

Then ScopeTransform(S, T) = S(ST)T = { JeffriesPersonName, PinckneyLocation }

As you can see, the Scope Transform is a renaming operation on scopes. As with any XST operation, the values in the sets in question are arbitrary. It follows, for example, that

< a, b, c, d > (ST) < 3, 2, 1, 0 > = < d, c, b, a > (Hint: < 3, 2, 1, 0 > = { 30, 21, 12, 03 })

Viewed at a symbolic level – if we ever get there – Scope Transform is good for things like renaming fields. Viewed at the numeric level, in dealing with ordered ntuples like strings, Scope Transform reorders them. Looking at memory, Scope Transform moves it around. Scope Transformation can even do a kind of selection as well as reordering:

<a, b, c, d > (ST) { 30, 01 } = < d, a >

You may be wondering why, when moments ago we were talking about string comparisons, I’ve dragged us off into this Scope Transform area. The reason is that I’m seeing a possibility. Couldn’t we use this operation to produce a “map” of some data set (string), containing only the records of interest?

Suppose we have a collection of records, like the name set in our tests. We do a restriction on it, resulting in a subset consisting of a few of those records. One way to represent that result would be to copy the records in questions into a new set. Indeed, that’s what we’re doing now. We make a new string containing the records we selected.

Suppose instead that we created a “virtual” string, described by a Scope Transform on the original set, calling for mapping, say, the third record to the initial position, the tenth record to the next position, and so on? This set could be represented by a reference to the original set, and a vector of what amount to record numbers! Instead of creating a potentially large string and creating a set from it, we could create instead, a small set of integers, one per record selected!

It seems to me that this could be a very effective and efficient way to do things, and I’d like to explore it soon.

There is one interesting issue: a set built of record numbers pointing into another set is essentially a view. What happens if the records in the underlying set change? There are some interesting possibilities there, involving another set operation, Symmetric Difference. We’ll come to that by and by, if this series goes on long enough and successfully enough.

For now, I think we’ll let this code stand, even though it’s a bit troubling, and take a look at Scope Transform. The reason is an interesting application of the YAGNI principle.

YAGNI - You Aren't Gonna Need It

When I hear myself saying “We’re gonna need” something, thinking that we might as well do it now, I imagine a little voice saying “You aren’t gonna need it.” I could have imagined that the voice said “But not yet,” or “Maybe we aren’t gonna need it,” but I imagine this shoulder-imp standing in the Stop! position, its hand out, saying “You aren’t gonna need it”. When I start to do something because it will be needed in the future, I want to be stopped, not slowed down or diverted. The stop will give me time to think.

In this case, the :restrict method and its subordinates could use some cleaning up, and perhaps wants to be changed to be a bit more focused on the bytes of the string rather than quite so abstract. We always need the code to be clean, don’t we? Well, right now, we aren’t gonna need what we would do now, if this Scope Transform thing pans out. If Scope Transform pays off, we will be changing how :restrict works, to use ST, and we will also be creating a second physical type of set, a sort of Scope Transform view. All sets are required to be amenable to all set operations. Therefore a Scope Transform view must respond to :restrict. I’m not sure just what this will entail, but my guess is that we’ll wind up creating a superclass or a mixin to contain the basic set operations such as :restrict.

Therefore, I think it’s appropriate to hold off cleaning up :restrict until the Scope Transform spike has been done, and then decide what to do.

Am I right? Am I wrong? We’ll see. Either way, things don’t take long, so in a very short period of time I’ll learn something. If I were committing days, that would be one thing. A couple of hours before I have a better clue: that’s worth doing. So here goes:

Scope Transform Test

Begin with a test, of course. In this case, a whole new test class. I haven’t been troubling you with the file structure of this code yet, and I won’t. But the tests for this, it seems to me, will have a new setup and such, so I’m going to keep them in a separate class, and as it happens a separate file as well. As is my habit, I’ll start with a failing test named test_hookup:

class ScopeTransformTest < Test::Unit::TestCase
  def test_hookup
    assert(false)
  end
end

Just as I’d hoped, test_hookup fails, and the other dozen tests pass. We’re on the way. Let’s see, what shall we do? Let’s start with an easy one. I’ll build a simple XSet, then explicitly build a ScopeTransformSet to contain just a couple of its records. That doesn’t sound too daunting, and shouldn’t take long:

  def test_select_two_records
    input = XSet.new(4, "1111222233334444")
    trans = ScopeTransform.new(input, "1021")
    assert_equal(input.element(1).contents, result.element(0).contents)
    assert_equal(input.element(2).contents, result.element(1).contents)
  end

Let’s look at that so you can see some tentative design decisions. I’ve decided to represent the ScopeTransform as a set of ordered pairs of bytes, representing the source record and target record numbers. So the “1021” there means to put record 1 in position zero, record 2 in position 1. That means the contents of the transformed result should be “22223333”.

In the asserts, I chose not to put in the literals “2222” and so on, but to show the relationships of the result records to the input records using the element numbers.

Now even as I just talk about this (thanks for pairing with me), I don’t like that string input very well: we know that we have to support more than 256 records. But it was convenient to type, so for now I guess I’ll live with it. We need two things to make this test run. We need a ScopeTransform class, and an element method.

Enough chatter, let’s get to it.

  class ScopeTransform
    def initialize(set, string)
    end

    def element(scope)
      XSet.new(5, "wrong")
    end
  end

Now, in the interest of full disclosure, it took me a couple of quick passes to get here. First, I became confused in writing the test. The idea of the test is that the ScopeTransform is the result of applying a :scope_transform operation to an XSet. It’s doing to be a kind of XSet when we’re done. But after creating the ScopeTransform, I started to send it through a not-yet-existing :scope_transform operation on XSet. That wasn’t necessary: I’m creating this transform directory.

Then it took me a time or two to figure out just what the element operation was supposed to return. I tried a string (not entirely silly) and a ScopedElement, which was silly, but probably tells us that the :element operation on sets is a bit iffy because of the overloaded idea of just the element part, and the scoped element itself. We’ll table that for now. The test fails in the way I wanted:

  1) Failure:
test_select_two_records(ScopeTransformTest) [./scopetransformtest.rb:5]:
<"2222"> expected but was
<"wrong">.

Now it should be a Small Matter Of Programming to make it work. Unless that string trick was too weird. We need to save the input XSet (no problem), and then figure out how to decode the string to return the elements we want. And now I see why the string was wrong.

The simplest Scope Transform I can imagine would just take an ordered ntuple of indexes, and return a set with those records in it. So instead of “1021”, the input should be just “12”. We can get to more complex transforms with holes in them and such, later on. While I’m at it, I’m going to an array of integers. Rewrite the test: it still won’t run of course:

    def test_select_two_records
      input = XSet.new(4, "1111222233334444")
      trans = ScopeTransform.new(input, [ 1, 2 ])
      assert_equal(input.element(1).contents, trans.element(0).contents)
      assert_equal(input.element(2).contents, trans.element(1).contents)
    end

This continues to fail. But now (at last) the implementation looks simple enough for a man of my limited powers to do:

  class ScopeTransform
    def initialize(set, array)
     @base_set = set
     @map = array
    end

    def element(scope)
      @base_set.element(@map[scope])
    end
  end

And the test runs! Done! Ship it!

Well, maybe not yet. But we do have a new object, a ScopeTransform, that represents an ordered selection of records from another set. It only responds to element so far, not even :each, much less :restrict, but we could, if we choose, use it as the return set from our XSet:restrict operation, and maybe that’s what we’ll do next. For now, let’s wrap up.

Reflection

We began with some needed cleanup. We noticed that there was an asymmetry in the code, where the same thing (getting an element to compare) was done at two levels in the method nest. We decided to move the element calls together, which made the code seem better.

Then we looked more deeply at the whole :restrict nest, and thought about it. Now at that point, I pulled a new set-theoretic idea, Scope Transform, out of my pocket. The rest of the team (you) had no way to know that was coming. In fact, I had no way to know it was coming. It was the result of a combination of ingredients:

  • We are supposed to be trying to implement an Extended Set Theory system;
  • I happen to be the guy on the team who knows the most about XST;
  • We were reflecting on the practice of selecting records and building whole new sets of them, and how that wasn't all that efficient;
  • I realized that the XST notion of Scope Transform could represent what we were doing;
  • We decided to try to build a simple virtual set, ScopeTransform, to see whether it would be possible.

When you’re doing a project, that’s how teamwork is supposed to work. (And in fact I discussed a bunch of this with my team, namely Chet, though he hasn’t been pairing on the code with me. (My wife won’t let him come over at 4 AM, and his wife won’t let him leave home at that hour.))

Good teams don’t work singly, they work together. They reflect on the code they have, they think of the technical ideas they know, and they choose some to try. If they’re wise, they choose experiments like this one, which can be done in an hour or two. Now we know something that may be worth trying.

I think this was a good exercise. We scanned the code and improved it; we reflected on some troubling issues; we considered a fairly technical solution that might help; we experimented with that solution and learned quite a bit about it, all in just a few lines of code. I call that a good session.

Next time, we’ll do a bit of planning and decide on a new “iteration”. Thanks for dropping by!