XST: Looking Around
Examining the Code
I didn’t include a full listing in the previous article, because I didn’t review the full code base at that time. Here’s one now:
class TC_MyTest < Test::Unit::TestCase
def setup
name_data =
"Jeffries Ron Hendrickson ChetAnderson Ann Johnson Lee "
@name_set = XSet.new(16, name_data)
@five_element_set = XSet.new(4, "123 234 132 342 abc ")
end
def test_cardinality
assert_equal(5, @five_element_set.cardinality)
end
def test_one_byte_record
input = XSet.new(1,"abcdef")
assert_equal("b", input.element(1).contents)
end
def test_record_bytes
johnson = @name_set.element(3);
assert_equal(2, @name_set.rank)
assert_equal(1, johnson.rank)
assert_equal("J", johnson.element(0))
end
def test_element_range
assert_equal(0...5, @five_element_set.element_range)
end
def test_element_extraction
assert_equal("132 ", @five_element_set.element(2).contents)
end
def test_match
select = XSet.new(1,"1")
assert(@five_element_set.match(@five_element_set.element(2), select.element(0)))
end
def test_restrict
select = XSet.new(1,"1")
expected = "123 132 "
result = @five_element_set.restrict(select)
assert_equal(expected,result.contents)
end
def test_name_restrict
select_data = "HendricksonJeffries "
select = XSet.new(11, select_data)
expected = "Jeffries Ron Hendrickson Chet"
result = @name_set.restrict(select)
assert_equal(expected, result.contents)
end
def test_single_selection
select_data = "Jeffries Jeffries "
select = XSet.new(11, select_data)
expected = "Jeffries Ron "
result = @name_set.restrict(select)
assert_equal(expected, result.contents)
end
def test_each_using_scope
ann = ""
@name_set.each do
| scope_element |
if (scope_element.scope==2)
ann = scope_element.element.contents
end
end
assert_equal("Anderson Ann ", ann)
end
def test_detect
chet_scope_element = @name_set.detect {
| scope_element |
scope_element.element.contents.include? "Chet" }
assert_equal("Hendrickson Chet", chet_scope_element.element.contents)
end
def test_rank
assert_equal(2, @name_set.rank)
end
def test_element_rank
element = @name_set.element(2)
assert_equal(1, element.rank)
end
# def test_firstname_restrict
# name_data = "Jeffries Ron Hendrickson ChetAnderson Ann Johnson Lee "
# input = XSet.new(16, name_data)
# select_data = "Ron Lee "
# select = XSet.new(4, select_data)
# expected = "Jeffries Ron Johnson Lee "
# result = input.restrict(select)
# assert_equal(expected, result.contents)
# end
end
class XSet
include Enumerable
attr_reader :contents
def initialize(element_length, contents, rank=2)
@element_length = element_length
@contents = contents
@rank = rank
end
def each
for scope in element_range
yield ScopedElement.new(element(scope), scope)
end
end
def restrict(selector)
result_contents = ""
each do | scoped_element |
if selector.matches(scoped_element.element)
result_contents << scoped_element.element.contents
end
end
XSet.new(@element_length, result_contents)
end
def matches(an_element)
any? { | scoped_element |
match(an_element, scoped_element.element)
}
end
def match(my_element, selector_element)
selector_element.subset?(my_element)
end
def element(scope)
element_contents = @contents[scope*@element_length,@element_length]
if (@rank > 1)
return XSet.new(1,element_contents, self.rank-1)
else
return element_contents
end
end
def subset?(larger_set)
element_range.all? { | scope |
larger_set.contains?(element(scope), scope)
}
end
def contains?(an_element, scope)
element(scope) == an_element
end
def element_range
0...cardinality
end
def cardinality
@contents.length / @element_length
end
def rank
@rank
end
end
class ScopedElement
attr_reader :element, :scope
def initialize(element, scope)
@element = element
@scope = scope
end
end
First I’ll just look around and see what I see. First thing is this:
def restrict(selector)
result_contents = ""
each do | scoped_element |
if selector.matches(scoped_element.element)
result_contents << scoped_element.element.contents
end
end
XSet.new(@element_length, result_contents)
end
def matches(an_element)
any? { | scoped_element |
match(an_element, scoped_element.element)
}
end
def match(my_element, selector_element)
selector_element.subset?(my_element)
end
I notice that we pull the element part out of one ScopedElement in the :restrict method, and the other one in the :matches method. That seems asymmetrical. It’s also making me a little nervous that we have the notion of “element” meaning the value part of the ScopedElement, and the notion of the ScopedElement itself. Those words are too close together for comfort. But, I think they’re called “element” and “scope” in Childs’s work, so maybe I’ll have to live with it. In any case, that asymmetry bothers me. How about this:
def restrict(selector)
result_contents = ""
each do | scoped_element |
if selector.matches(scoped_element)
result_contents << scoped_element.element.contents
end
end
XSet.new(@element_length, result_contents)
end
def matches(a_scoped_element)
any? { | scoped_element |
match(a_scoped_element, scoped_element)
}
end
def match(my_scoped_element, selector_scoped_element)
selector_scoped_element.element.subset?(my_scoped_element.element)
end
That at least moves all the .element stuff to the same place, at the bottom of the nest. All is good except that test_match doesn’t run, because now :match expects ScopedElements:
def test_match
select = XSet.new(1,"1")
assert(@five_element_set.match(@five_element_set.element(2), select.element(0)))
end
It turns out, of course, that the method :element returns a raw element, not an element wrapped up with a scope. This may be a problem but it’s not what we’re here to do. We’ll adjust the test.
def test_match
select = XSet.new(1,"1")
ignored = 999
five_elt = ScopedElement.new(@five_element_set.element(2), ignored)
select_elt = ScopedElement.new(select.element(0), ignored)
assert(@five_element_set.match(five_elt, select_elt))
end
That runs. The test is pretty weird, but it’s very invasive. It’s also redundant: we just wrote it to drive the development of the match code. Quite possibly we should delete the test altogether, because match is tested by the restrict tests, and it’s not logically public anyway. Might be coming up on time to make some methods private, though that isn’t usually my style. Anyway, not right now, we’re here to look at the code.
First, though, I’ve talked myself into deleting that test. Poof, gone!
Strings, Sets, and Slices
One thing that doesn’t look quite right to me is this code:
def restrict(selector)
result_contents = ""
each do | scoped_element |
if selector.matches(scoped_element)
result_contents << scoped_element.element.contents
end
end
XSet.new(@element_length, result_contents)
end
def matches(a_scoped_element)
any? { | scoped_element |
match(a_scoped_element, scoped_element)
}
end
def match(my_scoped_element, selector_scoped_element)
selector_scoped_element.element.subset?(my_scoped_element.element)
end
def subset?(larger_set)
element_range.all? { | scope |
larger_set.contains?(element(scope), scope)
}
end
def element(scope)
element_contents = @contents[scope*@element_length,@element_length]
if (@rank > 1)
return XSet.new(1,element_contents, self.rank-1)
else
return element_contents
end
end
def contains?(an_element, scope)
element(scope) == an_element
end
What bugs me isn’t entirely clear. Let’s think about it. The restrict operator itself is computing the contents of the result set, which it is doing by appending strings together. That’s potentially very inefficient, as the string will be copied and reallocated time and again. I’m also a bit troubled because we’re computing the contents way down at the string level and that isn’t as abstract as one might like.
Neither of these is a big concern, because these are very early days, and because we are, after all, experimenting with byte manipulation. But still it bothers me a bit.
Down toward the middle of all this, we have our call to subset?, which is certainly formally correct but is more abstract than the match operations that go before. The subset? method iterates over the would-be subset, and checks each element one at a time to see if the two sets match. The element operation itself creates a string (containing one character) and returns it. Ultimately the strings are compared. The code reads clearly enough, and we probably can’t reduce the number of objects created if we want to compare the characters one at a time. While it’s true that a generic subset? method would compare elements one at a time, we know we have strings here, and we could do better.
At the same time, I’m rather proud of the way that design dropped out of the eaches, anys, and alls. Certainly one way to go would be to move a bit back toward the byte-manipulation approach, since this set’s physical representation is definitely OneBigString. Now that the tests are in place, we could do that. But as I have thought about this, another idea comes to mind.
Scope Transformation
One of the operations of Extended Set Theory is called “Scope Transform”. There are two variations, Scope Transform, and Constituent Scope Transform. The math for these is too difficult to type into HTML, but the concept is pretty easy. Suppose we had two sets, like these:
S = { Jeffriesname, Pinckneytown } T = { PersonNamename, Locationtown }
Then ScopeTransform(S, T) = S(ST)T = { JeffriesPersonName, PinckneyLocation }
As you can see, the Scope Transform is a renaming operation on scopes. As with any XST operation, the values in the sets in question are arbitrary. It follows, for example, that
< a, b, c, d > (ST) < 3, 2, 1, 0 > = < d, c, b, a > (Hint: < 3, 2, 1, 0 > = { 30, 21, 12, 03 })
Viewed at a symbolic level – if we ever get there – Scope Transform is good for things like renaming fields. Viewed at the numeric level, in dealing with ordered ntuples like strings, Scope Transform reorders them. Looking at memory, Scope Transform moves it around. Scope Transformation can even do a kind of selection as well as reordering:
<a, b, c, d > (ST) { 30, 01 } = < d, a >
You may be wondering why, when moments ago we were talking about string comparisons, I’ve dragged us off into this Scope Transform area. The reason is that I’m seeing a possibility. Couldn’t we use this operation to produce a “map” of some data set (string), containing only the records of interest?
Suppose we have a collection of records, like the name set in our tests. We do a restriction on it, resulting in a subset consisting of a few of those records. One way to represent that result would be to copy the records in questions into a new set. Indeed, that’s what we’re doing now. We make a new string containing the records we selected.
Suppose instead that we created a “virtual” string, described by a Scope Transform on the original set, calling for mapping, say, the third record to the initial position, the tenth record to the next position, and so on? This set could be represented by a reference to the original set, and a vector of what amount to record numbers! Instead of creating a potentially large string and creating a set from it, we could create instead, a small set of integers, one per record selected!
It seems to me that this could be a very effective and efficient way to do things, and I’d like to explore it soon.
There is one interesting issue: a set built of record numbers pointing into another set is essentially a view. What happens if the records in the underlying set change? There are some interesting possibilities there, involving another set operation, Symmetric Difference. We’ll come to that by and by, if this series goes on long enough and successfully enough.
For now, I think we’ll let this code stand, even though it’s a bit troubling, and take a look at Scope Transform. The reason is an interesting application of the YAGNI principle.
YAGNI - You Aren't Gonna Need It
When I hear myself saying “We’re gonna need” something, thinking that we might as well do it now, I imagine a little voice saying “You aren’t gonna need it.” I could have imagined that the voice said “But not yet,” or “Maybe we aren’t gonna need it,” but I imagine this shoulder-imp standing in the Stop! position, its hand out, saying “You aren’t gonna need it”. When I start to do something because it will be needed in the future, I want to be stopped, not slowed down or diverted. The stop will give me time to think.
In this case, the :restrict method and its subordinates could use some cleaning up, and perhaps wants to be changed to be a bit more focused on the bytes of the string rather than quite so abstract. We always need the code to be clean, don’t we? Well, right now, we aren’t gonna need what we would do now, if this Scope Transform thing pans out. If Scope Transform pays off, we will be changing how :restrict works, to use ST, and we will also be creating a second physical type of set, a sort of Scope Transform view. All sets are required to be amenable to all set operations. Therefore a Scope Transform view must respond to :restrict. I’m not sure just what this will entail, but my guess is that we’ll wind up creating a superclass or a mixin to contain the basic set operations such as :restrict.
Therefore, I think it’s appropriate to hold off cleaning up :restrict until the Scope Transform spike has been done, and then decide what to do.
Am I right? Am I wrong? We’ll see. Either way, things don’t take long, so in a very short period of time I’ll learn something. If I were committing days, that would be one thing. A couple of hours before I have a better clue: that’s worth doing. So here goes:
Scope Transform Test
Begin with a test, of course. In this case, a whole new test class. I haven’t been troubling you with the file structure of this code yet, and I won’t. But the tests for this, it seems to me, will have a new setup and such, so I’m going to keep them in a separate class, and as it happens a separate file as well. As is my habit, I’ll start with a failing test named test_hookup:
class ScopeTransformTest < Test::Unit::TestCase
def test_hookup
assert(false)
end
end
Just as I’d hoped, test_hookup fails, and the other dozen tests pass. We’re on the way. Let’s see, what shall we do? Let’s start with an easy one. I’ll build a simple XSet, then explicitly build a ScopeTransformSet to contain just a couple of its records. That doesn’t sound too daunting, and shouldn’t take long:
def test_select_two_records
input = XSet.new(4, "1111222233334444")
trans = ScopeTransform.new(input, "1021")
assert_equal(input.element(1).contents, result.element(0).contents)
assert_equal(input.element(2).contents, result.element(1).contents)
end
Let’s look at that so you can see some tentative design decisions. I’ve decided to represent the ScopeTransform as a set of ordered pairs of bytes, representing the source record and target record numbers. So the “1021” there means to put record 1 in position zero, record 2 in position 1. That means the contents of the transformed result should be “22223333”.
In the asserts, I chose not to put in the literals “2222” and so on, but to show the relationships of the result records to the input records using the element numbers.
Now even as I just talk about this (thanks for pairing with me), I don’t like that string input very well: we know that we have to support more than 256 records. But it was convenient to type, so for now I guess I’ll live with it. We need two things to make this test run. We need a ScopeTransform class, and an element method.
Enough chatter, let’s get to it.
class ScopeTransform
def initialize(set, string)
end
def element(scope)
XSet.new(5, "wrong")
end
end
Now, in the interest of full disclosure, it took me a couple of quick passes to get here. First, I became confused in writing the test. The idea of the test is that the ScopeTransform is the result of applying a :scope_transform operation to an XSet. It’s doing to be a kind of XSet when we’re done. But after creating the ScopeTransform, I started to send it through a not-yet-existing :scope_transform operation on XSet. That wasn’t necessary: I’m creating this transform directory.
Then it took me a time or two to figure out just what the element operation was supposed to return. I tried a string (not entirely silly) and a ScopedElement, which was silly, but probably tells us that the :element operation on sets is a bit iffy because of the overloaded idea of just the element part, and the scoped element itself. We’ll table that for now. The test fails in the way I wanted:
1) Failure: test_select_two_records(ScopeTransformTest) [./scopetransformtest.rb:5]: <"2222"> expected but was <"wrong">.
Now it should be a Small Matter Of Programming to make it work. Unless that string trick was too weird. We need to save the input XSet (no problem), and then figure out how to decode the string to return the elements we want. And now I see why the string was wrong.
The simplest Scope Transform I can imagine would just take an ordered ntuple of indexes, and return a set with those records in it. So instead of “1021”, the input should be just “12”. We can get to more complex transforms with holes in them and such, later on. While I’m at it, I’m going to an array of integers. Rewrite the test: it still won’t run of course:
def test_select_two_records
input = XSet.new(4, "1111222233334444")
trans = ScopeTransform.new(input, [ 1, 2 ])
assert_equal(input.element(1).contents, trans.element(0).contents)
assert_equal(input.element(2).contents, trans.element(1).contents)
end
This continues to fail. But now (at last) the implementation looks simple enough for a man of my limited powers to do:
class ScopeTransform
def initialize(set, array)
@base_set = set
@map = array
end
def element(scope)
@base_set.element(@map[scope])
end
end
And the test runs! Done! Ship it!
Well, maybe not yet. But we do have a new object, a ScopeTransform, that represents an ordered selection of records from another set. It only responds to element so far, not even :each, much less :restrict, but we could, if we choose, use it as the return set from our XSet:restrict operation, and maybe that’s what we’ll do next. For now, let’s wrap up.
Reflection
We began with some needed cleanup. We noticed that there was an asymmetry in the code, where the same thing (getting an element to compare) was done at two levels in the method nest. We decided to move the element calls together, which made the code seem better.
Then we looked more deeply at the whole :restrict nest, and thought about it. Now at that point, I pulled a new set-theoretic idea, Scope Transform, out of my pocket. The rest of the team (you) had no way to know that was coming. In fact, I had no way to know it was coming. It was the result of a combination of ingredients:
- We are supposed to be trying to implement an Extended Set Theory system;
- I happen to be the guy on the team who knows the most about XST;
- We were reflecting on the practice of selecting records and building whole new sets of them, and how that wasn't all that efficient;
- I realized that the XST notion of Scope Transform could represent what we were doing;
- We decided to try to build a simple virtual set, ScopeTransform, to see whether it would be possible.
When you’re doing a project, that’s how teamwork is supposed to work. (And in fact I discussed a bunch of this with my team, namely Chet, though he hasn’t been pairing on the code with me. (My wife won’t let him come over at 4 AM, and his wife won’t let him leave home at that hour.))
Good teams don’t work singly, they work together. They reflect on the code they have, they think of the technical ideas they know, and they choose some to try. If they’re wise, they choose experiments like this one, which can be done in an hour or two. Now we know something that may be worth trying.
I think this was a good exercise. We scanned the code and improved it; we reflected on some troubling issues; we considered a fairly technical solution that might help; we experimented with that solution and learned quite a bit about it, all in just a few lines of code. I call that a good session.
Next time, we’ll do a bit of planning and decide on a new “iteration”. Thanks for dropping by!