A small experiment, implementing "restrict" in Ruby. As usual, warts and all.

Getting Started

It’s probably time to see whether any of the Ruby IDEs out there are worth using, as the state of the art has advanced a lot since last I worked with the language. The language itself is up to version 1.8.2, and the marvelous Pickaxe book is twice the size of the old one. And my copy is autographed by Dave Thomas himself!

For now, I’ll continue to work in TextPad. I have a macro defined that allows me to require in project files, all defined in a conventional target file, “project.rb”, which the macro compiles and runs. The starting file looks like this for this experiment:

require 'test/unit'
require'restricttest.rb'

My initial test is always the same, one that fails:

  class TC_MyTest < Test::Unit::TestCase

     def test_fail
       assert(false, 'Assertion was false.')
     end
   end

Results are:

Loaded suite project
Started
F
Finished in 0.09 seconds.

  1) Failure:
test_fail(TC_MyTest) [./restricttest.rb:4]:
Assertion was false.
<false> is not true.

1 tests, 1 assertions, 1 failures, 0 errors

Tool completed with exit code 1

That’s good news … for now. Let’s move on.

My mission here will be to test a restrict operator on some kind of packed byte structure, which I expect to be a string. I expect to [re]learn a lot about Ruby before this is over. Ruby programmers: if you see me doing something really dumb, please drop me a note. Don’t forget to put [ron] in as part of the subject line, to get through my spam filters.

For my first test, I think I’ll just create an extended set containing a packed up string with a couple of records in it. If that turns out to be too hard, I’ll back off and try something simpler.

     def test_restrict
       input = XSet.new(4, "123 234 132 342 ")
       select = XSet.new(1,"1")
       expected = "123 132 "
       result = input.restrict(select)
       assert_equal(expected,result.contents)
     end

OK. I am, of course, making this up. The new extended set class is XSet, and I’m supposing that the new method takes a record length and packed record. That will surely change, but it should be enough for now. Let’s fake it, with intention to fail rather than succeed. After a couple of tries relearning a little Ruby syntax, I get this class and associated expected failure:

   class XSet

     def initialize(recordLength, contents)
     end

     def restrict(anXset)
       XSet.new(0,nil)
     end

     def contents
       "wrong"
     end
   end

Loaded suite project
Started
F
Finished in 0.11 seconds.

  1) Failure:
test_restrict(TC_MyTest) [./restricttest.rb:8]:
<"123 132 "> expected but was
<"wrong">.

1 tests, 1 assertions, 1 failures, 0 errors

Tool completed with exit code 1

This permits me to code. I’ll add a few lines of code but keep the test broken.

   class XSet

     def initialize(recordLength, contents)
       @recordLength = recordLength
       @contents = contents
     end

     def restrict(anXset)
       XSet.new(11, "still wrong")
     end

     def contents
       @contents
     end
   end

This fails as expected: still wrong. Now, with your kind permission, I’m going to try to program restrict by intention. This could be too big a bite, but I think I know what I’m doing. (Remember the redneck’s last words: “Watch this!”)

Yes, well. That didn’t work. I had to code a lot. Thought I was doing OK, but the test fails. Here’s what I wrote:

     def restrict(anXset)
       resultContents = ""
       for myBase in recordRange
         for hisBase in anXset.recordRange
           if match(record(myBase), anXset.record(hisBase))
             result << record(myBase)
           end
         end
       end
       XSet.new(@recordLength, resultContents)
     end

     def recordRange
       0..cardinality-1
     end

     def cardinality
       @contents.length / @recordLength
     end

     def record(base)
       @contents[base..base+@recordLength-1]
     end

     def match(record1, record2)
       record1==record2
     end

Fortunately, years of fouling up in public make me fairly willing to share my learning. Let’s write some tests against the little methods, to begin with.

    def test_cardinality
       input = XSet.new(4, "123 234 132 342 abc ")
       assert_equal(5, input.cardinality)
    end

That one passes. That makes me think that recordRange should work, but I’m feeling less than confident, so I’ll check it.

Wait! I see the bug. In coding this up so nicely, I haven’t taken the record length into account. The recordRange is just an index from 0 to cardinality-1. I could fix it, but I need to learn from my mistakes, so I’m going to fill in some needed tests first.

    def test_recordRange
       input = XSet.new(4, "123 234 132 342 abc ")
       assert_equal(0..4, input.recordRange)
    end

This passes also. I was sure it would. Now let me look at the code and see if it’s OK to fix the problem. Look at these methods:

     def restrict(anXset)
       resultContents = ""
       for myBase in recordRange
         for hisBase in anXset.recordRange
           if match(record(myBase), anXset.record(hisBase))
             result << record(myBase)
           end
         end
       end
       XSet.new(@recordLength, resultContents)
     end

     def record(base)
       @contents[base..base+@recordLength-1]
     end

I rewrote the restrict part way through. I had taken a break from a lovely Sunday breakfast and viewing of Sunday Morning, and was glancing through the Pickaxe book. That set me on a different way of expressing the algorithm for restrict. But I didn’t erase it and start over, I edited it. I left the name myBase in, intending it to mean the base byte index of a record in the set. But it’s not … it’s the raw index, 0-n. It needs to be multiplied by record length to be correct.

My programming by intention threw me off, through my choice of a bad name. Let’s fix it. Here’s what I wrote, but it still doesn’t work:

     def restrict(anXset)
       resultContents = ""
       for myRecordIndex in recordRange
         for hisRecordIndex in anXset.recordRange
           if match(record(myRecordIndex), anXset.record(hisRecordIndex))
             result << record(myRecordIndex)
           end
         end
       end
       XSet.new(@recordLength, resultContents)
     end

     def record(recordIndex)
       @contents[recordIndex*@recordLength..recordIndex*@recordLength+@recordLength-1]
     end

Well, I have to wonder whether record is working as I expect, so I’ll write a test for that:

    def test_record
       input = XSet.new(4, "123 234 132 342 abc ")
       assert_equal("132 ", input.record(2))
    end

That works! So I can see two more possibilities. Either == doesn’t compare two strings for equality the way I think it does, or the << operator doesn’t work the way I think it does. (Or I’m even more confused.) I think I’m getting decent tests in place, so I’m not going to back this all out. But I probably should. Maybe next, a test for “match”, which looks like this:

     def match(record1, record2)
       record1==record2
     end

Well, duh, that’s obviously not right. The two records, from the input set and the selection set, aren’t even the same SIZE!! The match method is flat wrong. What we want is that they match up to the length of the second record. Let me write a test: obviously I need more discipline here, and should have gone in smaller bites. Still, each test is taking me further, so I’m going to press on.

    def test_match
       input = XSet.new(4, "123 234 132 342 ")
       select = XSet.new(1,"1")
       assert(input.match(input.record(2), select.record(0)))    
    end

This fails, of course, and now I’ll make it pass:

     def match(record1, record2)
       record1[0..record2.length-1]==record2
     end

That finds the error in the restrict method, where I started the string with the name resultContents, and then started calling it result. Grr.

     def restrict(anXset)
       resultContents = ""
       for myRecordIndex in recordRange
         for hisRecordIndex in anXset.recordRange
           if match(record(myRecordIndex), anXset.record(hisRecordIndex))
             resultContents << record(myRecordIndex)
           end
         end
       end
       XSet.new(@recordLength, resultContents)
     end

Whew! All the tests run, just as I expected. The initial implementation of the restrict method works. Time for some cleanup – there are things I don’t like. First, let’s look at the XSet code:

   class XSet

     def initialize(recordLength, contents)
       @recordLength = recordLength
       @contents = contents
     end

     def restrict(anXset)
       resultContents = ""
       for myRecordIndex in recordRange
         for hisRecordIndex in anXset.recordRange
           if match(record(myRecordIndex), anXset.record(hisRecordIndex))
             resultContents << record(myRecordIndex)
           end
         end
       end
       XSet.new(@recordLength, resultContents)
     end

     def record(recordIndex)
       @contents[recordIndex*@recordLength..recordIndex*@recordLength+@recordLength-1]
     end

     def recordRange
       0..cardinality-1
     end

     def cardinality
       @contents.length / @recordLength
     end

     def match(record1, record2)
       record1[0..record2.length-1]==record2
     end

     def contents
       @contents
     end
   end

What’s not to like? There are a lot of “-1” adjustments in there, which are of course inevitable when you’re looping from zero to something. But I’d like to get rid of them. And I’m not entirely happy with using the range option to pick up the bytes from the records, in match, because that will create a lot of temporary strings. Match should be changed to work inside the contents buffers. And we need to keep it fast, since that’s the main point of this kind of operation. Finally, there is some other arithmetic that we can reduce by having the set cache things like its cardinality and such1

Red, Green, Refactor

Fist thing I want to fix is the match operator, and the fact that it runs on slices. I’ll begin by passing in the basic parameters, the two record numbers, and the second set. That requires me to change one of the tests. The complete changes are:

    def test_match
       input = XSet.new(4, "123 234 132 342 ")
       select = XSet.new(1,"1")
       assert(input.match(2, select, 0))    
    end

     def restrict(anXset)
       resultContents = ""
       for myRecordIndex in recordRange
         for hisRecordIndex in anXset.recordRange
           if match(myRecordIndex, anXset, hisRecordIndex)
             resultContents << record(myRecordIndex)
           end
         end
       end
       XSet.new(@recordLength, resultContents)
     end    

     def match(myRecordIndex, anXset, hisRecordIndex)
       record1 = record(myRecordIndex)
       record2 = anXset.record(hisRecordIndex)
       record1[0..record2.length-1]==record2
     end

The tests still run. I haven’t resolved my issue, which is the copying of the records, but now it’s all centralized in this one method. So let’s digress.

I’m doing something here that I usually recommend against: I’m optimizing code for which I have no proof of needing an optimization. So maybe I’m doing something bad. On the other hand, the point of these operations is to be as fast as possible, and I also need to learn more about the best way to build and access the underlying bytes of the sets in question. On those grounds, I’m granting myself permission to push this a bit further. On the other hand, if I were pairing, my pair might hold me back. Premature optimization is definitely bad. Still, I’m going ahead, and I expect it to take less time to do it than it took to write this paragraph.

Wrong again! It took longer than it should have, perhaps ten minutes. I didn’t even set out to get done, just to grab the records from the set contents directly. I’ll show the code so far, but then I need to take a break: I’m making too many mistakes.

     def match(myRecordIndex, anXset, hisRecordIndex)
       hisContents = anXset.contents
       hisBase = anXset.recordBase(hisRecordIndex)
       myBase = recordBase(myRecordIndex)
       length = anXset.recordLength - 1
       record1 = @contents[myBase..myBase+length]
       record2 = hisContents[hisBase..hisBase+length]
       record1==record2
     end

     def recordBase(recordIndex)
       recordIndex*@recordLength
     end

All that’s happening here is that I’m grabbing the contents buffer for the input set, calculating the offsets and lengths, then snarfing the records and comparing them. The next step will be to write the compare loop out longhand, so it doesn’t create any objects …

Or will it??? My plan is that I’m going to write, somewhere, something that compares two bytes from the strings, like

buf1[i]==buf2[j]

That’s going to create two strings of length 1, isn’t it? Odds are, that’s not more efficient, and it might be less, depending on what string == really does. If it’s a decent primitive, it should go faster. I could go ahead and test both these approaches but I’m sure that in Ruby, this isn’t an improvement. (In C#, it might be, because characters are value types and don’t have to be created as objects. We’ll leave that concern for another day.)So, what have I learned? Primarily, I’ve learned to listen to my own advice. “Optimizing” this method, at least this way, isn’t a good idea. I probably can’t beat the record-slicing approach I took initially. The case isn’t proven, so I could be wrong, but it seems clear that I’m way into the roundoff in any case.

See? You should listen to me. Even I should listen to me! I’m going to take a break, then revert this code, then look around to see what else needs “improvement”.

Back to a Decent State

Here’s where I reverted to, plus a tiny bit of renaming cleanup I still want to clean up a bit of that -1 code, if nothing else. I’m reminded that Ruby will help me with that! I’ve been using the “..” operator, which includes the end point. I can use “…”, which does not. Hold on, I’m going to put that in before I show you the code.

  class TC_MyTest < Test::Unit::TestCase

    def test_cardinality
       input = XSet.new(4, "123 234 132 342 abc ")
       assert_equal(5, input.cardinality)
    end

    def test_recordRange
       input = XSet.new(4, "123 234 132 342 abc ")
       assert_equal(0...5, input.recordRange)
    end

    def test_record
       input = XSet.new(4, "123 234 132 342 abc ")
       assert_equal("132 ", input.record(2))
    end

    def test_match
       input = XSet.new(4, "123 234 132 342 ")
       select = XSet.new(1,"1")
       assert(input.match(2, select, 0))    
    end

     def test_restrict
       input = XSet.new(4, "123 234 132 342 ")
       select = XSet.new(1,"1")
       expected = "123 132 "
       result = input.restrict(select)
       assert_equal(expected,result.contents)
     end
   end

   class XSet
     attr        :recordLength
     attr_reader :contents

     def initialize(recordLength, contents)
       @recordLength = recordLength
       @contents = contents
     end

     def restrict(aSet)
       resultContents = ""
       for myRecordIndex in recordRange
         for hisRecordIndex in aSet.recordRange
           if match(myRecordIndex, aSet, hisRecordIndex)
             resultContents << record(myRecordIndex)
           end
         end
       end
       XSet.new(@recordLength, resultContents)
     end

     def match(myRecordIndex, aSet, hisRecordIndex)
       record1 = self.record(myRecordIndex)
       record2 = aSet.record(hisRecordIndex)
       record1[0...record2.length]==record2
     end

     def record(recordIndex)
       @contents[recordIndex*@recordLength...recordIndex*@recordLength+@recordLength]
     end

     def recordRange
       0...cardinality
     end

     def cardinality
       @contents.length / @recordLength
     end
   end

Wrapup for Now

The tests are green, and the code, while far from great, is fairly decent. It’s been a long off-and-on day, time to wrap this up and hang with Ricia. Next time, I think a bit more cleanup, probably focused on the asymmetry of the way match gets its own record implicitly and the other set’s explicitly. I think I’d rather pass both records in, though that will make match reference no member variables of the class. Anyway, that’s for another day. Let’s put this one to bed.


  1. The word “cardinality”, by the way, is set-guy talk for “number of items in the set”. I was trained to use these words, and will probably use them in the code. I’ll try to remember to define them.