XST 2: Trying a new data structure.

I have a random idea about the data structure for sets, so thought I’d give it a try.

Presently I store a table of scopes, and for each scope in the set, an entry in the scope table for that scope. I was thinking to provide the reverse structure as well, mapping elements to the scopes under which they are found. I’m not sure why I think that’s good, it’s more that I just want to experience what it would be like.

Relatedly, instead of storing the inner items in a linear table, I plan to store them in a keyed table, pointing to true, so that there can be no duplicates.

I’ll do the second change first. Then, if I’m wise, I might do the first change never.

I’ll need to change addAt, hasAt, and card.

function XSet:addAt(element, scope)
    if not self.contents[scope] then
        self.contents[scope] = {}
    end
    table.insert(self.contents[scope], element)
end

function XSet:hasAt(element, scope)
    local tab = self.contents[scope]
    if not tab then return false end
    for _i,elt in ipairs(tab) do
        if elt == element then return true end
    end
    return false
end

function XSet:card()
    local card = 0
    self:iterate(function(e,s) card = card+1 end)
    return card
end

Ah. card should be OK as it stands. Life is good.

function XSet:addAt(element, scope)
    if not self.contents[scope] then
        self.contents[scope] = {}
    end
    self.contents[scope][element] = true
end

function XSet:hasAt(element, scope)
    local tab = self.contents[scope]
    if not tab then return false end
    return tab[element] == true
end

That fails lots of tests. Fixing iteration should help a bit:

function XSet:iterate(f)
    for scope,elements in pairs(self.contents) do
        for element, _true in pairs(elements) do
            f(element, scope)
        end
    end
end

I changed to pairs from ipairs, and changed which variable in the inner loop is the element. Tests are green.

Commit: convert inner table in contents to indexed by element.

First Change Never?

At this moment, I have no need for duplicating the contents to be indexed both by scope and by element. The implementation is asymmetric but no harm seems to be done. So maybe I shouldn’t do that change, as I have no call for it.

YAGNI, we call that.

I wonder, though. As it stands now, there’s a little extra messing about to create the scope table entry and to check to see if it is there. It would be “nice” if we could create a pair consisting of element and scope combined and index it into a single table. If we could index into the hashed part, that would be quite nice.

However, Lua compares tables for identity, not contents, when doing such things. So we can’t quite do that. If we could get a unique key somehow for a pair, we could use that. But elements and scopes can be anything, certainly numbers and strings, so I’m not quite sure what we could use. Could we stringify them and concatenate? Would that be better?

I suppose we could but I don’t see this as needed either. Looking further out, suppose we could associate a hash key with element-scope pairs, and even with table contents. We might be able to speed up set equality. But here again, we’re borrowing trouble from the future. Let’s not.

Instead, let’s implement union.

        _:test("Union", function()
            local A = XSet()
            A:addAt("Jeffries", "last")
            A:addAt("Hendrickson", "last")
            local B = XSet()
            B:addAt("Jeffries", "last")
            B:addAt("Johnson", "other")
            local C = A:union(B)
            _:expect(C:card()).is(3)
        end)

This should drive out the code, but I’ll do further checking as well.

Expect fail on missing union, when that’s in, on wrong cardinality,

5: Union -- Tests:58: attempt to call a nil value (method 'union')

function XSet:union(anXSet)
    local result = XSet()
    local f = function(element, scope) result:addAt(element,scope) end
    self:iterate(f)
    anXSet:iterate(f)
    return result
end

So that works. Let me beef up the test a bit:

        _:test("Union", function()
            local A = XSet()
            A:addAt("Jeffries", "last")
            A:addAt("Hendrickson", "last")
            local B = XSet()
            B:addAt("Jeffries", "last")
            B:addAt("Johnson", "other")
            local C = A:union(B)
            _:expect(C:card()).is(3)
            _:expect(C:hasAt("Jeffries","last")).is(true)
            _:expect(C:hasAt("Hendrickson","last")).is(true)
            _:expect(C:hasAt("Johnson","other")).is(true)
        end)

Still runs.

Note that these are all interesting sets, as they contain things like

{ Jeffries@last, Hendrickson@last, Johnson@other }

I think our next step should be to create a set of records. That brings us to an issue I’ve not addressed:

What is a Plain Old Set?

In XST ordered n-tuples are defined as

<a,b,c> = { a@1, b@2, c@3 }

Given the 1-tuple , which is {a@1}, we want to distinguish it from {a}. (If we don’t, weird things could happen if we thought we were processing tuples and a vanilla set showed up.

In XST, the scope for a plain vanilla set element is the null set, if I’m not mistaken. We can certainly try that. We can readily just define a global null set, but we would probably like for every set that turns out null to be identical to the null set.

Right now, all our sets start out null, and then get things added to them. If we could somehow init our sets to be the null set and then when we add to them, magically change them to non null … but that’s too tricky, and possibly impossible.

Instead, let’s say that all our operations that can return an empty set will return the unique NullSet in that case. That should be easy enough to do.

Looking down the road, I want all sets to be immutable once created. The addAt method will have to be moved “behind the curtain somehow”, with some other way of creating a set with contents

And looking about that far down the road, we’ll also encounter the “symmetric difference” operator, which is really fun. But that’s for another day.

Let’s try for a test for NullSet.

        _:test("Null Intersect", function()
            local A = XSet()
            A:addAt("Jeffries", "last")
            A:addAt("Ron", "first")
            _:expect(A:card(), "#A").is(2)
            local B = XSet()
            B: addAt("Ron", "notFirst")
            _:expect(B:card(), "#B").is(1)
            local C = A:intersect(B)
            _:expect(C:card()).is(0)
            _:expect(C).is(NullSet)
        end)

The card check should pass and the NullSet fail on an XSet vs a nil.

4: Null Intersect  -- Actual: table: 0x28300d2c0, Expected: nil

Good. Also tells me that I’d like a default string for sets.

function XSet:__tostring()
    return "XSet("..self:card().." elements)"
end

And to return the NullSet first define it.

function XSet:init()
    self.contents = {}
end

XSet.NULL = XSet()

I decided not to make it a full global. Change the test:

        _:test("Null Intersect", function()
            local A = XSet()
            A:addAt("Jeffries", "last")
            A:addAt("Ron", "first")
            _:expect(A:card(), "#A").is(2)
            local B = XSet()
            B: addAt("Ron", "notFirst")
            _:expect(B:card(), "#B").is(1)
            local C = A:intersect(B)
            _:expect(C:card()).is(0)
            _:expect(C).is(XSet.NULL)
        end)

Test runs. Commit: Intersect returns XSet.NULL on null result.

We should probably make it so that no one can add items to the null set, and while we’re at it, other output sets.

        _:test("Null Intersect", function()
            local A = XSet()
            A:addAt("Jeffries", "last")
            A:addAt("Ron", "first")
            _:expect(A:card(), "#A").is(2)
            local B = XSet()
            B: addAt("Ron", "notFirst")
            _:expect(B:card(), "#B").is(1)
            local C = A:intersect(B)
            _:expect(C:card()).is(0)
            _:expect(C).is(XSet.NULL)
            _:expect(C:addAt("x", 1)).throws("set is locked")
        end)

This fails so far, because it doesn’t throw.

function XSet:lock()
    self.locked = true
    return self
end

And I locked the NULL set. Now to check it:

Oops, got the test wrong, have to enclose the call in a function so that throws can work:

        _:test("Null Intersect", function()
            local A = XSet()
            A:addAt("Jeffries", "last")
            A:addAt("Ron", "first")
            _:expect(A:card(), "#A").is(2)
            local B = XSet()
            B: addAt("Ron", "notFirst")
            _:expect(B:card(), "#B").is(1)
            local C = A:intersect(B)
            _:expect(C:card()).is(0)
            _:expect(C).is(XSet.NULL)
            _:expect(function() C:addAt("x", 1) end).throws("set is locked")
        end)

And …

XSet.NULL = XSet()
XSet.NULL:lock()

And of course:

function XSet:lock()
    self.locked = true
    return self
end

I returned the set after locking, so that I can make this test pass also:

        _:test("Union", function()
            local A = XSet()
            A:addAt("Jeffries", "last")
            A:addAt("Hendrickson", "last")
            local B = XSet()
            B:addAt("Jeffries", "last")
            B:addAt("Johnson", "other")
            local C = A:union(B)
            _:expect(C:card()).is(3)
            _:expect(C:hasAt("Jeffries","last")).is(true)
            _:expect(C:hasAt("Hendrickson","last")).is(true)
            _:expect(C:hasAt("Johnson","other")).is(true)
            _:expect(function() C:addAt("x", 1) end).throws("set is locked")
        end)

This will fail not thrown:

6: Union  -- Actual: nothing thrown, Expected: set is locked

And:

function XSet:union(anXSet)
    local result = XSet()
    local f = function(element, scope) result:addAt(element,scope) end
    self:iterate(f)
    anXSet:iterate(f)
    return result:lock()
end

Test should pass. Does. Commit: NULL and union output are locked. intersect returns NULL when result is empty.

However, I’m not locking a non-null intersection. Better test that too.

        _:test("Intersect", function()
            local A = XSet()
            A:addAt("Jeffries", "last")
            A:addAt("Ron", "first")
            _:expect(A:card(), "#A").is(2)
            local B = XSet()
            B: addAt("Ron", "first")
            _:expect(B:card(), "#B").is(1)
            local C = A:intersect(B)
            _:expect(C:hasAt("Ron", "first")).is(true)
            _:expect(C:card()).is(1)
            _:expect(function() C:addAt("x", 1) end).throws("set is locked")
        end)

Should fail not thrown. Does. Fix:

function XSet:intersect(anXSet)
    local result = XSet()
    local cont = anXSet.contents
    local added = false
    anXSet:iterate(function(element,scope)
        if self:hasAt(element,scope) then
            result:addAt(element, scope)
            added = true
        end
    end)
    return added and result:lock() or XSet.NULL
end

Tests green. Commit: intersect returns locked result.

Summary

I almost let my ideas run ahead of my need, but managed to hold myself back from implementing things not needed. Then I implemented until, NULL, and locking, all of which are needed, at least for now.

We may be able to hide the addAt method later on, but for now we need it to build our tests.

So, I think that was good stuff. I’ll see you soon with something even more weird.