OK, I like the Finite State Machine, but I don’t like it enough. What can we do to make it better?

The new FSM class development has gone smoothly, and I’m using it in one place already, the first Non-Player Character, Horn Girl. Her FSM looks like this:

{
        initial="m0",
        events = {
            {event="query", from="m0", to="m1"},
            {event="query", from="m1", to="m2"},
            {event="query", from="m2", to="m3"},
            {event="query", from="m3", to="m1"},
            {event="received", from="m0", to="ty"},
            {event="received", from="m1", to="ty"},
            {event="received", from="m2", to="ty"},
            {event="received", from="m3", to="ty"},
        },
        callbacks = {
            on_enter_state = function(fsm,event,from,to)
                self:setMessage(to)
            end,
        },
        data = {
            messages = {
                "I am your friendly neighborhood Horn Girl!\nAnd do I have a super quest for you!",
                "I am looking for my lost Amulet of Cat Persuasion,\nso that I can persuade the Cat Girl to let me pet her kitty.",
                "If you can find it and bring it to me,\nI shall repay you generously.",
                "Thank you!"
            },
            stateToMessage = {m1=1, m2=2, m3=3, ty=4},
        }
    }

What’s not to like? Two things. First, there are all those duplicate lines. The first four are different in both from and to, but the second four vary only in from. I’m told, by the people who tell you things, that this is a common case: there are many different starting states and a given event wants to send all of them to the same target state.

The second issue is similar but not quite the same. In my mind–keep your smart remarks to yourself back there–in my mind, there are really only two states I care about, call them “waiting” and “happy”. If she’s waiting for the Amulet, there is a story she’ll tell, and then what the story really calls for is for her to just enter a sort of final “waiting” mode where she says the same thing over and over until you pony up with the Amulet.

But that has turned into four FSM states, a starting state and one for each of the messages that she can say. The only difference between these states is that we use the state name as a key to the message we want. (Now that I look at that, I think we could do that better as well.)

Now there are known solutions to these concerns, or perhaps one should say “known responses”. They include implementing sub- and superstates, push-down automata, concurrent finite state machines, and the big gun, Goal-Oriented Action Planning. GOAP is the latest thing, unless there’s an even more latest thing that I don’t know about, for programming “AI” in games. GOAP takes a collection of goals, which have pre- and postconditions, and the GOAP program figures out a minimum-cost “plan” of attaining preconditions and postconditions until finally attaining the desired goal. Find axe, chop tree, make raft sort of thing.

If we were building a real live 2021 game engine, we’d probably implement GOAP, and I can imagine that we might do that someday.

Even the lesser solutions are more than we need right now. The style of development that I am trying to show is one where we build just the infrastructure we really need, keeping the code well organized, expecting that when new needs arise, our well-factored code will support those needs well enough.

That may not always be true. I try always to work as if it is, because I think it’s the best strategy in general, and because if we do run across some necessary change and our code can’t support it, we’ll learn something really interesting.

But I Digress

For now, we’ll eschew these medium or large solutions, and see what we can do at a smaller scale to make FSMs like the one we have here easier to write.

First, something that I think should be “easy”. Let’s change our FSM class so that instead of writing this:

            {event="received", from="m0", to="ty"},
            {event="received", from="m1", to="ty"},
            {event="received", from="m2", to="ty"},
            {event="received", from="m3", to="ty"},

We can write this:

            {event="received", from={"m0","m1","m2","m3"}, to="ty"},

That seems like a reasonable shorthand and, well, how hard could it possibly be?

Hmm …

I wonder if we’d like a similar shorthand like this:

            {event="query", from={"m0","m1","m2","m3"}, to={"m1","m2","m3","m1"}},

Note that those two tables are the same size, but different, expressing the step-wise behavior of these four:

            {event="query", from="m0", to="m1"},
            {event="query", from="m1", to="m2"},
            {event="query", from="m2", to="m3"},
            {event="query", from="m3", to="m1"},

We should keep on course, but that’s a tempting idea.

Let’s write a test for our original planned case.

        _:test("multiple from states", function()
            local tab = {
                initial = "m1",
                events = {
                    {event="step",from="m1",to="m2"},
                    {event="step",from="m2",to="m3"},
                    {event="jump",from={"m1","m2","m3"},to="done"},
                    {event="back",from="done",to="m1"},
                }
            }
            fsm = FSM(tab)
            _:expect(fsm:state()).is("m1")
            fsm:event("jump")
            _:expect(fsm:state()).is("done")
            fsm:event("back")
            _:expect(fsm:state()).is("m1")
            fsm:event("step")
            _:expect(fsm:state()).is("m2")
            fsm:event("jump")
            _:expect(fsm:state()).is("done")
        end)

I think this expresses the idea. (I’m not using “m3” yet. If I don’t use it, I’ll remove it. If this much works, I figure we’re good.

Of course at present it doesn’t work:

6: multiple from states  -- Actual: m1, Expected: done
6: multiple from states  -- Actual: m1, Expected: done

Right. Now let’s look at how these babies get set up.

In some implementations of FSM, we’d be building up a table of transitions and we’d expand our table like a macro. But we don’t preprocess our tables, so we’ll need to recognize the table while in flight, here:

function FSM:event(event)
    local cb = self.tab.callbacks
    return self:performAccepted(event, function(trans)
        self:doCallback("on_leave_state", event, trans)
        self:doCallback("on_enter_"..trans.to, event, trans)
        self:doCallback("on_enter_state", event, trans)
        self.fsm_state = trans.to
    end)
end

function FSM:performAccepted(event, f)
    for i,trans in ipairs(self.tab.events) do
        if trans.from == self.fsm_state and trans.event == event then
            if f then f(trans) end
            return true
        end
    end
    return false
end

The work happens in that second method, performAccepted. Let’s extract a method:

function FSM:fromMatches(trans)
    return trans.from == self.fsm_state
end

function FSM:performAccepted(event, f)
    for i,trans in ipairs(self.tab.events) do
        if self:fromMatches(trans) and trans.event == event then
            if f then f(trans) end
            return true
        end
    end
    return false
end

This is a pure refactoring, and the tests all work except the two checks in our new test. Now we extend the method:

function FSM:fromMatches(trans)
    local from = trans.from
    if type(from) ~= "table" then
        return trans.from == self.fsm_state
    end
    for i,fr in ipairs(from) do
        if fr == self.fsm_state then return true end
    end
    return false
end

The test passes. Let’s improve that a bit:

function FSM:fromMatches(trans)
    local desired = self.fsm_state
    local from = self:asTable(trans.from)
    for i,fr in ipairs(from) do
        if fr == desired then return true end
    end
    return false
end

function FSM:asTable(something)
    if type(something) == "table" then
        return something
    else
        return { something }
    end
end

Our new function asTable returns an input table, and wraps anything else as a one-element table.

I like that better. YMMV. I’d like it better still if all the objects in Lua could respond to asTable. But they can’t. Let’s do see about moving that function over to our TableOperations:

function asTable(something)
    if type(something) == "table" then
        return something
    else
        return {something}
    end
end

And use it here and remove the method:

function FSM:fromMatches(trans)
    local desired = self.fsm_state
    local from = asTable(trans.from)
    for i,fr in ipairs(from) do
        if fr == desired then return true end
    end
    return false
end

Inline function:

function FSM:fromMatches(trans)
    local desired = self.fsm_state
    for i,fr in ipairs(asTable(trans.from)) do
        if fr == desired then return true end
    end
    return false
end

Inline local variable:

function FSM:fromMatches(trans)
    for i,fr in ipairs(asTable(trans.from)) do
        if fr == self.fsm_state then return true end
    end
    return false
end

Tests all good. Commit: FSM accepts table of from states. asTable added to table operations.

Use our new feature in NPC:

    local tbl = {
        initial="m0",
        events = {
            {event="query", from="m0", to="m1"},
            {event="query", from="m1", to="m2"},
            {event="query", from="m2", to="m3"},
            {event="query", from="m3", to="m1"},
            {event="received", from={"m0","m1","m2","m3"}, to="ty"},
        },

Test.

works, but ...

Well, we got the Thank You without going through all the messages, but then when I queried in the ty state, I got an odd message “illegal query???”. That won’t really do. Where does that come from?

function NPC:query()
    local worked = self.fsm:event("query")
    if worked then
        return self.msg or "??? no message"
    else
        return "illegal query???"
    end
end

I put that in as a belt-and-suspenders kind of thing. I think the new rule should be to produce a more benign message. We could return an empty string safely enough. Or we could implement a query on the ty state. I rather like that idea better here but let’s also return empty strings if all else fails.

If we had an error log, I’d log an error.

function NPC:query()
    local worked = self.fsm:event("query")
    if worked then
        return self.msg or ""
    else
        return ""
    end
end

function NPC:hornGirlTable()
    local tbl = {
        initial="m0",
        events = {
            {event="query", from="m0", to="m1"},
            {event="query", from="m1", to="m2"},
            {event="query", from="m2", to="m3"},
            {event="query", from="m3", to="m1"},
            {event="received", from={"m0","m1","m2","m3"}, to="ty"},
            {event="query", from={"ty","ty1"}, to="ty1"},
        },
        callbacks = {
            on_enter_state = function(fsm,event,from,to)
                self:setMessage(to)
            end,
        },
        data = {
            messages = {
                "I am your friendly neighborhood Horn Girl!\nAnd do I have a super quest for you!",
                "I am looking for my lost Amulet of Cat Persuasion,\nso that I can persuade the Cat Girl to let me pet her kitty.",
                "If you can find it and bring it to me,\nI shall repay you generously.",
                "Thank you!",
                "Hello, friend!",
            },
            stateToMessage = {m1=1, m2=2, m3=3, ty=4, ty1=5},
        }
    }
    return tbl
end

We need a new state, one for “Thank you” and one for “Hello, friend”, plus the new message. The table above works as intended: Horn Girl just says “Hello, friend” to further queries.

hello

Commit: NPC Horn Girl uses table transitions, now says “Hello, friend” when happy.

Let’s have a quick retrospective and look around.

Retro

So that went well. We added a new ability to FSM, for a shorthand state transition definition, when one event takes a number of starting states into the same end state. Then we used that capability in our one NPC, the Horn Girl.

We’re not done with Horn Girl yet: she needs to give a valuable item to the Player.

A few other things have occurred to me as we went along here. I’ll make cards for them:

  • Make it easier to do a list of messages in NPC/FSM
  • A way to ensure only one InventoryItem of a given type in a level, etc. *Touched button should highlight, for better videos.
  • Touch and hold? on InventoryItem should announce what it is without using it.
  • Integrity check FSM?
  • Allow wildcard from in FSM

Let’s look at the “easier way to do a list of messages.

List of Messages

In one of the FSM articles I looked at, I forget which one, the rule was that the callbacks that occurred before the transition could return true if the transition could occur, and false if not. That would allow the callback to decide, based on some internal knowledge, that the conditions weren’t right yet.

We could use that facility to increment a message count until we have consumed all the messages for a given state, allowing us to collapse our “m0”-“m3” into one state. Since these tables aren’t incredibly easy to set up, that would be useful.

Right now, we’re not well set up to do that. We only run the various callbacks if we have already decided to do the transition:

function FSM:event(event)
    local cb = self.tab.callbacks
    return self:performAccepted(event, function(trans)
        self:doCallback("on_leave_state", event, trans)
        self:doCallback("on_enter_"..trans.to, event, trans)
        self:doCallback("on_enter_state", event, trans)
        self.fsm_state = trans.to
    end)
end

function FSM:fromMatches(trans)
    for i,fr in ipairs(asTable(trans.from)) do
        if fr == self.fsm_state then return true end
    end
    return false
end

function FSM:performAccepted(event, f)
    for i,trans in ipairs(self.tab.events) do
        if self:fromMatches(trans) and trans.event == event then
            if f then f(trans) end
            return true
        end
    end
    return false
end

Let’s think what we’d want the rules to be.

We do want to be sure that we have a transition to perform. At that point we have a defined from s1 and to s2. Then we should call the specific on_leave_s1 and save whether it has returned false or true. Then call on_leave_state, if we implement it, and save that result. If either of those is false, we return without changing state.

Sounds like we can just exit immediately on false for either of those.

Anyway, then, if we’re going to change state, we can call on_enter_s2 and on_enter_state if we have those.

!!!

As I reflected on this, I was thinking about storing a message index or counter in the FSM table. That could be a problem, because Lua passes tables by reference, which means that if we create two FSMs using the same table, they have the same table. That’s probably not what one usually wants.

That’s for another day, at least for now. Many FSM implementations get around this by generating a special instance containing a “compiled” version of the input table, set up to be faster and easier to use, if a lot more obscure, than our table. So there are a couple of ways around this issue if and when.

OK, let’s see if we can make this multiple call thing happen.

I’ll write a test for it.

        _:test("callback can refuse transition", function()
            local tab = {
                
            }
            fsm = FSM(tab)
            _:expect(fsm:state()).is("s1")
            fsm:event("step", false)
            _:expect(fsm:state()).is("s1")
            fsm:event("step", true)
            _:expect(fsm:state()).is("s2")
        end)

I’ve already decided on another new feature: if we pass additional arguments to event, they should be passed to any of the callbacks that are triggered. I think I’ll convert this test to a test for that, first. So:

        _:test("callback can receive args", function()
            local result = nil
            local f = function(event, from, to, bool, int)
                if bool then 
                    result = int
                else
                    result = 2*int
                end
            end
            local tab = {
                initial = "s1",
                events = {
                    {event="step", from="s1", to="s2"},
                    {event="step", from="s2", to="s3"},
                    {event="step", from="s3", to="s3"},
                },
                callbacks = {
                    on_enter_s1 = f,
                    on_enter_s2 = f,
                    on_enter_s3 = f,
                }
            }
            fsm = FSM(tab)
            _:expect(fsm:state()).is("s1")
            fsm:event("step", false, 31)
            _:expect(fsm:state()).is("s2")
            _:expect(result).is(62)
            fsm:event("step", true, 32)
            _:expect(fsm:state()).is("s3")
            _:expect(result).is(32)
        end)

This should fail on the result but not on the states.

7: callback can receive args  -- Actual: nil, Expected: 62
7: callback can receive args  -- Actual: nil, Expected: 32

I love it when a plan comes together.

Now after relearning how Lua variable arguments work, we have this:

function FSM:event(event, ...)
    local args = {...}
    return self:performAccepted(event, function(trans)
        self:doCallback("on_leave_state", event, trans, args)
        self:doCallback("on_enter_"..trans.to, event, trans, args)
        self:doCallback("on_enter_state", event, trans, args)
        self.fsm_state = trans.to
    end)
end

function FSM:doCallback(callbackName, event, trans, args)
    local cb = self.tab.callbacks
    if cb[callbackName] then
        cb[callbackName](self, event, trans.from, trans.to, table.unpack(args))
    end
end

Test runs. Commit: additional arguments to FSM:event are passed to callbacks.

It’s 1200 hours. I’m going to take a break. See you soon.

Later That Day …

It’s 1420 now and I’m ready to proceed. Should I explain that variable args stuff? OK, yes. What we’d like to do is accept a variable number of arguments in our event method, and then pass on those arguments to all the callbacks we do. You’d hope that you could say this:

function FSM:event(event, ...)
    return self:performAccepted(event, function(trans)
        self:doCallback("on_leave_state", event, trans, ...)
        self:doCallback("on_enter_"..trans.to, event, trans, ...)
        self:doCallback("on_enter_state", event, trans, ...)
        self.fsm_state = trans.to
    end)
end

But you cannot, because those uses of ... are not inside a function that was passed variable arguments, they are inside a different function.

The ..., when it is usable, is a list of values. You could, after receiving them, up at the top of event, say:

a,b,c,d = ...

And up to the first four values passed in as ... would be assigned to those values. It’s like what comes back from a function that returns multiple values. It’s some kind of bizarre upacked table.

So what we do is turn it into a table by wrapping it {...} and saving the table, then pass the table to our doCallback and then unpack it into the function we call:

function FSM:event(event, ...)
    local args = {...}
    return self:performAccepted(event, function(trans)
        self:doCallback("on_leave_state", event, trans, args)
        self:doCallback("on_enter_"..trans.to, event, trans, args)
        self:doCallback("on_enter_state", event, trans, args)
        self.fsm_state = trans.to
    end)
end

function FSM:doCallback(callbackName, event, trans, args)
    local cb = self.tab.callbacks
    if cb[callbackName] then
        cb[callbackName](self, event, trans.from, trans.to, table.unpack(args))
    end
end

Weird. As far as I could find, that’s how you do it. In any case, it works and passes the test.

Now I can write a test for the feature I want, which is to skip the transition if the on_leave' function doesn't allow it. And I think we'll have to implement a new on_leave_STATE` as well as the generic. YAGNI, but we’d best do it.

So, a test:

        _:test("on_leave callback can prevent transition", function()
            local count = 0
            function go_on_2(fsm,event,from,to)
                count = count + 1
                return count > 1
            end
            local tab = {
                initial = "s1",
                events = {
                    {event="go", from="s1", to="s2"},
                },
                callbacks = {
                    on_leave_state = go_on_2
                }
            }
            fsm = FSM(tab)
            _:expect(fsm:state()).is("s1")
            fsm:event("go")
            _:expect(fsm:state()).is("s1")
            fsm:event("go")
            _:expect(fsm:state()).is("s2")
        end)

I had planned to use a callback where I passed in a true/false to say whether to go, but when I got down to it, I did it this new way with a counter. I expect this test to fail expecting s1 and getting s2.

8: on_leave callback can prevent transition  -- Actual: s2, Expected: s1

Yes. Now this:

function FSM:doCallback(callbackName, event, trans, args)
    local cb = self.tab.callbacks
    if cb[callbackName] then
        cb[callbackName](self, event, trans.from, trans.to, table.unpack(args))
    end
end

Needs to return the result of the call …

function FSM:doCallback(callbackName, event, trans, args)
    local cb = self.tab.callbacks
    if cb[callbackName] then
        return cb[callbackName](self, event, trans.from, trans.to, table.unpack(args))
    end
end

And this:

function FSM:event(event, ...)
    local args = {...}
    return self:performAccepted(event, function(trans)
        self:doCallback("on_leave_state", event, trans, args)
        self:doCallback("on_enter_"..trans.to, event, trans, args)
        self:doCallback("on_enter_state", event, trans, args)
        self.fsm_state = trans.to
    end)
end
function FSM:event(event, ...)
    local args = {...}
    local res
    return self:performAccepted(event, function(trans)
        res = self:doCallback("on_leave_state", event, trans, args)
        if res == false then return end
        self:doCallback("on_enter_"..trans.to, event, trans, args)
        self:doCallback("on_enter_state", event, trans, args)
        self.fsm_state = trans.to
    end)
end

And the test runs! I do love it when my plans work.

Let’s go ahead and do the state-specific one as well. I’ll near-duplicate the test:

        _:test("on_leave_s1 callback can prevent transition", function()
            local count = 0
            function go_on_2(fsm,event,from,to)
                count = count + 1
                return count > 1
            end
            local tab = {
                initial = "s1",
                events = {
                    {event="go", from="s1", to="s2"},
                },
                callbacks = {
                    on_leave_s1 = go_on_2
                }
            }
            fsm = FSM(tab)
            _:expect(fsm:state()).is("s1")
            fsm:event("go")
            _:expect(fsm:state()).is("s1")
            fsm:event("go")
            _:expect(fsm:state()).is("s2")
        end)

This should fail.

9: on_leave_s1 callback can prevent transition  -- Actual: s2, Expected: s1

This is another case where we have to create the function name on the fly.

I realize I have to default the doCallback:

function FSM:doCallback(callbackName, event, trans, args)
    local cb = self.tab.callbacks
    if cb[callbackName] then
        return cb[callbackName](self, event, trans.from, trans.to, table.unpack(args))
    else
        return true
    end
end

… because I wanted to do this:

function FSM:event(event, ...)
    local args = {...}
    local res
    return self:performAccepted(event, function(trans)
        res = self:doCallback("on_leave_state", event, trans, args)
        res = res and self:doCallback("on_leave_"..trans.from, event, trans, args)
        if res == false then return end
        self:doCallback("on_enter_"..trans.to, event, trans, args)
        self:doCallback("on_enter_state", event, trans, args)
        self.fsm_state = trans.to
    end)
end

To and things together. We can’t be returning nil, that would turn a true to false. (We are requiring an explicit false, not a falsey.)

However, while the current test runs, others fail:

1: Recognizes cat amulet -- FSM:234: attempt to concatenate a table value (field 'from')
6: multiple from states -- FSM:234: attempt to concatenate a table value (field 'from')

The first is a cascaded fail from the second, which is the test that uses a table instead of a single value in from. We can’t just go concatenating the table: we need to know which specific from state was accepted.

After a short break it comes to me: the state we want is the state we’re in. :)

function FSM:event(event, ...)
    local args = {...}
    local res
    return self:performAccepted(event, function(trans)
        res = self:doCallback("on_leave_state", event, trans, args)
        res = res and self:doCallback("on_leave_"..self.fsm_state, event, trans, args)
        if res == false then return end
        self:doCallback("on_enter_"..trans.to, event, trans, args)
        self:doCallback("on_enter_state", event, trans, args)
        self.fsm_state = trans.to
    end)
end

Tests run. I don’t like that trick with the and. Replace:

function FSM:event(event, ...)
    local args = {...}
    local res
    return self:performAccepted(event, function(trans)
        res = self:doCallback("on_leave_state", event, trans, args)
        if res == false then return end
        res = self:doCallback("on_leave_"..self.fsm_state, event, trans, args)
        if res == false then return end
        self:doCallback("on_enter_"..trans.to, event, trans, args)
        self:doCallback("on_enter_state", event, trans, args)
        self.fsm_state = trans.to
    end)
end

Better, but let’s try this:

function FSM:event(event, ...)
    local args = {...}
    return self:performAccepted(event, function(trans)
        if false == self:doCallback("on_leave_state", event, trans, args) then
            return
        end
        if false == self:doCallback("on_leave_"..self.fsm_state, event, trans, args) then
            return
        end
        self:doCallback("on_enter_"..trans.to, event, trans, args)
        self:doCallback("on_enter_state", event, trans, args)
        self.fsm_state = trans.to
    end)
end

Yes, I’m satisfied with that. Tests are good. Commit: FSM now allows leave callbacks to return false to refuse the transition.

It’s quite late, I got caught up watching the Tour with my honey, so let’s wrap this up.

Summary

Some nice stuff today, spread over much of the day.

We allow a table of from entries in the events definition, which can save a lot of typing and probably quite a few errors as well.

We cadged a new table function asTable that wraps a non-table as a single-item table and otherwise returns its input. Used at least once, and I think there are another couple of places where it could be applied.

We decided to implement the ability for a leave callback to return false to refuse the transition, intending to use it to emit several messages and only then go to the next state.

Along the way, I decided to allow extra arguments to be passed to the event, and to have them passed to all callbacks after the four that they always get. I had planned to use that capability in my code for the refusal on false implementation, and then after making the feature work, I didn’t use it in that test after all.

And now, we have the refuse-if-false capability implemented.

Commit: Fix problem in callback with multi-from events. refactoring.

We are not, as yet, using that in NPC. Tomorrow, I reckon.

Let’s print this and publish.


D2.zip