Robot 34

My Inner Eddie says we may be in big trouble. Let’s find out. ClickBait: Absurd claim at the end.

When Eddie used to come in in the morning and say “We’re in big trouble, guys”, he had almost invariably identified a real concern. Most of them we were able to resolve quickly. Some had already been resolved and he didn’t happen to know. Some took a bit more work.

My “inner Eddie” is concerned about the asynchronous nature of sockets. The program as written has the Robot receiving some input from the user, making a call to World, and receiving the World’s response as the return from the call. There are two problems with this, at least:

There is a long delay between the call and response if the World is a socket away. If we were to lock up the Robot’s program for that delay, screen updating and other such things might be stopped.
If we don’t lock up, then we can’t return the packet via the call-return structure.

The conventional thing with Codea Lua is that a keyboard or touch action should take a very short period of time and then immediately return. I believe that if it doesn’t return, then other critical things like drawing cannot happen, because Lua isn’t threaded. I say “believe”: I’d like to verify whether that’s the case or not before doing what I think we have to do.

If we can safely hang on the socket, waiting for the reply from the World server, with the rest of the game running, then we can continue what we’re doing with the response coming back as the return from the request call. If not, we’ll need to come up with something else, and that might render our current design null, void, and, well, wrong.

Let’s first verify whether we have a problem. I’ll do that with a small spike.

I write this draw function:

function draw()
    background(0)
    stroke(250)
    strokeWidth(5)
    fill(0)
    pushMatrix()
    translate(WIDTH/2,HEIGHT/2)
    rotate(angle)
    line(-100,0,100,0)
    angle = angle+1
    popMatrix()
end

That just draws a rotating line in the middle of the screen.

The essence of the spike is just this:

function keyboard(key)
    print("Key: ", key)
    for i = 1,100 do
        for j = 1,1000000 do
            k = j
        end
    end
end

When I type a key, the console displays the key, the rotation stops, and then after a bit, starts again:

rotating bar stops when I type

So now we know that if the keystroke that issues a command to the remote world doesn’t return essentially immediately, the local Robot program will be locked up. With some designs, we might not care: there might be nothing going on on the screen. But as a general rule, that would be bad. We might want some kind of dynamic information to appear on the screen.

So, Eddie, we do have a problem.

The Problem

A fundamental design aspect of our game has been found to be unworkable. We cannot afford to call the World and wait for a return. Here’s an example of one bit of code that just can’t be the way that it is:

function Robot:scan()
    local packets = self._world:scan(self._name)
    for i,packet in ipairs(packets) do
        self:addLook(packet)
    end
end

This code expects _world to return the scan packets immediately, which is currently the case:

function WorldProxy:scan(...)
    local jsonString = self.world:scan(...)
    local outcome = json.decode(jsonString)
    local packets = outcome.data.objects
    --print(#packets, " packets")
    local result = {}
    for i,p in ipairs(packets) do
        local lp = LookPacket:fromObject(p)
        --print(lp)
        table.insert(result, lp)
    end
    return result
end

The WorldProxy sends the world the scan message and expects the response string right back. Then it handles some of the message munging to produce the packets that the Robot’s scan wants. We could quibble over whether the Robot should deal with the response directly, but the current scheme is that our WorldProxy makes the real world look like the world we want, at least in this case.

The other cases are not so bad as this one:

function Robot:back(steps)
    self:setResponse(self._world:back(self._name, steps))
    self.knowledge = self.knowledge:newLensAt(self:x(),self:y())
end

function Robot:forward(steps)
    self:setResponse(self._world:forward(self._name, steps))
    self.knowledge = self.knowledge:newLensAt(self:x(),self:y())
end

function Robot:turn(lOrR)
    self:setResponse(self._world:turn(self._name,lOrR))
    self.knowledge = self.knowledge:newLensAt(self:x(),self:y())
end

I see some duplication there which we might wish to deal with. Be that as it may, these other Robot capabilities have the same real problem: they expect a return from the call to the world, and we don’t see a way to provide one.

Are we in fact in big trouble? I’ve thought and thought, and I can see no way in Lua to keep things from locking up. We simply must return immediately from calls to the world, right after they do their send. The simplest case looks like this:

function WorldProxy:forward(name,steps)
    return self:moveFB(name, "forward", steps)
end

function WorldProxy:moveFB(name,command, steps)
    local rq = {
        robot=name,
        command=command,
        arguments={steps}
    }
    return Response(self.world:request(rq))
end

We must return there where it says “return Response…”, but we cannot return a Response because we don’t have one.

(We do have a Response in the current implementation … but when we connect via sockets, we will not. That connection remains to be done, but we know what it will look like and what it will have to do. It’ll have to run on the tween timer, much as the server spike we did a few days ago, and when it finally sees a result from doing a receive on the socket, only then can it produce a result.)

What are we to do? Are we doomed?

Fortunately, we are not doomed. It would be really embarrassing if I were to have to say right here in front of everyone that my whole design was a mistake and we are doomed to losing our venture capital and will surely be all laid off and have to find honest work, probably as stevedores or hod carriers or something.

Fortunately, we can use callbacks.

Callbacks

Every use of a call to the world looks essentially like this:

local response = self._world:doSummat(args)
-- do something with the result, generally
self:setResponse(response)
self.knowledge = self.knowledge:newLensAt(self:x(),self:y())

Rather than explain this, let me show you. We’ll refactor our forward function:

function Robot:forward(steps)
    self:setResponse(self._world:forward(self._name, steps))
    self.knowledge = self.knowledge:newLensAt(self:x(),self:y())
end

First we’ll save the response and then use it:

function Robot:forward(steps)
    local response = self._world:forward(self._name, steps)
    self:setResponse(response)
    self.knowledge = self.knowledge:newLensAt(self:x(),self:y())
end

No real change. Test. Green, game works fine. Commit: break out response to temp in Robot:forward.

Now make the last two lines of forward into a function:

function Robot:forward(steps)
    local response = self._world:forward(self._name, steps)
    self:forwardResponse(response)
end

function Robot:forwardResponse(response)
    self:setResponse(response)
    self.knowledge = self.knowledge:newLensAt(self:x(),self:y())
end

Should be green. Yes. Commit: break out Robot:forwardResponse method.

Now we want for _world to call us back on that method.

function Robot:forward(steps)
    local callback = function(...) self:forwardResponse(...) end
    local response = self._world:forward(self._name, steps, callback)
end

The local function callback just calls forwardResponse with whatever parameters it’s given. This won’t work, because _world isn’t cooperating … yet. Lots of tests break, because forward is broken.
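To see why that little closure is enough, here is a minimal sketch in plain Lua, outside Codea. SketchRobot is a hypothetical stand-in, not the real Robot class; it only shows that the anonymous function captures self and forwards whatever arguments it is handed, so the code that eventually calls it needs no reference to the robot at all.

```lua
-- SketchRobot is a hypothetical stand-in for Robot, just enough to show the closure.
SketchRobot = {}
SketchRobot.__index = SketchRobot

function SketchRobot.new(name)
    return setmetatable({ _name = name, received = nil }, SketchRobot)
end

-- Plays the role of forwardResponse: remember what came back.
function SketchRobot:forwardResponse(response)
    self.received = response
end

-- Build the callback. The closure captures self, and "..." passes
-- along whatever arguments the caller supplies.
function SketchRobot:makeCallback()
    return function(...) self:forwardResponse(...) end
end

local robot = SketchRobot.new("Louie")
local callback = robot:makeCallback()
callback("moved 1 step")   -- the caller knows nothing about robot
print(robot.received)      -- prints: moved 1 step
```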

In WorldProxy:

function WorldProxy:forward(name,steps)
    return self:moveFB(name, "forward", steps)
end

This needs to deal with the callback, and will just pass it on:

function WorldProxy:forward(name,steps, callback)
    return self:moveFB(name, "forward", steps, callback)
end

And here:

function WorldProxy:moveFB(name,command, steps)
    local rq = {
        robot=name,
        command=command,
        arguments={steps}
    }
    return Response(self.world:request(rq))
end

We need to actually use the callback:

function WorldProxy:moveFB(name,command, steps, callback)
    local rq = {
        robot=name,
        command=command,
        arguments={steps}
    }
    local reply = self.world:request(rq)
    callback(Response(reply))
end

This is going to work for forward but not back. I’ll run it, but we’re going to have to do forward and back together, or create some duplication … in fact … let’s do that. I’ll leave moveFB as is and do this:

function WorldProxy:forward(name,steps, callback)
    return self:moveFBNew(name, "forward", steps, callback)
end

function WorldProxy:moveFBNew(name,command, steps, callback)
    local rq = {
        robot=name,
        command=command,
        arguments={steps}
    }
    local response = self.world:request(rq)
    callback(Response(response))
end

Now I’ve separated forward and back and I expect this to work. And it does: Commit: Robot:forward works via callback.

We’re not in big trouble any more!

Why Aren’t We In Big Trouble?

We’ve arranged things so that the reaction to the “forward” command happens on a callback from the WorldProxy. It happens that we call back immediately. But we would not need to do that. When we go to our sockets version, which we may never do, we can just save away the callback function, and when, sooner or later, our socket loop sees the response, it can call the callback at that time. If the delay were an hour, it would look odd on the screen, because we wouldn’t see our robot move for an hour, but after an hour, it would move.
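Here is a rough sketch of what “save away the callback and call it later” might look like. Everything in it is assumed for illustration: DeferredProxy, its request, poll, and simulateReply methods, and the inbox table standing in for the socket are all hypothetical names, not part of the game. The point is only that request returns at once, and that poll, called over and over from a timer or from draw, never blocks.

```lua
-- Hypothetical sketch: a proxy that sends now and calls back later.
DeferredProxy = {}
DeferredProxy.__index = DeferredProxy

function DeferredProxy.new()
    -- inbox stands in for data arriving on a socket (an assumption).
    return setmetatable({ pending = nil, inbox = {} }, DeferredProxy)
end

-- Remember who wants the answer, then return immediately.
-- A real version would also write rq to the socket here.
function DeferredProxy:request(rq, callback)
    self.pending = callback
end

-- Stand-in for a reply showing up on the socket.
function DeferredProxy:simulateReply(reply)
    table.insert(self.inbox, reply)
end

-- Called repeatedly, for example from a timer. Never waits.
function DeferredProxy:poll()
    if self.pending and #self.inbox > 0 then
        local reply = table.remove(self.inbox, 1)
        local callback = self.pending
        self.pending = nil   -- clear first, so the callback may send again
        callback(reply)
    end
end

local proxy = DeferredProxy.new()
local seen = "nothing yet"
proxy:request({ command = "forward" }, function(r) seen = r end)
proxy:poll()
print(seen)                  -- prints: nothing yet
proxy:simulateReply("OK")
proxy:poll()
print(seen)                  -- prints: OK
```

Whether the reply takes a millisecond or an hour, the Robot side does not change: it hands over a callback and gets on with drawing.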

Of course, if it took an hour, we’d need to deal with whether the player can send another request before the prior one has called back (no), and so on, but all that is just programming that we know how to do.

For example, we can set a flag for “command in process” and ignore keystrokes until the flag is cleared.
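A minimal sketch of that flag, in plain Lua. CommandGate and its keyPressed method are hypothetical names, and the send function passed in is assumed to start the request and to call done() when the reply finally arrives.

```lua
-- Hypothetical sketch: ignore keystrokes while a command is in process.
CommandGate = {}
CommandGate.__index = CommandGate

function CommandGate.new()
    return setmetatable({ busy = false }, CommandGate)
end

-- send is assumed to be function(key, done): it starts the request
-- and calls done() once the reply has been handled.
-- Returns true if the key was accepted, false if it was ignored.
function CommandGate:keyPressed(key, send)
    if self.busy then return false end
    self.busy = true
    send(key, function() self.busy = false end)
    return true
end

local gate = CommandGate.new()
local finish
local function fakeSend(key, done) finish = done end  -- reply comes "later"

print(gate:keyPressed("f", fakeSend))  -- prints: true
print(gate:keyPressed("b", fakeSend))  -- prints: false (still waiting)
finish()                               -- the reply arrives
print(gate:keyPressed("b", fakeSend))  -- prints: true
```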

So we’re not in big trouble. Are we in small trouble? Just a tiny bit. We now know that we can’t rely on immediate returns from our WorldProxy, so we need to convert each of our handful of calls to use callbacks. However, as we’ve just seen, it is a very straightforward rote process:

Package everything after the call to _world into a callback function.
Pass the callback function to _world. It promises to call back.

We can make these changes over a longer period of time if we wish, up until the day when we actually want sockets to work. In practice we shouldn’t put it off for long, and we should certainly create any new commands using the callback style.

Let’s do a couple more just because we’re here.

Moar Callbacks

We can do “back” trivially, since “forward” is a perfect example:

function WorldProxy:back(name,steps, callback)
    return self:moveFBNew(name, "back", steps, callback)
end

function Robot:back(steps)
    local callback = function(...) self:backResponse(...) end
    self._world:back(self._name, steps, callback)
end

function Robot:backResponse(response)
    self:setResponse(response)
    self.knowledge = self.knowledge:newLensAt(self:x(), self:y())
end

Test. Tests are green … with an issue. First, I think I can commit this: back function uses callback.

I saw something odd when I was testing “back” in the game. In backing up and looking, I found objects “south” of me. I thought the obstacles and pits started with y = 0 and only went up from there. Let’s see what really happens:

function World:setUpGame()
    local world = World(20,20)
    -- left top right bottom
    world:addObstacle(-5,3,-5,-3)
    world:addPit(3,2,3,-2)
    return world
end

Ah, excellent. I do set the obstacles in both negative and positive y. False alarm. Tests and game are right. PEBKAC. Excellent.

Shall we do one more? I think that scan/look will be difficult, so I want to save it for another day, it’ll be worth its own little article. We can do “turn”:

function Robot:turn(lOrR)
    self:setResponse(self._world:turn(self._name,lOrR))
    self.knowledge = self.knowledge:newLensAt(self:x(),self:y())
end

Before I do this, I notice some duplication:

function Robot:back(steps)
    local callback = function(...) self:backResponse(...) end
    self._world:back(self._name, steps, callback)
end

function Robot:backResponse(response)
    self:setResponse(response)
    self.knowledge = self.knowledge:newLensAt(self:x(), self:y())
end

function Robot:forward(steps)
    local callback = function(...) self:forwardResponse(...) end
    self._world:forward(self._name, steps, callback)
end

function Robot:forwardResponse(response)
    self:setResponse(response)
    self.knowledge = self.knowledge:newLensAt(self:x(),self:y())
end

The two response functions are identical. Let’s replace them with one, called “standardResponse”:

function Robot:back(steps)
    local callback = function(...) self:standardResponse(...) end
    self._world:back(self._name, steps, callback)
end

function Robot:standardResponse(response)
    self:setResponse(response)
    self.knowledge = self.knowledge:newLensAt(self:x(), self:y())
end

And same for “forward”. Test. Green. Commit: forward and back use same callback to standardResponse.

And when we look at turn, we see that it can use standard response:

function Robot:turn(lOrR)
    self:setResponse(self._world:turn(self._name,lOrR))
    self.knowledge = self.knowledge:newLensAt(self:x(),self:y())
end

So we change it:

function Robot:turn(lOrR)
    local callback = function(...) self:standardResponse(...) end
    self._world:turn(self._name,lOrR, callback)
end

And in WorldProxy:

function WorldProxy:turn(name, lOrR, callback)
    local rq = {
        robot=name,
        command="turn",
        arguments = { lOrR }
    }
    local response = self.world:request(rq)
    callback(Response(response))
end

I expect this to work. It does. Commit: turn uses callback mechanism.

We could stop now. But first let’s observe some duplication. Here in WorldProxy we see these:

function WorldProxy:moveFBNew(name,command, steps, callback)
    local rq = {
        robot=name,
        command=command,
        arguments={steps}
    }
    local response = self.world:request(rq)
    callback(Response(response))
end

function WorldProxy:turn(name, lOrR, callback)
    local rq = {
        robot=name,
        command="turn",
        arguments = { lOrR }
    }
    local response = self.world:request(rq)
    callback(Response(response))
end

These two methods are rather similar. We should also undo that “New” name: we only did that to avoid changing two functions at the same time. Do that, test, commit: rename moveFBNew to moveFB.

Now that duplication:

Extract a method:

function WorldProxy:moveFB(name,command, steps, callback)
    local rq = {
        robot=name,
        command=command,
        arguments={steps}
    }
    self:requestWithCallback(rq,callback)
end

function WorldProxy:requestWithCallback(rq, callback)
    local response = self.world:request(rq)
    callback(Response(response))
end

Test. Green. Commit: Use requestWithCallback in move.

Now use it in the other cases, of which I think we have only one:

function WorldProxy:turn(name, lOrR, callback)
    local rq = {
        robot=name,
        command="turn",
        arguments = { lOrR }
    }
    local response = self.world:request(rq)
    callback(Response(response))
end

That becomes:

function WorldProxy:turn(name, lOrR, callback)
    local rq = {
        robot=name,
        command="turn",
        arguments = { lOrR }
    }
    self:requestWithCallback(rq,callback)
end

Test. Green. Commit: All callback requests use requestWithCallback.

Why Was That Worth Doing?

That duplication removal added a function (4 lines) and replaced 4 lines of code with 2, for a net increase of 2 lines in the program, not counting whitespace. Why was that a good idea?

It was a good idea because the next time we do a callback, we’re going to copy one of the existing ones, er I mean “use one of the existing ones as a reference”, and if we did that, we’d create more duplication. This way, we’ll wind up with all our commands going through that one method, which means that when we inevitably change over to sockets, there will only be that one method that needs to have its immediate callback replaced with whatever the mechanism is for saving a callback and using it later.

We’ve reduced each future call by one line … but more importantly, we’ve kept an important function in one place rather than two going on three going on N.

People, lately, are calling this notion SPOT, Single Point Of Truth, formerly known as DRY (Don’t Repeat Yourself). Whatever we call it, it’s a good idea.

Let’s sum up, I’ve got reading to do.

Summary

Inner Eddie made me worry that our current design, focused on call-return, wouldn’t work. Inner Eddie was right: it won’t work. But when Eddie said “we’re in big trouble”, he wasn’t quite so right, because we have quickly replaced our call-return with call and subsequent callback, which we are convinced can be modified to deal with the socket stuff, should we ever bother to do it.

So we have dodged the dodgeball this time.

Does that mean that my initial choice of calling the world directly and expecting returns was a good choice? I think not.

I think we’d have done better if I had realized right away that we’d need to deal with the asynchronous nature of sockets. How much better? Well, so far, maybe 50 lines of code written or changed, over a period of less than two hours, including writing this article.

So I’m not going to beat myself up over this mistake, but it sure does seem like a mistake, even if it turns out to have been a minor one. Could it have been a major mistake? Not in this case, but certainly I think I’d have done better to explore sockets a bit more before committing to the current design.

But I’ve wasted more than two hours on this project being confused, because I didn’t see how to write a test, or because I took too big a step and, instead of reverting and taking a smaller step, went into debugging mode.

Every day we do something that we could have done better. We only look back to see what we can learn for looking forward. Looking forward I think I’d advise myself:

When you start something that isn’t a one-machine program, remember to consider latency.

Will I forget again? Perhaps, if I ever even do something like this again. Will forgetting ruin things? Not if I keep the program well structured.

Wait, what? You’re claiming this was easy because the program is well structured?

Yes, I am. As we saw, our commands all break down the same way, make the world call, deal with the result. Therefore all commands can be changed to first make the world call, then deal with the result on a subsequent callback.

Therefore, the program’s decent structure provides insurance against future changes. All conceivable changes? No. All likely ones? That seems to be the way to bet.

But still, I want to be a bit more careful.

Come see me make more mistakes next time!