Attract Mode: AI?
Python Asteroids+Invaders on GitHub
I plan to implement a Game Over screen soon. The first one will just display GAME OVER. But could we build a somewhat smart attract mode? One that learned to play the game? Let’s think.
I plan mostly to muse about this for a while, though my morning’s advice does suggest that I’d be wise to code up some examples, at least in little tests, to feel out the ideas. I’ll make a new test file for that.
class TestInvadersAI:
    def test_hookup(self):
        assert 2 + 2 == 4
So far so good. Let’s speculate.
If we get a good enough idea, we’d like to build a simple “AI” for the player object that would play a somewhat intelligent game of Space Invaders. Rather than program some notion of an optimal strategy, I’d like to provide just some raw perceptual info to the “AI”, and have it learn better ways of playing.
I’m imagining we would give it bits of information like this:
- You are under a shield
- There is an invader above you
- There is a missile coming down above you
- There is a saucer (what would we like to know about this? Direction and distance?)
- You are on the right edge
- You are on the left edge
The AI would have just a few actions:
- Move left
- Move right
- Fire
- Do nothing?
There would be just a few results:
- You have been hit by a missile
- You have killed an invader
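Neither list is final, but just to see them as code, the actions and results might be a couple of tiny enums. The names here are mine; nothing is decided:

from enum import Enum, auto

class Action(Enum):
    MOVE_LEFT = auto()
    MOVE_RIGHT = auto()
    FIRE = auto()
    DO_NOTHING = auto()

class Result(Enum):
    HIT_BY_MISSILE = auto()
    KILLED_INVADER = auto()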
It seems like results would have to be kind of back-dated. My killing an invader now probably relates to a decision to fire some time in the past. (Since there can only be one missile in play at a time, should the Player_AI be aware of that and not try to fire when it can’t? Or should we just include “there is a missile flying” as part of its sensorium?)
So … I’m imagining a sort of information vector, mostly just True/False elements. Maybe we’ll want to know a bit more: for example perhaps instead of “missile above you”, we have “missile above you and to your left” or “… to your right”, which might give us a chance of escaping in a better direction. Maybe we even get a sense of the distance above.
So for each value of the vector, we might associate a score for each possible action. You’d get +1 if something good happened, -1 if something bad happened. Or maybe a large negative score, because the bad things tend to be fatal.
So the AI would sense, look up the sensory vector, and then somehow choose among the options, based on their scores. Might just pick the one with the highest score, since offhand I don’t see a mixed strategy as useful in this situation.
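A rough sketch of that lookup might go something like this, assuming for the moment that the perception boils down to something hashable, say a plain tuple of booleans, and using the little Action enum from above. All of these names are made up:

import random

# scores[perception][action] -> accumulated score for taking that action in that situation
scores = {}

def choose_action(perception):
    # an unseen perception starts every action at zero
    table = scores.setdefault(perception, {action: 0.0 for action in Action})
    best = max(table.values())
    # take the highest-scoring action, breaking ties at random
    candidates = [action for action, score in table.items() if score == best]
    return random.choice(candidates)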
Maybe the scoring events have some kind of effect on all the decisions made since the last scoring event? We know just by thinking that what really matters in positive scoring is firing while you are under an invader and not under a shield.
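One way to do that back-dating: remember every (perception, action) decision since the last scoring event, and when a result finally arrives, nudge the score of every one of them. Purely speculative, building on the sketch above:

history = []

def record_decision(perception, action):
    history.append((perception, action))

def apply_result(reward):
    # spread the reward (or penalty) over every decision made since the last result
    for perception, action in history:
        table = scores.setdefault(perception, {a: 0.0 for a in Action})
        table[action] += reward
    history.clear()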
So what kind of object would we use here? It’s kind of a data record. I think Python has some magic support for that.
def test_dataclass(self):
    from dataclasses import dataclass

    @dataclass
    class Perception:
        under_shield: bool
        under_invader: bool
        under_missile_left: bool
        under_missile_right: bool
        under_missile_center: bool
        at_right_edge: bool
        at_left_edge: bool

    p1 = Perception(True, False, False, False, False, False, False)
    p2 = Perception(True, False, False, False, False, False, False)
    p3 = Perception(True, True, False, False, False, False, False)
    assert p1 == p2
    assert p1 != p3
We have learned that the dataclass handles object equality in a fashion we’d like.
So that’s OK, but I suspect we’ll want keyword arguments with defaults of False, or else a lot of constructors. But maybe not. I imagine that on each cycle, or perhaps every N cycles, we’ll assess the situation and set all the booleans, and at this moment I’m not sure what that might look like.
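If we do go that way, the dataclass makes it easy enough: give every field a default of False and construct with keywords. Something like this, still just a sketch:

from dataclasses import dataclass

@dataclass
class Perception:
    under_shield: bool = False
    under_invader: bool = False
    under_missile_left: bool = False
    under_missile_right: bool = False
    under_missile_center: bool = False
    at_right_edge: bool = False
    at_left_edge: bool = False

# mention only what's true; everything else defaults to False
p = Perception(under_shield=True, under_invader=True)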
We’ll probably have something like a dictionary from Perception to some kind of result list, sequence, table, whatever. I don’t know; I’ll have to consider it further. But now I have a bit of solid ground to stand on, a slightly better basis for imagining what might be done.
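One wrinkle to keep in mind if Perception is to be a dictionary key: a plain dataclass isn’t hashable, but declaring it frozen=True makes it so. A tiny sketch, with most of the fields elided:

from dataclasses import dataclass

@dataclass(frozen=True)  # frozen gives us __hash__, so instances can be dict keys
class Perception:
    under_shield: bool = False
    under_invader: bool = False
    # ... the other fields as above ...

scores = {}
scores[Perception(under_invader=True)] = {"fire": 1.0, "move_left": 0.0}
assert scores[Perception(under_invader=True)]["fire"] == 1.0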
I might even try this.
See you next time!