Python 030 - Efficiency?
Rightly called on my B.S., I strive to improve by running some timing tests on this tiny patch of code.
Ted M. Young asked me:
Curious how you tested that the inlined version is faster than multiple methods? i.e., “the fact is that it takes longer to call a method and do a thing than it takes to do the thing”
In Java, this isn’t (necessarily) true (due to hotspot inlining). I know nothing about Python.
I could quibble and say that I was right because if they do hotspot inlining they’re not calling the function and it’s faster, just like I said, nanner nanner, but Ted is quite right to point out that I have no actual data to show what’s faster. All I have is the accumulated expertise of decades, and some probably nearly good reasoning.
Ted knows, and I know, that reasoning about performance issues is fraught. Our reasoning is often horribly wrong, and our fixes often useless. So let’s see if we can get a little data, not because I doubt what I said but to see what we can find out with just a little effort.
Let’s try it with a test. That may not work well, and if not we’ll try something else. I’m new here myself.
I start with this:
def test_time_colliding():
rail = Rail.foot(600)
ball = Ball(0, -599, 0, -1)
rail.bounce(ball)
assert ball.velocity.y == 1
I wanted to make sure that bounce was running through all the code, checking distance and velocity and I used the foot so that the rotation would get to do something. Now I’ll do the actual timing:
def test_time_colliding():
rail = Rail.foot(600)
ball = Ball(0, -599, 0, -1)
rail.bounce(ball)
assert ball.velocity.y == 1
times = 0
how_many = 1_000_000
start = perf_counter()
while times < how_many:
ball.velocity.y = -1
rail.bounce(ball)
times += 1
end = perf_counter()
print()
print(f"time required: {end-start}")
This iterates a million times and results in this output:
time required: 0.5349892920348793
Half a second for a million runs. I think we can accept that as fast enough, but lets make it harder. First, I’ll change the rotate to always do the math. After commenting out all the optimized cases in our rotate functions, the time goes up:
time required: 0.8549879579804838
It’s interesting (to me at least) that removing all the trig only saved us about .22 seconds out of a million calls. Makes me wonder if the trig functions are checking for the special cases. Maybe we’ll test that too, after this. Probably not.
Let’s try inlining as much as we can.
def bounce(self, ball):
vel_y = ball.velocity.rotate(self.angle).y
ball_y = ball.position.rotate(self.angle).y
y_distance = self.position.y - ball_y
if y_distance <= ball.radius and vel_y > 0:
new_v = ball.velocity.rotate(self.angle)
new_v.y = - new_v.y
ball.velocity = new_v.rotate(-self.angle)
My tests are still green and the ball still bounces on the screen. Let’s check this timing:
time required: 0.5729372919886373
Well, on the face of it, method calls cost us about a quarter of a second per million calls of bounce. We might want a lot more runs and all that, but the numbers don’t seem to vary much from run to run.
When we look at that code, there’s not much going on other than the four calls to rotate. So four million rotates cost us about half a second. Eight million rotates per second isn’t bad, given that we are only running at 60 frames per second.
Let’s do this one more way. Let’s keep the fast rotate and inline everything else. I’ll revert Rail and go again.
def bounce(self, ball):
vel_y = self.rotated_y(ball.velocity)
ball_y = self.rotated_y(ball.position)
y_distance = self.position.y - ball_y
if y_distance <= ball.radius and vel_y > 0:
ball.velocity = self.negate_rotated_y(ball.velocity)
OK, let’s see what the timing says.
time required: 0.4043125419993885
Let me try to sort these times out and see if they make sense.
Case | Time | Saved | % Saved |
---|---|---|---|
(sec) | vs | vs | |
Base | Base | ||
Slow trig, with calls | 0.855 | 0 | – |
Slow trig, inline | 0.573 | 0.220 | 25% |
Fast trig, with calls | 0.535 | 0.320 | 35% |
Fast trig, inline | 0.404 | 0.451 | 52% |
The biggest single improvement was creating and using the fast trig functions. That saved 35% over the original well-factored version with standard trig. In both cases, inlining saved about 25% over method calls, and with the methods being as small as they are, we can probably use a rule of thumb that we pay at most a 25% penalty for well-factored code from method calls. With any method that does I/O or a larger calculation, or includes a wait of some kind, the method call will be invisible. I’ll try timing just a method call down below or tomorrow, depending how I feel about it.
So we saved about a third of a second by going to fast trig, and we saved an additional eighth of a second by inlining that code, for a million calls to our top-level method, in the worst case for that method.
In the game, the balls are almost never bouncing off the rail. If we wanted more speed, what might be better, perhaps, would be to invent a faster way to determine that the ball can’t be colliding with the rail, though I can’t think of one offhand. The way we have is pretty darn simple.
Anyway, I think we’ve done our due diligence on timing, with a rule of thumb that I’d say is
Intricate inlined code is typically no more than 25% faster than well-factored code. If your program is really too slow, reducing method calls may not be the place to look for speed.
Of course, if you’ve done everything else, you might find some hot spots that are worth tightening up, and maybe, just maybe, it’ll be worth it. Day in and day out, I think the speed of changing code is more important than the speed of running code.
YMMV of course.
Method Cost
Let’s measure method calls directly.
I’ll add this method to Rail:
def trivial(self):
pass
And two new timing tests:
def test_time_method():
rail = Rail.foot(600)
times = 0
how_many = 1_000_000
start = perf_counter()
while times < how_many:
rail.trivial()
times += 1
end = perf_counter()
print()
print(f"method time required: {end-start}")
def test_time_nothing():
rail = Rail.foot(600)
times = 0
how_many = 1_000_000
start = perf_counter()
while times < how_many:
times += 1
end = perf_counter()
print()
print(f"nothing time required: {end-start}")
method time required: 0.06250554096186534
nothing time required: 0.023497666988987476
So calling the method a million times cost us less than 0.04 seconds. We can do 25 million Python method calls per second on my laptop. I admit it’s a pretty decent laptop.
Summary
I’m going to stop worrying about how many methods I call per frame. I hope you can see why: 25 million will get us by. And our rule of thumb:
Intricate inlined code is typically no more than 25% faster than well-factored code. If your program is really too slow, reducing method calls may not be the place to look for speed.
And one more thing … getting some timing when we need it is often quite easy and usually quite interesting!
See you next time!