Update: As detailed as this was, it barely scratched the surface. A more detailed explanation is now available. The game loop advice is in Appendix A. If you really want to understand what's going on, start with that.
Original post follows...
I'm going to start with a capsule summary of how the graphics pipeline in Android works. You can find more thorough treatments (e.g. some nicely detailed Google I/O talks), so I'm just hitting the high points. This turned out rather longer than I expected, but I've been wanting to write some of this up for a while.
SurfaceFlinger
Your application does not draw on The Framebuffer. Some devices don't even have The Framebuffer. Your application holds the "producer" side of a BufferQueue object. When it has completed rendering a frame, it calls unlockCanvasAndPost() or eglSwapBuffers(), which queues up the completed buffer for display. (Technically, rendering may not even begin until you tell it to swap and can continue while the buffer is moving through the pipeline, but that's a story for another time.)
The buffer gets sent to the "consumer" side of the queue, which in this case is SurfaceFlinger, the system surface compositor. Buffers are passed by handle; the contents are not copied. Every time the display refresh (let's call it "VSYNC") starts, SurfaceFlinger looks at all the various queues to see what buffers are available. If it finds new content, it latches the next buffer from that queue. If it doesn't, it uses whatever it got previously.
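The latch-or-reuse behavior at VSYNC can be sketched with a toy model. This is plain Java with a hypothetical MiniBufferQueue class standing in for the real BufferQueue/SurfaceFlinger machinery; it only illustrates the logic described above, not the actual implementation:

```java
import java.util.ArrayDeque;

// Toy model of SurfaceFlinger's per-window latch logic: at each display
// refresh, latch the next queued buffer if one is available, otherwise
// keep showing the previously latched buffer. Buffers are passed by
// handle (an int here); contents are never copied.
public class MiniBufferQueue {
    private final ArrayDeque<Integer> queued = new ArrayDeque<>();
    private Integer latched = null;

    // Producer side: the app queues a completed buffer.
    public void queueBuffer(int handle) {
        queued.add(handle);
    }

    // Consumer side: called once per VSYNC.
    public Integer latchOnVsync() {
        Integer next = queued.poll();
        if (next != null) {
            latched = next;   // new content available: latch it
        }
        return latched;       // otherwise reuse whatever we had
    }

    public static void main(String[] args) {
        MiniBufferQueue q = new MiniBufferQueue();
        q.queueBuffer(1);
        System.out.println(q.latchOnVsync()); // latches buffer 1
        System.out.println(q.latchOnVsync()); // queue dry: buffer 1 repeats
        q.queueBuffer(2);
        System.out.println(q.latchOnVsync()); // latches buffer 2
    }
}
```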
The collection of windows (or "layers") that have visible content are then composited together. This may be done by SurfaceFlinger (using OpenGL ES to render the layers into a new buffer) or through the Hardware Composer HAL. The hardware composer (available on most recent devices) is provided by the hardware OEM, and can provide a number of "overlay" planes. If SurfaceFlinger has three windows to display, and the HWC has three overlay planes available, it puts each window into one overlay, and does the composition as the frame is being displayed. There is never a buffer that holds all the data. This is generally more efficient than doing the same thing in GLES. (Incidentally, this is why you can't grab a screen shot on most recent devices by simply opening the framebuffer dev entry and reading pixels.)
So that's what the consumer side looks like. You can admire it for yourself with adb shell dumpsys SurfaceFlinger. Let's go back to the producer (i.e. your app).
the producer
You're using a SurfaceView, which has two parts: a transparent View that lives with the system UI, and a separate Surface layer all its own. The SurfaceView's surface goes directly to SurfaceFlinger, which is why it has much less overhead than other approaches (like TextureView).
The BufferQueue for the SurfaceView's surface is triple-buffered. That means you can have one buffer being scanned out for the display, one buffer that is sitting at SurfaceFlinger waiting for the next VSYNC, and one buffer for your app to draw on. Having more buffers improves throughput and smooths out bumps, but increases the latency between when you touch the screen and when you see an update. Adding additional buffering of whole frames on top of this won't generally do you much good.
If you draw faster than the display can render frames, you will eventually fill up the queue, and your buffer-swap call (unlockCanvasAndPost()) will pause. This is an easy way to make your game's update rate the same as the display rate -- draw as fast as you can, and let the system slow you down. Each frame, you advance state according to how much time has elapsed. (I used this approach in Android Breakout.) It's not quite right, but at 60fps you won't really notice the imperfections. You'll get the same effect with sleep() calls if you don't sleep for long enough -- you'll wake up only to wait on the queue. In this case there's no advantage to sleeping, because sleeping on the queue is equally efficient.
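The "let the swap call pace you" loop looks roughly like the sketch below. This is plain Java with a swapBuffers() stand-in that simply sleeps ~16ms, simulating the backpressure that unlockCanvasAndPost() or eglSwapBuffers() provide on a real device; the names and numbers are illustrative:

```java
// Game loop paced by a (simulated) blocking buffer swap. State advances
// by measured elapsed time each iteration, never by a fixed "one frame".
public class PacedLoop {
    static double ballX = 0.0;
    static final double SPEED = 100.0;   // units per second

    // Stand-in for unlockCanvasAndPost()/eglSwapBuffers() blocking on
    // a full queue; here it just sleeps for about one 60fps frame.
    static void swapBuffers() throws InterruptedException {
        Thread.sleep(16);
    }

    public static void main(String[] args) throws InterruptedException {
        long prev = System.nanoTime();
        for (int frame = 0; frame < 10; frame++) {
            long now = System.nanoTime();
            double dt = (now - prev) / 1e9;   // seconds since last frame
            prev = now;
            ballX += SPEED * dt;              // advance by elapsed time
            swapBuffers();                    // the system slows us down
        }
        System.out.println("ballX after 10 frames: " + ballX);
    }
}
```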
If you draw slower than the display can render frames, the queue will eventually run dry, and SurfaceFlinger will display the same frame on two consecutive display refreshes. This will happen periodically if you're trying to pace your game with sleep() calls and you're sleeping for too long. It is impossible to precisely match the display refresh rate, for theoretical reasons (it's hard to implement a PLL without a feedback mechanism) and practical reasons (the refresh rate can change over time, e.g. I've seen it vary from 58fps to 62fps on a given device).

Using sleep() calls in a game loop to pace your animation is a bad idea.
going without sleep
You have a couple of choices. You can use the "draw as fast as you can until the buffer-swap call backs up" approach, which is what a lot of apps based on GLSurfaceView.Renderer#onDrawFrame() do (whether they know it or not). Or you can use Choreographer.
Choreographer allows you to set a callback that fires on the next VSYNC. Importantly, the argument to the callback is the actual VSYNC time. So even if your app doesn't wake up right away, you still have an accurate picture of when the display refresh began. This turns out to be very useful when updating your game state.
The code that updates game state should never be designed to advance "one frame". Given the variety of devices, and the variety of refresh rates that a single device can use, you can't know what a "frame" is. Your game will play slightly slow or slightly fast -- or if you get lucky and somebody tries to play it on a TV locked to 48Hz over HDMI, you'll be seriously sluggish. You need to determine the time difference between the previous frame and the current frame, and advance the game state appropriately.
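A tiny numeric sketch makes the point. The code below (illustrative names, not from any real game) runs the same one-second simulation at 60Hz and at 48Hz: a fixed per-frame step drifts with the refresh rate, while a delta-time step does not:

```java
// Why per-"frame" advancement is wrong: a fixed step tuned for 60fps
// runs 20% slow at 48Hz, while a delta-time step stays correct at any
// refresh rate.
public class FrameVsDelta {
    static double simulate(int refreshHz, boolean fixedStep) {
        double pos = 0.0;
        final double speed = 120.0;             // units per second
        final double perFrame = speed / 60.0;   // tuned assuming 60fps
        double dt = 1.0 / refreshHz;            // actual frame interval
        for (int i = 0; i < refreshHz; i++) {   // one second of frames
            pos += fixedStep ? perFrame : speed * dt;
        }
        return pos;
    }

    public static void main(String[] args) {
        System.out.println(simulate(60, true));   // ~120: correct by luck
        System.out.println(simulate(48, true));   // ~96: 20% slow
        System.out.println(simulate(48, false));  // ~120: still correct
    }
}
```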
This may require a bit of a mental reshuffle, but it's worth it.
You can see this in action in Breakout, which advances the ball position based on elapsed time. It cuts big jumps in time into smaller pieces to keep the collision detection simple. The trouble with Breakout is that it's using the stuff-the-queue-full approach, so the timestamps are subject to variations in the time SurfaceFlinger takes to do its work. Also, when the buffer queue is initially empty you can submit frames very quickly. (This means you compute two frames with nearly zero time delta, but they're still sent to the display at 60fps. In practice you don't see this, because the time stamp difference is so small that it just looks like the same frame drawn twice, and it only happens when transitioning from non-animating to animating, so you don't see anything stutter.)
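The "cut big jumps into smaller pieces" idea looks something like this. The names are illustrative, not Breakout's actual code; the principle is just to cap each physics step so a fast-moving object can't tunnel through a wall between two widely spaced updates:

```java
// Slice a large elapsed-time delta into bounded sub-steps so collision
// detection can stay simple. A 100ms hiccup becomes a dozen ~8ms steps.
public class SubStep {
    static final double MAX_STEP_SEC = 1.0 / 120.0;  // cap per physics step
    static double ballX = 0.0;

    // Advance one bounded step; a real game would run collision checks here.
    static void advance(double dt, double speed) {
        ballX += speed * dt;
    }

    // Consume the full elapsed interval in capped slices.
    static int update(double elapsedSec, double speed) {
        int steps = 0;
        while (elapsedSec > 0) {
            double step = Math.min(elapsedSec, MAX_STEP_SEC);
            advance(step, speed);
            elapsedSec -= step;
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        int steps = update(0.100, 500.0);   // simulate a 100ms stall
        System.out.println(steps + " steps, ballX = " + ballX);
    }
}
```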
With Choreographer, you get the actual VSYNC time, so you get a nice regular clock to base your time intervals off of. Because you're using the display refresh time as your clock source, you never get out of sync with the display.
Of course, you still have to be prepared to drop frames.
no frame left behind
A while back I added a screen recording demo to Grafika ("Record GL app") that does very simple animation -- just a flat-shaded bouncing rectangle and a spinning triangle. It advances state and draws when Choreographer signals. I coded it up, ran it... and started to notice Choreographer callbacks backing up.
After digging at it with systrace, I discovered that the framework UI was occasionally doing some layout work (probably to do with the buttons and text in the UI layer, which sits on top of the SurfaceView surface). Normally this took 6ms, but if I wasn't actively moving my finger around the screen, my Nexus 5 slowed the various clocks to reduce power consumption and improve battery life. The re-layout took 28ms instead. Bear in mind that a 60fps frame is 16.7ms.
The GL rendering was nearly instantaneous, but the Choreographer update was being delivered to the UI thread, which was grinding away at the layout, so my renderer thread didn't get the signal until much later. (You could have Choreographer deliver the signal directly to the renderer thread, but there's a bug in Choreographer that will cause a memory leak if you do.) The fix was to drop frames when the current time is more than 15ms after the VSYNC time. The app still does the state update -- the collision detection is so rudimentary that weird stuff happens if you let the time gap grow too large -- but it doesn't submit a buffer to SurfaceFlinger.
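The drop rule, pulled out in isolation, is a few lines of logic. This sketch uses hypothetical names (onVsync, DROP_THRESHOLD_MS) rather than Grafika's actual code, with times in milliseconds:

```java
// Frame-drop policy: always advance game state so the simulation time gap
// stays small, but skip submitting a buffer when the callback arrives too
// long after the VSYNC it describes (the 15ms threshold mentioned above).
public class DropPolicy {
    static final long DROP_THRESHOLD_MS = 15;
    static int framesSubmitted = 0;
    static int framesDropped = 0;

    static void onVsync(long vsyncTimeMs, long nowMs) {
        // updateGameState(vsyncTimeMs);  // state update happens regardless
        if (nowMs - vsyncTimeMs > DROP_THRESHOLD_MS) {
            framesDropped++;      // woke up too late: don't queue a buffer
        } else {
            framesSubmitted++;    // on time: render and submit
        }
    }

    public static void main(String[] args) {
        onVsync(1000, 1002);   // woke up 2ms after VSYNC: submit
        onVsync(1016, 1044);   // UI thread stalled 28ms: drop
        System.out.println(framesSubmitted + " submitted, "
                + framesDropped + " dropped");
    }
}
```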
While running the app you can tell when frames are being dropped, because Grafika flashes the border red and updates an on-screen counter. You can't tell by watching the animation: because the state updates are based on time intervals, not frame counts, everything moves just as fast whether the frame was dropped or not, and at 60fps you won't notice a single dropped frame. (This depends to some extent on your eyes, the game, and the characteristics of the display hardware.)
Key lessons:
- Frame drops can be caused by external factors -- dependency on another thread, CPU clock speeds, background Gmail sync, etc.
- You can't avoid all frame drops.
- If you set your draw loop up right, nobody will notice.
Drawing
Rendering to a Canvas can be very efficient if it's hardware-accelerated. If it's not, and you're doing the drawing in software, it can take a while -- especially if you're touching lots of pixels.
Two important bits of reading: learn about hardware-accelerated rendering, and using the hardware scaler to reduce the number of pixels your app needs to touch. The "Hardware scaler exerciser" in Grafika will give you a sense for what happens when you reduce the size of your drawing surface -- you can get pretty small before the effects are noticeable. (I find it oddly amusing to watch GL render a spinning triangle on a 100x64 surface.)
You can also take some of the mystery out of the rendering by using OpenGL ES directly. There's a bit of a bump learning how things work, but Breakout (and, for a more elaborate example, Replica Island) show everything you need for a simple game.