Precision and Performance


I have been struggling for the past week to improve the performance of sprite rendering in Gemini. I am using cocos2d version 1.x as my gold standard. To that end, I created a cocos2d project with a simple sprite sheet and filled the screen with animated sprites. The sprite sheet and the resulting screen shot are given below:

On cocos2d I was able to render 600 sprites at 60 fps on my iPhone 4 in full 640x960 with multisampling enabled (using sprite batching). I might have been able to go higher, but this is all that would fit and seems like plenty for now. Using the same sprite sheet and settings with Gemini I was only getting 47 fps. The screen shot is given below. Visually they are the same, but Gemini is slower.

Keep reading to find out why.

I took a look at the source code for cocos2d and found that it is actually very similar to the sprite rendering code I have for Gemini. Both code bases use sprite batching to improve performance. Both code bases use Vertex Buffer Objects (VBO) and triangle strips with a single draw call for sprites that share the same texture atlas (sprite sheet). The biggest difference was that cocos2d 1.x uses OpenGL ES 1.0 whereas Gemini uses OpenGL ES 2.0.

Instruments was not giving me any clear answers as to where my bottleneck was, but I noticed that if I reduced the scale of my sprites (they had been scaled up by 1.5x) then my frame rate went up. This indicated that I was fill-rate limited (scaling by 1.5x covers 2.25 times as many pixels), which didn’t really make sense since cocos2d had no problem running at 60 fps with the same imagery. So I took a look at my fragment shader. This is what I was using:

varying lowp vec4 colorVarying;
varying lowp vec2 texCoordVarying;

uniform sampler2D texture;

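void main()
{
    // Sample the sprite sheet; the result is held at medium precision.
    mediump vec4 textureColor = texture2D(texture, texCoordVarying);

    // Modulate by the per-vertex color (tint and alpha).
    gl_FragColor = colorVarying * textureColor;
}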

Pretty simple stuff. At first, the only thing I could think of to simplify it was to change it to this:

varying lowp vec4 colorVarying;
varying lowp vec2 texCoordVarying;

uniform sampler2D texture;

void main()
{
    mediump vec4 textureColor = texture2D(texture, texCoordVarying);

    // Output the texel unchanged; colorVarying is no longer applied.
    gl_FragColor = textureColor;
}

This eliminates the multiplication (and with it the ability to change the color/alpha of the sprite). The change took me up to 60 fps, so I knew I was on to something, but I didn’t want to sacrifice the ability to change the alpha value of the sprite via the colorVarying parameter. So I took a second look.

The only other meaningful change I could make was to change the precision of the textureColor variable. I have to admit, precision in shaders is not something I have a lot of experience with. I have no feel for what precision changes do to render quality. But I could not really change anything else. So I changed my shader to the following:

varying lowp vec4 colorVarying;
varying lowp vec2 texCoordVarying;

uniform sampler2D texture;

void main()
{
    // Same as the original shader, but the texel is now held at low precision.
    lowp vec4 textureColor = texture2D(texture, texCoordVarying);

    gl_FragColor = colorVarying * textureColor;
}

And voilà, back up to 60 fps, with no perceptible change in visual quality (see the screen shot below). It’s hard to believe such a simple change could make such a big difference. I have definitely learned my lesson; from now on I will always start with lowp in my shaders and only bump up if I need to.
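One way to make "start with lowp" the default is a precision statement at the top of the fragment shader. This is only a sketch under that assumption, not the shader Gemini actually ships; the variable names are the ones used above, and the comments list the minimum ranges the OpenGL ES 2.0 shading language guarantees for each qualifier, which actual hardware may exceed.

```glsl
// Sketch only: make lowp the default float precision for this fragment shader.
// Minimum guarantees from the GLSL ES 1.00 specification:
//   lowp    : range (-2, 2),        absolute precision 2^-8
//   mediump : range (-2^14, 2^14),  relative precision 2^-10
//   highp   : range (-2^62, 2^62),  relative precision 2^-16
//             (highp support is optional in fragment shaders)
precision lowp float;

varying vec4 colorVarying;     // inherits lowp from the default above
varying vec2 texCoordVarying;  // inherits lowp; qualify as mediump if sampling artifacts show up

uniform sampler2D texture;     // sampler2D already defaults to lowp

void main()
{
    // Every float in this expression is lowp unless explicitly qualified otherwise.
    gl_FragColor = colorVarying * texture2D(texture, texCoordVarying);
}
```

With a default in place, bumping up becomes a local decision: only the variable that needs more range or precision gets an explicit mediump or highp qualifier, and everything else stays cheap.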
