Optimizing OpenGL ES on the iPhone and interpreting Instruments

I'm trying to push my FPS on the iPhone 3GS up from 30 as high as possible, and I'm running into a couple of issues I thought it would be better to ask advice on.
1) What exactly do the Renderer Utilization and Tiler Utilization columns in the OpenGL ES Instrument signify? My Tiler Utilization percentage is extremely low, and my Renderer Utilization tends to drop during user interaction and when the app is flipped to landscape mode. I've noticed that my FPS tends to drop whenever the Renderer Utilization value drops as well. The FPS drop in landscape mode is particularly odd, because portrait and landscape mode use the exact same game logic and textures, and landscape mode actually renders fewer vertices/triangles to boot (some parts of the UI aren't drawn at all in landscape).
2) I've already done most of the recommended optimizations in the ngmoco/Stanford videos, and the only things I have left to try are changing GLfloats to GLshorts and interleaving my vertices and texture coordinates into a single array. Are either of these likely to have a large effect on my FPS? It's a 2D sprite game with lots of large, detailed textures...
3) Which is a faster way to hide a polygon: setting all of its vertices to the same coordinates (essentially reducing it to a point), or setting its alpha value to 0? I'm guessing it's the former, since blending is slower in general and particularly expensive on the iPhone.
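For illustration, a minimal sketch of the degenerate-geometry approach (assuming the sprite's quad is stored as four positioned vertices; quad, hiddenX, and hiddenY are hypothetical names):

// Collapse the quad to a single point so the rasterizer emits no fragments.
for (int i = 0; i < 4; i++) {
    quad[i].x = hiddenX;  // any coordinate works; all four vertices coincide
    quad[i].y = hiddenY;
}
// The degenerate triangles still pass through vertex processing, but they
// cover zero pixels, so no fill or blending cost is paid.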
4) Currently, I'm using two 512x512 textures, one 1024x512 texture, and one 256x256 texture. I've sought advice on how best to manage these, and I was told not to combine them into one 1024x1024 texture because of memory problems on the iPhone 3G. I'd like to confirm that here, because if I could put everything into one texture, I could eliminate repeated glBindTexture calls...

To #4: (a) Yes, the iPhone is documented not to handle images larger than 1024 on a side. 1024x1024 is the theoretical maximum, although you may run into problems if you push right up against that limit.
(b) Your textures won't all fit into a single 1024x1024 anyway: after the 1024x512 and the two 512x512s fill that space, you'll still have the 256x256 left over.

Related

cocos2d zooming sprite without distortion?

I want to implement zooming of sprites with a pinch gesture in cocos2d. How do I achieve it without the image getting pixelated? I tried vectors without success, so I'm stuck using raster bitmap images. Do I need the largest possible image, at the highest resolution, to make it look nice? What is the size limit for PNGs in cocos2d? What other pitfalls do I need to consider?
Yes. For example, if the sprite should cover an area of 1024x1024 pixels when zoomed in to the maximum, you need to create the image at 1024x1024 and set the scale property below 1.0 to get the smaller versions. If you use a scale greater than 1.0, the image will always lose detail and become more and more blurred as the scale increases.
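For instance, a minimal sketch (assuming the cocos2d CCSprite API; the filename is hypothetical):

// Create the sprite from its largest version and display it scaled down;
// pinch-zooming then raises scale toward 1.0 without losing detail.
CCSprite *sprite = [CCSprite spriteWithFile:@"zoomable-1024.png"];
sprite.scale = 0.25f;  // shown at 256x256; avoid going above 1.0
[self addChild:sprite];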
There is no size limit in cocos2d; it's the devices that impose the limit. Most devices can handle 2048x2048, except the 1st and 2nd generation, which support only 1024x1024. You wouldn't normally support those older devices anyway, so 2048x2048 should be the default. Several newer devices (iPad 2+, iPhone 4S+) can use textures up to 4096x4096.
Memory consumption. I'm not sure what you're trying to do, but developers often have little sense of how much memory textures consume and how much memory is available. For instance, a 2048x2048 PNG with 32-bit color consumes 16 MB of memory. Don't plan on using more than 4-5 of these unless you can reduce the color bit depth, or use TexturePacker and the compressed .pvr.ccz format. Read my article about optimizing memory usage for more info.
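The arithmetic behind that 16 MB figure (a quick check; PNG compression only matters on disk, since textures sit uncompressed in memory unless you use a format like PVRTC):

// Uncompressed texture memory = width * height * bytes per pixel.
size_t bytes = 2048 * 2048 * 4;  // RGBA8888: 16,777,216 bytes = 16 MB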

What is the performance impact of using multisampled anti-aliasing on iOS?

I've been experimenting with using multisampling to do full-scene anti-aliasing on the iPhone and iPad on iOS 4. The general mechanism uses Apple's APPLE_framebuffer_multisample extension (http://www.khronos.org/registry/gles/extensions/APPLE/APPLE_framebuffer_multisample.txt) and is described in the answer to this question: How do you activate multisampling in OpenGL ES on the iPhone? and documented by Apple in their OpenGL ES Programming Guide.
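For reference, the core of that mechanism looks roughly like this (a sketch rather than my exact code; buffer names and dimensions are assumptions):

// Create a multisampled framebuffer alongside the normal one (OpenGL ES 1.1).
GLuint msaaFramebuffer, msaaColorRenderbuffer;
glGenFramebuffersOES(1, &msaaFramebuffer);
glBindFramebufferOES(GL_FRAMEBUFFER_OES, msaaFramebuffer);
glGenRenderbuffersOES(1, &msaaColorRenderbuffer);
glBindRenderbufferOES(GL_RENDERBUFFER_OES, msaaColorRenderbuffer);
glRenderbufferStorageMultisampleAPPLE(GL_RENDERBUFFER_OES, 2,  // 2 samples per pixel
                                      GL_RGBA8_OES, width, height);
glFramebufferRenderbufferOES(GL_FRAMEBUFFER_OES, GL_COLOR_ATTACHMENT0_OES,
                             GL_RENDERBUFFER_OES, msaaColorRenderbuffer);

// Each frame: render into msaaFramebuffer, then resolve the samples into
// the on-screen framebuffer before presenting.
glBindFramebufferOES(GL_READ_FRAMEBUFFER_APPLE, msaaFramebuffer);
glBindFramebufferOES(GL_DRAW_FRAMEBUFFER_APPLE, defaultFramebuffer);
glResolveMultisampleFramebufferAPPLE();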
It works as described, but the drawing performance of my test application suffers by about 50% when I set the number of samples to be 2. I'm primarily testing on an iPhone 4, using a non-retina-enabled application. I am using the other performance suggestions offered by Apple in their documentation (using glDiscardFramebufferEXT to discard the renderbuffers attached to the multisample framebuffer, using glClear to clear the entire framebuffer at the start of the frame, etc.).
The performance overhead of enabling multisampling in this manner seems surprisingly large to me. Are you guys seeing similar results or does this suggest that I'm doing something incorrectly?
You mentioned that you're running this on an iPhone 4. Is your OpenGL ES layer rendering at the full 2X Retina display scale factor? That is, have you set the contentScaleFactor on the OpenGL ES hosting layer to [[UIScreen mainScreen] scale]? If so, you're pushing a large number of pixels to start with.
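That check looks like this (assuming your view subclass hosts the CAEAGLLayer):

// With this set on an iPhone 4, the GL framebuffer is 960x640 --
// four times the pixels of a non-Retina 480x320 buffer.
self.contentScaleFactor = [[UIScreen mainScreen] scale];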
Are you fill rate limited before you apply the multisampled antialiasing? To check, use the OpenGL ES instrument in Instruments against your running application and enable the Tiler Utilization and Renderer Utilization statistics. If your application shows a high Renderer Utilization without MSAA enabled, you are fill rate limited to begin with. Adding MSAA on top of that could significantly reduce your framerates because of this bottleneck.
In an application of mine that was geometry-limited, not fill-rate limited, I didn't see that great a slowdown when using 4X MSAA on an iPhone 4. I'm guessing that the bottleneck in your application is in pushing pixels to the screen.
It is not surprising that your performance suffers by about 50% when you set the number of samples to 2: you're drawing twice the samples! Multisampling means you essentially draw your scene at a higher resolution than the screen into an off-screen buffer, then use filtering to reduce that higher-resolution buffer down to the display resolution, hopefully with fewer aliasing artifacts because the final picture actually includes more detail (filtered higher-resolution output) than the single-sampled version.
It is a very common (if not the most common) performance problem in graphics: the more samples you draw, the slower you go.

optimizing iPhone OpenGL ES fill rate

I have an OpenGL ES game on the iPhone. My framerate is pretty sucky, ~20 fps. Using the Xcode OpenGL ES performance tool on an iPhone 3G, it shows:
Renderer Utilization: 95% to 99%
Tiler Utilization: ~27%
I am drawing a lot of pretty large images with a lot of blending. If I reduce the number of images drawn, framerates go from ~20 to ~40, though the performance tool results stay about the same (renderer still maxed). I think I'm limited by the fill rate of the iPhone 3G, but I'm not sure.
My questions are: How can I determine with more granularity where the bottleneck is? That is my biggest problem; I just don't know what is taking all the time. If it is fill rate, is there anything I can do to improve it besides just drawing less?
I am using texture atlases. I have tried to minimize texture binds, though it isn't always possible (draw order, not everything fits on one 1024x1024 texture, etc.). Every frame I do 10 texture binds. This seems pretty reasonable, but I could be mistaken.
I'm using vertex arrays and glDrawArrays. I don't really have a lot of geometry. I can try to be more precise if needed. Each image is 2 triangles, and I try to batch things where possible, though often (maybe half the time) images are drawn with individual glDrawArrays calls. Besides the images, I have ~60 triangles' worth of geometry being rendered in ~6 glDrawArrays calls. I often glTranslate before calling glDrawArrays.
Would it improve the framerate to switch to VBOs? I don't think it is a huge amount of geometry, but maybe it is faster for other reasons?
Are there certain things to watch out for that could reduce performance? E.g., should I avoid glTranslate, glColor4f, etc.?
I'm using glScissor in 3 places per frame. Each use consists of 2 glScissor calls, one to set it up and one to reset it to what it was. I don't know if there is much of a performance impact here.
If I used PVRTC would it be able to render faster? Currently all my images are GL_RGBA. I don't have memory issues.
One of my fullscreen textures is 256x256. Would it be better to use 480x320 so the phone doesn't have to do any scaling? Is there any other general advice on texture sizes?
Here is a rough idea of what I'm drawing, in this order:
1) Switch to perspective matrix.
2) Draw a full screen background image
3) Draw a full screen image with translucency (this one has a scrolling texture).
4) Draw a few sprites.
5) Switch to ortho matrix.
6) Draw a few sprites.
7) Switch to perspective matrix.
8) Draw sprites and some other textured geometry.
9) Switch to ortho matrix.
10) Draw a few sprites (eg, game HUD).
Steps 1-6 draw a bunch of background stuff, step 8 draws most of the game content, and step 10 draws the HUD.
As you can see, there are many layers, some of them full screen and some of the sprites are pretty large (1/4 of the screen). The layers use translucency, so I have to draw them in back-to-front order. This is further complicated by needing to draw various layers in ortho and others in perspective.
I will gladly provide additional information if requested. Thanks in advance for any performance tips or general advice on my problem!
Edit:
I added some logging to see how many glDrawArrays calls I am doing, and with how much data. I do about 20 glDrawArrays calls per frame. Usually between 1 and 6 of these have about 40 vertices each; the rest of the calls are usually just one image (2 triangles) each. I'm just using glVertexPointer and glTexCoordPointer.
Given that the Renderer Utilization is basically at 100%, that indicates that the bottleneck is filling, texturing, and blending pixels. Techniques intended to optimize vertex processing (VBOs and vertex formats) or CPU usage (draw call batching) will likely not help, as they will not speed up pixel processing.
Your best bet is to reduce the number of pixels that you are filling, and also look at different texture formats that make better use of the very limited memory bandwidth available on the first generation devices. Prefer the use of PVRTC textures wherever possible, and 16bit uncompressed textures otherwise.
Look to Apple's "Best Practices for Working with Texture Data" and "Best Practices for Working with Vertex Data" sections of the OpenGL ES Programming Guide for iPhone OS. They highly recommend (as do others) that you use PVRTC for compressing your textures, because they can offer an 8:1 or 16:1 compression ratio over your standard uncompressed textures. Aside from mipmapping, you seem to be doing the other recommended optimization of using a texture atlas.
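Uploading a PVRTC texture looks roughly like this (a sketch; it assumes you have already loaded 4-bpp PVRTC data from a .pvr file into pvrtcData, with its byte count in dataLength):

// PVRTC data is uploaded with glCompressedTexImage2D instead of glTexImage2D.
glBindTexture(GL_TEXTURE_2D, textureName);
glCompressedTexImage2D(GL_TEXTURE_2D, 0,
                       GL_COMPRESSED_RGBA_PVRTC_4BPPV1_IMG,  // 4 bits per pixel
                       width, height, 0, dataLength, pvrtcData);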
You do not appear to be geometry-limited, because (as I discovered in this question) the Tiler Utilization statistic seems to indicate how much of a bottleneck is being caused by geometry size. However, the iPhone 3G S (and third-generation iPod touch and iPad) support hardware-accelerated VBOs, so you might give those a shot and see how they affect performance. They might not have as much of an effect as compressing textures would, but they're not hard to implement.
Another big win (mostly for the 3G) is the texture filtering you are using. Check whether you are using trilinear filtering, and if so, change it to bilinear.
Make sure you setup the textures like this:
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_NEAREST); // bilinear: samples within the nearest mip level only
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
and not like this:
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR); // trilinear: also blends between two mip levels, costing extra bandwidth
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
Harry
I wanted to chime in with an additional answer: change the framebuffer backing to a 16-bit format instead of a 32-bit format.
if ((self = [super initWithCoder:coder])) {
    eaglLayer = (CAEAGLLayer *)self.layer;
    eaglLayer.opaque = YES;
    eaglLayer.drawableProperties = [NSDictionary dictionaryWithObjectsAndKeys:
        [NSNumber numberWithBool:NO], kEAGLDrawablePropertyRetainedBacking,
        // kEAGLColorFormatRGBA8 = 32-bit frame buffer; kEAGLColorFormatRGB565 = 16-bit
        kEAGLColorFormatRGB565, kEAGLDrawablePropertyColorFormat,
        nil];
}
What woke me up to this was the Xcode profiler. It kept complaining about using too large a frame buffer; eventually I found it in my init section.
http://developer.apple.com/library/ios/#documentation/iPhone/Reference/EAGLDrawable_Ref/EAGLDrawable/EAGLDrawable.html
That single change allowed my games on iPad, Retina, and iPods to go to 60 FPS.
I have yet to re-release them with this change, as I only found it three days ago :) but I don't think I'll release at 60 fps anyway; 30 fps is just fine for casual games. I also found that my sound effects cut the frame rate down, so if I do go with 60 fps I'll need to resample them, play the sfx on another thread, or find some other way to keep the frame rate up. Don't forget to discard the buffers that are not used for display:
if (fiOSver >= 4.0f) {
    const GLenum discards[] = { GL_DEPTH_ATTACHMENT_OES };
    glDiscardFramebufferEXT(GL_FRAMEBUFFER_OES, 1, discards);
}
[m_oglContext presentRenderbuffer:GL_RENDERBUFFER_OES];
I was in a similar situation (porting a 2D adventure game to the iPad). My 3GS version was running more or less locked at 60 FPS; putting it on the iPad dropped it (and my jaw) to 20 FPS.
It turns out that ONE of the little gotchas involved is that the PVR chips hate GL_ALPHA_TEST; on the PC that actually has a slight positive effect (especially on older Intel chips), but it's death on fill rate on the iPhone. Disabling it with
glDisable(GL_ALPHA_TEST);
gave me an immediate 100% boost in FPS (up to 40 FPS). Not bad for one line of code.. :)
Allan
The biggest performance killer on the iPhone platform is the number of draw calls and state changes. If you're doing more than 20ish draw calls or state changes, you're going to run into a performance wall.
Batching and texture atlases are your friend.
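A minimal sketch of what batching buys you (the data layout, MAX_SPRITES, atlasTexture, and the fill step are all assumptions; client states are presumed already enabled):

typedef struct { GLfloat x, y, u, v; } Vertex;  // interleaved position + texcoord

static Vertex batch[MAX_SPRITES * 6];  // 6 vertices = 2 triangles per sprite
static int spriteCount = 0;

// ... append 6 vertices to batch[] for every sprite that shares the atlas ...

glBindTexture(GL_TEXTURE_2D, atlasTexture);      // one bind for the whole batch
glVertexPointer(2, GL_FLOAT, sizeof(Vertex), &batch[0].x);
glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), &batch[0].u);
glDrawArrays(GL_TRIANGLES, 0, spriteCount * 6);  // one draw call instead of many
spriteCount = 0;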
From my past experience with OpenGL ES on old Windows Mobile devices with processing speeds around 600 MHz: reducing the rendering window resolution usually increases rendering speed. In my recent tests, I found I needed frame-by-frame performance monitoring to track how many FPS the device can display at the currently applied resolution. I think it is good practice to keep a monitoring algorithm in the rendering view that balances resolution against frame rate while the game's rendering engine runs. Depending on the frame rate you want, you can sacrifice rendering resolution so the game performs well and degrades gracefully across devices with varying hardware and software performance. You may need to control the resolution manually, as explained in this article:
http://www.david-amador.com/2010/09/setting-opengl-view-for-iphone-4-retina-hi-resolution/

Large scrolling background in OpenGL ES

I am working on a 2D scrolling game for iPhone. I have a large background image, say 480×6000 pixels, of which only a part is visible (exactly one screen’s worth, 480×320 pixels). What is the best way to get such a background on the screen?
Currently I have the background split into several textures (to get around the maximum texture size limit) and draw the whole background in each frame as a textured triangle strip. The scrolling is done by translating the modelview matrix. The scissor box is set to the window size, 480×320 pixels. This is not meant to be fast, I just wanted a working code before I get to optimizing.
I thought that maybe the OpenGL implementation would be smart enough to discard the invisible portion of the background, but according to some measuring code I wrote, the background takes 7 ms to draw on average and 84 ms at maximum. (This is measured in the simulator.) That is about half of the whole render loop, i.e. quite slow for me.
Drawing the background should be as easy as copying some 480×320 pixels from one part of the VRAM to another, or, in other words, blazing fast. What is the best way to get closer to such performance?
That's the fast way of doing it. Things you can do to improve performance:
Try different texture-formats. Presumably the SDK docs have details on the preferred format, and presumably smaller is better.
Cull out entirely offscreen tiles yourself (see the sketch after this list)
Split the image into smaller textures
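A sketch of the manual culling idea (assuming tiles laid out left to right at a fixed tileWidth in pixels; scrollX, tileCount, and drawTile are hypothetical):

// Only draw the tiles that overlap the 480-pixel-wide viewport.
int firstVisible = scrollX / tileWidth;
int lastVisible  = (scrollX + 480 - 1) / tileWidth;
for (int i = firstVisible; i <= lastVisible && i < tileCount; i++) {
    drawTile(i);  // binds the tile's texture and draws its quad
}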
I'm assuming you're drawing at a 1:1 zoom-level; is that the case?
Edit: Oops. Having read your question more carefully, I have to offer another piece of advice: Timings made on the simulator are worthless.
The quick solution:
Create a geometry matrix of tiles (quads preferably) so that there is at least one row/column of off-screen tiles on all sides of the viewable area.
Map textures to all those tiles.
As soon as one tile is outside the viewable area, you can release its texture and bind a new one.
Move the tiles using a modulo of the tile width and tile height for the position (so that a tile repositions itself at its starting position once it has moved exactly one tile length), and remember to remap the textures during that operation. This allows you to have a very small grid and very little texture memory loaded at any given time, which I guess is especially important in GL ES. A rough sketch follows below.
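Something like this for the wrap-around positioning (an fmodf-based sketch; baseX, scrollX, gridWidth, and tileWidth are assumed names):

// A tile's on-screen x: shift by the scroll offset, then wrap over the
// grid's total width so the tile re-enters from the other side.
float wrapTileX(float baseX, float scrollX, float gridWidth, float tileWidth) {
    float x = fmodf(baseX - scrollX, gridWidth);  // range (-gridWidth, gridWidth)
    if (x < -tileWidth) x += gridWidth;           // keep one off-screen column
    return x;
}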
If you have memory to spare and are still plagued by slow load speeds (although you shouldn't be for that amount of textures), you could build a texture streaming engine that preloads textures into faster memory (whatever that may be on your target device) when you reach a new area. Texture mapping would then read from that faster memory when needed. Just be sure you can preload without using up all the memory, and remember to release textures dynamically when they are no longer needed.
Here is a link to a GL (not ES) tile engine. I haven't used it myself so I cannot vouch for its functionality but it might be able to help you: http://www.mesa3d.org/brianp/TR.html

Why do images for textures on the iPhone need to have power-of-two dimensions?

I'm trying to solve a flickering problem on the iPhone (OpenGL ES game). I have a few images that don't have power-of-2 dimensions. I'm going to replace them with images of appropriate dimensions... but why do the dimensions need to be powers of two?
The reason that most systems (even many modern graphics cards) demand power-of-2 textures is mipmapping.
What is mipmapping?
Smaller versions of the image are created so that the image looks correct when drawn at very small sizes. The image is divided by 2 over and over to make the new images.
So, imagine a 256x128 image. This would have smaller versions created of dimensions 128x64, 64x32, 32x16, 16x8, 8x4, 4x2, 2x1, and 1x1.
If this image were 256x192, it would work fine until you got down to a size of 4x3. The next smaller image would be 2x1.5, which is obviously not a valid size. Some graphics hardware can deal with this, but many types cannot.
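The halving rule is easy to check in code (a throwaway sketch using the 256x128 example above):

#include <stdio.h>

int main(void) {
    // Halve each side (clamping at 1) until the 1x1 mip level is reached.
    int w = 256, h = 128;
    while (w > 1 || h > 1) {
        w = w > 1 ? w / 2 : 1;
        h = h > 1 ? h / 2 : 1;
        printf("%dx%d\n", w, h);  // 128x64, 64x32, ..., 2x1, 1x1
    }
    return 0;
}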
Some hardware also requires a square image but this isn't very common anymore.
Why do you need mipmapping?
Imagine that you have a picture that is VERY far away, so far away as to be only the size of 4 pixels. Now, when each pixel is drawn, a position on the image will be selected as the color for that pixel. So you end up with 4 pixels that may not be at all representative of the image as a whole.
Now, imagine that the picture is moving. Every time a new frame is drawn, a new pixel is selected. Because the image is SO far away, you are very likely to see very different colors for small changes in movement. This leads to very ugly flashing.
Lack of mipmapping causes problems for any size that is smaller than the texture size, but it is most pronounced when the image is drawn down to a very small number of pixels.
With mipmaps, the hardware will have access to a 2x2 version of the texture, so each pixel on it will be the average color of that quadrant of the image. This eliminates the odd color flashing.
http://en.wikipedia.org/wiki/Mipmap
Edit to people who say this isn't true anymore:
It's true that many modern GPUs can support non-power-of-two textures but it's also true that many cannot.
In fact, just last week I had a 1024x768 texture in an XNA app I was working on, and it caused a crash on game load on a laptop that was only about a year old. It worked fine on most machines, though. It's a safe bet that the iPhone's GPU is considerably simpler than a full PC GPU.
Typically, graphics hardware works natively with textures in power-of-2 dimensions. I'm not sure of the implementation/construction details that cause this to be the case, but it's generally how it is everywhere.
EDIT: With a little research, it turns out my knowledge is a little out of date -- a lot of modern graphics cards can handle arbitrary texture sizes now. I would imagine that with the space limitations of a phone's graphics processor though, they'd probably need to omit anything that would require extra silicon like that.
You can find OpenGL ES support info for Apple iPod/iPhone devices here:
Apple OpenGL ES support
OpenGL ES 2.0 is defined as equivalent to OpenGL 2.0, and the constraint on texture sizes only disappeared in version 2.0. So if you are using an OpenGL ES version below 2.0, this is the normal situation.
I imagine it's a pretty decent optimization in the graphics hardware to assume power-of-2 textures. I bought a new laptop with the latest laptop graphics hardware, and if textures aren't power-of-2 in Maya, the rendering is all messed up.
Are you using PVRTC compression? That requires powers of 2 and square images.
Try implementing wrapping texture-mapping in software and you will quickly discover why power-of-2 sized are desirable.
In short, you will find that if you can assume power-of-2 dimensions then a lot of integer multiplications and divisions turn into bit-shifts.
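For example (a toy illustration; a width of 256 is assumed):

// With power-of-two dimensions, texel addressing and GL_REPEAT-style
// wrapping reduce to shifts and masks instead of multiplies and modulos.
int index   = (y << 8) + x;   // y * 256 + x, as a shift (256 = 2^8)
int wrapped = x & (256 - 1);  // x % 256, as a mask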
I would hazard a guess that the recent trend in relaxing this restriction is due to GPUs moving to floating-point maths.
Edit: The "because of mipmapping" answer is incorrect. Mipmapped, non-power-of-two textures are a common feature of modern GPUs.