I'm developing a renderer for an isometric 3D environment made up of blocks (kind of like Minecraft).
I'm drawing it in a canvas using its 2d context (and doing some math).
On page load a loop is created that adds some blocks each frame (window.requestAnimationFrame(fn)), but I'm struggling with low fps when rendering.
This is the first time I've gone this deep into performance analysis, and I'm struggling to understand the Performance view of Chrome DevTools.
Looking at the results:
What I understand is that the frame took 115.9ms to complete, but looking at the breakdown it seems to have spent only ~30ms on the calculations using the canvas API; yet in the task bar (under the Animation Frame Fired event) I see a much longer time for the frame to complete.
Is this common behavior? Have I made some dumb mistake that's wasting performance somewhere?
(If it is common behavior, what is happening during that time? Is it the actual drawing?)
I'm stuck wondering whether I should try to improve my drawing algorithm, or look somewhere else to address the bottleneck.
I don't know if you ever got an answer to this, but one thing that jumps out at me is that in your screenshot the green "GPU" bar is nearly solid. As I understand it, this bar indicates that the browser is sending instructions and/or data to the GPU for hardware-accelerated rendering. In my experience this can be a problem if you're using a slow graphics card, depending on what you're trying to do.
The good news is that I would expect testing on a more powerful system to result in an immediate framerate improvement. The bad news is, I'm not sure how to tell exactly which canvas operations put that much load on your (bad) GPU or how to optimize to reduce GPU traffic.
I've been experimenting with using multisampling to do full-scene anti-aliasing on the iPhone and iPad on iOS 4. The general mechanism uses Apple's APPLE_framebuffer_multisample extension (http://www.khronos.org/registry/gles/extensions/APPLE/APPLE_framebuffer_multisample.txt) and is described in the answer to this question: How do you activate multisampling in OpenGL ES on the iPhone? and documented by Apple in their OpenGL ES Programming Guide.
It works as described, but the drawing performance of my test application suffers by about 50% when I set the number of samples to be 2. I'm primarily testing on an iPhone 4, using a non-retina-enabled application. I am using the other performance suggestions offered by Apple in their documentation (using glDiscardFramebufferEXT to discard the renderbuffers attached to the multisample framebuffer, using glClear to clear the entire framebuffer at the start of the frame, etc.).
The performance overhead of enabling multisampling in this manner seems surprisingly large to me. Are you guys seeing similar results or does this suggest that I'm doing something incorrectly?
You mentioned that you're running this on an iPhone 4. Is your OpenGL ES layer rendering at the full 2X Retina display scale factor? That is, have you set the contentScaleFactor on the OpenGL ES hosting layer to [[UIScreen mainScreen] scale]? If so, you're pushing a large number of pixels to start with.
Are you fill rate limited before you apply the multisampled antialiasing? To check, use the OpenGL ES instrument in Instruments against your running application and enable the Tiler Utilization and Renderer Utilization statistics. If your application shows a high Renderer Utilization without MSAA enabled, you are fill rate limited to begin with. Adding MSAA on top of that could significantly reduce your framerates because of this bottleneck.
In an application that I had which was geometry limited, not fill rate limited, I didn't see that great of a slowdown when using 4X MSAA in it on an iPhone 4. I'm guessing that the bottleneck in your application is in pushing pixels to the screen.
It is not surprising that your performance suffers by about 50% when you set the number of samples to 2: you're drawing twice the samples! Multisampling means you essentially draw your scene at a higher resolution than the screen to an off-screen buffer, and then you use filtering algorithms to reduce the higher-resolution multisampled buffer to the display screen resolution, hopefully with fewer aliasing artifacts because the final picture actually includes more detail (filtered higher-resolution output) than the single-sampled version.
It is a very common (if not the most common) performance problem in graphics: the more samples you draw, the slower you go.
At a high level (or low level if you'd like), what's a good way to implement a smudge effect for a drawing program on the iPad using Quartz 2D (Core Graphics)? Has anyone tried this?
(source: pixlr.com)
Thanks so much in advance for your wisdom!
UPDATE: I found this great article for those interested, check it out!
Link now at: http://losingfight.com/blog/2007/09/05/how-to-implement-smudge-and-stamp-tools/
I would suggest implementing a similar algorithm to what is detailed in that article using OpenGL ES 2.0 to get the best performance.
1. Get the starting image as a texture
2. Set up a render-to-texture framebuffer
3. Render the initial image in a quad
4. Render another quad the size of your brush with a slightly shifted view of the image, multiplied by an alpha mask stored in a texture or defined by, for example, a Gaussian function. Use alpha blending with the background quad.
5. Render this texture into a framebuffer associated with your CAEAGLLayer-backed view
6. Go to 1 on the next -touchesMoved event, with the result from your previous rendering as the input. Keep in mind you'll want to have 2 texture objects to "ping-pong" between, as you can't read from and write to the same texture at once.
I think it's unlikely you're going to get great performance doing this on the CPU, but it's definitely easier to set up that way. With the OpenGL setup, though, you can have an essentially unlimited brush size, etc., and you're not looping over image-drawing code.
Curious about what sort of performance you do get on the CPU, though. Take care :)
I have an Open GL ES game on the iPhone. My framerate is pretty sucky, ~20fps. Using the Xcode OpenGL ES performance tool on an iPhone 3G, it shows:
Renderer Utilization: 95% to 99%
Tiler Utilization: ~27%
I am drawing a lot of pretty large images with a lot of blending. If I reduce the number of images drawn, framerates go from ~20 to ~40, though the performance tool results stay about the same (renderer still maxed). I think I'm being limited by the fill rate of the iPhone 3G, but I'm not sure.
My questions are: How can I determine with more granularity where the bottleneck is? That is my biggest problem, I just don't know what is taking all the time. If it is fillrate, is there anything I do to improve it besides just drawing less?
I am using texture atlases. I have tried to minimize image binds, though it isn't always possible (drawing order, not everything fits on one 1024x1024 texture, etc). Every frame I do 10 image binds. This seem pretty reasonable, but I could be mistaken.
I'm using vertex arrays and glDrawArrays. I don't really have a lot of geometry. I can try to be more precise if needed. Each image is 2 triangles and I try to batch things where possible, though often (maybe half the time) images are drawn with individual glDrawArrays calls. Besides the images, I have ~60 triangles worth of geometry being rendered in ~6 glDrawArrays calls. I often glTranslate before calling glDrawArrays.
Would it improve the framerate to switch to VBOs? I don't think it is a huge amount of geometry, but maybe it is faster for other reasons?
Are there certain things to watch out for that could reduce performance? E.g., should I avoid glTranslate, glColor4f, etc.?
I'm using glScissor in a 3 places per frame. Each use consists of 2 glScissor calls, one to set it up, and one to reset it to what it was. I don't know if there is much of a performance impact here.
If I used PVRTC would it be able to render faster? Currently all my images are GL_RGBA. I don't have memory issues.
One of my fullscreen textures is 256x256. Would it be better to use 480x320 so the phone doesn't have to do any scaling? Is there any other general performance advice for texture sizes?
Here is a rough idea of what I'm drawing, in this order:
1) Switch to perspective matrix.
2) Draw a full screen background image
3) Draw a full screen image with translucency (this one has a scrolling texture).
4) Draw a few sprites.
5) Switch to ortho matrix.
6) Draw a few sprites.
7) Switch to perspective matrix.
8) Draw sprites and some other textured geometry.
9) Switch to ortho matrix.
10) Draw a few sprites (eg, game HUD).
Steps 1-6 draw a bunch of background stuff. Step 8 draws most of the game content. Step 10 draws the HUD.
As you can see, there are many layers, some of them full screen and some of the sprites are pretty large (1/4 of the screen). The layers use translucency, so I have to draw them in back-to-front order. This is further complicated by needing to draw various layers in ortho and others in perspective.
I will gladly provide additional information if requested. Thanks in advance for any performance tips or general advice on my problem!
Edit:
I added some logging to see how many glDrawArrays calls I am doing, and with how much data. I do about 20 glDrawArrays calls per frame. Usually between 1 and 6 of these have about 40 vertices each. The rest of the calls are usually just 2 triangles (one image). I'm just using glVertexPointer and glTexCoordPointer.
Given that the Renderer Utilization is basically at 100%, that indicates that the bottleneck is filling, texturing, and blending pixels. Techniques intended to optimize vertex processing (VBOs and vertex formats) or CPU usage (draw call batching) will likely not help, as they will not speed up pixel processing.
Your best bet is to reduce the number of pixels that you are filling, and also look at different texture formats that make better use of the very limited memory bandwidth available on the first generation devices. Prefer the use of PVRTC textures wherever possible, and 16bit uncompressed textures otherwise.
Look to Apple's "Best Practices for Working with Texture Data" and "Best Practices for Working with Vertex Data" sections of the OpenGL ES Programming Guide for iPhone OS. They highly recommend (as do others) that you use PVRTC for compressing your textures, because they can offer an 8:1 or 16:1 compression ratio over your standard uncompressed textures. Aside from mipmapping, you seem to be doing the other recommended optimization of using a texture atlas.
You do not appear to be geometry-limited, because (as I discovered in this question) the Tiler Utilization statistic seems to indicate how much of a bottleneck is being caused by geometry size. However, the iPhone 3G S (and third-generation iPod touch and iPad) support hardware-accelerated VBOs, so you might give those a shot and see how they affect performance. They might not have as much of an effect as compressing textures would, but they're not hard to implement.
A big win (mostly for the 3G) will also come from the texture filtering you are using. Check if you are using "TRILINEAR" filtering, and change it to "BILINEAR".
Make sure you setup the textures like this:
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
and not like this:
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
Harry
I wanted to chime in with an additional answer: change the framebuffer backing to a 16-bit format instead of a 32-bit format.
if ((self = [super initWithCoder:coder])) {
    eaglLayer = (CAEAGLLayer *)self.layer;
    eaglLayer.opaque = YES;
    eaglLayer.drawableProperties = [NSDictionary dictionaryWithObjectsAndKeys:
        [NSNumber numberWithBool:NO], kEAGLDrawablePropertyRetainedBacking,
        // kEAGLColorFormatRGBA8 = 32-bit frame buffer; kEAGLColorFormatRGB565 = 16-bit frame buffer
        kEAGLColorFormatRGB565, kEAGLDrawablePropertyColorFormat,
        nil];
}
What woke me up to this was the Xcode profiler. It kept complaining about using too large a frame buffer; eventually I found the setting in my init section.
http://developer.apple.com/library/ios/#documentation/iPhone/Reference/EAGLDrawable_Ref/EAGLDrawable/EAGLDrawable.html
That single change allowed my games on iPad, Retina, and iPods to go to 60 FPS.
I have yet to re-release them with this change, as I only found it out 3 days ago :) but I do not think I plan to release at 60fps; 30fps is just fine for casual games. I also found that my sound effects cut the frame rate down, so either resample, play the sfx on another thread, or find another solution to keep the frame rate up if I decide to go with 60fps. Don't forget to discard the buffers that are not used for display:
if (fiOSver >= 4.0f) {
    const GLenum discards[] = { GL_DEPTH_ATTACHMENT_OES };
    glDiscardFramebufferEXT(GL_FRAMEBUFFER_OES, 1, discards);
}
[m_oglContext presentRenderbuffer:GL_RENDERBUFFER_OES];
I was in a similar situation (porting a 2D adventure game to the iPad). My 3GS version was running more or less locked at 60fps; putting it on the iPad dropped it (and my jaw) to 20fps.
It turns out ONE of the little gotchas involved is that the PVR cards hate GL_ALPHA_TEST; on the PC that actually has a slight positive effect (especially on older Intel chips), but it's death on fill rate on the iPhone. Changing that to
glDisable(GL_ALPHA_TEST);
gave me an immediate 100% boost in FPS (up to 40 FPS). Not bad for one line of code... :)
Allan
The biggest performance killer on the iPhone platform is the number of draw calls and state changes. If you're doing more than 20ish draw calls or state changes, you're going to run into a performance wall.
Batching and texture atlases are your friend.
In my past experience with OpenGL ES on old Windows Mobile devices with processor speeds around 600MHz, reducing the rendering window resolution usually increases rendering speed.
In my recent tests I found I needed performance monitoring while rendering, frame by frame, to collect how many fps the device can display at the currently applied resolution.
I think it is good practice to keep a monitoring algorithm in the rendering view to balance resolution and frame rate while a game rendering engine is running. Depending on the frame rate you want, you can sacrifice rendering resolution to perform well and degrade gracefully on most devices with varying hardware and software performance.
You may need to control the resolution manually, as explained in this article:
http://www.david-amador.com/2010/09/setting-opengl-view-for-iphone-4-retina-hi-resolution/
I am creating an iPhone application and I am using OpenGL, which I'm new to. I have to change pixel positions; can anyone show me which functions work on individual pixels?
Some example source code to illustrate the idea would be appreciated.
I'm not an expert on iPhone issues in particular, and I'm an intermediate OpenGL programmer, so take this for what it's worth --
OpenGL discourages direct pixel manipulation, largely because it doesn't make as much sense when you are dealing with any kind of hardware acceleration. The frame buffers are nowadays usually stored directly in graphics RAM, and while pushing bits to graphics memory is speedy, pulling information back out is a rare and unoptimized case. 3D cards are optimized for fast texturing of triangles, not pixels.
In the good old days, when the frame buffer was in main memory, it wasn't a big deal, but things have changed. So while it's often still possible to peek and poke individual pixels, you will usually get dramatic speed increases by re-expressing your operation as a native OpenGL method. You can write pixels using GL_POINTS, incidentally, but again, it's quite slow.
Still, there are some effects for which manipulating pixels in-place is useful -- plasmas and flame effects really can't be done any other way. For this, I suggest you emulate a frame buffer -- allocate your own, and write to it directly. Then, when you need to display it, blit the whole block to the screen at once.