Performance hit from blending large quad - iphone

I have a game which runs pretty well (55-60fps) on a retina display.
I want to add a fullscreen overlay that blends with the existing scene. However, even when using a small texture, the performance hit is huge. Is there an optimization I can perform to make this usable?
If I use an 80x120 texture (the texture is rendered on the fly, which is why it's not square), I get 25-30 FPS. If I make the texture smaller, performance increases, but the quality is not acceptable. In general, though, the quality of the overlay is not very important (it's just lighting).
Renderer utilization is at 99%.
Even if I use a square texture from a file (.png), performance is bad.
This is how I create the texture:
[EAGLContext setCurrentContext:context];
// Create light framebuffer object.
glGenFramebuffers(1, &lightFramebuffer);
glBindFramebuffer(GL_FRAMEBUFFER, lightFramebuffer);
// Create color render buffer and allocate backing store.
glGenRenderbuffers(1, &lightRenderbuffer);
glBindRenderbuffer(GL_RENDERBUFFER, lightRenderbuffer);
glRenderbufferStorage(GL_RENDERBUFFER, GL_RGBA8_OES, LIGHT_WIDTH, LIGHT_HEIGHT);
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, lightRenderbuffer);
glGenTextures(1, &lightImage);
glBindTexture(GL_TEXTURE_2D, lightImage);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, LIGHT_WIDTH, LIGHT_HEIGHT, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, lightImage, 0);
And here is the rendering...
/* Draw scene... */
glBlendFunc(GL_ONE, GL_ONE);
//Switch to offscreen texture buffer
glBindFramebuffer(GL_FRAMEBUFFER, lightFramebuffer);
glBindRenderbuffer(GL_RENDERBUFFER, lightRenderbuffer);
glViewport(0, 0, LIGHT_WIDTH, LIGHT_HEIGHT);
glClearColor(ambientLight, ambientLight, ambientLight, ambientLight);
glClear(GL_COLOR_BUFFER_BIT);
/* Draw lights to texture... */
//Switch back to main frame buffer
glBindFramebuffer(GL_FRAMEBUFFER, defaultFramebuffer);
glBindRenderbuffer(GL_RENDERBUFFER, colorRenderbuffer);
glViewport(0, 0, framebufferWidth, framebufferHeight);
glBlendFunc(GL_DST_COLOR, GL_ZERO);
glBindTexture(GL_TEXTURE_2D, glview.lightImage);
/* Set up drawing... */
glDrawElements(GL_TRIANGLE_FAN, 4, GL_UNSIGNED_SHORT, 0);
Here are some benchmarks I took when trying to narrow down the problem. 'No blend' means I glDisable(GL_BLEND) before I draw the quad. 'No buffer switching' means I don't switch back and forth from the offscreen buffer before drawing.
(Tests using a static 256x256 .png)
No blend, No buffer switching: 52FPS
Yes blend, No buffer switching: 29FPS //disabled the glClear, which would artificially speed up the rendering
No blend, Yes buffer switching: 29FPS
Yes blend, Yes buffer switching: 27FPS
Yes buffer switching, No drawing: 46FPS
Any help is appreciated. Thanks!
UPDATE
Instead of blending the whole lightmap in afterward, I ended up writing a shader to do the work on the fly. Each fragment samples and blends from the lightmap (kind of like multitexturing). At first, the performance gain was minimal, but then I used a lowp sampler2D for the lightmap, and after that I got around 45 FPS.
Here's the fragment shader:
lowp vec4 texColor = texture2D(tex, texCoordsVarying);
lowp vec4 lightColor = texture2D(lightMap, worldPosVarying);
lightColor.rgb *= lightColor.a;
lightColor.a = 1.0;
gl_FragColor = texColor * color * lightColor;
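For reference, this is roughly how the lightmap ends up bound as a second texture for that shader on the C side; the program handle and uniform names here are assumptions, not the original code:
// Hypothetical setup: scene texture on unit 0, lightmap on unit 1.
glUseProgram(program);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, sceneTexture);
glUniform1i(glGetUniformLocation(program, "tex"), 0);      // sampler "tex" reads unit 0
glActiveTexture(GL_TEXTURE1);
glBindTexture(GL_TEXTURE_2D, lightImage);
glUniform1i(glGetUniformLocation(program, "lightMap"), 1); // sampler "lightMap" reads unit 1
glActiveTexture(GL_TEXTURE0); // restore unit 0 for later draws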

OK, I think you've run up against the limitations of the hardware. Blending a screen-sized quad over the whole scene is probably a particularly bad case for the tile-based hardware.
The PowerVR SGX (in the iPhone) is optimized for hidden surface removal, to avoid drawing things when not needed. It has low memory bandwidth because it's optimized for low-power devices.
So a screen-sized blended quad means reading and then writing every fragment on the screen. Ouch!
The glClear speedup is related: by clearing, you're telling GL that you don't care about the contents of the backbuffer before rendering, which saves loading the previous contents into memory.
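In practice that means clearing (or explicitly discarding) every buffer at the start of a pass instead of preserving it. A minimal sketch; the glDiscardFramebufferEXT call assumes the EXT_discard_framebuffer extension, which is available on iOS:
// Clearing at the start of a pass tells the driver not to reload the old
// contents into tile memory.
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
/* ...render the frame... */
// Anything that won't be read again (e.g. depth) can be discarded before presenting.
const GLenum discards[] = { GL_DEPTH_ATTACHMENT };
glDiscardFramebufferEXT(GL_FRAMEBUFFER, 1, discards);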
There's a very good overview of the iOS hardware here: http://www.imgtec.com/factsheets/SDK/POWERVR%20SGX.OpenGL%20ES%202.0%20Application%20Development%20Recommendations.1.1f.External.pdf
As for an actual solution: I would try rendering your overlay directly onto the game scene.
For example, your render loop should look like:
[EAGLContext setCurrentContext:context];
// Set up game viewport and render the game
InitGameViewPort();
GameRender();
// Change camera to 2D/orthographic, turn off depth write and compare
InitOverlayViewPort();
// Render overlay into the same buffer
OverlayRender();
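A sketch of what OverlayRender() might boil down to, reusing the state names from the question (everything else here is assumed):
// Draw the lighting quad straight into the main framebuffer.
glDisable(GL_DEPTH_TEST);            // the overlay ignores scene depth
glDepthMask(GL_FALSE);               // and doesn't write depth either
glEnable(GL_BLEND);
glBlendFunc(GL_DST_COLOR, GL_ZERO);  // modulate what's already in the framebuffer
glBindTexture(GL_TEXTURE_2D, lightImage);
glDrawElements(GL_TRIANGLE_FAN, 4, GL_UNSIGNED_SHORT, 0); // same fullscreen quad as before
glDepthMask(GL_TRUE);
glEnable(GL_DEPTH_TEST);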

If you render to a render target on a PowerVR chip, switch to another render target and render, then switch back to any previous render target you will suffer a major performance hit. This kind of access pattern is labelled a "Logical Buffer Load" by the OpenGL ES Analyzer built into the latest Instruments.
If you switch your rendering order so that you draw your lightmap render target first, then render your scene to the main framebuffer, then do your fullscreen blend of the lightmap render target texture your performance should be much higher.
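In terms of the code from the question, the reordered frame might look like this (a sketch; it assumes the same variables as above):
// 1. Render the lightmap first, before touching the main framebuffer.
glBindFramebuffer(GL_FRAMEBUFFER, lightFramebuffer);
glViewport(0, 0, LIGHT_WIDTH, LIGHT_HEIGHT);
glClearColor(ambientLight, ambientLight, ambientLight, ambientLight);
glClear(GL_COLOR_BUFFER_BIT);
/* Draw lights to texture... */
// 2. Bind the main framebuffer once and stay there for the rest of the frame.
glBindFramebuffer(GL_FRAMEBUFFER, defaultFramebuffer);
glViewport(0, 0, framebufferWidth, framebufferHeight);
glClear(GL_COLOR_BUFFER_BIT);
/* Draw scene... */
// 3. Finally, blend the lightmap texture over the scene.
glBlendFunc(GL_DST_COLOR, GL_ZERO);
glBindTexture(GL_TEXTURE_2D, lightImage);
glDrawElements(GL_TRIANGLE_FAN, 4, GL_UNSIGNED_SHORT, 0);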

I can confirm, on iPad 1 using iOS 4.2, enable/disable GL_BLEND for one full screen quad toggled between 18 and 31 fps. In both runs, renderer utilization was 90-100%.

Even before fiddling with the texture, make sure your shader is optimized. When filling a 960x640 screen (614,400 pixels), any operation in the fragment shader has a huge impact.
One good idea is to create a specific version of your fragment shader for this situation. It should be something like this:
varying mediump vec2 vertexTexCoord;
uniform sampler2D texture;

void main() {
    gl_FragColor = texture2D(texture, vertexTexCoord);
}
Create another program with this fragment shader and use it before drawing your big quad, then restore the normal program. The iPhone 4 is able to render about 7 full-screen, 1:1 textured quads per frame with blending, but it quickly drops to about 1 with a more sophisticated shader.
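The switch itself is just a pair of glUseProgram calls around the quad; the program handle names here are made up:
glUseProgram(passthroughProgram); // the trivial shader above
/* draw the fullscreen quad */
glUseProgram(normalProgram);      // back to the regular scene shader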
(Additionally in your case, try to render your overlay texture first, then the normal elements, then the texture over the rest. It should improve performance by a significant margin.)

Related

How to save and redraw screen content in OpenGL ES

I'm working on a kind of iPhone game where the player travels through a programmatically generated wormhole. To draw the wormhole, I chose to draw two arrays of textured vertical lines, one pixel wide, to form the top and bottom walls of the wormhole. Every frame, all the lines must be shifted left to implement the player's movement, and new lines must be drawn in the free space at the right. But drawing 1000 textured rectangles every frame is killing my FPS.
So I'm looking for a way to save all the lines that were drawn in the previous frame and redraw them all together at the new, shifted position.
It would be terrific if there were a way to draw textured rectangles into some kind of buffer that is bigger than the screen, and then render this buffer to the screen.
I guess these are newbie questions, because I'm totally new to OpenGL.
I spent hours trying to figure this out but haven't succeeded, so any help is appreciated.
To expand on @Jerry's answer, I'll walk you through the steps, since you're new. First, we'll create the frame buffer object:
GLuint framebuffer;
glGenFramebuffersOES(1, &framebuffer);
glBindFramebufferOES(GL_FRAMEBUFFER_OES, framebuffer);
Next, we'll create the empty texture to hold our snapshot. This is just the usual OpenGL texture creation stuff, and you can modify it to fit your needs, of course. The only line to notice is the glTexImage2D call: instead of pixel data as the last argument, you can pass NULL, which creates an empty texture.
GLuint texture;
glGenTextures(1, &texture);
glEnable(GL_TEXTURE_2D);
glBindTexture(GL_TEXTURE_2D, texture);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
glBindTexture(GL_TEXTURE_2D, 0);
glDisable(GL_TEXTURE_2D);
Now we bind the texture to the frame buffer:
glFramebufferTexture2DOES(GL_FRAMEBUFFER_OES, GL_COLOR_ATTACHMENT0_OES, GL_TEXTURE_2D, texture, 0);
and check to see if everything went OK:
if (glCheckFramebufferStatusOES(GL_FRAMEBUFFER_OES) != GL_FRAMEBUFFER_COMPLETE_OES)
    return false; // uh oh, something went wrong
Now we're all set up to draw!
glBindFramebufferOES(GL_FRAMEBUFFER_OES, framebuffer);
// do drawing here
glBindFramebufferOES(GL_FRAMEBUFFER_OES, 0);
And finally, clean up the frame buffer, if you don't need it any more:
glDeleteFramebuffersOES(1, &framebuffer);
Some caveats:
The size of the frame buffer must be a power of two.
You can go up to 1024x1024 on the latest iPhone, but there may be no need to have that level of detail.
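If you need to round an arbitrary size up to a power of two, a small helper like this works (hypothetical, not part of the original answer):
// Round v up to the next power of two, e.g. 300 -> 512.
static unsigned int NextPowerOfTwo(unsigned int v) {
    unsigned int result = 1;
    while (result < v)
        result <<= 1;
    return result;
}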
Offhand I don't know the exact size that'll be available on a particular model of iPhone, but the general idea would be to use a Frame Buffer Object (FBO) to render to a texture, then you can blit pieces from that texture to the screen buffer.
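For the "blit pieces" part, on OpenGL ES 1.1 that just means drawing a quad whose texture coordinates select a sub-rectangle of the FBO texture, shifted a little further each frame. A rough sketch, assuming vertex/texcoord arrays are enabled and the scroll offset is tracked elsewhere:
// Sample a screen-wide window of the oversized FBO texture, shifted left by
// "scroll" texels, and draw it as a single quad.
GLfloat u0 = scroll / (GLfloat)texWidth;
GLfloat u1 = (scroll + screenWidth) / (GLfloat)texWidth;
const GLfloat texCoords[] = { u0, 0, u1, 0, u0, 1, u1, 1 };
const GLfloat vertices[]  = { 0, 0, screenWidth, 0, 0, screenHeight, screenWidth, screenHeight };
glBindTexture(GL_TEXTURE_2D, texture);
glVertexPointer(2, GL_FLOAT, 0, vertices);
glTexCoordPointer(2, GL_FLOAT, 0, texCoords);
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);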

Why is the framerate of my tweaked OpenGL ES 2.0 template so slow on the iPad?

I've modified the OpenGL ES 2.0 template in Xcode to render that little box to an offscreen texture (50x50), then reset the viewport and render the texture to the screen using a fullscreen quad. But the FPS dropped so much that there was obvious lag (about 10 FPS).
I know the iPad has problems with fill rate, but this just doesn't seem right. I used only one FBO and switched its color attachment between the texture and the renderbuffer in the loop. Does this have any influence?
Besides that, I'm writing an audio visualizer (like the one in Windows Media Player) that edits pixel values in OpenGL. Any suggestions?
here goes the code:
//create the texture in -(id)init
glGenTextures(1, &ScreenTex);
glBindTexture(GL_TEXTURE_2D, ScreenTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, texSize, texSize, 0, GL_RGB, GL_UNSIGNED_BYTE, NULL);
//And in the render loop
//draw to the texture
glViewport(0, 0, texSize, texSize);
glBindTexture(GL_TEXTURE_2D, ScreenTex);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, ScreenTex, 0);
glClear(GL_COLOR_BUFFER_BIT);
glVertexAttribPointer(ATTRIB_VERTEX, 2, GL_FLOAT, 0, 0, squareVertices);
glUniform1i(Htunnel, 0);
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
//switch to render to render buffer here
glViewport(0, 0, backingWidth, backingHeight);
glBindRenderbuffer(GL_RENDERBUFFER, colorRenderbuffer);
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER,colorRenderbuffer);
glClear(GL_COLOR_BUFFER_BIT);
glVertexAttribPointer(ATTRIB_VERTEX, 2, GL_FLOAT, 0, 0, texVertices);
glUniform1i(Htunnel, 1);
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
//vertex shader
void main()
{
    if (tunnel == 0) {
        gl_Position = position;
        gl_Position.y += sin(translate) / 2.0;
        colorVarying = color;
    } else {
        f_texCoord = v_texCoord;
        gl_Position = position;
    }
}
//frag shader
void main()
{
    if (tunnel == 0) {
        gl_FragColor = colorVarying;
    } else {
        gl_FragColor = texture2D(s_texture, f_texCoord);
    }
}
Without actual code, it will be difficult to pick out where the bottleneck is. However, you can get an idea of where the problem is by using Instruments to localize the causes.
Create a new Instruments document using both the OpenGL ES instrument and the new Time Profiler one. In the OpenGL ES instrument, hit the little inspector button on its right side, then click on the Configure button. Make sure pretty much every logging option is checked on the resulting page, particularly the Tiler Utilization % and Renderer Utilization %. Click Done and make sure that both of those statistics are checked in the Select statistics to list page.
Run this set of instruments against your application on the iPad for a little while during rendering. Stop it and look at the numbers. As explained in Pivot's answer to my question, if you are seeing the Tiler Utilization % in the OpenGL ES instrument hitting 100%, you are being limited by your geometry (unlikely here). Likewise, if the Renderer Utilization % is near 100%, you are fill-rate limited. You can also look at the other statistics you've logged to pull out what might be happening.
You can then turn to the Time Profiler results to see if you can narrow down the hotspots in your code where things might be getting slowed down. Find the items near the top of the list there. If they are in your code, double-click on them to see what's going on. If they are in system libraries, filter the results until you see something more relevant by right-clicking on the symbol name and choosing either Charge Library to Callers or Charge Symbol to Caller.
At some point, you'll start seeing OpenGL-related symbols up there, which should clue you in to what the GPU is doing. Also, you may be surprised to find some of your own code slowing things down.
There's another OpenGL ES instrument that you might try, but it's part of the Xcode 4 beta and is currently under NDA. Check out the WWDC 2010 session videos for more about that one.

Loading screen for an OpenGL game on iPhone?

I'm working on an OpenGL game for iPhone, and although everything works great, I have to wait about a second in certain sections of the game which use a ton of sprite sheets. Is there any way to create a loading screen for such sections? Or any way to know whether a certain texture has finished loading in OpenGL?
EDIT:
I load my textures with this function:
-(void)loadTexture:(NSString*)nombre {
    CGImageRef textureImage = [UIImage imageNamed:nombre].CGImage;
    if (textureImage == nil) {
        NSLog(@"Failed to load texture image");
        return;
    }
    textureWidth = NextPowerOfTwo(CGImageGetWidth(textureImage));
    textureHeight = NextPowerOfTwo(CGImageGetHeight(textureImage));
    imageSizeX = CGImageGetWidth(textureImage);
    imageSizeY = CGImageGetHeight(textureImage);
    GLubyte *textureData = (GLubyte *)calloc(1, textureWidth * textureHeight * 4); // times 4, since each pixel needs 4 bytes: RGBA
    CGContextRef textureContext = CGBitmapContextCreate(textureData, textureWidth, textureHeight, 8, textureWidth * 4, CGImageGetColorSpace(textureImage), kCGImageAlphaPremultipliedLast);
    CGContextDrawImage(textureContext, CGRectMake(0.0, 0.0, (float)textureWidth, (float)textureHeight), textureImage);
    CGContextRelease(textureContext);
    glGenTextures(1, &textures[0]);
    glBindTexture(GL_TEXTURE_2D, textures[0]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, textureWidth, textureHeight, 0, GL_RGBA, GL_UNSIGNED_BYTE, textureData);
    free(textureData);
    glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
}
I usually load the textures I need depending on the section of the game. For example, one or two 1024x1024 textures, which serve as sprite sheets, are enough for most levels, but in certain levels, like boss battles, I load about 5 or 6 textures (the boss is big and has a ton of different attacks), and the game takes about 2 or 3 seconds to load all those textures at once.
As Till explained, since you are loading the texture synchronously, it will be loaded once loadTexture is done.
Rendering to the screen via OpenGL in one thread while loading via OpenGL in another can become really messy really quickly.
A commonly used hack is to update the screen between texture loads:
render 0% screen
loadTexture
render 10% screen
loadTexture
repeat as needed
With a little bit of extra work it's easy to generate a small table with timing information about the loading, so the "x%" stays close to the truth ;)
We usually automate this step: before shipping a new version, we do a "load-through" run in which we log when each resource is loaded and how long it takes.
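A sketch of that interleaving, assuming a hypothetical renderProgress: method that draws and presents the loading screen:
// Draw a progress frame between each synchronous texture load.
NSArray *sheets = [NSArray arrayWithObjects:@"boss", @"attacks1", @"attacks2", nil];
for (NSUInteger i = 0; i < [sheets count]; i++) {
    [self renderProgress:(float)i / [sheets count]]; // draw and present the loading screen
    [self loadTexture:[sheets objectAtIndex:i]];     // blocks until this sheet is uploaded
}
[self renderProgress:1.0f];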
Since you are loading the texture synchronously, it is rather obvious that the loading of that texture is done as soon as your loadTexture method returns.
EDIT / ADDITIONS:
There is a couple of things you should avoid / change in your code. Not that this would really solve your actual problem, just a general advice....
1st: Do not use imageNamed for loading a texture image; that is generally a bad idea. Use imageWithContentsOfFile instead, as it does not hog the image cache with data that you won't be reusing anyway.
2nd: Once a texture is loaded, there is a step that Apple calls "warming the texture", which basically means using that texture briefly for a quick and dirty (even offscreen) rendering. That way you can be sure the texture will be fully available and no extra penalties are imposed on the first "real" rendering. We are talking milliseconds of penalty, so no biggie, but it's noticeable.
3rd: Try to shift the texture loading to a point where the application is idle anyway, e.g. while waiting for user input on the startup screen, but refrain from doing it in the applicationDidFinishLaunching method.
Here's a question for you: why do you load the textures while rendering, inside the level? Why not preload everything you could possibly need? You can use up to 24 MB of texture memory without incurring any penalties (well, minus the FBO memory).
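For the first point, the change is small; a sketch of the bypass (the file name is just an example):
// imageNamed caches the decoded image for the app's lifetime; for one-shot
// texture uploads, load straight from the bundle instead.
NSString *path = [[NSBundle mainBundle] pathForResource:@"spritesheet" ofType:@"png"];
CGImageRef textureImage = [UIImage imageWithContentsOfFile:path].CGImage;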
Alternatively, you could simply display a UIImageView animation while the textures are loading. This has the special benefit of drawing in a separate thread; it's a special case, just like the UIActivityIndicatorView (spinner). I'm not sure how Apple achieves this technically, but it works great.
The only catch is that you need to keep your loading animation frames in separate files rather than in a sprite sheet.
http://developer.apple.com/library/ios/#documentation/uikit/reference/UIImageView_Class/Reference/Reference.html
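A minimal sketch of such an animation (the frame names are examples):
// UIKit drives this animation itself, so it keeps playing while textures load.
UIImageView *loadingView = [[UIImageView alloc] initWithFrame:self.view.bounds];
loadingView.animationImages = [NSArray arrayWithObjects:
    [UIImage imageNamed:@"loading1.png"],
    [UIImage imageNamed:@"loading2.png"],
    [UIImage imageNamed:@"loading3.png"], nil];
loadingView.animationDuration = 0.5; // seconds per full cycle
[self.view addSubview:loadingView];
[loadingView startAnimating];
// When loading finishes:
// [loadingView stopAnimating]; [loadingView removeFromSuperview];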

512x512 Texture causing huge GPU stress on iPhone, despite tiling

I'm testing my simple OpenGL ES implementation (a 2D game) on the iPhone and I notice a high render utilization while using the profiler. These are the facts:
I'm displaying only one preloaded large texture (512x512 pixels) at 60fps and the render utilization is around 40%.
My texture is blended using GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA, the only GL function I'm using.
I've tried making the texture smaller and tiling it, which made no difference.
I'm using a PNG texture atlas of 1024x1024 pixels.
I find it very strange that this one texture is causing such intense GPU usage.
Is this to be expected? What am I doing wrong?
EDIT: My code:
// OpenGL setup is identical to the OpenGL ES template
// initState is called once for setup
// a timer is initialized; drawView is called by the timer
- (void) initState
{
    //usual init declarations have been omitted here
    glEnable(GL_BLEND);
    glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(2, GL_FLOAT, sizeof(Vertex), &allVertices[0].x);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), &allVertices[0].tx);
    glEnableClientState(GL_COLOR_ARRAY);
    glColorPointer(4, GL_UNSIGNED_BYTE, sizeof(Vertex), &allVertices[0].r);
}
- (void) drawView
{
    [EAGLContext setCurrentContext:context];
    glBindFramebufferOES(GL_FRAMEBUFFER_OES, viewFramebuffer);
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    GLfloat width = backingWidth / 2.f;
    GLfloat height = backingHeight / 2.f;
    glOrthof(-width, width, -height, height, -1.f, 1.f);
    glMatrixMode(GL_MODELVIEW);
    glClearColor(0.f, 0.f, 0.f, 1.f);
    glClear(GL_COLOR_BUFFER_BIT);
    glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
    glBindRenderbufferOES(GL_RENDERBUFFER_OES, viewRenderbuffer);
    [context presentRenderbuffer:GL_RENDERBUFFER_OES];
    [self checkGLError];
}
EDIT: I've made a couple of improvements, but none managed to lower the render utilization. I've divided the texture into 32x32 parts, changed the type of the coordinates and texture coordinates from GLfloat to GLshort, and added extra vertices for degenerate triangles.
The updates are:
initState:
(vertex and texture pointer are now GL_SHORT)
glMatrixMode(GL_TEXTURE);
glScalef(1.f / 1024.f, 1.f / 1024.f, 1.f / 1024.f);
glMatrixMode(GL_MODELVIEW);
glScalef(1.f / 16.f, 1.f/ 16.f, 1.f/ 16.f);
drawView:
glDrawArrays(GL_TRIANGLE_STRIP, 0, 1536); //(16*16 parts * 6 vertices)
I'm writing an app which displays five 512x512 textures on top of each other in a 2D environment using GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA, and I can get about 14fps. Do you really need 60fps? For a game, I'd think 24-30 would be fine. Also, use PVR texture compression if at all possible. There's an example that does it included with the SDK.
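If you go the PVRTC route, the upload itself is a single call; a sketch, assuming the data has already been compressed with Apple's texturetool:
// 4-bits-per-pixel PVRTC upload; dataSize is width*height/2 bytes (minimum 32).
glCompressedTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGBA_PVRTC_4BPPV1_IMG,
                       width, height, 0, dataSize, pvrtcData);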
I hope you didn't forget to disable GL_BLEND when you don't need it.
You can attempt some memory bandwidth optimization: use 16 bpp formats or PVRTC. IMHO, with your texture size, the texture cache doesn't help at all.
Don't forget that your framebuffer is being used as texture by iPhone UI. If it is created as 32 bit RGBA it will be alpha-blended one more time. For optimal performance 16 bit 565 framebuffers are the best (but graphics quality suffers).
I don't know all the details, such as cache size, but I suppose texture pixels are already swizzled when uploaded into video memory, and triangles are split by the PVR tile engine. Therefore your own splitting appears to be redundant.
And finally: this is only a mobile low-power GPU, not designed for huge screens and high fill rates. Alpha blending is costly, maybe a 3-4x difference on PowerVR chips.
Read this post.
512x512 is probably a little over optimistic for the iPhone to deal with.
EDIT:
I assume you have already read this, but if not, check Apple's guide to optimal OpenGL ES performance on the iPhone.
What exactly is the problem?
You're getting your 60fps, which is silky smooth.
Who cares if render utilization is 40%?
The issue could be because of the iPhone's texture cache size. It may simply come down to how much of the texture is on each individual triangle, quad, or tristrip, depending on how you're setting state.
Try this: subdivide your quad and repeat your tests. So if you're 1 quad, make it 4. Then 16. and so on, and see if that helps. The key is to reduce the actual number of pixels that each primitive references.
When the texture cache gets blown, then the hardware will thrash texture lookups from main memory into whatever vram is set aside for the texture buffer for each pixel. This can kill performance mighty quick.
Or I may be completely wrong, because I really don't know the iPhone hardware, and I also know that the PVR chip is a strange beast compared to what I'm used to (PS2, PSP). Still, it's an easy test to try, and I'm curious whether it helps.
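To make the subdivision suggestion above concrete, here's a sketch that splits a unit quad into an N x N grid (names are hypothetical; texcoords match positions for a 1:1 mapping):
#include <string.h> // for memcpy

// Each cell touches fewer texels, which is friendlier to the texture cache.
#define N 4
GLfloat gridVerts[N * N * 8];     // 4 corners * 2 floats per cell
GLfloat gridTexCoords[N * N * 8];
int v = 0;
for (int y = 0; y < N; y++) {
    for (int x = 0; x < N; x++) {
        GLfloat x0 = x / (GLfloat)N, x1 = (x + 1) / (GLfloat)N;
        GLfloat y0 = y / (GLfloat)N, y1 = (y + 1) / (GLfloat)N;
        GLfloat quad[8] = { x0, y0, x1, y0, x0, y1, x1, y1 }; // strip order
        memcpy(&gridVerts[v], quad, sizeof(quad));
        memcpy(&gridTexCoords[v], quad, sizeof(quad));
        v += 8;
    }
}
// Then draw each cell: glDrawArrays(GL_TRIANGLE_STRIP, cell * 4, 4);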

Why is this OpenGL ES code slow on iPhone?

I've slightly modified the iPhone SDK's GLSprite example while learning OpenGL ES, and it turns out to be quite slow. Even in the simulator (and worse on the hardware), so I must be doing something wrong, since it's only 400 textured triangles.
const GLfloat spriteVertices[] = {
    0.0f,   0.0f,
    100.0f, 0.0f,
    0.0f,   100.0f,
    100.0f, 100.0f
};
const GLshort spriteTexcoords[] = {
    0, 0,
    1, 0,
    0, 1,
    1, 1
};
- (void)setupView {
    glViewport(0, 0, backingWidth, backingHeight);
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrthof(0.0f, backingWidth, backingHeight, 0.0f, -10.0f, 10.0f);
    glMatrixMode(GL_MODELVIEW);
    glClearColor(0.3f, 0.0f, 0.0f, 1.0f);
    glVertexPointer(2, GL_FLOAT, 0, spriteVertices);
    glEnableClientState(GL_VERTEX_ARRAY);
    glTexCoordPointer(2, GL_SHORT, 0, spriteTexcoords);
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    // sprite data is preloaded. 512x512 rgba8888
    glGenTextures(1, &spriteTexture);
    glBindTexture(GL_TEXTURE_2D, spriteTexture);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, spriteData);
    free(spriteData);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glEnable(GL_TEXTURE_2D);
    glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);
    glEnable(GL_BLEND);
}
- (void)drawView {
    ..
    glClear(GL_COLOR_BUFFER_BIT);
    glLoadIdentity();
    glTranslatef(tx - 100, ty - 100, 10);
    for (int i = 0; i < 200; i++) {
        glTranslatef(1, 1, 0);
        glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
    }
    ..
}
drawView is called every time the screen is touched or the finger on the screen is moved, and tx, ty are set to the x, y coordinates where that touch happened.
I've also tried using a vertex buffer, where the translations were pre-generated and there was only one glDrawArrays call, but it gave the same performance (~4 FPS).
===EDIT===
Meanwhile I've modified this so that much smaller quads are used (sized: 34x20) and much less overlapping is done. There are ~400 quads->800 triangles spread on the whole screen. Texture size is 512x512 atlas and RGBA_8888 while the texture coordinates are in float.
The code is very ugly in terms of API efficiency: there are two MatrixMode change along with two loads and two translation then a drawarrays for a triangle strip (quad).
Now this produces ~45 FPS.
(I know this is very late, but I couldn't resist. I'll post anyway, in case other people come here looking for advice.)
This has nothing to do with the texture size. I don't know why people rated up Nils. He seems to have a fundamental misunderstanding of the OpenGL pipeline. He seems to think that for a given triangle, the entire texture is loaded and mapped onto that triangle. The opposite is true.
Once the triangle has been mapped into the viewport, it is rasterized. For every on-screen pixel your triangle covers, the fragment shader is called. The default fragment shader (OpenGL ES 1.1, which you are using) will look up the texel that most closely maps (GL_NEAREST) to the pixel you are drawing. It might look up 4 texels, since you are using the higher-quality GL_LINEAR method to average the best texels. Still, if the pixel count in your triangle is, say, 100, then the most texture bytes you will have to read is 4 (lookups) * 100 (pixels) * 4 (bytes per color) = 1600 bytes. Far, far less than what Nils was saying. It's amazing that he can make it sound like he actually knows what he's talking about.
WRT the tiled architecture, this is common in embedded OpenGL devices to preserve locality of reference. I believe that each tile gets exposed to each drawing operation, quickly culling most of them. Then the tile decides what to draw on itself. This is going to be much slower when you have blending turned on, as you do. Because you are using large triangles that might overlap and blend with other tiles, the GPU has to do a lot of extra work. If, instead of rendering the example square with alpha edges, you were to render an actual shape (instead of a square picture of the shape), then you could turn off blending for this part of the scene and I bet that would speed things up tremendously.
If you want to try it, just turn off blending and see how much things speed up, even if they don't look right: glDisable(GL_BLEND);
Your texture is 512*512 at 4 bytes per pixel. That's a megabyte of data. If you render it 200 times per frame, you generate a bandwidth load of 200 megabytes per frame.
With roughly 4 fps you consume 800 MB/second for texture reads alone. Frame- and Z-buffer writes need bandwidth as well. Then there is the CPU, and don't underestimate the bandwidth requirements of the display either.
RAM on embedded systems (e.g. your iPhone) is not as fast as on a desktop PC. What you see here is a bandwidth starvation effect. The RAM simply can't deliver the data any faster.
How to cure this problem:
Pick a sane texture size. On average you should have one texel per pixel. This gives crisp-looking textures. I know it's not always possible; use common sense.
Use mipmaps. They take 33% extra space but allow the graphics chip to pick a lower-resolution mipmap where possible.
Try smaller texture formats. Maybe you can use the ARGB4444 format; this would double the rendering speed. Also take a look at the compressed texture formats. Decompression does not cause a performance drop, as it's done in hardware. In fact the opposite is true: due to the smaller size in memory, the graphics chip can read the texture data faster.
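On OpenGL ES 1.1 both suggestions are cheap to try at upload time; a sketch (the pixel buffer name is assumed):
// Ask the driver to build mipmaps automatically, then use a mipmap-aware filter.
glTexParameteri(GL_TEXTURE_2D, GL_GENERATE_MIPMAP, GL_TRUE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
// Upload as 16-bit RGBA4444 instead of 32-bit RGBA8888 to halve the bandwidth.
// "pixels4444" must already be converted to 4444 layout (not shown).
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
             GL_RGBA, GL_UNSIGNED_SHORT_4_4_4_4, pixels4444);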
I guess my first try was just a bad (or very good) test.
The iPhone has a PowerVR MBX Lite, which is a tile-based graphics processor. It subdivides the screen into smaller tiles and renders them in parallel. In the first case above, the subdivision might have become a bit overwhelmed because of the very high overlap. Moreover, the quads couldn't be clipped because they were all at the same depth, so all the texture coordinates had to be calculated (this could easily be tested by changing the translation in the loop).
Also, because of the overlap, the parallelism couldn't be exploited: some tiles sat doing nothing while the rest (about a third) were working a lot.
So I think that while memory bandwidth could be a bottleneck, it wasn't the case in this example. The problem is more about how the graphics hardware works and the setup of the test.
I'm not familiar with the iPhone, but if it doesn't have dedicated hardware for handling floating-point numbers (I suspect it doesn't), then it'd be faster to use integers whenever possible.
I'm currently developing for Android (which uses OpenGL ES as well) and for instance my vertex array is int instead of float. I can't say how much of a difference it makes, but I guess it's worth a try.
Apple is very tight-lipped about the specific hardware specs of the iPhone, which seems very strange to those of us coming from a console background. But people have been able to determine that the CPU is a 32-bit RISC ARM1176JZF. The good news is that it has a full floating-point unit, so we can continue writing math and physics code the way we do on most platforms.
http://gamesfromwithin.com/?p=239