I have a 3D iPhone game done with OpenGL ES.
It's a big world but with some tiny, first-person-view bits I need to paint up close, so I can't reduce the depth range (zNear vs zFar) that glFrustumf() takes any further.
When surfaces meet for a Z-fight, I paint them slightly apart to stop them flickering. I'm also making the camera's distance determine how far apart I adjust them, in cases where this is useful and needed.
It's mostly OK, but there are some things whose perspective suffers by the separation, and making the separation smaller causes flicker. I'd love to paint surfaces closer together.
Is there any way to increase the depth buffer precision, so surfaces can be closer together without a narrower depth range?
If not, is there any other way around this?
I'm still using OpenGL ES 1.1 in the app, but am willing to upgrade if it's worth it.
Thanks for your help.
Here's how I create the depth buffer...
In init method:
// Create default framebuffer object. The backing will be allocated for the current layer in -resizeFromLayer
glGenFramebuffersOES(1, &defaultFramebuffer);
glGenRenderbuffersOES(1, &colorRenderbuffer);
glBindFramebufferOES(GL_FRAMEBUFFER_OES, defaultFramebuffer);
glBindRenderbufferOES(GL_RENDERBUFFER_OES, colorRenderbuffer);
glFramebufferRenderbufferOES(GL_FRAMEBUFFER_OES, GL_COLOR_ATTACHMENT0_OES, GL_RENDERBUFFER_OES, colorRenderbuffer);
//Added depth buffer
glGenRenderbuffersOES(1, &depthRenderbuffer);
glBindRenderbufferOES(GL_RENDERBUFFER_OES, depthRenderbuffer);
glFramebufferRenderbufferOES(GL_FRAMEBUFFER_OES, GL_DEPTH_ATTACHMENT_OES, GL_RENDERBUFFER_OES, depthRenderbuffer);
In resizeFromLayer method:
// Allocate color buffer backing based on the current layer size
glBindRenderbufferOES(GL_RENDERBUFFER_OES, colorRenderbuffer);
[context renderbufferStorage:GL_RENDERBUFFER_OES fromDrawable:layer];
glGetRenderbufferParameterivOES(GL_RENDERBUFFER_OES, GL_RENDERBUFFER_WIDTH_OES, &backingWidth);
glGetRenderbufferParameterivOES(GL_RENDERBUFFER_OES, GL_RENDERBUFFER_HEIGHT_OES, &backingHeight);
//Added depth buffer
glBindRenderbufferOES(GL_RENDERBUFFER_OES, depthRenderbuffer);
glRenderbufferStorageOES(GL_RENDERBUFFER_OES, GL_DEPTH_COMPONENT16_OES, backingWidth, backingHeight);
Here's how I create the frustum...
const GLfloat zNear = 2.2;
const GLfloat zFar = 30000;
const GLfloat fieldOfView = 60.0;
GLfloat size = zNear * tanf(degreesToRadian(fieldOfView) / 2.0);
if (LANDSCAPE) { //for landscape clip & aspect ratio.
//parameters are: left, right, bottom, top, near, far
glFrustumf(-size/(backingWidth/backingHeight),
size/(backingWidth/backingHeight),
-size, size,
zNear, zFar);
}
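One detail in the frustum setup above may be worth checking. The question does not show how backingWidth and backingHeight are declared; assuming they are GLint, as in Apple's EAGLView template, backingWidth/backingHeight is an integer division and truncates. A minimal sketch of the same calculation done in floating point (the helper names are hypothetical):

```c
#include <assert.h>

/* Width-to-height ratio computed in floating point.
   In integer arithmetic 320 / 480 would truncate to 0. */
static float aspect_ratio(int width, int height)
{
    assert(width > 0 && height > 0);
    return (float)width / (float)height;
}

/* Horizontal half-extent in the same form as the snippet above:
   size / (backingWidth / backingHeight). */
static float frustum_half_width(float size, int width, int height)
{
    return size / aspect_ratio(width, height);
}
```

The result would be passed as -halfWidth and halfWidth for the left and right parameters of glFrustumf().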
What worked for me was adjusting the near and far values. The distance between the far and near planes determines how precise your depth buffer is.
For example, let's say you have a far value of 10000 and a near value of 500. That gives a total depth of 9500.
With a 16-bit depth buffer you have 65536 possible depth values. (How these values are distributed across the geometry depends on the GPU and the OpenGL implementation.)
Then you'll have approximately 65536/9500 ~= 7 possible depths for each unit of space, which gives 1/7 ~= 0.14 units of depth precision. If your objects are 0.14 units apart or less, you'll probably get z-fighting.
In real life this is more complex (with a perspective projection the depth values are not spread evenly, and most of the precision sits close to the near plane), but the idea is the same.
Maybe your far value is too large and you don't need it. Also, increasing the near value helps with z-fighting in objects that are closer to the camera (the ones that are more visible).
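The estimate above can be put into code, together with a rough perspective version. This is only a sketch: it assumes a fixed-point depth buffer, the standard glFrustumf() depth mapping and the default depth range, and the function names are made up for illustration.

```c
#include <assert.h>

/* The linear estimate from the answer: world units per depth step if the
   steps were spread evenly between the near and far planes. */
static double linear_depth_step(double zNear, double zFar, int bits)
{
    assert(zFar > zNear && bits > 0 && bits <= 32);
    double values = (double)(1ULL << bits);   /* 2^bits depth values */
    return (zFar - zNear) / values;
}

/* Approximate world units per depth step at eye distance z under a
   perspective projection. Window depth is d(z) = f/(f-n) * (1 - n/z),
   so one step of 1/(2^bits - 1) in d covers about
   z^2 * (f-n) / (f * n * (2^bits - 1)) world units. */
static double perspective_depth_step(double zNear, double zFar,
                                     int bits, double z)
{
    assert(zNear > 0.0 && zFar > zNear && bits > 0 && bits <= 32);
    assert(z >= zNear && z <= zFar);
    double steps = (double)(1ULL << bits) - 1.0;
    return (z * z * (zFar - zNear)) / (zFar * zNear * steps);
}
```

With the values from the question (zNear = 2.2, zFar = 30000), this gives roughly 6.9 units per step at a distance of 1000 with 16 bits and roughly 0.027 with 24 bits. Because the step grows with the square of the distance and shrinks in proportion to zNear, raising zNear helps far more than lowering zFar.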
Apparently 32-bit depth buffers aren't supported in OpenGL ES 1.x.
Also, it seems that 16-bit depth buffers aren't supported on iOS, so using GL_DEPTH_COMPONENT16_OES was just behaving as 24-bit, which is why I didn't see any improvement when I used GL_DEPTH_COMPONENT24_OES instead!
I confirmed this by checking GL_DEPTH_BITS after trying to set the depth buffer to 16 bit:
glBindRenderbufferOES(GL_RENDERBUFFER_OES, depthRenderbuffer);
glRenderbufferStorageOES(GL_RENDERBUFFER_OES, GL_DEPTH_COMPONENT16_OES, backingWidth, backingHeight);
GLint depthBufferBits;
glGetIntegerv(GL_DEPTH_BITS, &depthBufferBits );
NSLog(@"Depth buffer bits: %d", depthBufferBits );
Outputs:
Depth buffer bits: 24
Oh well, at least now I know. Hope this helps someone else.
Standard answers revolve around use of glPolygonOffset. What that does is add an offset to the polygon depth values before comparing to those already in the buffer. The offset is calculated allowing for screen depth and angle, so it's independent of the size of your world and it doesn't affect the identities of the pixels to be painted.
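As a rough illustration of how that offset is built, the OpenGL specification defines it as factor * m + units * r, where m is the polygon's maximum depth slope and r is the smallest resolvable depth difference. The sketch below mirrors that formula; the value used for r is an assumption (one step of a fixed-point depth buffer), since the real value is implementation-dependent, and the function name is hypothetical.

```c
#include <assert.h>

static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Depth offset added to each fragment of a polygon:
   offset = factor * m + units * r. */
static double polygon_depth_offset(double factor, double units,
                                   double dzdx, double dzdy, int depthBits)
{
    assert(depthBits > 0 && depthBits <= 32);

    /* m: the specification allows the slope to be approximated by
       max(|dz/dx|, |dz/dy|) in window coordinates. */
    double ax = abs_d(dzdx), ay = abs_d(dzdy);
    double m = ax > ay ? ax : ay;

    /* r: assumed here to be one step of a fixed-point depth buffer. */
    double r = 1.0 / ((double)(1ULL << depthBits) - 1.0);

    return factor * m + units * r;
}
```

In the application itself this corresponds to glEnable(GL_POLYGON_OFFSET_FILL) followed by glPolygonOffset(factor, units). Positive values push the polygon away from the viewer and negative values pull it closer.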
The issue is deciding when to use it. If your scene is, say, lots of discrete objects with no unifying broad data structure (like a quadtree or a BSP tree) then you're probably going to have to use something like a bucket system to spot when objects are very close (relative to their distance) and give a bump to the closer. If the problem is internal to individual meshes and you've no higher level structures then obviously the problem is more complicated.
At the other end, if your scene is entirely or overwhelmingly static, then a structure like a BSP tree that can do most of the drawing without even needing a depth buffer might be an advantage. As a last resort you could render back to front with depth writing but no comparisons, then do the moving objects as an extra layer. In practice that'll give you massive overdraw (though a PVS solution would help) compared with front-to-back drawing and modern early depth culling, especially on a deferred tile-based renderer like the PowerVR, so again it's not an easy win.
As a separate idea, is there any way you can simplify distant geometry?
I'm making a 2D videogame. Right now I don't have that many sprites and one texture with no depth buffer works fine. But when I expand to multiple textures I want to use a depth buffer so that I don't have to make multiple passes over the same texture and so that I don't have to organize my textures with respect to any depth constraints.
When I try to get the depth buffer working I can only get a blank screen with the correct clear color. I'm going to explain my working setup without the depth buffer and list questions I have for upgrading to the depth buffer:
Right now my vertices only have position(x,y) and texture(x,y) coords. There is nothing else. No lighting, no normals, no color, etc. Is it correct that the only upgrade I have to make here is to add a z coord to my position?
Right now I am using:
glOrthof(-2, 2, -3, 3, -1, 1);
this works with no depth buffer. But when I add the depth buffer I think I need to change the near and far values. What should I change them to?
Right now for my glTexImage2D() I am using:
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, size.x, size.y, 0, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
when I add the depth buffer do I have to change any of those arguments?
With my call to glClearDepthf();, should I be using one of the near or far values that I use in my call to glOrthof()? which one?
Since you're working in 2D with an orthographic projection, I find it helps to have a viewport with coordinates that match your resolution, which keeps things more readable:
CGRect rect = self.view.bounds;
if (ORTHO) {
if (highRes && (retina == 1)) {
glOrthof(0.0, rect.size.width/2, 0.0 , rect.size.height/2, -1, 1000.0);
} else {
glOrthof(0.0, rect.size.width, 0.0 , rect.size.height, -1, 1000.0);
}
glViewport(0, 0, rect.size.width*retina, rect.size.height*retina);
}
Notice that I always use 320x480 coordinates, even on retina. This way I can use the same coordinates for both resolutions, and a step of .5 gives me pixel-perfect placement on retina, but you can go the other way.
Regarding depth, I use a near value of -1 and a far value of 1000, so I can draw down to Z = -1000.
Make sure you're binding the depth buffer correctly, something like this:
// Need a depth buffer
glGenRenderbuffersOES(1, &depthRenderbuffer);
glBindRenderbufferOES(GL_RENDERBUFFER_OES, depthRenderbuffer);
glRenderbufferStorageOES(GL_RENDERBUFFER_OES, GL_DEPTH_COMPONENT16_OES, framebufferWidth, framebufferHeight);
glFramebufferRenderbufferOES(GL_FRAMEBUFFER_OES, GL_DEPTH_ATTACHMENT_OES, GL_RENDERBUFFER_OES, depthRenderbuffer);
Or your problem could be as simple as drawing at a depth that's behind your camera and lights, or outside the range of your depth buffer. Try a depth between 0 and -1 (-0.5, for example); with my glOrthof you can go down to -1000.
EDIT
The near and far values in glOrthof specify distances, not coordinates, which can be confusing when specifying depth values.
When you specify 1000 for the far parameter, you are actually saying that the far clipping plane is 1000 units away from the viewer, and the same goes for the near plane. Unfortunately, a clipping plane behind the viewer takes a negative value, which adds to the confusion.
So at drawing time we have a clipping plane that's 1000 units in front of the viewer (far, or into the screen). In terms of coordinates, Z is negative below the viewing plane (into the screen), so our actual drawing world lies between Z = 1 and Z = -1000, with -1000 being the farthest we can go with these parameters.
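The mapping described above can be written out as a small sketch. It assumes the default depth range (glDepthRangef(0, 1)), and the function names are hypothetical. It also shows why glClearDepthf() does not take either glOrthof value: it expects a window-space depth between 0 and 1, so the default of 1.0 clears to the far plane whatever near and far are.

```c
#include <assert.h>
#include <stdbool.h>

/* Window-space depth (0 = near plane, 1 = far plane) of a vertex at
   eye-space zEye under glOrthof(..., zNear, zFar).
   The planes sit at zEye = -zNear and zEye = -zFar. */
static double ortho_window_depth(double zNear, double zFar, double zEye)
{
    assert(zFar != zNear);
    return (-zEye - zNear) / (zFar - zNear);
}

/* True if the vertex falls between the two clipping planes. */
static bool ortho_depth_visible(double zNear, double zFar, double zEye)
{
    double d = ortho_window_depth(zNear, zFar, zEye);
    return d >= 0.0 && d <= 1.0;
}
```

With near = -1 and far = 1000, a vertex at Z = -0.5 is visible while Z = 2 and Z = -1000.5 are clipped. With the question's glOrthof(-2, 2, -3, 3, -1, 1), the usable range is Z = 1 down to Z = -1.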
If you aren't going to use an existing library like Cocos2D, for example, then you will have to write a manager to manage the depth buffer yourself, based on either:
The order in which sprites were added to the screen
A user-customised Z value, so you can swap them around as needed
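A minimal sketch of such a manager, combining both options: sprites are sorted by a user-assigned Z value, and ties fall back to the order in which they were added. The Sprite struct and function names are hypothetical, and the convention that a smaller Z is farther away is an assumption chosen to match OpenGL eye space.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    float z;         /* user-assigned depth; smaller = farther away */
    unsigned order;  /* sequence number assigned when the sprite was added */
    int textureId;   /* stand-in for whatever the sprite draws */
} Sprite;

/* Farthest sprite first; equal depths keep their insertion order. */
static int compare_sprites(const void *a, const void *b)
{
    const Sprite *sa = a;
    const Sprite *sb = b;
    if (sa->z < sb->z) return -1;
    if (sa->z > sb->z) return 1;
    if (sa->order < sb->order) return -1;
    if (sa->order > sb->order) return 1;
    return 0;
}

/* Sorts the sprites so they can be drawn back to front. */
static void sort_sprites_back_to_front(Sprite *sprites, size_t count)
{
    assert(sprites != NULL);
    qsort(sprites, count, sizeof *sprites, compare_sprites);
}
```

Drawing the sorted array from first to last gives back-to-front order, which also suits alpha-blended sprites.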
I read in iOS OpenGL ES Logical Buffer Loads that a performance gain can be achieved by "discarding" your depth buffer after each draw cycle. I tried this, but now it's as if my game engine is no longer rendering. I am getting a glError 1286, or GL_INVALID_FRAMEBUFFER_OPERATION_EXT, when I try to render the next cycle.
I get the feeling I need to initialize or setup the depth buffer each cycle if I'm going to discard it, but I can't seem to find any information on this. Here is how I init the depth buffer (all buffers, actually):
// ---- GENERAL INIT ---- //
// Extract width and height.
int bufferWidth, bufferHeight;
glGetRenderbufferParameteriv(GL_RENDERBUFFER,
GL_RENDERBUFFER_WIDTH, &bufferWidth);
glGetRenderbufferParameteriv(GL_RENDERBUFFER,
GL_RENDERBUFFER_HEIGHT, &bufferHeight);
// Create a depth buffer that has the same size as the color buffer.
glGenRenderbuffers(1, &m_depthRenderbuffer);
glBindRenderbuffer(GL_RENDERBUFFER, m_depthRenderbuffer);
glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH_COMPONENT24_OES, GAMESTATE->GetViewportSize().x, GAMESTATE->GetViewportSize().y);
// Create the framebuffer object.
GLuint framebuffer;
glGenFramebuffers(1, &framebuffer);
glBindFramebuffer(GL_FRAMEBUFFER, framebuffer);
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
GL_RENDERBUFFER, m_colorRenderbuffer);
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
GL_RENDERBUFFER, m_depthRenderbuffer);
glBindRenderbuffer(GL_RENDERBUFFER, m_colorRenderbuffer);
And here is what I'm trying to do to discard the depth buffer at the end of each draw cycle:
// Discard the depth buffer
const GLenum discards[] = {GL_DEPTH_ATTACHMENT, GL_COLOR_ATTACHMENT0};
glBindFramebuffer(GL_FRAMEBUFFER, m_depthRenderbuffer);
glDiscardFramebufferEXT(GL_FRAMEBUFFER,1,discards);
I call that immediately following all of my draw calls and...
[m_context presentRenderbuffer:GL_RENDERBUFFER];
Any ideas? Any info someone could point me to? I tried reading through Apple's guide on the subject (where I got the original idea), http://developer.apple.com/library/ios/#documentation/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/WorkingwithEAGLContexts/WorkingwithEAGLContexts.html, but it doesn't seem to work quite right for me.
Your call to glDiscardFramebufferEXT(GL_FRAMEBUFFER,1,discards) is saying that you are discarding just 1 framebuffer attachment, however your discards array includes two: GL_DEPTH_ATTACHMENT and GL_COLOR_ATTACHMENT0.
Try changing it to:
glDiscardFramebufferEXT(GL_FRAMEBUFFER, 2, discards);
In fact, you say that you are discarding these framebuffer attachments at the end of the draw cycle, but directly before [m_context presentRenderbuffer:GL_RENDERBUFFER];. You are discarding the colour renderbuffer attachment that you need in order to present the renderbuffer - perhaps try just discarding the depth attachment, as this is no longer needed at this point.
You only need to initialise your buffers once, not every draw cycle. The glDiscardFramebufferEXT() doesn't actually delete your framebuffer attachment - it is simply a hint to the API to say that the contents of the renderbuffer are not needed in that draw cycle after the discard command completes. From Apple's OpenGL ES Programming Guide for iOS:
A discard operation is defined by the EXT_discard_framebuffer
extension and is available on iOS 4.0 and later. Discard operations
should be omitted when your application is running on earlier versions
of iOS, but included whenever they are available. A discard is a
performance hint to OpenGL ES; it tells OpenGL ES that the contents of
one or more renderbuffers are not used by your application after the
discard command completes. By hinting to OpenGL ES that your
application does not need the contents of a renderbuffer, the data in
the buffers can be discarded or expensive tasks to keep the contents
of those buffers updated can be avoided.
I've got basically a 2d game on the iPhone and I'm trying to set up multiple backgrounds that scroll at different speeds (known as parallax backgrounds).
So my thought was to just stick the backgrounds BEHIND the foreground using different z-coordinate planes, and just make them bigger than the foreground (in size) to accommodate, so that the whole thing can be scrolled (just at a different speed).
And (as far as I know) I basically implemented that. The only problem is that it seems to entirely ignore whatever z-value I give it, or rather it just zeroes all of them. I see the background (I've only tested ONE background so far, to keep it simple...so for now I just have a foreground and I want one background scrolling at a different speed), but it scrolls 1:1 with my foreground, so it obviously doesn't look right, and most of it is cut off (cause it's bigger). And I've tried various z-values for the background and various near/far clipping planes...it's always the same. I'm probably just doing one simple thing wrong, but I can't figure it out. I'm wondering if it has to do with me using only 2 coordinates in glVertexPointer for the foreground? (Of course for the background I AM passing in 3)
I'll post some code:
This is some initial setup:
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrthof(-1.0f, 1.0f, -1.5f, 1.5f, -10.0f, 10.0f);
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
glEnableClientState(GL_VERTEX_ARRAY);
//glEnableClientState(GL_COLOR_ARRAY);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
//transparency
glEnable (GL_BLEND);
glBlendFunc (GL_ONE, GL_ONE_MINUS_SRC_ALPHA);
A little bit about my foreground's float array....it's interleaved. For my foreground it goes vertex x, vertex y, texture x, texture y, repeat. This all works just fine.
This is my FOREGROUND rendering:
glVertexPointer(2, GL_FLOAT, 4*sizeof(GLfloat), texes);
glTexCoordPointer(2, GL_FLOAT, 4*sizeof(GLfloat), (GLvoid*)texes + 2*sizeof(GLfloat));
glDrawArrays(GL_TRIANGLES, 0, indexCount / 4);
BACKGROUND rendering:
Same drill here except this time it goes vertex x, vertex y, vertex z, texture x, texture y, repeat. Note the z value this time. I did make sure the data in this array was correct while debugging (getting the right z values). And again, it shows up...it's just not going far back in the distance like it should.
glVertexPointer(3, GL_FLOAT, 5*sizeof(GLfloat), b1Texes);
glTexCoordPointer(2, GL_FLOAT, 5*sizeof(GLfloat), (GLvoid*)b1Texes + 3*sizeof(GLfloat));
glDrawArrays(GL_TRIANGLES, 0, b1IndexCount / 5);
And to move my camera, I just do a simple glTranslatef(x, y, 0.0f);
I'm not understanding what I'm doing wrong cause this seems like the most basic 3D function imaginable...things further away are smaller and don't move as fast when the camera moves. Not the case for me. Seems like it should be pretty basic and not even really be affected by my projection and all that (though I've even tried doing glFrustum just for fun, no success). Please help, I feel like it's just one dumb thing. I will post more code if necessary.
Shot in the dark...
You may have forgotten to set up depth buffering in the framebuffer initializer.
Copy&Paste from Apple's older EAGLView templates:
glGenRenderbuffersOES(1, &depthRenderbuffer);
glBindRenderbufferOES(GL_RENDERBUFFER_OES, depthRenderbuffer);
glRenderbufferStorageOES(GL_RENDERBUFFER_OES, GL_DEPTH_COMPONENT16_OES, backingWidth, backingHeight);
glFramebufferRenderbufferOES(GL_FRAMEBUFFER_OES, GL_DEPTH_ATTACHMENT_OES, GL_RENDERBUFFER_OES, depthRenderbuffer);
If you are depending on blending, you must draw in depth order, meaning you draw the furthest (deepest) layer first. Otherwise the layers behind will be covered by the layer on top, because the z-buffer value is written even where the area is 100% transparent.
See here
I've figured out that I am using orthographic projections which are incapable of displaying things being further away (please correct me if I'm wrong on this). When I tried glFrustum earlier (as I stated in my question), I was doing something wrong with the setup of it. I was using a negative value for the near-clipping value, and I basically got the 1:1 scrolling problem, same as orthographic. But I have changed this to 0.01, and it finally started displaying correctly (backgrounds displayed further away).
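The observation above can be checked with the projection arithmetic. This is only a sketch, assuming symmetric left/right bounds as in the glOrthof() call from the question; the function names are hypothetical. Under an orthographic projection the on-screen X of a point does not depend on its depth, so every layer scrolls 1:1. Under a perspective projection it is divided by the distance from the camera. As far as I know, glFrustumf() also requires a positive near value and reports GL_INVALID_VALUE otherwise.

```c
#include <assert.h>

/* Normalized device X under glOrthof(-right, right, ...).
   zEye is accepted only to show that it plays no part. */
static double ortho_ndc_x(double xEye, double zEye, double right)
{
    assert(right > 0.0);
    (void)zEye;
    return xEye / right;
}

/* Normalized device X under glFrustumf(-right, right, ..., zNear, zFar)
   for a point at eye-space depth zEye (negative in front of the camera). */
static double frustum_ndc_x(double xEye, double zEye,
                            double right, double zNear)
{
    assert(right > 0.0 && zNear > 0.0 && zEye < 0.0);
    return (zNear * xEye) / (right * -zEye);
}
```

Moving the camera by the same amount shifts a layer at Z = -2 half as far on screen as a layer at Z = -1 under the frustum, which is the parallax effect. To stay orthographic, one alternative is to scale each layer's translation by a per-layer factor instead of relying on Z.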
My issue is resolved but just as a side idea, I'm now wondering if I can mix orthographic and perspective within the same frame, and what that would require. Because I'd rather keep the foreground very simple and orthographic (2d), but I want my backgrounds to display with the perspective depth.
My idea was something like:
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrthof(-1.0f, 1.0f, -1.5f, 1.5f, -10.0f, 10.0f);
//render foreground
glLoadIdentity();
glFrustumf(-1.0f, 1.0f, -1.5f, 1.5f, 0.01f, 1000.0f);
//render backgrounds
I will play around with this and comment with my results, in case anyone is curious. Feedback on this would be appreciated, though technically I have no pressing need on this issue anymore (from here on out it would just be idea discussion).
I'm testing my simple OpenGL ES implementation (a 2D game) on the iPhone and I notice a high render utilization while using the profiler. These are the facts:
I'm displaying only one preloaded large texture (512x512 pixels) at 60fps and the render utilization is around 40%.
My texture is blended using GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA, the only GL function I'm using.
I've tried to make the texture smaller and tiling it, which made no difference.
I'm using a PNG texture atlas of 1024x1024 pixels
I find it very strange that this one texture is causing such an intense GPU usage.
Is this to be expected? What am I doing wrong?
EDIT: My code:
// OpenGL setup is identical to OpenGL ES template
// initState is called to setup
// timer is initialized, drawView is called by the timer
- (void) initState
{
//usual init declarations have been omitted here
glEnable(GL_BLEND);
glBlendFunc(GL_ONE,GL_ONE_MINUS_SRC_ALPHA);
glEnableClientState (GL_VERTEX_ARRAY);
glVertexPointer (2,GL_FLOAT,sizeof(Vertex),&allVertices[0].x);
glEnableClientState (GL_TEXTURE_COORD_ARRAY);
glTexCoordPointer (2,GL_FLOAT,sizeof(Vertex),&allVertices[0].tx);
glEnableClientState (GL_COLOR_ARRAY);
glColorPointer (4,GL_UNSIGNED_BYTE,sizeof(Vertex),&allVertices[0].r);
}
- (void) drawView
{
[EAGLContext setCurrentContext:context];
glBindFramebufferOES(GL_FRAMEBUFFER_OES, viewFramebuffer);
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
GLfloat width = backingWidth /2.f;
GLfloat height = backingHeight/2.f;
glOrthof(-width, width, -height, height, -1.f, 1.f);
glMatrixMode(GL_MODELVIEW);
glClearColor(0.f, 0.f, 0.f, 1.f);
glClear(GL_COLOR_BUFFER_BIT);
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
glBindRenderbufferOES(GL_RENDERBUFFER_OES, viewRenderbuffer);
[context presentRenderbuffer:GL_RENDERBUFFER_OES];
[self checkGLError];
}
EDIT: I've made a couple of improvements, but none managed to lower the render utilization. I've divided the texture into parts of 32x32, changed the type of the coordinates and texture coordinates from GLfloat to GLshort, and added extra vertices for degenerate triangles.
The updates are:
initState:
(vertex and texture pointer are now GL_SHORT)
glMatrixMode(GL_TEXTURE);
glScalef(1.f / 1024.f, 1.f / 1024.f, 1.f / 1024.f);
glMatrixMode(GL_MODELVIEW);
glScalef(1.f / 16.f, 1.f/ 16.f, 1.f/ 16.f);
drawView:
glDrawArrays(GL_TRIANGLE_STRIP, 0, 1536); //(16*16 parts * 6 vertices)
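For reference, here is one common way to arrive at 6 vertices per part. It is an assumption about the layout, not necessarily the one used above: each quad contributes its 4 corners, with the first and last corner repeated so that consecutive quads are joined by zero-area (degenerate) triangles. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { short x, y; } Vertex2s;

#define VERTS_PER_QUAD 6

/* Fills `out` with one triangle strip covering a columns x rows grid of
   square tiles. Returns the number of vertices written. */
static size_t build_quad_strip(Vertex2s *out, size_t capacity,
                               int columns, int rows, short tile)
{
    assert(out != NULL && columns > 0 && rows > 0 && tile > 0);
    size_t n = 0;
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < columns; c++) {
            short x0 = (short)(c * tile), y0 = (short)(r * tile);
            short x1 = (short)(x0 + tile), y1 = (short)(y0 + tile);
            Vertex2s quad[4] = { {x0, y0}, {x1, y0}, {x0, y1}, {x1, y1} };

            assert(n + VERTS_PER_QUAD <= capacity);
            out[n++] = quad[0];                 /* repeated first corner */
            for (int i = 0; i < 4; i++) {
                out[n++] = quad[i];             /* the quad itself */
            }
            out[n++] = quad[3];                 /* repeated last corner */
        }
    }
    return n;
}
```

For a 16x16 grid this yields 16 * 16 * 6 = 1536 vertices. Using an even number of vertices per quad keeps the winding of every real triangle the same along the strip.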
I'm writing an app which displays five 512x512 textures on top of each other in a 2D environment using GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA, and I can get about 14fps. Do you really need 60fps? For a game, I'd think 24-30 would be fine. Also, use PVR texture compression if at all possible. There's an example that does it included with the SDK.
I hope you didn't forget to disable GL_BLEND when you don't need it.
You can make an attempt at memory bandwidth optimization - use 16 bpp formats or PVRTC. IMHO with your texture size texture cache doesn't help at all.
Don't forget that your framebuffer is being used as texture by iPhone UI. If it is created as 32 bit RGBA it will be alpha-blended one more time. For optimal performance 16 bit 565 framebuffers are the best (but graphics quality suffers).
I don't know all the details, such as cache size, but I suppose texture pixels are already swizzled when uploaded into video memory, and triangles are split by the PVR tile engine. Therefore your own splitting appears to be redundant.
And finally. This is only a mobile low-power GPU, not designed for huge screens and high fillrates. Alpha-blending is costly, maybe 3-4 times difference on PowerVR chips.
Read this post.
512x512 is probably a little over optimistic for the iPhone to deal with.
EDIT:
I assume you have already read this, but if not, check Apple's guide to optimal OpenGL ES performance on iPhone.
What exactly is the problem?
You're getting your 60fps, which is silky smooth.
Who cares if render utilization is 40%?
The issue could be because of the iPhone's texture cache size. It may simply come down to how much of the texture is on each individual triangle, quad, or tristrip, depending on how you're setting state.
Try this: subdivide your quad and repeat your tests. So if you're 1 quad, make it 4. Then 16. and so on, and see if that helps. The key is to reduce the actual number of pixels that each primitive references.
When the texture cache gets blown, then the hardware will thrash texture lookups from main memory into whatever vram is set aside for the texture buffer for each pixel. This can kill performance mighty quick.
OR - I am completely wrong because I really don't know the iPhone hardware, and I also know that the PVR chip is a strange beast in comparison to what I'm used to (PS2, PSP). Still it's an easy test to try and I'm curious if it helps.
I've slightly modified the iPhone SDK's GLSprite example while learning OpenGL ES and it turns out to be quite slow. Even in the simulator (it's worse on the hardware), so I must be doing something wrong since it's only 400 textured triangles.
const GLfloat spriteVertices[] = {
0.0f, 0.0f,
100.0f, 0.0f,
0.0f, 100.0f,
100.0f, 100.0f
};
const GLshort spriteTexcoords[] = {
0,0,
1,0,
0,1,
1,1
};
- (void)setupView {
glViewport(0, 0, backingWidth, backingHeight);
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrthof(0.0f, backingWidth, backingHeight,0.0f, -10.0f, 10.0f);
glMatrixMode(GL_MODELVIEW);
glClearColor(0.3f, 0.0f, 0.0f, 1.0f);
glVertexPointer(2, GL_FLOAT, 0, spriteVertices);
glEnableClientState(GL_VERTEX_ARRAY);
glTexCoordPointer(2, GL_SHORT, 0, spriteTexcoords);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
// sprite data is preloaded. 512x512 rgba8888
glGenTextures(1, &spriteTexture);
glBindTexture(GL_TEXTURE_2D, spriteTexture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, spriteData);
free(spriteData);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glEnable(GL_TEXTURE_2D);
glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);
glEnable(GL_BLEND);
}
- (void)drawView {
..
glClear(GL_COLOR_BUFFER_BIT);
glLoadIdentity();
glTranslatef(tx-100, ty-100,10);
for (int i=0; i<200; i++) {
glTranslatef(1, 1, 0);
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
}
..
}
drawView is called every time the screen is touched or the finger on the screen is moved and tx,ty are set to the x,y coordinates where that touch happened.
I've also tried using a GL buffer, with the translations pre-generated and only one glDrawArrays call, but it gave the same performance (~4 FPS).
===EDIT===
Meanwhile I've modified this so that much smaller quads are used (sized: 34x20) and much less overlapping is done. There are ~400 quads->800 triangles spread on the whole screen. Texture size is 512x512 atlas and RGBA_8888 while the texture coordinates are in float.
The code is very ugly in terms of API efficiency: there are two glMatrixMode changes along with two loads and two translations, then a glDrawArrays call for each triangle strip (quad).
Now this produces ~45 FPS.
(I know this is very late, but I couldn't resist. I'll post anyway, in case other people come here looking for advice.)
This has nothing to do with the texture size. I don't know why people rated up Nils. He seems to have a fundamental misunderstanding of the OpenGL pipeline. He seems to think that for a given triangle, the entire texture is loaded and mapped onto that triangle. The opposite is true.
Once the triangle has been mapped into the viewport, it is rasterized. For every on-screen pixel that your triangle covers, the fragment shader is called. The default fragment shader (OpenGL ES 1.1, which you are using) will look up the texel that most closely maps (GL_NEAREST) to the pixel you are drawing. It might look up 4 texels, since you are using the higher-quality GL_LINEAR method to average the best texel. Still, if the pixel count in your triangle is, say, 100, then the most texture bytes you will have to read is 4 (lookups) * 100 (pixels) * 4 (bytes per color). Far, far less than what Nils was saying. It's amazing that he can make it sound like he actually knows what he's talking about.
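The arithmetic in this estimate is simple enough to write down. The sketch below is an upper bound that ignores the texture cache entirely, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Upper bound on the texture bytes read while shading a primitive:
   covered pixels * texel lookups per pixel * bytes per texel. */
static size_t max_texel_bytes(size_t coveredPixels,
                              size_t lookupsPerPixel,
                              size_t bytesPerTexel)
{
    assert(lookupsPerPixel > 0 && bytesPerTexel > 0);
    return coveredPixels * lookupsPerPixel * bytesPerTexel;
}
```

For the 100-pixel triangle above with GL_LINEAR (4 lookups) and RGBA8888 (4 bytes), the bound is 1600 bytes; with GL_NEAREST it is 400 bytes.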
WRT the tiled architecture, this is common in embedded OpenGL devices to preserve locality of reference. I believe that each tile gets exposed to each drawing operation, quickly culling most of them. Then the tile decides what to draw on itself. This is going to be much slower when you have blending turned on, as you do. Because you are using large triangles that might overlap and blend with other tiles, the GPU has to do a lot of extra work. If, instead of rendering the example square with alpha edges, you were to render an actual shape (instead of a square picture of the shape), then you could turn off blending for this part of the scene and I bet that would speed things up tremendously.
If you want to try it, just turn off blending and see how much things speed up, even if they don't look right. glDisable(GL_BLEND);
Your texture is 512*512*4 bytes per pixel. That's a megabyte of data. If you render it 200 times per frame you generate a bandwidth load of 200 megabytes per frame.
With roughly 4 fps you consume 800 MB/second just for texture reads alone. Frame- and Z-buffer writes need bandwidth as well. Then there is the CPU, and don't underestimate the bandwidth requirements of the display either.
RAM on embedded systems (e.g. your iphone) is not as fast as on a Desktop-PC. What you see here is a bandwidth starvation effect. The RAM simply can't handle the data faster.
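The figures in this answer can be reproduced with the sketch below. It takes the answer's own premise, that the whole texture is read once per draw call, at face value; another answer in this thread disputes that premise. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Size of an uncompressed texture in bytes. */
static uint64_t texture_bytes(uint32_t width, uint32_t height,
                              uint32_t bytesPerTexel)
{
    assert(width > 0 && height > 0 && bytesPerTexel > 0);
    return (uint64_t)width * height * bytesPerTexel;
}

/* Texture traffic per second if every draw call read the whole texture. */
static uint64_t texture_bytes_per_second(uint64_t bytesPerDraw,
                                         uint32_t drawsPerFrame,
                                         uint32_t framesPerSecond)
{
    return bytesPerDraw * drawsPerFrame * framesPerSecond;
}
```

A 512x512 RGBA8888 texture is 1,048,576 bytes; 200 draws per frame at 4 fps gives 838,860,800 bytes per second, which is the 800 MB/second quoted above. A 16-bit format halves both figures.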
How to cure this problem:
Pick a sane texture size. On average you should have 1 texel per pixel. This gives crisp-looking textures. I know, it's not always possible. Use common sense.
Use mipmaps. This takes up 33% extra space but allows the graphics chip to pick a lower-resolution mipmap where possible.
Try smaller texture formats. Maybe you can use the ARGB4444 format. This would double the rendering speed. Also take a look at the compressed texture formats. Decompression does not cause a performance drop, as it's done in hardware. In fact the opposite is true: due to the smaller size in memory, the graphics chip can read the texture data faster.
I guess my first try was just a bad (or very good) test.
The iPhone has a PowerVR MBX Lite, which is a tile-based graphics processor. It subdivides the screen into smaller tiles and renders them in parallel. In the first case above, the subdivision may have been overwhelmed by the very high overlap. Moreover, the quads couldn't be clipped because they were all at the same distance, so all texture coordinates had to be calculated (this could easily be tested by changing the translation in the loop).
Also, because of the overlap the parallelism couldn't be exploited: some tiles were sitting idle while the rest (about 1/3) were doing a lot of work.
So I think, while memory bandwidth could be a bottleneck, this wasn't the case in this example. The problem is more because of how the graphics HW works and the setup of the test.
I'm not familiar with the iPhone, but if it doesn't have dedicated hardware for handling floating point numbers (I suspect it doesn't) then it'd be faster to use integers whenever possible.
I'm currently developing for Android (which uses OpenGL ES as well) and for instance my vertex array is int instead of float. I can't say how much of a difference it makes, but I guess it's worth a try.
Apple is very tight-lipped about the specific hardware specs of the iPhone, which seems very strange to those of us coming from a console background. But people have been able to determine that the CPU is a 32-bit RISC ARM1176JZF. The good news is that it has a full floating-point unit, so we can continue writing math and physics code the way we do on most platforms.
http://gamesfromwithin.com/?p=239