I'm currently working on a project to convert a physics simulation to a video on the iPhone itself.
To do this, I'm presently using two different loops. The first loop runs in the block where the AVAssetWriterInput object polls the EAGLView for more images. The EAGLView provides the images from an array where they are stored.
The other loop is the actual simulation. I've turned off the simulation timer, and am calling the tick myself with a pre-specified time difference every time. Everytime a tick gets called, I create a new image in EAGLView's swap buffers method after the buffers have been swapped. This image is then placed in the array that AVAssetWriter polls.
There is also some miscellaneous code to make sure the array doesn't get too big
All of this works fine, but is very very slow.
Is there something I'm doing that is, conceptually, causing the entire process to be slower than it could be? Also, does anyone know of a faster way to get an image out of Open GL than glReadPixels?
Video memory is designed so, that it's fast for writing and slow for reading. That's why I perform rendering to texture. Here is the entire method that I've created for rendering the scene to texture (there are some custom containers, but I think it's pretty straightforward to replace them with your own):
-(TextureInf*) makeSceneSnapshot {
// create texture frame buffer
GLuint textureFrameBuffer, sceneRenderTexture;
glGenFramebuffersOES(1, &textureFrameBuffer);
glBindFramebufferOES(GL_FRAMEBUFFER_OES, textureFrameBuffer);
// create texture to render scene to
glGenTextures(1, &sceneRenderTexture);
glBindTexture(GL_TEXTURE_2D, sceneRenderTexture);
// create TextureInf object
TextureInf* new_texture = new TextureInf();
new_texture->setTextureID(sceneRenderTexture);
new_texture->real_width = [self viewportWidth];
new_texture->real_height = [self viewportHeight];
//make sure the texture dimensions are power of 2
new_texture->width = cast_to_power(new_texture->real_width, 2);
new_texture->height = cast_to_power(new_texture->real_height, 2);
//AABB2 = axis aligned bounding box (2D)
AABB2 tex_box;
tex_box.p1.x = 1 - (GLfloat)new_texture->real_width / (GLfloat)new_texture->width;
tex_box.p1.y = 0;
tex_box.p2.x = 1;
tex_box.p2.y = (GLfloat)new_texture->real_height / (GLfloat)new_texture->height;
new_texture->setTextureBox(tex_box);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, new_texture->width, new_texture->height, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
glFramebufferTexture2DOES(GL_FRAMEBUFFER_OES, GL_COLOR_ATTACHMENT0_OES, GL_TEXTURE_2D, sceneRenderTexture, 0);
// check for completness
if(glCheckFramebufferStatusOES(GL_FRAMEBUFFER_OES) != GL_FRAMEBUFFER_COMPLETE_OES) {
new_texture->release();
#throw [NSException exceptionWithName: EXCEPTION_NAME
reason: [NSString stringWithFormat: #"failed to make complete framebuffer object %x", glCheckFramebufferStatusOES(GL_FRAMEBUFFER_OES)]
userInfo: nil];
new_texture = nil;
} else {
// render to texture
[self renderOneFrame];
}
glDeleteFramebuffersOES(1, &textureFrameBuffer);
//restore default frame and render buffers
glBindFramebufferOES(GL_FRAMEBUFFER_OES, _defaultFramebuffer);
glBindRenderbufferOES(GL_RENDERBUFFER_OES, _colorRenderbuffer);
glEnable(GL_BLEND);
[self updateViewport];
glMatrixMode(GL_MODELVIEW);
return new_texture;
}
Of course, if you're doing snapshots all the time, then you'd better create texture frame and render buffers only once (and allocate memory for them).
One thing to remember is that the GPU is running asynchronously from the CPU, so if you try to do glReadPixels immediately after you finish rendering, you'll have to wait for commands to be flushed to the GPU and rendered before you can read them back.
Instead of waiting synchronously, render snapshots into a queue of textures (using FBOs like Max mentioned). Wait until you've rendered a couple more frames before you deque one of the previous frames. I don't know if the iPhone supports fences or sync objects, but if so you could check those to see if rendering has finished before reading the pixels.
You could try using a CADisplayLink object to ensure that your drawing rate and your capture rate correspond to the device's screen refresh rate. You might be slowing down the execution time of the run loop by refreshing and capturing too many times per device screen refresh.
Depending on your app's goals, it might not be necessary for you to capture every frame that you present, so in your selector, you could choose whether or not to capture the current frame.
While the question isn't new, it's not answered yet so I thought I'd pitch in.
glReadPixels is indeed very slow, and therefore cannot be used to record video from an OpenGL-application without adversly affecting performance.
We did find a workaround, and have created a free SDK called Everyplay that can record OpenGL-based graphics to a video file, without performance loss. You can check it out at https://developers.everyplay.com/
Related
Is there any faster way to access the frame buffer than using glReadPixels? I would need read-only access to a small rectangular rendering area in the frame buffer to process the data further in CPU. Performance is important because I have to perform this operation repeatedly. I have searched the web and found some approach like using Pixel Buffer Object and glMapBuffer but it seems that OpenGL ES 2.0 does not support them.
As of iOS 5.0, there is now a faster way to grab data from OpenGL ES. It isn't readily apparent, but it turns out that the texture cache support added in iOS 5.0 doesn't just work for fast upload of camera frames to OpenGL ES, but it can be used in reverse to get quick access to the raw pixels within an OpenGL ES texture.
You can take advantage of this to grab the pixels for an OpenGL ES rendering by using a framebuffer object (FBO) with an attached texture, with that texture having been supplied from the texture cache. Once you render your scene into that FBO, the BGRA pixels for that scene will be contained within your CVPixelBufferRef, so there will be no need to pull them down using glReadPixels().
This is much, much faster than using glReadPixels() in my benchmarks. I found that on my iPhone 4, glReadPixels() was the bottleneck in reading 720p video frames for encoding to disk. It limited the encoding from taking place at anything more than 8-9 FPS. Replacing this with the fast texture cache reads allows me to encode 720p video at 20 FPS now, and the bottleneck has moved from the pixel reading to the OpenGL ES processing and actual movie encoding parts of the pipeline. On an iPhone 4S, this allows you to write 1080p video at a full 30 FPS.
My implementation can be found within the GPUImageMovieWriter class within my open source GPUImage framework, but it was inspired by Dennis Muhlestein's article on the subject and Apple's ChromaKey sample application (which was only made available at WWDC 2011).
I start by configuring my AVAssetWriter, adding an input, and configuring a pixel buffer input. The following code is used to set up the pixel buffer input:
NSDictionary *sourcePixelBufferAttributesDictionary = [NSDictionary dictionaryWithObjectsAndKeys: [NSNumber numberWithInt:kCVPixelFormatType_32BGRA], kCVPixelBufferPixelFormatTypeKey,
[NSNumber numberWithInt:videoSize.width], kCVPixelBufferWidthKey,
[NSNumber numberWithInt:videoSize.height], kCVPixelBufferHeightKey,
nil];
assetWriterPixelBufferInput = [AVAssetWriterInputPixelBufferAdaptor assetWriterInputPixelBufferAdaptorWithAssetWriterInput:assetWriterVideoInput sourcePixelBufferAttributes:sourcePixelBufferAttributesDictionary];
Once I have that, I configure the FBO that I'll be rendering my video frames to, using the following code:
if ([GPUImageOpenGLESContext supportsFastTextureUpload])
{
CVReturn err = CVOpenGLESTextureCacheCreate(kCFAllocatorDefault, NULL, (__bridge void *)[[GPUImageOpenGLESContext sharedImageProcessingOpenGLESContext] context], NULL, &coreVideoTextureCache);
if (err)
{
NSAssert(NO, #"Error at CVOpenGLESTextureCacheCreate %d");
}
CVPixelBufferPoolCreatePixelBuffer (NULL, [assetWriterPixelBufferInput pixelBufferPool], &renderTarget);
CVOpenGLESTextureRef renderTexture;
CVOpenGLESTextureCacheCreateTextureFromImage (kCFAllocatorDefault, coreVideoTextureCache, renderTarget,
NULL, // texture attributes
GL_TEXTURE_2D,
GL_RGBA, // opengl format
(int)videoSize.width,
(int)videoSize.height,
GL_BGRA, // native iOS format
GL_UNSIGNED_BYTE,
0,
&renderTexture);
glBindTexture(CVOpenGLESTextureGetTarget(renderTexture), CVOpenGLESTextureGetName(renderTexture));
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, CVOpenGLESTextureGetName(renderTexture), 0);
}
This pulls a pixel buffer from the pool associated with my asset writer input, creates and associates a texture with it, and uses that texture as a target for my FBO.
Once I've rendered a frame, I lock the base address of the pixel buffer:
CVPixelBufferLockBaseAddress(pixel_buffer, 0);
and then simply feed it into my asset writer to be encoded:
CMTime currentTime = CMTimeMakeWithSeconds([[NSDate date] timeIntervalSinceDate:startTime],120);
if(![assetWriterPixelBufferInput appendPixelBuffer:pixel_buffer withPresentationTime:currentTime])
{
NSLog(#"Problem appending pixel buffer at time: %lld", currentTime.value);
}
else
{
// NSLog(#"Recorded pixel buffer at time: %lld", currentTime.value);
}
CVPixelBufferUnlockBaseAddress(pixel_buffer, 0);
if (![GPUImageOpenGLESContext supportsFastTextureUpload])
{
CVPixelBufferRelease(pixel_buffer);
}
Note that at no point here am I reading anything manually. Also, the textures are natively in BGRA format, which is what AVAssetWriters are optimized to use when encoding video, so there's no need to do any color swizzling here. The raw BGRA pixels are just fed into the encoder to make the movie.
Aside from the use of this in an AVAssetWriter, I have some code in this answer that I've used for raw pixel extraction. It also experiences a significant speedup in practice when compared to using glReadPixels(), although less than I see with the pixel buffer pool I use with AVAssetWriter.
It's a shame that none of this is documented anywhere, because it provides a huge boost to video capture performance.
Regarding what atisman mentioned about the black screen, I had that issue as well. Do really make sure everything is fine with your texture and other settings. I was trying to capture AIR's OpenGL layer, which I did in the end, the problem was that when I didn't set "depthAndStencil" to true by accident in the apps manifest, my FBO texture was half in height(the screen was divided in half and mirrored, I guess because of the wrap texture param stuff). And my video was black.
That was pretty frustrating, as based on what Brad is posting it should have just worked once I had some data in texture. Unfortunately, that's not the case, everything has to be "right" for it to work - data in texture is not a guarantee for seeing equal data in the video. Once I added depthAndStencil my texture fixed itself to full height and I started to get video recording straight from AIR's OpenGL layer, no glReadPixels or anything :)
So yeah, what Brad describes really DOES work without the need to recreate the buffers on every frame, you just need to make sure your setup is right. If you're getting blackness, try playing with the video/texture sizes perhaps or some other settings (setup of your FBO?).
I read iOS OpenGL ES Logical Buffer Loads that a performance gain can be reached by "discarding" your depth buffer after each draw cycle. I try this, but it's as my game engine is not rendering any longer. I am getting an glError 1286, or GL_INVALID_FRAMEBUFFER_OPERATION_EXT, when I try to render the next cycle.
I get the feeling I need to initialize or setup the depth buffer each cycle if I'm going to discard it, but I can't seem to find any information on this. Here is how I init the depth buffer (all buffers, actually):
// ---- GENERAL INIT ---- //
// Extract width and height.
int bufferWidth, bufferHeight;
glGetRenderbufferParameteriv(GL_RENDERBUFFER,
GL_RENDERBUFFER_WIDTH, &bufferWidth);
glGetRenderbufferParameteriv(GL_RENDERBUFFER,
GL_RENDERBUFFER_HEIGHT, &bufferHeight);
// Create a depth buffer that has the same size as the color buffer.
glGenRenderbuffers(1, &m_depthRenderbuffer);
glBindRenderbuffer(GL_RENDERBUFFER, m_depthRenderbuffer);
glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH_COMPONENT24_OES, GAMESTATE->GetViewportSize().x, GAMESTATE->GetViewportSize().y);
// Create the framebuffer object.
GLuint framebuffer;
glGenFramebuffers(1, &framebuffer);
glBindFramebuffer(GL_FRAMEBUFFER, framebuffer);
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
GL_RENDERBUFFER, m_colorRenderbuffer);
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
GL_RENDERBUFFER, m_depthRenderbuffer);
glBindRenderbuffer(GL_RENDERBUFFER, m_colorRenderbuffer);
And here is what I'm trying to do to discard the depth buffer at the end of each draw cycle:
// Discard the depth buffer
const GLenum discards[] = {GL_DEPTH_ATTACHMENT, GL_COLOR_ATTACHMENT0};
glBindFramebuffer(GL_FRAMEBUFFER, m_depthRenderbuffer);
glDiscardFramebufferEXT(GL_FRAMEBUFFER,1,discards);
I call that immediately following all of my draw calls and...
[m_context presentRenderbuffer:GL_RENDERBUFFER];
Any ideas? Any info someone could point me to? I tried reading through Apple's guide on the subject (where I got the original idea), http://developer.apple.com/library/ios/#documentation/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/WorkingwithEAGLContexts/WorkingwithEAGLContexts.html, but it doesn't seem to work quite right for me.
Your call to glDiscardFramebufferEXT(GL_FRAMEBUFFER,1,discards) is saying that you are discarding just 1 framebuffer attachment, however your discards array includes two: GL_DEPTH_ATTACHMENT and GL_COLOR_ATTACHMENT0.
Try changing it to:
glDiscardFramebufferEXT(GL_FRAMEBUFFER, 2, discards);
In fact, you say that you are discarding these framebuffer attachments at the end of the draw cycle, but directly before [m_context presentRenderbuffer:GL_RENDERBUFFER];. You are discarding the colour renderbuffer attachment that you need in order to present the renderbuffer - perhaps try just discarding the depth attachment, as this is no longer needed at this point.
You only need to initialise your buffers once, not every draw cycle. The glDiscardFramebufferEXT() doesn't actually delete your framebuffer attachment - it is simply a hint to the API to say that the contents of the renderbuffer are not needed in that draw cycle after the discard command completes. From Apple's OpenGL ES Programming Guide for iOS:
A discard operation is defined by the EXT_discard_framebuffer
extension and is available on iOS 4.0 and later. Discard operations
should be omitted when your application is running on earlier versions
of ioS, but included whenever they are available. A discard is a
performance hint to OpenGL ES; it tells OpenGL ES that the contents of
one or more renderbuffers are not used by your application after the
discard command completes. By hinting to OpenGL ES that your
application does not need the contents of a renderbuffer, the data in
the buffers can be discarded or expensive tasks to keep the contents
of those buffers updated can be avoided.
I have an "ImageManipulator" class that performs some cropping, resizing and rotating of camera images on the iPhone.
At the moment, everything works as expected but I keep getting a few huge spikes in memory consumption which occasionally cause the app to crash.
I have managed to isolate the problem to a part of the code where I check for the current image orientation property and rotate it accordingly to UIImageOrientationUp. I then get the image from the bitmap context and save it to disk.
This is currently what I am doing:
CGAffineTransform transform = CGAffineTransformIdentity;
// Check for orientation and set transform accordingly...
transform = CGAffineTransformTranslate(transform, self.size.width, 0);
transform = CGAffineTransformScale(transform, -1, 1);
// Create a bitmap context with the image that was passed so we can perform the rotation
CGContextRef ctx = CGBitmapContextCreate(NULL, self.size.width, self.size.height,
CGImageGetBitsPerComponent(self.CGImage), 0,
CGImageGetColorSpace(self.CGImage),
CGImageGetBitmapInfo(self.CGImage));
// Rotate the context
CGContextConcatCTM(ctx, transform);
// Draw the image into the context
CGContextDrawImage(ctx, CGRectMake(0,0,self.size.height,self.size.width), self.CGImage);
// Grab the bitmap context and save it to the disk...
Even after trying to scale the image down to half or even 1/4 of the size, I am still seeing the spikes to I am wondering if there is a different / more efficient way to get the rotation done as above?
Thanks in advance for the replies.
Rog
If you are saving to JPEG, I guess an alternative approach is to save the image as-is and then set the rotation to whatever you'd like by manipulating the EXIF metadata? See for example this post. Simple but probably effective, even if you have to hold the image payload bytes in memory ;)
Things you can do:
Scale down the image even more (which you probably don't want)
Remember to release everything as soon as you finish with it
Live with it
I would choose option 2 and 3.
Image editing is very resource intensive, as it loads the entire raw uncompressed image data into the memory for processing. This is inevitable as there is absolutely no other way to modify an image other than to load the complete raw data into the memory. Having memory consumption spikes doesn't really matter unless the app receives a memory warning, in that case quickly get rid of everything before it crashes. It is very rare that you would get a memory warning, though, because my app regularly loads a single > 10 mb file into the memory and I don't get a warning, even on older devices. So you'll be fine with the spikes.
Have you tried checking for memory leaks and analyzing allocations?
If the image is still too big, try rotating the image in pieces instead of as a whole.
As Anomie mentioned, CGBitmapContextCreate creates a context. We should release that by using
CGContextRelease(ctx);
If you have any other objects created using create or copy, that should also be released. If it is CFData, then
CFRelease(cfdata);
I'm working in an openGL for iPhone , and although everything works great I have to wait about a second in certain sections of the game which use a ton of sprite sheets. It's there any way to create a loading screen for such sections ?, or any way to know if a certain texture has finished loading with openGL?
EDIT:
I load my textures with this function:
-(void)loadTexture:(NSString*)nombre {
CGImageRef textureImage = [UIImage imageNamed:nombre].CGImage;
if (textureImage == nil) {
NSLog(#"Failed to load texture image");
return;
}
textureWidth = NextPowerOfTwo(CGImageGetWidth(textureImage));
textureHeight = NextPowerOfTwo(CGImageGetHeight(textureImage));
imageSizeX= CGImageGetWidth(textureImage);
imageSizeY= CGImageGetHeight(textureImage);
GLubyte *textureData = (GLubyte *)calloc(1,textureWidth * textureHeight * 4); // Por 4 pues cada pixel necesita 4 bytes, RGBA
CGContextRef textureContext = CGBitmapContextCreate(textureData, textureWidth,textureHeight,8, textureWidth * 4,CGImageGetColorSpace(textureImage),kCGImageAlphaPremultipliedLast );
CGContextDrawImage(textureContext, CGRectMake(0.0, 0.0, (float)textureWidth, (float)textureHeight), textureImage);
CGContextRelease(textureContext);
glGenTextures(1, &textures[0]);
glBindTexture(GL_TEXTURE_2D, textures[0]);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, textureWidth, textureHeight, 0, GL_RGBA, GL_UNSIGNED_BYTE, textureData);
free(textureData);
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameterf(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
}
I usually load the textures i need deppending on the section of the game, for example, one or two 1024x1024 textures which serve me as sprite sheets are enough for most levels, but in certain levels, like boss battles for example, I load about 5 or 6 textures ( the boss is big and has a ton of different attacks) and the game takes about 2 or 3 secs to load all those textures at once.
As Till explained, since you are loading the texture synchronously, it will be loaded once "loadTexture" is done.
Rendering to the screen in another thread via OpenGL while loading via OpenGL can become really messy real quick.
A - quite often used - hack is to update the screen between texture loads:
render 0% screen
loadTexture
render 10% screen
loadTexture
repeat as needed
With a little bit of extra work it's easy to generate a small table that has timing information about the loading, so the "x%" is close to the truth ;)
We usually automate this step and before shipping a new version do a "load through" run, in which we log when what resource is loaded and how long it takes.
Since you are loading the texture synchronously, it is rather obvious that the loading of that texture is done as soon as your loadTexture-method is done.
EDIT / ADDITIONS:
There is a couple of things you should avoid / change in your code. Not that this would really solve your actual problem, just a general advice....
1st: Do not use imageNamed for loading a texture image - that is generally a bad idea. Use imageWithContentsOfFile as that does not hog the image cache with data that you wont be reusing anyways.
2nd: Once a texture is loaded, there is a term that Apple calls "warming the texture" - which basically is using that texture briefly for a quick and dirty (even offscreen) rendering. That way you can be sure that the texture will be fully available and no extra penalties are imposed when doing the first "real" rendering. We are talking milliseconds of penalty - so no biggy but noticeable.
3rd: Try to shift the texture loading to a point where the application is idle anyways - eg. waiting for user input in the startup screen - but refrain from doing that in the applicationDidFinishLaunching method.
Here comes a question to you - why do you load the textures while rendering / inside the level. Why not preloading everything you possibly need? You can use up to 24mb of texture memory without imposing any penalties (well, minus the fbo memory).
Alternatively, you could simply display a UIImageView animation while the textures are loading. This has the special benefit of drawing in a separate threat... this is a special case, just like the UIActivityIndicatorView (spinner). I'm not sure how Apple is achieving this technically, but it works great.
The only catch is that you need to keep your loading animation frames in separate files, rather than in a sprite sheet.
http://developer.apple.com/library/ios/#documentation/uikit/reference/UIImageView_Class/Reference/Reference.html
I'm testing my simple OpenGL ES implementation (a 2D game) on the iPhone and I notice a high render utilization while using the profiler. These are the facts:
I'm displaying only one preloaded large texture (512x512 pixels) at 60fps and the render utilization is around 40%.
My texture is blended using GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA, the only GL function I'm using.
I've tried to make the texture smaller and tiling it, which made no difference.
I'm using a PNG texture atlas of 1024x1024 pixels
I find it very strange that this one texture is causing such an intense GPU usage.
Is this to be expected? What am I doing wrong?
EDIT: My code:
// OpenGL setup is identical to OpenGL ES template
// initState is called to setup
// timer is initialized, drawView is called by the timer
- (void) initState
{
//usual init declarations have been omitted here
glEnable(GL_BLEND);
glBlendFunc(GL_ONE,GL_ONE_MINUS_SRC_ALPHA);
glEnableClientState (GL_VERTEX_ARRAY);
glVertexPointer (2,GL_FLOAT,sizeof(Vertex),&allVertices[0].x);
glEnableClientState (GL_TEXTURE_COORD_ARRAY);
glTexCoordPointer (2,GL_FLOAT,sizeof(Vertex),&allVertices[0].tx);
glEnableClientState (GL_COLOR_ARRAY);
glColorPointer (4,GL_UNSIGNED_BYTE,sizeof(Vertex),&allVertices[0].r);
}
- (void) drawView
{
[EAGLContext setCurrentContext:context];
glBindFramebufferOES(GL_FRAMEBUFFER_OES, viewFramebuffer);
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
GLfloat width = backingWidth /2.f;
GLfloat height = backingHeight/2.f;
glOrthof(-width, width, -height, height, -1.f, 1.f);
glMatrixMode(GL_MODELVIEW);
glClearColor(0.f, 0.f, 0.f, 1.f);
glClear(GL_COLOR_BUFFER_BIT);
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
glBindRenderbufferOES(GL_RENDERBUFFER_OES, viewRenderbuffer);
[context presentRenderbuffer:GL_RENDERBUFFER_OES];
[self checkGLError];
}
EDIT: I've made a couple of improvements, but none managed to lower the render utilization. I've divided the texture in parts of 32x32, changed the type of the coordinates and texture coordinates from GLfloat to GLshort and added extra vertices for degenerative triangles.
The updates are:
initState:
(vertex and texture pointer are now GL_SHORT)
glMatrixMode(GL_TEXTURE);
glScalef(1.f / 1024.f, 1.f / 1024.f, 1.f / 1024.f);
glMatrixMode(GL_MODELVIEW);
glScalef(1.f / 16.f, 1.f/ 16.f, 1.f/ 16.f);
drawView:
glDrawArrays(GL_TRIANGLE_STRIP, 0, 1536); //(16*16 parts * 6 vertices)
I'm writing an app which displays five 512x512 textures on top of each other in a 2D environment using GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA, and I can get about 14fps. Do you really need 60fps? For a game, I'd think 24-30 would be fine. Also, use PVR texture compression if at all possible. There's an example that does it included with the SDK.
I hope you didn't forget to disable GL_BLEND when you don't need it already.
You can make an attempt at memory bandwidth optimization - use 16 bpp formats or PVRTC. IMHO with your texture size texture cache doesn't help at all.
Don't forget that your framebuffer is being used as texture by iPhone UI. If it is created as 32 bit RGBA it will be alpha-blended one more time. For optimal performance 16 bit 565 framebuffers are the best (but graphics quality suffers).
I don't know all the details, such as cache size, but, I suppose, textures pixels are already swizzled when uploaded into video memory and triangles are split by PVR tile engine. Therefore your own splitting appears to be redundant.
And finally. This is only a mobile low-power GPU, not designed for huge screens and high fillrates. Alpha-blending is costly, maybe 3-4 times difference on PowerVR chips.
Read this post.
512x512 is probably a little over optimistic for the iPhone to deal with.
EDIT:
I assume you have already read this, but if not check Apples guide to optimal OpenGl ES performance on iPhone.
What is exactly is the problem?
You're getting your 60fps, which is silky smooth.
Who cares if render utilization is 40%?
The issue could be because of the iPhone's texture cache size. It may simply come down to how much of the texture is on each individual triangle, quad, or tristrip, depending on how you're setting state.
Try this: subdivide your quad and repeat your tests. So if you're 1 quad, make it 4. Then 16. and so on, and see if that helps. The key is to reduce the actual number of pixels that each primitive references.
When the texture cache gets blown, then the hardware will thrash texture lookups from main memory into whatever vram is set aside for the texture buffer for each pixel. This can kill performance mighty quick.
OR - I am completely wrong because I really don't know the iPhone hardware, and I also know that the PVR chip is a strange beast in comparison to what I'm used to (PS2, PSP). Still it's an easy test to try and I'm curious if it helps.