I want to make a 2D tiled background system on the iPhone. Something that takes a tilemap and tileset image(s) and converts it into the full map on the screen.
Just doing some messing around, my first approach was to create a polygon for each tile. This worked fine until I started testing it with 400 polygons or so, at which point it started running very slowly. I'm just wondering: is this method of several polygons just not the way to go, or am I doing something wrong with it? I'll post code later if needed, but my main question is "Would 400 small polygons run slowly on the iPhone, or am I just doing something wrong?"
I also considered another way which was to, during initialization, create the map texture by code out of the tilemap/tilesets, and then stick that on ONE large polygon. So yeah...any feedback on how I should go about something like this?
I know someone will mention this - I gave consideration to trying cocos2d, but I've got my reasons for not going that route.
Your problem is almost certainly that you're binding textures 400 times, and not anything else. You should have all your tiles in one big texture atlas / sprite sheet and instead of rebinding your textures you should just bind your atlas once and then draw small parts of it. If you do this, you should be able to draw thousands of tiles with no real slowdown.
You can draw your sprite like this:
//Push the matrix so we can keep it as it was previously.
glPushMatrix();
//Store the coordinates/dimensions from a rectangle.
float x = CGRectGetMinX(rect);
float y = CGRectGetMinY(rect);
float w = CGRectGetWidth(rect);
float h = CGRectGetHeight(rect);
float xOffset = x;
float yOffset = y;
if (rotation != 0.0f)
{
    //Translate the OpenGL context to the center of the sprite for rotation.
    glTranslatef(x+w/2, y+h/2, 0.0f);
    //Apply the rotation about the Z axis.
    glRotatef(rotation, 0.0f, 0.0f, 1.0f);
    //Offset the top left corner so the sprite rotates around its center.
    xOffset = -w/2;
    yOffset = -h/2;
}
// Set up an array of values to use as the sprite vertices.
GLfloat vertices[] =
{
    xOffset,   yOffset,
    xOffset,   yOffset+h,
    xOffset+w, yOffset+h,
    xOffset+w, yOffset,
};
// Set up an array of values for the texture coordinates.
// Note: use CGRectGetMaxX/MaxY (not width/height) so clipping rects
// that don't start at the atlas origin still sample the right region.
GLfloat texcoords[] =
{
    CGRectGetMinX(clippingRect), CGRectGetMinY(clippingRect),
    CGRectGetMinX(clippingRect), CGRectGetMaxY(clippingRect),
    CGRectGetMaxX(clippingRect), CGRectGetMaxY(clippingRect),
    CGRectGetMaxX(clippingRect), CGRectGetMinY(clippingRect),
};
//If the image is flipped, swap the left and right texture coordinates.
if (flipped)
{
    texcoords[0] = CGRectGetMaxX(clippingRect);
    texcoords[2] = CGRectGetMaxX(clippingRect);
    texcoords[4] = CGRectGetMinX(clippingRect);
    texcoords[6] = CGRectGetMinX(clippingRect);
}
//Render the vertices by pointing to the arrays.
glVertexPointer(2, GL_FLOAT, 0, vertices);
glTexCoordPointer(2, GL_FLOAT, 0, texcoords);
// Set the texture parameters to use a linear filter when minifying.
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
//Allow transparency and blending.
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
//Enable 2D textures.
glEnable(GL_TEXTURE_2D);
//Bind this texture, unless it is already bound.
if ([Globals getLastTextureBound] != texture)
{
    glBindTexture(GL_TEXTURE_2D, texture);
}
//Finally draw the arrays.
glDrawArrays(GL_TRIANGLE_FAN, 0, 4);
//Restore the model view matrix to prevent contamination.
glPopMatrix();
The two CGRects I used are just for ease's sake. You can specify the X, Y, width, and height to draw the image, and you can specify where in the image to sample using the clippingRect. With the clipping rect, (0, 0, 1, 1) is the entire image, whereas (0, 0, 0.25, 0.25) would only draw the top-left corner. By changing the clipping rect, you can put all sorts of different tiles in the same texture, and then you only need to bind once. Way cheaper.
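For example, assuming a square atlas laid out as a grid of equal-sized tiles, a hypothetical helper to compute the clipping rect for a given tile index might look like this:
// Hypothetical helper: tilesPerRow is how many tiles fit across the atlas;
// index counts left-to-right, top-to-bottom.
CGRect clippingRectForTile(int index, int tilesPerRow)
{
    float size = 1.0f / tilesPerRow;
    return CGRectMake((index % tilesPerRow) * size,
                      (index / tilesPerRow) * size,
                      size, size);
}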
Scott, the TexParameter setup only needs to be done once per texture. However, that is not the source of your slowdown.
You'll be much better off building up a list of indices and calling glDrawElements once for the entire set of tiles. The goal of vertex arrays is to let you draw as much as possible in one step.
Using the glDrawTexOES extension is also a possibility, but it should generally be avoided because it forces you into the very inefficient one-at-a-time mindset.
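For reference, drawing a single tile with that extension looks roughly like this (a sketch; the crop rect is specified in texels within the currently bound texture):
// Select which part of the texture to draw: x, y, width, height in texels.
GLint crop[4] = { 0, 0, 32, 32 };
glTexParameteriv(GL_TEXTURE_2D, GL_TEXTURE_CROP_RECT_OES, crop);
// Draw it at (100, 100) on screen, 32x32 pixels in size.
glDrawTexiOES(100, 100, 0, 32, 32);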
Stanford's iTunes U has a podcast on optimizing OpenGL for the iPhone. The basic ideas are these:
Batch geometry: combine your various vertex arrays into one big vertex array. This reduces many gl*Pointer calls to a single call.
Texture atlases: use a single texture for all the different tiles, with each tile drawn from its own region, and bind that texture once for all tile drawing.
Interleaved arrays: pack the various parts of a vertex (e.g. position, texture coordinates, color) into a single array, so each vertex's data sits together in memory.
Indexed triangles: let you reuse geometry information where vertices are shared.
Use shorts instead of floats for geometry information where possible, as they are smaller.
Those are general OpenGL optimization guidelines. As for a tile engine specifically: do your own culling before sending data to OpenGL; whatever you don't draw, you save. That's all I can think of so far; a rough sketch of the first few ideas follows.
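As a sketch of how batching, interleaving, and indexing fit together for a tile map (the struct and function names here are mine, not from any library):
// One interleaved vertex: 16-bit position plus float atlas coordinates.
typedef struct {
    GLshort x, y;   // position in pixels
    GLfloat u, v;   // texture coordinates into the atlas
} TileVertex;

// verts holds 4 vertices per tile; indices holds 6 per tile (two triangles).
// One gl*Pointer setup and one draw call covers every visible tile.
void drawTileBatch(const TileVertex *verts, const GLushort *indices, int tileCount)
{
    glVertexPointer(2, GL_SHORT, sizeof(TileVertex), &verts[0].x);
    glTexCoordPointer(2, GL_FLOAT, sizeof(TileVertex), &verts[0].u);
    glDrawElements(GL_TRIANGLES, tileCount * 6, GL_UNSIGNED_SHORT, indices);
}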
What I did to speed my app up: after I load my level, I create an atlas out of the tiles on the map. Then every frame I check whether the camera moved. If it did, I just issue one glTranslatef and move the entire map at once; if only dynamic objects moved, I just update those objects in the vertex-array atlas. This system is very efficient, and I am able to draw tons of tiles with no framerate drop.
Client states should be enabled only at initialization, and the glTexParameteri calls should be made when creating the texture object. Also note that glEnable calls are not cached: the state is set even if it is already at that value. All these small things can add up and slow you down.
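As an illustration of that last point, a tiny cached wrapper (my own sketch, not a library call) avoids redundant state changes:
// Only touch the GL state when the cached value actually changes.
static GLboolean g_blendOn = GL_FALSE;
static void setBlend(GLboolean on)
{
    if (on == g_blendOn) return;
    if (on) glEnable(GL_BLEND);
    else    glDisable(GL_BLEND);
    g_blendOn = on;
}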
Related
I'm making a 2D videogame. Right now I don't have that many sprites and one texture with no depth buffer works fine. But when I expand to multiple textures I want to use a depth buffer so that I don't have to make multiple passes over the same texture and so that I don't have to organize my textures with respect to any depth constraints.
When I try to get the depth buffer working I can only get a blank screen with the correct clear color. I'm going to explain my working setup without the depth buffer and list questions I have for upgrading to the depth buffer:
Right now my vertices only have position(x,y) and texture(x,y) coords. There is nothing else. No lighting, no normals, no color, etc. Is it correct that the only upgrade I have to make here is to add a z coord to my position?
Right now I am using:
glOrthof(-2, 2, -3, 3, -1, 1);
this works with no depth buffer. But when I add the depth buffer I think I need to change the near and far values. What should I change them to?
Right now for my glTexImage2D() I am using:
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, size.x, size.y, 0, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
When I add the depth buffer, do I have to change any of those arguments?
With my call to glClearDepthf(), should I be using one of the near or far values from my call to glOrthof()? Which one?
Since you're working in 2D with ortho, I find it helps to have a viewport whose coordinates match your resolution; it keeps things more readable:
CGRect rect = self.view.bounds;
if (ORTHO) {
    if (highRes && (retina == 1)) {
        glOrthof(0.0, rect.size.width/2, 0.0, rect.size.height/2, -1, 1000.0);
    } else {
        glOrthof(0.0, rect.size.width, 0.0, rect.size.height, -1, 1000.0);
    }
    glViewport(0, 0, rect.size.width*retina, rect.size.height*retina);
}
Notice that I always use 320x480 coordinates, even on retina; this way I can use the same coordinates for both resolutions, and a 0.5 step gives me pixel-perfect positioning on retina. You can go the other way if you prefer.
Regarding depth, I use near = -1 and far = 1000, so I can draw down to Z = -1000.
Make sure you're binding the depth buffer correctly, something like this:
// Need a depth buffer
glGenRenderbuffersOES(1, &depthRenderbuffer);
glBindRenderbufferOES(GL_RENDERBUFFER_OES, depthRenderbuffer);
glRenderbufferStorageOES(GL_RENDERBUFFER_OES, GL_DEPTH_COMPONENT16_OES, framebufferWidth, framebufferHeight);
glFramebufferRenderbufferOES(GL_FRAMEBUFFER_OES, GL_DEPTH_ATTACHMENT_OES, GL_RENDERBUFFER_OES, depthRenderbuffer);
Or your problem could be as simple as using a depth that's behind your camera or beyond your buffer's range. Try a depth between 0 and -1 (-0.5, for example); with my glOrthof parameters you can go down to -1000.
EDIT
The near and far values in glOrthof specify distances, not coordinates, which can be confusing when choosing depth values.
When you pass 1000 for the far parameter, you are actually saying that the far clipping plane is 1000 units away from the viewer, and the same goes for the near value; unfortunately, placing a clipping plane behind the viewer requires a negative value, which adds to the confusion.
So at drawing time we have a far clipping plane 1000 units in front of the viewer (into the screen). In coordinate terms, Z is negative going into the screen, so our actual drawing world lies between Z = 1 and Z = -1000, with -1000 the farthest we can go with these parameters.
If you aren't going to use an existing library like Cocos2D, then you will have to write a manager to handle depth ordering yourself, based on either:
The order in which objects were added to the screen
A user-customised Z value, so you can swap them around as needed (see the sketch below)
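A minimal sketch of the second approach, assuming a hypothetical Sprite struct with a user-assigned z value: sort back-to-front each frame, then draw in that order.
#include <stdlib.h>

typedef struct { float z; /* plus texture, vertices, ... */ } Sprite;

// Ascending z: the most negative (farthest) sprites come first.
static int compareByZ(const void *a, const void *b)
{
    float za = ((const Sprite *)a)->z, zb = ((const Sprite *)b)->z;
    return (za > zb) - (za < zb);
}

// Each frame: qsort(sprites, count, sizeof(Sprite), compareByZ);
// then draw the sprites in the sorted order.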
I'm trying to draw a basic 2D ground mesh made up of smaller tiles from a texture atlas (each tile has a 1-pixel transparent border):
I render the tiles as texture quads using the following code:
glEnable(GL_TEXTURE_2D);
glBindTexture(GL_TEXTURE_2D, m_texture);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
glVertexPointer(2, GL_SHORT, 0, &m_coords[0]);
glTexCoordPointer(2, GL_FLOAT, 0, &m_uvs[0]);
glDrawArrays(GL_TRIANGLES, 0, m_coords.size() / 2);
glBindTexture(GL_TEXTURE_2D, 0);
glDisableClientState(GL_VERTEX_ARRAY);
glDisableClientState(GL_TEXTURE_COORD_ARRAY);
glDisable(GL_TEXTURE_2D);
The positions are obviously integer coordinates. The UV coordinates for the corners are calculated like this:
u0 = float(x) / float(texWidth);
v0 = float(y) / float(texHeight);
u1 = float(x+w) / float(texWidth);
v1 = float(y+h) / float(texHeight);
where w and h are the size of the tile without the padding.
It looks great when the modelview transform is snapped to an integer position, but as soon as it starts to move I get black seams between the tiles:
From what I understand I should offset the UV coordinates with a half texel to make it work, but if I change the UV calculations to:
u0 = float(x+0.5f) / float(texWidth);
v0 = float(y+0.5f) / float(texHeight);
u1 = float(x+w-0.5f) / float(texWidth);
v1 = float(y+h-0.5f) / float(texHeight);
It still doesn't work. Is this the correct way to do it? Do I need blending for this to work? If I offset the tiles to make sure they're snapped to the pixel grid it works, but that makes it snap when moving slowly. How do people usually solve this?
EDIT
I should of course have said that this is on the iPhone.
Your border shouldn't be transparent; it should contain the pixels from the opposing side of each subtexture. For example, the border on the right-hand side of each subtexture should be a copy of its left-most column of pixels, i.e. the pixels it would wrap around to.
That is how you "cheat" wrapping for the texture sampler on the borders.
I had a similar issue with a texture atlas. I fixed it by insetting the image by 1.0/TEXTURE_ATLAS_PIXELS_PER_SIDE * 1/128.0. You have to find the 128 by experimentation; the upside is that no one will perceive a 128th of a texel being missing. I made this modification to the texture coordinates sent to the graphics card, not in a shader, and I haven't tried doing it with texels in the shader like you have. I've read about different ways of handling texture bleeding, but for a texture atlas this was the easiest solution for me. Adding borders to my textures, which are tightly packed and follow the power-of-two rule, would have cost me a lot of whitespace.
This is what worked for me on the iPhone.
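For what it's worth, applied to the question's UV code the inset might look something like this (TEXTURE_ATLAS_PIXELS_PER_SIDE is the atlas size in pixels; the 1/128 factor is the experimentally found value mentioned above):
// Inset each tile's UVs by a 128th of a texel to avoid sampling neighbours.
float inset = (1.0f / TEXTURE_ATLAS_PIXELS_PER_SIDE) * (1.0f / 128.0f);
u0 = float(x) / float(texWidth) + inset;
v0 = float(y) / float(texHeight) + inset;
u1 = float(x+w) / float(texWidth) - inset;
v1 = float(y+h) / float(texHeight) - inset;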
You could always switch your GL_TEXTURE_MIN/MAG_FILTER to GL_NEAREST :)
Try using GL_NEAREST for GL_TEXTURE_MIN/MAG_FILTER and translating by 0.375f before drawing:
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
glTranslatef(0.375f, 0.375f, 0.0f);
I've got basically a 2d game on the iPhone and I'm trying to set up multiple backgrounds that scroll at different speeds (known as parallax backgrounds).
So my thought was to just stick the backgrounds BEHIND the foreground using different z-coordinate planes, and just make them bigger than the foreground (in size) to accommodate, so that the whole thing can be scrolled (just at a different speed).
And (as far as I know) I basically implemented that. The only problem is that it seems to entirely ignore whatever z-value I give it, or rather it zeroes all of them. I see the background (I've only tested ONE background so far, to keep it simple: just a foreground and one background scrolling at a different speed), but it scrolls 1:1 with my foreground, so it obviously doesn't look right, and most of it is cut off because it's bigger. I've tried various z-values for the background and various near/far clipping planes; it's always the same. I'm probably doing one simple thing wrong, but I can't figure it out. Could it have to do with me using only 2 coordinates in glVertexPointer for the foreground? (For the background I AM passing in 3.)
I'll post some code:
This is some initial setup:
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrthof(-1.0f, 1.0f, -1.5f, 1.5f, -10.0f, 10.0f);
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
glEnableClientState(GL_VERTEX_ARRAY);
//glEnableClientState(GL_COLOR_ARRAY);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
//transparency
glEnable (GL_BLEND);
glBlendFunc (GL_ONE, GL_ONE_MINUS_SRC_ALPHA);
A little bit about my foreground's float array....it's interleaved. For my foreground it goes vertex x, vertex y, texture x, texture y, repeat. This all works just fine.
This is my FOREGROUND rendering:
glVertexPointer(2, GL_FLOAT, 4*sizeof(GLfloat), texes);
glTexCoordPointer(2, GL_FLOAT, 4*sizeof(GLfloat), (GLvoid*)texes + 2*sizeof(GLfloat));
glDrawArrays(GL_TRIANGLES, 0, indexCount / 4);
BACKGROUND rendering:
Same drill here except this time it goes vertex x, vertex y, vertex z, texture x, texture y, repeat. Note the z value this time. I did make sure the data in this array was correct while debugging (getting the right z values). And again, it shows up...it's just not going far back in the distance like it should.
glVertexPointer(3, GL_FLOAT, 5*sizeof(GLfloat), b1Texes);
glTexCoordPointer(2, GL_FLOAT, 5*sizeof(GLfloat), (GLvoid*)b1Texes + 3*sizeof(GLfloat));
glDrawArrays(GL_TRIANGLES, 0, b1IndexCount / 5);
And to move my camera, I just do a simple glTranslatef(x, y, 0.0f);
I don't understand what I'm doing wrong, because this seems like the most basic 3D behavior imaginable: things further away are smaller and don't move as fast when the camera moves. That's not happening for me. It seems so basic that it shouldn't even really be affected by my projection and all that (though I've even tried glFrustum just for fun, with no success). Please help; I feel like it's just one dumb thing. I will post more code if necessary.
Shot in the dark...
You may have forgotten to set up depth buffering within the framebuffer initializer. Copied and pasted from Apple's older EAGLView templates:
glGenRenderbuffersOES(1, &depthRenderbuffer);
glBindRenderbufferOES(GL_RENDERBUFFER_OES, depthRenderbuffer);
glRenderbufferStorageOES(GL_RENDERBUFFER_OES, GL_DEPTH_COMPONENT16_OES, backingWidth, backingHeight);
glFramebufferRenderbufferOES(GL_FRAMEBUFFER_OES, GL_DEPTH_ATTACHMENT_OES, GL_RENDERBUFFER_OES, depthRenderbuffer);
If you are depending on blending, you must draw in depth order, meaning draw the furthest (deepest) layer first. Otherwise the farther layers will be covered by the layers on top, as the z-buffer value is written even where the area is 100% transparent.
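In practice that usually means drawing opaque geometry first with depth writes on, then the transparent layers back to front with depth writes off. A sketch (the draw functions are placeholders for your own rendering code):
// Opaque pass: depth test and depth writes both on.
glDepthMask(GL_TRUE);
drawOpaqueLayers();                  // hypothetical
// Transparent pass: keep testing depth, but stop writing it.
glDepthMask(GL_FALSE);
glEnable(GL_BLEND);
drawTransparentLayersBackToFront();  // hypothetical: farthest layer first
glDepthMask(GL_TRUE);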
I've figured out that I was using an orthographic projection, which doesn't make things look smaller as they get further away (please correct me if I'm wrong on this). When I tried glFrustum earlier (as I stated in my question), I had set it up wrong: I was using a negative value for the near-clipping plane (glFrustum requires a positive near value), and I basically got the same 1:1 scrolling problem as with orthographic. I changed it to 0.01, and it finally started displaying correctly (backgrounds rendered further away).
My issue is resolved, but as a side idea I'm now wondering whether I can mix orthographic and perspective projections within the same frame, and what that would require. I'd rather keep the foreground simple and orthographic (2D), but I want my backgrounds to display with perspective depth.
My idea was something like:
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrthof(-1.0f, 1.0f, -1.5f, 1.5f, -10.0f, 10.0f);
//render foreground
glLoadIdentity();
glFrustumf(-1.0f, 1.0f, -1.5f, 1.5f, 0.01f, 1000.0f);
//render backgrounds
I will play around with this and comment with my results, in case anyone is curious. Feedback on this would be appreciated, though technically I have no pressing need on this issue anymore (from here on out it would just be idea discussion).
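For anyone trying the idea above, the missing pieces in that sketch are the glMatrixMode switches and a depth clear between the passes; something like this (untested):
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glFrustumf(-1.0f, 1.0f, -1.5f, 1.5f, 0.01f, 1000.0f);
glMatrixMode(GL_MODELVIEW);
// ...render the perspective backgrounds...

glClear(GL_DEPTH_BUFFER_BIT); // so background depths don't occlude the ortho pass
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrthof(-1.0f, 1.0f, -1.5f, 1.5f, -10.0f, 10.0f);
glMatrixMode(GL_MODELVIEW);
// ...render the 2D foreground...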
I took the GLPaint example and I'm trying to put a background into the PaintingView, so you can draw over the background and finally save the image as a file. I'm lost.
I'm loading a PNG (512x512) and trying to "paint with it" at the very beginning of the program, but it's painted as 64x64 instead of 512x512.
Before this I tried loading it as a subview of the painting view, but then glReadPixels doesn't work as expected (it only takes the PaintingView into consideration, not the subview). Also, the PaintingView doesn't have anything like an initWithImage method. I need glReadPixels to work on the image (and on the modifications), but I really don't know why the texture comes out at 64x64 when I load it.
The GLPaint example project uses point sprites (GL_POINT_SPRITE_OES) to draw copies of the brush texture as you move the brush. On the iPhone, the point size is limited to 64x64 pixels. This is a hardware limitation; in the simulator I think you can make it larger.
It sounds like you're trying to use a GL_POINT_SPRITE method to draw your background image, and that's really not what you want. Instead, try drawing a flat, textured box that fills the screen.
Here's a bit of OpenGL code that sets up vertices and texcoords for a 2D box and then draws it:
const GLfloat vertices[] = {
0.0f, 0.0f,
1.0f, 0.0f,
0.0f, 1.0f,
1.0f, 1.0f,
};
const GLfloat texcoords[] = {
0, 0,
1, 0,
0, 1,
1, 1,
};
glVertexPointer(2, GL_FLOAT, 0, vertices);
glEnableClientState(GL_VERTEX_ARRAY);
glTexCoordPointer(2, GL_FLOAT, 0, texcoords);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
glEnable(GL_TEXTURE_2D);
glBindTexture(GL_TEXTURE_2D, texture);
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
Hope that helps! Note that you need to specify the vertices differently depending on how your camera projection is set up. In my case, I set up my GL_MODELVIEW using the code below - I'm not sure how the GLPaint example does it.
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
glOrthof(0, 1.0, 0, 1.0, -1, 1);
First, glReadPixels() is only going to see whatever framebuffer is associated with your current OpenGL context. That might explain why you're not getting the pixels you expect.
Second, what do you mean by the texture being rendered at a specific pixel size? I assume the texture is rendered as a quad, and then the size of that quad ought to be under your control, code-wise.
Also, check that loading the texture doesn't generate an OpenGL error. I'm not sure what the iPhone's limits on texture sizes are; it's quite conceivable that 512x512 is out of range. You can investigate this yourself by calling glGetIntegerv() with the GL_MAX_TEXTURE_SIZE constant.
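The check itself is a one-liner:
GLint maxTextureSize = 0;
glGetIntegerv(GL_MAX_TEXTURE_SIZE, &maxTextureSize);
// 512x512 is safe as long as maxTextureSize >= 512.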
I've slightly modified the iPhone SDK's GLSprite example while learning OpenGL ES, and it turns out to be quite slow. Even in the simulator (and worse on the hardware), so I must be doing something wrong, since it's only 400 textured triangles.
const GLfloat spriteVertices[] = {
0.0f, 0.0f,
100.0f, 0.0f,
0.0f, 100.0f,
100.0f, 100.0f
};
const GLshort spriteTexcoords[] = {
0,0,
1,0,
0,1,
1,1
};
- (void)setupView {
glViewport(0, 0, backingWidth, backingHeight);
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
glOrthof(0.0f, backingWidth, backingHeight,0.0f, -10.0f, 10.0f);
glMatrixMode(GL_MODELVIEW);
glClearColor(0.3f, 0.0f, 0.0f, 1.0f);
glVertexPointer(2, GL_FLOAT, 0, spriteVertices);
glEnableClientState(GL_VERTEX_ARRAY);
glTexCoordPointer(2, GL_SHORT, 0, spriteTexcoords);
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
// sprite data is preloaded. 512x512 rgba8888
glGenTextures(1, &spriteTexture);
glBindTexture(GL_TEXTURE_2D, spriteTexture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, spriteData);
free(spriteData);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glEnable(GL_TEXTURE_2D);
glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);
glEnable(GL_BLEND);
}
- (void)drawView {
..
glClear(GL_COLOR_BUFFER_BIT);
glLoadIdentity();
glTranslatef(tx-100, ty-100,10);
for (int i=0; i<200; i++) {
glTranslatef(1, 1, 0);
glDrawArrays(GL_TRIANGLE_STRIP, 0, 4);
}
..
}
drawView is called every time the screen is touched or the finger on the screen is moved and tx,ty are set to the x,y coordinates where that touch happened.
I've also tried using a GL buffer object, where the translations were pre-generated and there was only one glDrawArrays call, but it gave the same performance (~4 FPS).
===EDIT===
Meanwhile I've modified this so that much smaller quads are used (sized 34x20) and there is much less overlap. There are ~400 quads, i.e. 800 triangles, spread over the whole screen. The texture is a 512x512 RGBA8888 atlas, and the texture coordinates are floats.
The code is very ugly in terms of API efficiency: there are two glMatrixMode changes, along with two loads and two translations, and then a glDrawArrays for each triangle strip (quad).
Now this produces ~45 FPS.
(I know this is very late, but I couldn't resist. I'll post anyway, in case other people come here looking for advice.)
This has nothing to do with the texture size. I don't know why people voted Nils up; he seems to have a fundamental misunderstanding of the OpenGL pipeline. He seems to think that, for a given triangle, the entire texture is loaded and mapped onto that triangle. The opposite is true.
Once the triangle has been mapped into the viewport, it is rasterized. For every on-screen pixel your triangle covers, the fragment stage is run. The default fixed-function texturing in OpenGL ES 1.1 (which you are using) looks up the texel that most closely maps to the pixel you are drawing (GL_NEAREST), or it may look up 4 texels if you use the higher-quality GL_LINEAR method to average the closest texels. Still, if the pixel count in your triangle is, say, 100, then the most texture bytes you will have to read is 4 (lookups) * 100 (pixels) * 4 (bytes per color): far, far less than what Nils was saying. It's amazing that he can make it sound like he actually knows what he's talking about.
WRT the tiled architecture, this is common in embedded OpenGL devices to preserve locality of reference. I believe each tile gets exposed to each drawing operation, quickly culling most of them; then each tile decides what to draw on itself. This gets much slower when you have blending turned on, as you do. Because you are using large triangles that can overlap and blend with other tiles, the GPU has to do a lot of extra work. If, instead of rendering the example square with alpha edges, you were to render an actual shape (instead of a square picture of the shape), then you could turn off blending for that part of the scene, and I bet that would speed things up tremendously.
If you want to try it, just turn off blending and see how much things speed up, even if they don't look right: glDisable(GL_BLEND);
Your texture is 512*512 at 4 bytes per pixel. That's a megabyte of data. If you render it 200 times per frame, you generate a bandwidth load of 200 megabytes per frame.
At roughly 4 fps you consume 800 MB/second for texture reads alone. Framebuffer and z-buffer writes need bandwidth as well, then there is the CPU, and don't underestimate the bandwidth requirements of the display either.
RAM on embedded systems (e.g. your iPhone) is not as fast as on a desktop PC. What you see here is a bandwidth-starvation effect: the RAM simply can't deliver the data any faster.
How to cure this problem:
Pick a sane texture size. On average you should have about one texel per pixel; this gives crisp-looking textures. I know it's not always possible, so use common sense.
Use mipmaps. They take up 33% extra space but allow the graphics chip to use a lower-resolution mipmap where possible.
Try smaller texture formats. Maybe you can use the RGBA4444 format, which would double the rendering speed. Also take a look at the compressed texture formats: decompression does not cause a performance drop because it's done in hardware; in fact the opposite is true, since the smaller size in memory lets the graphics chip read the texture data faster.
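For the mipmap and 16-bit-format suggestions, the OpenGL ES 1.1 calls look roughly like this (GL_GENERATE_MIPMAP must be set before uploading the image, and pixels4444 stands for image data you've already converted to 16 bits per pixel):
// Ask the driver to build mipmaps, and use them when minifying.
glTexParameteri(GL_TEXTURE_2D, GL_GENERATE_MIPMAP, GL_TRUE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
// Upload as 16-bit RGBA4444 instead of 32-bit RGBA8888.
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
             GL_RGBA, GL_UNSIGNED_SHORT_4_4_4_4, pixels4444);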
I guess my first try was just a bad (or very good) test.
The iPhone has a PowerVR MBX Lite, which is a tile-based graphics processor: it subdivides the screen into smaller tiles and renders them in parallel. In the first case above, that subdivision probably got a bit overwhelmed because of the very high overlap. Moreover, the quads couldn't be culled against each other because they were at the same depth, so all the texture coordinates had to be calculated (this could easily be tested by changing the translation in the loop).
Also, because of the overlap, the parallelism couldn't be exploited: some tiles sat doing nothing while the rest (about a third) were working a lot.
So I think that while memory bandwidth could be a bottleneck, it wasn't the issue in this example. The problem is more about how the graphics hardware works and the setup of the test.
I'm not familiar with the iPhone, but if it doesn't have dedicated hardware for handling floating-point numbers (I suspect it doesn't), then it would be faster to use integers whenever possible.
I'm currently developing for Android (which uses OpenGL ES as well), and there, for instance, my vertex array is int instead of float. I can't say how much of a difference it makes, but it's worth a try.
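For instance, the spriteVertices array from the question could be declared with 16-bit integers instead of floats; everything else stays the same:
// GLshort positions instead of GLfloat.
const GLshort spriteVertices[] = {
      0,   0,
    100,   0,
      0, 100,
    100, 100
};
glVertexPointer(2, GL_SHORT, 0, spriteVertices);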
Apple is very tight-lipped about the specific hardware specs of the iPhone, which seems very strange to those of us coming from a console background. But people have been able to determine that the CPU is a 32-bit ARM1176JZF RISC core. The good news is that it has a full floating-point unit, so we can continue writing math and physics code the way we do on most platforms.
http://gamesfromwithin.com/?p=239