Separating OpenGL Calls from Updating on the iPhone

I'm a bit of a newb when it comes to threading, so any pointers in the right direction would be a great help. I've got a game with both a fairly heavy update function, and a fairly heavy draw function. I'd assume that the majority of the weight in the draw function is going to happen on the GPU. Because of this, I'd like to start calculating the update on the next frame while the drawing is happening. Right now, my game loop is quite simple:
Game->Update1();
Game->Update2();
Game->Draw();
Update1() updates variables that do not change game state, so it can run independently from Draw. That is to say, there should be no fights over data between the two. It is also the bulk of the CPU processing.
Update2() updates variables that Draw needs, and it is quite fast, so it seems right to have it running serially with Draw(). Additionally, I believe that the Draw() function is light on CPU and heavy on GPU.
What I would like to happen is that while the GPU is busy processing all the Draw functionality, the next frame's Update1() can use the CPU to get the next frame's update ready. It doesn't seem like I'm automatically getting this functionality -- the Draw cycle seems to take a little while and block everything until it's done, which is less than ideal.
What's the proper way to do this? Is this already happening, and I'm just not observing it properly?

That depends on what Draw() contains. You should get CPU-GPU parallelism automatically, unless some call inside Draw() synchronizes the CPU and the GPU. One common example is glReadPixels.
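If it turns out Draw() really is blocking, the overlap the question describes can also be arranged by hand. A rough sketch (hypothetical names, Objective-C++ so the existing Game calls still work, and the thread join reduced to a placeholder; a real version needs proper synchronization, e.g. an NSCondition):

- (void)runOneFrame {
    Game->Update2();             // fast; prepares the data Draw() needs
    // Update1() touches nothing Draw() reads, so overlap it with Draw().
    [NSThread detachNewThreadSelector:@selector(update1ForNextFrame)
                             toTarget:self
                           withObject:nil];
    Game->Draw();                // CPU submits GL commands; GPU runs them
    [self waitForUpdate1Done];   // hypothetical join before the next frame
}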

Related

State preserving particle system for OpenGL ES 2.0

I'm trying to implement a state-preserving particle system on the iPhone using OpenGL ES 2.0. By state-preserving, I mean that each particle is integrated forward in time, having a unique velocity and position vector that changes with time and cannot be calculated from the initial conditions at every rendering call.
Here's one possible way I can think of:
1. Set up particle initial conditions in a VBO.
2. Integrate the particles in the vertex shader and write the result to a texture in the fragment shader. (1st rendering call)
3. Copy the data from the texture to the VBO.
4. Render the particles from the data in the VBO. (2nd rendering call)
5. Repeat steps 2-4.
The only thing I don't know how to do efficiently is step 3. Do I have to go through the CPU? I wonder if it is possible to do this entirely on the GPU with OpenGL ES 2.0. Any hints are greatly appreciated!
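For concreteness, step 2's write-to-texture pass might be set up along these lines (a rough sketch with hypothetical names; note that rendering into a float texture on ES 2.0 requires an extension such as OES_texture_float together with a renderable format):

/* Attach the particle-state texture to an FBO so the fragment shader's
   output becomes the new particle state. */
glBindFramebuffer(GL_FRAMEBUFFER, particleFBO);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                       GL_TEXTURE_2D, stateTexture, 0);
glViewport(0, 0, STATE_TEX_WIDTH, STATE_TEX_HEIGHT);
glUseProgram(integrationProgram);   /* hypothetical integration shader */
drawFullScreenQuad();               /* hypothetical helper */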
I don't think this is possible without simply using glReadPixels -- ES2 doesn't have the flexible buffer management that desktop OpenGL has, which would let you copy buffer contents on the GPU (where, for example, you could copy data between the texture and the VBO, or simply use transform feedback, which is basically designed to do exactly what you want).
I think your only option if you need to use the GPU is to use glReadPixels to copy the framebuffer contents back out after rendering. You probably also want to check for and use EXT_color_buffer_float or a related extension if available, to make sure you get high-precision values (RGBA8 is probably not going to be sufficient for your particles). If you're intermixing this with normal rendering, you probably want to build in a bunch of buffering (wait a frame or two) so you don't stall the CPU waiting for the GPU (this would be especially bad on PowerVR, since it buffers a whole frame before rendering).
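A sketch of what step 3 then looks like (hypothetical names; this assumes float readback is actually available, which is exactly the extension check mentioned above -- core ES 2.0 only guarantees GL_RGBA/GL_UNSIGNED_BYTE plus one implementation-chosen format/type pair):

/* Read the integrated particle state back to the CPU, then re-upload
   it as vertex data for the second rendering call. */
static GLfloat state[NUM_PARTICLES * 4];   /* e.g. x, y, vx, vy per particle */
glBindFramebuffer(GL_FRAMEBUFFER, particleFBO);
glReadPixels(0, 0, STATE_TEX_WIDTH, STATE_TEX_HEIGHT,
             GL_RGBA, GL_FLOAT, state);
glBindBuffer(GL_ARRAY_BUFFER, particleVBO);
glBufferData(GL_ARRAY_BUFFER, sizeof(state), state, GL_DYNAMIC_DRAW);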
ES 3.0 will have support for transform feedback, which doesn't help you today but hopefully gives you some hope for the future.
Also, if you are running on an ARM CPU, it would likely be faster to use NEON to update all your particles. It can be quite fast and skips all the overhead you'd incur from the CPU+GPU method.
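For reference, a minimal sketch of what a NEON update could look like (assumes tightly packed float arrays and a particle count that is a multiple of 4):

#include <arm_neon.h>

/* Integrate 4 particle coordinates per iteration: pos += vel * dt. */
void update_particles(float *pos, const float *vel, int count, float dt) {
    float32x4_t vdt = vdupq_n_f32(dt);
    for (int i = 0; i < count; i += 4) {
        float32x4_t p = vld1q_f32(pos + i);
        float32x4_t v = vld1q_f32(vel + i);
        p = vmlaq_f32(p, v, vdt);   /* p = p + v * dt */
        vst1q_f32(pos + i, p);
    }
}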

Cocos2d update method efficiency

In a fairly small game, I have everything updating (sprites, velocities, backgrounds, etc.) in one large scheduled update method. I was wondering if there is a performance difference between having just one large scheduled update and having several smaller ones that each update only a couple of sprites.
I was also wondering if there a performance difference between:
sprite.position = ccpAdd(sprite.position, ccp(delta*10, delta*5));
and
sprite.position = ccp(sprite.position.x + delta*10, sprite.position.y + delta*5);
Is there a performance difference between assigning positions via ccp vs CGPointMake?
None that matters.
If you really, really want to know, measure it.
Those are minutiae. It's like asking if your car goes faster after waxing it. It might, it might not. In 99.99999% of cases it simply doesn't matter, because the difference is negligible and other contributing factors have much more weight (car: traffic and road conditions / game: drawing stuff on the screen).
ccpAdd resolves to ccp, which in turn resolves to CGPointMake, so they are identical in your compiled code. They are all #defines, so the substitution happens in the preprocessor.
Indeed, ccpAdd & ccp are identical in your compiled code.
As for your performance problem, if you have a lot of sprites to update you may want to spawn a background thread to do part of your updating there.
Use performSelectorInBackground:withObject:, and don't forget to wrap the background code in an autorelease pool.
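A minimal sketch of that pattern (method names hypothetical; anything that touches cocos2d nodes should still happen on the main thread):

// Kick the heavy, render-independent work onto a background thread.
[self performSelectorInBackground:@selector(heavyUpdate) withObject:nil];

- (void)heavyUpdate {
    // Background threads don't get an autorelease pool for free.
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    // ... heavy calculations that don't touch the scene graph ...
    [self performSelectorOnMainThread:@selector(applyResults)
                           withObject:nil
                        waitUntilDone:NO];
    [pool drain];
}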

Multiple Effects in a Shader

My question does have a slight basis in GLSL, since that happens to be the shading language I know.
It's my opinion that shaders and the programmable graphics pipeline are a huge step up from the fixed-function pipeline. Shaders are excellent at applying effects and making 3D graphics look far more realistic. However, not every effect is meant to be applied to every scenario. For instance, I wouldn't want my flag-waving effect used across an entire scene. If that scene contains one flag, I want that flag to wave back and forth, and that's about it. I'd want a water effect applied only to water. You get the idea.
My question is: what is the best way to implement this toggling of effects? The only way I can think of is to have a series of uniform variables and toggle/untoggle them before and after drawing something.
For instance,
(pseudocode)
toggle flag effect uniform
draw flag
untoggle flag effect uniform
Inside the shader code, it would check the value of these uniforms and act accordingly.
EDIT: I understand one can have multiple shader programs and switch between them as needed, but would this actually be faster than the above method, or would it come with a serious performance overhead from moving all that data around on the GPU? It seems that switching multiple times per frame could be extremely costly.
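For what it's worth, the two approaches look roughly like this on the CPU side (names hypothetical):

/* Approach 1: one big shader, branch on a uniform. */
glUniform1i(u_flagEffectLoc, GL_TRUE);    /* enable the wave branch */
drawFlag();                               /* hypothetical draw helper */
glUniform1i(u_flagEffectLoc, GL_FALSE);

/* Approach 2: dedicated programs, switched per object. */
glUseProgram(flagProgram);
drawFlag();
glUseProgram(defaultProgram);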

Optimizing OpenGL ES application. Should I avoid calling glVertexPointer when possible?

I'm developing a game for iPhone in OpenGL ES 1.1. I have a lot of textured quads in a data structure where each node has a list of child nodes. So I traverse the structure from the root and render each quad, then its children, and so on.
The thing is, for each quad I'm calling glVertexPointer to set the vertices.
Should I avoid calling it for each quad? Will calling it just once, for example, improve performance?
Does glVertexPointer copy the vertices to GPU memory, or does it just save the pointer?
Trying to minimize the number of calls will not be easy since each node may have a different quad. I have a lot of equal sprites with the same vertex data, but I'm not necessarily rendering one after another since I may be drawing a different sprite between them.
Thanks.
glVertexPointer keeps just the pointer, but incurs a state change in the OpenGL driver and an explicit synchronisation, so costs quite a lot. Normally when you say 'here's my data, please draw', the GPU starts drawing and continues to do so in parallel to whatever is going on on the CPU for as long as it can. When you change rendering state, it needs to finish whatever it was doing in the old state. So by changing once per quad, you're effectively forcing what could be concurrent processing to be consecutive. Hence, avoiding glVertexPointer (and, presumably, a glDrawArrays or glDrawElements?) per quad should give you a significant benefit.
An immediate optimisation is simply to keep a count of the number of quads in total in the data structure, allocate a single target buffer for vertices that is at least that size and have all quads copy their geometry into the target buffer rather than calling glVertexPointer each time. Then call glVertexPointer and your drawing calls (condensed to just one call also, hopefully) with the one big array at the end. It's a bit more costly on the CPU side but the parallelism and lack of repeated GPU/CPU synchronisations should save you a lot.
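A sketch of that batching (data structures hypothetical; assumes each quad is stored as two triangles with x/y positions only):

enum { FLOATS_PER_QUAD = 6 * 2 };          /* 6 vertices * (x, y) */

GLfloat *dst = batchVertices;              /* preallocated for all quads */
int quadCount = 0;
for (Node *n = firstNode(root); n != NULL; n = nextNode(n)) {
    memcpy(dst, n->quadTriangles, FLOATS_PER_QUAD * sizeof(GLfloat));
    dst += FLOATS_PER_QUAD;
    quadCount++;
}
glVertexPointer(2, GL_FLOAT, 0, batchVertices);   /* one pointer call */
glDrawArrays(GL_TRIANGLES, 0, quadCount * 6);     /* one draw call */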
While tiptoeing around topics currently under NDA, I strongly suggest you look at the Xcode 4 beta. Amongst other features Apple have stated publicly to be present is an OpenGL ES profiler. So you can easily compare approaches.
To copy data to the GPU, you need to use a vertex buffer object. That means creating a buffer with glGenBuffers, pushing data to it with glBufferData, and then calling glVertexPointer with an offset into the buffer rather than a pointer (e.g. 0 if the first byte of the data you uploaded is the first byte of your vertices). In ES 1.x, you can upload data as GL_DYNAMIC_DRAW to flag that you intend to update it quite often and draw from it quite often. It's probably worth doing if you can get into a position where you're drawing more often than you're uploading.
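Put together, that looks something like this (a sketch; vertexData and numFloats are assumptions):

GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, numFloats * sizeof(GLfloat),
             vertexData, GL_DYNAMIC_DRAW);   /* updated often, drawn often */
/* With a buffer bound, the last argument is an offset into the VBO. */
glVertexPointer(2, GL_FLOAT, 0, (const GLvoid *)0);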
If you ever switch to ES 2.x there's also GL_STREAM_DRAW, which may be worth investigating but isn't directly relevant to your question. I mention it as it'll likely come up if you Google for vertex buffer objects, being available on desktop OpenGL. Options for ES 1.x are only GL_STATIC_DRAW and GL_DYNAMIC_DRAW.
I've just recently worked on an iPad ES 1.x application with objects that change every frame but are drawn twice per frame by the rendering pipeline in use. There are only five such objects on screen, each of 40 vertices, but switching from the initial implementation to the VBO implementation cut 20% off my total processing time.

Is there a faster way to draw text?

Shark complains about a big performance hit with this line, which takes like 80% of CPU time. I have a counter that is updated very frequently and performance seriously sucks.
It's a custom UILabel subclass with -drawRect: implemented. Every time the counter value changes, this is used to draw the new text:
[self.text drawInRect:textRect withFont:correctedFont lineBreakMode:self.lineBreakMode alignment:self.textAlignment];
When I comment this line out, performance rocks. It's smooth and fast. So Shark isn't wrong about this. But what could I do to improve this? Maybe go a level deeper? Does that make any sense?
Or is drawing text really just so incredibly heavy...?
There's no reason the drawing of a single label should cause such a massive performance hit. If you're updating it more than 30-60 times per second, though, the system may have trouble keeping up. In that case, you could use an NSTimer to only perform the drawing at fixed intervals. There's no doubt that drawing text is expensive, but you've pretty much found the optimal way of doing the drawing itself, unless the label is only a single line, in which case you can use the slightly cheaper drawAtPoint:withAttributes:
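A sketch of the timer-throttled approach (property and selector names hypothetical):

// Redraw at most ~30 times per second, however often the value changes.
self.redrawTimer = [NSTimer scheduledTimerWithTimeInterval:1.0 / 30.0
                                                    target:self
                                                  selector:@selector(redrawTick:)
                                                  userInfo:nil
                                                   repeats:YES];

- (void)redrawTick:(NSTimer *)timer {
    if (self.counterDirty) {          // set wherever the counter changes
        [self setNeedsDisplay];       // one draw per tick at most
        self.counterDirty = NO;
    }
}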
Underneath, the text is being drawn with Quartz2D. You might see some improvement if you use it directly.
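For a single-line ASCII counter, the direct Quartz route might look like this (a sketch using the C text API of the era; whether it actually wins needs measuring, and the matrix flip is needed because Quartz's text coordinates are flipped relative to UIKit's):

- (void)drawRect:(CGRect)rect {
    CGContextRef ctx = UIGraphicsGetCurrentContext();
    CGContextSelectFont(ctx, "Helvetica-Bold", 24.0, kCGEncodingMacRoman);
    CGContextSetTextDrawingMode(ctx, kCGTextFill);
    CGContextSetGrayFillColor(ctx, 0.0, 1.0);
    // Flip the text matrix so glyphs aren't drawn upside down in UIKit.
    CGContextSetTextMatrix(ctx, CGAffineTransformMakeScale(1.0, -1.0));
    const char *s = [self.text UTF8String];
    CGContextShowTextAtPoint(ctx, 0.0, 24.0, s, strlen(s));
}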