I am currently reading an iPhone OpenGL ES project that draws some 3D shapes (sphere, cone, ...). I am a little confused about the behavior of glDrawElements.
After binding the vertex buffer to GL_ARRAY_BUFFER and the index buffer to GL_ELEMENT_ARRAY_BUFFER, the function glDrawElements is called:
glDrawElements(GL_TRIANGLES, IndexCount, GL_UNSIGNED_SHORT, 0);
At first I thought this function draws the shapes on screen, but actually the shapes are later drawn on the screen using:
[m_context presentRenderbuffer:GL_RENDERBUFFER];
So what does glDrawElements do? The manual describes it as "render primitives from array data", but I don't understand the real meaning of render and its difference from draw (my native language is not English).
The glDrawElements call is really what "does" the drawing. Or rather, it tells the GPU to draw, and the GPU will do that eventually.
The present call is only needed because the GPU usually works double-buffered: one buffer that you don't see but draw to, and one buffer that is currently displayed on the screen. Once you are done with all the drawing, you flip them.
If you did not do this, you would see flickering while drawing.
It also allows for parallel operation: you call glDrawElements multiple times for one frame, and only when you call present does the GPU have to be done with all of them.
It's true that glDraw commands are responsible for your drawing and that you don't see any visible results until you call the presentRenderbuffer: method, but it's not about double buffering.
All iOS devices use GPUs designed around Tile-Based Deferred Rendering (TBDR). This is an algorithm that works very well in embedded environments with smaller compute, memory, and power budgets than a desktop GPU has. On a desktop (stream-based) GPU, every draw command immediately does work that (typically) ends with some pixels being painted into a renderbuffer. (Usually you don't see this because desktop GL apps are typically set up to use double or triple buffering, drawing a complete frame into an offscreen buffer and then swapping it onto the screen.)
TBDR is different. You can think about it being sort of like a laser printer: for that, you put together a bunch of PostScript commands (set state, draw this, set a different state, draw that, and so on), but the printer doesn't do any work until you've sent all the draw commands and finished with the one that says "okay, start laying down ink!" Then the printer computes the areas that need to be black and prints them all in one pass (instead of running up and down the page printing over areas it's already printed).
The advantage of TBDR is that the GPU knows about the whole scene before it starts painting -- this lets it use tricks like hidden surface removal to avoid doing work that won't result in visible pixels being painted.
So, on an iOS device, glDraw still "does" the drawing, but that work doesn't happen until the GPU needs it to happen. In the simple case, the GPU doesn't need to start working until you call presentRenderbuffer: (or, if you're using GLKView, until you return from its drawRect: or its delegate's glkView:drawInRect: method, which implicitly presents the renderbuffer). If you're using more advanced tricks, like rendering into textures and then using those textures to render to the screen, the GPU starts working for one render target as soon as you switch to another (using glBindFramebuffer or similar).
There's a great explanation of how TBDR works in the Advances in OpenGL ES talk from WWDC 2013.
Related
I am developing an iPhone app that uses OpenGL ES to draw the graphics onscreen. With each new frame I draw to the entire screen, which brings me to my question: is it necessary to call glClear to clear the color buffer each frame, or can I just overwrite it with the new data? The glClear call is currently 9.9% of the frame time, so if I could get rid of it, it would provide a nice boost.
Forgive me if this is a very naive question, as I am something of a noob on the topic. Thanks in advance.
Theoretically you can leave it out.
But if you don't clear the color buffer, the previously drawn data will stay on screen, which limits this approach to applications whose content doesn't change between frames.
If you have a static scene that doesn't need re-drawing, then you could skip the call to glClear() for the color buffer.
From the Khronos OpenGL ES docs:
glClear sets the bitplane area of the window to values previously selected by glClearColor.
glClear takes a single argument indicating which buffer is to be cleared.
GL_COLOR_BUFFER_BIT Indicates the buffers currently enabled for color writing.
Typically (when making 3D games) you don't clear just the color buffer, but the depth buffer as well and use depth test to make sure that the stuff that's closest to the camera is drawn. Therefore it is important to clear the depth buffer, otherwise only pixels with lower Z value than in the previous frame would be redrawn. Not clearing the color buffer would also be problematic when drawing transparent things.
If you're making a 2D application and manage the drawing order yourself (first draw the stuff that's furthest away from the camera, then what is closer, a.k.a. the painter's algorithm), then you can leave out the clear call. One thing to note is that with such an algorithm you trade one problem for another: overdraw, i.e. drawing the same pixels more often than necessary.
I'm using OpenGL ES 1.1 to render a large view in an iPhone app. I have a "screenshot"/"save" function, which basically creates a new GL context offscreen, and then takes exactly the same geometry and renders it to the offscreen context. This produces the expected result.
Yet for reasons I don't understand, the amount of time (measured with CFAbsoluteTimeGetCurrent before and after) that the actual draw calls take when sending to the offscreen buffer is more than an order of magnitude longer than when drawing to the main framebuffer that backs an actual UIView. All of the GL state is the same for both, and the geometry list is the same, and the sequence of calls to draw is the same.
Note that there happens to be a LOT of geometry here -- the order of magnitude is clearly measurable and repeatable. Also note that I'm not timing the glReadPixels call, which is the thing that I believe actually pulls data back from the GPU. This is just a measure of the time spent in e.g. glDrawArrays.
I've tried:
Render that geometry to the screen again just after doing the offscreen render: takes the same quick time for the screen draw.
Render the offscreen thing twice in a row-- both times show the same slow draw speed.
Is this an inherent limitation of offscreen buffers? Or might I be missing something fundamental here?
Thanks for your insight/explanation!
Your best bet is probably to sample both your offscreen rendering and window system rendering each running in a tight loop with the CPU Sampler in Instruments and compare the results to see what differences there are.
Also, could you be a bit more clear about what exactly you mean by “render the offscreen thing twice in a row?” You mentioned at the beginning of the question that you “create a new GL context offscreen”—do you mean a new framebuffer and renderbuffer, or a completely new EAGLContext? Depending on how many new resources and objects you’re creating in order to do your offscreen rendering, the driver may need to do a lot of work to set up these resources the first time you use them in a draw call. If you’re just screenshotting the same content you were putting onscreen, you shouldn’t even need to do any of this—it should be sufficient to call glReadPixels before -[EAGLContext presentRenderbuffer:], since the backbuffer contents will still be defined at that point.
Could offscreen rendering be forcing the GPU to flush all its normal state, then do your render, flush the offscreen context, and have to reload all the normal stuff back in from CPU memory? That could take a lot longer than any rendering using data and frame buffers that stays completely on the GPU.
I'm not an expert on the issue, but from what I understand, graphics accelerators are built for sending data off to the screen, so the normal path is Code ---vertices---> Accelerator ---rendered-image---> Screen. In your case you are flushing the framebuffer back into main memory, which might be hitting some kind of bandwidth bottleneck in the memory controller.
Apple's Technical Q&A on addressing flickering (QA1650) includes the following paragraph. (Emphasis mine.)
You must provide a color to every pixel on the screen. At the beginning of your drawing code, it is a good idea to use glClear() to initialize the color buffer. A full-screen clear of each of your color, depth, and stencil buffers (if you're using them) at the start of a frame can also generally improve your application's performance.
On other platforms, I've always found it to be an optimization to not clear the color buffer if you're going to draw to every pixel. (Why waste time filling the color buffer if you're just going to overwrite that clear color?)
How can a call to glClear() improve performance?
It is most likely related to tile-based rendering, which divides the whole viewport into tiles (smaller windows, typically around 32×32 pixels) that are kept in faster memory. The copy operations between this smaller memory and the real framebuffer can take some time (memory operations are a lot slower than arithmetic operations). By issuing a glClear command, you are telling the hardware that you do not need the previous buffer content, so it does not need to copy the color/depth/whatever from the framebuffer to the smaller tile memory.
With the long perspective of Apple's official comments on iOS 4, which postdate the accepted answer...
I think that should be read in conjunction with Apple's comments about the GL_EXT_discard_framebuffer extension, which should always be used at the end of a frame if possible (and indeed elsewhere). When you discard a framebuffer you put its contents into an undefined state. The benefit is that when you next bind some other framebuffer there's no need to store your buffer's current contents anywhere, and similarly when you next restore your buffer there's no need to retrieve them. These should all be GPU memory copies and so quite cheap, but they're far from free, and the iPhone's shared memory architecture presumably means that even more complicated considerations can crop up.
Based on the compositing model of iOS, it's reasonable to assume that even if your app doesn't bind and unbind frame buffers in its context, the GPU has to do those tasks implicitly at least once for each of your frames.
I would dare guess that the driver is smart enough that if the first thing you do is a clear, you get half the benefit of the discard extension without actually using it.
Try decreasing the size of your framebuffer via the surface attributes that you pass to ChooseConfig.
For example, set attributes to 0 for minimum or omit them completely to use defaults unless you have a specific requirement.
I can definitely confirm that you have to provide a color to every pixel on the screen.
I've verified this in a bare-bones test app built on Xcode's iPhone OpenGL template:
[context presentRenderbuffer:GL_RENDERBUFFER];
glClear( GL_COLOR_BUFFER_BIT );
If I leave out the glClear line (or move it further down in the loop, after some other OpenGL calls), the thread (running via CADisplayLink) is hardly getting any updates anymore. It seems as if CPU/GPU synchronization goes haywire and the thread gets blocked.
Pretty scary stuff if you ask me, and totally not in line with my expectations.
BTW, you don't necessarily have to use glClear(); just drawing a fullscreen quad seems to have the same effect (obviously, a textured quad is more expensive). It seems you just have to invalidate all the tiles.
I'm developing a 2D game for the iPhone using OpenGL ES and I'd like to use a 320x480 bitmapped image as a persistent background.
My first thought was to create a 320x480 quad and then map a texture onto it that represents the background. So... I created a 512x512 texture with a 320x480 image on it. Then I mapped that to the 320x480 quad.
I draw this background every frame and then draw animated sprites on top of it. This works fine except that the drawing of all of these objects (background + sprites) is too slow.
I did some testing and discovered that my slowdown is in the pixel pipeline. Not surprisingly, the large background image is the main culprit. To prove this, I removed the background draw and everything else rendered very fast.
I am looking for advice on how to keep my background and also improve performance.
Here's some more info:
1) I am currently testing on the Simulator (still waiting on Apple for the license)
2) The background is a PVR texture squeezed down to 128k
3) I had hoped that there might be a way to cache this background into a color buffer but haven't had any luck with that. That may be due to my inexperience with OpenGL ES, or it just might be a stupid idea that won't work :)
4) I realize that the entire background does not always have to refresh, just the parts that have been drawn over by the moving sprites. I started to look into techniques for refreshing (as necessary) parts of the background, either as separate textures or with a scissor box, however this seems less than elegant.
Any tips/advice would be greatly appreciated...
Thank you.
Do not do performance testing on the simulator. Ever!
The differences to the real hardware are huge. In both directions.
If you draw the background every frame:
Do not clear the framebuffer. The background will overdraw the whole thing anyway.
Do you really need a background texture ?
What about using a color gradient via vertex colors ?
Try using the 2-bit PVRTC mode for the texture.
Turn off all render steps that you do not need for the background.
E.g.: Lighting, Blending, Depth-Test, ...
If you could post some of your drawing code it would be a lot easier to help you.
If you're making a 2D game, is there any reason you aren't using an existing library? Specifically, cocos2d for iPhone may be worth your time. I can't answer your question about how to fix the issue doing it all yourself, but I can say that I've done exactly what you're talking about (having one full-screen background with sprites on top) with cocos2d and it works great. (Assuming 60 fps is fast enough for you.) You may have your reasons for doing it yourself, but if you can, I would highly suggest at least doing a quick prototype with cocos2d and seeing if that doesn't help you along. (Details and source for the iPhone version are here: http://code.google.com/p/cocos2d-iphone/)
Thanks to everyone who provided info on this. All of the advice helped out in one way or another.
However, I wanted to make it clear that the main issue here turned out to be the behavior of the simulator itself (as implied by Andreas in his response). Once I was able to get the application on the device, it performed much, much better. I mention this because, prior to developing my game, I had seen a lot of posts indicating that the device was much slower than the simulator. This might be true in some instances (e.g. general application logic), but in my experience, animation (particularly 3D transformations) is much faster on the device.
I don't have much experience with OpenGL ES, but this problem occurs generally.
Your idea about the 'color buffer' is good intuition; essentially you want to store your background in a framebuffer and load it directly into your rendering buffer before drawing the foreground.
In OpenGL this is fairly straightforward with Frame Buffer Objects (FBOs). Unfortunately, I don't think OpenGL ES supports them, but it might give you somewhere to start looking.
You may want to try using VBOs (Vertex Buffer Objects) and see if that speeds things up. Tutorial is here
In addition, I just saw that since OpenGL ES v1.1 there is a function called glDrawTex (Draw Texture) that is designed for
fast rendering of background paintings, bitmapped font glyphs, and 2D framing elements in games
You could use frame buffer objects similar to the GLPaint example from Apple.
Use a texture atlas to minimize the number of draw calls you make. You can use glTexCoordPointer for setting your texture coordinates that maps each image to its correct position. Remember to set your vertex buffer too. Ideally one draw call will render your entire 2D scene.
Avoid enabling/disabling states where possible.
I am working on a 2D scrolling game for iPhone. I have a large background image, say 480×6000 pixels, of which only a part is visible (exactly one screen's worth, 480×320 pixels). What is the best way to get such a background on the screen?
Currently I have the background split into several textures (to get around the maximum texture size limit) and draw the whole background in each frame as a textured triangle strip. The scrolling is done by translating the modelview matrix. The scissor box is set to the window size, 480×320 pixels. This is not meant to be fast, I just wanted a working code before I get to optimizing.
I thought that maybe the OpenGL implementation would be smart enough to discard the invisible portion of the background, but according to some measuring code I wrote it looks like the background takes 7 ms to draw on average and 84 ms at maximum. (This is measured in the simulator.) This is about half of the whole render loop, i.e. quite slow for me.
Drawing the background should be as easy as copying some 480×320 pixels from one part of the VRAM to another, or, in other words, blazing fast. What is the best way to get closer to such performance?
That's the fast way of doing it. Things you can do to improve performance:
Try different texture formats. Presumably the SDK docs have details on the preferred format, and presumably smaller is better.
Cull out entirely offscreen tiles yourself
Split the image into smaller textures
I'm assuming you're drawing at a 1:1 zoom-level; is that the case?
Edit: Oops. Having read your question more carefully, I have to offer another piece of advice: Timings made on the simulator are worthless.
The quick solution:
Create a geometry matrix of tiles (quads preferably) so that there is at least one row/column of off-screen tiles on all sides of the viewable area.
Map textures to all those tiles.
As soon as one tile is outside the viewable area you can release this texture and bind a new one.
Move the tiles using a modulo of the tile width and tile height as position (so that a tile repositions itself at its starting position when it has moved exactly one tile length). Also remember to remap the textures during that operation. This allows you to have a very small grid and very little texture memory loaded at any given time, which I guess is especially important in GL ES.
If you have memory to spare and are still plagued by slow load speeds (although you shouldn't be for that number of textures), you could build a texture streaming engine that preloads textures into faster memory (whatever that may be on your target device) when you reach a new area. Texture mapping would then read from that faster memory when needed. Just be sure that you are able to preload without using up all memory, and remember to release it dynamically when no longer needed.
Here is a link to a GL (not ES) tile engine. I haven't used it myself so I cannot vouch for its functionality but it might be able to help you: http://www.mesa3d.org/brianp/TR.html