I have gone through https://www.raywenderlich.com/146414/metal-tutorial-swift-3-part-1-getting-started. For every frame
renderEncoder.setVertexBuffer(vertexBuffer, offset: 0, at: 0)
renderEncoder.setFragmentTexture(texture, at: 0)
is done, but the vertex and texture data never changes; only the uniform matrices change. My object contains 8*4*4*4*4 triangles (yes, it's a sphere), and I can only get 4 FPS. I am skeptical about setting the vertexBuffer every frame.
It's done similarly in the OpenGL tutorial http://www.opengl-tutorial.org/beginners-tutorials/tutorial-5-a-textured-cube/
In OpenGL I could pull the vertex/texture buffer binding out of the render loop, but in Metal the MTLRenderCommandEncoder needs a CAMetalDrawable, which is fetched every frame.
You would typically use a new render command encoder for each frame. Anything you did with the previous render command encoder, like setting vertex buffers or fragment textures, is "lost" when that encoder is ended and you drop any references to it. So, yes, you need to set buffers and textures again.
However, that should not be expensive. Both of those methods just put a reference to the buffer or texture into a table. It's cheap. If you haven't modified their contents on the CPU, no data has to be copied. It shouldn't cause any state compilation, either. (Apple has said a design goal of Metal is to avoid any implicit state compilation. It's all explicit, such as when creating a render pipeline state object from a render pipeline descriptor.)
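To illustrate, here is a minimal sketch of a typical per-frame encode; pipelineState, vertexBuffer, uniformBuffer, texture, and vertexCount are assumed to have been created once at setup, and newer Metal SDKs spell the binding parameter index: where the Swift 3 tutorial uses at:.

import MetalKit

// Minimal per-frame encode sketch. All parameters stand in for objects created
// once at setup; only the uniform buffer's contents change between frames.
func encodeFrame(view: MTKView,
                 commandQueue: MTLCommandQueue,
                 pipelineState: MTLRenderPipelineState,
                 vertexBuffer: MTLBuffer,
                 uniformBuffer: MTLBuffer,
                 texture: MTLTexture,
                 vertexCount: Int) {
    guard let drawable = view.currentDrawable,
          let passDescriptor = view.currentRenderPassDescriptor,
          let commandBuffer = commandQueue.makeCommandBuffer(),
          let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: passDescriptor)
    else { return }

    encoder.setRenderPipelineState(pipelineState)
    // Rebinding these each frame only stores references in the encoder's table;
    // no vertex or texture data is copied.
    encoder.setVertexBuffer(vertexBuffer, offset: 0, index: 0)
    encoder.setVertexBuffer(uniformBuffer, offset: 0, index: 1)
    encoder.setFragmentTexture(texture, index: 0)
    encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: vertexCount)
    encoder.endEncoding()

    commandBuffer.present(drawable)
    commandBuffer.commit()
}

The two set calls in the middle are the ones from the question; they only record references, so repeating them per frame is not where the 4 FPS is coming from.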
You need to profile your app to figure out what's limiting your frame rate, rather than guessing.
I am working on a painting program where I draw interactive strokes via an MTKView. If I set the renderPassDescriptor loadAction to 'clear':
renderPassDescriptor?.colorAttachments[0].loadAction = .clear
The frame buffer, as expected, shows the latest contents of renderCommandEncoder?.drawPrimitives, which in this case is the leading edge of the brushstroke.
If I set loadAction to 'load':
renderPassDescriptor?.colorAttachments[0].loadAction = .load
The frame buffer flashes like crazy and shows a patchy trail of what I've just drawn. I now understand that the flashing is likely caused by MTKView's default triple buffering. Thus, each time I write to the currentDrawable, I'm likely writing to one of three cycling buffers. Please correct me if I'm wrong.
My question is, what do I need to do to draw a clean brushstroke without the frame buffer flashing as it does now? In other words, is there a way to have a master buffer that gets updated with the latest contents of commandEncoder?
You can use a texture of your own as the color attachment of a render pass. You don't have to use the texture of a drawable. In that way, you can use the .load action without getting garbage or weird flashing or whatever. You will have full control over which texture you're rendering to and what its contents are.
After rendering to that texture for a render pass, you then need to blit that to the drawable's texture for display.
The main complication here is that you won't have the benefits of double- or triple-buffering. You'll lose a certain amount of performance, since everything will have to be synced to that one texture's state. I suspect, though, that you don't need that much performance, since this is interactive and only has to keep up with the speed of a human.
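A rough sketch of that setup, assuming the MTKView's framebufferOnly is set to false (so its drawable texture can be a blit destination) and its pixel format matches the offscreen texture; strokeTexture and encodeStroke are placeholder names:

import MetalKit

// Create a persistent offscreen texture once; it accumulates all strokes.
func makeStrokeTexture(device: MTLDevice, width: Int, height: Int) -> MTLTexture? {
    let desc = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .bgra8Unorm,
                                                        width: width,
                                                        height: height,
                                                        mipmapped: false)
    desc.usage = [.renderTarget, .shaderRead]
    return device.makeTexture(descriptor: desc)
}

// Per frame: render the newest stroke segment into the offscreen texture with
// .load so earlier strokes are kept, then blit the result to the drawable.
func drawStroke(view: MTKView,
                commandQueue: MTLCommandQueue,
                strokeTexture: MTLTexture,
                encodeStroke: (MTLRenderCommandEncoder) -> Void) {
    guard let drawable = view.currentDrawable,
          let commandBuffer = commandQueue.makeCommandBuffer() else { return }

    let pass = MTLRenderPassDescriptor()
    pass.colorAttachments[0].texture = strokeTexture
    pass.colorAttachments[0].loadAction = .load        // keep previous contents
    pass.colorAttachments[0].storeAction = .store
    if let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: pass) {
        encodeStroke(encoder)                          // set pipeline, buffers, drawPrimitives…
        encoder.endEncoding()
    }

    if let blit = commandBuffer.makeBlitCommandEncoder() {
        blit.copy(from: strokeTexture, sourceSlice: 0, sourceLevel: 0,
                  sourceOrigin: MTLOrigin(x: 0, y: 0, z: 0),
                  sourceSize: MTLSize(width: strokeTexture.width,
                                      height: strokeTexture.height, depth: 1),
                  to: drawable.texture, destinationSlice: 0, destinationLevel: 0,
                  destinationOrigin: MTLOrigin(x: 0, y: 0, z: 0))
        blit.endEncoding()
    }

    commandBuffer.present(drawable)
    commandBuffer.commit()
}

Because strokeTexture persists across frames, .load always sees every stroke drawn so far, regardless of which of the view's drawables comes up next.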
I have a vertex buffer and an index buffer to render a polygon mesh.
I would like to manipulate the positions of N vertices (move them around independently of the neighboring vertices).
How can I go about doing this?
And I certainly hope I don't have to go back to using glDrawArrays (instead of glDrawElements). It took me forever just to figure out vertex/index buffer rendering.
You may get slightly better performance if you update the data using glBufferSubData, especially if you can avoid updating the whole buffer and only touch a small part of it. Unless you move your vertex animation into the vertex shader, you need to update the vertex buffer each time a vertex is moved (by your user), and glBuffer(Sub)Data is your best bet.
EDIT: Create the VBO as DYNAMIC, and if you make changes very often, create two buffers and use a double-buffering approach to avoid a performance hit; that way you can write data to one buffer while the GPU is using the other for rendering.
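A sketch of the partial-update idea, assuming iOS's OpenGLES module, a position-only layout of three floats per vertex, and illustrative names:

import OpenGLES

// A GL_DYNAMIC_DRAW VBO that is uploaded once and then patched with
// glBufferSubData whenever some vertices move.
var vbo: GLuint = 0

func createDynamicVBO(vertices: [GLfloat]) {
    glGenBuffers(1, &vbo)
    glBindBuffer(GLenum(GL_ARRAY_BUFFER), vbo)
    glBufferData(GLenum(GL_ARRAY_BUFFER),
                 vertices.count * MemoryLayout<GLfloat>.stride,
                 vertices,
                 GLenum(GL_DYNAMIC_DRAW))      // hint: contents will change often
}

// Overwrite only the range of vertices that actually moved.
func updateVertices(_ moved: [GLfloat], firstVertex: Int) {
    glBindBuffer(GLenum(GL_ARRAY_BUFFER), vbo)
    glBufferSubData(GLenum(GL_ARRAY_BUFFER),
                    firstVertex * 3 * MemoryLayout<GLfloat>.stride,   // byte offset
                    moved.count * MemoryLayout<GLfloat>.stride,       // byte count
                    moved)
}

For the double-buffering variant mentioned in the edit, you would create two such VBOs, apply the same updates to whichever one is not currently being drawn from, and alternate between them each frame.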
I am an Opengl ES 2.0 newbie (and GLSL newbie) so forgive me if this is an obvious question.
If I have a VBO that I initialize once on the CPU at the start of my program, is it possible to then use vertex shaders to update it each frame without doing calculations on the CPU and re-uploading it to the GPU? I'm not referring to sending a uniform and manipulating the data based on that. Instead I mean causing a persistent change in the VBO on the GPU itself.
So the simplest example I can think of would be adding 1 to the x, y, and z components of gl_Position in the vertex shader every time the frame is rendered. This would mean that if I had only one vertex and its initial position was set on the CPU to (0,0,0,1), then after 30 frames it would be (30,30,30,1).
If this is possible what would it look like in code?
On modern desktop hardware (GL3/DX10) you can use transform feedback to write back the output of the vertex or geometry shader into a buffer, but I really doubt that the transform_feedback extension is supported on the iPhone (or in ES in general).
If PBOs are supported (which I also doubt), you can at least do it with some GPU-to-GPU copies. Just copy the vertex buffer into a texture (by binding it as a PBO), then render a textured fullscreen quad and perform the update in the fragment shader. After that, you copy the framebuffer (which now contains the updated vertex data) into the vertex buffer (again by binding it as a PBO). This way you have to do two copies (although they should both happen completely on the GPU), and if the vertex data is floating point, you will need floating-point render targets and framebuffer objects to be supported, too.
I think in ES the best solution would really be to do the computation on the CPU. Just keep a CPU copy (so you at least have no unnecessary GPU-to-CPU readback) and update the buffer data every frame (using GL_DYNAMIC_DRAW or even GL_STREAM_DRAW as the buffer usage).
Maybe you can also avoid the persistent update entirely by making the changes depend on other, simpler data. In your example you could just use a uniform for the frame number and derive the offset from it in the vertex shader every frame, but I don't know how complex your update function really is.
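For the single-vertex example in the question, that uniform-based variant could look roughly like this in ES 2.0 (a_position, u_frame, and program are made-up names, and the actual draw calls are left as a placeholder):

import OpenGLES

// GLSL vertex shader: the offset is recomputed from a frame-counter uniform,
// so the VBO itself never has to change.
let vertexShaderSource = """
attribute vec4 a_position;
uniform float u_frame;
void main() {
    // (0,0,0,1) becomes (30,30,30,1) on frame 30, matching the example above
    gl_Position = a_position + vec4(u_frame, u_frame, u_frame, 0.0);
}
"""

// Per frame on the CPU: just bump the uniform; no buffer upload at all.
func drawFrame(program: GLuint, frame: Int) {
    glUseProgram(program)
    glUniform1f(glGetUniformLocation(program, "u_frame"), GLfloat(frame))
    // … bind the VBO, set glVertexAttribPointer, glDrawArrays / glDrawElements …
}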
I'm developing a game for iPhone in OpenGL ES 1.1; I have a lot of textured quads in a data structure where each node has a list of children nodes. So I traverse the structure from the root, and do the render of each quad, then its childs and so on.
The thing is, for each quad I'm calling glVertexPointer to set the vertices.
Should I avoid calling it for each quad? Will it improve performance to call it just once, for example?
Does glVertexPointer copy the vertices to GPU memory, or does it just save the pointer?
Trying to minimize the number of calls will not be easy, since each node may have a different quad. I have a lot of identical sprites with the same vertex data, but I'm not necessarily rendering one after another, since I may be drawing a different sprite between them.
Thanks.
glVertexPointer keeps just the pointer, but it incurs a state change in the OpenGL driver and forces a synchronisation, so it costs quite a lot. Normally when you say 'here's my data, please draw', the GPU starts drawing and continues to do so in parallel with whatever is going on on the CPU for as long as it can. When you change rendering state, it needs to finish whatever it was doing in the old state. So by changing state once per quad, you're effectively forcing what could be concurrent processing to be sequential. Hence, avoiding a glVertexPointer (and, presumably, a glDrawArrays or glDrawElements?) per quad should give you a significant benefit.
An immediate optimisation is simply to keep a count of the total number of quads in the data structure, allocate a single target buffer for vertices that is at least that size, and have all quads copy their geometry into that target buffer rather than calling glVertexPointer each time. Then call glVertexPointer and your drawing calls (hopefully condensed to just one call as well) with the one big array at the end. It's a bit more costly on the CPU side, but the parallelism and lack of repeated GPU/CPU synchronisations should save you a lot.
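A sketch of that batching idea for ES 1.1, with a made-up Node type standing in for the question's own scene graph and positions only; texture coordinates would be gathered into a second array the same way and set with glTexCoordPointer:

import OpenGLES

// Gather every quad's triangles into one CPU-side array, then make a single
// glVertexPointer / glDrawArrays call per frame.
struct Node {
    var quadVertices: [GLfloat]       // 12 floats: 6 vertices * (x, y), two triangles
    var children: [Node]
}

func appendQuads(of node: Node, into batch: inout [GLfloat]) {
    batch.append(contentsOf: node.quadVertices)
    for child in node.children {
        appendQuads(of: child, into: &batch)
    }
}

func drawScene(root: Node) {
    var batch: [GLfloat] = []
    appendQuads(of: root, into: &batch)

    glEnableClientState(GLenum(GL_VERTEX_ARRAY))
    batch.withUnsafeBufferPointer { buffer in
        // One pointer set-up and one draw call for the whole tree.
        glVertexPointer(2, GLenum(GL_FLOAT), 0, buffer.baseAddress)
        glDrawArrays(GLenum(GL_TRIANGLES), 0, GLsizei(batch.count / 2))
    }
}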
While tiptoeing around topics currently under NDA, I strongly suggest you look at the Xcode 4 beta. Amongst other features Apple have stated publicly to be present is an OpenGL ES profiler. So you can easily compare approaches.
To copy data to the GPU, you need to use a vertex buffer object. That means creating a buffer with glGenBuffers, pushing data to it with glBufferData, and then passing glVertexPointer an address of e.g. 0 if the first byte of the data you uploaded is the first byte of your vertices. In ES 1.x, you can upload data as GL_DYNAMIC_DRAW to flag that you intend to update it quite often and draw from it quite often. It's probably worth doing if you can get into a position where you're drawing more often than you're uploading.
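In code (again ES 1.x via iOS's OpenGLES module; the buffer name and two-floats-per-vertex layout are just for illustration) that workflow looks roughly like:

import OpenGLES

var quadVBO: GLuint = 0

// Upload (or re-upload) the batched vertices to GPU memory.
func uploadVertices(_ vertices: [GLfloat]) {
    if quadVBO == 0 { glGenBuffers(1, &quadVBO) }
    glBindBuffer(GLenum(GL_ARRAY_BUFFER), quadVBO)
    glBufferData(GLenum(GL_ARRAY_BUFFER),
                 vertices.count * MemoryLayout<GLfloat>.stride,
                 vertices,
                 GLenum(GL_DYNAMIC_DRAW))     // we expect to update this often
}

// Draw from the buffer: with a VBO bound, glVertexPointer's last argument is a
// byte offset into the buffer, so nil/0 means "start at the first uploaded byte".
func drawFromVBO(vertexCount: Int) {
    glBindBuffer(GLenum(GL_ARRAY_BUFFER), quadVBO)
    glEnableClientState(GLenum(GL_VERTEX_ARRAY))
    glVertexPointer(2, GLenum(GL_FLOAT), 0, nil)
    glDrawArrays(GLenum(GL_TRIANGLES), 0, GLsizei(vertexCount))
}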
If you ever switch to ES 2.x there's also GL_STREAM_DRAW, which may be worth investigating but isn't directly relevant to your question. I mention it as it'll likely come up if you Google for vertex buffer objects, being available on desktop OpenGL. Options for ES 1.x are only GL_STATIC_DRAW and GL_DYNAMIC_DRAW.
I've just recently worked on an iPad ES 1.x application with objects that change every frame but are drawn twice per the rendering pipeline in use. There are only five such objects on screen, each 40 vertices, but switching from the initial implementation to the VBO implementation cut 20% off my total processing time.
I need a shader that starts with a given texture, then each frame of animation operates on the previous output of the shader plus other input.
How do I organize the framebuffers so that each frame has access to the output of the previous frame without having to move the buffer back and forth from the CPU?
OpenGL ES 2.0 has Framebuffer Objects (FBOs); with them you can render directly into a texture, and you can use that texture as input for your next iteration.
That's the only way of doing it. Use two FBOs and two textures, one texture attached to each FBO. Read from one texture and write into the other, then swap them, so you read from the last one written and write into the first. This is called "ping-pong" rendering.
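A rough ES 2.0 setup for that, with made-up names and the per-iteration full-screen draw left as a placeholder:

import OpenGLES

// Two textures, each attached to its own FBO. Every frame we read from
// textures[src], render into fbos[dst], then swap the roles.
var fbos = [GLuint](repeating: 0, count: 2)
var textures = [GLuint](repeating: 0, count: 2)
var src = 0, dst = 1

func createPingPongTargets(width: GLsizei, height: GLsizei) {
    glGenFramebuffers(2, &fbos)
    glGenTextures(2, &textures)
    for i in 0..<2 {
        glBindTexture(GLenum(GL_TEXTURE_2D), textures[i])
        glTexImage2D(GLenum(GL_TEXTURE_2D), 0, GL_RGBA, width, height, 0,
                     GLenum(GL_RGBA), GLenum(GL_UNSIGNED_BYTE), nil)
        glTexParameteri(GLenum(GL_TEXTURE_2D), GLenum(GL_TEXTURE_MIN_FILTER), GL_NEAREST)
        glTexParameteri(GLenum(GL_TEXTURE_2D), GLenum(GL_TEXTURE_MAG_FILTER), GL_NEAREST)
        glTexParameteri(GLenum(GL_TEXTURE_2D), GLenum(GL_TEXTURE_WRAP_S), GL_CLAMP_TO_EDGE)
        glTexParameteri(GLenum(GL_TEXTURE_2D), GLenum(GL_TEXTURE_WRAP_T), GL_CLAMP_TO_EDGE)
        glBindFramebuffer(GLenum(GL_FRAMEBUFFER), fbos[i])
        glFramebufferTexture2D(GLenum(GL_FRAMEBUFFER), GLenum(GL_COLOR_ATTACHMENT0),
                               GLenum(GL_TEXTURE_2D), textures[i], 0)
    }
}

func stepSimulation() {
    glBindFramebuffer(GLenum(GL_FRAMEBUFFER), fbos[dst])     // write into dst
    glBindTexture(GLenum(GL_TEXTURE_2D), textures[src])      // read last iteration's result
    // … draw a full-screen quad with the shader that performs one iteration …
    swap(&src, &dst)                                          // next iteration reads what was just written
}

To show the result, you would render one more pass into your on-screen framebuffer, sampling textures[src].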