I am developing a 2D tile-based game and am currently struggling with a performance issue: I am getting around 10-15 FPS, even when running on an iPad 3. An OpenGL ES frame capture reveals that I am calling glDrawElements 689 times per frame! Is that a lot? Could it be the cause of the low performance?
Should I stack everything in one huge array and perform one draw call? Will it make any difference?
At this point you are most likely limited by command issue (I'm assuming). You can check this by running OpenGL ES Performance Detective; it's under Xcode (right-click, open developer tools), and you may have to download it through Preferences.
Your goal is to end up limited by fill rate. Here are some tips to help you get there.
Sort all sprites by:
Draw Depth
Blend Mode
Texture ID
Once sorted:
Pack all sprites into one vertex buffer object and one index buffer object.
Whenever the draw depth, blend mode, or texture ID changes, bind the new resources and issue a new draw call.
Also keep in mind that your sprites' vertices should be flattened on the CPU side (pos x MVP), so you are not sending matrices over for each sprite. Any other attributes, such as color, should be part of the vertex.
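Typical vertex:
struct Vertex
{
    float pos[3]; /* position, already transformed on the CPU */
    int color;    /* color carried per vertex */
    float uv[2];  /* texture coordinates */
};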
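To make the sort-then-batch idea concrete, here is a minimal sketch in plain C with no GL calls. The Sprite struct and the function names are made up for illustration; in a real renderer each counted batch would correspond to binding the texture and blend state and issuing one glDrawElements over that run of sprites.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sprite record; the field names are illustrative only. */
typedef struct {
    int depth;           /* draw depth (layer) */
    int blend_mode;      /* blend mode identifier */
    unsigned texture_id; /* texture used by this sprite */
} Sprite;

/* Orders sprites by draw depth, then blend mode, then texture ID. */
static int compare_sprites(const void *a, const void *b)
{
    const Sprite *x = (const Sprite *)a;
    const Sprite *y = (const Sprite *)b;
    if (x->depth != y->depth)
        return (x->depth > y->depth) - (x->depth < y->depth);
    if (x->blend_mode != y->blend_mode)
        return (x->blend_mode > y->blend_mode) - (x->blend_mode < y->blend_mode);
    return (x->texture_id > y->texture_id) - (x->texture_id < y->texture_id);
}

/* Sorts the sprites in place and returns how many draw calls are needed
   when every change of depth, blend mode or texture starts a new batch. */
size_t sort_and_count_batches(Sprite *sprites, size_t count)
{
    size_t batches;
    size_t i;
    assert(sprites != NULL || count == 0);
    if (count == 0)
        return 0;
    qsort(sprites, count, sizeof *sprites, compare_sprites);
    batches = 1;
    for (i = 1; i < count; ++i) {
        if (compare_sprites(&sprites[i - 1], &sprites[i]) != 0)
            ++batches; /* state changed: a new draw call starts here */
    }
    return batches;
}
```

With 689 sprites that share a handful of textures, the batch count returned here would be far below 689, which is the point of the exercise.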
Related
There is an empty scene with one standard cube. If you change its scale to (5;5;1), then the fps does not drop.
But if I change it to (5; 10; 1) my fps drops to ~30.
If I move the camera away from the cube with the scale (5;10;1), then the fps is again 60.
Maybe I have wrong camera settings or something else.
How can I achieve a high fps without moving the camera away?
P.S. The fps does not drop in the editor, only after launching on Android.
Unity version 2020.3.18f1. I tried another version and got the same problem.
[Screenshots: cube with scale (5;5;1); cube with scale (5;10;1); cube with scale (5;10;1) with the camera farther away]
The problem may be that more pixels are rendered when the object is scaled up (the fragment shader is executed for every one of them). Another hint pointing the same way is that the frame rate goes back up when you move the camera away from the object, because the rendered object then covers fewer pixels.
Since you mentioned that the program runs on Android, switching from a regular shader to a mobile shader may improve performance.
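As a rough illustration of the fill-rate argument, the sketch below estimates the relative fragment workload of a camera-facing quad. It assumes a perspective camera, an object that is fully on screen, and a cost proportional to the number of covered pixels; the function name is hypothetical and the result is only meaningful as a ratio between two configurations.

```c
#include <assert.h>

/* Relative fragment workload of a camera-facing quad: the covered screen
   area grows with scale_x * scale_y and shrinks with distance squared.
   Only ratios between two results are meaningful. */
double relative_fragment_load(double scale_x, double scale_y, double distance)
{
    assert(scale_x >= 0.0 && scale_y >= 0.0 && distance > 0.0);
    return (scale_x * scale_y) / (distance * distance);
}
```

Under these assumptions a (5;10;1) cube shades twice as many pixels as a (5;5;1) cube at the same distance, and doubling the camera distance cuts the load to a quarter, which matches the behaviour described in the question.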
From some of Unity's documentation on transforms:
Performance Issues and Limitations with Non-Uniform Scaling
Non-uniform scaling is when the Scale in a Transform has different values for x, y, and z; for example (2, 4, 2). In contrast, uniform scaling has the same value for x, y, and z; for example (3, 3, 3). Non-uniform scaling can be useful in a few select cases but should be avoided whenever possible.
Non-uniform scaling has a negative impact on rendering performance. In order to transform vertex normals correctly, we transform the mesh on the CPU and create an extra copy of the data. Normally we can keep the mesh shared between instances in graphics memory, but in this case you pay both a CPU and memory cost per instance.
I'm not certain if your Z's scale matters in this case, because you're only rendering the x-y plane. I can't comment for certain on why the performance hit is reduced as you increase your camera distance. I suspect Unity has some intelligent vertex manipulation going on to simplify rendering of distant objects, saving you on CPU cost.
That being said, try to avoid non-uniform scaling. Primitives should typically only be used as placeholders.
So, I am using SceneKit to render a collection of parametric surfaces (the sum of which makes an object). To put these on screen, I am creating custom geometries by sampling the points and creating triangles. Here is a quick overview of how I do it:
Loop through the collection of surfaces
Generate a random color C
For each surface calculate a grid of N x N points (both positions and normals)
Assign all vertices for that surface the color C
Add groups of 3 vertices from this surface to the face index list
And that seems to work. After I get all this data, I put it into the proper structures (SCNGeometrySource and SCNGeometryElement) and make an SCNGeometry like so:
SCNGeometry(sources: [vertexSource, normalSource, colorSource], elements: [element])
This works and displays my surfaces on the screen fine as one single geometry element. My problem is that I have some really complicated objects that I am trying to work with, and moving the camera around while looking at the object is really slow. Rendering takes around 500 ms, which makes my frame rate and experience awful.
So the question is, what steps can I take to speed up SceneKit performance? I did this same project with WebGL using Three.js with the same amount of data and was able to use an orbiting camera fine, so I can't believe that SceneKit couldn't at least compete with that. What features can I tweak and turn off to speed up performance? I am using the triangle primitive type, allowsCameraControl = true for the orbiting camera, and Metal for the SCNView.
For those curious, the model I am struggling with generated 231,900 vertices and 347,850 indices for faces: 11.1312 MB of vertex data (position and normal) and 1.3914 MB of face data (essentially just the index positions of the vertices, in order, for the triangles).
1) If you are "standing" at the center of your generated surface, then your problem may be that you are drawing a lot offscreen (no frustum culling). In that case you need to split your surface (a single node) into subsurfaces (child nodes), so that only the nodes visible in the camera's view are drawn.
That being said, 231,900 vertices is really not much. I draw several million at 60 FPS with the SceneKit Metal renderer (+20% faster than the OpenGL renderer) on OS X.
2) If you are looking at your surfaces from a distance and still get bad performance, check what bytesPerComponent value you are passing when creating the SCNGeometrySource. I experienced a big performance drop when using CGFloat (double) instead of plain float on a GeForce GTX (while it was okay on integrated Intel graphics).
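The sizes quoted in the question can be used to check the bytesPerComponent point. The small sketch below (the helper name is hypothetical) just multiplies the counts out. The 11.1312 MB figure works out to exactly 8 bytes per component for position plus normal, which suggests the data is being stored as doubles; that is an inference from the numbers, not something the question states.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for vertex_count vertices that each carry
   components_per_vertex scalars of bytes_per_component bytes. */
size_t vertex_data_bytes(size_t vertex_count,
                         size_t components_per_vertex,
                         size_t bytes_per_component)
{
    assert(components_per_vertex > 0 && bytes_per_component > 0);
    return vertex_count * components_per_vertex * bytes_per_component;
}
```

For 231,900 vertices with 6 components each (3 for position, 3 for normal), 8-byte components give 11,131,200 bytes, while 4-byte floats would give 5,565,600 bytes, so switching to float would halve the vertex data.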
I am an OpenGL ES 2.0 newbie (and a GLSL newbie), so forgive me if this is an obvious question.
If I have a VBO that I initialize once on the CPU at the start of my program, is it possible to then use vertex shaders to update it each frame, without doing the calculations on the CPU and then re-uploading it to the GPU? I'm not referring to sending a uniform and manipulating the data based on that. Instead, I mean causing a persistent change in the VBO on the GPU itself.
So the simplest example I can think of would be adding 1 to the x, y and z components of gl_Position in the vertex shader every time the frame is rendered. This would mean that if I had only one vertex and its initial position was set on the CPU to be (0,0,0,1), then after 30 frames it would be (30,30,30,1).
If this is possible, what would it look like in code?
On modern desktop hardware (GL3/DX10) you can use transform feedback to write back the output of the vertex or geometry shader into a buffer, but I really doubt that the transform_feedback extension is supported on the iPhone (or in ES in general).
If PBOs are supported (which I also doubt), you can at least do it with some GPU-GPU copies. Just copy the vertex buffer into a texture (by binding it as a PBO), then render a textured fullscreen quad and perform the update in the fragment shader. After that, you copy the framebuffer (which now contains the updated vertex data) into the vertex buffer (again by binding it as a PBO). But this way you have to do 2 copies (although they should both happen completely on the GPU), and if the vertex data is floating point you will also need floating-point render targets and framebuffer objects to be supported.
I think in ES the best solution would really be to do the computation on the CPU. Just hold a CPU copy (so you at least have no unnecessary GPU-CPU readback) and update the buffer data every frame (using GL_DYNAMIC_DRAW or even GL_STREAM_DRAW as the buffer usage).
Maybe you can also avoid the persistent update completely by making the changes depend on some other, simpler data. In your example you could just use a uniform for the frame number and use it as the coordinate in the vertex shader every frame, but I don't know how complex your update function really is.
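To illustrate that last idea, here is a small CPU-side sketch in C of the two approaches for the example in the question. The Vec4 type and both function names are invented for illustration. step_once is the persistent update the question asks for, and position_at_frame is the closed form that a vertex shader could evaluate from the untouched initial position plus a frame-number uniform.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } Vec4;

/* Persistent-style update: add 1 to x, y and z, as if the buffer
   itself were modified once per rendered frame. */
Vec4 step_once(Vec4 p)
{
    p.x += 1.0f;
    p.y += 1.0f;
    p.z += 1.0f;
    return p;
}

/* Closed-form alternative: the stored position never changes and the
   frame number (a uniform in a real shader) supplies the offset. */
Vec4 position_at_frame(Vec4 initial, unsigned frame)
{
    Vec4 p = initial;
    assert(frame < (1u << 24)); /* keep the float conversion exact */
    p.x += (float)frame;
    p.y += (float)frame;
    p.z += (float)frame;
    return p;
}
```

Both give (30,30,30,1) after 30 frames when starting from (0,0,0,1), but only the second needs nothing written back to the VBO. It works here because the update has a simple closed form; a more complex update function may not.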
What is the maximum number of triangles that can be drawn on an iPad in a single frame? Also, is there a limit to the number of GL calls used to draw those triangles?
The only limit on total triangles that you'll run into on the iPad is in terms of memory size and how quickly you wish for this to render. The more vertices you send, the more memory your application will use, and the slower it will render.
For example, in my benchmarks I was able to push over 1,800,000 triangles per second on an iPad 1 using OpenGL ES 1.1 smooth shading, a single light source, geometry stored in vertex buffer objects (VBOs), and vertices represented by GLshorts in order to minimize total size. The iPad 2 is significantly faster than that, especially when you start doing more complex operations in your fragment shaders. From that number, I can estimate that I'd want to have fewer than 30,000 triangles in my scene geometry if I wanted to render at 60 FPS on the iPad 1.
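The estimate above is just the measured throughput divided by the target frame rate. As a tiny sketch in C (the function name is hypothetical):

```c
#include <assert.h>

/* Upper bound on triangles per frame, given a measured throughput in
   triangles per second and a target frame rate. */
unsigned long triangles_per_frame(unsigned long triangles_per_second,
                                  unsigned target_fps)
{
    assert(target_fps > 0);
    return triangles_per_second / target_fps;
}
```

With 1,800,000 triangles per second and a 60 FPS target this gives 30,000 triangles per frame. It is an upper bound for that particular benchmark setup, not a general figure for the device.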
OpenGL ES 2.0 shaders make things more complicated because of their varying complexity, but they enable new effects and may allow you to use fewer triangles to achieve the same image quality as the fixed function pipeline.
For another example, in this question Davido has a model with about 900,000 triangles that he's able to render at nearly 10 FPS on an iPad 2. I also present some geometry optimization techniques in my answer there that I've found to have a significant impact on OpenGL ES 1.1 rendering when you are maxing out tiler utilization on the device.
I'm working on an iPhone App that relies heavily on OpenGL. Right now it runs a bit slow on the iPhone 3G, but looks snappy on the new 32G iPod Touch. I assume this is hardware related. Anyway, I want to get the iPhone performance to resemble the iPod Touch performance. I believe I'm doing a lot of things sub-optimally in OpenGL and I'd like advice on what improvements will give me the most bang for the buck.
My scene rendering goes something like this:
Repeat 35 times
    glPushMatrix
    glLoadIdentity
    glTranslate
    Repeat 7 times
        glBindTexture
        glVertexPointer
        glNormalPointer
        glTexCoordPointer
        glDrawArrays(GL_TRIANGLES, ...)
    glPopMatrix
My Vertex, Normal and Texture Coords are already interleaved.
So, what steps should I take to speed this up? What step would you try first?
My first thought is to eliminate all those glBindTexture() calls by using a Texture Atlas.
What about some more efficient matrix operations? I understand the gl*() versions aren't too efficient.
What about VBOs?
Update
There are 8260 triangles.
Texture sizes are 64x64 pngs. There are 58 different textures.
I have not run instruments.
Update 2
After running the OpenGL ES Instrument on the iPhone 3G I found that my Tiler Utilization is in the 90-100% range, and my Render Utilization is in the 30% range.
Update 3
Texture atlasing had no noticeable effect on the problem. Utilization ranges are still as noted above.
Update 4
Converting my Vertex and Normal pointers to GL_SHORT seemed to improve FPS, but the Tiler Utilization is still in the 90% range a lot of the time. I'm still using GL_FLOAT for my texture coordinates. I suppose I could knock those down to GL_SHORT and save four more bytes per vertex.
Update 5
Converting my texture coordinates to GL_SHORT yielded another performance increase. I'm now consistently getting >30 FPS. Tiler Utilization is still around 90%, but frequently drops down into the 70-80% range. The Renderer Utilization is hovering around 50%. I suppose this might have something to do with scaling the texture coordinates from GL_TEXTURE Matrix Mode.
I'm still seeking additional improvements. I'd like to get closer to 40 FPS, as that's what my iPod Touch gets and it's silky smooth there. If anyone is still paying attention, what other low-hanging fruit can I pick?
With a tiler utilization still above 90%, you’re likely still vertex throughput-bound. Your renderer utilization is higher because the GPU is rendering more frames. If your primary focus is improving performance on older devices, then the key is still to cut down on the amount of vertex data needed per triangle. There are two sides to this:
Reducing the amount of data per vertex: Now that all of your vertex attributes are already GL_SHORTs, the next thing to pursue is finding a way to do what you want using fewer attributes or components. For example, if you can live without specular highlights, using DOT3 lighting instead of OpenGL ES fixed-function lighting would replace your 3 shorts (+ 1 short of padding) for normals with 2 shorts for an extra texture coordinate. As an additional bonus, you’d be able to light your models per-pixel.
Reducing the number of vertices needed per triangle: When drawing with indexed triangles, you should make sure that your indices are sorted for maximum reuse. Running your geometry through Imagination Technologies’ PVRTTriStrip tool would probably be your best bet here.
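One way to get a feel for index reuse is to count how many vertices would have to be transformed for a given index order. The sketch below simulates a small FIFO cache of recently transformed vertices. This is a simplified model chosen for illustration, not a description of how any particular GPU or the PVRTTriStrip tool works, and the function name and CACHE_MAX limit are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_MAX 64

/* Counts how many index fetches miss a FIFO cache holding the last
   cache_size distinct vertices. Fewer misses means more vertex reuse. */
size_t count_cache_misses(const unsigned short *indices,
                          size_t index_count, size_t cache_size)
{
    unsigned short cache[CACHE_MAX];
    size_t used = 0, next = 0, misses = 0;
    size_t i, j;
    assert(cache_size > 0 && cache_size <= CACHE_MAX);
    assert(indices != NULL || index_count == 0);
    for (i = 0; i < index_count; ++i) {
        int hit = 0;
        for (j = 0; j < used; ++j) {
            if (cache[j] == indices[i]) { hit = 1; break; }
        }
        if (!hit) {
            cache[next] = indices[i];       /* overwrite the oldest slot */
            next = (next + 1) % cache_size;
            if (used < cache_size) ++used;
            ++misses;
        }
    }
    return misses;
}
```

For two triangles sharing an edge, the index list {0,1,2, 2,1,3} costs 4 misses with a 4-entry cache, whereas six unshared vertices cost 6. Comparing the count before and after reordering gives a rough measure of whether the new order is better.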
If you only have 58 different 64x64 textures, a texture atlas seems like a good idea, since they'd all fit in a single 512x512 texture... if you don't rely on texture wrap modes, I'd certainly at least try this.
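As a quick check of that arithmetic, here is a sketch in C of a uniform grid atlas. The type and function names are made up, and it ignores padding between tiles, which you may want in practice to avoid bleeding under filtering.

```c
#include <assert.h>

typedef struct { float u0, v0, u1, v1; } UvRect;

/* Number of square tiles that fit in a square atlas laid out as a grid. */
unsigned atlas_capacity(unsigned atlas_size, unsigned tile_size)
{
    unsigned per_row;
    assert(tile_size > 0 && atlas_size >= tile_size);
    per_row = atlas_size / tile_size;
    return per_row * per_row;
}

/* Texture coordinates of tile `index`, counting row by row. */
UvRect atlas_uv(unsigned index, unsigned atlas_size, unsigned tile_size)
{
    unsigned per_row, col, row;
    float step;
    UvRect r;
    assert(tile_size > 0 && atlas_size >= tile_size);
    per_row = atlas_size / tile_size;
    assert(index < per_row * per_row);
    col = index % per_row;
    row = index / per_row;
    step = (float)tile_size / (float)atlas_size;
    r.u0 = (float)col * step;
    r.v0 = (float)row * step;
    r.u1 = (float)(col + 1) * step;
    r.v1 = (float)(row + 1) * step;
    return r;
}
```

A 512x512 atlas of 64x64 tiles has 8 x 8 = 64 slots, so the 58 textures fit with 6 slots to spare.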
What format are your textures in? You might try using a compressed PVRTC texture; I think that's less load on the Tiler, and I've been pleasantly surprised by the image quality even for 2-bit-per-pixel textures. (Good for natural images, not good if you're doing something that looks like an 8-bit video game)
The first thing I would do is run Instruments profiling on the hardware device that is slow. It should show you pretty quickly where the bottlenecks are for your particular case.
Update after instruments results:
This question has a similar result in Instruments to yours; perhaps the advice there is also applicable in your case (basically, reducing the amount of vertex data).
The biggest win in graphics programming comes down to this:
Batch, Batch, Batch
Texture atlasing will make a bigger difference than almost anything else you can do. Switching textures is like stopping a speeding train to let on new passengers every time.
Combine all those textures into an atlas and cut your draw calls down a lot.
This web-based tool may be helpful: http://zwoptex.zwopple.com/
Have you looked over the "OpenGL ES Programming Guide for iPhone OS" in the dev center? There are sections on Best Practices for Vertex Data and Texture Data.
Is your data formatted to be able to use triangle strips?
In terms of least effort, the modification sequence for you would probably be:
Reducing vertex attribute size
VBOs
Note that when you do these, you need to make sure that components are aligned on their native alignment, i.e. the floats or full ints are on 4-byte boundaries, the shorts are on 2-byte boundaries. If you don't do this it will tank your performance. It might be helpful to mentally map it by typing out your attribute ordering as a struct definition so you can sanity check your layout and alignment.
making sure your data is stripped to share vertices
using a texture atlas to reduce texture swaps
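Following the suggestion to type the layout out as a struct, here is a sketch in C comparing an all-float interleaved vertex with a GL_SHORT one. The struct and field names are hypothetical. The padding short after each 3-component attribute follows the "3 shorts (+ 1 short of padding)" description given for normals in another answer; applying the same padding to positions is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Interleaved vertex using GL_FLOAT for everything: 8 floats. */
typedef struct {
    float position[3];
    float normal[3];
    float texcoord[2];
} FloatVertex;

/* Interleaved vertex using GL_SHORT; each 3-component attribute is
   padded to 4 shorts so the next attribute starts on a 4-byte boundary. */
typedef struct {
    short position[3];
    short pad0;
    short normal[3];
    short pad1;
    short texcoord[2];
} ShortVertex;

/* Returns 1 if an attribute at byte `offset` sits on a multiple of
   `alignment` bytes, 0 otherwise. */
int is_aligned(size_t offset, size_t alignment)
{
    assert(alignment > 0);
    return offset % alignment == 0;
}
```

On platforms where float is 4 bytes and short is 2 bytes, FloatVertex is 32 bytes and ShortVertex is 20 bytes. sizeof gives the stride to pass to the gl*Pointer calls and offsetof gives each attribute's offset, and both can be checked with is_aligned.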
To try converting your textures to 16-bit RGB565 format, see this code in Apple's venerable Texture2D.m (search for kTexture2DPixelFormat_RGB565):
http://code.google.com/p/cocos2d-iphone/source/browse/branches/branch-0.1/OpenGLSupport/Texture2D.m
(this code loads PNGs and converts them to RGB565 at texture creation time; I don't know if there's an RGB565 file format as such)
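For reference, the conversion itself is only a few shifts per pixel. The sketch below is a standalone C version, not the code from Texture2D.m. The function names are made up, and it assumes the source pixels are tightly packed 8-bit RGBA in R, G, B, A byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packs 8-bit red, green and blue into one 16-bit RGB565 value:
   5 bits of red, 6 bits of green, 5 bits of blue. */
uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Converts pixel_count RGBA8888 pixels to RGB565, dropping alpha. */
void convert_rgba8888_to_rgb565(const uint8_t *src, uint16_t *dst,
                                size_t pixel_count)
{
    size_t i;
    assert((src != NULL && dst != NULL) || pixel_count == 0);
    for (i = 0; i < pixel_count; ++i) {
        dst[i] = pack_rgb565(src[4 * i], src[4 * i + 1], src[4 * i + 2]);
    }
}
```

The result halves texture memory compared to 32-bit RGBA (2 bytes per pixel instead of 4), at the cost of the alpha channel and some color precision.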
For more information on PVRTC compressed textures (which looked way better than I expected when I used them, even at 2 bits per pixel) see Apple's PVRTextureLoader sample:
http://developer.apple.com/iPhone/library/samplecode/PVRTextureLoader/index.html
It has both the code for loading PVRTC textures in your app and instructions for using texturetool to convert your .png files into .pvr files.