PVR textures versus PNG in OpenGL ES - iphone

I'm developing a 2D application for the iPhone that renders lots of textures. Most of them are loaded from PNG files with alpha transparency at the moment. As a test I've been playing around with PVR-testures as well to see if there is any performance difference.
The PNG-textures are loaded with the Texture2D class that came with the crash landing example. The PVR-testures are loaded with the PVRTexture class from the PVRTextureLoader example. I create the PVR textures using Apple's texturetool.
As a test I render a background (512*512) and on top of that 36 90*64 pixel sprites (from a 512*512 texture) with transparency. PVR textures renders at around 58 fps and the PNG at 47 fps. Is this what I can expect or should the difference be bigger? Also, the textures generated by texturetool looks really bad, is the PVRTexTool better?

PNGs:
High precision color representation, not lossy
Slower read/decompression from disk.
Slow uploading to graphics hardware, internal pixel reordering (swizzling) is performed by drivers. Possibly, there's also conversion between RGBA<->BGRA<->ARGB, though Xcode usually converts PNGs to the color format more optimized for the hardware.
Slower rendering because of limited memory bandwidth(more bytes to read from memory for GPU). Actual amount of slowdown depends on the usage scenario. This problem is mostly noticeable with lower than 1x magnification ratio and no MIP-mapping.
Take more RAM/VRAM space.
Editable, can be filtered/blended/resized/converted by your software before upload.
Mip-maps can be generated automatically during texture upload by the drivers
Disk space usage varies with content, very small for simple images, almost uncompressed for photorealistic ones.
Can be exported from any image-editing software directly and quickly.
PVRs:
Low precision lossy compression. 2 compression levels, 2 bits per pixel and 4 bits per pixel, are available. Blocky, may damage sharp edges and smooth gradients. Image quality varies with content. 3 or 4 color channels, so you can use alpha channel, but lossy compression may yield undesirable results.
Fast loading from disk, no software decompression needed.
Almost instant texture upload, because it's an internal hardware format, will go through drivers unchanged.
Fast rendering because of smaller memory bandwidth usage. Pixel rendering speed is mostly limited by other factors when PVR textures are used.
Use least amount of RAM & VRAM space.
Mip-maps must be pre-generated.
You can't generate or edit PVRs inside of your software AFAIK. Or it will very slow.
Disk space usage is directly proportional to the source image sizes(fixed compression ratio). Can be further compressed (slightly) by other methods.
Size limitations. Powers of 2, square only.
Additional conversion tool is required, processing can be automated, but will slow down build times considerably.

Performance should be better with the PVRTC textures, as they are compressed (lossy). The decompression is done in the graphics hardware itself. Less texture data is being transferred around, so you get more bandwidth. The price you pay for the RAM and bandwidth saving is the loss of quality.

Related

Does mesh compression increase performance of Unity3D games?

I would like to know whether Unity's mesh compression can increase the performance of a game for low-end hardware.
Does mesh compression make a game less demanding, or is it only useful for reducing file size and the space the game takes in storage?
(Expanded from comments.)
Based on documentation that references Unity's mesh compression, the compression used by Unity will reduce the mesh's size in the game's build files, saving storage space. However, it will not reduce its memory usage when the game is running, as indicated by the quote:
Note that mesh compression only produces smaller data files and does not use less memory at run time.
As such, there is no reason to assume that game performance would be improved by mesh compression.
The mesh is not made smaller by removing vertices/whatnot. Rather, Unity reduces the precision by which it stores mesh vertex data, which in turn saves space. However, when it finally renders the mesh in-game, the mesh will still have the same complexity as before - just a bit less accurate. (There's a lengthier explanation of the topic in this Unity forum thread.)
Unity's mesh compression uses quantization, which is a lossy compression method. So it is worth experimenting with different compression levels to determine what works best for each model - reducing game size in itself is never a bad thing, but will sometimes have to be foregone if your 3D assets visually suffer from the drop in data precision.

How can I optimize the rendering of a large model in OpenGL ES 1.1?

I just finished implementing VBO's in my 3D app and saw a roughly 5-10x speed increase in rendering. What used to render at 1-2 frames per second now renders at 10-11 frames per second.
My question is, are there any further improvements I can make to increase rendering speed? Will triangle strips make a big difference? Currently vertices are not being shared between faces, each faces vertices are unique but overlapping.
My Device Utilization is 100%, Tiler Utilization is 100%, Renderer Utilization is 11%, and resource bytes is 114819072. This is rendering 912,120 faces on a CAD model.
Any suggestions?
A Tiler Utilization of 100% indicates that your bottleneck is in the size of the geometry being sent to the GPU. Whatever you can do to shrink the geometry size can lead to an almost linear reduction in rendering time, in my experience. These tuning steps have worked for me in the past:
If you're not already, you could look at using indexing, which might cut down on geometry by eliminating some redundant vertices. The PowerVR GPUs in the iOS devices are optimized for using indexed geometry, as well.
Try using a smaller data type for your vertex information. I found that I could use GLshort instead of GLfloat for my vertices and normals without losing much precision in the rendering. This will significantly compact your geometry and lead to a nice speed boost in rendering.
Bin similarly colored vertices and render them as one group at a set color, rather than supplying per-vertex color information. The overhead from the few extra draw calls this requires will be vastly outweighed by the speedup you get from not having to send all that color information. I saw a ~18% reduction in rendering time by binning the colors in one of my larger models.
You're already using VBOs, so you've taken advantage of that optimization.
Don't halt the rendering pipeline at any point. Cut out anything that reads the current state, like all glGet* calls, because they really mess with the flow of the PowerVR GPUs.
There are other things you can do that will lead to smaller performance improvements, like using interleaved vertex, normal, texture data in your VBOs, aligning your data to 4 byte boundaries, etc., but the ones above are what I've found to have the largest impact in the tuning of my own OpenGL ES 1.1 application.
Most of these points are covered well in the "Best Practices for Working with Vertex Data" section of Apple's OpenGL ES Programming Guide for iOS.

What does the Tiler Utilization statistic mean in the iPhone OpenGL ES instrument?

I have been trying to perform some OpenGL ES performance optimizations in an attempt to boost the number of triangles per second that I'm able to render in my iPhone application, but I've hit a brick wall. I've tried converting my OpenGL ES data types from fixed to floating point (per Apple's recommendation), interleaving my vertex buffer objects, and minimizing changes in drawing state, but none of these changes have made a difference in rendering speed. No matter what, I can't seem to push my application above 320,000 triangles / s on an iPhone 3G running the 3.0 OS. According to this benchmark, I should be able to hit 687,000 triangles/s on this hardware with the smooth shading I'm using.
In my testing, when I run the OpenGL ES performance tool in Instruments against the running device, I'm seeing the statistic "Tiler Utilization" reaching nearly 100% when rendering my benchmark, yet the "Renderer Utilization" is only getting to about 30%. This may be providing a clue as to what the bottleneck is in the display process, but I don't know what these values mean, and I've not found any documentation on them. Does someone have a good description of what this and the other statistics in the iPhone OpenGL ES instrument stand for? I know that the PowerVR MBX Lite in the iPhone 3G is a tile-based deferred renderer, but I'm not sure what the difference would be between the Renderer and Tiler in that architecture.
If it helps in any way, the (BSD-licensed) source code to this application is available if you want to download and test it yourself. In the current configuration, it starts a little benchmark every time you load a new molecular structure and outputs the triangles / s to the console.
The Tiler Utilization and Renderer Utilization percentages measure the duty cycle of the vertex and fragment processing hardware, respectively. On the MBX, Tiler Utilization typically scales with the amount of vertex data being sent to the GPU (in terms of both the number of vertices and the size of the attributes sent per-vertex), and Fragment Utilization generally increases with overdraw and texture sampling.
In your case, the best thing would be to reduce the size of each vertex you’re sending. For starters, I’d try binning your atoms and bonds by color, and sending each of these bins using a constant color instead of an array. I’d also suggest investigating if shorts are suitable for your positions and normals, given appropriate scaling. You might also have to bin by position in this case, if shorts scaled to provide sufficient precision aren’t covering the range you need. These sorts of techniques might require additional draw calls, but I suspect the improvement in vertex throughput will outweigh the extra per-draw call CPU overhead.
Note that it’s generally beneficial (on MBX and elsewhere) to ensure that each vertex attribute begins on a 32-bit boundary, which implies that you should pad your positions and normals out to 4 components if you switch them to shorts. The peculiarities of the MBX platform also make it such that you want to actually include the W component of the position in the call to glVertexPointer in this case.
You might also consider pursuing alternate lighting methods like DOT3 for your polygon data, particularly the spheres, but this requires special care to make sure that you aren’t making your rendering fragment-bound, or inadvertently sending more vertex data than before.
Great answer, #Pivot! For reference, this Apple doc defines these terms:
Renderer Utilization %. The percentage of time the GPU spent performing fragment processing.
Tiler Utilization %. The percentage of time the GPU spent performing vertex processing and tiling.
Device Utilization %. The percentage of time the GPU spent doing any tiling or rendering work.

Advice on speeding up OpenGL ES 1.1 on the iPhone

I'm working on an iPhone App that relies heavily on OpenGL. Right now it runs a bit slow on the iPhone 3G, but looks snappy on the new 32G iPod Touch. I assume this is hardware related. Anyway, I want to get the iPhone performance to resemble the iPod Touch performance. I believe I'm doing a lot of things sub-optimally in OpenGL and I'd like advice on what improvements will give me the most bang for the buck.
My scene rendering goes something like this:
Repeat 35 times
glPushMatrix
glLoadIdentity
glTranslate
Repeat 7 times
glBindTexture
glVertexPointer
glNormalPointer
glTexCoordPointer
glDrawArrays(GL_TRIANGLES, ...)
glPopMatrix
My Vertex, Normal and Texture Coords are already interleaved.
So, what steps should I take to speed this up? What step would you try first?
My first thought is to eliminate all those glBindTexture() calls by using a Texture Atlas.
What about some more efficient matrix operations? I understand the gl*() versions aren't too efficient.
What about VBOs?
Update
There are 8260 triangles.
Texture sizes are 64x64 pngs. There are 58 different textures.
I have not run instruments.
Update 2
After running the OpenGL ES Instrument on the iPhone 3G I found that my Tiler Utilization is in the 90-100% range, and my Render Utilization is in the 30% range.
Update 3
Texture Atlasing had no noticeable affect on the problem. Utilization ranges are still as noted above.
Update 4
Converting my Vertex and Normal pointers to GL_SHORT seemed to improve FPS, but the Tiler Utilization is still in the 90% range a lot of the time. I'm still using GL_FLOAT for my texture coordinates. I suppose I could knock those down to GL_SHORT and save four more bytes per vertex.
Update 5
Converting my texture coordinates to GL_SHORT yielded another performance increase. I'm now consistently getting >30 FPS. Tiler Utilization is still around 90%, but frequently drops down in the the 70-80% range. The Renderer Utilization is hovering around 50%. I suppose this might have something to do with scaling the texture coordinates from GL_TEXTURE Matrix Mode.
I'm still seeking additional improvements. I'd like to get closer to 40 FPS, as that's what my iPod Touch gets and it's silky smooth there. If anyone is still paying attention, what other low-hanging fruit can I pick?
With a tiler utilization still above 90%, you’re likely still vertex throughput-bound. Your renderer utilization is higher because the GPU is rendering more frames. If your primary focus is improving performance on older devices, then the key is still to cut down on the amount of vertex data needed per triangle. There are two sides to this:
Reducing the amount of data per vertex: Now that all of your vertex attributes are already GL_SHORTs, the next thing to pursue is finding a way to do what you want using fewer attributes or components. For example, if you can live without specular highlights, using DOT3 lighting instead of OpenGL ES fixed-function lighting would replace your 3 shorts (+ 1 short of padding) for normals with 2 shorts for an extra texture coordinate. As an additional bonus, you’d be able to light your models per-pixel.
Reducing the number of vertices needed per triangle: When drawing with indexed triangles, you should make sure that your indices are sorted for maximum reuse. Running your geometry through Imagination Technologies’ PVRTTriStrip tool would probably be your best bet here.
If you only have 58 different 64x64 textures, a texture atlas seems like a good idea, since they'd all fit in a single 512x512 texture... if you don't rely on texture wrap modes, I'd certainly at least try this.
What format are your textures in? You might try using a compressed PVRTC texture; I think that's less load on the Tiler, and I've been pleasantly surprised by the image quality even for 2-bit-per-pixel textures. (Good for natural images, not good if you're doing something that looks like an 8-bit video game)
The first thing I would do is run Instruments profiling on the hardware device that is slow. It should show you pretty quickly where the bottlenecks are for your particular case.
Update after instruments results:
This question has a similar result in Instruments to you, perhaps the advice is also applicable in your case (basically reducing number vertex data)
The biggest win in graphics programming comes down to this:
Batch, Batch, Batch
TextureAtlasing will make a bigger difference than most anything else you can do. Switching textures is like stopping a speeding train to let on new passengers every time.
Combine all those textures into an atlas and cut your draw calls down a lot.
This web-based tool may be helpful: http://zwoptex.zwopple.com/
Have you looked over the "OpenGL ES Programming Guide for iPhone OS" in the dev center? There are sections on Best Practices for Vertex Data and Texture Data.
Is your data formatted to be able to use triangle strips?
In terms of least effort, the modification sequence for you would probably be:
Reducing vertex attribute size
VBOs
Note that when you do these, you need to make sure that components are aligned on their native alignment, i.e. the floats or full ints are on 4-byte boundaries, the shorts are on 2-byte boundaries. If you don't do this it will tank your performance. It might be helpful to mentally map it by typing out your attribute ordering as a struct definition so you can sanity check your layout and alignment.
making sure your data is stripped to share vertices
using a texture atlas to reduce texture swaps
To try converting your textures to 16-bit RGB565 format, see this code in Apple's venerable Texture2D.m, search for kTexture2DPixelFormat_RGB565
http://code.google.com/p/cocos2d-iphone/source/browse/branches/branch-0.1/OpenGLSupport/Texture2D.m
(this code loads PNGs and converts them to RGB565 at texture creation time; I don't know if there's an RGB565 file format as such)
For more information on PVRTC compressed textures (which looked way better than I expected when I used them, even at 2 bits per pixel) see Apple's PVRTextureLoader sample:
http://developer.apple.com/iPhone/library/samplecode/PVRTextureLoader/index.html
it has both the code for loading PVRTC textures in your app and also instructions for using the texturetool to convert your .png files into .pvr files.

Paletted textures with 8-bit alpha channel in OpenGL ES

Can I get paletted textures with RGB palette and 8-bit alpha channel in OpenGL ES? (I am targetting the iPhone OpenGL ES implementation.) After peeking into the OpenGL documentation it seems to me that there is support for paletted textures with alpha in the palette, ie. the texture contains 8-bit indexes into a palette with 256 RGBA colors. I would like a texture that contains 8-bit indexes into an RGB palette and a standalone 8-bit alpha channel. (I am trying to save memory and 32-bit RGBA textures are quite a luxury.) Or is this supposed to be done by hand, ie. by creating two independent textures, one for the colors and one for the alpha map, and combining them manually?
No. In general, graphics chips really don't like palletized textures (why? because reading any texel from it requires two memory reads, one of the index and another into the palette. Double the latency of a normal read).
If you're only after saving memory, then look into compressed texture formats. In iPhone specifically, it supports PVRTC compressed formats with 2 bits/pixel and 4 bits/pixel. The compression is lossy (just like palette compression is often lossy), but memory and bandwidth savings are substantial.
Palette textures will be expanded on load for the GPU in the iPhone so you won't gain any advantage by using them other than storage size. Your best bet for a cartoon style game is to explore using PVRTC compressed textures (like NeARAZ says) and 16-bit textures (RGBA4444 for alpha, RGBA5551 for punch-through, RGB565 for opaque textures). Using a mixture of both is my own approach.
Disclaimer:
This info is based on the official OpenGL|ES spec. I have no idea of the OpenGL|ES implementation on the iphone supports compressed textures or not. In theory it should at least simulate compressed textures, but you'll never now.
You can load 8 bits/pixel textures via the glCompressedTexImage2D call. The compression type you most likey want to use is GL_PALETTE8_RGBA8_OES.
Storing an additional alpha-channel along with the palette is not directly possible. Either you use the second texture unit for the alpha-component, or you take the alpha during color quantization into account and use a palettized format that contains alpha.
There is a command-line color quantization tool called BRIGHT out there somewhere on the net. It does an incredible job at quantizing images with alpha. You may want to give it a try..