Poor performance when downscaling surfaces - cairo

I am using PyCairo to write a chess game; each piece is a 512x512 ImageSurface. The pieces need to scale up and down. When the scale (x or y) is less than 1.0, the app is painfully slow, even with only 32 pieces. When the scale is equal to or greater than 1.0, it is blisteringly fast.
I have tried canvas.set_antialias(cairo.ANTIALIAS_NONE), with no good results. I have tried both cr.scale() and surface.set_device_scale() with the same poor performance. Is there any way to scale down faster, possibly at lower quality, which is acceptable?
I thought of recreating the surfaces every time the chess board is resized and using a scale of 1.0 in that case. However, that would choke as the user resizes the window.

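After you set the surface pattern as the source in a cairo context, call cairo_pattern_set_filter(cairo_get_source(cr), CAIRO_FILTER_FAST). I think this means that 'nearest' is used as the scaling algorithm; the cairo documentation describes CAIRO_FILTER_FAST as a high-performance filter with quality similar to CAIRO_FILTER_NEAREST.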
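In PyCairo the same call might look like the sketch below. The function name paint_piece and its parameters are hypothetical, not from the question; Context.get_source(), set_filter() and cairo.FILTER_FAST are the PyCairo counterparts of the C calls above. The filter has to be set after set_source_surface(), because that call creates a new source pattern.

```python
def paint_piece(cr, piece_surface, x, y, scale):
    """Paint piece_surface at (x, y), scaled by `scale`, with cairo's fast filter.

    `cr` is a cairo.Context and `piece_surface` a cairo.ImageSurface.
    """
    import cairo  # PyCairo; imported here so the sketch loads even where it is absent

    cr.save()
    cr.translate(x, y)
    cr.scale(scale, scale)
    # set_source_surface() wraps the surface in a new pattern, so the
    # filter must be applied to the pattern returned by get_source().
    cr.set_source_surface(piece_surface, 0, 0)
    cr.get_source().set_filter(cairo.FILTER_FAST)
    cr.paint()
    cr.restore()
```

In a draw handler this would be called once per piece, for example paint_piece(cr, surface, col * square, row * square, square / 512), where col, row and square are whatever the board layout uses.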

Related

FPS drops when scaling an object. How to fix? Unity Android

There is an empty scene with one standard cube. If you change its scale to (5;5;1), then the fps does not drop.
But if I change it to (5; 10; 1) my fps drops to ~30.
If I move the camera away from the cube with the scale (5;10;1), then the fps is again 60.
Maybe I have wrong camera settings or something else.
How to achieve high fps without moving the camera away?
p.s. The fps does not drop in the editor. Only after launching on android.
Unity version 2020.3.18f1. Tried on another version same problem.
[Screenshots: cube with scale (5;5;1); cube with scale (5;10;1); cube with scale (5;10;1) with the camera moved away]
The problem may be that more pixels are rendered (the fragment shader is executed for every one) when the object is scaled up. Another hint is that when you move the camera away from the object, the frame rate increases because the rendered object covers fewer pixels.
Since you mentioned that the program runs on Android, switching from a regular shader to a mobile shader may improve performance.
From Unity's documentation on transforms:
Performance Issues and Limitations with Non-Uniform Scaling
Non-uniform scaling is when the Scale in a Transform has different values for x, y, and z; for example (2, 4, 2). In contrast, uniform scaling has the same value for x, y, and z; for example (3, 3, 3). Non-uniform scaling can be useful in a few select cases but should be avoided whenever possible.
Non-uniform scaling has a negative impact on rendering performance. In order to transform vertex normals correctly, we transform the mesh on the CPU and create an extra copy of the data. Normally we can keep the mesh shared between instances in graphics memory, but in this case you pay both a CPU and memory cost per instance.
I'm not certain if your Z's scale matters in this case, because you're only rendering the x-y plane. I can't comment for certain on why the performance hit is reduced as you increase your camera distance. I suspect Unity has some intelligent vertex manipulation going on to simplify rendering of distant objects, saving you on CPU cost.
That being said, try to avoid non-uniform scaling. Primitives should typically only be used as placeholders.
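As background for the quoted remark about transforming normals correctly: under a non-uniform scale a normal cannot be scaled like a position. It has to be multiplied by the inverse transpose of the matrix, which for a pure scale means dividing by the scale factors. The sketch below only illustrates that math with made-up vectors; it is not Unity's implementation, and all function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def scale_vector(v, s):
    """Transform a position or tangent by the pure scale matrix diag(s)."""
    return tuple(x * k for x, k in zip(v, s))

def scale_normal(n, s):
    """Transform a normal by the inverse transpose of diag(s), i.e. diag(1/s)."""
    return normalize(tuple(x / k for x, k in zip(n, s)))

scale = (5.0, 10.0, 1.0)              # the non-uniform scale from the question
tangent = (1.0, -1.0, 0.0)            # a direction lying in a 45-degree surface
normal = normalize((1.0, 1.0, 0.0))   # perpendicular to that tangent

new_tangent = scale_vector(tangent, scale)
naive = normalize(scale_vector(normal, scale))  # scaled like a position
correct = scale_normal(normal, scale)           # scaled by the inverse transpose

print(dot(naive, new_tangent))    # clearly non-zero: no longer perpendicular
print(dot(correct, new_tangent))  # approximately zero: still perpendicular
```

With a uniform scale such as (3, 3, 3) both versions give the same direction, which is why only the non-uniform case needs the extra work.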

Does changing Physics.defaultContactOffset have an important impact on performance?

As usual, the documentation is lacking some information that we have to gather elsewhere: Physics.defaultContactOffset.
Physics.defaultContactOffset is used by the collision detection system to predictively enforce the contact constraint.
Unity explains that you should use 1 unit = 1 meter for physics simulation.
I needed a lot of small spheres and cubes, 10 cm wide, i.e. 0.1 "unit".
What they don't say is that when you're working at a small scale (I'm using objects 0.1 m = 10 cm wide), you have to change Physics.defaultContactOffset to a smaller value than the default one.
Hence my question: is Physics.defaultContactOffset important for calculations, i.e. if I change this to a very small value, does it have a negative impact on performance?
I have to change it from 0.001 to 0.00001 to get an acceptable collision detection system and I'm worried about a negative impact on performance.
From Unity3D documentation on Default Contact Offset:
Use this to set the distance the collision detection system uses to
generate collision contacts. The value must be positive, and if set
too close to zero, it can cause jitter. This is set to 0.01 by
default. Colliders only generate collision contacts if their distance
is less than the sum of their contact offset values.
So we can assume the physics engine is calculating distances between colliders and checking if the distance counts as a collision or not. I don't think it matters so much for performance as the calculation is done anyway.
With all this being said, Unity3d physics engine doesn't really do well with tiny objects, so it's better if you scale the spheres up to 1 unit, and scale everything else to compensate. You will most likely run into issues with these tiny colliders.
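The rule in the quoted documentation can be written as a one-line check. This is only a sketch of the stated condition with hypothetical names and a made-up gap value; it is not how Unity's physics engine implements contact generation.

```python
def generates_contact(distance, offset_a, offset_b):
    """True if two colliders are close enough to generate contacts:
    their separation is less than the sum of their contact offsets."""
    return distance < offset_a + offset_b

# Two colliders separated by a gap of 0.015 units (made-up value).
gap = 0.015
print(generates_contact(gap, 0.01, 0.01))        # True with the documented default of 0.01
print(generates_contact(gap, 0.00001, 0.00001))  # False with the asker's smaller value
```

With the documented default, two colliders start generating contacts when they are 0.02 units apart, which is a fifth of the width of a 0.1-unit object. That is consistent with the asker needing a smaller offset at that scale.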

does world size affect performance?

Surprisingly I couldn't find any meaty info about this, so here I am.
I understand that 1 unit in the editor = 1 meter, but do large worlds affect performance? I mean, they obviously do in sandbox games and other games packed with content. But what if the world is mostly empty? My current project is a 2D Gravity Wars clone with moving planets. Should I make it as small and condensed as possible, or is there room to scale things up?
Basically, do large distances in a mostly empty world affect in-game performance?
Simple answer ... No.
The only thing you lose with big numbers when we are talking about float based vectors is precision.
So the real question becomes ... how accurate do you want your logic to be?
I tend to build stuff placing verts on round value points then scale for size rather than space vertex info at scaled points ... if that makes sense / helps?
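To put a number on the precision loss mentioned above, the sketch below measures the gap between neighbouring 32-bit floats at a few distances from the origin. It assumes positions are stored as 32-bit floats, as Unity's vectors are; the helper names are hypothetical.

```python
import struct

def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def float32_spacing(x):
    """Gap between x and the next representable 32-bit float above it (x > 0)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    next_up = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_up - to_float32(x)

for position in (1.0, 1_000.0, 100_000.0, 10_000_000.0):
    print(position, float32_spacing(position))
```

The spacing is about 1.2e-7 near 1.0 and grows to a whole unit near 10,000,000, so how far from the origin you can go depends on how fine your movement and physics steps need to be.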

How can I optimize the rendering of a large model in OpenGL ES 1.1?

I just finished implementing VBOs in my 3D app and saw a roughly 5-10x speed increase in rendering. What used to render at 1-2 frames per second now renders at 10-11 frames per second.
My question is, are there any further improvements I can make to increase rendering speed? Will triangle strips make a big difference? Currently vertices are not being shared between faces; each face's vertices are unique but overlapping.
My Device Utilization is 100%, Tiler Utilization is 100%, Renderer Utilization is 11%, and resource bytes is 114819072. This is rendering 912,120 faces on a CAD model.
Any suggestions?
A Tiler Utilization of 100% indicates that your bottleneck is in the size of the geometry being sent to the GPU. Whatever you can do to shrink the geometry size can lead to an almost linear reduction in rendering time, in my experience. These tuning steps have worked for me in the past:
If you're not already, you could look at using indexing, which might cut down on geometry by eliminating some redundant vertices. The PowerVR GPUs in the iOS devices are optimized for using indexed geometry, as well.
Try using a smaller data type for your vertex information. I found that I could use GLshort instead of GLfloat for my vertices and normals without losing much precision in the rendering. This will significantly compact your geometry and lead to a nice speed boost in rendering.
Bin similarly colored vertices and render them as one group at a set color, rather than supplying per-vertex color information. The overhead from the few extra draw calls this requires will be vastly outweighed by the speedup you get from not having to send all that color information. I saw a ~18% reduction in rendering time by binning the colors in one of my larger models.
You're already using VBOs, so you've taken advantage of that optimization.
Don't halt the rendering pipeline at any point. Cut out anything that reads the current state, like all glGet* calls, because they really mess with the flow of the PowerVR GPUs.
There are other things you can do that will lead to smaller performance improvements, like using interleaved vertex, normal, texture data in your VBOs, aligning your data to 4 byte boundaries, etc., but the ones above are what I've found to have the largest impact in the tuning of my own OpenGL ES 1.1 application.
Most of these points are covered well in the "Best Practices for Working with Vertex Data" section of Apple's OpenGL ES Programming Guide for iOS.
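The first two suggestions (indexing and a smaller data type) can be sketched outside OpenGL to see how much they shrink the geometry. The helper names and the two-triangle quad are made up for illustration, and the byte counts ignore any padding needed to keep attributes on 4-byte boundaries.

```python
import struct

def build_index_buffer(triangles):
    """Deduplicate per-face vertices into a shared vertex list plus indices."""
    vertices, indices, seen = [], [], {}
    for tri in triangles:
        for v in tri:
            if v not in seen:
                seen[v] = len(vertices)
                vertices.append(v)
            indices.append(seen[v])
    return vertices, indices

def quantize(vertices, extent):
    """Map coordinates in [-extent, extent] to signed 16-bit integers (GLshort range)."""
    return [tuple(round(c / extent * 32767) for c in v) for v in vertices]

def pack(values, type_code):
    """Pack a flat list: 'f' = 32-bit float, 'h' = 16-bit int, 'H' = unsigned 16-bit int."""
    return struct.pack("<%d%s" % (len(values), type_code), *values)

# Two triangles forming a quad; each face carries its own copy of the vertices.
quad = [
    ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)),
    ((0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)),
]

unshared = [c for tri in quad for v in tri for c in v]
float_bytes = len(pack(unshared, "f"))

vertices, indices = build_index_buffer(quad)
shorts = [c for v in quantize(vertices, extent=1.0) for c in v]
indexed_bytes = len(pack(shorts, "h")) + len(pack(indices, "H"))

print(float_bytes, indexed_bytes)  # 72 36
```

When drawing quantized positions, the model matrix has to scale them back by extent / 32767. The saving from indexing on a real model depends on how many faces actually share vertices.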

Advice on speeding up OpenGL ES 1.1 on the iPhone

I'm working on an iPhone App that relies heavily on OpenGL. Right now it runs a bit slow on the iPhone 3G, but looks snappy on the new 32G iPod Touch. I assume this is hardware related. Anyway, I want to get the iPhone performance to resemble the iPod Touch performance. I believe I'm doing a lot of things sub-optimally in OpenGL and I'd like advice on what improvements will give me the most bang for the buck.
My scene rendering goes something like this:
Repeat 35 times
    glPushMatrix
    glLoadIdentity
    glTranslate
    Repeat 7 times
        glBindTexture
        glVertexPointer
        glNormalPointer
        glTexCoordPointer
        glDrawArrays(GL_TRIANGLES, ...)
    glPopMatrix
My Vertex, Normal and Texture Coords are already interleaved.
So, what steps should I take to speed this up? What step would you try first?
My first thought is to eliminate all those glBindTexture() calls by using a Texture Atlas.
What about some more efficient matrix operations? I understand the gl*() versions aren't too efficient.
What about VBOs?
Update
There are 8260 triangles.
Texture sizes are 64x64 pngs. There are 58 different textures.
I have not run instruments.
Update 2
After running the OpenGL ES Instrument on the iPhone 3G I found that my Tiler Utilization is in the 90-100% range, and my Render Utilization is in the 30% range.
Update 3
Texture atlasing had no noticeable effect on the problem. Utilization ranges are still as noted above.
Update 4
Converting my Vertex and Normal pointers to GL_SHORT seemed to improve FPS, but the Tiler Utilization is still in the 90% range a lot of the time. I'm still using GL_FLOAT for my texture coordinates. I suppose I could knock those down to GL_SHORT and save four more bytes per vertex.
Update 5
Converting my texture coordinates to GL_SHORT yielded another performance increase. I'm now consistently getting >30 FPS. Tiler Utilization is still around 90%, but frequently drops down into the 70-80% range. The Renderer Utilization is hovering around 50%. I suppose this might have something to do with scaling the texture coordinates in GL_TEXTURE matrix mode.
I'm still seeking additional improvements. I'd like to get closer to 40 FPS, as that's what my iPod Touch gets and it's silky smooth there. If anyone is still paying attention, what other low-hanging fruit can I pick?
With a tiler utilization still above 90%, you’re likely still vertex throughput-bound. Your renderer utilization is higher because the GPU is rendering more frames. If your primary focus is improving performance on older devices, then the key is still to cut down on the amount of vertex data needed per triangle. There are two sides to this:
Reducing the amount of data per vertex: Now that all of your vertex attributes are already GL_SHORTs, the next thing to pursue is finding a way to do what you want using fewer attributes or components. For example, if you can live without specular highlights, using DOT3 lighting instead of OpenGL ES fixed-function lighting would replace your 3 shorts (+ 1 short of padding) for normals with 2 shorts for an extra texture coordinate. As an additional bonus, you’d be able to light your models per-pixel.
Reducing the number of vertices needed per triangle: When drawing with indexed triangles, you should make sure that your indices are sorted for maximum reuse. Running your geometry through Imagination Technologies’ PVRTTriStrip tool would probably be your best bet here.
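To illustrate why vertex reuse cuts the data needed per triangle, the sketch below expands a triangle strip into the individual triangles it encodes. It only shows the index arithmetic; it does not reproduce what PVRTTriStrip does, and the function name is hypothetical.

```python
def strip_to_triangles(strip):
    """Expand a triangle strip into individual triangles.

    Every odd-numbered triangle swaps its first two indices so that the
    winding order stays consistent, as in GL_TRIANGLE_STRIP.
    """
    triangles = []
    for i in range(len(strip) - 2):
        a, b, c = strip[i], strip[i + 1], strip[i + 2]
        triangles.append((a, b, c) if i % 2 == 0 else (b, a, c))
    return triangles

strip = [0, 1, 2, 3, 4, 5]
triangles = strip_to_triangles(strip)
print(triangles)  # [(0, 1, 2), (2, 1, 3), (2, 3, 4), (4, 3, 5)]
print(len(strip), "indices as a strip vs", 3 * len(triangles), "as a triangle list")
```

A strip of n triangles needs n + 2 indices instead of 3n, so long strips approach one vertex per triangle.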
If you only have 58 different 64x64 textures, a texture atlas seems like a good idea, since they'd all fit in a single 512x512 texture... if you don't rely on texture wrap modes, I'd certainly at least try this.
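The arithmetic behind that claim, and the coordinate remapping an atlas requires, can be sketched as follows. The grid layout (tiles placed left to right, top to bottom) and the function names are assumptions made for illustration.

```python
def atlas_rect(index, tile=64, atlas=512):
    """Return (u0, v0, u1, v1) for tile number `index` in a square grid atlas."""
    per_row = atlas // tile
    if not 0 <= index < per_row * per_row:
        raise ValueError("tile index does not fit in the atlas")
    col, row = index % per_row, index // per_row
    u0, v0 = col * tile / atlas, row * tile / atlas
    return (u0, v0, u0 + tile / atlas, v0 + tile / atlas)

def remap_uv(u, v, rect):
    """Map a texture coordinate in [0, 1] into the tile's sub-rectangle."""
    u0, v0, u1, v1 = rect
    return (u0 + u * (u1 - u0), v0 + v * (v1 - v0))

print((512 // 64) ** 2)                   # 64 slots, enough for 58 textures
print(atlas_rect(9))                      # (0.125, 0.125, 0.25, 0.25)
print(remap_uv(0.5, 0.5, atlas_rect(9)))  # (0.1875, 0.1875)
```

The remapping only works for coordinates inside [0, 1], which is why the atlas is not an option if the models rely on texture wrap modes.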
What format are your textures in? You might try using a compressed PVRTC texture; I think that's less load on the Tiler, and I've been pleasantly surprised by the image quality even for 2-bit-per-pixel textures. (Good for natural images, not good if you're doing something that looks like an 8-bit video game)
The first thing I would do is run Instruments profiling on the hardware device that is slow. It should show you pretty quickly where the bottlenecks are for your particular case.
Update after instruments results:
This question shows a similar result in Instruments to yours, so perhaps the advice there is also applicable in your case (basically, reduce the amount of vertex data).
The biggest win in graphics programming comes down to this:
Batch, Batch, Batch
Texture atlasing will make a bigger difference than almost anything else you can do. Switching textures is like stopping a speeding train to let on new passengers every time.
Combine all those textures into an atlas and cut your draw calls down a lot.
This web-based tool may be helpful: http://zwoptex.zwopple.com/
Have you looked over the "OpenGL ES Programming Guide for iPhone OS" in the dev center? There are sections on Best Practices for Vertex Data and Texture Data.
Is your data formatted to be able to use triangle strips?
In terms of least effort, the modification sequence for you would probably be:
Reducing vertex attribute size
VBOs
Note that when you do these, you need to make sure that components are aligned on their native alignment, i.e. the floats or full ints are on 4-byte boundaries, the shorts are on 2-byte boundaries. If you don't do this it will tank your performance. It might be helpful to mentally map it by typing out your attribute ordering as a struct definition so you can sanity check your layout and alignment.
making sure your data is stripped to share vertices
using a texture atlas to reduce texture swaps
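The alignment check suggested in the note above (typing the attribute ordering out as a struct definition) can be tried in Python with ctypes, which lays out Structure fields using the platform's native C alignment. Both vertex layouts and the helper name are hypothetical examples, not the asker's actual format.

```python
import ctypes

class Unpadded(ctypes.Structure):
    """Three shorts followed directly by floats."""
    _fields_ = [
        ("position", ctypes.c_int16 * 3),  # 6 bytes
        ("texcoord", ctypes.c_float * 2),  # wants a 4-byte boundary
    ]

class Padded(ctypes.Structure):
    """Same attributes with one explicit short of padding."""
    _fields_ = [
        ("position", ctypes.c_int16 * 3),
        ("pad", ctypes.c_int16),
        ("texcoord", ctypes.c_float * 2),
    ]

def hidden_padding(struct_type):
    """Bytes of padding the native layout adds beyond the declared fields."""
    declared = sum(ctypes.sizeof(ctype) for _, ctype in struct_type._fields_)
    return ctypes.sizeof(struct_type) - declared

for layout in (Unpadded, Padded):
    print(layout.__name__, "texcoord offset:", layout.texcoord.offset,
          "hidden padding:", hidden_padding(layout))
```

A non-zero result means a tightly packed buffer with that attribute order would put some component off its native boundary, so the padding should be made explicit in the vertex data and reflected in the stride.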
To try converting your textures to 16-bit RGB565 format, see this code in Apple's venerable Texture2D.m and search for kTexture2DPixelFormat_RGB565:
http://code.google.com/p/cocos2d-iphone/source/browse/branches/branch-0.1/OpenGLSupport/Texture2D.m
(this code loads PNGs and converts them to RGB565 at texture creation time; I don't know if there's an RGB565 file format as such)
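For reference, the bit packing behind an RGB565 conversion looks like the sketch below. It is not the Texture2D.m code, just an illustration of the format with hypothetical function names; each pixel shrinks from 4 bytes to 2.

```python
def rgb888_to_rgb565(r, g, b):
    """Pack 8-bit R, G, B into one 16-bit value: 5 bits red, 6 green, 5 blue."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

def convert_pixels(rgba_bytes):
    """Convert RGBA8888 bytes to a list of RGB565 values, dropping alpha."""
    return [
        rgb888_to_rgb565(rgba_bytes[i], rgba_bytes[i + 1], rgba_bytes[i + 2])
        for i in range(0, len(rgba_bytes), 4)
    ]

pixels = bytes([255, 255, 255, 255, 255, 0, 0, 255])  # one white and one red pixel
print([hex(p) for p in convert_pixels(pixels)])  # ['0xffff', '0xf800']
```

Because the alpha channel is dropped, this format only suits opaque textures.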
For more information on PVRTC compressed textures (which looked way better than I expected when I used them, even at 2 bits per pixel) see Apple's PVRTextureLoader sample:
http://developer.apple.com/iPhone/library/samplecode/PVRTextureLoader/index.html
It has both the code for loading PVRTC textures in your app and instructions for using texturetool to convert your .png files into .pvr files.