Metal: limit MTLRenderCommandEncoder texture loading to only part of texture - swift

I have a Metal rendering pipeline set up that encodes render commands and uses a texture (an MTLTexture object) to load and store the output. This texture is rather large, and each render command operates on only a small fraction of it. The basic setup is roughly the following:
// texture: MTLTexture, pipelineState: MTLRenderPipelineState, commandBuffer: MTLCommandBuffer
// create and setup MTLRenderPassDescriptor with loadAction = .load
let renderPassDescriptor = MTLRenderPassDescriptor()
if let attachment = renderPassDescriptor.colorAttachments[0] {
    attachment.clearColor = MTLClearColorMake(0.0, 0.0, 0.0, 0.0)
    attachment.texture = texture // texture size is rather large
    attachment.loadAction = .load
    attachment.storeAction = .store
}
// create MTLRenderCommandEncoder
guard let renderCommandEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: renderPassDescriptor) else { return }
// limit rendering to a small fraction of the texture (here 10% of each dimension)
let scissorRect = CGRect(x: 0, y: 0,
                         width: 0.1 * CGFloat(texture.width),
                         height: 0.1 * CGFloat(texture.height))
let metalScissorRect = MTLScissorRect(x: Int(scissorRect.origin.x),
                                      y: Int(scissorRect.origin.y),
                                      width: Int(scissorRect.width),
                                      height: Int(scissorRect.height))
renderCommandEncoder.setRenderPipelineState(pipelineState)
renderCommandEncoder.setScissorRect(metalScissorRect)
// encode some commands here
renderCommandEncoder.endEncoding()
In practice, many renderCommandEncoder objects are created, each operating on just a small fraction of the texture. Unfortunately, each time a render pass is committed, the whole texture is loaded and, at the end, stored, as specified by the renderPassDescriptor's colorAttachment loadAction and storeAction.
My Question is:
Is it possible to limit the load and store process to a region of the texture? (This would avoid wasting computation time on loading and storing huge parts of the texture when only a small part is needed.)

One approach that avoids loading and storing the entire texture in the render pass, assuming your scissor rectangle is constant between draw calls, is the following:
Blit (MTLBlitCommandEncoder) the region of interest from the large texture to a smaller intermediate texture (e.g. the size of your scissor rectangle).
Load, store, and draw/operate only on the smaller intermediate texture.
When done encoding, blit the result back to the original source region of the larger texture.
This way you load and store only the region of interest in your pipeline, at the constant added memory cost of maintaining a smaller intermediate texture (assuming the region of interest keeps a constant size between draw calls).
Blitting is a fast operation, so this method should optimize your current pipeline. A minimal sketch of the round trip follows.
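A minimal sketch, assuming device, commandBuffer, and largeTexture exist, and that region is an MTLRegion matching the scissor rect (the names here are illustrative, not from the question):
let descriptor = MTLTextureDescriptor.texture2DDescriptor(
    pixelFormat: largeTexture.pixelFormat,
    width: region.size.width,
    height: region.size.height,
    mipmapped: false)
descriptor.usage = [.renderTarget, .shaderRead]
guard let smallTexture = device.makeTexture(descriptor: descriptor) else { fatalError("texture allocation failed") }

// 1. Copy the region of interest out of the large texture.
if let blit = commandBuffer.makeBlitCommandEncoder() {
    blit.copy(from: largeTexture, sourceSlice: 0, sourceLevel: 0,
              sourceOrigin: region.origin, sourceSize: region.size,
              to: smallTexture, destinationSlice: 0, destinationLevel: 0,
              destinationOrigin: MTLOrigin(x: 0, y: 0, z: 0))
    blit.endEncoding()
}

// 2. Encode the render pass with smallTexture as the color attachment,
//    so .load and .store only touch the small texture.

// 3. When rendering is encoded, copy the result back into the original region.
if let blit = commandBuffer.makeBlitCommandEncoder() {
    blit.copy(from: smallTexture, sourceSlice: 0, sourceLevel: 0,
              sourceOrigin: MTLOrigin(x: 0, y: 0, z: 0), sourceSize: region.size,
              to: largeTexture, destinationSlice: 0, destinationLevel: 0,
              destinationOrigin: region.origin)
    blit.endEncoding()
}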

Related

Swift, SpriteKit: Low FPS with a huge for-loop in update method

Is it normal to have very low FPS (~7fps to ~10fps) with Sprite Kit using the code below?
Use case:
I'm drawing just lines from bottom to top (1024 * 64 lines). I have a delta value that determines the top position of each line for every frame. These lines make up my CGPath, which is assigned to the SKShapeNode every frame. Nothing else. I'm wondering about the performance of SpriteKit (or maybe of Swift).
Do you have any suggestions to improve the performance?
Code:
import UIKit
import SpriteKit

class SKViewController: UIViewController {
    @IBOutlet weak var skView: SKView!

    var scene: SKScene!
    var lines: SKShapeNode!

    let N: Int = 1024 * 64
    var delta: Int = 0

    override func viewDidLoad() {
        super.viewDidLoad()

        scene = SKScene(size: skView.bounds.size)
        scene.delegate = self

        skView.showsFPS = true
        skView.showsDrawCount = true
        skView.presentScene(scene)

        lines = SKShapeNode()
        lines.lineWidth = 1
        lines.strokeColor = .white
        scene.addChild(lines)
    }
}
extension SKViewController: SKSceneDelegate {
    func update(_ currentTime: TimeInterval, for scene: SKScene) {
        let w: CGFloat = scene.size.width
        let offset: CGFloat = w / CGFloat(N)

        let path = UIBezierPath()
        for i in 0 ..< N { // N -> 1024 * 64 -> 65536
            let x1: CGFloat = CGFloat(i) * offset
            let x2: CGFloat = x1
            let y1: CGFloat = 0
            let y2: CGFloat = CGFloat(delta)
            path.move(to: CGPoint(x: x1, y: y1))
            path.addLine(to: CGPoint(x: x2, y: y2))
        }
        lines.path = path.cgPath

        // Updating delta to simulate the changes
        if delta > 100 {
            delta = 0
        }
        delta += 1
    }
}
Thanks and Best regards,
Aiba ^_^
CPU
65536 is a rather large number, and asking the CPU to run through this many loop iterations every frame will always be slow. For example, even a test command-line project that does nothing but measure the time it takes to run an empty loop:
while true {
    let date = Date().timeIntervalSince1970
    for _ in 1...65536 {}
    let date2 = Date().timeIntervalSince1970
    print(1 / (date2 - date))
}
It will result in ~17 fps. I haven't even applied the CGPath, and it's already appreciably slow.
Dispatch Queue
If you want to keep your game at 60 fps even though rendering your CGPath is still slow, you can move the work onto a DispatchQueue.
let renderQueue = DispatchQueue(label: "Run The Loop") // create the queue once, not per frame
var rendering: Bool = false // remember to make this one an instance value

while true {
    let date = Date().timeIntervalSince1970
    if !rendering {
        rendering = true
        renderQueue.async {
            for _ in 1...65536 {}
            let date2 = Date().timeIntervalSince1970
            print("Render", 1 / (date2 - date))
            rendering = false // reset only after the work finishes, or the guard is useless
        }
    }
}
This retains a natural 60 fps experience and lets you update other objects; however, the rendering of your SKShapeNode object is still quite slow.
GPU
If you'd like to speed up the rendering, I would recommend looking into running it on the GPU instead of the CPU. The GPU (Graphics Processing Unit) is much better suited for this and can handle huge loops without disturbing gameplay. This may require you to program it as an SKShader, for which there are tutorials available.
Check the number of subdivisions
No iOS device has a screen width over 3000 pixels or 1500 points. (Retina screens distinguish logical points from physical pixels; a point corresponds to 2 or 3 pixels depending on the scale factor. iOS works in points, but you have to keep pixels in mind too.) Even the devices that come close are the ones with the biggest screens (iPad Pro 12.9 and iPhone Pro Max) in landscape mode.
A typical device in portrait orientation will be less than 500 points and 1500 pixels wide.
You are dividing this width into 65536 parts, and end up with pixel (not even point) coordinates like 0.00, 0.05, 0.10, 0.15, ..., 0.85, which actually refer to the same pixel about twenty times (my result, rounded up, in an iPhone simulator).
Your code draws twenty to sixty lines in the exact same physical position, on top of each other! Why do that? If you set N to w and use 1.0 for offset, you'll get the same visible result at 60 FPS; see the sketch below.
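A minimal sketch of that change as a drop-in replacement for the question's update(_:for:), reusing its names (delta, lines):
func update(_ currentTime: TimeInterval, for scene: SKScene) {
    // Derive the line count from the scene width instead of hard-coding 65536:
    // one vertical line per point already fills the screen.
    let n = Int(scene.size.width)
    let offset: CGFloat = 1.0 // one point per line

    let path = UIBezierPath()
    for i in 0 ..< n {
        let x = CGFloat(i) * offset
        path.move(to: CGPoint(x: x, y: 0))
        path.addLine(to: CGPoint(x: x, y: CGFloat(delta)))
    }
    lines.path = path.cgPath

    if delta > 100 { delta = 0 }
    delta += 1
}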
Reconsider the approach
The implementation will still have some drawbacks, though, even if you greatly reduce the amount of work to be done per frame. It's not recommended to advance animation frames in update(_:) since you get no guarantees on the FPS, and you usually want your animation to follow a set schedule, i.e. complete in 1 second rather than 60 frames. (Should the FPS drop to, say, 10, a 60-frame animation would complete in 6 seconds, whereas a 1-second animation would still finish in 1 second, but at a lower frame rate, i.e. skipping frames.)
Visibly, what your animation does is draw a rectangle on the screen whose width fills the screen, and whose height increases from 0 to 100 points. I'd say, a more "standard" way of achieving this would be something like this:
let sprite = SKSpriteNode(color: .white, size: CGSize(width: scene.size.width, height: 100.0))
sprite.yScale = 0.0
scene.addChild(sprite)

sprite.run(SKAction.repeatForever(SKAction.sequence([
    SKAction.scaleY(to: 1.0, duration: 2),
    SKAction.scaleY(to: 0.0, duration: 0.0)
])))
Note that I used SKSpriteNode because SKShapeNode is said to suffer from bugs and performance issues, although people have reported some improvements in the past few years.
But if you do insist on redrawing the entire texture of your sprite every frame due to some specific need, that may indeed be something for custom shaders… But those require learning a whole new approach, not to mention a new programming language.
Your shader would be executed on the GPU for each pixel. I repeat: the code would be executed for each single pixel – a radical departure from the world of SpriteKit.
The shader has access to a bunch of values to work with, such as a normalized set of coordinates (between (0.0, 0.0) and (1.0, 1.0), in a variable called v_tex_coord) and a system time (seconds elapsed since the shader started running) in u_time. It needs to determine what color the pixel in question should be, and set it by storing that value in the variable gl_FragColor.
It could be something like this:
void main() {
    // default to a black color, i.e. a three-dimensional vector vec3(0.0, 0.0, 0.0):
    vec3 color = vec3(0.0);

    // take the fractional part of the time in seconds;
    // this value goes from 0.0 to 0.9999… every second, then drops back to 0.0.
    // use it as the relative height of the area we want to paint white:
    float height = fract(u_time);

    // check if the current pixel is below the height of the white area:
    if (v_tex_coord.y < height) {
        // if so, set it to white (a three-dimensional vector vec3(1.0, 1.0, 1.0)):
        color = vec3(1.0);
    }

    gl_FragColor = vec4(color, 1.0); // the fourth component is the alpha
}
Put this in a file called shader.fsh, create a full-screen sprite mySprite, and assign the shader to it:
mySprite.shader = SKShader(fileNamed: "shader.fsh")
Once you display the sprite, its shader will take care of all of the rendering. Note, however, that your sprite will lose some SpriteKit functionalities as a result.
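For completeness, a hedged sketch of the full-screen sprite setup assumed above (the name mySprite comes from the text; positioning and color are illustrative):
// Create a sprite covering the whole scene and let the shader paint it.
let mySprite = SKSpriteNode(color: .white, size: scene.size)
mySprite.position = CGPoint(x: scene.size.width / 2, y: scene.size.height / 2)
mySprite.shader = SKShader(fileNamed: "shader.fsh")
scene.addChild(mySprite)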

CGContext: when drawing neighboring paths individually, they don't blend together

This is my first post on Stack, so pardon any etiquette problems I'm not aware of.
I am creating a simple drawing engine for one of my apps. It is mostly based on Apple's Touch Canvas demo for the Apple Pencil:
https://developer.apple.com/documentation/uikit/touches_presses_and_gestures/illustrating_the_force_altitude_and_azimuth_properties_of_touch_input
That demo works by stroking paths; since I need to vary the width smoothly, I decided to fill paths instead. The logic to create that path is very similar to this cocos2d project:
https://github.com/krzysztofzablocki/LineDrawing
I'm very happy with the performance of the engine at the moment, so I decided to be more ambitious and introduce variable opacity according to force. To do this, instead of drawing the whole path of the stroke, I subdivide it into segments. A segment looks like this as a stroked line:

I draw each segment path one by one, in filled form; it looks like this:
So, no issues, right? Well, if you have an iPad (or zoom in), you notice a problem:

The paths are visibly stitched together; there is no smooth transition. It seems to be an antialiasing problem. What I noticed is that if I very slightly offset the beginning of a new segment into the previous segment (right now, the end of the previous segment is the start of the new one), they blend better, sometimes making the stitches disappear completely. But this is very tricky: while it works with colors at full opacity, the same constant might not work when you reduce the opacity, and results sometimes vary between the simulator and a real device.
I'm kind of lost here. Is there any way to smooth the transition between segments?
This is my drawing routine. Right now I'm using gradients, but I already tried filling the path with a solid color and no clipping and got the same results:
for (index, subpath) in pathResults.subPaths.enumerated() {
    // If this is not the start of a line, the first segment is history used
    // for smoothing; don't draw it.
    if !startOfLine && index == 0 {
        continue
    }
    let colors = [properties.color.cgColor.copy(alpha: opacityForPoint(subpath.startPoint)),
                  properties.color.cgColor.copy(alpha: opacityForPoint(subpath.endPoint))]
                 as CFArray
    if let gradient = CGGradient(colorsSpace: context.colorSpace,
                                 colors: colors,
                                 locations: [0.0, 1.0]) {
        context.addPath(subpath.subpath)
        context.clip()
        context.drawLinearGradient(gradient,
                                   start: subpath.startPoint.properties.location,
                                   end: subpath.endPoint.properties.location,
                                   options: gradientOptions)
        context.resetClip()
    }
}
This is the code for the context I draw into:
private lazy var frozenContext: CGContext = {
    let scale = self.properties.scale
    var size = self.properties.size

    size.width *= scale
    size.height *= scale

    let colorSpace = CGColorSpaceCreateDeviceRGB()
    let context: CGContext = CGContext(data: nil,
                                       width: Int(size.width),
                                       height: Int(size.height),
                                       bitsPerComponent: 8,
                                       bytesPerRow: 0,
                                       space: colorSpace,
                                       bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue)!

    let transform = CGAffineTransform(scaleX: scale, y: scale)
    context.concatenate(transform)

    return context
}()
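For what it's worth, a hedged sketch of the offset workaround described in the question (the function and constant are illustrative; as noted above, the right overlap is fragile and device-dependent):
import CoreGraphics

// Nudge a segment's start point slightly back toward the previous segment so
// the two antialiased fills overlap instead of butting against each other.
func overlappedStart(start: CGPoint, end: CGPoint, overlap: CGFloat = 0.5) -> CGPoint {
    let dx = end.x - start.x
    let dy = end.y - start.y
    let length = max((dx * dx + dy * dy).squareRoot(), .ulpOfOne) // avoid division by zero
    return CGPoint(x: start.x - dx / length * overlap,
                   y: start.y - dy / length * overlap)
}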

How to scale SCNNodes to fit in a box?

I have multiple Collada files with objects (humans) of various sizes, created from different 3D program sources. I want to scale the objects so they fit inside a frame or box. From my reading, I can't use the bounding box to scale the node, so what feature do you use to scale the nodes relative to each other?
// humanNode = { ...get node, which is some unknown size }
let (minBound, maxBound) = humanNode.boundingBox

let blockNode = SCNNode(geometry: SCNBox(width: 10, height: 10, length: 10, chamferRadius: 0))

// calculate a scale factor so it fits inside the box without knowing its size beforehand
s = { ...some method to calculate the scale to fit the humanNode into the box }
humanNode.scale = SCNVector3Make(s, s, s)
How do I get its size relative to the literal box I want to put it in, and scale it accordingly?
Is it possible to draw the node off screen to measure its size?
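A hedged sketch of one common approach, assuming the node's boundingBox is populated from its geometry (components are Float on iOS and CGFloat on macOS): scale uniformly by the largest bounding-box dimension so the node fits the 10 x 10 x 10 box.
let (minBound, maxBound) = humanNode.boundingBox
let dx = maxBound.x - minBound.x
let dy = maxBound.y - minBound.y
let dz = maxBound.z - minBound.z
let maxDimension = max(dx, dy, dz) // largest extent of the node

let boxSide: Float = 10.0
let s = boxSide / Float(maxDimension)
humanNode.scale = SCNVector3(s, s, s)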

how to avoid resizing texture in swift

I am adding the sprite node to the scene with a given size.
But when I change the texture of the sprite node, its size automatically changes to the original size of the texture's image (PNG).
How can I avoid this?
My code:
var bomba = SKSpriteNode(imageNamed: "bomba2")
var actionbomba = [SKAction]()

bomba.size = CGSize(width: frame2.size.width / 18, height: frame2.size.width / 18)
let bomba3 = SKTexture(imageNamed: "bomba3.png")

actionbomba.append(SKAction.moveBy(x: 0, y: frame.size.height / 2.65, duration: beweegsnelheid))
actionbomba.append(SKAction.setTexture(bomba3, resize: false))

addChild(bomba)
bomba.run(SKAction.repeatForever(SKAction.sequence(actionbomba)))
Do not set the size explicitly. As you have seen, SpriteKit will not automatically derive a scale factor for each texture and apply it.
Instead, set the scale factor of the node once, and each texture will have that scale applied to it:
playernode.setScale(x)
Something like this. You only have to set it when you create the node, and each texture will then be the size you expect, given that your textures are all the same size.
I use this method for all of my nodes that are animated with multiple textures and it works every time.
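A hedged sketch of that approach in Swift, reusing names from the question and assuming "bomba2" and "bomba3" are square images of the same pixel size:
// Instead of setting .size explicitly, set the node's scale once;
// subsequent texture swaps (with resize: false) keep the scaled size.
let bomba = SKSpriteNode(imageNamed: "bomba2")
let targetWidth = frame.size.width / 18 // desired on-screen width
bomba.setScale(targetWidth / bomba.texture!.size().width)
addChild(bomba)

bomba.run(SKAction.setTexture(SKTexture(imageNamed: "bomba3"), resize: false))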

SceneKit's performance with a cube test

In learning 3D graphics programming for games, I decided to start off simple by using the SceneKit 3D API. My first gaming goal was to build a very simplified mimic of Minecraft. A game of just cubes: how hard can it be?
Below is a loop I wrote to place a grid of 100 x 100 cubes (10,000), and the FPS performance was abysmal (~20 FPS). Is my initial gaming goal too much for SceneKit, or is there a better way to approach this?
I have read other topics on StackExchange but don't feel they answer my question. Converting the exposed surface blocks to a single mesh won't work, as SCNGeometry is immutable.
func createBoxArray(scene: SCNScene, lengthCount: Int, depthCount: Int) {
    let startX: CGFloat = -(CGFloat(lengthCount) * CUBE_SIZE) + (CGFloat(lengthCount) * CUBE_MARGIN) / 2.0
    let startY: CGFloat = 0.0
    let startZ: CGFloat = -(CGFloat(lengthCount) * CUBE_SIZE) + (CGFloat(lengthCount) * CUBE_MARGIN) / 2.0

    var currentZ: CGFloat = startZ
    for _ in 0 ..< depthCount {
        currentZ += CUBE_SIZE + CUBE_MARGIN
        var currentX = startX
        for _ in 0 ..< lengthCount {
            currentX += CUBE_SIZE + CUBE_MARGIN
            createBox(scene: scene, x: currentX, y: startY, z: currentZ)
        }
    }
}

func createBox(scene: SCNScene, x: CGFloat, y: CGFloat, z: CGFloat) {
    let box = SCNBox(width: CUBE_SIZE, height: CUBE_SIZE, length: CUBE_SIZE, chamferRadius: 0.0)
    box.firstMaterial?.diffuse.contents = NSColor.purple

    let boxNode = SCNNode(geometry: box)
    boxNode.position = SCNVector3Make(x, y, z)
    scene.rootNode.addChildNode(boxNode)
}
UPDATE 12-30-2014:
I modified the code so the box node is created once, and each additional box in the 100 x 100 array is created via:
var newBoxNode = firstBoxNode.clone()
newBoxNode.position = SCNVector3Make(x, y, z)
This change appears to have increased FPS to ~30fps. The other statistics are as follows (from the statistics displayed in the SCNView):
10K (I assume this is draw calls?)
120K (I assume this is faces)
360K (Assuming this is the vertex count)
The bulk of the run loop is in rendering (I'm guesstimating 98%). The total loop time is 26.7 ms (ouch). I'm running on a Mac Pro Late 2013 (6-core with dual D500 GPUs).
Given that a Minecraft-style game has a landscape that constantly changes based on the player's actions, I don't see how I can optimize this within the confines of SceneKit. That's a big disappointment, as I really like the framework. I'd love to hear ideas on how to address this issue; without that, I'm forced to go with OpenGL.
UPDATE 12-30-2014, 2:00 PM ET:
I am seeing a significant performance improvement when using flattenedClone(). The FPS is now a solid 60 fps even with more boxes and TWO draw calls. However, accommodating a dynamic environment (which Minecraft supports) is still proving problematic; see below.
Since the array would change composition over time, I added a keyDown handler that adds an even larger box array to the existing one, and timed adding the boxes individually (resulting in far more draw calls) versus adding them as a flattenedClone. Here's what I found:
On keyDown I add another array of 120 x 120 boxes (14,400 boxes)
// This took .0070333 milliseconds
scene?.rootNode.addChildNode(boxArrayNode)
// This took .02896785 milliseconds
scene?.rootNode.addChildNode(boxArrayNode.flattenedClone())
Calling flattenedClone() again is 4x slower than adding the array.
This results in two draw calls with 293K faces and 878K vertices. I'm still playing with this and will update if I find anything new. Bottom line: with my additional testing, I still feel SceneKit's immutable geometry constraints mean I can't leverage the framework.
As you mentioned Minecraft, I think it's worth looking at how it works.
I have no technical details or code samples for you, but everything should be pretty straightforward:
Have you ever played Minecraft online when the terrain had not loaded, allowing you to see through it? That's because there is no geometry inside.
Let's assume I have a 2x2x2 array of cubes. That makes 2*2*2*6*2 = 96 triangles.
However, if you test and draw only the polygons visible from the camera's point of view, perhaps by testing the normals (easy since these are cubes), this number goes down to 48 triangles.
If you find a way to detect which faces are occluded by other faces (which shouldn't be too hard either, considering you're working with flat, square, grid-based faces), you can draw only the rest. That way, we're drawing between 8 and 24 triangles: up to 90% optimisation.
If you want to get really deep, you can even combine faces to make a single N-gon out of the visible, flat faces. You can do that by generating the geometry on the fly in a way that combines the two previous methods and tests for adjacent visible faces on the same plane.
If you succeed, we're talking 2 to 6 polygons instead of 96 to render 8 cubes.
Note that the last method only works if your blocks are touching each other.
There are probably a ton of papers on Minecraft-like renderers; a few Googles will help you figure it out!
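Not SceneKit-specific, but a hedged sketch of the neighbor test described above (the types and names are mine): a face needs geometry only when the neighboring grid cell in its direction is empty, so interior faces can be skipped.
struct Voxel: Hashable { let x, y, z: Int }

let faceDirections = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

func exposedFaceCount(of voxel: Voxel, in occupied: Set<Voxel>) -> Int {
    faceDirections.filter { d in
        !occupied.contains(Voxel(x: voxel.x + d.0, y: voxel.y + d.1, z: voxel.z + d.2))
    }.count
}

// Example: in a solid 2x2x2 block each cube touches 3 neighbors, leaving
// 3 exposed faces per cube, i.e. 24 faces (48 triangles) instead of 48 faces
// (96 triangles); backface culling then roughly halves what is actually drawn.
var occupied = Set<Voxel>()
for x in 0...1 {
    for y in 0...1 {
        for z in 0...1 {
            occupied.insert(Voxel(x: x, y: y, z: z))
        }
    }
}
let total = occupied.reduce(0) { $0 + exposedFaceCount(of: $1, in: occupied) }
print(total) // 24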
Why does drop-frame occur?
September 04, 2022
Almost 8 years have passed since you asked this question, but not much has changed...
1. Polygons' count
The number of polygons in a SceneKit or RealityKit scene should not exceed 100,000 triangular polygons. An ideal SceneKit scene, one capable of rendering all its models quickly, should contain fewer than 50,000 polygons. Your scene contains 120,000 polygons. Do not forget that SceneKit renders models using a single thread (unlike the multi-threaded RealityKit renderer).
2. Shaders
In Xcode 14.0+, the default .lightingModel of any 3D library primitive set in the Material Inspector (the UI version) is the .physicallyBased material. This is the most computationally intensive shader. The programmatic default of the .lightingModel for any SCN procedural geometry is the .blinn shading model. The least computationally intensive shader is .constant (it doesn't depend on lighting).
3. What's inside a frustum
If all 10,000 cubes are inside the SceneKit camera frustum, the frame rate will be 20-30 fps. But if you dolly in on the cube matrix and see no more than a ninth of it, the frame rate will be 60 fps, because SceneKit does not render objects outside the frustum's bounds.
4. Number of meshes in SCNScene
Each model mesh results in a draw call. To achieve 60 fps, each frame, including all of its draw calls, must complete in about 16 milliseconds. For best performance, Apple engineers advise limiting the number of meshes in a .usdz file to around 50. Unfortunately, I did not find a corresponding value for .scn files in the official documentation.
5. Lighting and shadows
Lighting and shadowing (especially shadowing) are very computationally intensive tasks. The general advice is the following: avoid .forward shadows and high-resolution textures with fake shadows.
Look at this post for details.
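As a small illustration of that advice (a sketch with assumed names, not code from the post): prefer a cheaper shadow mode over .forward, and a cheap lighting model where you can.
let light = SCNLight()
light.type = .directional
light.castsShadow = true
light.shadowMode = .deferred // avoid the more expensive .forward mode

let material = SCNMaterial()
material.lightingModel = .constant // cheapest model; ignores lighting entirely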
SwiftUI code for testing
Xcode 14.0+, SwiftUI 4.0+, Swift 5.7
import SwiftUI
import SceneKit

struct ContentView: View {
    var scene = SCNScene()
    var options: SceneView.Options = [.allowsCameraControl]

    var body: some View {
        SceneView(scene: scene, options: options)
            .ignoresSafeArea()
            .onAppear {
                scene.background.contents = UIColor.black
                // Populate the 100 x 100 grid of cubes once the view appears.
                for x in -50...49 {
                    for z in -50...49 {
                        scene.rootNode.addChildNode(createCube(x, 0, z))
                    }
                }
            }
    }

    func createCube(_ posX: Int, _ posY: Int, _ posZ: Int) -> SCNNode {
        let geo = SCNBox(width: 0.5, height: 0.5, length: 0.5,
                         chamferRadius: 0.0)
        geo.firstMaterial?.lightingModel = .constant
        let boxNode = SCNNode(geometry: geo)
        boxNode.position = SCNVector3(posX, posY, posZ)
        return boxNode
    }
}
Here (first screenshot, omitted), all cubes are within the viewing frustum, so there are obvious reasons for dropped frames.
And here (second screenshot, omitted), just a part of the scene is within the viewing frustum, so there are no dropped frames.