How to draw text in iOS swift using Metal Framework - swift

I am working on a Real-Time Data Monitoring app. I have successfully drawn waves using the Metal framework but I am facing problems while drawing simple text/strings. Like how to print "Hello" in MTKView. Here I am updating the vertices using a timer and then calling draw() to perform drawing. Only GPU rendering is required.
func draw(in view: MTKView) {
// print("calling")
// guard let drawablelayer = metalLayer!.nextDrawable(),
guard //let mainDrawable = view.currentDrawable,
// let _pipeLineState = self.pipelineState,
let discriptor = view.currentRenderPassDescriptor else {
return
}
let commandBuffer = commandQue.makeCommandBuffer()
let commandEncoder = commandBuffer?.makeRenderCommandEncoder(descriptor: discriptor)
//commandEncoder?.setRenderPipelineState(_pipeLineState)
viewPort = MTLViewport.init(originX: 0.0, originY: 0.0, width: 750, height: 1334, znear: 0.0, zfar: 0.0)
commandEncoder?.setViewport(viewPort!)
commandEncoder?.setVertexBuffer(layoutBuffer, offset: 0, index: 0)
commandEncoder?.setRenderPipelineState(noninterleavedRenderPipeline)
commandEncoder?.drawPrimitives(type: .triangle, vertexStart:0, vertexCount: verticesLayout.count)
commandEncoder?.setVertexBuffer(vertexBuffer, offset: 0, index: 0)
commandEncoder?.setRenderPipelineState(interleavedRenderPipeline)
commandEncoder?.drawPrimitives(type: .line, vertexStart:0, vertexCount: vertices.count)
commandEncoder?.setVertexBuffer(topBuffer, offset: 0, index: 0)
commandEncoder?.setRenderPipelineState(topInterleavedRenderPipeline)
commandEncoder?.drawPrimitives(type: .triangle, vertexStart:0, vertexCount: verticesRect.count)
commandEncoder?.endEncoding()
// commandBuffer?.present(mainDrawable)
if let drawable = view.currentDrawable {
commandBuffer?.present(drawable)
}
commandBuffer?.commit()
}

There are a number of libraries around, most of which use information directly from fonts to generate lines, Bezier curves, and other primitives that are "procedural" rather than representational (the latter being like a bitmap), and then tessellating them for presentation using Metal primitives.
Warren Moore has a 2018 piece (with code) that describes such an approach using libtess, an established tessilation library. I have not used this version but have played with earlier code that Moore has posted, with some success. You can find it at: https://metalbyexample.com/text-3d/
The article describes 3D text rendering (really artsy stuff) but also has links to other sites that describe simpler 2D solutions.
Warren posted an older 2D Objective-C solution on GitHub at: https://github.com/metal-by-example/sample-code/tree/master/objc/07-Mipmapping/Mipmapping.
It's not for the squeamish, but given the associated code I think you should be able to make progress.
What might strike you as a bit less intimidating, though the results are a bit grainier, is to use Metal Kit to import a Core Graphics texture into which you have written some text. See: https://developer.apple.com/documentation/metalkit/mtktextureloader

Related

Triple (ring) buffering doesn't increase the FPS

I'm creating a drawing app with Metal. Unlike other drawing app, my app stores all strokes as data model rather than just saving the bitmap result after drawing, the goal of this approach is allowing users to use eraser tool and remove a stroke without touching other strokes.
When user erase a stroke, the app have to render remaining strokes on the screen again, so the function to render all strokes on the screen should be as fast as possible, but I'm having problem with it.
When render strokes on the screen, I use this function:
func renderStroke(strokes: [Stroke]) {
let commandBuffer = commandQueue?.makeCommandBuffer()
let commandEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: renderer.renderPassDescriptor)
commandEncoder?.setRenderPipelineState(pipelineState)
var allPoints = [MetalStrokePoint]()
for stroke in strokes {
let pointsAlongPath = stroke.cachedPointsAlongPath
allPoints.append(contentsOf: pointsAlongPath)
}
let pointsAlongPathBuffer = sharedDevice?.makeBuffer(bytes: allPoints, length: MemoryLayout<MetalStrokePoint>.stride * allPoints.count, options: .cpuCacheModeWriteCombined)
if let pointsAlongPathBuffer {
commandEncoder?.setVertexBuffer(pointsAlongPathBuffer, offset: 0, index: 0)
commandEncoder?.setVertexBuffer(renderer.uniformBuffer, offset: 0, index: 1)
commandEncoder?.setVertexBuffer(renderer.transformBuffer, offset: 0, index: 2)
commandEncoder?.setFragmentTexture(self.stampTexture, index: 0)
commandEncoder?.drawPrimitives(type: .point, vertexStart: 0, vertexCount: pointsAlongPath.count)
}
renderer?.commitCommandBufer()
}
I tested to render 4000 strokes to the screen by calling this function in a CADisplayLink loop, the length of the buffer is 4'000'000 (Is that too much for Metal to handle?) and the FPS of the app is 12 FPS - That is way too slow and make my app not responsive.
Then some guys here have suggested not to create a new buffer for each draw call:
Poor performance in Metal drawing app when render more than 4000 strokes.
So I implemented ring buffer to reuse buffer for each draw call:
func renderStroke(strokes: [Stroke]) {
// Wait for a free ring buffer
renderer?.ringBufferSemaphore?.wait()
let commandBuffer = commandQueue?.makeCommandBuffer()
let commandEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: renderer.renderPassDescriptor)
commandEncoder?.setRenderPipelineState(pipelineState)
var allPoints = [MetalStrokePoint]()
for stroke in strokes {
let pointsAlongPath = stroke.cachedPointsAlongPath
allPoints.append(contentsOf: pointsAlongPath)
}
// Reuse ring buffer to avoid creating new buffer
let ringBuffer = renderer.ringBuffers.getCurrentRingBuffer()
let ringBufferContents = ringBuffer.contents().bindMemory(to: MetalStrokePoint.self, capacity: allPoints.count)
for (pointIndex, point) in allPoints.enumerated() {
ringBufferContents[pointIndex] = point
}
commandEncoder?.setVertexBuffer(ringBuffer, offset: 0, index: 0)
commandEncoder?.setVertexBuffer(renderer.uniformBuffer, offset: 0, index: 1)
commandEncoder?.setVertexBuffer(renderer.transformBuffer, offset: 0, index: 2)
commandEncoder?.setFragmentTexture(self.stampTexture, index: 0)
commandEncoder?.drawPrimitives(type: .point, vertexStart: 0, vertexCount: pointsAlongPath.count)
// Signal to free current ring buffer
renderer.commandBuffer?.addCompletedHandler({ commandBuffer in
renderer.ringBufferSemaphore?.signal()
})
renderer?.commitCommandBufer()
}
Result: CPU usage recuced by 5% but the FPS do not change. Am I doing anything wrong here or just ring buffers do not help in this case?

How to apply a texture to a specific channel on a 3d obj model in Swift?

I'm kind of stuck right now when it comes to applying a specific texture on my 3d obj model.
Easiest solution of all would be to do let test = SCNScene(named: "models.scnassets/modelFolder/ModelName.obj"), but this requires that the mtl file maps the texture file directly inside of it which is not something that's possible with my current workflow.
With my current understanding, this leaves me with the option of using a scattering function to apply textures to a specific semantic, something like such :
if let url = URL(string: obj) {
let asset = MDLAsset(url: url)
guard let object = asset.object(at: 0) as? MDLMesh else {
print("Failed to get mesh from asset.")
self.presentAlert(title: "Warning", message: "Could not fetch the model.", firstBtn: "Ok")
return
}
// Create a material from the various textures with a scatteringFunction
let scatteringFunction = MDLScatteringFunction()
let material = MDLMaterial(name: "material", scatteringFunction: scatteringFunction)
let property = MDLMaterialProperty(name: "texture", semantic: .baseColor, url: URL(string: self.textureURL))
material.setProperty(property)
// Apply the texture to every submesh of the asset
object.submeshes?.forEach {
if let submesh = $0 as? MDLSubmesh {
submesh.material = material
}
}
// Wrap the ModelIO object in a SceneKit object
let node = SCNNode(mdlObject: object)
let scene = SCNScene()
scene.rootNode.addChildNode(node)
// Set up the SceneView
sceneView.scene = scene
...
}
The actual problem is the semantics. The 3d models are made on Unreal and for many models there's a png texture which has 3 semantics inside of it, namely Ambient Occlusion, Roughness and Metallic. Ambient Occlusion would need to be applied on the red channel, Roughness on the greed channel and Metallic on the blue channel.
How could I achieve this? An MdlMaterialSemantic has all of these possible semantics, but metallic, ambient occlusion and roughness are all separate. I tried simply applying the texture on each, but obviously this did not work very well.
Considering that my .png texture has all of those 3 "packaged" in it under a different channel, how can I work with this? I was thinking that maybe I could somehow use a small script to add mapping to the texture in the mtl file on my end in the app directly, but this seems sketchy lol..
What are my other options if there's no way of doing this? I've also been trying to use fbx files with assimpKit, but I couldn't manage to load any textures, just the model in black...
I am open to any suggestion, if more info is needed, please let me know! Thank you very much!
Sorry, I don't have enough rep to comment, but this might be more of a comment than an answer!
Have you tried loading the texture png image separately (as a NS/UI/CGImage) and then splitting it into three channels manually, then applying these channels separately? (Splitting into three separate channels is not as simple as it could be... but you could use this grayscale conversion for guidance, and just do one channel at a time.)
Once you have your objects in SceneKit, it is possibly slightly easier to modify these materials. Once you have a SCNNode with a SCNGeometry with a SCNMaterial you can access any of these materials and set the .contents property to almost anything (including a XXImage).
Edit:
Here's an extension you can try to extract the individual channels from a CGImage using Accelerate. You can get a CGImage from an NSImage/UIImage depending on whether you're on Mac or iOS (and you can load the file directly into one of those image formats).
I've just adapted the code from the link above, I am not very experienced with the Accelerate framework, so use at your own risk! But hopefully this puts you on the right path.
extension CGImage {
enum Channel {
case red, green, blue
}
func getChannel(channel: Channel) -> CGImage? {
// code adapted from https://developer.apple.com/documentation/accelerate/converting_color_images_to_grayscale
guard let format = vImage_CGImageFormat(cgImage: cgImage) else {return nil}
guard var sourceImageBuffer = try? vImage_Buffer(cgImage: cgImage, format: format) else {return nil}
guard var destinationBuffer = try? vImage_Buffer(width: Int(sourceImageBuffer.width), height: Int(sourceImageBuffer.height), bitsPerPixel: 8) else {return nil}
defer {
sourceImageBuffer.free()
destinationBuffer.free()
}
let redCoefficient: Float = channel == .red ? 1 : 0
let greenCoefficient: Float = channel == .green ? 1 : 0
let blueCoefficient: Float = channel == .blue ? 1 : 0
let divisor: Int32 = 0x1000
let fDivisor = Float(divisor)
var coefficientsMatrix = [
Int16(redCoefficient * fDivisor),
Int16(greenCoefficient * fDivisor),
Int16(blueCoefficient * fDivisor)
]
let preBias: [Int16] = [0, 0, 0, 0]
let postBias: Int32 = 0
vImageMatrixMultiply_ARGB8888ToPlanar8(&sourceImageBuffer,
&destinationBuffer,
&coefficientsMatrix,
divisor,
preBias,
postBias,
vImage_Flags(kvImageNoFlags))
guard let monoFormat = vImage_CGImageFormat(
bitsPerComponent: 8,
bitsPerPixel: 8,
colorSpace: CGColorSpaceCreateDeviceGray(),
bitmapInfo: CGBitmapInfo(rawValue: CGImageAlphaInfo.none.rawValue),
renderingIntent: .defaultIntent) else {return nil}
guard let result = try? destinationBuffer.createCGImage(format: monoFormat) else {return nil}
return result
}
}

MTKView compositing one UIImage at a time

I have an array of UIImages that I want to composite via an MTKView using a variety of specific comp modes (source-over, erase, etc). With the approach I describe below, I find that the biggest overhead seems to be in converting each UIImage into an MTLTexture that I can use to populate an MTKView's currentDrawable buffer.
The drawing loop looks like this:
for strokeDataCurrent in strokeDataArray {
let strokeImage = UIImage(data: strokeDataCurrent.image) // brushstroke
let strokeBbox = strokeDataCurrent.bbox // brush bounding box
let strokeType = strokeDataCurrent.strokeType // used to define comp mode
// convert strokeImage to a MTLTexture and composite
drawStrokeImage(paintingViewMetal: self.canvasMetalViewPainting, strokeImage: strokeImage!, strokeBbox: strokeBbox, strokeType: strokeType)
} // end of for strokeDataCurrent in strokeDataArray
Inside of drawStrokeImage, I convert each stroke to an MTLTexture like this:
guard let stampTexture = device!.makeTexture(descriptor: texDescriptor) else { return }
let destCGImage = strokeImage.cgImage!
let dstData: CFData = (destCGImage!.dataProvider!.data)!
let pixelData = CFDataGetBytePtr(dstData)
let region = MTLRegionMake2D(0, 0, Int(width), Int(height))
stampTexture.replace(region: region, mipmapLevel: 0, withBytes: pixelData!, bytesPerRow: Int(rowBytes))
with all this in place, I define a vertex buffer, set a commandEncoder:
defineCommandEncoder(renderCommandEncoder: renderCommandEncoder, vertexArrayStamps: vertexArrayStamps, metalTexture: stampTexture)
and call setNeedsDisplay() to render. This is happening for each stroke in the above for loop.
While I get ok performance in this approach, I'm wondering if I can squeeze more performance somewhere along the way? Like I said, I think the current bottleneck is in going from CGImage -> MTLTexture.
Note that I am rendering to a defined MTLTexture metalDrawableTextureComposite which I am blitting to the currentDrawable for each stroke:
copyTexture(buffer: commandBuffer!, from: metalDrawableTextureComposite, to: self.currentDrawable!.texture)
Hopefully this is enough detail to provide context for my question. Also, if anyone has ideas for other (GPU/Metal-based hopefully) faster compositing approaches that would be awesome. Any thoughts would be appreciated.

What can cause lag in recurrent calls to the draw() function of a MetalKit MTKView

I am designing a Cocoa application using the swift 4.0 MetalKit API for macOS 10.13. Everything I report here was done on my 2015 MBPro.
I have successfully implemented an MTKView which renders simple geometry with low vertex count very well (Cubes, triangles, etc.). I implemented a mouse-drag based camera which rotates, strafes and magnifies. Here is a screenshot of the xcode FPS debug screen while I rotate the cube:
However, when I try loading a dataset which contains only ~1500 vertices (which are each stored as 7 x 32bit Floats... ie: 42 kB total), I start getting a very bad lag in FPS. I will show the code implementation lower. Here is a screenshot (note that on this image, the view only encompasses a few of the vertices, which are rendered as large points) :
Here is my implementation:
1) viewDidLoad() :
override func viewDidLoad() {
super.viewDidLoad()
// Initialization of the projection matrix and camera
self.projectionMatrix = float4x4.makePerspectiveViewAngle(float4x4.degrees(toRad: 85.0),
aspectRatio: Float(self.view.bounds.size.width / self.view.bounds.size.height),
nearZ: 0.01, farZ: 100.0)
self.vCam = ViewCamera()
// Initialization of the MTLDevice
metalView.device = MTLCreateSystemDefaultDevice()
device = metalView.device
metalView.colorPixelFormat = .bgra8Unorm
// Initialization of the shader library
let defaultLibrary = device.makeDefaultLibrary()!
let fragmentProgram = defaultLibrary.makeFunction(name: "basic_fragment")
let vertexProgram = defaultLibrary.makeFunction(name: "basic_vertex")
// Initialization of the MTLRenderPipelineState
let pipelineStateDescriptor = MTLRenderPipelineDescriptor()
pipelineStateDescriptor.vertexFunction = vertexProgram
pipelineStateDescriptor.fragmentFunction = fragmentProgram
pipelineStateDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
pipelineState = try! device.makeRenderPipelineState(descriptor: pipelineStateDescriptor)
// Initialization of the MTLCommandQueue
commandQueue = device.makeCommandQueue()
// Initialization of Delegates and BufferProvider for View and Projection matrix MTLBuffer
self.metalView.delegate = self
self.metalView.eventDelegate = self
self.bufferProvider = BufferProvider(device: device, inflightBuffersCount: 3, sizeOfUniformsBuffer: MemoryLayout<Float>.size * float4x4.numberOfElements() * 2)
}
2) Loading of the MTLBuffer for the Cube vertices :
private func makeCubeVertexBuffer() {
let cube = Cube()
let vertices = cube.verticesArray
var vertexData = Array<Float>()
for vertex in vertices{
vertexData += vertex.floatBuffer()
}
VDataSize = vertexData.count * MemoryLayout.size(ofValue: vertexData[0])
self.vertexBuffer = device.makeBuffer(bytes: vertexData, length: VDataSize!, options: [])!
self.vertexCount = vertices.count
}
3) Loading of the MTLBuffer for the dataset vertices. Note that I explicitly declare the storage mode of this buffer as Private in order to ensure efficient access to the data by the GPU since the CPU does not need to access the data once the buffer is loaded. Also, note that I am loading only 1/100th of the vertices in my actual dataset because the entire OS on my machine starts lagging when I try to load it entirely (only 4.2 MB of data).
public func loadDataset(datasetVolume: DatasetVolume) {
// Load dataset vertices
self.datasetVolume = datasetVolume
self.datasetVertexCount = self.datasetVolume!.vertexCount/100
let rgbaVertices = self.datasetVolume!.rgbaPixelVolume[0...(self.datasetVertexCount!-1)]
var vertexData = Array<Float>()
for vertex in rgbaVertices{
vertexData += vertex.floatBuffer()
}
let dataSize = vertexData.count * MemoryLayout.size(ofValue: vertexData[0])
// Make two MTLBuffer's: One with Shared storage mode in which data is initially loaded, and a second one with Private storage mode
self.datasetVertexBuffer = device.makeBuffer(bytes: vertexData, length: dataSize, options: MTLResourceOptions.storageModeShared)
self.datasetVertexBufferGPU = device.makeBuffer(length: dataSize, options: MTLResourceOptions.storageModePrivate)
// Create a MTLCommandBuffer and blit the vertex data from the Shared MTLBuffer to the Private MTLBuffer
let commandBuffer = self.commandQueue.makeCommandBuffer()
let blitEncoder = commandBuffer!.makeBlitCommandEncoder()
blitEncoder!.copy(from: self.datasetVertexBuffer!, sourceOffset: 0, to: self.datasetVertexBufferGPU!, destinationOffset: 0, size: dataSize)
blitEncoder!.endEncoding()
commandBuffer!.commit()
// Clean up
self.datasetLoaded = true
self.datasetVertexBuffer = nil
}
4) Finally, here is the render loop. Again, this is using MetalKit.
func draw(in view: MTKView) {
render(view.currentDrawable)
}
private func render(_ drawable: CAMetalDrawable?) {
guard let drawable = drawable else { return }
// Make sure an MTLBuffer for the View and Projection matrices is available
_ = self.bufferProvider?.availableResourcesSemaphore.wait(timeout: DispatchTime.distantFuture)
// Initialize common RenderPassDescriptor
let renderPassDescriptor = MTLRenderPassDescriptor()
renderPassDescriptor.colorAttachments[0].texture = drawable.texture
renderPassDescriptor.colorAttachments[0].loadAction = .clear
renderPassDescriptor.colorAttachments[0].clearColor = Colors.White
renderPassDescriptor.colorAttachments[0].storeAction = .store
// Initialize a CommandBuffer and add a CompletedHandler to release an MTLBuffer from the BufferProvider once the GPU is done processing this command
let commandBuffer = self.commandQueue.makeCommandBuffer()
commandBuffer?.addCompletedHandler { (_) in
self.bufferProvider?.availableResourcesSemaphore.signal()
}
// Update the View matrix and obtain an MTLBuffer for it and the projection matrix
let camViewMatrix = self.vCam.getLookAtMatrix()
let uniformBuffer = bufferProvider?.nextUniformsBuffer(projectionMatrix: projectionMatrix, camViewMatrix: camViewMatrix)
// Initialize a MTLParallelRenderCommandEncoder
let parallelEncoder = commandBuffer?.makeParallelRenderCommandEncoder(descriptor: renderPassDescriptor)
// Create a CommandEncoder for the cube vertices if its data is loaded
if self.cubeLoaded == true {
let cubeRenderEncoder = parallelEncoder?.makeRenderCommandEncoder()
cubeRenderEncoder!.setCullMode(MTLCullMode.front)
cubeRenderEncoder!.setRenderPipelineState(pipelineState)
cubeRenderEncoder!.setTriangleFillMode(MTLTriangleFillMode.fill)
cubeRenderEncoder!.setVertexBuffer(self.cubeVertexBuffer, offset: 0, index: 0)
cubeRenderEncoder!.setVertexBuffer(uniformBuffer, offset: 0, index: 1)
cubeRenderEncoder!.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: vertexCount!, instanceCount: self.cubeVertexCount!/3)
cubeRenderEncoder!.endEncoding()
}
// Create a CommandEncoder for the dataset vertices if its data is loaded
if self.datasetLoaded == true {
let rgbaVolumeRenderEncoder = parallelEncoder?.makeRenderCommandEncoder()
rgbaVolumeRenderEncoder!.setRenderPipelineState(pipelineState)
rgbaVolumeRenderEncoder!.setVertexBuffer( self.datasetVertexBufferGPU!, offset: 0, index: 0)
rgbaVolumeRenderEncoder!.setVertexBuffer(uniformBuffer, offset: 0, index: 1)
rgbaVolumeRenderEncoder!.drawPrimitives(type: .point, vertexStart: 0, vertexCount: datasetVertexCount!, instanceCount: datasetVertexCount!)
rgbaVolumeRenderEncoder!.endEncoding()
}
// End CommandBuffer encoding and commit task
parallelEncoder!.endEncoding()
commandBuffer!.present(drawable)
commandBuffer!.commit()
}
Alright, so these are the steps I have been through in trying to figure out what was causing the lag, keeping in mind that the lagging effect is proportional to the size of the dataset's vertex buffer:
I initially though it was due to the GPU not being able to access the memory quickly enough because it was in Shared storage mode -> I changed the dataset MTLBuffer to Private storage mode. This did not solve the problem.
I then though that the problem was due to the CPU spending too much time in my render() function. This could possibly be due to a problem with the BufferProvider or maybe because somehow the CPU was trying to somehow reprocess/reload the dataset vertex buffer every frame -> In order to check this, I used the Time Profiler in xcode's Instruments. Unfortunately, it seems that the problem is that the application calls this render method (in other words, MTKView's draw() method) only very rarely. Here are some screenshots :
The spike at ~10 seconds is when the cube is loaded
The spikes between ~25-35 seconds are when the dataset is loaded
This image (^) shows the activity between ~10-20 seconds, right after the cube was loaded. This is when the FPS is at ~60. You can see that the main thread spends around 53ms in the render() function during these 10 seconds.
This image (^) shows the activity between ~40-50 seconds, right after the dataset was loaded. This is when the FPS is < 10. You can see that the main thread spends around 4ms in the render() function during these 10 seconds. As you can see, none of the methods which are usually called from within this function are called (ie: the ones we can see called when only the cube is loaded, previous image). Of note, when I load the dataset, the time profiler's timer starts to jump (ie: it stops for a few seconds and then jumps to the current time... repeat).
So this is where I am. The problem seems to be that the CPU somehow gets overloaded with these 42 kB of data... recursively. I also did a test with the Allocator in xcode's Instruments. No signs of memory leak, as far as I could tell (You might have noticed that a lot of this is new to me).
Sorry for the convoluted post, I hope it's not too hard to follow. Thank you all in advance for your help.
Edit:
Here are my shaders, in case you would like to see them:
struct VertexIn{
packed_float3 position;
packed_float4 color;
};
struct VertexOut{
float4 position [[position]];
float4 color;
float size [[point_size]];
};
struct Uniforms{
float4x4 cameraMatrix;
float4x4 projectionMatrix;
};
vertex VertexOut basic_vertex(const device VertexIn* vertex_array [[ buffer(0) ]],
constant Uniforms& uniforms [[ buffer(1) ]],
unsigned int vid [[ vertex_id ]]) {
float4x4 cam_Matrix = uniforms.cameraMatrix;
float4x4 proj_Matrix = uniforms.projectionMatrix;
VertexIn VertexIn = vertex_array[vid];
VertexOut VertexOut;
VertexOut.position = proj_Matrix * cam_Matrix * float4(VertexIn.position,1);
VertexOut.color = VertexIn.color;
VertexOut.size = 15;
return VertexOut;
}
fragment half4 basic_fragment(VertexOut interpolated [[stage_in]]) {
return half4(interpolated.color[0], interpolated.color[1], interpolated.color[2], interpolated.color[3]);
}
I think the main problem is that you're telling Metal to do instanced drawing when you shouldn't be. This line:
rgbaVolumeRenderEncoder!.drawPrimitives(type: .point, vertexStart: 0, vertexCount: datasetVertexCount!, instanceCount: datasetVertexCount!)
is telling Metal to draw datasetVertexCount! instances of each of datasetVertexCount! vertexes. The GPU work is growing with the square of the vertex count. Also, since you don't make use of the instance ID to, for example, tweak the vertex position, all of these instances are identical and thus redundant.
I think the same applies to this line:
cubeRenderEncoder!.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: vertexCount!, instanceCount: self.cubeVertexCount!/3)
although it's not clear what self.cubeVertexCount! is and whether it grows with vertexCount. In any case, since it seems you're using the same pipeline state and thus same shaders which don't make use of the instance ID, it's still useless and wasteful.
Other things:
Why are you using MTLParallelRenderCommandEncoder when you're not actually using the parallelism that it enables? Don't do that.
Everywhere you're using the size method of MemoryLayout, you should almost certainly be using stride instead. And if you're computing the stride of a compound data structure, do not take the stride of one element of that structure and multiply by the number of elements. Take the stride of the whole data structure.

Chunk Rendering in Metal

I'm trying to create a procedural game using Metal, and I'm using an octree based chunk approach for a Level of Detail implementation.
The method I'm using involves the CPU creating the octree nodes for the terrain, which then has its mesh created on the GPU using a compute shader. This mesh is stored in a vertex buffer and index buffer in the chunk object for rendering.
All of this seems to work fairly well, however when it comes to rendering chunks I'm hitting performance issues early on. Currently I gather an array of chunks to draw, then submit that to my renderer, that will create an MTLParallelRenderCommandEncoder to then create an MTLRenderCommandEncoder for each chunk, which is then submitted to the GPU.
By the looks of it around 50% of the CPU time is spent on creating the MTLRenderCommandEncoder for each chunk. Currently I'm just creating a simple 8 vertex cube mesh for each chunk, and I have an 4x4x4 array of chunks and I'm dropping to around 50fps in these early stages. (In reality it seems that there can only be up to 63 MTLRenderCommandEncoder in each MTLParallelRenderCommandEncoder so it's not fully 4x4x4)
I've read that the point of the MTLParallelRenderCommandEncoder is to create each MTLRenderCommandEncoder in a separate thread, yet I've not had much luck with getting this to work. Also multithreading it wouldn't get around the cap of 63 chunks being rendered as a max.
I feel that somehow consolidating the vertex and index buffers for each chunk into one or two larger buffers for submission would help, but I'm not sure how to do this without copious memcpy() calls and whether or not this would even improve efficiency.
Here's my code that takes in the array of nodes and draws them:
func drawNodes(nodes: [OctreeNode], inView view: AHMetalView){
// For control of several rotating buffers
dispatch_semaphore_wait(displaySemaphore, DISPATCH_TIME_FOREVER)
makeDepthTexture()
updateUniformsForView(view, duration: view.frameDuration)
let commandBuffer = commandQueue.commandBuffer()
let optDrawable = layer.nextDrawable()
guard let drawable = optDrawable else{
return
}
let passDescriptor = MTLRenderPassDescriptor()
passDescriptor.colorAttachments[0].texture = drawable.texture
passDescriptor.colorAttachments[0].clearColor = MTLClearColorMake(0.2, 0.2, 0.2, 1)
passDescriptor.colorAttachments[0].storeAction = .Store
passDescriptor.colorAttachments[0].loadAction = .Clear
passDescriptor.depthAttachment.texture = depthTexture
passDescriptor.depthAttachment.clearDepth = 1
passDescriptor.depthAttachment.loadAction = .Clear
passDescriptor.depthAttachment.storeAction = .Store
let parallelRenderPass = commandBuffer.parallelRenderCommandEncoderWithDescriptor(passDescriptor)
// Currently 63 nodes as a maximum
for node in nodes{
// This line is taking up around 50% of the CPU time
let renderPass = parallelRenderPass.renderCommandEncoder()
renderPass.setRenderPipelineState(renderPipelineState)
renderPass.setDepthStencilState(depthStencilState)
renderPass.setFrontFacingWinding(.CounterClockwise)
renderPass.setCullMode(.Back)
let uniformBufferOffset = sizeof(AHUniforms) * uniformBufferIndex
renderPass.setVertexBuffer(node.vertexBuffer, offset: 0, atIndex: 0)
renderPass.setVertexBuffer(uniformBuffer, offset: uniformBufferOffset, atIndex: 1)
renderPass.setTriangleFillMode(.Lines)
renderPass.drawIndexedPrimitives(.Triangle, indexCount: AHMaxIndicesPerChunk, indexType: AHIndexType, indexBuffer: node.indexBuffer, indexBufferOffset: 0)
renderPass.endEncoding()
}
parallelRenderPass.endEncoding()
commandBuffer.presentDrawable(drawable)
commandBuffer.addCompletedHandler { (commandBuffer) -> Void in
self.uniformBufferIndex = (self.uniformBufferIndex + 1) % AHInFlightBufferCount
dispatch_semaphore_signal(self.displaySemaphore)
}
commandBuffer.commit()
}
You note:
I've read that the point of the MTLParallelRenderCommandEncoder is to create each MTLRenderCommandEncoder in a separate thread...
And you're correct. What you're doing is sequentially creating, encoding with, and ending command encoders — there's nothing parallel going on here, so MTLParallelRenderCommandEncoder is doing nothing for you. You'd have roughly the same performance if you eliminated the parallel encoder and just created encoders with renderCommandEncoderWithDescriptor(_:) on each pass through your for loop... which is to say, you'd still have the same performance problem due to the overhead of creating all those encoders.
So, if you're going to encode sequentially, just reuse the same encoder. Also, you should reuse as much of your other shared state as possible. Here's a quick pass at a possible refactoring (untested):
let passDescriptor = MTLRenderPassDescriptor()
// call this once before your render loop
func setup() {
makeDepthTexture()
passDescriptor.colorAttachments[0].clearColor = MTLClearColorMake(0.2, 0.2, 0.2, 1)
passDescriptor.colorAttachments[0].storeAction = .Store
passDescriptor.colorAttachments[0].loadAction = .Clear
passDescriptor.depthAttachment.texture = depthTexture
passDescriptor.depthAttachment.clearDepth = 1
passDescriptor.depthAttachment.loadAction = .Clear
passDescriptor.depthAttachment.storeAction = .Store
// set up render pipeline state and depthStencil state
}
func drawNodes(nodes: [OctreeNode], inView view: AHMetalView) {
updateUniformsForView(view, duration: view.frameDuration)
// Set up completed handler ahead of time
let commandBuffer = commandQueue.commandBuffer()
commandBuffer.addCompletedHandler { _ in // unused parameter
self.uniformBufferIndex = (self.uniformBufferIndex + 1) % AHInFlightBufferCount
dispatch_semaphore_signal(self.displaySemaphore)
}
// Semaphore should be tied to drawable acquisition
dispatch_semaphore_wait(displaySemaphore, DISPATCH_TIME_FOREVER)
guard let drawable = layer.nextDrawable()
else { return }
// Set up the one part of the pass descriptor that changes per-frame
passDescriptor.colorAttachments[0].texture = drawable.texture
// Get one render pass descriptor and reuse it
let renderPass = commandBuffer.renderCommandEncoderWithDescriptor(passDescriptor)
renderPass.setTriangleFillMode(.Lines)
renderPass.setRenderPipelineState(renderPipelineState)
renderPass.setDepthStencilState(depthStencilState)
for node in nodes {
// Update offsets and draw
let uniformBufferOffset = sizeof(AHUniforms) * uniformBufferIndex
renderPass.setVertexBuffer(node.vertexBuffer, offset: 0, atIndex: 0)
renderPass.setVertexBuffer(uniformBuffer, offset: uniformBufferOffset, atIndex: 1)
renderPass.drawIndexedPrimitives(.Triangle, indexCount: AHMaxIndicesPerChunk, indexType: AHIndexType, indexBuffer: node.indexBuffer, indexBufferOffset: 0)
}
renderPass.endEncoding()
commandBuffer.presentDrawable(drawable)
commandBuffer.commit()
}
Then, profile with Instruments to see what, if any, further performance issues you might have. There's a great WWDC 2015 session about that showing several of the common "gotchas", how to diagnose them in profiling, and how to fix them.