Resulting MTLTexture lighter than CGImage - swift

I have a kernel function which must convert Y and CbCr textures created from a pixel buffer (ARFrame.capturedImage) to an RGB texture, as in Apple's guide https://developer.apple.com/documentation/arkit/displaying_an_ar_experience_with_metal
But I get an overly bright texture:
kernel void renderTexture(texture2d<float, access::sample> capturedImageTextureY [[ texture(0) ]],
                          texture2d<float, access::sample> capturedImageTextureCbCr [[ texture(1) ]],
                          texture2d<float, access::read_write> outTexture [[ texture(2) ]],
                          uint2 size [[ threads_per_grid ]],
                          uint2 pid [[ thread_position_in_grid ]]) {
    constexpr sampler colorSampler(mip_filter::linear,
                                   mag_filter::linear,
                                   min_filter::linear);

    const float4x4 ycbcrToRGBTransform = float4x4(
        float4(+1.0000f, +1.0000f, +1.0000f, +0.0000f),
        float4(+0.0000f, -0.3441f, +1.7720f, +0.0000f),
        float4(+1.4020f, -0.7141f, +0.0000f, +0.0000f),
        float4(-0.7010f, +0.5291f, -0.8860f, +1.0000f)
    );

    float2 texCoord;
    texCoord.x = float(pid.x) / size.x;
    texCoord.y = float(pid.y) / size.y;

    // Sample Y and CbCr textures to get the YCbCr color at the given texture coordinate
    float4 ycbcr = float4(capturedImageTextureY.sample(colorSampler, texCoord).r,
                          capturedImageTextureCbCr.sample(colorSampler, texCoord).rg, 1.0);

    float4 color = ycbcrToRGBTransform * ycbcr;
    outTexture.write(color, pid);
}
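An aside on the dispatch (my note, not part of the original question): because the kernel derives texCoord from threads_per_grid, the grid must match the output texture's dimensions exactly, or the sampling coordinates will be skewed. A minimal Swift sketch, assuming computeEncoder is the MTLComputeCommandEncoder used for this kernel and outTexture is the texture created below:
// One thread per output pixel, so that threads_per_grid == texture size.
// Note: dispatchThreads requires a GPU with non-uniform threadgroup support;
// otherwise round up with dispatchThreadgroups and bounds-check in the kernel.
let threadsPerGrid = MTLSize(width: outTexture.width, height: outTexture.height, depth: 1)
let threadsPerGroup = MTLSize(width: 8, height: 8, depth: 1)
computeEncoder.dispatchThreads(threadsPerGrid, threadsPerThreadgroup: threadsPerGroup)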
I create a CGImage with this code:
var cgImage: CGImage?
VTCreateCGImageFromCVPixelBuffer(pixelBuffer, options: nil, imageOut: &cgImage)
The cgImage has normal lighting.
When I try to create a texture from the cgImage with MTKTextureLoader, I get an overly bright texture too.
How can I get an MTLTexture with normal lighting, like the cgImage?
cgImage (expected result): [screenshot omitted]
kernel func result: [screenshot omitted]
I create the output texture with this code:
let descriptor = MTLTextureDescriptor()
descriptor.width = Int(Self.maxTextureSize.width)
descriptor.height = Int(Self.maxTextureSize.height)
// pixelFormat is left at its default here (.rgba8Unorm)
descriptor.usage = [.shaderWrite, .shaderRead]
let texture = MTLCreateSystemDefaultDevice()?.makeTexture(descriptor: descriptor)
and write the pixels with the kernel function.
I have already tried different pixelFormats on the MTLTextureDescriptor.
textureLoader:
let textureLoader = MTKTextureLoader(device: MTLCreateSystemDefaultDevice()!)
let texture = try! textureLoader.newTexture(cgImage: cgImage!, options: [.SRGB: false as NSNumber])
I have already tried different MTKTextureLoader.Options.
GitHub project demonstrating issue: PixelBufferToMTLTexture

The problem was solved (thanks, 0xBFE1A8) by adding gamma correction: replacing
outTexture.write(color, pid);
with
outTexture.write(float4(pow(color.rgb, float3(2, 2, 2)), color.a), pid);
(Raising the color to the power of 2 approximates sRGB's ≈2.2 decode gamma.)

“when I try to create a texture from cgImage with MTKTextureLoader I get an overly bright texture”
It's because Metal applies gamma correction to your texture.
MTKTextureLoader has an SRGB key that is used to specify whether the texture data is stored as sRGB image data.
If the value is false, the image data is treated as linear pixel data.
If the value is true, the image data is treated as sRGB pixel data. If
this key is not specified and the image being loaded has been
gamma-corrected, the image data uses the specified sRGB information.
let path = Bundle.main.path(forResource: "yourTexture", ofType: "png")!
let data = try! Data(contentsOf: URL(fileURLWithPath: path))
let texture = try! textureLoader.newTexture(data: data, options: [.SRGB: false as NSNumber])
You can also solve this problem by adding a gamma correcting equation to your shader.
Linear to sRGB and vice versa:
// sRGB -> linear
rgb = mix(rgb * 0.0774, pow(rgb * 0.9479 + 0.05213, 2.4), step(0.04045, rgb));
// linear -> sRGB
rgb = mix(rgb * 12.92, pow(rgb, 0.4167) * 1.055 - 0.055, step(0.00313, rgb));

Related

How to apply a texture to a specific channel on a 3d obj model in Swift?

I'm kind of stuck right now when it comes to applying a specific texture on my 3d obj model.
Easiest solution of all would be to do let test = SCNScene(named: "models.scnassets/modelFolder/ModelName.obj"), but this requires that the mtl file maps the texture file directly inside of it, which is not possible with my current workflow.
With my current understanding, this leaves me with the option of using a scattering function to apply textures to a specific semantic, something like this:
if let url = URL(string: obj) {
    let asset = MDLAsset(url: url)
    guard let object = asset.object(at: 0) as? MDLMesh else {
        print("Failed to get mesh from asset.")
        self.presentAlert(title: "Warning", message: "Could not fetch the model.", firstBtn: "Ok")
        return
    }

    // Create a material from the various textures with a scatteringFunction
    let scatteringFunction = MDLScatteringFunction()
    let material = MDLMaterial(name: "material", scatteringFunction: scatteringFunction)
    let property = MDLMaterialProperty(name: "texture", semantic: .baseColor, url: URL(string: self.textureURL))
    material.setProperty(property)

    // Apply the texture to every submesh of the asset
    object.submeshes?.forEach {
        if let submesh = $0 as? MDLSubmesh {
            submesh.material = material
        }
    }

    // Wrap the ModelIO object in a SceneKit object
    let node = SCNNode(mdlObject: object)
    let scene = SCNScene()
    scene.rootNode.addChildNode(node)

    // Set up the SceneView
    sceneView.scene = scene
    ...
}
The actual problem is the semantics. The 3d models are made in Unreal, and for many models there's a png texture which has 3 semantics packed inside of it, namely Ambient Occlusion, Roughness and Metallic. Ambient Occlusion would need to be applied on the red channel, Roughness on the green channel and Metallic on the blue channel.
How could I achieve this? An MDLMaterialSemantic has all of these possible semantics, but metallic, ambient occlusion and roughness are all separate. I tried simply applying the texture to each, but obviously this did not work very well.
Considering that my .png texture has all of those 3 "packaged" in it under a different channel, how can I work with this? I was thinking that maybe I could somehow use a small script to add mapping to the texture in the mtl file on my end in the app directly, but this seems sketchy lol..
What are my other options if there's no way of doing this? I've also been trying to use fbx files with assimpKit, but I couldn't manage to load any textures, just the model in black...
I am open to any suggestion, if more info is needed, please let me know! Thank you very much!
Sorry, I don't have enough rep to comment, but this might be more of a comment than an answer!
Have you tried loading the texture png image separately (as a NS/UI/CGImage) and then splitting it into three channels manually, then applying these channels separately? (Splitting into three separate channels is not as simple as it could be... but you could use this grayscale conversion for guidance, and just do one channel at a time.)
Once you have your objects in SceneKit, it is possibly slightly easier to modify these materials. Once you have a SCNNode with a SCNGeometry with a SCNMaterial you can access any of these materials and set the .contents property to almost anything (including a XXImage).
Edit:
Here's an extension you can try to extract the individual channels from a CGImage using Accelerate. You can get a CGImage from an NSImage/UIImage depending on whether you're on Mac or iOS (and you can load the file directly into one of those image formats).
I've just adapted the code from the link above; I am not very experienced with the Accelerate framework, so use at your own risk! But hopefully this puts you on the right path.
import Accelerate

extension CGImage {
    enum Channel {
        case red, green, blue
    }

    func getChannel(channel: Channel) -> CGImage? {
        // code adapted from https://developer.apple.com/documentation/accelerate/converting_color_images_to_grayscale
        guard let format = vImage_CGImageFormat(cgImage: self) else { return nil }
        guard var sourceImageBuffer = try? vImage_Buffer(cgImage: self, format: format) else { return nil }
        guard var destinationBuffer = try? vImage_Buffer(width: Int(sourceImageBuffer.width),
                                                         height: Int(sourceImageBuffer.height),
                                                         bitsPerPixel: 8) else { return nil }
        defer {
            sourceImageBuffer.free()
            destinationBuffer.free()
        }

        // A coefficient of 1 for the requested channel and 0 for the others
        // reduces the matrix multiply to a channel extraction.
        let redCoefficient: Float = channel == .red ? 1 : 0
        let greenCoefficient: Float = channel == .green ? 1 : 0
        let blueCoefficient: Float = channel == .blue ? 1 : 0

        let divisor: Int32 = 0x1000
        let fDivisor = Float(divisor)
        var coefficientsMatrix = [
            Int16(redCoefficient * fDivisor),
            Int16(greenCoefficient * fDivisor),
            Int16(blueCoefficient * fDivisor)
        ]
        let preBias: [Int16] = [0, 0, 0, 0]
        let postBias: Int32 = 0

        vImageMatrixMultiply_ARGB8888ToPlanar8(&sourceImageBuffer,
                                               &destinationBuffer,
                                               &coefficientsMatrix,
                                               divisor,
                                               preBias,
                                               postBias,
                                               vImage_Flags(kvImageNoFlags))

        guard let monoFormat = vImage_CGImageFormat(
            bitsPerComponent: 8,
            bitsPerPixel: 8,
            colorSpace: CGColorSpaceCreateDeviceGray(),
            bitmapInfo: CGBitmapInfo(rawValue: CGImageAlphaInfo.none.rawValue),
            renderingIntent: .defaultIntent) else { return nil }
        return try? destinationBuffer.createCGImage(format: monoFormat)
    }
}
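A hypothetical usage sketch, wiring the extracted channels into the SceneKit material (the image name is a stand-in, and node is assumed to be the SCNNode from the question's code):
// Split the packed texture and assign one grayscale image per semantic.
if let packed = UIImage(named: "orm_packed")?.cgImage,
   let material = node.geometry?.firstMaterial {
    material.ambientOcclusion.contents = packed.getChannel(channel: .red)  // AO lives in red
    material.roughness.contents = packed.getChannel(channel: .green)      // roughness in green
    material.metalness.contents = packed.getChannel(channel: .blue)       // metallic in blue
}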

iPad Pro Lidar - Export Geometry & Texture

I would like to be able to export a mesh and texture from the iPad Pro Lidar.
There are examples here of how to export a mesh, but I'd like to be able to export the environment texture too:
ARKit 3.5 – How to export OBJ from new iPad Pro with LiDAR?
ARMeshGeometry stores the vertices for the mesh, would it be the case that one would have to 'record' the textures as one scans the environment, and manually apply them?
This post seems to show a way to get texture co-ordinates, but I can't see a way to do that with the ARMeshGeometry: Save ARFaceGeometry to OBJ file
Any point in the right direction, or things to look at greatly appreciated!
Chris
You need to compute the texture coordinates for each vertex, apply them to the mesh and supply a texture as a material to the mesh.
let geom = meshAnchor.geometry
// geom.vertices is an ARGeometrySource; read it into an array of
// SIMD3<Float> before mapping over it (see the helper sketch below).
let vertices = geom.vertices
let size = arFrame.camera.imageResolution
let camera = arFrame.camera
let modelMatrix = meshAnchor.transform

let textureCoordinates = vertices.map { vertex -> vector_float2 in
    let vertex4 = vector_float4(vertex.x, vertex.y, vertex.z, 1)
    let world_vertex4 = simd_mul(modelMatrix, vertex4)
    let world_vector3 = simd_float3(x: world_vertex4.x, y: world_vertex4.y, z: world_vertex4.z)
    let pt = camera.projectPoint(world_vector3,
                                 orientation: .portrait,
                                 viewportSize: CGSize(width: CGFloat(size.height),
                                                      height: CGFloat(size.width)))
    let v = 1.0 - Float(pt.x) / Float(size.height)
    let u = Float(pt.y) / Float(size.width)
    return vector_float2(u, v)
}

// construct your vertices, normals and faces from the source geometry
// directly and supply the computed texture coords (wrapped in an
// SCNGeometrySource) to create new geometry, and then apply the texture
let scnGeometry = SCNGeometry(sources: [verticesSource, textureCoordinatesSource, normalsSource], elements: [facesSource])

// UIImage(pixelBuffer:) is a convenience initializer from an extension, not a UIKit API
let texture = UIImage(pixelBuffer: frame.capturedImage)
let imageMaterial = SCNMaterial()
imageMaterial.isDoubleSided = false
imageMaterial.diffuse.contents = texture
scnGeometry.materials = [imageMaterial]
let pcNode = SCNNode(geometry: scnGeometry)
pcNode, if added to your scene, will contain the mesh with the texture applied.
Texture coordinates computation from here
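Since ARGeometrySource doesn't expose its vertices as a Swift array, here is a minimal helper sketch (my addition, assuming the source holds packed float3 vertices, as ARMeshGeometry.vertices does); the map above would then run over geom.vertices.asSIMD3Array():
import ARKit

extension ARGeometrySource {
    // Reads each packed float3 vertex out of the source's Metal buffer.
    func asSIMD3Array() -> [SIMD3<Float>] {
        let base = buffer.contents().advanced(by: offset)
        return (0..<count).map { index in
            let p = base.advanced(by: index * stride).assumingMemoryBound(to: Float.self)
            return SIMD3<Float>(p[0], p[1], p[2])
        }
    }
}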
Check out my answer over here
It's a description of this project: MetalWorldTextureScan which demonstrates how to scan your environment and create a textured mesh using ARKit and Metal.

Applying compute/kernel function to vertex buffer before vertex shader

I would like to use a compute shader to modify my vertices before they are passed to the vertex shader. I can’t find any examples or explanations of this, except that it seems to be mentioned here: Metal emulate geometry shaders using compute shaders. This doesn’t help me as it doesn’t explain the CPU part of it.
I have seen many examples where a texture buffer is read and written to in a compute shader, but I need to read and modify the vertex buffer, which contains custom vertex structs with normals, and is created by a MDLMesh. I would be forever grateful for some sample code!
BACKGROUND
What I actually want to achieve is really to be able to modify the vertex normals on the GPU. The other option would be if I could access the entire triangle from the vertex shader, like in the linked answer. For some reason I can only access a single vertex, using the stage_in attribute. Using the entire buffer does not work for me in this particular case; this is probably related to using a mesh provided by Model I/O and MDLMesh. When I create the vertices manually I am able to access the vertex buffer array. Having said that, with that solution I would have to calculate the new vertex normal vector three times for each triangle, which seems wasteful, and in any case I want to be able to apply compute shaders to the vertex buffer!
Thanks to Ken Thomases' comments, I managed to find a solution. He made me realise it is quite straightforward:
I'm using a vertex struct that looks like this:
// Metal side
struct Vertex {
    float4 position;
    float4 normal;
    float4 color;
};

// Swift side
struct Vertex {
    var position: float4
    var normal: float4
    var color: float4
}
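The two structs must agree byte for byte. A cheap sanity check on the Swift side (my addition, not part of the original solution):
// Three float4 fields at 16 bytes each, with no padding: 48 bytes total.
assert(MemoryLayout<Vertex>.stride == 48,
       "Swift Vertex no longer matches the Metal-side struct layout")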
During setup where I usually create a vertex buffer, index buffer and render pipeline state, I now also make a compute pipeline state:
// Vertex buffer
let dataSize = vertexData.count*MemoryLayout<Vertex>.stride
vertexBuffer = device.makeBuffer(bytes: vertexData, length: dataSize, options: [])!
// Index buffer
indexCount = indices.count
let indexSize = indexCount*MemoryLayout<UInt16>.stride
indexBuffer = device.makeBuffer(bytes: indices, length: indexSize, options: [])!
// Compute pipeline state
let adjustmentFunction = library.makeFunction(name: "adjustment_func")!
cps = try! device.makeComputePipelineState(function: adjustmentFunction)
// Render pipeline state
let rpld = MTLRenderPipelineDescriptor()
rpld.vertexFunction = library.makeFunction(name: "vertex_func")
rpld.fragmentFunction = library.makeFunction(name: "fragment_func")
rpld.colorAttachments[0].pixelFormat = .bgra8Unorm
rps = try! device.makeRenderPipelineState(descriptor: rpld)
commandQueue = device.makeCommandQueue()!
Then my render function looks like this:
let black = MTLClearColor(red: 0, green: 0, blue: 0, alpha: 1)
rpd.colorAttachments[0].texture = drawable.texture
rpd.colorAttachments[0].clearColor = black
rpd.colorAttachments[0].loadAction = .clear
let commandBuffer = commandQueue.makeCommandBuffer()!
let computeCommandEncoder = commandBuffer.makeComputeCommandEncoder()!
computeCommandEncoder.setComputePipelineState(cps)
computeCommandEncoder.setBuffer(vertexBuffer, offset: 0, index: 0)
computeCommandEncoder.dispatchThreadgroups(MTLSize(width: meshSize*meshSize, height: 1, depth: 1), threadsPerThreadgroup: MTLSize(width: 4, height: 1, depth: 1))
computeCommandEncoder.endEncoding()
let renderCommandEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: rpd)!
renderCommandEncoder.setRenderPipelineState(rps)
renderCommandEncoder.setFrontFacing(.counterClockwise)
renderCommandEncoder.setCullMode(.back)
updateUniforms(aspect: Float(size.width/size.height))
renderCommandEncoder.setVertexBuffer(vertexBuffer, offset: 0, index: 0)
renderCommandEncoder.setVertexBuffer(uniformBuffer, offset: 0, index: 1)
renderCommandEncoder.setFragmentBuffer(uniformBuffer, offset: 0, index: 1)
renderCommandEncoder.drawIndexedPrimitives(type: .triangle, indexCount: indexCount, indexType: .uint16, indexBuffer: indexBuffer, indexBufferOffset: 0)
renderCommandEncoder.endEncoding()
commandBuffer.present(drawable)
commandBuffer.commit()
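A caveat on the dispatch above (my note, not from the original solution): meshSize*meshSize threadgroups of width 4 only work if that product covers the vertex count exactly. A more general sketch, assuming vertexCount holds the number of vertices in vertexBuffer, rounds up to whole threadgroups and lets the kernel bounds-check:
// Cover every vertex with whole threadgroups of the pipeline's preferred width.
let width = cps.threadExecutionWidth
let threadsPerGroup = MTLSize(width: width, height: 1, depth: 1)
let groupCount = MTLSize(width: (vertexCount + width - 1) / width, height: 1, depth: 1)
computeCommandEncoder.dispatchThreadgroups(groupCount, threadsPerThreadgroup: threadsPerGroup)
// The kernel should then skip thread IDs >= vertexCount.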
Finally my compute shader looks like this:
kernel void adjustment_func(device Vertex *vertices [[buffer(0)]],
                            uint2 gid [[thread_position_in_grid]]) {
    // The buffer must not be const, since the kernel writes to it.
    // `function` stands for whatever adjustment you want to apply.
    vertices[gid.x].position = function(vertices[gid.x].position.xyz);
}
and this is the signature of my vertex function:
vertex VertexOut vertex_func(const device Vertex *vertices [[buffer(0)]], uint i [[vertex_id]], constant Uniforms &uniforms [[buffer(1)]])

MTKView vertex transparency is not getting picked up in "additive" blending mode

I am trying to implement a metal-backed drawing application where brushstrokes are drawn on an MTKView by stamping a textured square repeatedly along a path. I am varying the stamp's color/transparency at the vertex level as the brushstroke is drawn so I can simulate ink effects such as color/transparency fading over time, etc. This seems to work ok when I am using a classic "over" type blending (which does not accumulate value over time), but when I use "additive" blending, vertex transparency is completely ignored (i.e. I only get texture transparency). Below are snippets of pertinent code:
First, my vertex program:
vertex VertexOut basic_vertex(const device VertexIn* vertex_array [[ buffer(0) ]],
                              unsigned int vid [[ vertex_id ]]) {
    VertexIn VertexIn = vertex_array[vid];
    VertexOut VertexOut;
    VertexOut.position = float4(VertexIn.position, 1);
    VertexOut.color = VertexIn.color;
    VertexOut.texCoord = VertexIn.texCoord;
    return VertexOut;
}
Next, my fragment program multiplies the stamp's texture (with alpha) by the vertex color (also with alpha), which is needed for gradual tinting or fading of each stamp across a brushstroke:
fragment float4 basic_fragment(VertexOut interpolated [[stage_in]],
                               texture2d<float> tex2D [[ texture(0) ]],
                               sampler sampler2D [[ sampler(0) ]]) {
    // texture color multiplied by vertex color
    float4 color = interpolated.color * tex2D.sample(sampler2D, interpolated.texCoord);
    return color;
}
Next, below are the blending definitions:
// 5a. Define render pipeline settings
let renderPipelineDescriptor = MTLRenderPipelineDescriptor()
renderPipelineDescriptor.vertexFunction = vertexProgram
renderPipelineDescriptor.sampleCount = self.sampleCount
renderPipelineDescriptor.colorAttachments[0].pixelFormat = self.colorPixelFormat
renderPipelineDescriptor.colorAttachments[0].isBlendingEnabled = true
renderPipelineDescriptor.colorAttachments[0].alphaBlendOperation = .add

// settings for additive blending
if drawColorBlendMode == colorBlendMode.compositeAdd {
    renderPipelineDescriptor.colorAttachments[0].sourceRGBBlendFactor = .one
    renderPipelineDescriptor.colorAttachments[0].destinationRGBBlendFactor = .one
    renderPipelineDescriptor.colorAttachments[0].sourceAlphaBlendFactor = .one
    renderPipelineDescriptor.colorAttachments[0].destinationAlphaBlendFactor = .one
}

// settings for classic 'over' blending
if drawColorBlendMode == colorBlendMode.compositeOver {
    renderPipelineDescriptor.colorAttachments[0].sourceRGBBlendFactor = .sourceAlpha
    renderPipelineDescriptor.colorAttachments[0].destinationRGBBlendFactor = .oneMinusSourceAlpha
    renderPipelineDescriptor.colorAttachments[0].sourceAlphaBlendFactor = .sourceAlpha
    renderPipelineDescriptor.colorAttachments[0].destinationAlphaBlendFactor = .oneMinusSourceAlpha
}

renderPipelineDescriptor.fragmentFunction = fragmentProgram
Finally, my render encoding:
brushTexture = MetalTexture(resourceName: "stamp_stipple1_0256", ext: "png", mipmaped: true)
brushTexture.loadTexture(device: device!, commandQ: commandQueue, flip: true)
renderCommandEncoder?.setRenderPipelineState(renderPipeline!)
renderCommandEncoder?.setVertexBuffer(vertexBuffer, offset: 0, index: 0)
renderCommandEncoder?.setFragmentTexture(brushTexture.texture, index: 0)
renderCommandEncoder?.setFragmentSamplerState(samplerState, index: 0)
Is there anything I'm missing? As stated earlier, this works as expected in "over" mode, but not in "additive" mode. Again, the desired effect is to pass varying color/transparency settings to each stamp (a pair of textured triangles).
Through trial and error, I arrived at the following settings to get what I was after:
// Settings for compositeOver
renderPipelineDescriptor.colorAttachments[0].sourceRGBBlendFactor = .one
renderPipelineDescriptor.colorAttachments[0].destinationRGBBlendFactor = .oneMinusSourceAlpha
renderPipelineDescriptor.colorAttachments[0].sourceAlphaBlendFactor = .one
renderPipelineDescriptor.colorAttachments[0].destinationAlphaBlendFactor = .oneMinusSourceAlpha
Also, because I was dealing with many overlapping stamps, I had to divide the color/alpha values by the number of overlaps in order to avoid over-saturation. I think this, more than anything, was the reason I was not seeing color/alpha accumulation the way I expected. Note that these blend factors expect premultiplied alpha, which is why the stamp color below is multiplied by its alpha:
stampColor = UIColor(red: (rgba.red * rgba.alpha / numOverlappingStamps),
                     green: (rgba.green * rgba.alpha / numOverlappingStamps),
                     blue: (rgba.blue * rgba.alpha / numOverlappingStamps),
                     alpha: (rgba.alpha / numOverlappingStamps))

What can cause lag in recurrent calls to the draw() function of a MetalKit MTKView

I am designing a Cocoa application using the Swift 4.0 MetalKit API for macOS 10.13. Everything I report here was done on my 2015 MBPro.
I have successfully implemented an MTKView which renders simple geometry with a low vertex count very well (cubes, triangles, etc.). I implemented a mouse-drag based camera which rotates, strafes and magnifies. Here is a screenshot of the Xcode FPS debug screen while I rotate the cube:
However, when I try loading a dataset which contains only ~1500 vertices (each stored as 7 x 32-bit Floats... ie: 42 kB total), I start getting very bad FPS lag. I will show the code implementation below. Here is a screenshot (note that in this image, the view only encompasses a few of the vertices, which are rendered as large points):
Here is my implementation:
1) viewDidLoad() :
override func viewDidLoad() {
    super.viewDidLoad()

    // Initialization of the projection matrix and camera
    self.projectionMatrix = float4x4.makePerspectiveViewAngle(float4x4.degrees(toRad: 85.0),
                                                              aspectRatio: Float(self.view.bounds.size.width / self.view.bounds.size.height),
                                                              nearZ: 0.01, farZ: 100.0)
    self.vCam = ViewCamera()

    // Initialization of the MTLDevice
    metalView.device = MTLCreateSystemDefaultDevice()
    device = metalView.device
    metalView.colorPixelFormat = .bgra8Unorm

    // Initialization of the shader library
    let defaultLibrary = device.makeDefaultLibrary()!
    let fragmentProgram = defaultLibrary.makeFunction(name: "basic_fragment")
    let vertexProgram = defaultLibrary.makeFunction(name: "basic_vertex")

    // Initialization of the MTLRenderPipelineState
    let pipelineStateDescriptor = MTLRenderPipelineDescriptor()
    pipelineStateDescriptor.vertexFunction = vertexProgram
    pipelineStateDescriptor.fragmentFunction = fragmentProgram
    pipelineStateDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
    pipelineState = try! device.makeRenderPipelineState(descriptor: pipelineStateDescriptor)

    // Initialization of the MTLCommandQueue
    commandQueue = device.makeCommandQueue()

    // Initialization of Delegates and BufferProvider for View and Projection matrix MTLBuffer
    self.metalView.delegate = self
    self.metalView.eventDelegate = self
    self.bufferProvider = BufferProvider(device: device, inflightBuffersCount: 3, sizeOfUniformsBuffer: MemoryLayout<Float>.size * float4x4.numberOfElements() * 2)
}
2) Loading of the MTLBuffer for the Cube vertices :
private func makeCubeVertexBuffer() {
    let cube = Cube()
    let vertices = cube.verticesArray
    var vertexData = Array<Float>()
    for vertex in vertices {
        vertexData += vertex.floatBuffer()
    }
    VDataSize = vertexData.count * MemoryLayout.size(ofValue: vertexData[0])
    self.vertexBuffer = device.makeBuffer(bytes: vertexData, length: VDataSize!, options: [])!
    self.vertexCount = vertices.count
}
3) Loading of the MTLBuffer for the dataset vertices. Note that I explicitly declare the storage mode of this buffer as Private in order to ensure efficient access to the data by the GPU since the CPU does not need to access the data once the buffer is loaded. Also, note that I am loading only 1/100th of the vertices in my actual dataset because the entire OS on my machine starts lagging when I try to load it entirely (only 4.2 MB of data).
public func loadDataset(datasetVolume: DatasetVolume) {
    // Load dataset vertices
    self.datasetVolume = datasetVolume
    self.datasetVertexCount = self.datasetVolume!.vertexCount/100
    let rgbaVertices = self.datasetVolume!.rgbaPixelVolume[0...(self.datasetVertexCount!-1)]
    var vertexData = Array<Float>()
    for vertex in rgbaVertices {
        vertexData += vertex.floatBuffer()
    }
    let dataSize = vertexData.count * MemoryLayout.size(ofValue: vertexData[0])

    // Make two MTLBuffers: one with Shared storage mode in which the data is
    // initially loaded, and a second one with Private storage mode
    self.datasetVertexBuffer = device.makeBuffer(bytes: vertexData, length: dataSize, options: MTLResourceOptions.storageModeShared)
    self.datasetVertexBufferGPU = device.makeBuffer(length: dataSize, options: MTLResourceOptions.storageModePrivate)

    // Create a MTLCommandBuffer and blit the vertex data from the Shared MTLBuffer to the Private MTLBuffer
    let commandBuffer = self.commandQueue.makeCommandBuffer()
    let blitEncoder = commandBuffer!.makeBlitCommandEncoder()
    blitEncoder!.copy(from: self.datasetVertexBuffer!, sourceOffset: 0, to: self.datasetVertexBufferGPU!, destinationOffset: 0, size: dataSize)
    blitEncoder!.endEncoding()
    commandBuffer!.commit()

    // Clean up
    self.datasetLoaded = true
    self.datasetVertexBuffer = nil
}
4) Finally, here is the render loop. Again, this is using MetalKit.
func draw(in view: MTKView) {
    render(view.currentDrawable)
}

private func render(_ drawable: CAMetalDrawable?) {
    guard let drawable = drawable else { return }

    // Make sure an MTLBuffer for the View and Projection matrices is available
    _ = self.bufferProvider?.availableResourcesSemaphore.wait(timeout: DispatchTime.distantFuture)

    // Initialize common RenderPassDescriptor
    let renderPassDescriptor = MTLRenderPassDescriptor()
    renderPassDescriptor.colorAttachments[0].texture = drawable.texture
    renderPassDescriptor.colorAttachments[0].loadAction = .clear
    renderPassDescriptor.colorAttachments[0].clearColor = Colors.White
    renderPassDescriptor.colorAttachments[0].storeAction = .store

    // Initialize a CommandBuffer and add a CompletedHandler to release an MTLBuffer
    // from the BufferProvider once the GPU is done processing this command
    let commandBuffer = self.commandQueue.makeCommandBuffer()
    commandBuffer?.addCompletedHandler { (_) in
        self.bufferProvider?.availableResourcesSemaphore.signal()
    }

    // Update the View matrix and obtain an MTLBuffer for it and the projection matrix
    let camViewMatrix = self.vCam.getLookAtMatrix()
    let uniformBuffer = bufferProvider?.nextUniformsBuffer(projectionMatrix: projectionMatrix, camViewMatrix: camViewMatrix)

    // Initialize a MTLParallelRenderCommandEncoder
    let parallelEncoder = commandBuffer?.makeParallelRenderCommandEncoder(descriptor: renderPassDescriptor)

    // Create a CommandEncoder for the cube vertices if its data is loaded
    if self.cubeLoaded == true {
        let cubeRenderEncoder = parallelEncoder?.makeRenderCommandEncoder()
        cubeRenderEncoder!.setCullMode(MTLCullMode.front)
        cubeRenderEncoder!.setRenderPipelineState(pipelineState)
        cubeRenderEncoder!.setTriangleFillMode(MTLTriangleFillMode.fill)
        cubeRenderEncoder!.setVertexBuffer(self.cubeVertexBuffer, offset: 0, index: 0)
        cubeRenderEncoder!.setVertexBuffer(uniformBuffer, offset: 0, index: 1)
        cubeRenderEncoder!.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: vertexCount!, instanceCount: self.cubeVertexCount!/3)
        cubeRenderEncoder!.endEncoding()
    }

    // Create a CommandEncoder for the dataset vertices if its data is loaded
    if self.datasetLoaded == true {
        let rgbaVolumeRenderEncoder = parallelEncoder?.makeRenderCommandEncoder()
        rgbaVolumeRenderEncoder!.setRenderPipelineState(pipelineState)
        rgbaVolumeRenderEncoder!.setVertexBuffer(self.datasetVertexBufferGPU!, offset: 0, index: 0)
        rgbaVolumeRenderEncoder!.setVertexBuffer(uniformBuffer, offset: 0, index: 1)
        rgbaVolumeRenderEncoder!.drawPrimitives(type: .point, vertexStart: 0, vertexCount: datasetVertexCount!, instanceCount: datasetVertexCount!)
        rgbaVolumeRenderEncoder!.endEncoding()
    }

    // End CommandBuffer encoding and commit task
    parallelEncoder!.endEncoding()
    commandBuffer!.present(drawable)
    commandBuffer!.commit()
}
Alright, so these are the steps I have been through in trying to figure out what was causing the lag, keeping in mind that the lagging effect is proportional to the size of the dataset's vertex buffer:
I initially thought it was due to the GPU not being able to access the memory quickly enough because it was in Shared storage mode, so I changed the dataset MTLBuffer to Private storage mode. This did not solve the problem.
I then thought that the problem was due to the CPU spending too much time in my render() function. This could possibly be due to a problem with the BufferProvider, or maybe because the CPU was somehow trying to reprocess/reload the dataset vertex buffer every frame. In order to check this, I used the Time Profiler in Xcode's Instruments. Unfortunately, it seems that the problem is that the application calls this render method (in other words, MTKView's draw() method) only very rarely. Here are some screenshots:
The spike at ~10 seconds is when the cube is loaded
The spikes between ~25-35 seconds are when the dataset is loaded
This image (^) shows the activity between ~10-20 seconds, right after the cube was loaded. This is when the FPS is at ~60. You can see that the main thread spends around 53ms in the render() function during these 10 seconds.
This image (^) shows the activity between ~40-50 seconds, right after the dataset was loaded. This is when the FPS is < 10. You can see that the main thread spends around 4ms in the render() function during these 10 seconds. As you can see, none of the methods which are usually called from within this function are called (ie: the ones we can see called when only the cube is loaded, previous image). Of note, when I load the dataset, the time profiler's timer starts to jump (ie: it stops for a few seconds and then jumps to the current time... repeat).
So this is where I am. The problem seems to be that the CPU somehow gets overloaded with these 42 kB of data... repeatedly. I also ran the Allocations instrument in Xcode's Instruments. No signs of a memory leak, as far as I could tell (you might have noticed that a lot of this is new to me).
Sorry for the convoluted post, I hope it's not too hard to follow. Thank you all in advance for your help.
Edit:
Here are my shaders, in case you would like to see them:
struct VertexIn {
    packed_float3 position;
    packed_float4 color;
};

struct VertexOut {
    float4 position [[position]];
    float4 color;
    float size [[point_size]];
};

struct Uniforms {
    float4x4 cameraMatrix;
    float4x4 projectionMatrix;
};

vertex VertexOut basic_vertex(const device VertexIn* vertex_array [[ buffer(0) ]],
                              constant Uniforms& uniforms [[ buffer(1) ]],
                              unsigned int vid [[ vertex_id ]]) {
    float4x4 cam_Matrix = uniforms.cameraMatrix;
    float4x4 proj_Matrix = uniforms.projectionMatrix;

    VertexIn VertexIn = vertex_array[vid];
    VertexOut VertexOut;
    VertexOut.position = proj_Matrix * cam_Matrix * float4(VertexIn.position, 1);
    VertexOut.color = VertexIn.color;
    VertexOut.size = 15;
    return VertexOut;
}

fragment half4 basic_fragment(VertexOut interpolated [[stage_in]]) {
    return half4(interpolated.color[0], interpolated.color[1], interpolated.color[2], interpolated.color[3]);
}
I think the main problem is that you're telling Metal to do instanced drawing when you shouldn't be. This line:
rgbaVolumeRenderEncoder!.drawPrimitives(type: .point, vertexStart: 0, vertexCount: datasetVertexCount!, instanceCount: datasetVertexCount!)
is telling Metal to draw datasetVertexCount! instances of each of datasetVertexCount! vertices. The GPU work grows with the square of the vertex count. Also, since you don't make use of the instance ID to, for example, tweak the vertex position, all of these instances are identical and thus redundant.
I think the same applies to this line:
cubeRenderEncoder!.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: vertexCount!, instanceCount: self.cubeVertexCount!/3)
although it's not clear what self.cubeVertexCount! is and whether it grows with vertexCount. In any case, since it seems you're using the same pipeline state and thus same shaders which don't make use of the instance ID, it's still useless and wasteful.
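A sketch of the corrected calls (my addition; instanceCount defaults to 1 when omitted):
// Draw each vertex exactly once; no instancing needed here.
rgbaVolumeRenderEncoder!.drawPrimitives(type: .point, vertexStart: 0,
                                        vertexCount: datasetVertexCount!)
cubeRenderEncoder!.drawPrimitives(type: .triangle, vertexStart: 0,
                                  vertexCount: vertexCount!)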
Other things:
Why are you using MTLParallelRenderCommandEncoder when you're not actually using the parallelism it enables? Don't do that (see the sketch below).
Everywhere you're using the size method of MemoryLayout, you should almost certainly be using stride instead. And if you're computing the stride of a compound data structure, do not take the stride of one element of that structure and multiply by the number of elements; take the stride of the whole data structure.
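Quick sketches of both suggestions (my additions, reusing the question's variable names):
// A single render command encoder in place of the parallel encoder:
let encoder = commandBuffer!.makeRenderCommandEncoder(descriptor: renderPassDescriptor)!
// ... encode both the cube and the dataset draws on this one encoder ...
encoder.endEncoding()

// Buffer sizing via the element type's stride rather than size(ofValue:):
let dataSize = vertexData.count * MemoryLayout<Float>.stride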