How are the ARKit People Occlusion samples being done? - swift

This may be an obscure question, but I see lots of very cool samples online of people using the new People Occlusion technology in ARKit 3 to effectively "separate" people from the background and apply some sort of filtering to the "people" (see here).
In looking at Apple's provided source code and documentation, I see that I can retrieve the segmentationBuffer from an ARFrame, which I've done, like so:
func session(_ session: ARSession, didUpdate frame: ARFrame) {
    let image = frame.capturedImage
    if let segmentationBuffer = frame.segmentationBuffer {
        // Get the segmentation's width.
        let segmentedWidth = CVPixelBufferGetWidth(segmentationBuffer)
        // Create the mask from that pixel buffer.
        let segmentationMaskImage = CIImage(cvPixelBuffer: segmentationBuffer, options: [:])
        // Smooth edges to create an alpha matte, then upscale it to the RGB resolution.
        let alphaUpscaleFactor = Float(CVPixelBufferGetWidth(image)) / Float(segmentedWidth)
        let alphaMatte = segmentationMaskImage.clampedToExtent()
            .applyingFilter("CIGaussianBlur", parameters: ["inputRadius": 2.0])
            .cropped(to: segmentationMaskImage.extent)
            .applyingFilter("CIBicubicScaleTransform", parameters: ["inputScale": alphaUpscaleFactor])
        // Unknown...
    }
}
In the "unknown" section, I am trying to determine how I would render my new "blurred" person on top of the original camera feed. There does not seem to be any methods to draw the new CIImage on "top" of the original camera feed, as the ARView has no way of being manually updated.

The following code snippet uses the personSegmentationWithDepth frame semantics option for depth compositing (there are RGB, alpha, and depth channels):
// Automatically segmenting and then compositing foreground (people),
// middle-ground (3D model) and background.
let session = ARSession()
let configuration = ARWorldTrackingConfiguration()
if ARWorldTrackingConfiguration.supportsFrameSemantics(.personSegmentationWithDepth) {
    configuration.frameSemantics.insert(.personSegmentationWithDepth)
}
session.run(configuration)
You can manually access the world-tracking depth data as a CVPixelBuffer (the depth values for the performed segmentation):
let image = frame.estimatedDepthData
And you can manually access the face-tracking depth data as a CVPixelBuffer (from the TrueDepth camera):
let image = session.currentFrame?.capturedDepthData?.depthDataMap
Also, there's a generateDilatedDepth instance method on ARMatteGenerator in ARKit 3.0:
func generateDilatedDepth(from frame: ARFrame,
                          commandBuffer: MTLCommandBuffer) -> MTLTexture
In your case you have to use estimatedDepthData, because the Apple documentation says:
It's a buffer that represents the estimated depth values from the camera feed that you use to occlude virtual content.
var estimatedDepthData: CVPixelBuffer? { get }
If you multiply the depth data from this buffer (after first converting the depth channel to RGB) by the RGB or alpha channels using compositing techniques, you'll get awesome effects.
Look at these six images: the lower row shows three RGB images corrected with the depth channel: depth grading, depth blurring, and a depth point position pass.
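To address the "Unknown..." part of the question above, here is a minimal sketch of one possible compositing step (not the WWDC sample's method). It assumes the alphaMatte built in the question's snippet, and uses the built-in CIBlendWithMask filter; the compositePerson helper name is hypothetical:
// A sketch: blur the whole camera frame, then use the person matte as a mask
// so only the person appears blurred on top of the original feed.
func compositePerson(cameraImage: CIImage, alphaMatte: CIImage) -> CIImage {
    // Blurred copy of the camera feed (the "filtered person" source).
    let blurred = cameraImage.clampedToExtent()
        .applyingFilter("CIGaussianBlur", parameters: ["inputRadius": 10.0])
        .cropped(to: cameraImage.extent)
    // Where the matte is white, take the blurred image; elsewhere, keep the original feed.
    return blurred.applyingFilter("CIBlendWithMask", parameters: [
        kCIInputBackgroundImageKey: cameraImage,
        kCIInputMaskImageKey: alphaMatte
    ])
}
You would still need to render the resulting CIImage yourself (for example with a CIContext into a Metal texture or a custom view), since, as noted above, ARView does not let you manually update its camera feed.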

The Bringing People into AR WWDC session has some information, especially about ARMatteGenerator. The session also comes with sample code.
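As a hedged sketch of how ARMatteGenerator is typically driven (based on its public API rather than on the WWDC sample itself): you create it once with a Metal device, then ask it for a matte texture per frame inside your render loop. The MatteProvider wrapper name is hypothetical.
import ARKit
import Metal

final class MatteProvider {
    private let commandQueue: MTLCommandQueue
    private let matteGenerator: ARMatteGenerator

    init?(device: MTLDevice) {
        guard let queue = device.makeCommandQueue() else { return nil }
        self.commandQueue = queue
        // .full keeps the matte at camera resolution; .half is cheaper.
        self.matteGenerator = ARMatteGenerator(device: device, matteResolution: .full)
    }

    // Returns the alpha matte and the dilated depth texture for the given frame.
    func mattes(for frame: ARFrame) -> (matte: MTLTexture, dilatedDepth: MTLTexture)? {
        guard let commandBuffer = commandQueue.makeCommandBuffer() else { return nil }
        let matte = matteGenerator.generateMatte(from: frame, commandBuffer: commandBuffer)
        let dilatedDepth = matteGenerator.generateDilatedDepth(from: frame, commandBuffer: commandBuffer)
        commandBuffer.commit()
        commandBuffer.waitUntilCompleted() // kept simple here; a real renderer would stay asynchronous
        return (matte, dilatedDepth)
    }
}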

Related

SCNKit: Hit test doesn't hit node's dynamically modified geometry

I'm facing an issue where SCNView.hitTest does not detect hits against geometry that I'm modifying dynamically on the CPU.
Here's the overview: I have a node that uses an SCNGeometry created from an MTLBuffer of vertices:
func createGeometry(vertexBuffer: MTLBuffer, vertexCount: Int) -> SCNGeometry {
    return SCNGeometry(sources: [
        SCNGeometrySource(
            buffer: vertexBuffer,
            vertexFormat: .float3,
            semantic: .vertex,
            vertexCount: vertexCount,
            dataOffset: 0,
            dataStride: MemoryLayout<SIMD3<Float>>.stride),
    ], elements: [
        SCNGeometryElement(indices: ..., primitiveType: .triangles)
    ])
}
let vertexBuffer: MTLBuffer = // shared buffer
let vertexCount = ...
let node = SCNNode(geometry: createGeometry(vertexBuffer: vertexBuffer, vertexCount: vertexCount))
As the app is running, I then dynamically modify the vertex buffer in the SceneKit update loop:
// In SceneKit update function
let ptr = vertexBuffer.contents().bindMemory(to: SIMD3<Float>.self, capacity: vertexCount)
for i in 0..<vertexCount {
    ptr[i] = // modify vertex
}
This dynamic geometry is correctly rendered by SceneKit. However, when I then try hit-testing against the node using SCNView.hitTest, no hits are detected against the modified geometry.
I can work around this by re-creating the node's geometry after modifying the data:
// after updating data
node.geometry = createGeometry(vertexBuffer: vertexBuffer, vertexCount: vertexCount)
However this feels like a hack.
What is the proper way to have hit testing work reliably for a node with dynamically changing SCNGeometry?
I think there's no proper way to make hit-testing work reliably in your situation. It would only be possible if hit-testing didn't depend on the SceneKit/Metal render loop and delegate pattern. But since it entirely depends on them, recreating SCNGeometry instances is, as you said earlier, an expensive operation. So I totally agree with @HamidYusifli.
When you perform a hit-test search, SceneKit looks for SCNGeometry objects along the ray you specify. For each intersection between the ray and a geometry, SceneKit creates a hit-test result to provide information about both the SCNNode object containing the geometry and the location of the intersection on the geometry’s surface.
The problem in your case is that when you modify the buffer's contents (MTLBuffer) at render time, SceneKit does not know about it, and therefore cannot update the SCNGeometry object that is used for hit-testing.
So the only way I can see to solve this issue is to recreate your SCNGeometry object.
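A minimal sketch of that idea, reusing the createGeometry helper and the shared vertexBuffer/vertexCount from the question: instead of rebuilding the geometry every frame, rebuild it lazily only when a hit test is about to run. The needsHitTestRefresh flag and refreshGeometryIfNeeded helper are hypothetical names.
var needsHitTestRefresh = false

// Call this from the SceneKit update loop after mutating the vertex buffer.
func markGeometryDirty() {
    needsHitTestRefresh = true
}

// Rebuild the SCNGeometry from the shared MTLBuffer only when a hit test needs it.
func refreshGeometryIfNeeded(on node: SCNNode) {
    guard needsHitTestRefresh else { return }
    node.geometry = createGeometry(vertexBuffer: vertexBuffer, vertexCount: vertexCount)
    needsHitTestRefresh = false
}

func hitTest(in view: SCNView, at point: CGPoint, against node: SCNNode) -> [SCNHitTestResult] {
    refreshGeometryIfNeeded(on: node)
    return view.hitTest(point, options: [.rootNode: node])
}
This keeps the expensive geometry rebuild off the per-frame path and only pays for it when a hit test actually happens.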

Complex CoreImage CIFilter pipeline recursively eats GBs of memory

I'm writing a macOS app which performs a complicated chain of CIFilter operations on an image to greatly change the appearance of high resolution photographs, often 24 megapixels or larger.
Some of the effects included are Gaussian blurs, unsharp masks, bloom, gloom, as well as a custom "grain" fragment shader I wrote in Metal using a custom CIKernel. The CIContext uses a Metal device to render it. Essentially, it's a long chain of initialImage -> CIFilter -> outputImage -> CIFilter -> outputImage -> CIFilter -> ...
Not only must all of these CIFilters be run in sequence for a final output, they must also be run on the full resolution for effects to be correctly scaled.
The problem I face is that executing the entire process results in massive memory usage. With a 6000x4000 input image, the memory usage jumps to 6.6GiB while rendering.
I used the Metal instrumentation in Xcode to diagnose the problem and it seems that CoreImage is recursively allocating memory for each filter, so that memory just piles up and up until it can finally let go of all of the resources at the end.
I would have hoped this was sequential, releasing each source buffer before the next operation. I'm not sure exactly how to help the situation. Is there some way to pass each output image to the next filter, forcefully cleaning up each input CIImage rendering first?
One approach is to render into a Metal texture through a CIRenderDestination, driving the work with your own command buffer:
if let device = self.device,
   let texture = device.makeTexture(descriptor: descriptor),
   let queue = device.makeCommandQueue(),
   let buffer = queue.makeCommandBuffer() {
    let destination = CIRenderDestination(
        width: descriptor.width,
        height: descriptor.height,
        pixelFormat: self.colorPixelFormat,
        commandBuffer: buffer) {
            return texture
        }
    try! metalImage.context.context.startTask(toRender: metalImage.image, to: destination)
    self.renderedImage = texture
    buffer.commit()
}
Displaying a 24-megapixel CIImage in an NSImageView without any CIContext also seems to work well, with only a few hundred MB of memory usage:
let image = // Create CIImage from complex CIFilter pipeline
let rep = NSBitmapImageRep(ciImage: image)
let previewImage = NSImage(size: rep.size)
previewImage.addRepresentation(rep)
previewImageView?.image = previewImage
On a 5K iMac, it will render pretty efficiently for previewing (less than half a second to update). No CIContexts are needed until exporting the image using a CGImageDestination.
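For completeness, the question's own idea of flushing each stage before the next can be sketched like this: render an intermediate CIImage into a CVPixelBuffer and start the next filter from that buffer, so Core Image cannot keep the whole recursive filter graph alive. The materialize helper below is hypothetical and trades extra render passes for lower peak memory.
import CoreImage
import CoreVideo

// Render an intermediate CIImage into a fresh CVPixelBuffer and wrap it again,
// so the next filter stage no longer references the previous filter graph.
func materialize(_ image: CIImage, context: CIContext) -> CIImage? {
    var pixelBuffer: CVPixelBuffer?
    CVPixelBufferCreate(kCFAllocatorDefault,
                        Int(image.extent.width),
                        Int(image.extent.height),
                        kCVPixelFormatType_32BGRA,
                        nil,
                        &pixelBuffer)
    guard let buffer = pixelBuffer else { return nil }
    context.render(image, to: buffer)
    return CIImage(cvPixelBuffer: buffer)
}
Each call forces a real render pass, so this is slower than letting Core Image concatenate kernels; it is only worth doing at the few points in the chain where the intermediate graph becomes too large.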

iOS RealityKit : get world position of placed object from raycast center of screen

Using raycasting and the tap gesture, I have successfully placed multiple objects (entities and anchors) in my ARView.
Now I am trying to get the entity that is closest to us, using the center of the screen, in order to place an object near it. So we can imagine that every time an anchor is close to the center of the phone screen, a new object "spawns".
For that I am trying to use a raycast, but in my code:
func session(_ session: ARSession, didUpdate frame: ARFrame) {
    if let hitEntity = arView.entity(at: self.arView.center) {
        print("hitEntity IS FOUND !")
        print(hitEntity.name) // this is the object I previously placed
        guard let result = arView.raycast(from: self.arView.center,
                                          allowing: .estimatedPlane,
                                          alignment: .any).first else { return }
        // here result is the surface behind/below the object, and not the object I want
        return
    }
}
The raycast result triggers on a surface, but I can't manage to get the world transform of the object (entity) itself rather than the surface behind it.
Do you have an idea?
Thanks
To get the entity's world position, you can call hitEntity.position(relativeTo: nil) (or hitEntity.transformMatrix(relativeTo: nil) if you need the full world transform).
If I'm not understanding your question correctly, this might help too:
To get the position of the raycast result relative to the entity's local space, you can call hitEntity.convert(position: Transform(matrix: result.worldTransform).translation, from: nil). Note that convert(position:from:) expects a SIMD3<Float>, so you pass the translation of the result's transform rather than the whole matrix.
convert is a super useful method in RealityKit. There are a few different overloads, but here's the documentation for the one I used above:
https://developer.apple.com/documentation/realitykit/hastransform/3244194-convert/
I hope one of those helps you!
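Putting both suggestions together, a minimal sketch (assuming the arView property from the question) of reading the entity currently under the screen center might look like this:
func session(_ session: ARSession, didUpdate frame: ARFrame) {
    // Entity whose projection is under the center of the screen, if any.
    guard let hitEntity = arView.entity(at: arView.center) else { return }

    // World-space position of that entity.
    let entityWorldPosition = hitEntity.position(relativeTo: nil)

    // Where the center-of-screen raycast lands on the real world,
    // expressed in the entity's local space.
    if let result = arView.raycast(from: arView.center,
                                   allowing: .estimatedPlane,
                                   alignment: .any).first {
        let surfaceWorldPosition = Transform(matrix: result.worldTransform).translation
        let surfaceInEntitySpace = hitEntity.convert(position: surfaceWorldPosition, from: nil)
        print(entityWorldPosition, surfaceInEntitySpace)
    }
}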

SceneKit – Applied CIFilter on SCNNode hides SCNTorus

I have the following node setup on my scene:
A container node with child nodes: earth, torus, and moon. I applied the following custom HighlightFilter, built from CIBloom and CISourceOverCompositing, to the earth node's filters property for the glow effect:
CIFilter Class (code from awesome blog: Highlighting SCNNode with Glow)
class HighlightFilter: CIFilter {
    static let filterName = "highlightFilter"
    @objc dynamic var inputImage: CIImage?
    @objc dynamic var inputIntensity: NSNumber?
    @objc dynamic var inputRadius: NSNumber?

    override var outputImage: CIImage? {
        guard let inputImage = inputImage else {
            return nil
        }
        let bloomFilter = CIFilter(name: "CIBloom")!
        bloomFilter.setValue(inputImage, forKey: kCIInputImageKey)
        bloomFilter.setValue(inputIntensity, forKey: "inputIntensity")
        bloomFilter.setValue(inputRadius, forKey: "inputRadius")
        let sourceOverCompositing = CIFilter(name: "CISourceOverCompositing")!
        sourceOverCompositing.setValue(inputImage, forKey: "inputImage")
        sourceOverCompositing.setValue(bloomFilter.outputImage, forKey: "inputBackgroundImage")
        return sourceOverCompositing.outputImage
    }
}
I don't understand this invisible rectangle. I assume it is because of the CISourceOverCompositing filter, which overlays the modified image over the original. But why is the torus hidden and the moon not?
I assigned the torus's material to the moon's material property to see if something was wrong with the material. But it's still the same: the moon is visible, the torus is hidden. The moon rotates around with a helper node as a child of the containerNode.
This may happen due to two potential problems.
SOLUTION 1.
When you're using CISourceOverCompositing, you need a premultiplied RGBA image (RGB * A) for the foreground, where the alpha channel has the same shape as the Earth (left picture). But you have an alpha channel covering the whole image (right picture).
If you want to inspect the shape of the alpha channel in your Earth image, use one of these compositing applications: The Foundry Nuke, Adobe After Effects, Apple Motion, Blackmagic Fusion, etc.
Also, if you want to composite the Moon and the Earth separately, you have to have them as two different images.
In compositing, the classical OVER operation has the following formula:
(RGB_image1 * A_image1) + (RGB_image2 * (1 - A_image1))
The first part of this formula is the premultiplied foreground image (the Earth): RGB1 * A1.
The second part is the background image with a hole: RGB2 * inverted_A1. The inversion of the alpha channel is obtained with (1 - A). The background image itself may have only three components: RGB (without A).
Then you add the two images together with a simple addition operation. If you chain several OVER operations, their order is crucial.
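As a tiny illustration of that formula (not SceneKit-specific, just the per-pixel math with channel values in 0...1):
// Classical OVER: premultiplied foreground plus background seen through the inverted alpha.
func over(foregroundRGB: SIMD3<Float>, foregroundAlpha: Float, backgroundRGB: SIMD3<Float>) -> SIMD3<Float> {
    return foregroundRGB * foregroundAlpha + backgroundRGB * (1 - foregroundAlpha)
}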
SOLUTION 2.
It could be due to the depth buffer. Disable the writesToDepthBuffer instance property; it is a Boolean value that determines whether SceneKit produces depth information when rendering the material.
yourTorusNode.geometry?.materials.first?.writesToDepthBuffer = false
Hope this helps.
I don't know exactly why this strange behaviour occurred, but deleting the following line made it disappear.
self.torus.firstMaterial?.writesToDepthBuffer = false
The writesToDepthBuffer property on the material on the torus was set to false.

ARKit Adding node causes frame drop even after using `prepare`

I am adding a 3D model containing animations to the scene, which I previously downloaded from the internet. Before adding this node I call the prepare function on it because I want to avoid frame drops. But I still get a very short frame drop, to about 47 fps, caused by executing this prepare function. I also tried using prepare(_:, shouldAbortBlock:) on another dispatch queue, but this still didn't help. Can someone help me resolve this, or tell me why this is happening?
arView.sceneView.prepare([mediaNode]) { [mediaNode, weak self] (success) in
    guard let `self` = self else { return }
    guard
        let currentMediaNode = self.mediaNode as? SCNNode,
        currentMediaNode === mediaNode,
        !self.mainNode.childNodes.contains(mediaNode)
    else { return }
    self.mainNode.addChildNode(mediaNode)
}
By the way, this is the list of files I'm using to load this model:
https://www.dropbox.com/s/7968fe5wfdcxbyu/Serah-iOS.dae?dl=1
https://www.dropbox.com/s/zqb6b6rxynnvc5e/0001.png?dl=1
https://www.dropbox.com/s/hy9y8qyazkcnvef/0002.tga?dl=1
https://www.dropbox.com/s/fll9jbjud7zjlsq/0004.tga?dl=1
https://www.dropbox.com/s/4niq12mezlvi5oz/0005.png?dl=1
https://www.dropbox.com/s/wikqgd46643327i/0007.png?dl=1
https://www.dropbox.com/s/fioj9bqt90vq70c/0008.tga?dl=1
https://www.dropbox.com/s/4a5jtmccyx413j7/0010.png?dl=1
The DAE file is already compiled by the Xcode tools so that it can be loaded after being downloaded from the internet. This is the code I use to load it once it's downloaded:
class func loadModel(fromURL url: URL) -> SCNNode? {
    let options = [SCNSceneSource.LoadingOption.animationImportPolicy: SCNSceneSource.AnimationImportPolicy.playRepeatedly]
    let sceneSource = SCNSceneSource(url: url, options: options)
    let node = sceneSource?.entryWithIdentifier("MDL_Obj", withClass: SCNNode.self)
    return node
}
I was experiencing the same issue. My nodes were all taking advantage of physically based rendering (PBR), and the first time I added a node to the scene, the frame rate dropped significantly, but it was fine after that. I could add as many other nodes as I wanted without a frame rate drop.
I figured out a workaround for this issue. After I create my AR configuration, and before I call session.run(configuration), I add a test node with PBR to the scene. In order for that node not to appear, I set the node's material's colorBufferWriteMask to an empty array (see this answer: ARKit hide objects behind walls). Then, before I add my content, I remove that node. Adding and removing this test node does the trick for me.
Here is an example:
var pbrTestNode: SCNNode!

func addPBRTestNode() {
    let testGeometrie = SCNBox(width: 0.5, height: 0.5, length: 0.5, chamferRadius: 0)
    testGeometrie.materials.first?.diffuse.contents = UIColor.blue
    testGeometrie.materials.first?.colorBufferWriteMask = []
    testGeometrie.materials.first?.lightingModel = .physicallyBased
    pbrTestNode = SCNNode(geometry: testGeometrie)
    scene.rootNode.addChildNode(pbrTestNode)
}

func removePBRTestNode() {
    pbrTestNode.removeFromParentNode()
}

func startSessionWithPlaneDetection() {
    // Create a session configuration
    let configuration = ARWorldTrackingConfiguration()
    if #available(iOS 11.3, *) {
        configuration.planeDetection = [.horizontal, .vertical]
    } else {
        configuration.planeDetection = .horizontal
    }
    configuration.isLightEstimationEnabled = true

    // this prevents the delay when adding any nodes with PBR later
    sceneController.addPBRTestNode()

    // Run the view's session
    sceneView.session.run(configuration)
}
Call removePBRTestNode() when you add your content to the scene.
Firstly
Use 3D models for your AR app with no more than 10K polygons each and textures no larger than 1K x 1K. The best results are achieved with 5K...7K polygons per model. In total, a SceneKit scene should contain no more than about 100K polygons. This recommendation helps you considerably improve rendering performance and, I suppose, you'll see minimal frame drops.
Secondly
The simplest way to get rid of frame drops in ARKit/SceneKit/AVKit is to use the Metal framework. Just imagine: a simple image filter can be more than a hundred times faster on the GPU than an equivalent CPU-based filter. The same can be said about realtime AV video and 3D animation: they perform much better on the GPU.
For instance, you can read this useful post about using Metal rendering for AVCaptureSession. It shows a great workflow for using Metal.
P.S. Check your animated object/scene in a 3D authoring tool (to make sure it's OK) before writing any code.