What is the best way to utilize memory inside of "session(_:didUpdate:)" method? - swift

My use case is that I want to recognize various gestures made by a hand (the first hand) seen by the camera. I am able to find body anchors, hand anchors, and poses.
I am trying to use previous SIMD3 position information to work out what kind of gesture was performed. I did see the example posted by Apple that shows pinching to write virtually, but I am not sure that a buffer is the right solution for something like this.
A specific example of what I am trying to do is detect a swipe, long press, or tap as if the user were wearing a pair of AR glasses (made by Apple one day). For clarification, I want to raycast from my hand and perform a gesture on an Entity or Anchor.
Here is a snippet for those of you that want to know how to get body anchors:
public func session(_ session: ARSession, didUpdate frame: ARFrame) {
    let capturedImage = frame.capturedImage
    let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: capturedImage,
                                                    orientation: .right,
                                                    options: [:])
    let handPoseRequest = VNDetectHumanHandPoseRequest()
    //let bodyPoseRequest = VNDetectHumanBodyPoseRequest()
    do {
        try imageRequestHandler.perform([handPoseRequest])
        guard let observation = handPoseRequest.results?.first else {
            return
        }
        // Get points for the thumb and each finger.
        let thumbPoints = try observation.recognizedPoints(.thumb)
        let indexFingerPoints = try observation.recognizedPoints(.indexFinger)
        let pinkyFingerPoints = try observation.recognizedPoints(.littleFinger)
        let ringFingerPoints = try observation.recognizedPoints(.ringFinger)
        let middleFingerPoints = try observation.recognizedPoints(.middleFinger)
        self.detectHandPose(handObservations: observation)
    } catch {
        print("Failed to perform image request.")
    }
}
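For reference, here is a rough sketch of the kind of rolling position buffer I have in mind for the previous-position data (the type names and thresholds below are made up for illustration, not taken from any Apple sample):

import Foundation
import simd

struct HandPositionSample {
    let position: SIMD3<Float>
    let timestamp: TimeInterval
}

final class HandGestureHistory {
    private(set) var samples: [HandPositionSample] = []
    private let capacity = 30   // roughly half a second of frames at 60 fps

    func append(_ position: SIMD3<Float>, at timestamp: TimeInterval) {
        samples.append(HandPositionSample(position: position, timestamp: timestamp))
        if samples.count > capacity {
            samples.removeFirst()
        }
    }

    // A crude swipe test: a large horizontal displacement across the buffered window.
    func looksLikeSwipe(minDistance: Float = 0.25) -> Bool {
        guard let first = samples.first, let last = samples.last else { return false }
        return abs(last.position.x - first.position.x) > minDistance
    }
}

The idea would be to append one sample per frame from session(_:didUpdate:) and evaluate gestures over the buffered window, but I am not sure this is the best use of memory, hence the question.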

Related

SceneKit: rotate camera to tap point (3D)

I have a camera node.
Around the camera node, there is another big node (.obj file) of a building.
The user can move inside the building.
The user can perform a long-press gesture, and an additional node (let's say a sphere) appears on the wall of the building. I want to rotate my camera to this new node (to the tap location).
I don't know how to do it. Can someone help me?
Other answers did not work for me; the camera just rotates in random directions.
I've found a way!
I take the location of a tap (or any coordinates you need to turn toward):
@objc private func handleLongPress(pressRec: UILongPressGestureRecognizer) {
    let finishedStates: [UIGestureRecognizer.State] = [.cancelled, .ended, .failed]
    if !finishedStates.contains(pressRec.state) {
        let touchPoint = pressRec.location(in: sceneView)
        let hitResults = sceneView.hitTest(touchPoint, options: [:])
        if let result: SCNHitTestResult = hitResults.first {
            createAnnotation(result.worldCoordinates)
            pressRec.state = .cancelled
        }
    }
}
And the function that turns the camera:
func turnCameraTo(worldCoordinates: SCNVector3) {
    SCNTransaction.begin()
    SCNTransaction.animationDuration = C.hotspotAnimationDuration
    cameraNode.look(at: worldCoordinates)
    sceneView.defaultCameraController.clearRoll()
    SCNTransaction.completionBlock = {
    }
    SCNTransaction.commit()
}
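For completeness, the recognizer itself is attached to the view roughly like this (a minimal sketch; it assumes sceneView is the same view used above):

let longPress = UILongPressGestureRecognizer(target: self,
                                             action: #selector(handleLongPress(pressRec:)))
sceneView.addGestureRecognizer(longPress)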

ARKit SCNNode always in the center when camera move

I am working on a project where I have to place a green dot that always stays in the center, even when the camera rotates in ARKit. I am using ARSCNView and I have added the node; so far everything is good. Now I know I need to modify the position of the node in
func session(_ session: ARSession, didUpdate frame: ARFrame)
but I have no idea how to do that. I saw an example that was close to what I have, but it does not run as it is supposed to.
func session(_ session: ARSession, didUpdate frame: ARFrame) {
    let location = sceneView.center
    let hitTest = sceneView.hitTest(location, types: .featurePoint)
    if hitTest.isEmpty {
        print("No Plane Detected")
        return
    } else {
        let columns = hitTest.first?.worldTransform.columns.3
        let position = SCNVector3(x: columns!.x, y: columns!.y, z: columns!.z)
        var node = sceneView.scene.rootNode.childNode(withName: "CenterShip", recursively: false) ?? nil
        if node == nil {
            let scene = SCNScene(named: "art.scnassets/ship.scn")!
            node = scene.rootNode.childNode(withName: "ship", recursively: false)
            node?.opacity = 0.7
            let columns = hitTest.first?.worldTransform.columns.3
            node!.name = "CenterShip"
            node!.position = SCNVector3(x: columns!.x, y: columns!.y, z: columns!.z)
            sceneView.scene.rootNode.addChildNode(node!)
        }
        let position2 = node?.position
        if position == position2! {
            return
        } else {
            //action
            let action = SCNAction.move(to: position, duration: 0.1)
            node?.runAction(action)
        }
    }
}
No matter how I rotate the camera, this dot must stay in the middle.
It's not clear exactly what you're trying to do, but I assume it's one of the following:
A) Place the green dot centered in front of the camera at a fixed distance, e.g. always exactly 1 meter in front of the camera.
B) Place the green dot centered in front of the camera at the depth of the nearest detected plane, i.e. using the results of a raycast from the midpoint of the ARSCNView.
I would have assumed A, but your example code uses the (now deprecated) sceneView.hitTest() function, which in this case would give you the depth of whatever is behind the pixel at sceneView.center.
Anyway, here's both:
Fixed Depth Solution
This is pretty straightforward, though there are a few options. The simplest is to make the green dot a child node of the scene's camera node and give it a position with a negative z value, since z increases as a position moves toward the camera.
cameraNode.addChildNode(textNode)
textNode.position = SCNVector3(x: 0, y: 0, z: -1)
As the camera moves, so too will its child nodes.
Scene Depth Solution
To determine the estimated depth behind a pixel, you should use ARSession.raycast instead of sceneView.hitTest(), because the latter is definitely deprecated.
Note that if the raycast() (or still hitTest()) methods return an empty result set (not uncommon given the complexity of the scene estimation going on in ARKit), you won't have a position to update the node with, and thus it might not be exactly centered in every frame. Handling this is a bit more complex, as you'd need to decide exactly what you want to do in that case.
The SCNAction is unnecessary and potentially causing problems. These delegate methods run at 60 fps, so simply updating the position directly will produce smooth results.
Adapting and simplifying the code you posted:
func createCenterShipNode() -> SCNNode {
    let scene = SCNScene(named: "art.scnassets/ship.scn")!
    let node = scene.rootNode.childNode(withName: "ship", recursively: false)
    node!.opacity = 0.7
    node!.name = "CenterShip"
    sceneView.scene.rootNode.addChildNode(node!)
    return node!
}

func session(_ session: ARSession, didUpdate frame: ARFrame) {
    // Check the docs for what the different raycast query parameters mean, but these
    // give you the depth of anything ARKit has detected.
    guard let query = sceneView.raycastQuery(from: sceneView.center, allowing: .estimatedPlane, alignment: .any) else {
        return
    }
    let results = session.raycast(query)
    if let hit = results.first {
        let node = sceneView.scene.rootNode.childNode(withName: "CenterShip", recursively: false) ?? createCenterShipNode()
        let pos = hit.worldTransform.columns.3
        node.simdPosition = simd_float3(pos.x, pos.y, pos.z)
    }
}
See also: ARRaycastQuery
One last note - you generally don't want to do scene manipulation within this delegate method. It runs on a different thread than the SceneKit rendering thread, and SceneKit is very thread sensitive. This will likely work fine, but anything beyond adding or moving a node will certainly cause crashes from time to time. Ideally, you'd store the new position and then update the actual scene contents from within the renderer(_ renderer: SCNSceneRenderer, updateAtTime time: TimeInterval) delegate method.
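For illustration, a minimal sketch of that pattern (the property name is just an example, and real code would want to synchronize access to it):

// Hands the latest raycast result from the session callback to the render loop.
private var pendingShipPosition: simd_float3?

func session(_ session: ARSession, didUpdate frame: ARFrame) {
    // ...perform the raycast exactly as above, but only store the result:
    // pendingShipPosition = simd_float3(pos.x, pos.y, pos.z)
}

func renderer(_ renderer: SCNSceneRenderer, updateAtTime time: TimeInterval) {
    // Apply the stored position on SceneKit's rendering thread.
    guard let position = pendingShipPosition else { return }
    let node = sceneView.scene.rootNode.childNode(withName: "CenterShip", recursively: false) ?? createCenterShipNode()
    node.simdPosition = position
}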

RealityKit and Vision – How to call RayCast API

This question was also asked on the Apple Developer Forums, but so far I have not seen any response there.
The question is really: after finding the point of interest in a frame in an ARSession, how do I convert that into a 3D world coordinate?
How I got a point:
let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage, orientation: .up, options: [:])
let handPoseRequest = VNDetectHumanHandPoseRequest()
....
try handler.perform([handPoseRequest])
Then I need to raycast from the 2D point derived from ARFrame.capturedImage to a 3D world coordinate:
fileprivate func convertVNPointTo3D(_ point: VNRecognizedPoint,
                                    _ session: ARSession,
                                    _ frame: ARFrame,
                                    _ viewSize: CGSize) -> Transform? {
    let pointX = (point.x / Double(frame.camera.imageResolution.width)) * Double(viewSize.width)
    let pointY = (point.y / Double(frame.camera.imageResolution.height)) * Double(viewSize.height)
    let query = frame.raycastQuery(from: CGPoint(x: pointX, y: pointY), allowing: .estimatedPlane, alignment: .any)
    let results = session.raycast(query)
    if let first = results.first {
        return Transform(matrix: first.worldTransform)
    } else {
        return nil
    }
}
According to the API, I should use a UI point. However, I do not know how capturedImage is converted to UI points. The calculation I used for the points is not correct.
Thanks.
The issue was the image orientation. In my case, using the iPad's back camera in portrait orientation, I needed to use .downMirrored (instead of .up).
let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage, orientation: .downMirrored, options: [:])
Once the orientation is correct, the point values from image recognition can be used directly for the raycast.
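For reference, a minimal sketch of how that simplified flow might look inside session(_:didUpdate:). This assumes, as the answer above implies, that frame.raycastQuery(from:allowing:alignment:) can take the corrected Vision point directly; treat the coordinate handling as something to verify for your own setup:

func session(_ session: ARSession, didUpdate frame: ARFrame) {
    let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage,
                                        orientation: .downMirrored,
                                        options: [:])
    let handPoseRequest = VNDetectHumanHandPoseRequest()
    guard (try? handler.perform([handPoseRequest])) != nil,
          let observation = handPoseRequest.results?.first,
          let indexTip = try? observation.recognizedPoints(.indexFinger)[.indexTip]
    else { return }

    // Raycast from the recognized point (see the coordinate-space caveat above).
    let query = frame.raycastQuery(from: CGPoint(x: indexTip.x, y: indexTip.y),
                                   allowing: .estimatedPlane,
                                   alignment: .any)
    if let result = session.raycast(query).first {
        // Place or move an entity using the resulting transform.
        let transform = Transform(matrix: result.worldTransform)
        _ = transform   // e.g. entity.transform = transform
    }
}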

Cropping/Compositing An Image With Vision/CoreImage

I am working with the Vision framework in iOS 13 and am trying to achieve the following tasks:
1. Take an image (in this case, a CIImage) and locate all faces in the image using Vision.
2. Crop each face into its own CIImage (I'll call this a "face image").
3. Filter each face image using a Core Image filter, such as a blur or comic book effect.
4. Composite the face image back over the original image, thereby creating effects that only apply to the face.
A better example of this would be the end goal of taking a live camera feed from an AVCaptureSession and blurring every face in the video frame, compositing the blurred faces back over the original image for saving.
I almost have this working, save for the fact that there seems to be a coordinates/translation issue. For example, when I test this code and move my face, the "blurred" section moves in the wrong direction (if I turn my face right, the box goes left; if I look up, the box goes down). While I think this may have something to do with mirroring on the front-facing camera, I can't seem to figure out what I should try next:
func drawFaceBox(bufferImage: CIImage, observations: [VNFaceObservation]) -> CVPixelBuffer? {
    // The filter
    let blur = CIFilter(name: "CICrystallize")
    // The unfiltered image, prepared for filtering
    var filteredImage = bufferImage
    // Find and crop each face
    if !observations.isEmpty {
        for face in observations {
            let faceRect = VNImageRectForNormalizedRect(face.boundingBox, Int(bufferImage.extent.size.width), Int(bufferImage.extent.size.height))
            let croppedFace = bufferImage.cropped(to: faceRect)
            blur?.setValue(croppedFace, forKey: kCIInputImageKey)
            blur?.setValue(10.0, forKey: kCIInputRadiusKey)
            if let blurred = blur?.value(forKey: kCIOutputImageKey) as? CIImage {
                compositorCIFilter?.setValue(blurred, forKey: kCIInputImageKey)
                compositorCIFilter?.setValue(filteredImage, forKey: kCIInputBackgroundImageKey)
                if let output = compositorCIFilter?.value(forKey: kCIOutputImageKey) as? CIImage {
                    filteredImage = output
                }
            }
        }
    }
    // Convert image to CVPixelBuffer and return. This part works fine.
}
Any thoughts on how I can composite the blurred face image(s) back to their original position with accuracy? Or any other approach to only filter part of the original CIImage to avoid this issue altogether/save processing? Thanks!
I believe this issue stems from an orientation problem earlier in the pipeline (specifically, during the output of the sample buffers from the camera, which is where the Vision task was instantiated). I have updated my didOutputSampleBuffer code like so:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    ...
    // Setup the current device orientation
    let curDeviceOrientation = UIDevice.current.orientation
    // Handle the image property orientation
    //let orientation = self.exifOrientation(from: curDeviceOrientation)
    // Setup the image request handler
    //let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: CGImagePropertyOrientation(rawValue: UInt32(1))!)
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    // Setup the completion handler
    let completion: VNRequestCompletionHandler = { request, error in
        let observations = request.results as! [VNFaceObservation]
        // Draw faces
        DispatchQueue.main.async {
            // HANDLE FACES
            self.drawFaceBoxes(for: observations)
        }
    }
    // Setup the image request
    let request = VNDetectFaceRectanglesRequest(completionHandler: completion)
    // Handle the request
    do {
        try handler.perform([request])
    } catch {
        print(error)
    }
}
As noted, I have commented out the let orientation = ... and the first let handler = ..., which was using the orientation. By removing the reference to the orientation, I seem to have removed any issue with orientation in the Vision calculations.

How to set a known position and orientation as a starting point of ARKit

I am starting to use ARKit and I have a use case where I want to know the motion from a known position to another one.
So I was wondering if it is possible (as with every tracking solution) to set a known position and orientation as a starting point of the tracking in ARKit?
Regards
There are at least six approaches that allow you to set a starting point for a model. But be aware that using no ARAnchors at all in your AR scene is considered a poor AR experience (although Apple's Augmented Reality app template has no ARAnchors in its code).
First approach
This is the approach that Apple engineers propose in the Augmented Reality app template in Xcode. It doesn't use anchoring, so all you need to do is place a model in mid-air at coordinates like (x: 0, y: 0, z: -0.5); in other words, your model will be 50 cm away from the camera.
override func viewDidLoad() {
    super.viewDidLoad()
    sceneView.scene = SCNScene(named: "art.scnassets/ship.scn")!
    let model = sceneView.scene.rootNode.childNode(withName: "ship",
                                                   recursively: true)
    model?.position.z = -0.5
    sceneView.session.run(ARWorldTrackingConfiguration())
}
Second approach
The second approach is almost the same as the first one, except that it uses an ARKit anchor:
guard let sceneView = self.view as? ARSCNView
else { return }

if let currentFrame = sceneView.session.currentFrame {
    var translation = matrix_identity_float4x4
    translation.columns.3.z = -0.5
    let transform = simd_mul(currentFrame.camera.transform, translation)
    let anchor = ARAnchor(transform: transform)
    sceneView.session.add(anchor: anchor)
}
Third approach
You can also create a pre-defined model position pinned with an ARAnchor using the third approach, where you need to import the RealityKit module as well:
func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
    let model = ModelEntity(mesh: MeshResource.generateSphere(radius: 1.0))
    // ARKit's anchor (an identity transform, i.e. the world origin;
    // note that the diagonal needs four components)
    let anchor = ARAnchor(transform: simd_float4x4(diagonal: [1, 1, 1, 1]))
    // RealityKit's anchor based on the position of the ARAnchor
    let anchorEntity = AnchorEntity(anchor: anchor)
    anchorEntity.addChild(model)
    arView.scene.anchors.append(anchorEntity)
}
Fourth approach
If you have turned on plane detection, you can use ray-casting or hit-testing methods. As a target object you can use a little sphere (initially located at 0, 0, 0) that will be repositioned by the ray-cast:
let query = arView.raycastQuery(from: screenCenter,
                                allowing: .estimatedPlane,
                                alignment: .any)

let raycast = session.trackedRaycast(query) { results in
    if let result = results.first {
        object.transform = result.transform
    }
}
Fifth approach
This approach focuses on saving and sharing ARKit's world maps.
func writeWorldMap(_ worldMap: ARWorldMap, to url: URL) throws {
    let data = try NSKeyedArchiver.archivedData(withRootObject: worldMap,
                                                requiringSecureCoding: true)
    try data.write(to: url)
}

func loadWorldMap(from url: URL) throws -> ARWorldMap {
    let mapData = try Data(contentsOf: url)
    guard let worldMap = try NSKeyedUnarchiver.unarchivedObject(ofClass: ARWorldMap.self,
                                                                from: mapData)
    else {
        throw ARError(.invalidWorldMap)
    }
    return worldMap
}
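A usage sketch of how these helpers might be driven (mapURL is just a placeholder file URL, and error handling is omitted):

// Save the current world map, e.g. from a "Save" button handler.
sceneView.session.getCurrentWorldMap { worldMap, error in
    guard let map = worldMap else { return }
    try? self.writeWorldMap(map, to: mapURL)
}

// Restore it later by running the session with the saved map.
let configuration = ARWorldTrackingConfiguration()
configuration.initialWorldMap = try? loadWorldMap(from: mapURL)
sceneView.session.run(configuration, options: [.resetTracking, .removeExistingAnchors])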
Sixth approach
In ARKit 4.0 a new ARGeoTrackingConfiguration is implemented with the help of the MapKit module, so now you can use pre-defined GPS data.
func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
    for geoAnchor in anchors.compactMap({ $0 as? ARGeoAnchor }) {
        // placemarkEntity(for:) is a helper defined elsewhere (for example, in Apple's geotracking sample code).
        arView.scene.addAnchor(Entity.placemarkEntity(for: geoAnchor))
    }
}
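One follow-up note: geotracking only works on supported devices and in supported locations, so it is worth checking availability before running this configuration. A small sketch (assuming arView is your ARView):

guard ARGeoTrackingConfiguration.isSupported else { return }
ARGeoTrackingConfiguration.checkAvailability { available, error in
    guard available else { return }
    DispatchQueue.main.async {
        self.arView.session.run(ARGeoTrackingConfiguration())
    }
}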