In my Swift / ARKit / SceneKit project, I need to tell if the user's face in front-facing camera is parallel to the camera.
I was able to detect horizontal parallelism by comparing the distances of the left and right eyes from the camera (using faceAnchor.leftEyeTransform and the worldPosition property).
But I am stuck on vertical parallelism. Any ideas on how to achieve that?
Assuming you are using ARFaceTrackingConfiguration in your app, you can retrieve the transforms of both the ARFaceAnchor and the camera to determine their orientations. You can get a simd_float4x4 matrix of the head's orientation in world space from the ARFaceAnchor.transform property. Similarly, you can get the transform of the SCNCamera or ARCamera of your scene.
To compare the camera's and face's orientations relative to each other in a SceneKit app (there are similar functions on the ARKit side of things), I get the world orientation of the node attached to each of them; let's call them faceNode, attached to the ARFaceAnchor, and cameraNode, representing ARSCNView.pointOfView. To find the angle between the camera and your face, you could do something like this:
let faceOrientation: simd_quatf = faceNode.simdWorldOrientation
let cameraOrientation: simd_quatf = cameraNode.simdWorldOrientation
let deltaOrientation: simd_quatf = faceOrientation.inverse * cameraOrientation
By looking at deltaOrientation.angle and deltaOrientation.axis you can determine the relative angles on each axis between the face and the camera. If you do something like deltaOrientation.axis * deltaOrientation.angle, you have a simd_float3 vector giving you a sense of the pitch, yaw and roll (in radians) of the head relative to the camera.
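For example, a minimal sketch of that breakdown (my own illustration on top of the snippet above; for the vertical case in your question, the pitch component is the one to watch):

// Sketch: express the face-to-camera delta as per-axis angles.
// `deltaOrientation` comes from the snippet above.
let relativeAngles = deltaOrientation.axis * deltaOrientation.angle // radians
let degrees = relativeAngles * 180 / .pi
print("pitch: \(degrees.x), yaw: \(degrees.y), roll: \(degrees.z)")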
There are a number of ways you can do this using the face anchor and camera transforms, but this simd quaternion method works quite well for me. Hope this helps!
I am trying to place large 3D models (SCNNode) in ARSCNView using ARKit.
The approximate size is as follows:
I have been through the following links:
is-there-any-size-limitations-on-3d-files-loaded-using-arkit
load-large-3d-object-scn-file-in-arscnview-aspect-fit-in-to-the-screen-arkit-sw
As per the upvoted answer by alex papa in the links above, the model does get placed in the scene. But it appears to hang in the air above the ground; the 3D object floats instead of sitting on the detected/tapped horizontal plane from the hit test.
The x and z positions are right, but y appears to be several meters above the horizontal plane.
I need the scale to be 1.0. Without scaling down the 3D model, is it possible to place/visualise it correctly?
Any help or leads would be appreciated!
ARKit, SceneKit and RealityKit all measure in meters, so your model's size is 99 m x 184 m x 43 m. The solution is simple: scale the model to one hundredth of its nominal size:
let scaleFactor: Float = 0.01
node.scale = SCNVector3(scaleFactor, scaleFactor, scaleFactor)
And here you can read about positioning of the pivot point.
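If the linked explanation is unavailable, here is a minimal sketch of the pivot idea (my own illustration, assuming the floating offset comes from the pivot sitting at the centre of the model's geometry rather than at its base):

// Sketch: move the node's pivot to the bottom centre of its bounding box
// so the model rests on the detected plane instead of floating above it.
let (minBound, maxBound) = node.boundingBox
node.pivot = SCNMatrix4MakeTranslation(
    (minBound.x + maxBound.x) / 2, // centre X
    minBound.y,                    // bottom Y
    (minBound.z + maxBound.z) / 2  // centre Z
)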
When an ARSCNView session is configured with ARFaceTrackingConfiguration, how can it be set to non-mirroring?
The selfie camera's matrix is mirrored, and that is absolutely correct.
ARFaceTrackingConfiguration uses the selfie camera, which is oriented 180 degrees away from the rear camera. Such an orientation places the user's face in the positive Z direction, with the negative X axis to the user's right. Thus, when combining a scene that uses both ARWorldTrackingConfiguration and ARFaceTrackingConfiguration, we get an absolutely correct 3D environment.
Swift beginner here, struggling with moving a scene node in ARKit in response to device motion.
What I want to achieve is: first detect the floor plane, then place a sphere on the floor. From that point onwards, depending on the movement of the device, I want to move the sphere along its x and z axes around the floor of the room. (Once created, the sphere needs to stay in the center of the device screen, locked to that view.)
So far I can detect the floor and place a node, no problem. I can use device motion to obtain the device attitude (pitch, roll and yaw), but how do I translate these values into meaningful x, y, z positions that I can update my node with?
Are there any formulas or methods used to calculate such information, or is this the wrong approach? I would appreciate a link to some info or an explanation of how to go about this. Also, I am unsure how to ensure the node always stays at the center of the device screen.
So, as far as I understand, you want the following workflow:
Step 1. You create a sphere on a plane (which is already done)
Step 2. Move the sphere with respect to the camera's horizontal plane (i.e. along its x and z axis to move it around the floor of the room depending on the movement of the device)
Assuming that Step 1 is done, here is what you can do:
Get the position of the camera and the sphere
This should first be done within the function that is invoked after the sphere is created (be it a tap gesture handler, touchesBegan(), etc.).
You can do it by reading the sphere node's position property, and by getting the camera's position and/or orientation from sceneView.session.currentFrame's .camera.transform, which contains all the necessary parameters about the camera's current position.
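For instance, a minimal sketch of grabbing both (assuming sphereNode and sceneView are your existing node and ARSCNView):

// Sketch: read the sphere's position and the camera's current transform.
let spherePosition = sphereNode.position // SCNVector3, relative to its parent
if let cameraTransform = sceneView.session.currentFrame?.camera.transform {
    // Column 3 of the 4x4 matrix holds the camera's translation (x, y, z).
    let cameraPosition = SCNVector3(cameraTransform.columns.3.x,
                                    cameraTransform.columns.3.y,
                                    cameraTransform.columns.3.z)
    print("camera:", cameraPosition, "sphere:", spherePosition)
}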
Move the sphere as camera moves
Having the sphere's position in the scene and the camera's transformation matrix, you can work out the spatial relation between them. Here you can find a good explanation of exactly how to do it.
Once you have those pieces, implement the appropriate logic within a render-loop callback such as renderer(_:didUpdate:for:) to keep the ball continuously locked with respect to the camera position (see the sketch after this list).
If you are interested in the math behind it, you can start by reading more about transformation matrices, which are a big part of image processing and many other areas.
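As a rough sketch of what that per-frame logic could look like (my own illustration using the per-frame renderer(_:updateAtTime:) callback, and assuming sphereNode, sceneView and floorY, the detected floor height, exist elsewhere in your class):

// Sketch: every frame, intersect the camera's view ray with the floor plane
// and keep the sphere at that point, so it stays centred under the middle of
// the screen while the device moves.
func renderer(_ renderer: SCNSceneRenderer, updateAtTime time: TimeInterval) {
    guard let pov = sceneView.pointOfView else { return }

    let cameraPosition = pov.simdWorldPosition
    let forward = -simd_make_float3(pov.simdWorldTransform.columns.2) // camera looks down -Z
    guard abs(forward.y) > 0.0001 else { return } // looking parallel to the floor

    // Intersect the view ray with the horizontal plane y == floorY.
    let t = (floorY - cameraPosition.y) / forward.y
    guard t > 0 else { return } // camera is not looking towards the floor

    let hit = cameraPosition + forward * t
    sphereNode.simdWorldPosition = simd_float3(hit.x, floorY, hit.z)
}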
Hope that this will help!
I'm combining ARKit with a CNN to constantly update ARKit nodes when they drift. So:
Get estimate of node position with ARKit and place a virtual object in the world
Use CNN to get its estimated 2D location of the object
Update node position accordingly (to refine its location in 3D space)
The problem is that #2 takes about 0.3 s. Therefore I can't use sceneView.unprojectPoint, because the point will correspond to a 3D point seen from the device's world position back at step #1.
How do I calculate the 3D vector from my old location to the CNN's 2D point?
unprojectPoint is just a matrix-math convenience function similar to those found in many graphics-oriented libraries (like DirectX, old-style OpenGL, Three.js, etc). In SceneKit, it's provided as a method on the view, which means it operates using the model/view/projection matrices and viewport the view currently uses for rendering. However, if you know how that function works, you can implement it yourself.
An Unproject function typically does two things:
Convert viewport coordinates (pixels) to the clip-space coordinate system (-1.0 to 1.0 in all directions).
Reverse the projection transform (assuming some arbitrary Z value in clip space) and the view (camera) transform to get to 3D world-space coordinates.
Given that knowledge, we can build our own function. (Warning: untested.)
func unproject(screenPoint: float3, // see below for Z depth hint discussion
               modelView: float4x4,
               projection: float4x4,
               viewport: CGRect) -> float3 {
    // viewport to clip: subtract viewport origin, divide by size,
    // scale/offset from 0...1 to -1...1 coordinate space
    // (depending on your screen-space convention you may also need to flip Y here)
    let origin = float3(Float(viewport.minX), Float(viewport.minY), 0.0)
    let size = float3(Float(viewport.width), Float(viewport.height), 1.0)
    let clip = (screenPoint - origin) / size * float3(repeating: 2) - float3(repeating: 1)
    // apply the reverse of the model-view-projection transform
    let inversePM = (projection * modelView).inverse
    let result = inversePM * float4(clip.x, clip.y, clip.z, 1.0)
    return float3(result.x, result.y, result.z) / result.w // perspective divide
}
Now, to use it... The modelView matrix you pass to this function is the inverse of ARCamera.transform, and you can also get projectionMatrix directly from ARCamera. So, if you're grabbing a 2D position at one point in time, grab the camera matrices then, too, so that you can work backward to 3D as of that time.
There's still the issue of that "Z depth hint" I mentioned: when the renderer projects 3D to 2D it loses information (one of those D's, actually). So you have to recover or guess that information when you convert back to 3D. The screenPoint you pass in to the above function is the x and y pixel coordinates, plus a depth value between 0 and 1: zero is closer to the camera, one is farther away. How you make use of that sort of depends on how the rest of your algorithm is designed. (At the very least, you can unproject both Z=0 and Z=1, and you'll get the endpoints of a line segment in 3D, with your original point somewhere along that line.)
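Putting those pieces together, a rough usage sketch (my own illustration; storedCamera is assumed to be the ARCamera saved when the frame was sent to the CNN, and pixelX/pixelY are the CNN's 2D result as Floats):

// Sketch: recover the 3D ray through the CNN's 2D point, using the camera
// matrices captured when that frame was grabbed.
let viewport = sceneView.bounds
let modelView = storedCamera.transform.inverse // world -> camera (view matrix)
let projection = storedCamera.projectionMatrix

let nearPoint = unproject(screenPoint: float3(pixelX, pixelY, 0.0),
                          modelView: modelView, projection: projection,
                          viewport: viewport)
let farPoint = unproject(screenPoint: float3(pixelX, pixelY, 1.0),
                         modelView: modelView, projection: projection,
                         viewport: viewport)
let rayDirection = simd_normalize(farPoint - nearPoint)
// Your original node position lies somewhere along nearPoint + t * rayDirection.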
Of course, whether this can actually be put together with your novel CNN-based approach is another question entirely. But at least you learned some useful 3D graphics math!
So, I have been using the unprojectPoint function on the SCNSceneRenderer:
public func unprojectPoint(_ point: SCNVector3) -> SCNVector3
When I want to unproject a screen point I pass in Z = 1.
To check on things I also placed a node in the scene at the unprojected vector position. Things seem to check out.
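For reference, the kind of check described above looks roughly like this (my own sketch; sceneView is assumed to be the ARSCNView):

// Sketch: unproject the screen centre at the far plane (z = 1), log it,
// and drop a small marker node there to check the result visually.
let screenCenter = CGPoint(x: sceneView.bounds.midX, y: sceneView.bounds.midY)
let farPoint = sceneView.unprojectPoint(SCNVector3(Float(screenCenter.x),
                                                   Float(screenCenter.y),
                                                   1.0))
print(farPoint)
let marker = SCNNode(geometry: SCNSphere(radius: 0.02))
marker.position = farPoint
sceneView.scene.rootNode.addChildNode(marker)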
In the process I have wondered how ARKit really handles the near and far planes.
When logged, the unprojected point on the far plane gives me the following, with the camera pointed (as straight as possible) down -Z in world coordinates:
SCNVector3(x: 121.191811, y: -176.614227, z: -1111.88794)
Given that ARKit's unit is meters, does -1111 mean that the far plane is about 1,000 m away?
I am trying to understand how the near and far planes are positioned in an ARKit session. Specifically, is the far plane at a fixed position, i.e. always at a fixed distance from the camera? Does it change? And does about 1,000 meters seem to make sense?
The unprojectPoint function uses the same projection as the camera. If you want to know what the camera projection’s near and far planes are, ask the view for its pointOfView node, ask that node for its camera, and ask the camera for its zNear and zFar settings.
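In code, that lookup might read something like this (a sketch; sceneView is assumed to be your ARSCNView):

// Sketch: inspect the near and far clipping planes the view's camera is using.
if let camera = sceneView.pointOfView?.camera {
    print("zNear:", camera.zNear, "zFar:", camera.zFar)
}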