Is it possible to place objects with hit test without plane detection? - swift

Following Apple's Creating an Immersive AR Experience with Audio, I thought it would be interesting to experiment and try to place objects anywhere, not just on vertical and horizontal planes. Is it at all possible to place an object using touch without plane detection? I understand that plane detection increases the accuracy of hit tests and ARAnchor detection, so is there any way to perform hit tests on any other location in the scene?

If your AR scene already contains 3D geometry in the current session, you can definitely use hit-testing to place a new model there (a placement based on existing 3D geometry), or you can use feature points for the model's placement (if there are any).
If there's no 3D geometry at all in your AR scene, or there's an extremely sparse point cloud, what would you apply the hit-testing method to? A hit-test projects a 2D point from screen space onto a 3D surface (remember, detected planes are hidden 3D planes) or onto any appropriate feature point.
So, in AR, plane detection is crucial when a developer uses hit-testing.
func hitTest(_ point: CGPoint,
types: ARHitTestResult.ResultType) -> [ARHitTestResult]
Here you can see all the ARHitTestResult.ResultType available.
But pay attention to this, there's a hitTest method returning SCNHitTestResult:
func hitTest(_ point: CGPoint,
options: [SCNHitTestOption : Any]?) -> [SCNHitTestResult]
Usage:
let touchPosition: CGPoint = gesture.location(in: sceneView)
let hitTestResult = sceneView.hitTest(touchPosition,
types: .existingPlaneUsingExtent)
or:
let hitTestResult = sceneView.hitTest(touchPosition,
types: .featurePoint)
Also, hit-testing is actively used in 3D games, but there it is more relevant to VR than to AR.
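To make the "projected 2D point onto a 3D surface" idea concrete, here is a minimal sketch of the geometry behind such a test: a ray from the camera intersected with an infinite plane. It is plain Swift and does not call ARKit; the Vec3 type and the intersect function are hypothetical names introduced only for this illustration, and ARKit's own implementation is not shown in the answer above.

```swift
import Foundation

// Minimal 3D vector type so the sketch runs without ARKit or simd (hypothetical helper).
struct Vec3 {
    var x, y, z: Double
    init(_ x: Double, _ y: Double, _ z: Double) { self.x = x; self.y = y; self.z = z }
    static func + (a: Vec3, b: Vec3) -> Vec3 { Vec3(a.x + b.x, a.y + b.y, a.z + b.z) }
    static func - (a: Vec3, b: Vec3) -> Vec3 { Vec3(a.x - b.x, a.y - b.y, a.z - b.z) }
    static func * (a: Vec3, s: Double) -> Vec3 { Vec3(a.x * s, a.y * s, a.z * s) }
    func dot(_ o: Vec3) -> Double { x * o.x + y * o.y + z * o.z }
}

/// Intersects the ray `rayOrigin + t * rayDirection` (t >= 0) with an infinite plane
/// given by a point on the plane and its normal. Returns nil when there is no hit.
func intersect(rayOrigin: Vec3, rayDirection: Vec3,
               planePoint: Vec3, planeNormal: Vec3) -> Vec3? {
    let denom = rayDirection.dot(planeNormal)
    guard abs(denom) > 1e-9 else { return nil }   // ray runs parallel to the plane
    let t = (planePoint - rayOrigin).dot(planeNormal) / denom
    guard t >= 0 else { return nil }              // plane lies behind the ray origin
    return rayOrigin + rayDirection * t
}

// Camera 1.5 m above a floor plane at y = 0, looking forward and down.
let hit = intersect(rayOrigin: Vec3(0, 1.5, 0),
                    rayDirection: Vec3(0, -1, -1),
                    planePoint: Vec3(0, 0, 0),
                    planeNormal: Vec3(0, 1, 0))

// A ray parallel to the floor never reaches it.
let miss = intersect(rayOrigin: Vec3(0, 1, 0),
                     rayDirection: Vec3(1, 0, 0),
                     planePoint: Vec3(0, 0, 0),
                     planeNormal: Vec3(0, 1, 0))
```

Without a detected plane or a feature point there is nothing to play the role of planePoint and planeNormal, which is why the answer treats plane detection as essential for hit-testing.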

Related

ARKit: How to tell if user's face is parallel to camera

In my Swift / ARKit / SceneKit project, I need to tell if the user's face in front-facing camera is parallel to the camera.
I was able to tell horizontal parallel by comparing the left and right eyes distance (using faceAnchor.leftEyeTransform and the worldPosition property) from the camera.
But I am stuck on vertical parallel. Any ideas, how to achieve that?
Assuming you are using ARFaceTrackingConfiguration in your app, you can actually retrieve the transforms of both the ARFaceAnchor and the camera to determine their orientations. You can get a simd_float4x4 matrix of the head's orientation in world space by using ARFaceAnchor.transform property. Similarly, you can get the transform of the SCNCamera or ARCamera of your scene.
To compare the camera's and face's orientations relative to each other in a SceneKit app (though there are similar functions on the ARKit side of things), I get the world transform for the node that is attached to each of them, let's call them faceNode attached to the ARFaceAnchor and cameraNode representing the ARSCNView.pointOfView. To find the angle between the camera and your face, for example, you could do something like this:
// simdWorldOrientation is the node's world-space rotation as a quaternion
// (simdWorldTransform is a 4x4 matrix and cannot be assigned to a simd_quatf)
let faceOrientation: simd_quatf = faceNode.simdWorldOrientation
let cameraOrientation: simd_quatf = cameraNode.simdWorldOrientation
let deltaOrientation: simd_quatf = faceOrientation.inverse * cameraOrientation
By looking at deltaOrientation.angle and deltaOrientation.axis you can determine the relative angles on each axis between the face and the camera. If you do something like deltaOrientation.axis * deltaOrientation.angle, you have a simd_float3 vector giving you a sense of the pitch, yaw and roll (in radians) of the head relative to the camera.
There are a number of ways you can do this using the face anchor and camera transforms, but this simd quaternion method works quite well for me. Hope this helps!
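For the "vertical parallel" part of the question, one possible way to read the result is to split the face's forward direction into a yaw (left/right) and a pitch (up/down) angle. The sketch below is plain Swift and rests on assumptions: the face's forward (+Z) axis has already been expressed in camera space, a face looking straight into the camera gives (0, 0, 1), and the function names and the 5 degree tolerance are hypothetical. The sign conventions may need flipping for a mirrored front camera.

```swift
import Foundation

/// Splits a face's forward direction, given in camera space, into yaw and pitch (radians).
/// Assumption: (0, 0, 1) means the face looks straight into the camera.
func yawAndPitch(faceForward f: (x: Double, y: Double, z: Double)) -> (yaw: Double, pitch: Double) {
    let yaw = atan2(f.x, f.z)                                     // rotation about the vertical axis
    let pitch = atan2(f.y, (f.x * f.x + f.z * f.z).squareRoot())  // elevation above the horizontal
    return (yaw, pitch)
}

/// Treats the face as parallel to the camera when both angles stay within a tolerance.
func isParallel(faceForward f: (x: Double, y: Double, z: Double),
                toleranceDegrees: Double = 5) -> Bool {
    let angles = yawAndPitch(faceForward: f)
    let tolerance = toleranceDegrees * .pi / 180
    return abs(angles.yaw) <= tolerance && abs(angles.pitch) <= tolerance
}

let straight = yawAndPitch(faceForward: (x: 0, y: 0, z: 1))  // looking at the camera
let tilted = yawAndPitch(faceForward: (x: 0, y: 1, z: 1))    // head tilted by 45 degrees
```

Checking pitch separately from yaw gives the vertical test directly, without comparing eye distances.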

What is ARAnchor exactly?

I'm trying to understand and use ARKit. But there is one thing that I cannot fully understand.
Apple said about ARAnchor:
A real-world position and orientation that can be used for placing objects in an AR scene.
But that's not enough. So my questions are:
What is ARAnchor exactly?
What are the differences between anchors and feature points?
Is ARAnchor just part of feature points?
And how does ARKit determines its anchors?
Updated: February 02, 2023.
TL;DR
ARAnchor
ARAnchor is an invisible null-object that holds a 3D model at anchor's position. Think of ARAnchor as a parent transform node of your model that you can translate, rotate and scale like any other SceneKit node. Every 3D model has a pivot point, right? Thus, this pivot point must match a location of an ARAnchor in AR app.
If you're not using anchors in an ARKit or ARCore app (in RealityKit, however, it's impossible not to use anchors because they are an integral part of a scene), your 3D models may drift from where they were placed, and this will dramatically impact the app's realism and user experience. Thus, anchors are crucial elements of any AR scene.
According to ARKit 2017 documentation:
ARAnchor is a real-world position and orientation that can be used for placing objects in AR Scene. Adding an anchor to the session helps ARKit to optimize world-tracking accuracy in the area around that anchor, so that virtual objects appear to stay in place relative to the real world. If a virtual object moves, remove the corresponding anchor from the old position and add one at the new position.
ARAnchor is the parent class of 10 other anchor types in ARKit, hence all those subclasses inherit from ARAnchor. Usually you do not use ARAnchor directly. I must also say that ARAnchor and Feature Points have nothing in common. Feature Points are rather special visual elements for tracking and debugging.
ARAnchor doesn't automatically track a real-world target. When you need automation, you have to use the renderer() or session() instance methods, which you can implement if you conform to the ARSCNViewDelegate or ARSessionDelegate protocol, respectively.
Here's an image with a visual representation of a plane anchor. Keep in mind: by default, you can see neither a detected plane nor its corresponding ARPlaneAnchor. So, if you want to see the anchor in your scene, you may "visualize" it using three thin SCNCylinder primitives. Each cylinder's color represents a particular axis: RGB is XYZ.
In ARKit you can automatically add ARAnchors to your scene using different scenarios:
ARPlaneAnchor
If the planeDetection instance property is set to horizontal and/or vertical, ARKit is able to add ARPlaneAnchors to the anchors collection of the running session. Enabling planeDetection can considerably increase the time required for the scene-understanding stage.
ARImageAnchor (conforms to ARTrackable protocol)
This type of anchor contains information about the transform of a detected image (the anchor is placed at the image's center) in a world-tracking or image-tracking config. To activate image tracking, use the detectionImages instance property. In ARKit 2.0 you can track up to 25 images in total, and in ARKit 3.0 / 4.0 up to 100 images. In both cases, however, no more than 4 images are tracked simultaneously. It was promised that ARKit 5.0 / 6.0 would detect and track up to 100 images at a time, but this is not implemented yet.
ARBodyAnchor (conforms to ARTrackable protocol)
You can turn on body tracking by running a session based on ARBodyTrackingConfiguration(). You'll get an ARBodyAnchor at the root joint of a real performer's skeleton or, in other words, at the pelvis position of a tracked character.
ARFaceAnchor (conforms to ARTrackable protocol)
A face anchor stores information about the head's topology, pose and facial expression. You can track an ARFaceAnchor with the help of the front TrueDepth camera. When a face is detected, the face anchor is attached slightly behind the nose, in the center of the face. In ARKit 2.0 you can track just one face; in ARKit 3.0 and higher, up to 3 faces simultaneously. However, the number of tracked faces depends on the presence of a TrueDepth sensor and the processor version: devices with a TrueDepth camera can track up to 3 faces, and devices with an A12+ chipset but without a TrueDepth camera can also track up to 3 faces.
ARObjectAnchor
This anchor's type keeps an information about 6 Degrees of Freedom (position and orientation) of a real-world 3D object detected in a world-tracking session. Remember that you need to specify ARReferenceObject instances for detectionObjects property of session config.
AREnvironmentProbeAnchor
Probe Anchor provides environmental lighting information for a specific area of space in a world-tracking session. ARKit's Artificial Intelligence uses it to supply reflective shaders with environmental reflections.
ARParticipantAnchor
This is an indispensable anchor type for multiuser AR experiences. To employ it, set the isCollaborationEnabled property of ARWorldTrackingConfiguration to true. Then import the MultipeerConnectivity framework.
ARMeshAnchor
ARKit and LiDAR subdivide the reconstructed real-world scene surrounding the user into mesh anchors with corresponding polygonal geometry. Mesh anchors constantly update their data as ARKit refines its understanding of the real world. Although ARKit updates a mesh to reflect a change in the physical environment, the mesh's subsequent change is not intended to reflect in real time. Sometimes your reconstructed scene can have up to 30-40 anchors or even more. This is due to the fact that each classified object (wall, chair, door or table) has its own personal anchor. Each ARMeshAnchor stores data about corresponding vertices, one of eight cases of classification, its faces and vertices' normals.
ARGeoAnchor (conforms to ARTrackable protocol)
In ARKit 4.0+ there's a geo anchor (a.k.a. location anchor) that tracks a geographic location using GPS, Apple Maps and additional environment data coming from Apple servers. This type of anchor identifies a specific area in the world that the app can refer to. When a user moves around the scene, the session updates a location anchor’s transform based on coordinates and device’s compass heading of a geo anchor. Look at the list of supported cities.
ARAppClipCodeAnchor (conforms to ARTrackable protocol)
This anchor tracks the position and orientation of App Clip Code in the physical environment in ARKit 4.0+. You can use App Clip Codes to enable users to discover your App Clip in the real world. There are NFC-integrated App Clip Code and scan-only App Clip Code.
There are also other regular approaches to create anchors in AR session:
Hit-Testing methods
Tapping on the screen projects a point onto an invisible detected plane, placing an ARAnchor at the location where an imaginary ray intersects this plane. By the way, the ARHitTestResult class and its corresponding hit-testing methods for ARSCNView and ARSKView are deprecated as of iOS 14, so you have to get used to ray-casting.
Ray-Casting methods
If you're using ray-casting, tapping on the screen results in a projected 3D point on an invisible detected plane. But you can also perform Ray-Casting between A and B positions in 3D scene. So, ray-casting can be 2D-to-3D and 3D-to-3D. When using the Tracked Ray-Casting, ARKit can keep refining the ray-cast as it learns more and more about detected surfaces.
Feature Points
Special yellow points that ARKit automatically generates on high-contrast margins of real-world objects can give you a place to put an ARAnchor.
ARCamera's transform
iPhone's or iPad's camera position and orientation simd_float4x4 can be easily used as a place for ARAnchor.
Any arbitrary World Position
Place a custom ARWorldAnchor anywhere in your scene. You can generate ARKit's version of world anchor like AnchorEntity(.world(transform: mtx)) found in RealityKit.
This code snippet shows you how to use an ARPlaneAnchor in a delegate's method: renderer(_:didAdd:for:):
func renderer(_ renderer: SCNSceneRenderer,
              didAdd node: SCNNode,
              for anchor: ARAnchor) {
    guard let planeAnchor = anchor as? ARPlaneAnchor
    else { return }
    let grid = Grid(anchor: planeAnchor)
    node.addChildNode(grid)
}
AnchorEntity
AnchorEntity is alpha and omega in RealityKit. According to RealityKit documentation 2019:
AnchorEntity is an anchor that tethers virtual content to a real-world object in an AR session.
RealityKit framework and Reality Composer app were announced at WWDC'19. They have a new class named AnchorEntity. You can use AnchorEntity as the root point of any entities' hierarchy, and you must add it to the Scene anchors collection. AnchorEntity automatically tracks real world target. In RealityKit and Reality Composer AnchorEntity is at the top of hierarchy. This anchor is able to hold a hundred of models and in this case it's more stable than if you use 100 personal anchors for each model.
Let's see how it looks in a code:
func makeUIView(context: Context) -> ARView {
    let arView = ARView(frame: .zero)
    let modelAnchor = try! Experience.loadModel()
    arView.scene.anchors.append(modelAnchor)
    return arView
}
AnchorEntity has three components:
Anchoring component
Transform component
Synchronization component
To find out the difference between ARAnchor and AnchorEntity look at THIS POST.
Here are ten ways to create an AnchorEntity in RealityKit 2.0 for iOS:
// Fixed position in the AR scene
AnchorEntity(.world(transform: mtx))
// For body tracking (a.k.a. Motion Capture)
AnchorEntity(.body)
// Pinned to the tracking camera
AnchorEntity(.camera)
// For face tracking (Selfie Camera config)
AnchorEntity(.face)
// For image tracking config
AnchorEntity(.image(group: "GroupName", name: "forModel"))
// For object tracking config
AnchorEntity(.object(group: "GroupName", name: "forObject"))
// For plane detection with surface classification
AnchorEntity(.plane([.any], classification: [.seat], minimumBounds: [1, 1]))
// When you use ray-casting
AnchorEntity(raycastResult: myRaycastResult)
// When you use ARAnchor with a given identifier
AnchorEntity(.anchor(identifier: uuid))
// Creates anchor entity on a basis of ARAnchor
AnchorEntity(anchor: arAnchor)
And here are only two AnchorEntity's cases available in RealityKit 2.0 for macOS:
// Fixed world position in VR scene
AnchorEntity(.world(transform: mtx))
// Camera transform
AnchorEntity(.camera)
Also it’s not superfluous to say that you can use any subclass of ARAnchor for AnchorEntity needs:
var anchor = AnchorEntity()
func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
    guard let faceAnchor = anchors.first as? ARFaceAnchor
    else { return }
    arView.session.add(anchor: faceAnchor) // ARKit Session
    self.anchor = AnchorEntity(anchor: faceAnchor)
    anchor.addChild(model)
    arView.scene.anchors.append(self.anchor) // RealityKit Scene
}
Reality Composer's anchors:
At the moment (February 2022) Reality Composer has just 4 types of AnchorEntities:
// 1a
AnchorEntity(plane: .horizontal)
// 1b
AnchorEntity(plane: .vertical)
// 2
AnchorEntity(.image(group: "GroupName", name: "forModel"))
// 3
AnchorEntity(.face)
// 4
AnchorEntity(.object(group: "GroupName", name: "forObject"))
AR USD Schemas
And of course, I should say a few words about preliminary anchors. There are 3 preliminary anchoring types (July 2022) for those who prefer Python scripting for USDZ models – these are plane, image and face preliminary anchors. Look at this code snippet to find out how to implement a schema pythonically.
def Cube "ImageAnchoredBox" (prepend apiSchemas = ["Preliminary_AnchoringAPI"])
{
    uniform token preliminary:anchoring:type = "image"
    rel preliminary:imageAnchoring:referenceImage = <ImageReference>
    def Preliminary_ReferenceImage "ImageReference"
    {
        uniform asset image = @somePicture.jpg@
        uniform double physicalWidth = 45
    }
}
If you want to know more about AR USD Schemas, read this story on Medium.
Visualizing AnchorEntity
Here's an example of how to visualize anchors in RealityKit (mac version).
import AppKit
import RealityKit
class ViewController: NSViewController {
    @IBOutlet var arView: ARView!
    var model = Entity()
    let anchor = AnchorEntity()
    fileprivate func visualAnchor() -> Entity {
        let colors: [SimpleMaterial.Color] = [.red, .green, .blue]
        for index in 0...2 {
            let box: MeshResource = .generateBox(size: [0.20, 0.005, 0.005])
            let material = UnlitMaterial(color: colors[index])
            let entity = ModelEntity(mesh: box, materials: [material])
            if index == 0 {
                entity.position.x += 0.1
            } else if index == 1 {
                entity.transform = Transform(pitch: 0, yaw: 0, roll: .pi/2)
                entity.position.y += 0.1
            } else if index == 2 {
                entity.transform = Transform(pitch: 0, yaw: -.pi/2, roll: 0)
                entity.position.z += 0.1
            }
            model.scale *= 1.5
            self.model.addChild(entity)
        }
        return self.model
    }
    override func awakeFromNib() {
        anchor.addChild(self.visualAnchor())
        arView.scene.addAnchor(anchor)
    }
}
About ArAnchors in ARCore
At the end of my post, I would like to talk about four types of anchors that are used in ARCore 1.35 and higher. Google's official documentation says the following about anchors: "ArAnchor describes a fixed location and orientation in the real world". ARCore anchors work similarly to ARKit anchors.
Let's take a look at ArAnchors' types:
Local anchors
are stored with the app locally, and valid only for that instance of the app. The user must be physically at the location where they are placing the anchor. Anchor can be attached to Trackable or ARCore Session.
Cloud Anchors
are stored in Google Cloud and may be shared between app instances. The user must be physically at the location where they are placing the anchor. Cloud Anchors are hosted in the cloud, and thanks to the Persistent Cloud Anchors API you can create a cloud anchor that can be resolved for 1 to 365 days after creation. They can be resolved by multiple users to establish a common frame of reference across users and their devices.
Geospatial anchors
are based on geodetic latitude, longitude, and altitude, plus Google's Visual Positioning System data, to provide precise location almost anywhere in the world. These anchors may be shared between app instances. The user may place an anchor from a remote location as long as the app is connected to the internet and able to use the VPS.
Terrain anchors
are a subtype of Geospatial anchors that allow you to place AR objects using only latitude and longitude, leveraging information from Google Maps to find the precise altitude above ground.
When anchoring objects in ARCore, make sure that they are close to the anchor you are using. Avoid placing objects farther than 8 meters from the anchor to prevent unexpected rotational movement due to ARCore's updates to world space coordinates. If you need to place an object more than eight meters away from an existing anchor, create a new anchor closer to this position and attach the object to the new anchor.
These Kotlin code snippets show you how to use a Geospatial anchor:
fun configureSession(session: Session) {
    session.configure(
        session.config.apply {
            geospatialMode = Config.GeospatialMode.ENABLED
        }
    )
}
val earth = session?.earth ?: return
if (earth.trackingState != TrackingState.TRACKING) { return }
earthAnchor?.detach()
val altitude = earth.cameraGeospatialPose.altitude - 1
val qx = 0f; val qy = 0f; val qz = 0f; val qw = 1f
earthAnchor = earth.createAnchor(latLng.latitude,
                                 latLng.longitude,
                                 altitude,
                                 qx, qy, qz, qw)

ARKit project point with previous device position

I'm combining ARKit with a CNN to constantly update ARKit nodes when they drift. So:
Get estimate of node position with ARKit and place a virtual object in the world
Use CNN to get its estimated 2D location of the object
Update node position accordingly (to refine its location in 3D space)
The problem is that #2 takes 0.3 s or so. Therefore I can't use sceneView.unprojectPoint, because by then the view's camera has moved on, while the 2D point corresponds to the device's world position from #1.
How do I calculate the 3D vector from my old location to the CNN's 2D point?
unprojectPoint is just a matrix-math convenience function similar to those found in many graphics-oriented libraries (like DirectX, old-style OpenGL, Three.js, etc). In SceneKit, it's provided as a method on the view, which means it operates using the model/view/projection matrices and viewport the view currently uses for rendering. However, if you know how that function works, you can implement it yourself.
An Unproject function typically does two things:
Convert viewport coordinates (pixels) to the clip-space coordinate system (-1.0 to 1.0 in all directions).
Reverse the projection transform (assuming some arbitrary Z value in clip space) and the view (camera) transform to get to 3D world-space coordinates.
Given that knowledge, we can build our own function. (Warning: untested.)
import simd
import CoreGraphics

func unproject(screenPoint: SIMD3<Float>, // see below for Z depth hint discussion
               modelView: simd_float4x4,
               projection: simd_float4x4,
               viewport: CGRect) -> SIMD3<Float> {
    // viewport to clip: subtract viewport origin, divide by size,
    // scale/offset from 0...1 to -1...1 coordinate space
    // (the z component has no origin offset, so its depth passes through unshifted)
    let origin = SIMD3<Float>(Float(viewport.origin.x), Float(viewport.origin.y), 0)
    let size = SIMD3<Float>(Float(viewport.width), Float(viewport.height), 1)
    let clip = (screenPoint - origin) / size * 2 - 1
    // apply the reverse of the model-view-projection transform
    let inversePM = (projection * modelView).inverse
    let result = inversePM * SIMD4<Float>(clip.x, clip.y, clip.z, 1.0)
    return SIMD3<Float>(result.x, result.y, result.z) / result.w // perspective divide
}
Now, to use it... The modelView matrix you pass to this function is the inverse of ARCamera.transform, and you can also get projectionMatrix directly from ARCamera. So, if you're grabbing a 2D position at one point in time, grab the camera matrices then, too, so that you can work backward to 3D as of that time.
There's still the issue of that "Z depth hint" I mentioned: when the renderer projects 3D to 2D it loses information (one of those D's, actually). So you have to recover or guess that information when you convert back to 3D: the screenPoint you pass in to the above function is the x and y pixel coordinates, plus a depth value between 0 and 1. Zero is closer to the camera, 1 is farther away. How you make use of that sort of depends on how the rest of your algorithm is designed. (At the very least, you can unproject both Z=0 and Z=1, and you'll get the endpoints of a line segment in 3D, with your original point somewhere along that line.)
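As one possible way to use those two endpoints, the sketch below picks the point on the segment at a given distance from the near endpoint. It is plain Swift without ARKit; Vec3, pointOnRay and the idea of reusing a distance estimate from the earlier ARKit placement are assumptions made for illustration, not part of the answer above.

```swift
import Foundation

// Minimal 3D vector type so the sketch runs without ARKit or simd (hypothetical helper).
struct Vec3 {
    var x, y, z: Double
    init(_ x: Double, _ y: Double, _ z: Double) { self.x = x; self.y = y; self.z = z }
    static func + (a: Vec3, b: Vec3) -> Vec3 { Vec3(a.x + b.x, a.y + b.y, a.z + b.z) }
    static func - (a: Vec3, b: Vec3) -> Vec3 { Vec3(a.x - b.x, a.y - b.y, a.z - b.z) }
    static func * (a: Vec3, s: Double) -> Vec3 { Vec3(a.x * s, a.y * s, a.z * s) }
    var length: Double { (x * x + y * y + z * z).squareRoot() }
}

/// `near` and `far` are the world-space results of unprojecting the same pixel
/// at depth 0 and depth 1. Returns the point `distance` meters from `near`
/// along the segment, or nil if the two endpoints coincide.
func pointOnRay(near: Vec3, far: Vec3, distanceFromNear distance: Double) -> Vec3? {
    let direction = far - near
    let length = direction.length
    guard length > 0 else { return nil }
    return near + direction * (distance / length)
}

// Endpoints of a ray pointing straight down -Z, with an object assumed 2 m away.
let refined = pointOnRay(near: Vec3(0, 0, 0), far: Vec3(0, 0, -10), distanceFromNear: 2)
```

The distance could come, for example, from how far the node was from the camera when it was first placed; that choice is left to the rest of the algorithm.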
Of course, whether this can actually be put together with your novel CNN-based approach is another question entirely. But at least you learned some useful 3D graphics math!

How far is the far plane in an ARKit session?

So,
I have been using the unproject function on the SCNSceneRenderer:
public func unprojectPoint(_ point: SCNVector3) -> SCNVector3
When I want to unproject a screen point I pass in Z = 1.
To check on things I also placed a node in the scene at the unprojected vector position. Things seem to check out.
In the process I have wondered how ARKit really handles the near and far planes.
The unprojected point on the far plane, when logged, gives me this while I point the camera (as straight as possible) down -Z in world coordinates:
SCNVector3(x: 121.191811, y: -176.614227, z: -1111.88794)
Given that in ARKit the unit is meters, does -1111 mean that the far plane is about 1K away?
I am trying to understand how the near and far planes are positioned in an ARKit session. Specifically, is the far plane always at a fixed distance from the camera, or does it change? And does a distance of about 1K meters make sense?
The unprojectPoint function uses the same projection as the camera. If you want to know what the camera projection’s near and far planes are, ask the view for its pointOfView node, ask that node for its camera, and ask the camera for its zNear and zFar settings.
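To see how the Z value passed to unprojectPoint relates to those zNear and zFar settings, here is a small sketch of the usual perspective depth mapping. It assumes a conventional perspective projection in which depth 0 lies on the near plane and depth 1 on the far plane; the function name is hypothetical and SceneKit's internal depth convention is not confirmed by the answer above.

```swift
import Foundation

/// Maps a window-space depth in 0...1 to a distance along the camera's view axis,
/// assuming a conventional perspective projection (an assumption, not SceneKit's documented math).
func viewDistance(depth: Double, zNear: Double, zFar: Double) -> Double {
    zNear * zFar / (zFar - depth * (zFar - zNear))
}

// Example values chosen for easy arithmetic, not taken from ARKit.
let atNear = viewDistance(depth: 0, zNear: 1, zFar: 100)
let atFar = viewDistance(depth: 1, zNear: 1, zFar: 100)
let halfway = viewDistance(depth: 0.5, zNear: 1, zFar: 100)
```

Under this mapping, unprojecting with Z = 1 lands on the far plane, so the distance seen in the question would reflect the camera's zFar rather than anything ARKit measured. The mapping is also strongly non-linear: a depth of 0.5 is still very close to the near plane.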

ARKIT ARCamera zFar

Does anyone know how to change the zFar of ARKit's ARCamera?
Or how to get its current value?
I have a very large model that's being clipped, I think.
In Blender I had the same issue and fixed it by setting the far value on the frustum.
I can create a projection matrix for each camera frame but can't set it.
func session(_ session: ARSession, cameraDidChangeTrackingState camera: ARCamera) {
textManager.showTrackingQualityInfo(for: camera.trackingState, autoHide: true)
let projectionMatrix: matrix_float4x4 = camera.projectionMatrix(withViewportSize: camera.viewport.size,
orientation: .portrait,
zNear: 0.1,
zFar: 5000)
//ERROR - readonly
camera.projectionMatrix = matrix_float4x4
...
ARCamera has nothing to do with rendering your 3D virtual content. As its docs say, it's just "information about the camera position and imaging characteristics for a captured video frame in an AR session." That is, it provides data that helps you set up whatever technology you're using to render (be that SceneKit, a custom renderer using Metal, etc).
The camera.projectionMatrix(...) method is part of that information-providing role — it uses what ARKit knows about the orientation of your physical device camera, plus the zNear and zFar values you provide, to construct a matrix you can use in your renderer.
If you're using SceneKit, you can pass that matrix to SCNCamera. (You probably need to convert from simd_float4x4 to SCNMatrix4.) If you're using some other renderer, you can use that matrix there.