Scanning a real-world object and generating a 3D mesh from it - swift

An ARKit app can create an ARReferenceObject and use it to reliably recognize the position and orientation of a real-world object. We can also save the finished reference object as an .arobject file.
However, an ARReferenceObject contains only the spatial feature information ARKit needs to recognize the real-world object; it is not a displayable 3D reconstruction of that object. The reference object is created with this ARSession method:
func createReferenceObject(transform: simd_float4x4,
                           center: simd_float3,
                           extent: simd_float3,
                           completionHandler: @escaping (ARReferenceObject?, Error?) -> Void)
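For context, here is a minimal sketch of how that method might be called and the result saved, assuming the scanning app has already chosen a bounding box (its `transform` plus min/max corners). `centerAndExtent` and `saveReferenceObject` are hypothetical helper names; `createReferenceObject(transform:center:extent:completionHandler:)` and `ARReferenceObject.export(to:previewImage:)` are the real ARKit calls. The ARKit part is wrapped in `#if canImport(ARKit)` so the pure geometry helper compiles on any platform.

```swift
#if canImport(ARKit)
import ARKit
#endif

/// Hypothetical helper: turns the min/max corners of a scanning box (expressed in the
/// coordinate space of the box's `transform`) into the `center` and `extent` arguments.
func centerAndExtent(min lo: SIMD3<Float>, max hi: SIMD3<Float>) -> (center: SIMD3<Float>, extent: SIMD3<Float>) {
    let center = (lo + hi) * 0.5   // midpoint of the box
    let extent = hi - lo           // full size along x, y, z
    return (center, extent)
}

#if canImport(ARKit)
/// Hypothetical helper: scans the region inside the box and writes an .arobject file.
@available(iOS 12.0, *)
func saveReferenceObject(session: ARSession,
                         boxTransform: simd_float4x4,
                         min lo: SIMD3<Float>,
                         max hi: SIMD3<Float>,
                         to fileURL: URL) {
    let box = centerAndExtent(min: lo, max: hi)
    session.createReferenceObject(transform: boxTransform,
                                  center: box.center,
                                  extent: box.extent) { object, error in
        guard let object = object else {
            print("Scan failed: \(error?.localizedDescription ?? "unknown error")")
            return
        }
        do {
            // The archive stores the sparse feature points ARKit needs for detection, not a mesh.
            try object.export(to: fileURL, previewImage: nil)
        } catch {
            print("Export failed: \(error)")
        }
    }
}
#endif
```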
My question:
Is there a method that allows us to reconstruct digital 3D geometry (low-poly or high-poly) from the .arobject file using Poisson Surface Reconstruction or Photogrammetry?

RealityKit 2.0 | Object Capture API
The Object Capture API, announced at WWDC 2021 (initially for macOS 12), provides the long-awaited tools for photogrammetry: from a set of photos of an object, it outputs a USDZ model with a high-resolution texture.
Read about photogrammetry HERE.
ARKit | Mesh Reconstruction
Using an iOS device with a LiDAR scanner and ARKit 3.5/4.0/5.0, you can easily reconstruct a topological map (a polygonal mesh) of the surrounding environment. Once it is enabled in the session configuration (sceneReconstruction = .mesh), the Scene Reconstruction feature starts working immediately after the ARSession is run.
Apple's LiDAR works within a range of about 5 meters. The scanner helps improve the quality of the ZDepth channel and of features such as People/Real-World Object Occlusion, Motion Tracking, Immediate Physics Contact Body and Raycasting.
Other useful peculiarities of the LiDAR scanner:
you can use your device in a poorly lit room
you can track pure white walls with no features at all
you can detect planes almost instantaneously
Bear in mind that the quality of an object scanned with LiDAR isn't as good as you might expect: small details are not captured, because the resolution of Apple's LiDAR isn't high enough.

You answered your own question with a quote from Apple's documentation:
An ARReferenceObject contains only the spatial feature information needed for ARKit to recognize the real-world object, and is not a displayable 3D reconstruction of that object.
If you run Apple's object-scanning sample code, you can see for yourself the visualizations it creates of the reference object during scanning and after a test recognition: it's just a sparse 3D point cloud. There's certainly no photogrammetry in what Apple's API provides, and there wouldn't be much to go on in terms of recovering realistic structure in a mesh.
That's not to say such efforts are impossible: some third parties have demoed photogrammetry experiments built on top of ARKit (here). But:
1. they don't use ARKit 2 object scanning, just the raw pixel buffer and feature points from ARFrame;
2. the level of extrapolation in those demos would require non-trivial original R&D, as it goes far beyond the kind of information ARKit itself supplies.
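To see for yourself how little geometry an .arobject carries, a sketch like the following (assuming you already have a saved file URL) loads it with `ARReferenceObject(archiveURL:)` and summarizes `rawFeatures`, an `ARPointCloud` holding only point positions and identifiers: no faces, normals or texture. `summarize` and `inspectReferenceObject` are hypothetical helper names.

```swift
#if canImport(ARKit)
import ARKit
#endif

/// Summary of a sparse point cloud: how many points and the box they span.
struct PointCloudSummary {
    let count: Int
    let boundsMin: SIMD3<Float>
    let boundsMax: SIMD3<Float>
}

/// Returns nil for an empty cloud; otherwise the point count and axis-aligned bounds.
func summarize(_ points: [SIMD3<Float>]) -> PointCloudSummary? {
    guard var lo = points.first else { return nil }
    var hi = lo
    for p in points.dropFirst() {
        lo = pointwiseMin(lo, p)
        hi = pointwiseMax(hi, p)
    }
    return PointCloudSummary(count: points.count, boundsMin: lo, boundsMax: hi)
}

#if canImport(ARKit)
/// Loads a saved .arobject and prints what it actually contains.
@available(iOS 12.0, *)
func inspectReferenceObject(at url: URL) throws {
    let object = try ARReferenceObject(archiveURL: url)
    // rawFeatures is an ARPointCloud: positions only, nothing a mesh could be rendered from.
    if let s = summarize(object.rawFeatures.points) {
        print("\(s.count) feature points, bounds \(s.boundsMin) to \(s.boundsMax)")
    }
}
#endif
```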

Related

ARKit: show a map on the floor / bottom of the screen

Using ARKit, I want to place an indoor map on the floor.
So far I have tried two things:
I placed a large plane below the camera and above the floor, but it drifts quite a lot. It doesn't move well when we walk, and the overall experience is not overwhelming.
I saw a solution where you identify a horizontal plane, but it has its own issues.
So is it really possible with good results?
Devices with LiDAR
The LiDAR scanner has its advantages and disadvantages. Its main advantage is the ability to reconstruct the floor and walls almost instantly; you can then easily attach any 3D model to the resulting surface. The model will be stable and will not drift, so the user's AR experience will be "overwhelming", as you put it. Another important advantage of LiDAR is its excellent performance in environments with poor lighting and poor textures.
Here you can read about the Occlusion feature and some other LiDAR peculiarities. Good news: LiDAR works perfectly in conjunction with the Plane Detection option.
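ARKit subdivides the reconstructed scene into ARMeshAnchors, which give you access to polygonal geometry and surface classification.
meshAnchor.geometry.classification   // ARGeometrySource? (per-face class; needs .meshWithClassification)
meshAnchor.geometry.faces            // ARGeometryElement (triangle vertex indices)
meshAnchor.geometry.vertices         // ARGeometrySource (vertex positions, anchor-local)
meshAnchor.transform.columns.3       // anchor's position in world space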
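A minimal sketch, assuming a LiDAR device and iOS 13.4 or later: scene reconstruction is turned on through the configuration, and each `ARMeshAnchor`'s vertex buffer can then be read and moved into world space with the anchor's transform. The vertex source is assumed to use the `.float3` format (three packed `Float`s per vertex), which is why the pointer is bound to a `(Float, Float, Float)` tuple rather than to `SIMD3<Float>` (which occupies 16 bytes). `transformPoint`, `makeMeshConfiguration` and `worldVertices` are illustrative names.

```swift
#if canImport(ARKit)
import ARKit
import Metal
#endif

/// Applies a column-major 4x4 transform (same column layout as simd_float4x4)
/// to a point, assuming an affine matrix (w = 1 for positions).
func transformPoint(_ p: SIMD3<Float>,
                    columns c: (SIMD4<Float>, SIMD4<Float>, SIMD4<Float>, SIMD4<Float>)) -> SIMD3<Float> {
    var r = c.3          // translation column
    r += c.0 * p.x
    r += c.1 * p.y
    r += c.2 * p.z
    return SIMD3<Float>(r.x, r.y, r.z)
}

#if canImport(ARKit)
/// Enables LiDAR scene reconstruction when the device supports it.
@available(iOS 13.4, *)
func makeMeshConfiguration() -> ARWorldTrackingConfiguration {
    let config = ARWorldTrackingConfiguration()
    if ARWorldTrackingConfiguration.supportsSceneReconstruction(.meshWithClassification) {
        config.sceneReconstruction = .meshWithClassification
    }
    return config
}

/// Reads every vertex of a mesh anchor and converts it to world space.
@available(iOS 13.4, *)
func worldVertices(of anchor: ARMeshAnchor) -> [SIMD3<Float>] {
    let source = anchor.geometry.vertices            // assumed format: .float3
    let base = source.buffer.contents()
    let m = anchor.transform
    var result: [SIMD3<Float>] = []
    result.reserveCapacity(source.count)
    for i in 0..<source.count {
        let ptr = base.advanced(by: source.offset + source.stride * i)
        // Three packed Floats per vertex; don't bind to the 16-byte SIMD3<Float>.
        let v = ptr.assumingMemoryBound(to: (Float, Float, Float).self).pointee
        result.append(transformPoint(SIMD3<Float>(v.0, v.1, v.2),
                                     columns: (m.columns.0, m.columns.1, m.columns.2, m.columns.3)))
    }
    return result
}
#endif
```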
Devices without LiDAR
In the absence of a LiDAR scanner, we can only detect horizontal and vertical surfaces using the Plane Detection feature. In my experience, all AR frameworks (including ARKit and RealityKit) are much better and faster at detecting horizontal surfaces than vertical ones.
However, detected planes are less stable than reconstructed surfaces, so slight drifting is sometimes possible. To complete the Plane Detection stage successfully, you need a well-lit room and surrounding objects whose textures are good for tracking.
ARKit calls your delegate's renderer(_:didAdd:for:) with an ARPlaneAnchor for each unique vertical or horizontal surface it detects. Each plane anchor provides details about the surface: its position, its dimensions and its real-world classification.
In addition, implement renderer(_:didUpdate:for:): as detection continues, ARKit refines the planes and may merge several coplanar detected planes into one bigger plane (the surface of a floor, for example), updating the surviving anchor and removing the others.
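planeAnchor.classification   // ARPlaneAnchor.Classification: .floor, .wall, .table, ... (A12 chip or later)
planeAnchor.extent           // width (x) and length (z); iOS 16 adds planeExtent
planeAnchor.alignment        // .horizontal or .vertical
planeAnchor.center           // plane centre relative to the anchor's transform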
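A sketch of one way to pin an indoor map to the floor with plane detection, assuming an `ARSCNView` whose session runs an `ARWorldTrackingConfiguration` with `planeDetection = [.horizontal]` and whose delegate is a `MapPlacer`. The "lowest horizontal plane is the floor" rule in `floorCandidate` is a hypothetical heuristic, not something ARKit provides. The map is parented to the floor anchor's node so ARKit keeps it aligned as the plane is refined or merged.

```swift
#if canImport(ARKit)
import ARKit
import SceneKit
#endif

/// Hypothetical heuristic: among horizontal planes, treat the lowest one as the floor.
/// Each entry pairs an identifier with the plane's world-space Y coordinate (meters).
func floorCandidate<ID>(heights: [(id: ID, worldY: Float)]) -> ID? {
    heights.min(by: { $0.worldY < $1.worldY })?.id
}

#if canImport(ARKit)
/// Keeps `mapNode` attached to the current floor plane. ARSCNView calls these
/// delegate methods on SceneKit's rendering thread.
@available(iOS 11.3, *)
final class MapPlacer: NSObject, ARSCNViewDelegate {
    let mapNode = SCNNode()   // the map geometry itself is up to the app
    private var planes: [UUID: (node: SCNNode, worldY: Float, center: SIMD3<Float>)] = [:]

    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        track(anchor, node: node)
    }

    func renderer(_ renderer: SCNSceneRenderer, didUpdate node: SCNNode, for anchor: ARAnchor) {
        track(anchor, node: node)   // planes grow and may absorb neighbours
    }

    func renderer(_ renderer: SCNSceneRenderer, didRemove node: SCNNode, for anchor: ARAnchor) {
        planes[anchor.identifier] = nil   // e.g. merged into a larger plane
    }

    private func track(_ anchor: ARAnchor, node: SCNNode) {
        guard let plane = anchor as? ARPlaneAnchor, plane.alignment == .horizontal else { return }
        planes[plane.identifier] = (node: node, worldY: plane.transform.columns.3.y, center: plane.center)
        let candidates = planes.map { (id: $0.key, worldY: $0.value.worldY) }
        guard let floorID = floorCandidate(heights: candidates),
              let floor = planes[floorID] else { return }
        if mapNode.parent !== floor.node {
            mapNode.removeFromParentNode()
            floor.node.addChildNode(mapNode)   // child of the anchor node, so it follows the plane
        }
        mapNode.simdPosition = floor.center    // keep the map centred on the floor plane
    }
}
#endif
```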
Is it really possible with good results?
Yes. Whether you use Plane Detection or Scene Reconstruction, it's possible to attach a map without noticeable drifting.

Issue with real face detection in the Vision framework

I'm having an issue with real face detection using the Vision framework.
I referred to the Apple link below:
https://developer.apple.com/documentation/vision/tracking_the_user_s_face_in_real_time
I used the demo code provided at the link above. The camera detects a face even in a printed photo or a passport photo, which is not a real face. How can I tell, using the Vision framework, that what the camera sees is not a real face?
You can use https://developer.apple.com/documentation/arkit/arfacegeometry
This creates a 3D mesh of a human face. A 3D mesh has different values in its topology (e.g. vertices, triangleIndices) compared to a 2D picture.
Here is a project link
Here I have used the camera API for face detection and eye-blink detection. You can check it and customize it to your requirements.
Update: here is another project for a liveness check using MLKit (link).
Vision + RealityKit
The Apple Vision framework processes "2D requests": it works only with RGB channels. If you need to process 3D surfaces, you have to use a depth-sensing API (the LiDAR scanner on the rear camera, or the TrueDepth camera on the front). That allows you to distinguish between a photo and a real face. I think Vision + RealityKit is the best choice for you, because you can first detect a face (2D or 3D) in Vision and then, using depth data, it's quite easy to find out whether the normals of the polygonal faces point in the same direction (a 2D surface) or in different directions (a 3D head).
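The normal-direction test described above can be sketched as follows: compute each triangle's unit normal, find the dominant direction, and measure the average deviation from it. This is an illustrative heuristic with no tuned threshold, and it assumes a mesh built from raw depth (for example `ARMeshAnchor` geometry, or a mesh you triangulate from a depth map). `ARFaceGeometry` uses a fixed, face-shaped topology fitted by ARKit, so on its own it may not reveal a flat photo. ARKit's `Int16`/`UInt32` indices would need converting with `map(Int.init)`; `cross3` and `normalSpread` are hypothetical names.

```swift
/// Cross product, written out so the sketch needs no platform-specific simd module.
func cross3(_ a: SIMD3<Float>, _ b: SIMD3<Float>) -> SIMD3<Float> {
    SIMD3<Float>(a.y * b.z - a.z * b.y,
                 a.z * b.x - a.x * b.z,
                 a.x * b.y - a.y * b.x)
}

/// Hypothetical flatness score in [0, 2]: 0 when all triangle normals point the same
/// way (a flat surface such as a photo), larger when they diverge (a curved surface).
/// `indices` holds three vertex indices per triangle.
func normalSpread(vertices: [SIMD3<Float>], indices: [Int]) -> Float {
    var normals: [SIMD3<Float>] = []
    for t in stride(from: 0, to: indices.count - 2, by: 3) {
        let a = vertices[indices[t]]
        let b = vertices[indices[t + 1]]
        let c = vertices[indices[t + 2]]
        let n = cross3(b - a, c - a)
        let length = (n * n).sum().squareRoot()
        if length > 1e-9 { normals.append(n / length) }   // skip degenerate triangles
    }
    guard !normals.isEmpty else { return 0 }
    let sum = normals.reduce(SIMD3<Float>(repeating: 0)) { $0 + $1 }
    let sumLength = (sum * sum).sum().squareRoot()
    guard sumLength > 1e-9 else { return 2 }   // normals cancel out: no dominant direction
    let axis = sum / sumLength
    // Mean cosine of the angle between each normal and the dominant direction.
    let meanCos = normals.map { ($0 * axis).sum() }.reduce(0, +) / Float(normals.count)
    return 1 - meanCos
}
```

A flat quad scores 0, while a four-sided pyramid scores about 0.42 (1 minus 1/√3), so the score grows with how far the surface bends away from a single plane.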

How to generate surfaces/planes around a real-world object (like a bottle) using Unity & ARCore?

I built an APK using the HelloAR scene (which is provided with the ARCore package). The app only detects horizontal surfaces like a table and creates its own semi-transparent plane over them. When I moved my phone around a bottle, the app again only created a horizontal plane cutting through the bottle. I expected ARCore to create planes along the bottle as I moved my phone around, like polygons in a mesh.
In another scenario, I placed two books of different thickness on the floor. But the HelloAR app creates only one semi-transparent horizontal surface over the thicker book, instead of creating two surfaces (one for each book).
What is going wrong here? How can I fix it and make the HelloAR app work more precisely? Please help.
Software: Unity v2018.2, ARCore v1.11.0
ARCore builds an approximate point cloud from smooth movement of the device by identifying feature points, which are detected from the contrast between different shapes. If you run your application in test mode in Unity, you can see how the points are placed in your empty scene.
Once it has enough points at the "same height" (I don't know the exact precision), it generates the plane you see, but it won't detect separate planes whose heights differ by 5 cm or even somewhat more.
To find out the approximate accuracy, test it in Unity with a script that captures the points used to generate the planes, then check their Y differences to estimate the tolerance distance.
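The Y-difference analysis can be sketched in Swift (kept in the same language as the other examples here; in Unity you would collect the point cloud in C#): sort the feature-point heights and start a new level whenever the gap to the previous height exceeds a tolerance. `heightLevels` and its `tolerance` are hypothetical; you would tune the tolerance by experiment, since ARCore's internal plane-merging threshold is not documented here.

```swift
/// Groups feature-point heights (Y, meters) into levels: after sorting, a value joins
/// the current level when it is within `tolerance` of the previous value.
func heightLevels(_ ys: [Float], tolerance: Float) -> [[Float]] {
    var levels: [[Float]] = []
    for y in ys.sorted() {
        if let last = levels.last?.last, y - last <= tolerance {
            levels[levels.count - 1].append(y)   // close enough: same surface
        } else {
            levels.append([y])                   // gap too large: new surface
        }
    }
    return levels
}
```

With a tolerance of 1.5 cm, heights of 0 to 2 cm and 5 to 6 cm split into two levels; with a 5 cm tolerance they collapse into one, which would be consistent with a single plane appearing over both books.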
Vuforia is currently one of the leading SDKs for augmented reality, providing a wide range of detection options (images, ground, points, 3D objects, ...).
Regarding your question about detecting a bottle, I would certainly use the 3D model detection feature. You can read the official docs here.
You first need to create an approximate model of the object in 3D modeling software and then use their tool to generate the detection model. Then you put it in Unity and set up the detection (no coding needed).
I have some experience with this kind of detection: I used it to detect a large 2 m x 2 m scale model of an electric vehicle. It works great; you can walk all the way around it and it keeps tracking. You can see a short official demo here.
Hope it helped to explain this in short!

Using iPhone TrueDepth sensor to detect a real face vs photo?

How can I use the depth data captured using iPhone true-depth Camera to distinguish between a real human 3D face and a photograph of the same?
The requirement is to use it for authentication.
What I did: Created a sample app to get a continuous stream of AVDepthData of what is in front of the camera.
Theory
The TrueDepth sensor lets iPhone X ... iPhone 14 generate a high-quality ZDepth channel in addition to the RGB channels captured by the regular selfie camera. The ZDepth channel lets us visually distinguish a real human face from a photo. In the ZDepth channel, a human face is represented as a gradient, whereas a photo held facing the camera has an almost solid color, because all pixels on the photo's plane are roughly equidistant from the camera.
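A photo tilted relative to the camera produces a linear gradient rather than a solid color, so a slightly more robust version of this idea fits a plane to the depth samples of the face region and looks at the residual. A sketch, assuming you have already extracted (x, y, depth) samples inside the detected face rectangle; `planeFitRMS` and any decision threshold you apply to its result are hypothetical.

```swift
/// Fits z = a*x + b*y + c to the samples by least squares (normal equations solved with
/// Cramer's rule) and returns the root-mean-square residual, in the units of z.
/// Returns nil when there are fewer than three samples or they are collinear.
func planeFitRMS(_ samples: [(x: Double, y: Double, z: Double)]) -> Double? {
    guard samples.count >= 3 else { return nil }
    let n = Double(samples.count)
    var sx = 0.0, sy = 0.0, sz = 0.0, sxx = 0.0, syy = 0.0, sxy = 0.0, sxz = 0.0, syz = 0.0
    for s in samples {
        sx += s.x; sy += s.y; sz += s.z
        sxx += s.x * s.x; syy += s.y * s.y; sxy += s.x * s.y
        sxz += s.x * s.z; syz += s.y * s.z
    }
    // Normal equations: m * [a, b, c] = rhs
    let m: [[Double]] = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    let rhs = [sxz, syz, sz]
    func det3(_ k: [[Double]]) -> Double {
        let t0 = k[0][0] * (k[1][1] * k[2][2] - k[1][2] * k[2][1])
        let t1 = k[0][1] * (k[1][0] * k[2][2] - k[1][2] * k[2][0])
        let t2 = k[0][2] * (k[1][0] * k[2][1] - k[1][1] * k[2][0])
        return t0 - t1 + t2
    }
    func withColumn(_ col: Int) -> [[Double]] {   // m with one column replaced by rhs
        var k = m
        for row in 0..<3 { k[row][col] = rhs[row] }
        return k
    }
    let d = det3(m)
    guard abs(d) > 1e-12 else { return nil }
    let a = det3(withColumn(0)) / d
    let b = det3(withColumn(1)) / d
    let c = det3(withColumn(2)) / d
    var sse = 0.0
    for s in samples {
        let e = s.z - (a * s.x + b * s.y + c)   // distance from the fitted plane along z
        sse += e * e
    }
    return (sse / n).squareRoot()
}
```

A flat photo, tilted or not, leaves a residual near zero, while a face leaves a residual on the order of its relief (nose to cheeks), which is what a threshold would separate.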
AVFoundation
At the moment, the AVFoundation API has no Bool property telling you whether it's a real face or a photo, but AVFoundation's capture subsystem provides the AVDepthData class: a container for per-pixel distance data (a depth map) captured by a camera device. A depth map describes, at each pixel, the distance to an object in meters (or the disparity, in 1/meters, depending on depthDataType).
// Excerpt from AVFoundation's generated Swift interface
@available(iOS 11.0, *)
open class AVDepthData: NSObject {
    open var depthDataType: OSType { get }
    open var depthDataMap: CVPixelBuffer { get }
    open var isDepthDataFiltered: Bool { get }
    open var depthDataAccuracy: AVDepthDataAccuracy { get }
}
A pixel buffer is capable of containing the depth data's per-pixel depth or disparity map.
var depthDataMap: CVPixelBuffer { get }
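Reading `depthDataMap` needs a little care: the buffer may hold disparity or 16-bit values, and rows can be padded beyond `width`. A sketch that converts to `kCVPixelFormatType_DepthFloat32` with `converting(toDepthDataType:)`, locks the buffer and copies the rows out; `depthInMeters` and `unpackRows` are hypothetical helper names.

```swift
#if canImport(AVFoundation)
import AVFoundation
import CoreVideo
#endif

/// Copies `width` values from each of `height` rows, skipping any padding at the end
/// of a row (`floatsPerRow` = bytesPerRow / 4 may be larger than `width`).
func unpackRows(_ raw: UnsafeBufferPointer<Float32>, width: Int, height: Int, floatsPerRow: Int) -> [Float32] {
    var out: [Float32] = []
    out.reserveCapacity(width * height)
    for row in 0..<height {
        let start = row * floatsPerRow
        out.append(contentsOf: raw[start..<(start + width)])
    }
    return out
}

#if canImport(AVFoundation)
/// Returns the depth map as a row-major array of distances in meters.
@available(iOS 11.0, macOS 10.13, *)
func depthInMeters(_ depthData: AVDepthData) -> (values: [Float32], width: Int, height: Int) {
    // TrueDepth frames may arrive as disparity and/or Float16; convert to 32-bit depth.
    let converted = depthData.converting(toDepthDataType: kCVPixelFormatType_DepthFloat32)
    let buffer = converted.depthDataMap
    CVPixelBufferLockBaseAddress(buffer, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(buffer, .readOnly) }
    let width = CVPixelBufferGetWidth(buffer)
    let height = CVPixelBufferGetHeight(buffer)
    let floatsPerRow = CVPixelBufferGetBytesPerRow(buffer) / MemoryLayout<Float32>.stride
    guard let base = CVPixelBufferGetBaseAddress(buffer) else { return ([], 0, 0) }
    let raw = UnsafeBufferPointer(start: base.assumingMemoryBound(to: Float32.self),
                                  count: floatsPerRow * height)
    return (unpackRows(raw, width: width, height: height, floatsPerRow: floatsPerRow), width, height)
}
#endif
```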
ARKit
ARKit's heart beats thanks to AVFoundation and CoreMotion sessions (to a certain extent, it also uses Vision). You can certainly use this framework for human face detection, but remember that ARKit is computationally intensive because of its "heavy metal" tracking subsystem. For successful real-face (not photo) detection, use ARFaceAnchor, which lets you register the head's motion and orientation at 60 fps, and facial blendshapes, which let you register the user's facial expressions in real time.
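As an illustration of the blendshape idea, one simple liveness heuristic counts blinks from the `eyeBlinkLeft`/`eyeBlinkRight` coefficients of `ARFaceAnchor.blendShapes`, assuming an `ARSession` running `ARFaceTrackingConfiguration` (TrueDepth devices only). `BlinkCounter` and `BlinkLivenessDelegate` are hypothetical names, and the hysteresis thresholds are hypothetical defaults, not values from Apple.

```swift
#if canImport(ARKit)
import ARKit
#endif

/// Counts blinks from a stream of blink coefficients (0 = eye open, 1 = eye closed).
/// Two thresholds (hysteresis) stop noise around a single value from counting twice.
struct BlinkCounter {
    let closeThreshold: Float
    let openThreshold: Float
    private(set) var blinks = 0
    private var eyesClosed = false

    init(closeThreshold: Float = 0.8, openThreshold: Float = 0.3) {   // hypothetical defaults
        self.closeThreshold = closeThreshold
        self.openThreshold = openThreshold
    }

    mutating func feed(_ coefficient: Float) {
        if !eyesClosed && coefficient > closeThreshold {
            eyesClosed = true
        } else if eyesClosed && coefficient < openThreshold {
            eyesClosed = false
            blinks += 1   // one complete close-then-open cycle
        }
    }
}

#if canImport(ARKit)
/// Session delegate for an ARSession running ARFaceTrackingConfiguration.
final class BlinkLivenessDelegate: NSObject, ARSessionDelegate {
    private(set) var counter = BlinkCounter()

    func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
        for case let face as ARFaceAnchor in anchors {
            let left = face.blendShapes[.eyeBlinkLeft]?.floatValue ?? 0
            let right = face.blendShapes[.eyeBlinkRight]?.floatValue ?? 0
            counter.feed(min(left, right))   // count only when both eyes close
        }
    }
}
#endif
```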
Vision
Use Apple Vision and Core ML to recognize and classify a human face contained in a CVPixelBuffer. But remember that you need a ZDepth-to-RGB conversion to work with Apple Vision: at the moment, AI/ML mobile frameworks don't work with depth map data directly. If you use RGBD data for authentication and there are just one or two users' faces to recognize, the model training task becomes considerably simpler. All you have to do is create an mlmodel for Vision trained on many variations of ZDepth facial images.
You can use Apple's Create ML app to generate lightweight and effective mlmodel files.
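A minimal sketch of the ZDepth-to-grayscale step, assuming a row-major array of depth values in meters: clip to a working range, invert so nearer is brighter, and quantize to 8 bits. `depthToGray` is a hypothetical name, and the `near`/`far` defaults are hypothetical values for a face at arm's length. The resulting bytes could then be wrapped in a one-component 8-bit `CVPixelBuffer` and passed to `VNImageRequestHandler` with a `VNCoreMLRequest` (not shown).

```swift
/// Maps depth values (meters) inside [near, far] to 8-bit grey levels, nearer = brighter.
/// Non-finite values (holes in the depth map) become 0.
func depthToGray(_ depth: [Float], near: Float = 0.2, far: Float = 0.6) -> [UInt8] {
    precondition(far > near, "far must be greater than near")
    return depth.map { d in
        guard d.isFinite else { return 0 }
        let clamped = min(max(d, near), far)
        let t = (far - clamped) / (far - near)   // 1 at near, 0 at far
        return UInt8((t * 255).rounded())
    }
}
```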
Useful links
You can find sample code for detecting and classifying images with Vision here and here. You can also read this post to find out how to convert AVDepthData to a regular RGB pattern.
You can use AVCaptureMetadataOutput and AVCaptureDepthDataOutput to detect a face and then take the required action.

Model lost on uniform background surface with ARCamera (Vuforia, Unity)

I'm trying to use Vuforia in Unity to see a model in AR. It works properly when I'm in a room with lots of different colors, but if I go into a room with a single color (for example: white floor, white walls, no furniture), the model keeps disappearing. I'm using Extended Tracking with Prediction enabled.
Is there a way to keep the model on screen whatever the background seen by webcam?
I'm afraid this is not possible. Since Vuforia uses markerless tracking, it requires high-contrast feature points.
Most AR SDKs use only a monocular RGB camera (not RGB-Depth), so they rely on computer vision techniques to calculate the missing depth information. This means extracting visually distinct feature points and locating the device using the estimated distances to these points over several frames while you move.
They also leverage sensor fusion, which means they combine data from the camera with data from the device's IMU (inertial sensors). Unfortunately, the IMU data is mainly used to compensate when visual tracking fails, for example during excessive motion (when the camera image is blurred). On its own, sensor data is not reliable, which is exactly the situation when you walk into a room with no distinctive points to extract.
The only way to solve this is to place several image targets in the room. That allows Vuforia to calculate the device's position in 3D space. Otherwise, it is not possible.
You can also refer to SLAM for more information.
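The depth-from-motion idea behind monocular tracking can be illustrated with the idealized two-view case: if the camera moves sideways by a baseline B and a feature shifts by d pixels, its depth is Z = f · B / d for a focal length f in pixels. Real SLAM systems also estimate rotation and combine many points, so this is only a simplified sketch (`depthFromParallax` is a hypothetical name). It does show why a featureless white wall is a problem: with no trackable feature there is no disparity to measure.

```swift
/// Idealized two-view triangulation (pure sideways camera translation, no rotation):
/// depth Z = f * B / d, with f and d in pixels and B in meters.
func depthFromParallax(focalLengthPixels f: Double, baseline b: Double, disparityPixels d: Double) -> Double? {
    guard d > 0 else { return nil }   // no measurable shift (or no feature at all): depth unknown
    return f * b / d
}
```

For example, f = 1500 px, B = 0.1 m and d = 50 px give Z = 3 m.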