Using iPhone TrueDepth sensor to detect a real face vs photo? - swift

How can I use the depth data captured using iPhone true-depth Camera to distinguish between a real human 3D face and a photograph of the same?
The requirement is to use it for authentication.
What I did: I created a sample app that gets a continuous stream of AVDepthData describing what is in front of the camera.

Theory
The TrueDepth sensor lets iPhone X through iPhone 14 models generate a high-quality ZDepth channel in addition to the RGB channels captured by the regular selfie camera. The ZDepth channel lets us visually tell whether it's a real human face or just a photo: in the ZDepth channel a human face appears as a gradient, while a photo is almost a solid color, because all pixels on a photo's plane are equidistant from the camera.
AVFoundation
At the moment the AVFoundation API has no Bool-type instance property that tells you whether it's a real face or a photo, but AVFoundation's capture subsystem provides the AVDepthData class – a container for per-pixel distance data (a depth map) captured by the camera device. A depth map describes, at each pixel, the distance to an object in meters.
@available(iOS 11.0, *)
open class AVDepthData: NSObject {
    open var depthDataType: OSType { get }
    open var depthDataMap: CVPixelBuffer { get }
    open var isDepthDataFiltered: Bool { get }
    open var depthDataAccuracy: AVDepthDataAccuracy { get }
}
A pixel buffer is capable of containing the depth data's per-pixel depth or disparity map.
var depthDataMap: CVPixelBuffer { get }
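For illustration, here's a minimal sketch of the "gradient vs. flat" check using AVCaptureDepthDataOutput. It assumes you've already configured an AVCaptureSession with the front TrueDepth camera as input and registered this object as the depth delegate; the sparse sampling and the 0.08 m threshold are illustrative assumptions, and in practice you would restrict the sampling to the detected face rectangle.

import AVFoundation

final class DepthLivenessReceiver: NSObject, AVCaptureDepthDataOutputDelegate {

    func depthDataOutput(_ output: AVCaptureDepthDataOutput,
                         didOutput depthData: AVDepthData,
                         timestamp: CMTime,
                         connection: AVCaptureConnection) {
        // Normalize to 32-bit float depth (meters) regardless of the native format
        let depth = depthData.converting(toDepthDataType: kCVPixelFormatType_DepthFloat32)
        let map = depth.depthDataMap

        CVPixelBufferLockBaseAddress(map, .readOnly)
        defer { CVPixelBufferUnlockBaseAddress(map, .readOnly) }

        guard let base = CVPixelBufferGetBaseAddress(map) else { return }
        let width = CVPixelBufferGetWidth(map)
        let height = CVPixelBufferGetHeight(map)
        let rowBytes = CVPixelBufferGetBytesPerRow(map)

        var minDepth = Float.greatestFiniteMagnitude
        var maxDepth = -Float.greatestFiniteMagnitude

        for y in stride(from: 0, to: height, by: 4) {          // sparse sampling for brevity
            let row = base.advanced(by: y * rowBytes).assumingMemoryBound(to: Float32.self)
            for x in stride(from: 0, to: width, by: 4) {
                let d = row[x]
                if d.isFinite && d > 0 {
                    minDepth = min(minDepth, d)
                    maxDepth = max(maxDepth, d)
                }
            }
        }

        // A printed photo is nearly planar, so its depth spread is tiny;
        // a real head typically spans several centimeters of depth.
        let spread = maxDepth - minDepth
        let looksLikeRealFace = spread > 0.08                  // illustrative threshold
        print("depth spread: \(spread) m, real face? \(looksLikeRealFace)")
    }
}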
ARKit
ARKit's heart beats thanks to AVFoundation and CoreMotion sessions (and, to a certain extent, it also uses Vision). You can of course use this framework for human face detection, but remember that ARKit is a computationally intensive module because of its "heavy metal" tracking subsystem. For successful real-face (not photo) detection, use ARFaceAnchor, which lets you register the head's motion and orientation at 60 fps, and facial blendshapes, which let you register the user's facial expressions in real time.
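As a hedged sketch of that idea, you can run an ARFaceTrackingConfiguration and watch the eye-blink blendshapes: a printed photo can't blink. The 0.6 threshold and the "one blink is enough" rule below are assumptions for illustration.

import ARKit

final class FaceLivenessSession: NSObject, ARSessionDelegate {

    private let session = ARSession()
    private(set) var blinkObserved = false

    func start() {
        guard ARFaceTrackingConfiguration.isSupported else { return }  // needs a TrueDepth camera
        session.delegate = self
        session.run(ARFaceTrackingConfiguration())
    }

    func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
        for case let faceAnchor as ARFaceAnchor in anchors {
            let left = faceAnchor.blendShapes[.eyeBlinkLeft]?.floatValue ?? 0
            let right = faceAnchor.blendShapes[.eyeBlinkRight]?.floatValue ?? 0
            // A static photo held in front of the camera can't produce a blink
            if left > 0.6 && right > 0.6 {
                blinkObserved = true
            }
        }
    }
}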
Vision
Implement Apple Vision and CoreML techniques to recognize and classify a human face contained in a CVPixelBuffer. But remember, you need a ZDepth-to-RGB conversion in order to work with Apple Vision – AI/ML mobile frameworks don't work with depth-map data directly at the moment. When you use RGBD data for authentication and there are just one or two users' faces to recognize, this considerably simplifies the model-training process: all you have to do is create an mlmodel for Vision containing many variations of ZDepth facial images.
You can use Apple's Create ML app to generate lightweight and effective mlmodel files.
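A rough sketch of that Vision + CoreML step, assuming you've already converted the ZDepth map to an RGB CVPixelBuffer and trained a classifier in Create ML. DepthFaceClassifier is a hypothetical model name used only for illustration.

import Vision
import CoreML

func classifyDepthImage(_ rgbFromDepth: CVPixelBuffer) throws {
    // Hypothetical Xcode-generated model class from a Create ML classifier
    let mlModel = try DepthFaceClassifier(configuration: MLModelConfiguration()).model
    let vnModel = try VNCoreMLModel(for: mlModel)

    let request = VNCoreMLRequest(model: vnModel) { request, _ in
        guard let best = (request.results as? [VNClassificationObservation])?.first else { return }
        // e.g. labels "realFace" / "photo" defined when you trained the mlmodel
        print("\(best.identifier) with confidence \(best.confidence)")
    }

    try VNImageRequestHandler(cvPixelBuffer: rgbFromDepth, options: [:]).perform([request])
}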
Useful links
Sample code for detecting and classifying images using Vision can be found here and here. You can also read this post to find out how to convert AVDepthData to a regular RGB pattern.

You can make use of AVCaptureMetadataOutput and AVCaptureDepthDataOutput to detect a face and then take the required action.
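For example, a short sketch of attaching both outputs to an already-configured AVCaptureSession whose input is the front TrueDepth camera (session setup and the delegate implementation are omitted):

import AVFoundation

func attachFaceAndDepthOutputs(to session: AVCaptureSession,
                               delegate: AVCaptureMetadataOutputObjectsDelegate & AVCaptureDepthDataOutputDelegate,
                               queue: DispatchQueue) {
    let metadataOutput = AVCaptureMetadataOutput()
    if session.canAddOutput(metadataOutput) {
        session.addOutput(metadataOutput)
        metadataOutput.setMetadataObjectsDelegate(delegate, queue: queue)
        // .face delivers AVMetadataFaceObject bounds in the delegate callback
        metadataOutput.metadataObjectTypes = [.face]
    }

    let depthOutput = AVCaptureDepthDataOutput()
    if session.canAddOutput(depthOutput) {
        session.addOutput(depthOutput)
        depthOutput.setDelegate(delegate, callbackQueue: queue)
        depthOutput.isFilteringEnabled = true   // smooths holes in the depth map
    }
}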

Related

Facing issue of Real face detection in Vision Framework

I have faced an issue with real face detection using the Vision framework.
I referred to the Apple link below.
https://developer.apple.com/documentation/vision/tracking_the_user_s_face_in_real_time
I used the demo code provided in the link above. I see that the camera detects a face even from a printed photo or passport photo, which is not a real face. How can I know, using the Vision framework, that the face in front of the camera is not a real one?
You can use https://developer.apple.com/documentation/arkit/arfacegeometry
This creates a 3D mesh of a human face. A 3D mesh has different values (e.g. vertices, triangleIndices) in its topology compared to a 2D picture.
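As a rough illustration of that idea, you could measure the depth spread of the face mesh along its local z-axis: a tracked photo stays nearly planar, while a real head doesn't. The 2 cm threshold is an assumption.

import ARKit

func looksThreeDimensional(_ faceAnchor: ARFaceAnchor) -> Bool {
    // z-values of the face mesh vertices in the anchor's local space (meters)
    let zValues = faceAnchor.geometry.vertices.map { $0.z }
    guard let nearest = zValues.max(), let farthest = zValues.min() else { return false }
    return (nearest - farthest) > 0.02   // a real face shows a few centimeters of relief
}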
Here is a project link.
In it I have used the camera API for face detection and eye blinking; you can check it and customize it according to your requirements.
Update: Here is another project for a liveness check using MLKit: link.
Vision + RealityKit
The Apple Vision framework processes "2D requests": it works only with RGB channels. If you need to process 3D surfaces, you have to use the LiDAR scanner API, which is based on depth principles; it will allow you to distinguish between a photo and a real face. I think Vision + RealityKit is the best choice for you, because you can first detect a face (2D or 3D) in Vision, and then, using LiDAR, it's quite easy to find out whether the normals of the polygonal faces point in the same direction (a 2D surface) or in different directions (a 3D head).
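Here's a hedged sketch of that normals test, assuming a LiDAR device and a world-tracking session with scene reconstruction enabled (see the ARKit | Mesh Reconstruction section below); the 0.95 threshold is an illustrative assumption.

import ARKit
import simd

func isRoughlyPlanar(_ meshAnchor: ARMeshAnchor) -> Bool {
    let normals = meshAnchor.geometry.normals                    // ARGeometrySource of packed float3 normals
    let base = normals.buffer.contents().advanced(by: normals.offset)

    var sum = SIMD3<Float>(repeating: 0)
    for i in 0..<normals.count {
        let p = base.advanced(by: i * normals.stride).assumingMemoryBound(to: Float.self)
        sum += simd_normalize(SIMD3<Float>(p[0], p[1], p[2]))
    }
    // If all normals point the same way, the averaged vector keeps unit length;
    // on a curved surface (a real head) it shrinks toward zero.
    return simd_length(sum / Float(normals.count)) > 0.95
}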

Model lost on uniform background surface with ARCamera (Vuforia, Unity)

I'm trying to use Vuforia in Unity to see a model in AR. It works properly when I'm in a room with lots of different colors, but if I go into a room with a single color (for example: white floor, white walls, no furniture), the model keeps disappearing. I'm using Extended Tracking with Prediction enabled.
Is there a way to keep the model on screen whatever the background seen by the webcam?
Is there a way to keep the model on screen whatever the background seen by the webcam?
I'm afraid this is not possible. Since Vuforia uses markerless tracking, it requires high contrast at the feature points.
Since most AR SDKs use only a monocular RGB camera (not RGB-Depth), they rely on computer-vision techniques to compute the missing depth information. That means extracting visually distinct feature points and locating the device using the estimated distance to those feature points over several frames while you move.
However, they also leverage sensor fusion, which means they combine data gathered from the camera with data from the device's IMU (sensors). Unfortunately, that data is mainly used to complement the camera when motion tracking fails in situations like excessive motion (when the camera image is blurred). Therefore, the sensor data on its own is not reliable, which is exactly the case when you walk into a room with no distinctive points to extract.
The only way you can solve this is by placing several image targets in that room. That will allow Vuforia to calculate the device position in 3D space. Otherwise this is not possible.
You can also refer to SLAM for more information.

Scanning Real-World Object and generating 3D Mesh from it

An ARKit app allows us to create an ARReferenceObject and, using it, reliably recognize the position and orientation of real-world objects. We can also save the finished .arobject file.
However, an ARReferenceObject contains only the spatial feature information needed for ARKit to recognize the real-world object; it is not a displayable 3D reconstruction of that object.
func createReferenceObject(transform: simd_float4x4,
                           center: simd_float3,
                           extent: simd_float3,
                           completionHandler: (ARReferenceObject?, Error?) -> Void)
My question:
Is there a method that allows us to reconstruct digital 3D geometry (low-poly or high-poly) from the .arobject file using Poisson Surface Reconstruction or Photogrammetry?
RealityKit 2.0 | Object Capture API
The Object Capture API, announced at WWDC 2021, provides you with the long-awaited tools for photogrammetry. At the output we get a USDZ model with a hi-res texture.
Read about photogrammetry HERE.
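A minimal sketch of the Object Capture workflow (RealityKit's PhotogrammetrySession, macOS 12+ / iOS 17+ on supported hardware); the folder and output URLs are placeholders and error handling is trimmed.

import RealityKit

func reconstructModel(from imagesFolder: URL, to outputURL: URL) async throws {
    // imagesFolder contains the captured photos of the object
    let session = try PhotogrammetrySession(input: imagesFolder,
                                            configuration: PhotogrammetrySession.Configuration())

    try session.process(requests: [.modelFile(url: outputURL, detail: .medium)])

    for try await output in session.outputs {
        switch output {
        case .requestProgress(_, let fraction):
            print("progress: \(Int(fraction * 100))%")
        case .requestComplete(_, .modelFile(let url)):
            print("USDZ written to \(url)")        // textured USDZ model
        case .requestError(_, let error):
            print("failed: \(error)")
        default:
            break
        }
    }
}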
ARKit | Mesh Reconstruction
Using an iOS device with LiDAR and ARKit 3.5/4.0/5.0, you can easily reconstruct a topological map of the surrounding environment. The Scene Reconstruction feature starts working immediately after launching the current ARSession.
The Apple LiDAR scanner works within a 5-meter range. The scanner helps you improve the quality of the ZDepth channel and powers such features as People/Real World Objects Occlusion, Motion Tracking, immediate physics contact bodies and Raycasting.
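A short sketch of turning Scene Reconstruction on, assuming a LiDAR-equipped device; ARKit then delivers the environment mesh as ARMeshAnchors.

import ARKit

func runSceneReconstruction(on session: ARSession) {
    let configuration = ARWorldTrackingConfiguration()
    if ARWorldTrackingConfiguration.supportsSceneReconstruction(.mesh) {
        configuration.sceneReconstruction = .mesh            // or .meshWithClassification
    }
    if ARWorldTrackingConfiguration.supportsFrameSemantics(.sceneDepth) {
        configuration.frameSemantics.insert(.sceneDepth)     // per-frame LiDAR depth map, if needed
    }
    session.run(configuration)
}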
Other awesome peculiarities of the LiDAR scanner are:
you can use your device in a poorly lit room
you can track pure white walls with no features at all
you can detect planes almost instantaneously
Consider that the quality of a scanned object when you're using LiDAR isn't as good as you might expect: small details are not scanned, because the resolution of the Apple LiDAR isn't high enough.
You answered your own question with a quote from Apple's documentation:
An ARReferenceObject contains only the spatial feature information needed for ARKit to recognize the real-world object, and is not a displayable 3D reconstruction of that object.
If you run that sample code, you can see for yourself the visualizations it creates of the reference object during scanning and after a test recognition: it's just a sparse 3D point cloud. There's certainly no photogrammetry in what Apple's API provides you, and there's not much to go on in terms of recovering realistic structure in a mesh.
That's not to say that such efforts are impossible; there have been some third parties demoing photogrammetry experiments built on top of ARKit (here). But:
1. that's not using ARKit 2 object scanning, just the raw pixel buffer and feature points from ARFrame.
2. the level of extrapolation in those demos would require non-trivial original R&D, as it's far beyond the kind of information ARKit itself supplies.

Augmented Reality for iOS and swift front facing camera applications

I am developing an app that needs to use the front-facing camera of the iPhone for an Augmented Reality experience using Swift. I have tried to use ARKit, but front-facing camera support in ARKit is only available on iPhone X.
So, which frameworks or libraries can I use with Swift to develop apps that have an AR experience, especially for the front-facing camera, other than ARKit?
ARKit isn't the only way possible to create "AR" experiences on iOS, nor is it the only way that Apple permits creating "AR" in the App Store.
If you define "front-facing-camera AR" as something like "uses front camera, detects faces, allows placing virtual 2D/3D content overlays that appear to stay attached to the face", there are any number of technologies one could use. Apps like Snapchat have been doing this kind of "AR" since before ARKit existed, using technology they've either developed in-house or licensed from third parties. How you do it and how well it works depends on the technology you use. ARKit guarantees a certain precision of results by requiring a front-facing depth camera.
It's entirely possible to develop an app that uses ARKit for face tracking on TrueDepth devices and a different technology on other devices. For example, looking only at what you can do "out of the box" with Apple's SDK, there's the Vision framework, which locates and tracks faces in 2D. There are probably a few third-party libraries out there, too... or you could go looking through academic journals, since face detection/tracking is a pretty active area of computer vision research.
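For instance, here's a minimal sketch of that Vision route: 2D face detection that runs on any device, with no depth or liveness guarantees.

import Vision

func detectFaces(in pixelBuffer: CVPixelBuffer) {
    let request = VNDetectFaceRectanglesRequest { request, _ in
        guard let faces = request.results as? [VNFaceObservation] else { return }
        for face in faces {
            // Normalized bounding box (0...1) in the image's coordinate space
            print("face at \(face.boundingBox)")
        }
    }
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    try? handler.perform([request])
}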
ARKit 2.0
The TrueDepth front-facing camera of the iPhone X/XR/XS gives you a depth channel at a 15 fps frame rate, while the regular front-facing image camera gives you RGB channels at a 60 fps frame rate.
Principle of operation: it's like the depth-sensing system in the MS Xbox Kinect, but more powerful. An infrared emitter projects over 30,000 dots in a known pattern onto the user's face. Those dots are then photographed by a dedicated infrared camera for analysis. There is a proximity sensor, presumably so that the system knows when a user is close enough to activate. An ambient light sensor helps the system set output light levels.
At the moment only the iPhone X/XR/XS models have a TrueDepth camera. If your iPhone has no TrueDepth camera and sensor system (as the iPhone SE, iPhone 6s, iPhone 7 and iPhone 8 don't), you cannot use it for features such as Animoji, Face ID, or depth occlusion effects.
In the ARKit 2.0 framework, the configuration that tracks the movement and expressions of the user's face with the TrueDepth camera is the special class ARFaceTrackingConfiguration.
So the answer is NO: you can only use the front-facing camera for this on iPhones with an A11 or A12 chipset (or a later version), i.e. iPhones with a TrueDepth camera and its sensor system.
ARKit 3.0 (addition)
ARKit now allows you to simultaneously track the surrounding environment with the back camera and your face with the front camera. You can also track up to three faces at a time.
Here are two code snippets showing how to set up your configuration.
First scenario:
let configuration = ARWorldTrackingConfiguration()
if ARWorldTrackingConfiguration.supportsUserFaceTracking {
    configuration.userFaceTrackingEnabled = true
}
session.run(configuration)

func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
    for anchor in anchors where anchor is ARFaceAnchor {
        // your code here...
    }
}
Second scenario:
let configuration = ARFaceTrackingConfiguration()
if ARFaceTrackingConfiguration.supportsWorldTracking {
    configuration.isWorldTrackingEnabled = true
}
session.run(configuration)

func session(_ session: ARSession, didUpdate frame: ARFrame) {
    let transform = frame.camera.transform
    // your code here...
}

How to find contours of soccer player in dynamic background

My project is to design a system that analyzes soccer videos. In one part of this project I need to detect the contours of the players and everyone else on the playing field. For all players who are not occluded by the advertisement billboards, I have used the color of the playing field (green) to detect contours and extract the players. But I have a problem when players or the referee are occluded by the advertisement billboards. Suppose the advertisements on the billboards are dynamic (LED billboards). As you know, in this situation finding the contours is more difficult because there is no static background color or texture. You can see two examples of this condition in the following images.
NOTE: in order to find the position of the occlusion, I use the region between the field line and the advertisement billboards, because this region has the color of the field (green). This region is shown by a red rectangle in the following image.
I expect the result to be similar to the following image.
Could anyone suggest an algorithm to detect these contours?
You can try several things.
Use the vision.PeopleDetector object to detect people on the field. You can also track the detected people using vision.KalmanFilter, as in the Tracking Pedestrians from a Moving Car example.
Use the vision.OpticalFlow object in the Computer Vision System Toolbox to compute optical flow. You can then analyze the resulting flow field to separate camera motion from player motion.
Use frame differencing to detect moving objects. The good news is that this will give you the contours of the people. The bad news is that it will give you many spurious contours as well.
Optical flow would work for such problems, as it captures motion information. Foreground-extraction techniques using HMM, GMM, or non-parametric models may also solve the problem; I have used them for motion analysis in surveillance videos to detect anomalies (the background was static). The magnitude and orientation of optical flow seem to be an effective approach, and I have read papers on segmentation using optical flow. I hope this helps.
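If you want to experiment with the optical-flow idea on iOS (the ecosystem used throughout this page), Vision provides VNGenerateOpticalFlowRequest (iOS 14+). A rough sketch, assuming you have two consecutive frames as CVPixelBuffers; separating player motion from camera motion is left to further analysis of the flow field.

import Vision

func computeOpticalFlow(previous: CVPixelBuffer, current: CVPixelBuffer) -> CVPixelBuffer? {
    // The request is targeted at the current frame and performed against the previous one
    let request = VNGenerateOpticalFlowRequest(targetedCVPixelBuffer: current, options: [:])
    request.computationAccuracy = .medium

    let handler = VNImageRequestHandler(cvPixelBuffer: previous, options: [:])
    try? handler.perform([request])

    // The observation holds a two-channel (dx, dy) float pixel buffer of per-pixel motion
    guard let observation = request.results?.first as? VNPixelBufferObservation else { return nil }
    return observation.pixelBuffer
}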