iPhone TrueDepth front camera inaccurate face tracking - skewed transformation

I am using an app that was developed using the ARKit framework. More specifically, I am interested in the 3D facial mesh and the face orientation and position with respect to the phone's front camera.
Having said that, I record videos of subjects performing in front of the front camera. During these recordings, I have noticed that some videos resulted in inaccurate transformations, with the face being placed behind the camera and the rotation being skewed (not an orthogonal basis).
I do not have a deep understanding of how the TrueDepth camera combines all its sensors to track and reconstruct the 3D facial structure, so I do not know what could potentially cause this issue. Although I have experimented with different setups (e.g. different subjects, with and without a mirror, screen on and off), I still have not been able to identify the source of the inaccurate transformations. Could it be the camera angle interfering with the mirror?
Below I have attached two recordings of myself that resulted in incorrect (above) and correct (below) estimated transformations.
Do you have any idea of what might be the problem? Thank you in advance.
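To make the "skewed (not an orthogonal basis)" symptom concrete, here is a minimal sanity check written in Unity-style C# (kept in C# for consistency with the rest of this page; a native ARKit app would do the same in Swift). The class and parameter names are illustrative only, and it assumes the face pose has already been expressed in the camera's coordinate frame (e.g. by multiplying with the inverse camera transform):

```csharp
using UnityEngine;

// Illustrative check: given a 4x4 face transform expressed in the camera's frame,
// verify that its 3x3 rotation part is an orthonormal basis and that the face
// sits in front of the camera (ARKit cameras look down their negative Z axis).
public static class FaceTransformCheck
{
    public static bool IsWellFormed(Matrix4x4 faceInCameraSpace, float tolerance = 1e-3f)
    {
        Vector3 x = faceInCameraSpace.GetColumn(0);
        Vector3 y = faceInCameraSpace.GetColumn(1);
        Vector3 z = faceInCameraSpace.GetColumn(2);

        // Rotation columns should be unit length and mutually perpendicular.
        bool orthonormal =
            Mathf.Abs(x.magnitude - 1f) < tolerance &&
            Mathf.Abs(y.magnitude - 1f) < tolerance &&
            Mathf.Abs(z.magnitude - 1f) < tolerance &&
            Mathf.Abs(Vector3.Dot(x, y)) < tolerance &&
            Mathf.Abs(Vector3.Dot(y, z)) < tolerance &&
            Mathf.Abs(Vector3.Dot(x, z)) < tolerance;

        // A positive Z position here corresponds to the "behind the camera" symptom.
        Vector3 position = faceInCameraSpace.GetColumn(3);
        bool inFrontOfCamera = position.z < 0f;

        return orthonormal && inFrontOfCamera;
    }
}
```

Flagging frames with such a check could at least separate recordings where the reported pose itself is degenerate from recordings where the pose is fine but something downstream goes wrong.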

Related

How to generate a surface/plane around a real-world object (like a bottle) using Unity & ARCore?

I built an APK using the HelloAR scene (which is provided with the ARCore package). The app only detects horizontal surfaces, like a table, and creates its own semi-transparent plane over them. When I moved my phone around a bottle, the app again only created a horizontal plane cutting through the bottle. I expected ARCore to create planes along the bottle as I moved my phone around, like polygons in a mesh.
In another scenario, I placed 2 books on the floor, each with a different thickness. But the HelloAR app creates only one semi-transparent horizontal surface over the thicker book, instead of creating two surfaces (one for each book).
What is going wrong here? How can I fix it and make the HelloAR app work more precisely? Please help.
Software: Unity v2018.2,
ARCore v1.11.0
ARCore generates an approximate point cloud as you move the device slowly; it identifies feature points, which are detected by contrast between the different shapes. If you run your application in test mode in Unity, you can see how the points are placed in your empty scene.
Once the program has enough points at the "same height" (I don't know the exact tolerance), it generates the plane that you can see, but it won't distinguish planes separated by a height difference of 5 cm or even more.
If you want to know the approximate accuracy of the app, test it with Unity and write a script that captures the points used to generate the planes, then check the Y differences to see what the tolerance distance is (a sketch of such a script is below).
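Here is a minimal sketch of such a script using the GoogleARCore Unity SDK's Frame.PointCloud API (member names as I recall them from the 1.x SDK, so verify them against your version). It accumulates the heights of the tracked feature points and logs their vertical spread:

```csharp
using System.Collections.Generic;
using UnityEngine;
using GoogleARCore;   // ARCore Unity SDK 1.x

// Collects the Y coordinates of the feature points ARCore tracks and logs their
// spread, giving a rough idea of the height tolerance used when merging planes.
public class PointCloudYSpread : MonoBehaviour
{
    private readonly List<float> _heights = new List<float>();

    void Update()
    {
        if (!Frame.PointCloud.IsUpdatedThisFrame)
            return;

        for (int i = 0; i < Frame.PointCloud.PointCount; i++)
        {
            // Each point exposes a world-space position; we only need its height.
            _heights.Add(Frame.PointCloud.GetPointAsStruct(i).Position.y);
        }

        if (_heights.Count < 2)
            return;

        float min = float.MaxValue, max = float.MinValue;
        foreach (float y in _heights)
        {
            if (y < min) min = y;
            if (y > max) max = y;
        }

        Debug.Log("Tracked " + _heights.Count + " points, Y spread: "
                  + (max - min).ToString("F3") + " m");
    }
}
```

Attach it to any GameObject in the HelloAR scene and watch the console while scanning the books; the logged spread gives you an empirical feel for how far apart two surfaces must be before ARCore treats them as separate planes.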
Okay, so Vuforia is currently one of the leading SDKs for augmented reality, providing a wide array of detection options (images, ground, points, 3D objects, ...).
So regarding your question about detecting a bottle, I would most certainly use the 3D model detection feature. You can read the official docs here.
You first need to generate an approximation of the object in 3D modeling software and then use their program to generate the detection model. Then you put this in Unity and set up the detection (no coding needed).
I have some experience with this kind of detection. I used it to detect a large 2 m x 2 m scale model of an electric vehicle. It works great: you can walk around it and it tracks it through and through. You can see a short official demo here.
Hope this short explanation helps!

How to simulate multiple sensors by having a 3D object render differently in different cameras

I am trying to work out how to have objects render differently in different cameras.
I have a situation where we have the visible light rendering in the main camera, but the player-controlled objects may have multiple sensors, each represented by a camera.
For example, we may have:
An IR camera which sees a light emitted by the target with a colour based upon the object's temperature, on the IR layer
A radar, which has its own directional light, and sees only that which is the same colour as its own light on the RF layer, and would basically be a rotating vertical slit camera.
A sound sensor, which "sees" the "light" directly emitted by the target object, as well as that reflected off other hard surfaces, on the sound layer.
Radio direction finders, which see all colours of light on the RF layer
An IFF/identification sensor, which sees barcodes on the body of moving objects (The ability of the 2D barcode to be read simulates shape identification with fewer processing resources than a neural network while maintaining uncertainty as to the identity of a newly-seen object until analysed)
We may also have various sensor tricks such as radar ECM, which would be simulated by having false objects placed in the field of view of the "radar camera".
The output from these sensors may be viewed by the players, but is intended to be used by player-implemented AI routines.
My scenario is 3D, but it could also be applied to 2D.
So...
What would be the best way of handling all these different cameras and renderings? I would like to make the game objects as simple as possible, and minimise coding and CPU overhead.
Some renderings will have to change according to whatever the target object is doing - moving or firing a weapon may generate heat and sound, IFF would be complicated by cover, radar could be complicated by radar-absorbent materials, jamming, and shapes that reduce radar cross-section, and so on.
Edit
These different renderings would need to be available simultaneously, as if looking at the same scene through all of the sensors, and each sensor camera would need to render appropriately.
edit
Given your comment below, it appears to me that the question you intended to ask is somewhat unclear.
To answer the comment:
Set up as many cameras as you need, and have each one set to "render to texture".
You will need to create a RenderTexture for each camera.
Now each camera will feed its data in real time to a texture, and you can do with those textures whatever you like.
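As a minimal sketch of that setup (the layer names and texture size below are just examples taken from your sensor list, not anything prescribed): give each sensor camera its own RenderTexture and restrict its culling mask to the layer that represents that sensor's part of the "spectrum".

```csharp
using UnityEngine;

// One instance per sensor camera: renders that camera to an off-screen texture
// and limits it to a single layer (e.g. "IR", "RF", "Sound"), so each sensor
// sees only the objects placed on its own layer.
public class SensorCameraSetup : MonoBehaviour
{
    public Camera sensorCamera;          // the camera representing this sensor
    public string sensorLayer = "IR";    // layer name must exist in the project settings

    public RenderTexture Output { get; private set; }

    void Awake()
    {
        // Off-screen target; AI routines (or a UI RawImage) can read from it.
        Output = new RenderTexture(512, 512, 24);
        sensorCamera.targetTexture = Output;

        // Only render objects on this sensor's layer.
        sensorCamera.cullingMask = 1 << LayerMask.NameToLayer(sensorLayer);
    }
}
```

All the sensor cameras run simultaneously, so the same scene is viewed through every sensor at once, which matches the requirement in your edit.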
edit
A simple solution to a handful of the different sensor results you're looking for is to find some shaders that produce the desired effect.
Here are some shaders you could try:
https://github.com/paganini24/ShaderExperiment
The sonic sensor I imagine would be much more difficult to create, but I think a good head start would be to explore Unity's AudioSource and AudioListener components. If you can get the volume/pitch of the sound being output and couple that with the position it is emitted from, you could map it to a full-screen UI texture (roughly as sketched below).
Not a hard solution to the sonic sensor, but hopefully that points you in the right direction.
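Here is a rough, purely illustrative sketch of that mapping (the component and field names are mine, and the RMS-divided-by-distance formula is just a placeholder for whatever attenuation model you prefer):

```csharp
using UnityEngine;
using UnityEngine.UI;

// Samples the output level of an AudioSource and drives the alpha of a
// full-screen UI Image based on loudness and distance from the "sensor".
public class SoundSensorOverlay : MonoBehaviour
{
    public AudioSource target;     // emitter being "heard"
    public Image overlay;          // full-screen UI image acting as the sensor display
    public Transform listener;     // position of the sound sensor

    private readonly float[] _samples = new float[256];

    void Update()
    {
        // Current output buffer of the emitter (channel 0).
        target.GetOutputData(_samples, 0);

        // RMS level of the buffer.
        float sum = 0f;
        for (int i = 0; i < _samples.Length; i++)
            sum += _samples[i] * _samples[i];
        float rms = Mathf.Sqrt(sum / _samples.Length);

        // Crude distance attenuation, then map the result to the overlay's alpha.
        float distance = Vector3.Distance(listener.position, target.transform.position);
        float level = Mathf.Clamp01(rms / Mathf.Max(distance, 1f));

        Color c = overlay.color;
        c.a = level;
        overlay.color = c;
    }
}
```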

Model lost on uniform background surface with ARCamera (Vuforia, Unity)

I'm trying to use Vuforia in Unity to see a model in AR. It works properly when I'm in a room with lots of different colors, but if I go into a room with one single color (for example: white floor, white walls, no furniture), the model keeps disappearing. I'm using Extended Tracking with Prediction enabled.
Is there a way to keep the model on screen whatever the background seen by webcam?
Is there a way to keep the model on screen whatever the background seen by webcam??
I am afraid this is not possible. Since Vuforia uses markerless tracking, it requires high contrast in the feature points.
Since most AR SDKs use only a monocular RGB camera (not RGB-Depth), they rely on computer vision techniques to recover the missing depth information. This means extracting visually distinct feature points and locating the device using the estimated distances to those feature points over several frames as you move.
However, they also leverage sensor fusion, which means they combine the data gathered from the camera with the data from the device's IMU (inertial sensors). Unfortunately, that data is mainly used as a complement when visual motion tracking fails in situations like excessive motion (when the camera image is blurred). The sensor data by itself is therefore not reliable, which is the case when you walk into a room with no distinctive points to extract.
The only way you can solve this is by placing several image targets in that room. That will allow Vuforia to calculate the device's position in 3D space. Otherwise this is not possible.
You can also refer to SLAM for more information.

AR Overlay Accuracy in Google Project Tango

I am experimenting with overlaying augmented reality objects over a pass-through image from the rear camera in Unity.
Has anyone experimented with overlaying objects with accurate tracking? I've tweaked the movement scale to get somewhat decent results but rotation is still not accurate and drift is a big issue.
I've had good luck with the augmented reality sample that ships with the latest Tango. In my experience it does work the way you speculated: if you add items to the Unity scene, they are synced to the motion detected by the device.
I believe the tracking and syncing functions have improved since you originally asked this question, because I've noticed an improvement since I got my Tango devkit a month or so ago. There was an update a week or so later, with an immediate improvement.
I have found that some scenes track better than others; it seems to help to have additional scenery for it to track. In my workspace, a fairly cluttered apartment, it tracks well, but in the neighboring identical apartment unit, which is currently vacant and empty, it does not track as well. That could also be a product of the blinds hanging in my unit (and not in the vacant unit), filtering out additional infrared.
I'm experimenting with placing 3D objects over the real-time input from the Tango color camera.
One problem here is that the hardware color camera points in a (strange) direction. I haven't been able to get the direction vector from the API so far. The virtual camera used for rendering the scene needs this rotation to render 3D objects properly.
There are augmented reality examples of Tango's Unity plugin:
https://developers.google.com/tango/apis/unity/unity-simple-ar
They solve this problem with a matrix that rotates the 3d camera.
It can be found in the Unity script "TangoARPoseController" (C#) which, when attached to a Unity camera, rotates it so that it looks at the scene in the right direction. The matrix is obtained in the method "SetCameraExtrinsics" of that script.
Unfortunately, when I apply the matrix to my Unity scene it does not produce a perfect overlay (actually it's quite bad). But I have other sources of position input, which may be the problem here.
However, so far I'm not sure whether the matrix used in the examples is good enough for accurate AR overlays. Maybe it is only suitable for demonstration purposes, but it should be a good starting point for further investigation.
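For what it's worth, the basic idea behind that extrinsics matrix can be sketched as follows. This is a simplified illustration only: the class, method names and placeholder values below are mine, and the real extrinsic comes from the Tango API as described above; the point is just that the virtual camera's pose is the device pose composed with a fixed device-to-color-camera offset.

```csharp
using UnityEngine;

// Simplified illustration: align the virtual camera with the hardware color
// camera by composing the device pose with a fixed device-to-camera extrinsic.
// The extrinsic values here are placeholders; query the Tango API for real ones.
public class ColorCameraAlignment : MonoBehaviour
{
    public Vector3 cameraOffset = Vector3.zero;               // device -> color camera translation
    public Quaternion cameraRotation = Quaternion.identity;   // device -> color camera rotation

    // Call this with the device pose reported by the motion tracking API.
    public void ApplyDevicePose(Vector3 devicePosition, Quaternion deviceRotation)
    {
        transform.rotation = deviceRotation * cameraRotation;
        transform.position = devicePosition + deviceRotation * cameraOffset;
    }
}
```

If the composed rotation is even slightly off, the rendered objects will drift against the camera image as you rotate the device, which matches the imperfect overlay described above.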
Are we talking about displaying the 'webcam' in the background, as opposed to a skybox?
Take a look at my GhostHunter repo. It includes a shader and a script for displaying the rear-facing camera 'behind' the gameplay objects (like the skybox). It should be usable with Tango, and it is better than the 'display on a mesh' technique I've seen others use.
https://github.com/NVentimiglia/Augmented-Reality-Ghost-Hunter

Quartz 2D / OpenGL / cocos2d image distortion on iPhone by moving vertices for a 2.5D iPhone game

We are trying to achieve the following in an iPhone game:
Using 2D PNG files, set up a scene that seems 3D. As the user moves the device, the individual PNG files would warp/distort accordingly to give the effect of depth.
Example of a scene: an empty room, 5 walls, and a chair in the middle = 6 PNG files layered.
We have successfully accomplished this using native functions like skew and scale. By applying transformations to the various walls and the chair as the device is tilted or moved, the walls skew/scale/translate. However, the problem is that since we are using 6 PNG files, the edges don't meet as we move the device. We need a new solution using a real engine.
Question:
Instead of applying skew/scale transformations, we are thinking that if we were given the freedom to move the vertices of the rectangular images, we could precisely distort the images and keep all the edges 100% aligned.
What is the best framework to do this in the LEAST amount of time? Are we going about this the correct way?
You should be able to achieve this effect (at least in regards to the perspective being applied to the walls) using Core Animation layers and appropriate 3-D transforms.
A good example of constructing a scene like this can be found in the example John Blackburn provides here. He shows how to set up layers to represent the walls in a maze by applying the appropriate rotation and translation to them, then gives the scene perspective by using the trick of altering the m34 component of the CATransform3D for the scene.
I'm not sure how well your flat chair would look using something like this, but certainly you can get your walls to have a nice perspective to them. Using layers and Core Animation would let you pull off what you want using far less code than implementing this using OpenGL ES.
Altering the camera angle is as simple as rotating the scene in response to shifts in the orientation of the device.
If you're going to the effort of warping textures as they would be warped in a 3D scene, then why not let the graphics hardware do the hard work for you by mapping the textures to 3D polygons, then changing your projection or moving polygons around?
I doubt you could do it faster by restricting yourself to 2D transformations; the hardware is geared up to do 3x3 (well, 4x4 homogeneous) matrix multiplication.
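To illustrate the "textured 3D polygons" route in engine terms, here is a sketch in Unity C#, used purely as an example of "a real engine" (none of the names below come from the original posts, and the room dimensions are arbitrary): each PNG becomes a quad placed in 3D, and the camera's perspective projection keeps every shared edge aligned automatically as the view moves.

```csharp
using UnityEngine;

// Builds part of the "room" out of textured quads placed in 3D. The perspective
// projection handles all the warping, so edges between walls always meet exactly.
public class RoomBuilder : MonoBehaviour
{
    public Texture2D wallTexture;

    void Start()
    {
        // Back wall of the room: a 4 m x 3 m quad, 2 m in front of the origin.
        // Unity's Quad primitive faces -Z by default, so with identity rotation it
        // is visible from a camera near the origin looking down +Z.
        CreateTexturedQuad(wallTexture, new Vector3(0f, 1.5f, 2f),
                           Quaternion.identity, new Vector2(4f, 3f));

        // The other walls, floor, ceiling and chair would be further quads,
        // rotated and translated into place the same way.
    }

    static GameObject CreateTexturedQuad(Texture2D tex, Vector3 position,
                                         Quaternion rotation, Vector2 size)
    {
        GameObject quad = GameObject.CreatePrimitive(PrimitiveType.Quad);
        quad.transform.SetPositionAndRotation(position, rotation);
        quad.transform.localScale = new Vector3(size.x, size.y, 1f);
        quad.GetComponent<Renderer>().material.mainTexture = tex;
        return quad;
    }
}
```

The same idea applies to any engine or plain OpenGL ES: once the images live on real 3D geometry, "moving the vertices" is exactly what the projection pipeline does for you.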