Model lost on uniform background surface with ARCamera (Vuforia, Unity) - unity3d

I'm trying to use Vuforia in Unity to see a model in AR. It works properly when I'm in a room with lots of different colors, but if I go into a room with a single color (for example: white floor, white walls, no furniture), the model keeps disappearing. I'm using Extended Tracking with Prediction enabled.
Is there a way to keep the model on screen regardless of the background seen by the webcam?

Is there a way to keep the model on screen regardless of the background seen by the webcam?
I am afraid this is not possible. Since Vuforia uses markerless tracking, it requires high-contrast feature points in the camera image.
Since most AR SDKs only use a monocular RGB camera (not RGB-depth), they rely on computer vision techniques to recover the missing depth information. This means extracting visually distinct feature points and locating the device using the estimated distance to those feature points over several frames as you move.
However, they also use sensor fusion, which means they combine the data gathered from the camera with the data from the device's IMU (inertial sensors). Unfortunately, the IMU data is mainly used as a complement when visual motion tracking fails briefly, for example during excessive motion when the camera image is blurred. On its own the sensor data is not reliable, which is exactly the case when you walk into a room where there are no distinctive points to extract.
The only way you can work around this is by placing several image targets in that room. That will allow Vuforia to calculate the device position in 3D space. Otherwise it is not possible.
You can also refer to SLAM for more information.
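That said, if the goal is only to stop the model from vanishing the moment tracking drops, a common cosmetic fallback is to leave the renderers enabled and freeze the content at its last known pose instead of hiding it; the model will no longer be registered to the real world, but it stays on screen. A minimal sketch, assuming the classic Vuforia Unity API (TrackableBehaviour / ITrackableEventHandler), whose class names differ in newer Vuforia versions:
```csharp
using UnityEngine;
using Vuforia;

// Attach to the ImageTarget (or other trackable). Instead of disabling the
// child renderers on tracking loss (what DefaultTrackableEventHandler does),
// this keeps them visible, frozen at the last tracked pose.
public class KeepVisibleOnTrackingLoss : MonoBehaviour, ITrackableEventHandler
{
    private TrackableBehaviour mTrackableBehaviour;

    void Start()
    {
        mTrackableBehaviour = GetComponent<TrackableBehaviour>();
        if (mTrackableBehaviour != null)
            mTrackableBehaviour.RegisterTrackableEventHandler(this);

        // Hide the content until the target has been found at least once.
        SetRenderersEnabled(false);
    }

    public void OnTrackableStateChanged(TrackableBehaviour.Status previousStatus,
                                        TrackableBehaviour.Status newStatus)
    {
        bool tracked = newStatus == TrackableBehaviour.Status.DETECTED ||
                       newStatus == TrackableBehaviour.Status.TRACKED ||
                       newStatus == TrackableBehaviour.Status.EXTENDED_TRACKED;

        // Enable renderers on detection; do nothing on loss, so the content
        // stays frozen on screen instead of disappearing.
        if (tracked)
            SetRenderersEnabled(true);
    }

    private void SetRenderersEnabled(bool enabled)
    {
        foreach (var r in GetComponentsInChildren<Renderer>(true))
            r.enabled = enabled;
    }
}
```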

Related

iPhone TrueDepth front camera inaccurate face tracking - skewed transformation

I am using an app that was developed using the ARKit framework. More specifically, I am interested in the 3D facial mesh and the face orientation and position with respect to the phone's front camera.
Having said that, I record videos of subjects performing in front of the front camera. During these recordings, I have noticed that some videos result in inaccurate transformations, with the face being placed behind the camera and the rotation being skewed (not an orthogonal basis).
I do not have a deep understanding of how the TrueDepth camera combines all its sensors to track and reconstruct the 3D facial structure, so I do not know what could potentially cause this issue. Although I have experimented with different setups (e.g. different subjects, with and without a mirror, screen on and off, etc.), I still have not been able to identify the source of the inaccurate transformation. Could it be the camera angle interfering with the mirror?
Below I attached two recordings of myself that resulted in incorrect (above) and correct (below) estimated transformations.
Do you have any idea of what might be the problem? Thank you in advance.
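A quick way to flag the skewed cases automatically is to check whether the reported rotation basis is still orthonormal and whether the face position lands in front of the camera. Here is an illustrative sketch in Unity-style C# (purely for illustration; the app itself uses ARKit natively, so the Matrix4x4 and camera transform below are stand-ins for whatever the recording pipeline exports):
```csharp
using UnityEngine;

// Diagnostic helper: given a face anchor's world transform and the camera
// transform, report whether the rotation basis is orthonormal and whether
// the face is in front of (not behind) the camera.
public static class FaceTransformCheck
{
    public static void Validate(Matrix4x4 faceToWorld, Transform camera, float tol = 1e-3f)
    {
        Vector3 x = faceToWorld.GetColumn(0);
        Vector3 y = faceToWorld.GetColumn(1);
        Vector3 z = faceToWorld.GetColumn(2);

        bool unitLength = Mathf.Abs(x.magnitude - 1f) < tol &&
                          Mathf.Abs(y.magnitude - 1f) < tol &&
                          Mathf.Abs(z.magnitude - 1f) < tol;

        bool orthogonal = Mathf.Abs(Vector3.Dot(x, y)) < tol &&
                          Mathf.Abs(Vector3.Dot(y, z)) < tol &&
                          Mathf.Abs(Vector3.Dot(x, z)) < tol;

        // Position of the face relative to the camera; a negative forward
        // component means the face was placed behind the camera.
        Vector3 facePos = faceToWorld.GetColumn(3);
        float forward = Vector3.Dot(camera.forward, facePos - camera.position);

        Debug.Log($"unit-length basis: {unitLength}, orthogonal basis: {orthogonal}, " +
                  $"in front of camera: {forward > 0f}");
    }
}
```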

How to generate surface/plane around a real world Object (Like bottle) using Unity & ARCore?

I built an APK using the HelloAR scene (which is provided with the ARCore package). The app only detects horizontal surfaces, such as a table, and creates its own semi-transparent plane over them. When I moved my phone around a bottle, the app again only created a horizontal plane cutting through the bottle. I expected ARCore to create planes along the bottle as I move my phone around it, like polygons in a mesh.
In another scenario, I placed two books on the floor, each with a different thickness. But the HelloAR app creates only one semi-transparent horizontal surface over the thicker book, instead of creating two surfaces (one for each book).
What is going wrong here? How can I fix it and make the HelloAR app work more precisely? Please help.
Software: Unity v2018.2, ARCore v1.11.0
ARCore generates an approximate point cloud as you move the device slowly to identify feature points. These points are detected by contrast in the different shapes; if you run your application in test mode in Unity, you can see how the points are placed in your empty scene.
Once the program has enough points at the "same height" (I don't know the exact precision), it generates the plane that you can see, but it won't distinguish planes separated by a height difference of 5 cm or even more.
If you want to know the approximate accuracy of the app, test it with Unity and write a script that captures the points used to generate the planes, then check their Y differences to see what the tolerance distance is.
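A minimal sketch of such a script, assuming the GoogleARCore SDK for Unity that ships with ARCore 1.x (Frame.PointCloud, Session.GetTrackables); AR Foundation exposes different APIs:
```csharp
using System.Collections.Generic;
using GoogleARCore;
using UnityEngine;

// Logs the Y range of the current point cloud and the height of every
// detected plane, so you can inspect how much height difference ARCore
// tolerates before merging surfaces into a single plane.
public class PointHeightLogger : MonoBehaviour
{
    private readonly List<DetectedPlane> m_Planes = new List<DetectedPlane>();

    void Update()
    {
        if (Session.Status != SessionStatus.Tracking)
            return;

        float minY = float.MaxValue, maxY = float.MinValue;
        for (int i = 0; i < Frame.PointCloud.PointCount; i++)
        {
            Vector3 p = Frame.PointCloud.GetPointAsStruct(i).Position;
            minY = Mathf.Min(minY, p.y);
            maxY = Mathf.Max(maxY, p.y);
        }

        Session.GetTrackables<DetectedPlane>(m_Planes, TrackableQueryFilter.All);
        Debug.Log($"points: {Frame.PointCloud.PointCount}, Y range: {minY:F3}..{maxY:F3}, " +
                  $"planes: {m_Planes.Count}");
        foreach (var plane in m_Planes)
            Debug.Log($"plane at Y = {plane.CenterPose.position.y:F3}");
    }
}
```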
Okay, so Vuforia is currently one of the leading SDKs for augmented reality, providing a wide range of detection options (images, ground, point, 3D objects, ...).
So regarding your question about detecting a bottle, I would most certainly use the 3D model detection feature. You can read the official docs here.
You first need to generate an approximation of the object in 3D modeling software and then use their program to generate the detection model. Then you put this into Unity and set up the detection (no coding needed).
I have some experience with this kind of detection. I used it to detect a large 2 m x 2 m scale model of an electric vehicle. It works great: you can walk around it and it tracks it through and through. You can see a short official demo here.
Hope this helped to explain it in short!

How to simulate multiple sensors by having a 3D object render differently in different cameras

I am trying to work out how to have objects render differently in different cameras.
I have a situation where we have the visible light rendering in the main camera, but the player-controlled objects may have multiple sensors, each represented by a camera.
For example, we may have:
An IR camera which sees a light emitted by the target with a colour based upon the object's temperature, on the IR layer
A radar, which has its own directional light, and sees only that which is the same colour as its own light on the RF layer, and would basically be a rotating vertical slit camera.
A sound sensor, which "sees" the "light" directly emitted by the target object, as well as that reflected off other hard surfaces, on the sound layer.
Radio direction finders, which see all colours of light on the RF layer
An IFF/identification sensor, which sees barcodes on the body of moving objects (The ability of the 2D barcode to be read simulates shape identification with fewer processing resources than a neural network while maintaining uncertainty as to the identity of a newly-seen object until analysed)
We may also have various sensor tricks such as radar ECM, which would be simulated by having false objects placed in the field of view of the "radar camera".
The output from these sensors may be viewed by the players, but is intended to be used by player-implemented AI routines.
My scenario is 3D, but it could also be applied to 2D.
So...
What would be the best way of handling all these different cameras and renderings? I would like to make the game objects as simple as possible, and minimise coding and CPU overhead.
Some renderings will have to change according to whatever the target object is doing: moving or firing a weapon may generate heat and sound, IFF would be complicated by cover, radar could be complicated by radar-absorbent materials, jamming, and shapes that reduce radar cross-section, and so on.
Edit
These different renderings would need to be available simultaneously, as if looking at the same scene through all of the sensors, and each sensor camera would need to render appropriately.
Edit
Given your comment below, it appears to me that the question you intended to ask is somewhat unclear.
To answer the comment:
Set up as many cameras as you need, and have each one set to "render to texture".
You will need to create a RenderTexture for each camera.
Now each camera will feed its data in real time to a texture, and you can do with those textures whatever you like.
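A minimal sketch of that setup, using only standard Unity APIs (the layer names and texture sizes below are placeholders for your own project):
```csharp
using UnityEngine;

// Creates one RenderTexture per sensor camera and restricts each camera to
// its own layer via the culling mask, so every sensor sees only the objects
// meant for it.
public class SensorCameraSetup : MonoBehaviour
{
    public Camera irCamera;
    public Camera radarCamera;

    public RenderTexture irOutput;
    public RenderTexture radarOutput;

    void Start()
    {
        irOutput = new RenderTexture(512, 512, 24);
        radarOutput = new RenderTexture(512, 512, 24);

        // Each sensor camera renders only its own layer into its own texture.
        irCamera.cullingMask = LayerMask.GetMask("IR");
        irCamera.targetTexture = irOutput;

        radarCamera.cullingMask = LayerMask.GetMask("RF");
        radarCamera.targetTexture = radarOutput;

        // The textures can now be sampled by AI routines or shown on in-game
        // displays (e.g. assigned to a RawImage or a material).
    }
}
```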
Edit
A simple solution to a handful of the different sensor results you're looking for is to find some shaders that produce the desired effect.
Here are some shaders you could try:
https://github.com/paganini24/ShaderExperiment
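If you go the shader route, one low-effort way to apply an effect to everything a sensor camera sees, while leaving the main camera untouched, is a camera replacement shader. A hedged sketch (the shader name is a placeholder, not something from the linked repository):
```csharp
using UnityEngine;

// Makes a sensor camera render the whole scene with a replacement shader,
// for example a fake-thermal effect. "Custom/FakeThermal" is a placeholder
// name for whatever effect shader you end up using.
public class SensorReplacementShader : MonoBehaviour
{
    public Camera sensorCamera;
    public Shader effectShader;

    void Start()
    {
        if (effectShader == null)
            effectShader = Shader.Find("Custom/FakeThermal"); // placeholder name

        // Empty replacement tag: replace every material's shader.
        sensorCamera.SetReplacementShader(effectShader, "");
    }

    void OnDisable()
    {
        // Restore normal rendering for this camera.
        sensorCamera.ResetReplacementShader();
    }
}
```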
The sonic sensor I imagine would be much more difficult to create, but I think a good head start would be to explore Unity's AudioSource and AudioListener components. If you can get the volume/pitch of the sound being output and couple that with the position it is emitted from, you could map it to a full-screen UI texture.
Not a hard solution to the sonic sensor, but hopefully that points you in the right direction.
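As a rough starting point for that mapping, here is a sketch that projects an AudioSource's position into the sensor camera's view and scales a UI marker by a crude loudness estimate (the 1/distance rolloff here is a simplification for illustration, not Unity's internal audio rolloff):
```csharp
using UnityEngine;

// Crude "sound sensor" visualization: projects a sound emitter into the sensor
// camera's screen space and sizes a UI blip by a simple 1/distance loudness
// estimate. Assumes the blip lives on a Screen Space - Overlay canvas.
public class SoundBlip : MonoBehaviour
{
    public Camera sensorCamera;
    public AudioSource source;
    public RectTransform blip;   // UI element on a screen-space canvas

    void Update()
    {
        Vector3 screen = sensorCamera.WorldToScreenPoint(source.transform.position);
        bool inFront = screen.z > 0f;
        blip.gameObject.SetActive(inFront && source.isPlaying);
        if (!inFront)
            return;

        blip.position = screen;

        float distance = Vector3.Distance(sensorCamera.transform.position,
                                          source.transform.position);
        float loudness = source.volume / Mathf.Max(1f, distance); // simplistic rolloff
        blip.localScale = Vector3.one * Mathf.Clamp(loudness * 10f, 0.2f, 3f);
    }
}
```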

Troubleshooting in case of tracking loss

I created an object that plays an animation through the HelloAR example of ARCore. Then I covered the camera with my hand and caused a tracking loss.
If I point the camera at the space again, the object I created comes back, but the animation starts from the beginning.
When the space is recognized again after the tracking loss, sometimes the object comes back and sometimes it does not. Is there a way of distinguishing these cases?
When the space is recognized again after a tracking loss, why does the animation start all over again when the object returns? Is ARCore deleting and recreating the object?
ARCore uses a technique called Visual Inertial Odometry (VIO). It is a hybrid technique that combines computer vision and sensor fusion.
What VIO does is combine data extracted from feature points (corners, blobs, edges, etc.) with data acquired from the mobile device's IMU. Knowing the position of your device is crucial in ARCore, because the position of every trackable is estimated from this information (triangulation using the device pose).
Another aspect is that ARCore builds a sparse map of the environment while you move around the room. The extracted feature points are stored in memory based on a confidence level and used later to localize the device.
Finally, what happens when tracking is lost is that you cannot extract feature points, due to a white wall for example. When you cannot extract feature points, you cannot localize the device. Therefore the device does not know where it is in the sparse map I mentioned above. Sometimes you recover because you go back to places that have already been scanned and kept in this sparse map.
Now for your questions:
If you anchor your objects, they will return, but there can be drift, because ARCore can accumulate errors during this process, especially if you move while device tracking is lost. So they probably return, but they are no longer at the same physical position because of the drift.
As for the animation restarting: since those anchors can no longer be tracked, they are deactivated. And since you anchored your objects, they are children of the anchor, so your objects are deactivated as well. That is why your animation restarts.
You can test both issues using Instant Preview and see what happens to the anchors when you lose tracking. Good luck!
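A minimal sketch of working around the restart, assuming the GoogleARCore SDK for Unity, where the Anchor component exposes a TrackingState property: cache the Animator's normalized time while the anchor is tracked, and fast-forward to it when the object comes back.
```csharp
using GoogleARCore;
using UnityEngine;

// Attach to the animated object that is parented under an ARCore Anchor.
// While the anchor is tracked, remember how far the animation has progressed;
// when the object is re-enabled after a tracking loss, resume from there
// instead of letting the state machine restart at time 0.
public class ResumeAnimationAfterTrackingLoss : MonoBehaviour
{
    public Anchor anchor;             // the parent anchor created for this object
    public Animator animator;
    public string stateName = "Play"; // placeholder: your animation state's name

    private float m_SavedNormalizedTime;

    void Update()
    {
        if (anchor != null && anchor.TrackingState == TrackingState.Tracking)
            m_SavedNormalizedTime = animator
                .GetCurrentAnimatorStateInfo(0).normalizedTime % 1f;
    }

    void OnEnable()
    {
        // Called again when the anchor hierarchy is reactivated after recovery.
        if (animator != null && m_SavedNormalizedTime > 0f)
            animator.Play(stateName, 0, m_SavedNormalizedTime);
    }
}
```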

How to find contours of soccer player in dynamic background

My project is the design of a system that analyzes soccer videos. In one part of this project I need to detect the contours of the players and everyone else on the play field. For all players who are not occluded by the advertisement billboards, I have used the color of the play field (green) to detect contours and extract players. But I have a problem with the situation where players or the referee are occluded by the advertisement billboards. Suppose a situation where the advertisements on the billboards are dynamic (LED billboards). As you know, in this situation finding the contours is more difficult because there is no static background color or texture. You can see two examples of this condition in the following images.
NOTE: in order to find the position of the occlusion, I use the region between the field line and the advertisement billboards, because this region has the color of the field (green). This region is shown by a red rectangle in the following image.
I expect the result to be similar to the following image.
Could anyone suggest an algorithm to detect these contours?
You can try several things.
Use the vision.PeopleDetector object to detect people on the field. You can also track the detected people using vision.KalmanFilter, as in the Tracking Pedestrians from a Moving Car example.
Use the vision.OpticalFlow object in the Computer Vision System Toolbox to compute optical flow. You can then analyze the resulting flow field to separate camera motion from player motion.
Use frame differencing to detect moving objects. The good news is that it will give you the contours of the people. The bad news is that it will give you many spurious contours as well.
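To make the frame-differencing idea concrete, here is a small sketch written in C# for illustration only (the answer above refers to MATLAB's Computer Vision System Toolbox; the idea itself is language-agnostic and operates on raw grayscale buffers):
```csharp
// Illustrative frame differencing on two grayscale frames stored as byte
// arrays (row-major, same width and height). Pixels whose intensity changed
// by more than `threshold` are marked as foreground; contours can then be
// traced on the resulting binary mask. Spurious blobs from camera motion or
// LED billboard changes still need filtering afterwards.
public static class FrameDifferencing
{
    public static bool[] Difference(byte[] previous, byte[] current, int threshold)
    {
        var mask = new bool[current.Length];
        for (int i = 0; i < current.Length; i++)
        {
            int diff = current[i] - previous[i];
            if (diff < 0) diff = -diff;
            mask[i] = diff > threshold;   // true = moving / foreground pixel
        }
        return mask;
    }
}
```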
Optical flow would work for such problems, as it captures motion information. Foreground extraction techniques using HMM, GMM, or non-parametric models may also solve the problem; I have used them for motion analysis in surveillance videos to detect anomalies (the background was static there). The magnitude and orientation of the optical flow seem to be an effective cue. I have read papers on segmentation using optical flow. I hope this may help you.