I am working on optical flow based vehicle detection and tracking, purely in MATLAB.
In my scenario both the camera and the objects are in motion.
Previously, a lot of work has been done for the case of a stationary camera and moving objects. In that setting, optical flow vectors can easily be determined with the Lucas-Kanade or Horn-Schunck method: simply taking two consecutive frames is enough. I have run such tests and obtained results.
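For reference, this is the kind of minimal two-frame baseline I mean (a sketch assuming the Computer Vision Toolbox opticalFlowLK interface; traffic.mj2 is just a stand-in clip):

% Two consecutive frames, Lucas-Kanade flow (Computer Vision Toolbox)
vidReader = VideoReader('traffic.mj2');          % stand-in test sequence
opticFlow = opticalFlowLK('NoiseThreshold', 0.009);

frame1 = rgb2gray(readFrame(vidReader));
estimateFlow(opticFlow, frame1);                 % initialise with the first frame

frame2 = rgb2gray(readFrame(vidReader));
flow   = estimateFlow(opticFlow, frame2);        % flow from frame1 to frame2

imshow(frame2); hold on;
plot(flow, 'DecimationFactor', [10 10], 'ScaleFactor', 25);
hold off;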
There is also the Simulink example viptrafficof_win available.
I need to perform optical flow based detection and tracking with both the camera and the objects in motion. What methodology should I pursue?
If your camera is moving, you would have to separate the camera motion (ego motion) from the motion of the objects. There are different ways of doing that. Here is a recent paper describing an approach using the orientations of optical flow vectors.
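If you want something quick to experiment with before implementing a paper, one crude alternative (not the method from that paper) is to treat the dominant flow as the camera motion and keep only the large residuals. A rough MATLAB sketch, assuming a mostly translational camera motion and the opticalFlowLK interface:

% Crude ego-motion suppression: subtract the dominant (camera) flow
flow  = estimateFlow(opticFlow, grayFrame);      % grayFrame = current frame
Vx    = flow.Vx;   Vy = flow.Vy;

camVx = median(Vx(:));                           % global (camera) motion estimate
camVy = median(Vy(:));

resMag = hypot(Vx - camVx, Vy - camVy);          % residual = object motion
mask   = resMag > 3 * median(resMag(:));         % placeholder threshold
mask   = bwareaopen(mask, 50);                   % drop tiny blobs

stats = regionprops(mask, 'BoundingBox');        % candidate moving vehicles

A homography or affine fit to the flow field handles camera rotation and zoom better than a single median vector.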
I am using an app that was developed using the ARKit framework. More specifically, I am interested in the 3D facial mesh and the face orientation and position with respect to the phone's front camera.
With that app, I record videos of subjects performing in front of the front camera. During these recordings, I have noticed that some videos result in inaccurate transformations, with the face being placed behind the camera and the rotation being skewed (not an orthogonal basis).
I do not have a deep understanding of how the TrueDepth camera combines all its sensors to track and reconstruct the 3D facial structure, so I do not know what could potentially cause this issue. Although I have experimented with different setups (e.g. different subjects, with and without a mirror, screen on and off), I still have not been able to identify the source of the inaccurate transformations. Could it be the camera angle combined with interference from the mirror?
Below I have attached two recordings of myself that resulted in incorrect (top) and correct (bottom) estimated transformations.
Do you have any idea of what might be the problem? Thank you in advance.
Using ARKit, I want to place an indoor map on the floor.
So far I have tried two things:
I placed a large plane below the camera and above the floor, but it drifts quite a bit, does not track well when we walk, and the overall experience is underwhelming.
I also saw a solution where you detect a horizontal plane, but that has its own issues.
So is it really possible with good results?
Devices with LiDAR
The LiDAR scanner has its advantages and disadvantages. The main advantage of LiDAR is its ability to almost instantly reconstruct the floor and the walls; you can then easily attach any 3D model to the resulting surface, and the model will be stable and will not drift, so the user's AR experience will be convincing. Another important advantage of LiDAR is its excellent performance in environments with poor lighting and poor textures.
Here you can read about the Occlusion feature and some of the LiDAR peculiarities. The good news is that LiDAR works perfectly in conjunction with the Plane Detection option.
ARKit subdivides the reconstructed scene into ARMeshAnchors which give you access to polygonal geometry and surface classification.
ARMeshAnchor().geometry.classification
ARMeshAnchor().geometry.faces
ARMeshAnchor().geometry.vertices
ARMeshAnchor().transform.columns.3
Devices without LiDAR
In the absence of a LiDAR scanner, we can only detect horizontal and vertical surfaces using the Plane Detection feature. I can say that all AR frameworks (including ARKit and RealityKit) are much better and faster at detecting horizontal surfaces than vertical ones.
However, Detected Planes are less stable than Reconstructed Surfaces, so a slight drift is sometimes possible. To successfully complete the Plane Detection stage, you need a well-lit room and surrounding objects with textures that are good for tracking.
ARKit calls your delegate's renderer(_:didAdd:for:) with an ARPlaneAnchor for each unique vertical or horizontal surface, and each plane anchor provides details about that surface: its world position, its dimensions and the real-world surface's classification.
In addition, the delegate method renderer(_:didUpdate:for:) is required to merge multiple coplanar Detected Planes into a bigger resulting Detected Plane (the surface of a floor, for example).
ARPlaneAnchor().classification
ARPlaneAnchor().extent
ARPlaneAnchor().alignment
ARPlaneAnchor().center
Is it really possible with good results?
Yes, in both cases, it's possible to attach a map without drifting – whether you're using Plane Detection or Scene Reconstruction.
I'm trying to use Vuforia in Unity to view a model in AR. It works properly when I'm in a room with lots of different colors, but if I go into a room with a single color (for example: white floor, white walls, no furniture), the model keeps disappearing. I'm using Extended Tracking with Prediction enabled.
Is there a way to keep the model on screen whatever the background seen by webcam?
I am afraid this is not possible. Since Vuforia uses markerless tracking, it requires high-contrast feature points.
Since most AR SDKs use only a monocular RGB camera (not RGB-D), they rely on computer vision techniques to recover the missing depth information. That means extracting visually distinct feature points and locating the device using the estimated distances to those feature points over several frames while you move.
However, they also leverage sensor fusion, meaning they combine the data gathered from the camera with data from the device's IMU (inertial sensors). Unfortunately, the IMU data is mainly used as a complement when visual motion tracking fails, for example under excessive motion (when the camera image is blurred). On its own the sensor data is not reliable, which is exactly the situation you are in when you walk into a room with no distinctive points to extract.
The only way you can work around this is by placing several image targets in that room. They allow Vuforia to calculate the device's position in 3D space. Otherwise this is not possible.
You can also refer to SLAM for more information.
My project is the design of a system that analyzes soccer videos. In one part of this project I need to detect the contours of the players and everyone else on the field. For all players who are not occluded by the advertisement billboards, I use the color of the field (green) to detect contours and extract the players. But I have a problem when players or the referee overlap the advertisement billboards. Suppose the advertisements on the billboards are dynamic (LED billboards); as you know, in this situation finding the contours is more difficult because there is no static background color or texture. You can see two examples of this condition in the following images.
NOTE: in order to find the position of the occlusion, I use the region between the field line and the advertisement billboards, because this region has the color of the field (green). This region is marked with a red rectangle in the following image.
I expect the result to be similar to the following image.
Could anyone suggest an algorithm to detect these contours?
You can try several things.
Use the vision.PeopleDetector object to detect people on the field. You can also track the detected people using vision.KalmanFilter, as in the Tracking Pedestrians from a Moving Car example.
Use the vision.OpticalFlow object in the Computer Vision System Toolbox to compute optical flow. You can then analyze the resulting flow field to separate the camera motion from the player motion.
Use frame differencing to detect moving objects. The good news is that it will give you the contours of the people; the bad news is that it will also give you many spurious contours (see the sketch below).
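A frame-differencing sketch along those lines (Image Processing Toolbox; the threshold and blob sizes are placeholders you would tune):

% Frame differencing followed by contour extraction
prevGray = rgb2gray(prevFrame);
currGray = rgb2gray(currFrame);

diffImg = imabsdiff(currGray, prevGray);         % absolute frame difference
bw      = imbinarize(diffImg, 0.1);              % placeholder threshold
bw      = imclose(bw, strel('disk', 5));         % close gaps in silhouettes
bw      = bwareaopen(bw, 200);                   % remove small spurious blobs

boundaries = bwboundaries(bw);                   % contours of moving regions
imshow(currFrame); hold on;
for k = 1:numel(boundaries)
    b = boundaries{k};
    plot(b(:,2), b(:,1), 'r', 'LineWidth', 2);
end
hold off;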
Optical flow would work for such problems, as it captures motion information. Foreground extraction techniques based on HMM, GMM or non-parametric models may also solve the problem; I have used them for motion analysis in surveillance videos to detect anomalies (where the background was static). The magnitude and orientation of the optical flow seem to be an effective cue, and I have read papers on segmentation using optical flow; a sketch of the GMM approach follows. I hope this helps.
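A minimal sketch of the GMM idea with vision.ForegroundDetector. Note that, as said above, this assumes a static or stabilised background, so broadcast footage would need camera-motion compensation first; 'soccer.avi' is a hypothetical file name.

% GMM-based foreground extraction (vision.ForegroundDetector)
detector  = vision.ForegroundDetector('NumGaussians', 3, ...
                                      'NumTrainingFrames', 50);
vidReader = VideoReader('soccer.avi');           % hypothetical file name

while hasFrame(vidReader)
    frame  = readFrame(vidReader);
    fgMask = step(detector, frame);              % logical foreground mask
    fgMask = bwareaopen(fgMask, 150);            % suppress noise blobs
    imshow(fgMask); drawnow;
end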
I am very new to camera calibration and I have been trying to work with the camera calibration app from MATLAB's computer vision toolbox.
So I followed the steps suggested on the website and, so far so good, I was able to obtain the intrinsic parameters of the camera.
So now, I am kind of confused about what I should do with the "cameraParameters" object that was created when the calibration was done.
So my questions are:
(1) What should I do with the cameraParameters object that was created?
(2) How do I use this object when I am using the camera to capture images of something?
(3) Do I need the checkerboard around each time I capture images for my experiment?
(4) Should the camera be placed at the same spot each time?
I am sorry if these questions are really beginner level; camera calibration is new to me and I was not able to find answers.
Thank you so much for your help.
I assume you are working with just one camera, so only the intrinsic parameters of the camera come into play.
(1), (2): Once your camera is calibrated, you use these parameters to undistort the images. Cameras do not capture the scene exactly as it is in reality, because the lens distorts it a bit, and the calibration parameters are used to correct the images. There is more on this on Wikipedia.
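For example (a sketch; 'myCapture.png' is a hypothetical file name and cameraParams is the object exported by the calibrator app):

% Undistort a new image with the calibration result
img = imread('myCapture.png');                   % hypothetical file name
[undistortedImg, newOrigin] = undistortImage(img, cameraParams);
imshowpair(img, undistortedImg, 'montage');      % compare before and after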
About when you need to recalibrate the camera (3): if you set up the camera and don't change its focus, then you can use the same calibration parameters, but once you change the focal distance a recalibration is necessary.
(4): As long as you don't change the focal distance and you are not using a stereo camera system, you can move your camera freely.
What you are looking for are two separate calibration steps: alignment of the depth image to the color image, and conversion from depth to a point cloud. Both functions are provided by the Windows SDK, and there are MATLAB wrappers that call these SDK functions. You may want to do your own calibration only if you are not satisfied with the manufacturer calibration information stored on the Kinect. Usually the error is within 1-2 pixels in the 2D alignment, and 4 mm in 3D.
When you calibrate a single camera, you can use the resulting cameraParameters object for several things. First, you can remove the effects of lens distortion using the undistortImage function, which requires cameraParameters. There is also a function called extrinsics, which you can use to locate your calibrated camera in the world relative to some reference object (e.g. a checkerboard). Here's an example of how you can use a single camera to measure planar objects.
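A sketch of the extrinsics part, assuming cameraParams is the object from the Camera Calibrator app, 'sceneWithCheckerboard.png' is a hypothetical file name, and the square size matches your printed board:

% Locate the calibrated camera relative to a checkerboard
I = imread('sceneWithCheckerboard.png');         % hypothetical file name
I = undistortImage(I, cameraParams);             % extrinsics expects undistorted points

[imagePoints, boardSize] = detectCheckerboardPoints(I);
squareSize  = 25;                                % in millimetres, match your board
worldPoints = generateCheckerboardPoints(boardSize, squareSize);

[rotationMatrix, translationVector] = extrinsics(imagePoints, worldPoints, cameraParams);
% rotationMatrix / translationVector map world (board) coordinates into camera coordinates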
A depth sensor, like the Kinect, is a bit of a different animal. It already gives you the depth in real units at each pixel. Additional calibration of the Kinect is useful if you want a more precise 3D reconstruction than what it gives you out of the box.
It would generally be helpful if you could tell us more about what it is you are trying to accomplish with your experiments.