How to convert 3D point cloud (extracted from 3D sparse reconstruction) to millimeters? - matlab

Using stereo vision and based on the Multiple View Geometry book (http://www.robots.ox.ac.uk/~vgg/hzbook/), I have created a 3D point cloud in MATLAB. To do that, I first calibrated the cameras and rectified the stereo images, then performed feature extraction and matching, then eliminated the noisy matches based on the camera locations, and finally created the 3D point cloud using triangulation.
Now my question is: how do I convert this 3D point cloud from the pixel domain to an actual millimeter/centimeter domain, knowing my focal length and the camera calibration matrices?
The goal is to find DEPTH IN MILLIMETERS.
I know how to do it in the disparity/depth-map case using the formula Z = (f*t)/d, where f is the focal length, t the baseline, and d the disparity.
But here in the sparse case, can I do something like this? http://matlab.wikia.com/wiki/FAQ#How_do_I_measure_a_distance_or_area_in_real_world_units_instead_of_in_pixels.3F
Or is there a more sophisticated method with a more in-depth explanation?
Thanks.

The formula you wrote is valid only in the special case when the image planes of the two cameras are on the same geometrical plane, and the motion from one to the other is a translation parallel to one of the image axes.
In the general case you'll need to triangulate actual rays in 3D space, using one of the techniques described in that book (it has a whole chapter on reconstruction). The reconstruction will be metric if your calibration is, in particular if the coordinate transform between the cameras has a translation vector expressed in meters (or millimeters, or inches, ...).
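As a minimal MATLAB sketch of that idea (assuming a stereoParams object from the Stereo Camera Calibrator where the checkerboard square size was given in millimeters, and matchedPoints1/matchedPoints2 as placeholders for your own matched features):
% Matched features (Nx2 pixel coordinates) in the two original images;
% undistort the points first if lens distortion is significant.
worldPoints = triangulate(matchedPoints1, matchedPoints2, stereoParams);
% worldPoints is Nx3, expressed in the coordinate system of camera 1 and in
% the units of the calibration target (millimeters here).
depthInMM = worldPoints(:, 3);   % Z component = depth in millimeters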

Related

Does ARKit consider Lens Distortion in iPhone and iPad?

ARKit updates many intrinsic (and extrinsic) parameters of the ARCamera from frame to frame. I'd like to know whether it also takes radial lens distortion into consideration (as the AVCameraCalibrationData class does, which ARKit doesn't use) and corrects the video frames' distortion appropriately (distort/undistort operations) for the rear iPhone and iPad cameras?
var intrinsics: simd_float3x3 { get }
As we all know, radial lens distortion greatly affects 6 DOF pose-estimation accuracy when we place undistorted 3D objects in a real-world scene that has been distorted by the lens.
var lensDistortionLookupTable: Data? { get }
/* A map of floating-point values describing radial */
/* lens distortions in AVCameraCalibrationData class */
If the lens-distortion math used in ARKit is available in the API, where can I find it?
Although it's not explicitly stated, I'm certain that ARKit is correcting for non-linear lens distortion. Lens distortion (and inverse distortion) lookup tables exist in iOS11 and are available via AVCameraCalibrationData, but they are not exposed by ARKit, presumably because there is no need for them since you're already working with rectified coordinates.
Whether or not the distortion model parameters are the same for each device model (i.e. the exact same values for every iPhone 7) is an interesting question. I don't have access to multiple phones of the same model, but this shouldn't be hard to figure out for someone who does.
As an example from https://github.com/verebes1/ARKit-Multiplayer:
QR marker detection
With the help of Apple's Vision framework it is now possible to recognize a QR marker in the camera's video feed and track it while it is in the field of view. The framework provides us with the coordinates of the QR marker's square corners in the screen's coordinate system.
QR marker pose estimation
The next thing you probably want to do after detecting the QR markers is to obtain the camera pose from them.
To perform QR marker pose estimation you need to know the calibration parameters of your camera: the camera matrix and the distortion coefficients. Each camera lens has unique parameters, such as focal length, principal point, and lens distortion model. The process of finding the intrinsic camera parameters is called camera calibration. The camera calibration process is important for Augmented Reality applications because it describes the perspective transformation and lens distortion on an output image. To achieve the best user experience with Augmented Reality, visualization of an augmented object should be done using the same perspective projection.
At the end, what you get after the calibration is the camera matrix (a 3x3 matrix containing the focal distances and the camera center coordinates, a.k.a. the intrinsic parameters) and the distortion coefficients (a vector of 5 or more elements that models the distortion produced by your camera). The calibration parameters are pretty much the same for most iDevices.
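As a purely illustrative example of what that calibration output looks like in MATLAB notation (made-up numbers, not values from any particular iDevice):
% Hypothetical intrinsic parameters, in pixels.
fx = 1500;  fy = 1500;        % focal lengths
cx = 960;   cy = 540;         % principal point (camera center)
K  = [fx  0  cx;
       0 fy  cy;
       0  0   1];             % 3x3 camera matrix
% Distortion coefficients, commonly ordered [k1 k2 p1 p2 k3]
% (radial k1..k3, tangential p1, p2).
distCoeffs = [0.12, -0.25, 0.001, -0.0005, 0.07];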
With the precise locations of the marker corners, we can estimate a transformation between our camera and a marker in 3D space. This operation is known as pose estimation from 2D-3D correspondences. The pose estimation process finds a Euclidean transformation (consisting only of rotation and translation components) between the camera and the object.
C denotes the camera center. The P1-P4 points are 3D points in the world coordinate system and the p1-p4 points are their projections on the camera's image plane. Our goal is to find the relative transformation between a known marker position in the 3D world (P1-P4) and the camera C, using the intrinsic matrix and the known point projections on the image plane (p1-p4).
OpenCV functions are used to calculate the QR marker transformation in such a way that it minimizes the reprojection error, that is, the sum of squared distances between the observed imagePoints and the projected objectPoints. The estimated transformation is defined by a rotation (rvec) and a translation (tvec); this is also known as a Euclidean or rigid transformation. In the end we get the rotation quaternion and the translation vector of the QR marker.
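The project uses OpenCV's pose-estimation routines for this step; purely as an illustration, the same 2D-3D idea can be sketched in MATLAB with the extrinsics function (the marker side length, corner variables, and intrinsics object below are assumptions, not the project's actual code). Since the four corners are coplanar, extrinsics can recover the pose from them:
% Four detected corner projections of a QR marker of known side length,
% plus a cameraIntrinsics/cameraParameters object from a prior calibration.
markerSide   = 0.05;                                                       % assumed 5 cm marker
worldCorners = [0 0; markerSide 0; markerSide markerSide; 0 markerSide];   % coplanar, Z = 0
imageCorners = detectedCorners;                                            % 4x2 pixel coords from the detector
[R, t] = extrinsics(imageCorners, worldCorners, intrinsics);
% R (3x3) and t (1x3) relate the marker's coordinate frame to the camera
% frame (MATLAB's row-vector convention); t is in the marker's units (meters here).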
Integration into Apple's ARKit
The final part is the integration of all the information about the QR marker's pose into the 3D scene created by ARKit. ARKit uses Visual Inertial Odometry (VIO) to accurately track the world around it. VIO fuses camera sensor data with CoreMotion data. These two inputs allow the device to sense how it moves within a room with a high degree of accuracy, and without any additional calibration. All the rendering is based on Apple's Metal and Apple's SceneKit on top of it.
In order to render a SceneKit node on our QR marker properly, we need to create a model matrix for the QR marker from the quaternion and translation we got from OpenCV. The next step is to multiply the QR marker's model matrix by the transform matrix of the SceneKit scene's virtual camera. As a result, we can see a custom node (the Axes node in our project) that follows all of the QR marker's movements in the real world while the marker is in the field of view of the iPhone's camera; when it is not, the node stays at the last updated position so we can still examine it.

How to superimpose two stereo images so that they become aligned? (Camera intrinsics and extrinsics known)

I'd like to find a transformation that projects the image from the left camera onto the image from the right camera so that the two become aligned. I already managed to do that with two similar cameras (IGB and RGB) by using the disparity map and shifting each pixel by the corresponding disparity value. My problem is that this doesn't work for other cameras that I'm using (for example multispectral and infrared sensors), because the calculated disparity maps have very little detail. I am currently using the MATLAB Computer Vision Toolbox, and I suspect that the problem is the poor correlation of information between the images (few correspondences found by the disparity algorithm).
I would like to know if there is another way of doing this transformation, for example just by using the extrinsic and intrinsic parameters of the cameras (they are already calibrated).
The disparity IS the transformation you are looking for. Any sensible left-right mapping depends on 3D information, which the disparity provides. Anything else is just hallucinating some values based on assumptions that may or may not make sense.
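As a rough MATLAB sketch of that idea (placeholder variable names; a calibrated stereoParams object and a rectified image pair I1/I2 are assumed, and depending on your toolbox version reconstructScene may expect the reprojection matrix returned by rectifyStereoImages instead of stereoParams):
% Disparity on the rectified pair I1 (left), I2 (right).
disparityMap = disparitySGM(rgb2gray(I1), rgb2gray(I2));
% Back-project every left-image pixel to 3D (camera 1 frame, calibration units).
xyz   = reconstructScene(disparityMap, stereoParams);
pts3D = reshape(xyz, [], 3);
% Project those 3D points into camera 2 using the known extrinsics.
pts2 = worldToImage(stereoParams.CameraParameters2, ...
                    stereoParams.RotationOfCamera2, ...
                    stereoParams.TranslationOfCamera2, ...
                    pts3D);
% pts2 tells you, for each left pixel, where it lands in the right image;
% that per-pixel mapping is what you would use to superimpose the two views.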

How to convert from world coordinates to pixel indices in Matlab

I have a 512x512x313 volume of DICOM images, and I have a point represented in world coordinates, say (57.7475, 63.4184, 83.1515). How could I get the corresponding pixel coordinate of that world coordinate in MATLAB?
I hate to burst your bubble, but what you are asking for is impossible. The only way that I can think of where you are able to get a correspondence between real-world co-ordinates and pixel co-ordinates is if you calibrate the camera that was used to capture the images. Once you know the intrinsic and extrinsic parameters, you then have a transformation matrix that can map real-world co-ordinates to pixel co-ordinates.
I'm assuming you don't have calibration information for your camera, and so an alternative approach would be knowing which pixels in your image map to which real-world co-ordinates. You would need to know point correspondences between those points that map between the real-world and to your image. Once you know this, you would then compute the camera transformation matrix through least-squares and then you would use this matrix to determine which points map from the real-world to your image.
Unless you have pixel correspondences to each of your real-world co-ordinates, it is impossible to do what you're asking.
FWIW, if you want to see the procedure for obtaining the transformation matrix, check out these notes: http://www.peterhillman.org.uk/downloads/whitepapers/calibration.pdf. This was a great starting point for me when I started learning about camera calibration. Take a look at Section #5 (Page #8), as this is what I believe you are looking for, but you will need to have correspondences between the real-world co-ordinates and your image.
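If you do have such correspondences, here is a minimal MATLAB sketch of the least-squares step described above (the classic DLT estimate of a 3x4 projection matrix that maps world points to 2D image points; X and x are placeholder variable names for your own data):
% X: Nx3 real-world points, x: Nx2 matching pixel coordinates, N >= 6.
N = size(X, 1);
A = zeros(2*N, 12);
for i = 1:N
    Xi = [X(i,:) 1];                               % homogeneous world point
    A(2*i-1, :) = [Xi, zeros(1,4), -x(i,1)*Xi];
    A(2*i,   :) = [zeros(1,4), Xi, -x(i,2)*Xi];
end
[~, ~, V] = svd(A);
P = reshape(V(:, end), 4, 3)';                     % 3x4 projection matrix
% Map a new world point to pixel coordinates, e.g. the point from the question:
p     = P * [57.7475; 63.4184; 83.1515; 1];
pixel = p(1:2) / p(3);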
Good luck!

Matlab: From Disparity Map to 3D coordinates

I copied the MATLAB code from: http://www.mathworks.fr/fr/help/vision/ug/stereo-image-rectification.html
I can compute the 3D coordinates, but I am not sure whether they are correct.
Starting from the disparity map and calculating the 3D coordinates, how do we take into account the warping transforms tform1 and tform2?
The problem here is that you are using uncalibrated cameras. In this case you can only get a reconstruction up to scale; if you want the 3D points in world units, you would need to know the actual distances to some points in the world.
I think you would be better off calibrating your stereo system. Please see this example.
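For reference, once the stereo system is calibrated the rectification transforms are handled for you and the 3D coordinates come out directly in the calibration units; a minimal sketch (placeholder names; depending on your toolbox version reconstructScene takes either stereoParams or the reprojection matrix returned by rectifyStereoImages):
% stereoParams from the Stereo Camera Calibrator (square size given in mm).
[J1, J2] = rectifyStereoImages(I1, I2, stereoParams);
% Disparity is computed on the rectified pair, so tform1/tform2 never
% need to be applied by hand.
disparityMap = disparitySGM(rgb2gray(J1), rgb2gray(J2));
% 3D coordinates in camera 1's frame, in millimeters (the calibration units).
xyzPoints = reconstructScene(disparityMap, stereoParams);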

Stereo Camera Geometry

This paper nicely describes the geometry of a stereo image system. I am trying to figure out how the calculation would change if the cameras were tilted towards each other at a certain angle. I looked around but couldn't find any reference to tilted camera systems.
Unfortunately, the calculation changes significantly. The rectified case (where both cameras are well aligned to each other) has the advantage that you can calculate the disparity directly, and the depth is inversely proportional to the disparity. This is not so in the general case.
When you introduce tilts, you end up with general epipolar geometry. Here is a paper about this I just googled. In order to calculate the depth of a pixel pair you need the fundamental matrix or the essential matrix. Neither is easy to obtain from the image pair alone. If, however, you have the geometric relation between the two cameras (translation and rotation), calculating these matrices is a lot easier.
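For example, if R and t describe the pose of camera 2 with respect to camera 1 (so that a point p1 in camera 1 coordinates maps to p2 = R*p1 + t) and K1, K2 are the two intrinsic matrices, a small MATLAB sketch of those matrices is:
% Skew-symmetric matrix implementing the cross product with t.
tx = [   0    -t(3)   t(2);
       t(3)     0    -t(1);
      -t(2)   t(1)     0  ];
E = tx * R;                   % essential matrix (normalized coordinates)
F = inv(K2)' * E * inv(K1);   % fundamental matrix: x2' * F * x1 = 0 in pixel coordinates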
There are several ways to calculate the depth of a pixel pair. One way is to use the fundamental matrix to rectify both images (although rectification is not easy either, nor even unique) and then run a simple disparity check.