Stereo Camera Geometry - stereo-3d

This paper describes nicely the geometry of a stereo image system. I am trying to figure out, if the cameras tilted towards each other with a certain angle, how the calculation would change? I looked around but couldn't find any reference to tilted camera systems.

Unfortunately, the calculation changes significantly. The rectified case (where both cameras are well-aligned to each other) has the advantage that you can calculate the disparity and the depth is proportional to the disparity. This is not the case in the general case.
When you introduce tilts, you end up with something called epipolar geometry. Here is a paper about this I just googled. In order to calculate the depth from a pixel-pair you need the fundamental matrix or the essential matrix. Both are not easy to obtain from the image pair. If, however, you have the geometric relation of both cameras (translation and rotation), calculating these matrices is a lot easier.
There are several ways to calculate the depth of a pixel-pair. One way is to use the fundamental matrix to rectify both images (although rectifying is not easy either, or even unique) and run a simple disparity check.

Related

Measuring objects in a photo taken by calibrated cameras, knowing the size of a reference object in the photo

I am writing a program that captures real time images from a scene by two calibrated cameras (so the internal parameters of the cameras are known to us). Using two view geometry, I can find the essential matrix and use OpenCV or MATLAB to find the relative position and orientation of one camera with respect to another. Having the essential matrix, it is shown in Hartley and Zisserman's Multiple View Geometry that one can reconstruct the scene using triangulation up to scale. Now I want to use a reference length to determine the scale of reconstruction and resolve ambiguity.
I know the height of the front wall and I want to use it for determining the scale of reconstruction to measure other objects and their dimensions or their distance from the center of my first camera. How can it be done in practice?
Thanks in advance.
Edit: To add more information, I have already done linear trianglation (minimizing the algebraic error) but I am not sure if it is any useful because there is still a scale ambiguity that I don't know how to get rid of it. My ultimate goal is to recognize an object (like a Pepsi can) and separate it in a rectangular area (which is going to be written as a separate module by someone else) and then find the distance of each pixel in this rectangular area, i.e. the region of interest, to the camera. Then the distance from the camera to the object will be the minimum of the distances from the camera to the 3D coordinates of the pixels in the region of interest.
Might be a bit late, but at least for someone struggling with the same staff.
As far as I remember it is actually linear problem. You got essential matrix, which gives you rotation matrix and normalized translation vector specifying relative position of cameras. If you followed Hartley and Zissermanm you probably chose one of the cameras as origin of world coordinate system. Meaning all your triangulated points are in normalized distance from this origin. What is important is, that the direction of every triangulated point is correct.
If you have some reference in the scene (lets say height of the wall), then you just have to find this reference (2 points are enough - so opposite ends of the wall) and calculate "normalization coefficient" (sorry for terminology) as
coeff = realWorldDistanceOf2Points / distanceOfTriangulatedPoints
Once you have this coeff, just mulptiply all your triangulated points with it and you got real world points.
Example:
you know that opposite corners of the wall are 5m from each other. you find these corners in both images, triangulate them (lets call triangulated points c1 and c2), calculate their distance in the "normalized" world as ||c1 - c2|| and get the
coeff = 5 / ||c1 - c2||
and you get real 3d world points as triangulatedPoint*coeff.
Maybe easier option is to have both cameras in fixed relative position and calibrate them together by stereoCalibrate openCV/Matlab function (there is actually pretty nice GUI in Matlab for that) - it returns not just intrinsic params, but also extrinsic. But I don't know if this is your case.

How to superimpose two stereo images so that they become aligned? (Camera intrinsics and extrinsics known)

I'd like to find a transformation that projects the image from the Left camera onto the image from the Right camera so that the two become aligned. I already managed to do that with two similar cameras (IGB and RGB), by using the disparity map and shifting each pixel by the corresponding disparity value. My problem is that this doesn't work for other cameras that I'm using (for example multispectral and infrared sensors), because the calculated disparity maps have very little detail. I am currently using the Matlab Computer Vision Tool Box, and I suspect that the problem is the poor correlation of information in the images (little correspondences found by the disparity algorithm).
I would like to know if there is another way of doing this transformation, for example just by using the Extrinsic and Intrinsic Parameters of the cameras (the are already calibrated).
The disparity IS the transformation you are looking for. Any sensible left-right mapping depends on 3D information, which the disparity provides. Anything else is just hallucinating some values based on assumptions that may or may not make sense.

How to convert 3D point cloud (extracted from 3D sparse reconstruction) to millimeters?

Using Stereo vision and based on Multiple View Geometry book (http://www.robots.ox.ac.uk/~vgg/hzbook/), I have created a 3D point cloud in MATLAB. To do that, I first calibrated the cameras and rectified the stereo images. Then feature extraction and matching. Then eliminated the noisy matched based on camera locations. Finally created the 3D point cloud using triangulation.
Now my question is how to convert this 3D point cloud from pixel domain to actual millimeter/centimeter domain knowing my focal length and camera calibration matrices?
the goal is to find DEPTH IN MILLIMETERS.
I know how to do it in disparity/depth map case using formula: Z=(t*f)/d.
But here in the sparse case, can I do something like this? http://matlab.wikia.com/wiki/FAQ#How_do_I_measure_a_distance_or_area_in_real_world_units_instead_of_in_pixels.3F
or there is a more sophisticated method with more in depth explanation?
Thanks.
The formula you wrote is valid only in the special case when the image planes of the two cameras are on the same geometrical plane, and the motion from one to the other is a translation parallel to one of the image axes.
In the general case you'll need to triangulate actual rays in 3D space, using one of the techniques described in that book (it has a whole chapter on reconstruction). The reconstruction will be metrical if your calibration is. In particular, if the coordinate transform between the cameras has a translation vector whose units are meters (or millimeters, or inches, ...).

MATLAB - What are the units of Matlab Camera Calibration Toolbox

When showing the extrinsic parameters of calibration (the 3D model including the camera position and the position of the calibration checkerboards), the toolbox does not include units for the axes. It seemed logical to assume that they are in mm, but the z values displayed can not possibly be correct if they are indeed in mm. I'm assuming that there is some transformation going on, perhaps having to do with optical coordinates and units, but I can't figure it out from the documentation. Has anyone solved this problem?
If you marked the side length of your squares in mm, then the z-distance shown would be in mm.
I know next to nothing about matlabs (not entirely true but i avoid matlab wherever I can, and that would be almost always possible) tracking utilities but here's some general info.
Pixel dimension on the sensor has nothing to do with the size of the pixel on screen, or in model space. For all purposes a camera produces a picture that has no meaningful units. A tracking process is unaware of the scale of the scene. (the perspective projection takes care of that). You can re insert a scale by taking 2 tracked points and measuring the distance between those points. This is the solver spaces distance is pretty much arbitrary. Now if you know the real distance between these points you can get a conversion factor. By doing:
real distance / solver space distance.
There's really now way to knowing this distance form the cameras settings as the camera is unable to differentiate between different scales of scenes. So a perfect 1:100 replica is no different for the solver than the real deal. So you must allays relate to something you can measure separately for each measuring session. The camera always produces something that's relative in nature.

How to calculate perspective transformation using ellipse

I'm very new to 3D image processing.i'm working in my project to find the perspective angle of an circle.
A plate having set of white circles,using those circles i want to find the rotation angles (3D) of that plate.
For that i had finished camera calibration part and got camera error parameters.The next step i have captured an image and apply the sobel edge detection.
After that i have a little bit confusion about the ellipse fitting algorithm.i saw a lot of algorithms in ellipse fit.which one is the best method and fast method?
after finished ellipse fit i don't know how can i proceed further?how to calculate rotation and translation matrix using that ellipse?
can you tell me which algorithm is more suitable and easy. i need some matlab code to understand concept.
Thanks in advance
sorry for my English.
First, find the ellipse/circle centres (e.g. as Eddy_Em in other comments described).
You can then refer to Zhang's classic paper
https://research.microsoft.com/en-us/um/people/zhang/calib/
which allows you to estimate camera pose from a single image if some camera parameters are known, e.g. centre of projection. Note that the method fails for frontal recordings, i.e. the more of a perspective effect, the more accurate your estimate will be. The algorithm is fairly simple, you'll need a SVD and some cross products.