I have trouble understanding the depth (Z) values in the 3D point cloud produced by a sparse 3D reconstruction such as this MATLAB example: http://www.mathworks.com/help/vision/ug/sparse-3-d-reconstruction-from-multiple-views.html
I have attached a picture showing the reconstructed 3D point cloud from the above example. I have put some datatips on the figure so that the (x,y,z) coordinates of the points are visible. Here are my questions:
1- What does the Z value in the point cloud represent? Is it the distance in millimeters from the camera? If so, it does not make sense given the picture I attached, since I am sure the distance of the sphere and checkerboard from the camera must be greater than 200 mm.
Or is it measured from some reference point in space? If so, what is this reference point, and how can I make a 3D point cloud in which the Z values indicate the distance from the camera?
2- Why are there negative values for Z? What do they mean in terms of distance to the camera?
I would appreciate it if someone could explain.
In this example the world coordinates are defined by the checkerboard. The checkerboard defines the X-Y plane, and the Z-axis points into the checkerboard, as explained in the documentation.
Since your 3D points are above the checkerboard, they have negative Z-coordinates.
Your (x,y,z) coordinates are in world units, which are not metric unless you establish a scale between world units and metric units (there are various methods to do this, for example a known checkerboard square size). So the z value tells you the depth of each point in world coordinates, not its distance from the camera.
If you have the pose of each camera and you apply the camera's extrinsic transformation (rotation and translation) to each point, you will get the (x',y',z') points in camera coordinates. At that point, z' is the depth along the optical axis, and a negative z' means the point is behind the camera.
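The world-to-camera step above can be sketched as follows. This is a minimal illustration in plain Python, not the MATLAB example's code: the pose `R`, `t` and the sample point are made-up values, and it assumes the column-vector convention `p_cam = R * p_world + t` (MATLAB's row-vector convention would use the transposed rotation).

```python
import math

def mat_vec(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def world_to_camera(R, t, p_world):
    """Map a world point into camera coordinates: p_cam = R * p_world + t."""
    rotated = mat_vec(R, p_world)
    return [rotated[i] + t[i] for i in range(3)]

def distance_from_camera(p_cam):
    """Euclidean distance from the camera's optical center to the point."""
    return math.sqrt(sum(c * c for c in p_cam))

# Hypothetical pose: camera axes aligned with the world axes, camera center
# 500 units in front of the checkerboard (world Z = -500), so t = -R * C.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 500.0]

# A point above the checkerboard has a negative world Z ...
p_world = [30.0, 40.0, -100.0]
p_cam = world_to_camera(R, t, p_world)

# ... but a positive depth z' in camera coordinates.
depth = p_cam[2]
dist = distance_from_camera(p_cam)
print(p_cam, depth, dist)
```

Note that `depth` (z') and `dist` (the Euclidean range) differ for points off the optical axis, so it is worth deciding which one "distance from the camera" should mean.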
I need to calculate the X,Y coordinates in the world with respect to the camera using u,v coordinates in the 2D image. I am using an S7 edge camera to send a 720x480 video feed to MATLAB.
What I know: Z, i.e. the depth of the object from the camera, the size of the camera pixels (1.4 µm), and the focal length (4.2 mm).
Let's say the image point is at (u,v) = (400,400).
My approach is as follows:
Subtract the pixel value of the center point (240,360) from the u,v pixel coordinates of the point in the image. This should give us the pixel coordinates with respect to the camera's optical axis (z axis). The origin is now at the center of the image. So the new coordinates are: (160, 40)
Multiply the new u,v pixel values by the pixel size to obtain the distance of the point from the origin in physical units. Let's call it (x,y). We get (x,y) = (0.224, 0.056) in mm.
Use the formula X = xZ/f & Y = yZ/f to calculate X,Y coordinates in the real world with respect to the camera's optical axis.
Is my approach correct?
Your approach is on the right track, but it would be easier to use a more standardized approach. The usual way is to use the pinhole camera model, which gives you a transformation from world coordinates [X, Y, Z] to pixel coordinates [x, y]. Take a look at this guide, which describes step by step how to build the transformation.
Basically, you have to define your internal (intrinsic) camera matrix to do the transformation:
K = [fx 0 u0; 0 fy v0; 0 0 1]
fx and fy are your focal length expressed in pixel units. You can calculate them from your FOV and the total number of pixels in each direction. Take a look here and here for more info.
u0 and v0 are the piercing point (the intersection of the optical axis with the image plane, given in pixel coordinates). Since pixel coordinates do not have their origin at the image center, these parameters represent a translation to the center of the image.
If you need to, you can also add a skew factor a, which corrects for shear effects of your camera. The internal camera matrix then becomes:
K = [fx a u0; 0 fy v0; 0 0 1]
Since your depth is fixed, just fix your Z and continue the transformation without a problem.
Remember: if you want the inverse transformation (camera to world), just invert your camera matrix and be happy!
MATLAB also has a very good guide for this transformation. Take a look.
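For a known depth Z and zero skew, inverting the pinhole projection reduces to two lines of arithmetic. The sketch below is only an illustration in plain Python: the function names are made up, the principal point is assumed to sit at the image center (360, 240) with u horizontal, and the depth of 1000 mm is an arbitrary example value. It also assumes the 1.4 µm pixel size applies to the image as delivered; if the 720x480 feed is a downscaled version of the sensor image, the effective pixel size (and therefore fx, fy) will differ, so a calibration would be more reliable.

```python
def focal_length_pixels(focal_mm, pixel_size_mm):
    """Focal length expressed in pixels: physical focal length / pixel pitch."""
    return focal_mm / pixel_size_mm

def back_project(u, v, Z, fx, fy, u0, v0):
    """Invert u = fx*X/Z + u0 and v = fy*Y/Z + v0 for a known depth Z."""
    X = (u - u0) * Z / fx
    Y = (v - v0) * Z / fy
    return X, Y, Z

def project(X, Y, Z, fx, fy, u0, v0):
    """Forward pinhole projection with zero skew."""
    return fx * X / Z + u0, fy * Y / Z + v0

# Values from the question: f = 4.2 mm, pixel size = 1.4 um = 0.0014 mm.
fx = fy = focal_length_pixels(4.2, 0.0014)   # about 3000 px

# Assumed principal point: center of a 720x480 image (u horizontal).
u0, v0 = 360.0, 240.0

# Hypothetical depth of 1000 mm for the pixel (400, 400).
X, Y, Z = back_project(400.0, 400.0, 1000.0, fx, fy, u0, v0)
print(X, Y, Z)

# Round trip: projecting the 3D point should give the pixel back.
u_back, v_back = project(X, Y, Z, fx, fy, u0, v0)
print(u_back, v_back)
```

The resulting X and Y are in the same units as Z (millimeters here) and are expressed in the camera frame, not in an external world frame.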
I am writing a program that captures real time images from a scene by two calibrated cameras (so the internal parameters of the cameras are known to us). Using two view geometry, I can find the essential matrix and use OpenCV or MATLAB to find the relative position and orientation of one camera with respect to another. Having the essential matrix, it is shown in Hartley and Zisserman's Multiple View Geometry that one can reconstruct the scene using triangulation up to scale. Now I want to use a reference length to determine the scale of reconstruction and resolve ambiguity.
I know the height of the front wall and I want to use it for determining the scale of reconstruction to measure other objects and their dimensions or their distance from the center of my first camera. How can it be done in practice?
Thanks in advance.
Edit: To add more information, I have already done linear triangulation (minimizing the algebraic error), but I am not sure whether it is of any use, because there is still a scale ambiguity that I don't know how to get rid of. My ultimate goal is to recognize an object (like a Pepsi can), isolate it in a rectangular area (this will be written as a separate module by someone else), and then find the distance of each pixel in this rectangular area, i.e. the region of interest, to the camera. The distance from the camera to the object will then be the minimum of the distances from the camera to the 3D coordinates of the pixels in the region of interest.
Might be a bit late, but at least it may help someone struggling with the same stuff.
As far as I remember, it is actually a linear problem. You have the essential matrix, which gives you the rotation matrix and a normalized translation vector specifying the relative position of the cameras. If you followed Hartley and Zisserman, you probably chose one of the cameras as the origin of the world coordinate system, meaning all your triangulated points are at a normalized distance from this origin. What is important is that the direction of every triangulated point is correct.
If you have some reference in the scene (let's say the height of the wall), then you just have to find this reference (2 points are enough, so the opposite ends of the wall) and calculate a "normalization coefficient" (sorry for the terminology) as
coeff = realWorldDistanceOf2Points / distanceOfTriangulatedPoints
Once you have this coeff, just multiply all your triangulated points by it and you get real-world points.
Example:
You know that opposite corners of the wall are 5 m from each other. You find these corners in both images, triangulate them (let's call the triangulated points c1 and c2), calculate their distance in the "normalized" world as ||c1 - c2||, and get
coeff = 5 / ||c1 - c2||
and you get the real 3D world points as triangulatedPoint*coeff.
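The rescaling above can be sketched in a few lines of plain Python. The triangulated coordinates below are invented for illustration, the function names are hypothetical, and the points are assumed to be expressed in the first camera's frame (camera 1 at the origin), so the norm of a rescaled point is its distance from that camera.

```python
import math

def distance(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def scale_coefficient(c1, c2, real_distance):
    """coeff = realWorldDistanceOf2Points / distanceOfTriangulatedPoints."""
    return real_distance / distance(c1, c2)

def rescale(points, coeff):
    """Multiply every up-to-scale triangulated point by the coefficient."""
    return [[coeff * x for x in p] for p in points]

# Made-up up-to-scale triangulations of the two wall corners.
c1 = [0.0, 0.0, 2.0]
c2 = [0.0, 1.0, 2.0]
coeff = scale_coefficient(c1, c2, 5.0)   # the wall corners are 5 m apart

# Made-up triangulated points inside a region of interest.
roi_points = [[0.2, -0.1, 1.5], [0.25, -0.1, 1.4]]
roi_metric = rescale(roi_points, coeff)

# Distance from camera 1 (the origin) to the object: the nearest ROI point.
origin = [0.0, 0.0, 0.0]
object_distance = min(distance(origin, p) for p in roi_metric)
print(coeff, object_distance)
```

Because only one reference length is used, any error in triangulating the two reference points scales every measurement, so picking well-separated, clearly identifiable points for the reference is a sensible choice.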
Maybe an easier option is to keep both cameras in a fixed relative position and calibrate them together with the stereoCalibrate OpenCV/MATLAB function (there is actually a pretty nice GUI in MATLAB for that). It returns not just the intrinsic parameters but also the extrinsic ones. But I don't know if this applies to your case.
stereoParameters takes two extrinsic parameters: RotationOfCamera2 and TranslationOfCamera2.
The problem is that the documentation is not very detailed about what RotationOfCamera2 really means. It only says: Rotation of camera 2 relative to camera 1, specified as a 3-by-3 matrix.
What is the coordinate system in this case?
A rotation matrix can be specified in any coordinate system.
What exactly does "the coordinate system of Camera 1" mean? What are its x, y, z axes?
In other words, if I calculate the Essential Matrix, how can I get the corresponding RotationOfCamera2 and TranslationOfCamera2 from it?
RotationOfCamera2 and TranslationOfCamera2 describe the transformation from camera1's coordinates into camera2's coordinates. A camera's coordinate system has its origin at the camera's optical center. Its X and Y-axes are in the image plane, and its Z-axis points out along the optical axis.
Equivalently, the extrinsics of camera 1 are identity rotation and zero translation, while the extrinsics of camera 2 are RotationOfCamera2 and TranslationOfCamera2.
If you have the Essential matrix, you can decompose it into the rotation and a translation. Two things to keep in mind. First, the translation is up to scale, so t will be a unit vector. Second, the rotation matrix will be a transpose of what you get from estimateCameraParameters, because of the difference in the vector-matrix multiplication conventions.
Out of curiosity, what is it that you are trying to accomplish? Are you working with a single moving camera? Otherwise, why not use the Stereo Camera Calibrator app to calibrate your cameras, and get rotation and translation for free?
Suppose that for the left camera's 1st checkerboard (or any other world reference) the rotation is R1 and the translation is T1, and for the right camera's 1st checkerboard the rotation is R2 and the translation is T2. Then you can calculate them as follows:
RotationOfCamera2 = R2*R1';
TranslationOfCamera2= T2-RotationOfCamera2*T1
But please note that these calculations are for a single shared checkerboard reference. Inside MATLAB, these two parameters are calculated from all the given pairs of checkerboard images, and the median values are used as an initial guess. These parameters are then refined by nonlinear optimization, so after refinement they might differ slightly from the median values. But if you have just one reference transformation for both cameras, you should use the formula above. Note that, as Dima said, MATLAB's rotation matrix is the transpose of the usual convention. So I wrote it the way the literature does, not in MATLAB's style.
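The formula above can be checked numerically. The sketch below is plain Python with small hand-written 3x3 helpers and made-up extrinsics; it uses the literature (column-vector) convention `x_cam = R * x_world + T`, as in the formula above, and verifies that mapping a world point through camera 1 and then through the relative pose gives the same result as mapping it directly into camera 2.

```python
import math

def mat_mul(A, B):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def mat_vec(A, v):
    return [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]

def add(a, b):
    return [a[i] + b[i] for i in range(3)]

def relative_pose(R1, T1, R2, T2):
    """RotationOfCamera2 = R2*R1', TranslationOfCamera2 = T2 - R*T1."""
    R = mat_mul(R2, transpose(R1))
    RT1 = mat_vec(R, T1)
    T = [T2[i] - RT1[i] for i in range(3)]
    return R, T

# Made-up extrinsics of the same checkerboard seen by both cameras.
a = math.radians(10.0)
R1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T1 = [0.0, 0.0, 1000.0]
R2 = [[math.cos(a), 0.0, math.sin(a)],
      [0.0, 1.0, 0.0],
      [-math.sin(a), 0.0, math.cos(a)]]   # rotation about the Y axis
T2 = [-120.0, 0.0, 1000.0]

R, T = relative_pose(R1, T1, R2, T2)

# Consistency check on an arbitrary world point.
X = [50.0, -20.0, 0.0]
x_cam1 = add(mat_vec(R1, X), T1)
x_cam2_direct = add(mat_vec(R2, X), T2)
x_cam2_via_cam1 = add(mat_vec(R, x_cam1), T)
print(x_cam2_direct, x_cam2_via_cam1)
```

To compare the result against values stored in MATLAB's stereoParameters, the rotation would need to be transposed, as noted above.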
I copied the matlab code from: http://www.mathworks.fr/fr/help/vision/ug/stereo-image-rectification.html
I can compute the 3D coordinates, but I am not sure whether they are correct.
Starting from the disparity map and calculating the 3D coordinates, how do we take the warping transforms tform1 and tform2 into account?
The problem here is that you are using uncalibrated cameras. In this case you can get up-to-scale reconstruction, but if you want the 3D points in world units, you would need to know actual distances to some points in the world.
I think you would be better off calibrating your stereo system. Please see this example.
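For comparison, once a stereo pair is calibrated and rectified, depth follows directly from disparity through the standard relation Z = f * B / d. The sketch below is plain Python with invented numbers; `f` (focal length in pixels), `baseline` (distance between the camera centers) and the principal point `(u0, v0)` are assumed to come from a calibration, and `(u, v)` are pixel coordinates in the rectified image of camera 1, so the result is expressed in the rectified camera 1 frame.

```python
def disparity_to_point(u, v, d, f, baseline, u0, v0):
    """Reconstruct a 3D point from a rectified pixel (u, v) and disparity d.

    Returns None for non-positive disparities, which carry no valid depth.
    """
    if d <= 0:
        return None
    Z = f * baseline / d          # depth along the optical axis
    X = (u - u0) * Z / f
    Y = (v - v0) * Z / f
    return X, Y, Z

# Hypothetical calibration: f = 800 px, baseline = 120 mm,
# principal point at (320, 240).
point = disparity_to_point(420.0, 240.0, 32.0, 800.0, 120.0, 320.0, 240.0)
print(point)

invalid = disparity_to_point(420.0, 240.0, 0.0, 800.0, 120.0, 320.0, 240.0)
print(invalid)
```

The output units match the units of the baseline (millimeters here), which is exactly the metric information that an uncalibrated reconstruction lacks.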
I have N 3D observations taken from an optical motion capture system in XYZ form.
The motion that was captured was just a simple circle arc, derived from a rigid body with fixed axis of rotation.
I used the princomp function in MATLAB to get all marker points onto the same plane, i.e. the plane in which the motion took place.
(See a pic representing 3D data on the plane that was found, below)
What I want to do after the previous step is to look at the fitted data on the plane that was found and get the curve of the captured motion in 2D.
The princomp how-to says that
The first two coordinates of the principal component scores give the
projection of each point onto the plane, in the coordinate system of
the plane.
(from "Fitting an Orthogonal Regression Using Principal Components Analysis" article on mathworks help site)
So I thought that if I just plot those PC scores with plot(score(:,1),score(:,2)), I'll get the motion curve. Instead, what I got is this.
(See a pic representing curve data in 2D derived from pc scores, below)
The 2D curve seems stretched and is not a single-valued function (different y values for the same x values) when it shouldn't be. The curve that I am looking for should be fittable with a simple polynomial (polyfit) or a circle fit in MATLAB.
Is this happening because the plane that was found looks like a rhombus relative to the original coordinate system, and the PC axes are rotated with respect to the basis of the plane in such a way that they produce this stretch?
Then I thought that this is happening because of the different coordinate systems of the optical system and MATLAB. The optical system's (i.e. the cameras') coordinate system is XZY oriented, and MATLAB's default (I think) coordinate system is XYZ oriented. I transformed my data to correspond to MATLAB's coordinate system through a rotation matrix and ran princomp again, but I got the same stretch in the 2D curve (the new curve just had a different orientation).
Somewhere else I read that
Principal Components Analysis chooses the first PCA axis as that line
that goes through the centroid, but also minimizes the square of the
distance of each point to that line. Thus, in some sense, the line is
as close to all of the data as possible. Equivalently, the line goes
through the maximum variation in the data. The second PCA axis also
must go through the centroid, and also goes through the maximum
variation in the data, but with a certain constraint: It must be
completely uncorrelated (i.e. at right angles, or "orthogonal") to PCA
axis 1.
I know that I am missing something, but I have a problem understanding why I get a stretched curve. What do I have to do to get the curve right?
Thanks in advance.
EDIT: Here is a sample data file (3 columns XYZ coords for 2 markers)
w w w.sendspace.com/file/2hiezc