I have a calibrated camera whose intrinsics were calculated prior to doing an initial two view reconstruction. Suppose I have 20 images around a static, rigid body all taken with the same camera. Using the first two views and a ground-truth measurement of the scene, I have the
1) initial reconstruction using Stewenius 5 point algorithm to find E (essential matrix).
2) camera matrices P1 and P2 where the origin is set to that of camera P1.
My question is, how would I add more views? For the first two views, I found the feature points by hand since I found that MATLAB feature-detectors and matchers were outputting false correspondences.
Do I continuously do two-view reconstructions to get the other camera extrinsics i.e. P1 and P3, P1 and P4...P1 and P20; all using the same feature points as that of P1-P2? Wouldn't there be some sort of error propagation with this approach? The reason for using P1 as a reference is because it is chosen to be at the world origin.
I do have a procedure to bundle adjust after I acquire all initial estimates for the camera extrinics, but my problem is getting the initial camera matrices P3...P20.
Thanks in advance!
You start by obtaining pairwise calibration P1-P2, P2-P3, P3-P4 ... using feature points of the corresponding pair. You need using some sort of RANSAC to get rid of false correspondences here or do matching manually between all pairs. The you need to put all cameras to the common coordinate frame. Say we select P1 as a key camera. To add third camera P3 to the pair P1-P2 you need to calculate rotation delta between P2 and P3 from pairwise calibration, Delta2-3 and then apply it to known camera matrix of P2. And so on until all camera matrices in the common coordinate frame. The you do bundle adjustment.
Related
I am writing a program that captures real time images from a scene by two calibrated cameras (so the internal parameters of the cameras are known to us). Using two view geometry, I can find the essential matrix and use OpenCV or MATLAB to find the relative position and orientation of one camera with respect to another. Having the essential matrix, it is shown in Hartley and Zisserman's Multiple View Geometry that one can reconstruct the scene using triangulation up to scale. Now I want to use a reference length to determine the scale of reconstruction and resolve ambiguity.
I know the height of the front wall and I want to use it for determining the scale of reconstruction to measure other objects and their dimensions or their distance from the center of my first camera. How can it be done in practice?
Thanks in advance.
Edit: To add more information, I have already done linear trianglation (minimizing the algebraic error) but I am not sure if it is any useful because there is still a scale ambiguity that I don't know how to get rid of it. My ultimate goal is to recognize an object (like a Pepsi can) and separate it in a rectangular area (which is going to be written as a separate module by someone else) and then find the distance of each pixel in this rectangular area, i.e. the region of interest, to the camera. Then the distance from the camera to the object will be the minimum of the distances from the camera to the 3D coordinates of the pixels in the region of interest.
Might be a bit late, but at least for someone struggling with the same staff.
As far as I remember it is actually linear problem. You got essential matrix, which gives you rotation matrix and normalized translation vector specifying relative position of cameras. If you followed Hartley and Zissermanm you probably chose one of the cameras as origin of world coordinate system. Meaning all your triangulated points are in normalized distance from this origin. What is important is, that the direction of every triangulated point is correct.
If you have some reference in the scene (lets say height of the wall), then you just have to find this reference (2 points are enough - so opposite ends of the wall) and calculate "normalization coefficient" (sorry for terminology) as
coeff = realWorldDistanceOf2Points / distanceOfTriangulatedPoints
Once you have this coeff, just mulptiply all your triangulated points with it and you got real world points.
Example:
you know that opposite corners of the wall are 5m from each other. you find these corners in both images, triangulate them (lets call triangulated points c1 and c2), calculate their distance in the "normalized" world as ||c1 - c2|| and get the
coeff = 5 / ||c1 - c2||
and you get real 3d world points as triangulatedPoint*coeff.
Maybe easier option is to have both cameras in fixed relative position and calibrate them together by stereoCalibrate openCV/Matlab function (there is actually pretty nice GUI in Matlab for that) - it returns not just intrinsic params, but also extrinsic. But I don't know if this is your case.
How do you determine that the intrinsic and extrinsic parameters you have calculated for a camera at time X are still valid at time Y?
My idea would be
to use a known calibration object (a chessboard) and place it in the camera's field of view at time Y.
Calculate the chessboard corner points in the camera's image (at time Y).
Define one of the chessboard corner points as world origin and calculate the world coordinates of all remaining chessboard corners based on that origin.
Relate the coordinates of 3. with the camera coordinate system.
Use the parameters calculated at time X to calculate the image points of the points from 4.
Calculate distances between points from 2. with points from 5.
Is that a clever way to go about it? I'd eventually like to implement it in MATLAB and later possibly openCV. I think I'd know how to do steps 1)-2) and step 6). Maybe someone can give a rough implementation for steps 2)-5). Especially I'd be unsure how to relate the "chessboard-world-coordinate-system" with the "camera-world-coordinate-system", which I believe I would have to do.
Thanks!
If you have a single camera you can easily follow the steps from this article:
Evaluating the Accuracy of Single Camera Calibration
For achieving step 2, you can easily use detectCheckerboardPoints function from MATLAB.
[imagePoints, boardSize, imagesUsed] = detectCheckerboardPoints(imageFileNames);
Assuming that you are talking about stereo-cameras, for stereo pairs, imagePoints(:,:,:,1) are the points from the first set of images, and imagePoints(:,:,:,2) are the points from the second set of images. The output contains M number of [x y] coordinates. Each coordinate represents a point where square corners are detected on the checkerboard. The number of points the function returns depends on the value of boardSize, which indicates the number of squares detected. The function detects the points with sub-pixel accuracy.
As you can see in the following image the points are estimated relative to the first point that covers your third step.
[The image is from this page at MATHWORKS.]
You can consider point 1 as the origin of your coordinate system (0,0). The directions of the axes are shown on the image and you know the distance between each point (in the world coordinate), so it is just the matter of depth estimation.
To find a transformation matrix between the points in the world CS and the points in the camera CS, you should collect a set of points and perform an SVD to estimate the transformation matrix.
But,
I would estimate the parameters of the camera and compare them with the initial parameters at time X. This is easier, if you have saved the images that were used when calibrating the camera at time X. By repeating the calibrating process using those images you should get very similar results, if the camera calibration is still valid.
Edit: Why you need the set of images used in the calibration process at time X?
You have a set of images to do the calibrations for the first time, right? To recalibrate the camera you need to use a new set of images. But for checking the previous calibration, you can use the previous images. If the parameters of the camera are changes, there would be an error between the re-estimation and the first estimation. This can be used for evaluating the validity of the calibration not for recalibrating the camera.
stereoParameters takes two extrinsic parameters: RotationOfCamera2 and TranslationOfCamera2.
The problem is that the documentation is a not very detailed about what RotationOfCamera2 really means, it only says: Rotation of camera 2 relative to camera 1, specified as a 3-by-3 matrix.
What is the coordinate system in this case ?
A rotation matrix can be specified in any coordinate system.
What does it exactly mean "the coordinate system of Camera 1" ? What are its x,y,z axes ?
In other words, if I calculate the Essential Matrix, how can I get the corresponding RotationOfCamera2 and TranslationOfCamera2 from the Essential Matrix ?
RotationOfCamera2 and TranslationOfCamera2 describe the transformation from camera1's coordinates into camera2's coordinates. A camera's coordinate system has its origin at the camera's optical center. Its X and Y-axes are in the image plane, and its Z-axis points out along the optical axis.
Equivalently, the extrinsics of camera 1 are identity rotation and zero translation, while the extrinsics of camera 2 are RotationOfCamera2 and TranslationOfCamera2.
If you have the Essential matrix, you can decompose it into the rotation and a translation. Two things to keep in mind. First, the translation is up to scale, so t will be a unit vector. Second, the rotation matrix will be a transpose of what you get from estimateCameraParameters, because of the difference in the vector-matrix multiplication conventions.
Out of curiosity, what is it that you are trying to accomplish? Are you working with a single moving camera? Otherwise, why not use the Stereo Camera Calibrator app to calibrate your cameras, and get rotation and translation for free?
Suppose for left camera's 1st checkerboard (or to any world reference) rotation is R1 and translation is T1, right camera's 1st checkerboard rotation is R2 and translation is T2, then you can calculate them as follows;
RotationOfCamera2 = R2*R1';
TranslationOfCamera2= T2-RotationOfCamera2*T1
But please note that this calculations are just for one identical checkerboard reference. Inside matlab these two parameters are calculated by all given pair of checkerboard images and calculate median values as initial guess. Later these parameters will be refine by nonlinear optimization. So after median calculations they might be sigtly differ. But if you have just one reference point tranfomation for both two camera, you should use above formula. Note Dima told, matlab's rotation matrix is transpose of normal usage. So I wrote it as how the literature tells not matlab's style.
I'm trying to make a 3D reconstruction from a set of uncalibrated photographs in MATLAB. I use SIFT to detect feature points and matches between images. I want to make a projective reconstruction first and then update this to a metric one using auto-calibration.
I know how to estimate the 3D points from 2 images by computing the fundamental matrix, camera matrices and triangulation. Now say I have 3 images, a, b and c. I compute the camera matrices and 3D points for image a and b. Now I want to update the structure by adding image c. I estimate the camera matrix by using known 3D points (calculated from a and b) that match with 2D points in image c, since:
However when I reconstruct the 3D points between b and c they don't add up with the existing 3D points from a and b. I'm assuming this is because I don't know the correct depth estimates of the points (depicted by s in above formula).
With the factorization method of Sturm and Triggs I can estimate the depths and find the structure and motion. However in order to do this, all points have to be visible in all views, which is not the case for my images. How can I estimate the depths for points not visible in all views?
This is not a question about Matlab. It is about an algorithm.
It is not mathematically possible to estimate the position of a 3D point in an image when you don't see an observation of the point in said image.
There are extensions for factorization to work with missing data. However, the field seems to have converged to Bundle Adjustment as the Gold Standard.
An excellent tutorial on how to achieve what you want can be found here, which is a culmination of several years of research into a working application. Starting from projective reconstruction up to the metric upgrade.
I am trying to compute the 3D coordinates from several pair of two view points.
First, I used the matlab function estimateFundamentalMatrix() to get the F of the matched points (Number > 8) which is:
F1 =[-0.000000221102386 0.000000127212463 -0.003908602702784
-0.000000703461004 -0.000000008125894 -0.010618266198273
0.003811584026121 0.012887141181108 0.999845683961494]
And my camera - taken these two pictures - was pre-calibrated with the intrinsic matrix:
K = [12636.6659110566, 0, 2541.60550098958
0, 12643.3249022486, 1952.06628069233
0, 0, 1]
From this information I then computed the essential matrix using:
E = K'*F*K
With the method of SVD, I finally got the projective transformation matrices:
P1 = K*[ I | 0 ]
and
P2 = K*[ R | t ]
Where R and t are:
R = [ 0.657061402787646 -0.419110137500056 -0.626591577992727
-0.352566614260743 -0.905543541110692 0.235982367268031
-0.666308558758964 0.0658603659069099 -0.742761951588233]
t = [-0.940150699101422
0.320030970080146
0.117033504470591]
I know there should be 4 possible solutions, however, my computed 3D coordinates seemed to be not correct.
I used the camera to take pictures of a FLAT object with marked points. I matched the points by hand (which means there should not be obvious mistake exists about the raw material). But the result turned out to be a surface with a little bit banding.
I guess this might be due to the reason pictures did not processed with distortions (but actually I remember I did).
I just want to know whether this method to solve the 3D reconstruction issue right? Especially when we already know the camera intrinsic matrix.
Edit by JCraft at Aug.4: I have redone the process and got some pictures showing the problem, I will write another question with detail then post the link.
Edit by JCraft at Aug.4: I have posted a new question: Calibrated camera get matched points for 3D reconstruction, ideal test failed. And #Schorsch really appreciate your help formatting my question. I will try to learn how to do inputs in SO and also try to improve my gramma. Thanks!
If you only have the fundamental matrix and the intrinsics, you can only get a reconstruction up to scale. That is your translation vector t is in some unknown units. You can get the 3D points in real units in several ways:
You need to have some reference points in the world with known distances between them. This way you can compute their coordinates in your unknown units and calculate the scale factor to convert your unknown units into real units.
You need to know the extrinsics of each camera relative to a common coordinate system. For example, you can have a checkerboard calibration pattern somewhere in your scene that you can detect and compute extrinsics from. See this example. By the way, if you know the extrinsics, you can compute the Fundamental matrix and the camera projection matrices directly, without having to match points.
You can do stereo calibration to estimate the R and the t between the cameras, which would also give you the Fundamental and the Essential matrices. See this example.
Flat objects are critical surfaces, not possible to achive your goal from them. try adding two (or more) points off the plane (see Hartley and Zisserman or other text on the matter if still interested)