Uncalibrated multi-view reconstruction depth estimation - MATLAB

I'm trying to make a 3D reconstruction from a set of uncalibrated photographs in MATLAB. I use SIFT to detect feature points and matches between images. I want to make a projective reconstruction first and then update this to a metric one using auto-calibration.
I know how to estimate the 3D points from 2 images by computing the fundamental matrix, camera matrices and triangulation. Now say I have 3 images, a, b and c. I compute the camera matrices and 3D points for images a and b. Now I want to update the structure by adding image c. I estimate the camera matrix by using known 3D points (calculated from a and b) that match with 2D points in image c, since:

s*x = P*X

where x is the homogeneous 2D image point, X the corresponding homogeneous 3D point, P the 3x4 camera matrix and s the projective depth.
However, when I reconstruct the 3D points between b and c, they don't add up with the existing 3D points from a and b. I'm assuming this is because I don't know the correct depth estimates of the points (denoted by s in the above formula).
With the factorization method of Sturm and Triggs I can estimate the depths and find the structure and motion. However, in order to do this, all points have to be visible in all views, which is not the case for my images. How can I estimate the depths for points not visible in all views?

This is not a question about MATLAB. It is about an algorithm.
It is not mathematically possible to estimate the depth of a 3D point in a view in which that point is not observed.
There are extensions of the factorization method that handle missing data. However, the field seems to have converged on Bundle Adjustment as the Gold Standard.
An excellent tutorial on how to achieve what you want can be found here; it is the culmination of several years of research into a working application, going from projective reconstruction all the way to the metric upgrade.
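For the resectioning step described in the question (estimating the camera matrix of a new image from already reconstructed 3D points and their 2D matches), a minimal Direct Linear Transform sketch in plain MATLAB could look like the following. The function name and interface are assumptions for illustration, and in practice the coordinates should be normalized and the result refined (e.g. by bundle adjustment):

% Projective resection: estimate a 3x4 camera matrix P from n >= 6 known
% 3D points X (n x 3) and their 2D projections x (n x 2) via the DLT.
function P = resectionDLT(X, x)
    n  = size(X, 1);
    Xh = [X, ones(n, 1)];                 % homogeneous 3D points, n x 4
    A  = zeros(2*n, 12);
    for i = 1:n
        A(2*i-1, :) = [Xh(i,:), zeros(1,4), -x(i,1)*Xh(i,:)];
        A(2*i,   :) = [zeros(1,4), Xh(i,:), -x(i,2)*Xh(i,:)];
    end
    [~, ~, V] = svd(A);
    P = reshape(V(:, end), 4, 3)';        % smallest singular vector -> rows of P
end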

Related

Use calibrated camera get matched points for 3D reconstruction

I am trying to compute 3D coordinates from several pairs of two-view point correspondences.
First, I used the MATLAB function estimateFundamentalMatrix() to get F from the matched points (more than 8), which is:
F1 = [-0.000000221102386  0.000000127212463 -0.003908602702784;
      -0.000000703461004 -0.000000008125894 -0.010618266198273;
       0.003811584026121  0.012887141181108  0.999845683961494]
And my camera - which took these two pictures - was pre-calibrated with the intrinsic matrix:
K = [12636.6659110566, 0, 2541.60550098958;
     0, 12643.3249022486, 1952.06628069233;
     0, 0, 1]
From this information I then computed the essential matrix using:
E = K'*F*K
With the method of SVD, I finally got the projective transformation matrices:
P1 = K*[ I | 0 ]
and
P2 = K*[ R | t ]
Where R and t are:
R = [ 0.657061402787646 -0.419110137500056  -0.626591577992727;
     -0.352566614260743 -0.905543541110692   0.235982367268031;
     -0.666308558758964  0.0658603659069099 -0.742761951588233]
t = [-0.940150699101422;
      0.320030970080146;
      0.117033504470591]
I know there should be 4 possible solutions; however, my computed 3D coordinates do not seem to be correct.
I used the camera to take pictures of a FLAT object with marked points. I matched the points by hand (which means there should be no obvious mistakes in the raw data). But the result turned out to be a surface with a little bit of banding.
I guess this might be because the pictures were not corrected for lens distortion (but I remember that I actually did correct them).
I just want to know whether this method of solving the 3D reconstruction problem is right, especially when we already know the camera intrinsic matrix.
Edit by JCraft at Aug. 4: I have redone the process and got some pictures showing the problem; I will write another question with details and then post the link.
Edit by JCraft at Aug. 4: I have posted a new question: Calibrated camera get matched points for 3D reconstruction, ideal test failed. And @Schorsch, I really appreciate your help formatting my question. I will try to learn how to format posts on SO and also try to improve my grammar. Thanks!
If you only have the fundamental matrix and the intrinsics, you can only get a reconstruction up to scale. That is, your translation vector t is in some unknown units. You can get the 3D points in real units in several ways:
You need to have some reference points in the world with known distances between them. This way you can compute their coordinates in your unknown units and calculate the scale factor to convert your unknown units into real units (see the sketch after this list).
You need to know the extrinsics of each camera relative to a common coordinate system. For example, you can have a checkerboard calibration pattern somewhere in your scene that you can detect and compute extrinsics from. See this example. By the way, if you know the extrinsics, you can compute the Fundamental matrix and the camera projection matrices directly, without having to match points.
You can do stereo calibration to estimate the R and the t between the cameras, which would also give you the Fundamental and the Essential matrices. See this example.
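As a sketch of the first option (the variable names are assumptions, not from the answer): if two reconstructed points correspond to world points whose real distance is known, the scale factor follows directly:

% points3D: N x 3 reconstructed points in unknown units
% i, j: indices of two points whose real-world distance knownDist is known
scale = knownDist / norm(points3D(i,:) - points3D(j,:));
points3D_metric = points3D * scale;   % the translation t scales the same way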
Flat objects are critical surfaces; it is not possible to achieve your goal from them. Try adding two (or more) points off the plane (see Hartley and Zisserman or another text on the matter if still interested).

Multiview 3D reconstruction

I tried to do a 3D reconstruction from multiple views by using the essential matrices between views to reconstruct 3D points for each view of the object. However, I am surprised that the 3D points I found all lie roughly on the XY plane. I guess it may be related to the large values in the essential matrix or to an oddly scaled projection matrix. What are your suggestions for computing precise 3D point coordinates?
If you have the Computer Vision System Toolbox, this example may be helpful.
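One thing worth checking in this situation (my own suggestion, not part of the linked example) is whether the correct one of the four (R, t) candidates was chosen when decomposing the essential matrix; picking a wrong candidate places points behind the cameras. A standard textbook decomposition in MATLAB looks roughly like this:

[U, ~, V] = svd(E);
W  = [0 -1 0; 1 0 0; 0 0 1];
R1 = U*W*V';   if det(R1) < 0, R1 = -R1; end
R2 = U*W'*V';  if det(R2) < 0, R2 = -R2; end
t  = U(:, 3);                 % translation up to sign and scale
% Four candidates: (R1, t), (R1, -t), (R2, t), (R2, -t).
% Keep the one for which triangulated points have positive depth in both views.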

Understanding of OpenCV undistortion

I'm receiving depth images from a ToF camera via MATLAB. The drivers delivered with the ToF camera, which compute x,y,z coordinates out of the depth image, use OpenCV functions that are called in MATLAB via MEX files.
But later on I won't be able to use those drivers anymore, nor OpenCV functions, so I need to implement the 2D-to-3D mapping on my own, including the compensation of radial distortion. I already got hold of the camera parameters, and the computation of the x,y,z coordinates for each pixel of the depth image is working. So far I have been solving the implicit equations of the undistortion via Newton's method (which isn't really fast...). But I want to implement the undistortion of the OpenCV function.
... and there is my problem: I don't really understand it and I hope you can help me out. How does it actually work? I tried to search through the forum, but haven't found any useful threads concerning this case.
greetings!
The equations for the projection of a 3D point [X; Y; Z] to a 2D image point [u; v] are provided on the documentation page related to camera calibration (source: opencv.org):

x' = X / Z
y' = Y / Z
r^2 = x'^2 + y'^2
x'' = x' * (1 + k1*r^2 + k2*r^4 + k3*r^6) / (1 + k4*r^2 + k5*r^4 + k6*r^6) + 2*p1*x'*y' + p2*(r^2 + 2*x'^2)
y'' = y' * (1 + k1*r^2 + k2*r^4 + k3*r^6) / (1 + k4*r^2 + k5*r^4 + k6*r^6) + p1*(r^2 + 2*y'^2) + 2*p2*x'*y'
u = fx*x'' + cx
v = fy*y'' + cy
In the case of lens distortion, the equations are non-linear and depend on 3 to 8 parameters (k1 to k6, p1 and p2). Hence, it would normally require a non-linear solving algorithm (e.g. Newton's method, the Levenberg-Marquardt algorithm, etc.) to invert such a model and estimate the undistorted coordinates from the distorted ones. And this is what is used behind the function undistortPoints, with tuned parameters making the optimization fast but a little inaccurate.
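For point (as opposed to image) undistortion, a simple alternative to Newton's method is a fixed-point iteration on the normalized coordinates. A minimal MATLAB sketch for a radial-only model follows; the coefficient names k1, k2, k3 and the fixed iteration count are assumptions:

% (xd, yd): distorted normalized coordinates; (x, y): undistorted estimate
x = xd;  y = yd;                         % start from the distorted point
for iter = 1:20
    r2 = x^2 + y^2;                      % radius of the current estimate
    s  = 1 + k1*r2 + k2*r2^2 + k3*r2^3;  % radial distortion factor
    x  = xd / s;
    y  = yd / s;
end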
However, in the particular case of image lens correction (as opposed to point correction), there is a much more efficient approach based on a well-known image re-sampling trick. This trick is that, in order to obtain a valid intensity for each pixel of your destination image, you have to transform coordinates in the destination image into coordinates in the source image, and not the opposite as one would intuitively expect. In the case of lens distortion correction, this means that you actually do not have to inverse the non-linear model, but just apply it.
Basically, the algorithm behind function undistort is the following. For each pixel of the destination lens-corrected image do:
Convert the pixel coordinates (u_dst, v_dst) to normalized coordinates (x', y') using the inverse of the calibration matrix K,
Apply the lens-distortion model, as displayed above, to obtain the distorted normalized coordinates (x'', y''),
Convert (x'', y'') to distorted pixel coordinates (u_src, v_src) using the calibration matrix K,
Use the interpolation method of your choice to find the intensity/depth associated with the pixel coordinates (u_src, v_src) in the source image, and assign this intensity/depth to the current destination pixel.
Note that if you are interested in undistorting the depthmap image, you should use a nearest-neighbor interpolation, otherwise you will almost certainly interpolate depth values at object boundaries, resulting in unwanted artifacts.
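As an illustration of these four steps, here is a minimal MATLAB sketch for a radial-only model (the coefficient names k1, k2, k3 and the nearest-neighbor sampling are assumptions; this is a sketch of the idea, not the exact OpenCV implementation):

% Lens-correct a grayscale/depth image src given intrinsics K and radial
% distortion coefficients k1, k2, k3, using destination-to-source mapping.
function dst = undistortImageSketch(src, K, k1, k2, k3)
    [h, w] = size(src);
    dst  = zeros(h, w, 'like', src);
    Kinv = inv(K);
    for v_dst = 1:h
        for u_dst = 1:w
            p = Kinv * [u_dst; v_dst; 1];          % 1) pixel -> normalized coords
            x = p(1);  y = p(2);
            r2 = x^2 + y^2;                        % 2) apply the distortion model
            s  = 1 + k1*r2 + k2*r2^2 + k3*r2^3;
            q  = K * [x*s; y*s; 1];                % 3) normalized -> source pixel coords
            u_src = round(q(1));  v_src = round(q(2));   % nearest neighbor
            if u_src >= 1 && u_src <= w && v_src >= 1 && v_src <= h
                dst(v_dst, u_dst) = src(v_src, u_src);   % 4) sample the source image
            end
        end
    end
end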
The above answer is correct, but do note that the UV coordinates here are in screen space and centered around (0,0), rather than "real" UV coordinates.
Source: own re-implementation using Python/OpenGL. Code:
import numpy as np

def correct_pt(uv, K, Kinv, ds):
    # uv: (N, 2) array of point coordinates; ds: distortion coefficients (k1, k2, p1, p2, k3)
    uv_3 = np.stack((uv[:, 0], uv[:, 1], np.ones(uv.shape[0])), axis=-1)
    xy_ = uv_3 @ Kinv.T                                   # pixel -> normalized coordinates
    r = np.linalg.norm(xy_, axis=-1)                      # norm of the homogeneous vector
    coeff = 1 + ds[0]*r**2 + ds[1]*r**4 + ds[4]*r**6      # radial distortion factor
    xy__ = xy_ * coeff[:, np.newaxis]
    return (xy__ @ K.T)[:, 0:2]                           # back to pixel coordinates

Stereo matching

I am using the Camera Calibration Toolbox for MATLAB. After calibration I have the intrinsic and extrinsic parameters of the stereo camera system. Next, I would like to determine the distance between the camera system and an object. To get this information, I used the function stereo_triangulation which is included in the toolbox. The inputs are two matrices containing the pixel coordinates of the correspondences in the left and right images.
I tried to get the coordinates of the correspondences using the basic block matching method described in MATLAB's help for Stereo Vision.
The resolution of my pictures is 1280x960 pixels. I know that the biggest disparity is around 520 pixels. I set the maximum of the disparity range to 520, but then determining the coordinates takes ages; it is not usable in practice. Calculating the disparity map is much faster with MATLAB's disparity() function, but I want the step before that - the coordinates of the correspondences.
Can you please suggest how I can efficiently get the coordinates with MATLAB?
Disparity and 3D are related by simple formulas (see below), so the time for calculating the 3D data and the disparity map should be about the same. The notation is:
f - focal length in pixels,
B - baseline (separation between the cameras),
u, v - column and row in a coordinate system centered on the middle of the image,
d - disparity,
x, y, z - 3D coordinates.
z = f*B/d;
x = z*u/f;
y = z*v/f;
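As a sketch of these formulas applied to a whole disparity map (disparityMap, f, B, width and height are assumed variable names):

[cols, rows] = meshgrid(1:width, 1:height);   % pixel grid
u = cols - width/2;                           % column offset from the image center
v = rows - height/2;                          % row offset from the image center
z = f * B ./ disparityMap;                    % depth, in the same units as B
x = z .* u / f;
y = z .* v / f;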
1280x960 is too large a resolution for any correlation-based stereo to work in real time. Think about it: you have to loop over a 2D image, over a 2D correlation window and over the range of disparities. This means 5 nested loops! I don't work with MATLAB anymore, but I know that it is quite slow.

Creating 3D volume from 2D slice set of grayscale images

I am to create a 3D volume out of a set of grayscale images using MATLAB. A set contains contiguous, quantized 2D grayscale slices. I still consider myself a rookie in MATLAB, but this is what I currently have in mind:
Create an empty space (array) for the 3D volume.
On each image, perform all the preprocessing operations so that we keep only the part that is of interest. (In this question, assume that this preprocessing part always works flawlessly.)
Go through the image; each pixel's 2D x and y coordinates are transferred to the empty space. For the z coordinate, we can use the slice number scaled by the distance between slices. If a pixel is adjacent to another pixel, the 3D points are connected together.
Repeat the previous 2 steps until all slices are done. We will now have all the points connected just like in the 2D slices.
But here comes the trouble: how can we connect the points between the slices, so that these points become a volume? Or is there a more robust way to do this in MATLAB? Any suggestion is highly appreciated.
Part 0 - Assumptions
all 2D images are of the same dimension, hence your 3D volume can hold all of them in a rectangular cube
majority of the pixels in each of the 2D images have 3D spatial relationships (you can't visualize much if the pixels in each of the 2D images are of some random distribution)
Part 1 - Visualizing 3D Volume from A Stack of 2D Images
To visualize or reconstruct a 3D volume from a stack of 2D images, you can try the following toolkits in matlab.
[1] 3D CT/MRI images interactive sliding viewer
http://www.mathworks.com/matlabcentral/fileexchange/29134-3d-ctmri-images-interactive-sliding-viewer
[2] Viewer3D
http://www.mathworks.com/matlabcentral/fileexchange/21993-viewer3d
[3] Image3
http://www.mathworks.com/matlabcentral/fileexchange/21881-image3
[4] Surface2Volume
http://www.mathworks.com/matlabcentral/fileexchange/8772-surface2volume
[5] SliceOMatic
http://www.mathworks.com/matlabcentral/fileexchange/764
Note that if you are familiar with VTK, you can try this:
[6] matVTK
http://www.cir.meduniwien.ac.at/matvtk/
I am currently sticking with [5] SliceOMatic for its simplicity and ease of use. However, by default, rendering 3D is quite slow in MATLAB. Turning on OpenGL gives faster rendering. (http://www.mathworks.com/help/techdoc/ref/opengl.html) Or simply put, set(gcf, 'Renderer', 'OpenGL').
Part 2 - Interpolating pixels in between the slices
To interpolate pixels in between the slices, you need to specify an interpolation method (some of the above toolkits have this capability/flexibility). Otherwise, to give you a head start, some examples of interpolation methods are bicubic, spline, polynomial, etc. (you can work this out by searching Google or Google Scholar for interpolation methods more specific to your problem domain).
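For instance, a minimal sketch of resampling along z with spline interpolation, assuming the slices have already been stacked into a 3D array vol (the variable names and the new z spacing are assumptions):

[ny, nx, nz] = size(vol);
zq = 1:0.5:nz;                                  % e.g. double the sampling along z
[Xq, Yq, Zq] = meshgrid(1:nx, 1:ny, zq);
volFine = interp3(double(vol), Xq, Yq, Zq, 'spline');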
Part 3 - 3D Pre-processing
Looking at your procedure, you process the volumetric data by processing each of the 2D images first. In many advanced algorithms, or in true 3D processing, what you can do is process the volumetric data in the 3D domain first (simply put, you take the 26 or more neighbors into account first). Once this step is done, you can simply output the volumetric data as a stack of 2D images for cross-sectional viewing, or supply it to one of the aforementioned toolkits for 3D viewing, or export it to third-party 3D viewing applications.
I have followed the above concepts for my own medical imaging research projects and the above finding is based on my research experience documented here (with latest revisions).
MATLAB generally plots volumetric data using a 3D array. The data points are spatially evenly separated along each axis. If there are sites in the 3D array for which you do not have data, they are usually assigned the value NaN, and the various plotting functions can generally handle this in a reasonable way (i.e. they will generally behave as you intended).
If you load the slices into the 3D array such that adjacent points in the z-direction of the data are also adjacent in the 3rd dimension of the array, then you should be fine.
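A minimal sketch of that loading step (the file pattern, slice order and viewer are assumptions):

% Stack grayscale slices into a 3D volume, adjacent slices adjacent in dim 3.
files = dir('slice_*.png');                                % hypothetical file names
firstSlice = imread(fullfile(files(1).folder, files(1).name));
vol = zeros([size(firstSlice), numel(files)], 'like', firstSlice);
for k = 1:numel(files)
    vol(:, :, k) = imread(fullfile(files(k).folder, files(k).name));
end
% View with one of the toolkits above, or e.g. volumeViewer(vol) from the Image Processing Toolbox.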