How to convert from world coordinates to pixel indices in MATLAB

I have a 512x512x313 volume of DICOM images and a point given in world coordinates, say (57.7475 63.4184 83.1515). How can I get the corresponding pixel coordinates of that point in MATLAB?

I hate to burst your bubble, but what you are asking for is impossible without additional information. The only way I can think of to get a correspondence between real-world co-ordinates and pixel co-ordinates is to calibrate the camera that was used to capture the images. Once you know the intrinsic and extrinsic parameters, you have a transformation matrix that maps real-world co-ordinates to pixel co-ordinates.
I'm assuming you don't have calibration information for your camera, so an alternative approach is to find out which pixels in your image map to which real-world co-ordinates. You would need point correspondences between the real world and your image. Once you have them, you can compute the camera transformation matrix through least squares and then use this matrix to determine where real-world points map to in your image.
Unless you have pixel correspondences for your real-world co-ordinates, it is impossible to do what you're asking.
FWIW, if you want to see the procedure for obtaining the transformation matrix, check out these notes: http://www.peterhillman.org.uk/downloads/whitepapers/calibration.pdf. This was a great starting point for me when I started learning about camera calibration. Take a look at Section #5 (Page #8), as this is what I believe you are looking for, but you will need correspondences between the real-world co-ordinates and your image.
Good luck!

Related

Measuring objects in a photo taken by calibrated cameras, knowing the size of a reference object in the photo

I am writing a program that captures real time images from a scene by two calibrated cameras (so the internal parameters of the cameras are known to us). Using two view geometry, I can find the essential matrix and use OpenCV or MATLAB to find the relative position and orientation of one camera with respect to another. Having the essential matrix, it is shown in Hartley and Zisserman's Multiple View Geometry that one can reconstruct the scene using triangulation up to scale. Now I want to use a reference length to determine the scale of reconstruction and resolve ambiguity.
I know the height of the front wall and I want to use it for determining the scale of reconstruction to measure other objects and their dimensions or their distance from the center of my first camera. How can it be done in practice?
Thanks in advance.
Edit: To add more information, I have already done linear triangulation (minimizing the algebraic error), but I am not sure whether it is useful because there is still a scale ambiguity that I don't know how to get rid of. My ultimate goal is to recognize an object (like a Pepsi can), isolate it in a rectangular area (which is going to be written as a separate module by someone else), and then find the distance from each pixel in this rectangular area, i.e. the region of interest, to the camera. The distance from the camera to the object will then be the minimum of the distances from the camera to the 3D coordinates of the pixels in the region of interest.
Might be a bit late, but it may at least help someone struggling with the same stuff.
As far as I remember, it is actually a linear problem. You have the essential matrix, which gives you the rotation matrix and a normalized translation vector specifying the relative position of the cameras. If you followed Hartley and Zisserman, you probably chose one of the cameras as the origin of the world coordinate system, meaning all your triangulated points are expressed at a normalized distance from this origin. What is important is that the direction of every triangulated point is correct.
If you have some reference in the scene (let's say the height of the wall), then you just have to find this reference (2 points are enough, so opposite ends of the wall) and calculate a "normalization coefficient" (sorry for the terminology) as
coeff = realWorldDistanceOf2Points / distanceOfTriangulatedPoints
Once you have this coeff, just multiply all your triangulated points by it and you get real-world points.
Example:
You know that opposite corners of the wall are 5 m from each other. You find these corners in both images, triangulate them (let's call the triangulated points c1 and c2), calculate their distance in the "normalized" world as ||c1 - c2||, and get
coeff = 5 / ||c1 - c2||
and you get the real 3D world points as triangulatedPoint*coeff.
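The rescaling above can be sketched in a few lines of Python. This is only an illustration: the function name scale_reconstruction and all coordinates are made up, and it assumes the triangulated points are expressed in the first camera's frame, so the camera sits at the origin.

```python
import math


def scale_reconstruction(points, ref_a, ref_b, real_distance):
    """Rescale up-to-scale triangulated points using one known length.

    points        : list of (x, y, z) triangulated points (arbitrary scale)
    ref_a, ref_b  : triangulated endpoints of the reference object
    real_distance : true distance between those endpoints (e.g. in metres)
    """
    triangulated_distance = math.dist(ref_a, ref_b)
    if triangulated_distance == 0:
        raise ValueError("reference points coincide; scale is undefined")
    coeff = real_distance / triangulated_distance
    scaled = [tuple(coeff * c for c in p) for p in points]
    return coeff, scaled


# Hypothetical values: wall corners triangulated 0.5 units apart, really 5 m apart.
c1 = (0.0, 0.0, 2.0)
c2 = (0.0, 0.5, 2.0)
can = (0.1, 0.2, 1.0)  # some triangulated point on the object of interest
coeff, (can_metric,) = scale_reconstruction([can], c1, c2, 5.0)

# Distance from the first camera (the origin) to the object, in metres.
distance_to_can = math.dist((0.0, 0.0, 0.0), can_metric)
```

The accuracy of the result depends on how well the two reference points are triangulated, so a long reference length that is clearly visible in both views is preferable to a short one.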
Maybe an easier option is to keep both cameras in a fixed relative position and calibrate them together with the stereoCalibrate OpenCV/MATLAB function (there is actually a pretty nice GUI in MATLAB for that); it returns not just the intrinsic parameters but also the extrinsic ones. But I don't know if this is your case.

Verify that camera calibration is still valid

How do you determine that the intrinsic and extrinsic parameters you have calculated for a camera at time X are still valid at time Y?
My idea would be to:
1. Use a known calibration object (a chessboard) and place it in the camera's field of view at time Y.
2. Calculate the chessboard corner points in the camera's image (at time Y).
3. Define one of the chessboard corner points as world origin and calculate the world coordinates of all remaining chessboard corners based on that origin.
4. Relate the coordinates of 3. with the camera coordinate system.
5. Use the parameters calculated at time X to calculate the image points of the points from 4.
6. Calculate distances between points from 2. with points from 5.
Is that a clever way to go about it? I'd eventually like to implement it in MATLAB and later possibly OpenCV. I think I'd know how to do steps 1)-2) and step 6). Maybe someone can give a rough implementation for steps 2)-5). In particular, I'm unsure how to relate the "chessboard-world-coordinate-system" to the "camera-world-coordinate-system", which I believe I would have to do.
Thanks!
If you have a single camera you can easily follow the steps from this article:
Evaluating the Accuracy of Single Camera Calibration
To achieve step 2, you can use the detectCheckerboardPoints function from MATLAB.
[imagePoints, boardSize, imagesUsed] = detectCheckerboardPoints(imageFileNames);
If you are working with stereo cameras, imagePoints(:,:,:,1) are the points from the first set of images and imagePoints(:,:,:,2) are the points from the second set of images. The output contains M [x y] coordinates. Each coordinate represents a point where square corners are detected on the checkerboard. The number of points the function returns depends on the value of boardSize, which indicates the number of squares detected. The function detects the points with sub-pixel accuracy.
As you can see in the following image, the points are estimated relative to the first point, which covers your third step.
[The image is from this page at MATHWORKS.]
You can consider point 1 as the origin of your coordinate system (0,0). The directions of the axes are shown on the image and you know the distance between each point (in world coordinates), so it is just a matter of depth estimation.
To find a transformation matrix between the points in the world CS and the points in the camera CS, you should collect a set of points and perform an SVD to estimate the transformation matrix.
However, I would rather re-estimate the parameters of the camera and compare them with the initial parameters from time X. This is easier if you have saved the images that were used when calibrating the camera at time X. By repeating the calibration process using those images, you should get very similar results if the camera calibration is still valid.
Edit: Why do you need the set of images used in the calibration process at time X?
You have a set of images from calibrating the camera the first time, right? To recalibrate the camera you need to use a new set of images. But for checking the previous calibration, you can use the previous images. If the parameters of the camera have changed, there will be an error between the re-estimation and the first estimation. This can be used for evaluating the validity of the calibration, not for recalibrating the camera.
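For steps 5 and 6 of the question (predict the corner positions with the parameters from time X and compare them with the corners detected at time Y), a minimal Python sketch could look like the following. It assumes an ideal pinhole model without lens distortion and assumes that the board pose R, t relative to the camera at time Y is already known, for instance estimated from the new image with a pose-estimation routine such as OpenCV's solvePnP. The function names and all numbers are hypothetical.

```python
import math


def project_point(point_world, R, t, fx, fy, cx, cy):
    """Project a 3D world point to pixels with an ideal pinhole model (no distortion)."""
    # World -> camera frame: X_cam = R * X_world + t
    xc, yc, zc = (
        sum(R[i][j] * point_world[j] for j in range(3)) + t[i] for i in range(3)
    )
    if zc <= 0:
        raise ValueError("point is behind the camera")
    return (fx * xc / zc + cx, fy * yc / zc + cy)


def mean_reprojection_error(world_points, detected_pixels, R, t, fx, fy, cx, cy):
    """Mean pixel distance between detected corners and corners predicted by the saved calibration."""
    errors = [
        math.dist(project_point(p, R, t, fx, fy, cx, cy), uv)
        for p, uv in zip(world_points, detected_pixels)
    ]
    return sum(errors) / len(errors)


# Hypothetical data: two board corners 3 cm apart, board 1 m in front of the camera.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 1.0]
world_points = [(0.0, 0.0, 0.0), (0.03, 0.0, 0.0)]
detected_pixels = [(320.0, 240.0), (344.5, 240.0)]  # corners found at time Y
error = mean_reprojection_error(world_points, detected_pixels, R, t, 800.0, 800.0, 320.0, 240.0)
```

A mean error that is clearly larger than the reprojection error reported when the camera was calibrated at time X suggests the saved parameters no longer describe the camera well. Which threshold to use is a judgment call that depends on the application.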

Matlab: From Disparity Map to 3D coordinates

I copied the MATLAB code from: http://www.mathworks.fr/fr/help/vision/ug/stereo-image-rectification.html
I can compute the 3D coordinates, but I am not sure whether they are correct.
Starting from the disparity map and calculating the 3D coordinates, how do we take the warping transforms tform1 and tform2 into account?
The problem here is that you are using uncalibrated cameras. In this case you can get up-to-scale reconstruction, but if you want the 3D points in world units, you would need to know actual distances to some points in the world.
I think you would be better off calibrating your stereo system. Please see this example.

Understanding of openCV undistortion

I'm receiving depth images from a ToF camera via MATLAB. The drivers delivered with the ToF camera, which compute x,y,z coordinates from the depth image, use OpenCV functions that are called from MATLAB via mex-files.
But later on I won't be able to use those drivers or OpenCV functions, so I need to implement the 2D-to-3D mapping on my own, including the compensation of radial distortion. I already got hold of the camera parameters, and the computation of the x,y,z coordinates of each pixel of the depth image is working. Until now I have been solving the implicit equations of the undistortion via Newton's method (which isn't really fast...). But I want to implement the undistortion the way the OpenCV function does it.
... and there is my problem: I don't really understand it, and I hope you can help me out. How does it actually work? I tried to search through the forum but haven't found any useful threads concerning this case.
greetings!
The equations of the projection of a 3D point [X; Y; Z] to a 2D image point [u; v] are provided on the documentation page related to camera calibration :
[Image: projection and lens-distortion equations; source: opencv.org]
In the case of lens distortion, the equations are non-linear and depend on 3 to 8 parameters (k1 to k6, p1 and p2). Hence, it would normally require a non-linear solving algorithm (e.g. Newton's method, the Levenberg-Marquardt algorithm, etc.) to invert such a model and estimate the undistorted coordinates from the distorted ones. This is what is used behind the function undistortPoints, with tuned parameters making the optimization fast but a little inaccurate.
However, in the particular case of image lens correction (as opposed to point correction), there is a much more efficient approach based on a well-known image re-sampling trick. The trick is that, in order to obtain a valid intensity for each pixel of your destination image, you have to transform coordinates in the destination image into coordinates in the source image, and not the opposite as one would intuitively expect. In the case of lens distortion correction, this means that you do not actually have to invert the non-linear model, but just apply it.
Basically, the algorithm behind the function undistort is the following. For each pixel of the destination lens-corrected image do:
1. Convert the pixel coordinates (u_dst, v_dst) to normalized coordinates (x', y') using the inverse of the calibration matrix K,
2. Apply the lens-distortion model, as displayed above, to obtain the distorted normalized coordinates (x'', y''),
3. Convert (x'', y'') to distorted pixel coordinates (u_src, v_src) using the calibration matrix K,
4. Use the interpolation method of your choice to find the intensity/depth associated with the pixel coordinates (u_src, v_src) in the source image, and assign this intensity/depth to the current destination pixel.
Note that if you are interested in undistorting the depthmap image, you should use a nearest-neighbor interpolation, otherwise you will almost certainly interpolate depth values at object boundaries, resulting in unwanted artifacts.
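The per-pixel procedure above can be sketched in plain Python as follows. This is not OpenCV's implementation, only a slow illustration of the inverse-mapping idea. It assumes a calibration matrix without skew (fx, fy, cx, cy), the radial/tangential distortion model from the OpenCV documentation restricted to the five coefficients (k1, k2, p1, p2, k3), and nearest-neighbour sampling as recommended for depth maps. The function names are made up.

```python
def distort_normalized(x, y, k1, k2, p1, p2, k3=0.0):
    """Apply the radial/tangential distortion model to normalized coordinates."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d


def undistort_nearest(src, fx, fy, cx, cy, dist, fill=0):
    """Undistort an image (list of rows) by inverse mapping with nearest-neighbour sampling."""
    height, width = len(src), len(src[0])
    dst = [[fill] * width for _ in range(height)]
    for v_dst in range(height):
        for u_dst in range(width):
            # 1. destination pixel -> normalized coordinates (inverse of K)
            x = (u_dst - cx) / fx
            y = (v_dst - cy) / fy
            # 2. apply the forward distortion model (no inversion needed)
            x_d, y_d = distort_normalized(x, y, *dist)
            # 3. distorted normalized coordinates -> source pixel (K)
            u_src = int(round(fx * x_d + cx))
            v_src = int(round(fy * y_d + cy))
            # 4. nearest-neighbour lookup; pixels mapping outside the source keep `fill`
            if 0 <= u_src < width and 0 <= v_src < height:
                dst[v_dst][u_dst] = src[v_src][u_src]
    return dst


# Tiny hypothetical 3x3 "depth image" with the principal point at its centre.
depth = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
unchanged = undistort_nearest(depth, 2.0, 2.0, 1.0, 1.0, (0.0, 0.0, 0.0, 0.0, 0.0))
corrected = undistort_nearest(depth, 2.0, 2.0, 1.0, 1.0, (0.5, 0.0, 0.0, 0.0, 0.0))
```

Since the mapping depends only on the camera parameters and not on the image content, the (u_src, v_src) lookup table can be computed once and reused for every frame, which is where most of the speed-up over per-point Newton iterations comes from.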
The above answer is correct, but do note that UV coordinates are in screen space and centered around (0,0) instead of "real" UV coordinates.
Source: own re-implementation using Python/OpenGL. Code:
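import numpy as np

def correct_pt(uv, K, Kinv, ds):
    # uv: (N, 2) points; ds: coefficients in OpenCV order (k1, k2, p1, p2, k3)
    uv_3 = np.column_stack((uv, np.ones(uv.shape[0])))  # homogeneous coordinates
    xy_ = uv_3 @ Kinv.T  # normalized coordinates
    r = np.linalg.norm(xy_[:, :2], axis=-1)  # radius from x and y only
    coeff = 1 + ds[0] * r**2 + ds[1] * r**4 + ds[4] * r**6  # radial terms only
    xy__ = xy_.copy()
    xy__[:, :2] *= coeff[:, np.newaxis]  # leave the homogeneous 1 unscaled
    return (xy__ @ K.T)[:, :2]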

Stereo matching

I am using the Camera Calibration Toolbox for MATLAB. After calibration I have the intrinsic and extrinsic parameters of my stereo camera system. Next, I would like to determine the distance between the camera system and an object. To get this information, I used the function stereo_triangulation, which is included in the Toolbox. The inputs are two matrices containing the pixel coordinates of correspondences in the left and right images.
I tried to get the coordinates of the correspondences using the Basic Block Matching method, which is described in MATLAB's help for Stereo Vision.
The resolution of my pictures is 1280x960 pixels. I know that the biggest disparity is around 520 pixels, so I set the maximum of the disparity range to 520. But then determining the coordinates takes ages, which makes it unusable in practice. Calculating the disparity map is much faster using MATLAB's function disparity(). But I want the step before that: the coordinates of the correspondences.
Can you please suggest how I can efficiently get the coordinates with MATLAB?
Disparity and 3D are related by simple formulas (see below) so the time for calculating 3D data and disparity map should be the same. The notation is
f - focal length in pixels,
B - separation between cameras,
u, v - column and row coordinates in the system centered on the middle of the image,
d - disparity,
x, y, z - 3D coordinates.
z=f*B/d;
x=z*u/f;
y=z*v/f;
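As a minimal sketch, the three formulas translate to Python as below. It assumes rectified images, that u is the horizontal (column) offset and v the vertical (row) offset from the image centre, and that the principal point lies at (width/2, height/2), which is only an approximation for a real camera. The function name and the numbers are hypothetical.

```python
def disparity_to_3d(col, row, disparity, f, baseline, width, height):
    """Convert one pixel of a rectified disparity map to (x, y, z).

    f is the focal length in pixels; the result is in the units of `baseline`.
    """
    if disparity <= 0:
        return None  # no valid match, or a point at infinity
    u = col - width / 2.0   # horizontal offset from the image centre
    v = row - height / 2.0  # vertical offset from the image centre
    z = f * baseline / disparity
    x = z * u / f
    y = z * v / f
    return (x, y, z)


# Hypothetical numbers: f = 1000 px, baseline = 0.1 m, disparity = 50 px.
point = disparity_to_3d(740, 480, 50.0, 1000.0, 0.1, 1280, 960)
```

Because each pixel needs only a few multiplications and one division, this step is cheap compared with the correspondence search that produces the disparity in the first place.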
1280x960 is too large a resolution for any correlation stereo to work in real time. Think about it: you have to loop over a 2D image, over a 2D correlation window, and over the range of disparities. This means 5 nested loops! I don't work with MATLAB anymore, but I know that it is quite slow.