Find 3D coordinate with respect to the camera using 2D image coordinates - matlab

I need to calculate the X,Y coordinates in the world with respect to the camera using u,v coordinates in the 2D image. I am using an S7 edge camera to send a 720x480 video feed to MATLAB.
What I know: Z i.e the depth of the object from the camera, size of the camera pixels (1.4um), focal length (4.2mm)
Let's say the image point is at (u,v) = (400,400).
My approach is as follows:
Subtract the pixel value of center point (240,360) from the u,v pixel coordinates of the point in the image. This should give us the pixel coordinates with respect to the camera's optical axis (z axis). The origin is now at the center of the image. So new coordinates are: (160, -40)
Multiply the new u,v pixel values with pixel size to obtain the distance of the point from the origin in physical units. Let's call it (x,y). We get (x,y) = (0.224,-0.056) in mm units.
Use the formula X = xZ/f & Y = yZ/f to calculate X,Y coordinates in the real world with respect to the camera's optical axis.
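For concreteness, here is a MATLAB sketch of those three steps using the numbers from the question. Note the assumptions: the frame is treated as 480 pixels wide and 720 tall, so the principal point is (240, 360); the v offset is negated so the y-axis points up (which is what reproduces the (160, -40) above); and Z is an arbitrary example value.
u = 400; v = 400;          % pixel coordinates of the image point
u0 = 240; v0 = 360;        % assumed principal point (image center)
pixel_size = 1.4e-3;       % pixel pitch in mm (1.4 um)
f = 4.2;                   % focal length in mm
Z = 1000;                  % known depth in mm (example value)

du = u - u0;               % step 1: shift the origin to the optical axis
dv = -(v - v0);            % image v grows downwards, camera y points up

x = du * pixel_size;       % step 2: convert to physical units on the sensor
y = dv * pixel_size;       % gives (0.224, -0.056) mm for this example

X = x * Z / f;             % step 3: back-project by similar triangles
Y = y * Z / f;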
Is my approach correct?

Your approach is going in the right direction, but it would be easier if you used a more standardized approach. What we usually do is use the pinhole camera model, which gives you a transformation between the world coordinates [X, Y, Z] and the pixel coordinates [x, y]. Take a look at this guide, which describes step by step the process of building your transformation.
Basically you have to define your Internal Camera Matrix to do the transformation:
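The matrix itself is not reproduced in the post; presumably it is the standard intrinsic matrix

K = [fx   0  u0;
      0  fy  v0;
      0   0   1]

where the parameters are described below.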
fx and fy are your focal length expressed as a pixel distance. You can calculate them from your FOV and the total number of pixels in each direction. Take a look here and here for more info.
u0 and v0 are the principal point. Since our pixels are not centered at [0, 0], these parameters represent a translation to the center of the image (the intersection of the optical axis with the image plane, given in pixel coordinates).
If you need it, you can also add a skew factor a, which you can use to correct shear effects of your camera. The Internal Camera Matrix then becomes:
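Again the original figure is missing; the standard form with the skew term is

K = [fx   a  u0;
      0  fy  v0;
      0   0   1]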
Since your depth is fixed, just fix your Z and continue the transformation without a problem.
Remember: if you want the inverse transformation (camera to world), just invert your Camera Matrix and be happy!
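A minimal MATLAB sketch of that inverse transformation, assuming the textbook column-vector form of K above, a known fixed depth Z, and placeholder (uncalibrated) intrinsic values:

fx = 3000; fy = 3000;      % focal length in pixels (placeholder values)
u0 = 240;  v0 = 360;       % principal point in pixels (placeholder values)
K  = [fx 0 u0; 0 fy v0; 0 0 1];

Z  = 1000;                 % known depth, in the units you want X and Y in
uv = [400; 400; 1];        % homogeneous pixel coordinates

ray = K \ uv;              % normalized image coordinates (multiply by inv(K))
XYZ = Z * ray / ray(3);    % scale the ray so that its depth equals Z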
MATLAB also has a very good guide for this transformation. Take a look.

Related

How to get the coordinates of the pixel that contains the right hand wrist joint in a depth image, using Kinect?

I captured a depth image of a human body in the room and I collected and saved Skeletal data related to that depth image (the joints of wrists, elbows, ...).
Considering that the joints' coordinates are in camera space and the depth image is in depth space, I was able to show the location of the right hand wrist joint on the depth image using this code:
depthJointIndices = metadata.DepthJointIndices(:, :, trackedBodies);
plot(depthJointIndices(11,1), depthJointIndices(11,2), '*');
Now I want to know EXACTLY which pixel contains the right hand wrist joint. How can I do this properly?
I thought I could get the x, y coordinates of that joint using the same code I used to display it, as follows:
depthJointIndices = metadata.DepthJointIndices(:, :, trackedBodies);
x=depthJointIndices(11, 1)
y=depthJointIndices(11, 2)
But x,y are calculated as follows:
x = 303.5220
y = 185.6131
As you can see, x and y are floating-point numbers, but pixel coordinates can't be floating-point numbers.
So can anyone help me with this problem? How can I get the coordinates of the pixel that contains the right hand wrist joint in the depth image, using Kinect?
You can use the following 2 equations to derive the coordinates.
Here, (U,V) and Z denote screen coordinates and depth value, respectively, Cx and Cy denote the center of the depth map, and fx and fy are the focal lengths of the camera. For Kinect v1 cameras, fx = fy = 580.
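The two equations themselves are not reproduced in the post; presumably they are the standard pinhole back-projection,

X = (U - Cx) * Z / fx
Y = (V - Cy) * Z / fy

which, inverted, recover the pixel from a camera-space point: U = fx * X / Z + Cx and V = fy * Y / Z + Cy.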
You can refer to the paper "Action Recognition From Depth Maps Using Deep Convolutional Neural Networks" by P. Wang et al. for more information.

Units in the camera frame of MATLAB's calibration toolbox

What are the units of the coordinates in the camera frame when using MATLAB's camera calibration toolbox? Say I transform some points from pixels to the camera's frame, what are the units? Is there some way of knowing?
I found this similar question, but no definite answer.
Can you please clarify how exactly you are transforming those points?
The pin-hole camera model looks like this:
w*[x,y,1] = [X,Y,Z,1]*[R;t]*K
[X,Y,Z] are the world coordinates in world units (e. g. millimeters), and [x,y] are the image coordinates in pixels. K is the matrix of camera intrinsics, and R and t are the camera extrinsics. w is an arbitrary scale factor.
If you take a world point [X,Y,Z,1] and multiply it by [R;t], then you get a point in a "camera's coordinate system", where the origin is at the focal point, and the units are the same as in your world coordinates (e. g. millimeters).
If you take a point in the image [x,y,1] and multiply it by the inverse of K, then you get a point in "normalized image coordinates", where the origin is at the optical center, and the axes have no units. This happens because you are dividing x and y in pixels by the focal lengths fx and fy, which are also in pixels. So the pixels cancel out.
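A short MATLAB illustration of both cases; the intrinsics and extrinsics here are placeholder values standing in for a real calibration (note that MATLAB stores K in the transposed, row-vector layout used in the model above):

K = [800 0 0; 0 800 0; 320 240 1];   % intrinsics in pixels, MATLAB layout
R = eye(3); t = [0 0 500];           % extrinsics; t is in world units (e.g. mm)

P_world  = [10 20 0];                % a world point, in millimeters
P_camera = P_world * R + t;          % camera coordinate system: still millimeters

p_image  = [400 300 1];              % an image point, in pixels
p_norm   = p_image / K;              % normalized image coordinates: unitless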

Creating stereoParameters class in Matlab: what coordinate system should be used for relative camera rotation parameter?

stereoParameters takes two extrinsic parameters: RotationOfCamera2 and TranslationOfCamera2.
The problem is that the documentation is not very detailed about what RotationOfCamera2 really means; it only says: "Rotation of camera 2 relative to camera 1, specified as a 3-by-3 matrix."
What is the coordinate system in this case ?
A rotation matrix can be specified in any coordinate system.
What does it exactly mean "the coordinate system of Camera 1" ? What are its x,y,z axes ?
In other words, if I calculate the Essential Matrix, how can I get the corresponding RotationOfCamera2 and TranslationOfCamera2 from the Essential Matrix ?
RotationOfCamera2 and TranslationOfCamera2 describe the transformation from camera1's coordinates into camera2's coordinates. A camera's coordinate system has its origin at the camera's optical center. Its X and Y-axes are in the image plane, and its Z-axis points out along the optical axis.
Equivalently, the extrinsics of camera 1 are identity rotation and zero translation, while the extrinsics of camera 2 are RotationOfCamera2 and TranslationOfCamera2.
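In MATLAB's row-vector (post-multiply) convention, that mapping can be written as follows; this is a sketch, with stereoParams standing in for your calibrated stereoParameters object:

R = stereoParams.RotationOfCamera2;      % 3-by-3 rotation
t = stereoParams.TranslationOfCamera2;   % 1-by-3 translation, in world units
P_cam1 = [100 50 800];                   % example point in camera 1's coordinates
P_cam2 = P_cam1 * R + t;                 % the same point in camera 2's coordinates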
If you have the Essential matrix, you can decompose it into a rotation and a translation. Two things to keep in mind: first, the translation is only defined up to scale, so t will be a unit vector. Second, the rotation matrix will be the transpose of what you get from estimateCameraParameters, because of the difference in vector-matrix multiplication conventions.
Out of curiosity, what is it that you are trying to accomplish? Are you working with a single moving camera? Otherwise, why not use the Stereo Camera Calibrator app to calibrate your cameras, and get rotation and translation for free?
Suppose that for the left camera the rotation and translation relative to the first checkerboard (or any other world reference) are R1 and T1, and that for the right camera they are R2 and T2. Then you can calculate the parameters as follows:
RotationOfCamera2 = R2*R1';
TranslationOfCamera2 = T2 - RotationOfCamera2*T1;
But please note that this calculation uses just one shared checkerboard reference. Inside MATLAB, these two parameters are computed from all the given pairs of checkerboard images, and the median values are taken as an initial guess. The parameters are later refined by nonlinear optimization, so they may end up slightly different from the median values. But if you have just one reference transformation for both cameras, you should use the formula above. Note that, as Dima said, MATLAB's rotation matrix is the transpose of the usual one; I wrote the formula the way the literature does, not in MATLAB's style.
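A sketch of that per-view computation and of taking the median as an initial guess. Here R1, T1, R2, T2 are assumed to hold the checkerboard extrinsics of each camera for every view, written in the literature's column-vector convention as in the formula above (transpose MATLAB's RotationMatrices if you take them from a cameraParameters object); a proper average of the rotations would need a rotation representation such as quaternions and is omitted:

numViews = size(R1, 3);                            % R1, R2 are 3-by-3-by-numViews
Trel = zeros(numViews, 3);                         % T1, T2 are numViews-by-3
for i = 1:numViews
    Rrel = R2(:,:,i) * R1(:,:,i)';                 % relative rotation for view i
    Trel(i,:) = (T2(i,:)' - Rrel * T1(i,:)')';     % relative translation for view i
end
t_init = median(Trel, 1);                          % initial guess, refined later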

Create depth map from 3d points

I am given the 3D points of a scene, or a subset of these points comprising one object of the scene. I would like to create a depth image from these points, that is, an image whose pixel values encode the distance of the corresponding 3D points to the camera.
I have found the following similar question
http://www.mathworks.in/matlabcentral/newsreader/view_thread/319097
however the answers there do not help me, since I want to use MATLAB. Getting the image values is not difficult (e.g. simply compute the distance of each 3D point to the camera's origin); however, I do not know how to figure out the corresponding locations in the 2D image.
I can only imagine projecting all 3D points onto a plane and binning their positions into discrete rectangles on that plane, then averaging the depth value for each bin.
I suspect, however, that the result of such a procedure would be a very pixelated image, not very smooth.
How would you go about this problem?
Assuming you've corrected for camera tilt (a simple matrix multiplication if you know the angle), you can probably just follow this example
X = data(:,1);   % assuming data is an N-by-3 matrix of [X Y Z] points
Y = data(:,2);
Z = data(:,3);
%// This bit requires you to make some choices like the start X and Z, end X and Z and resolution (X and Z) of your desired depth map
[Xi, Zi] = meshgrid(X_start:X_res:X_end, Z_start:Z_res:Z_end);
depth_map = griddata(X, Z, Y, Xi, Zi);
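For example, the grid bounds and resolution can be taken from the data itself (a sketch; dividing the extent by 500 is an arbitrary choice):

X_start = min(X); X_end = max(X); X_res = (X_end - X_start) / 500;
Z_start = min(Z); Z_end = max(Z); Z_res = (Z_end - Z_start) / 500;
% run the meshgrid/griddata lines above, then visualize the result:
imagesc(depth_map); axis image; colorbar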

Kinect - Calculating Surface Area

I'd like to be able to calculate the surface area of objects seen by the depth camera. Is there an easy way to do this? For example, if the Kinect is seeing a player, I need to calculate how much surface area the player covers.
If no such function exists, I can calculate it by creating multiple squares with corners at (x,y), (x+1,y), (x,y+1), (x+1,y+1) and taking the z value into consideration. But I'm not sure how to get the distance in mm or cm between adjacent pixels along the x or y axis.
Thanks
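A sketch of that idea: at depth Z, one pixel spans roughly Z/fx horizontally and Z/fy vertically (with fx, fy the focal lengths in pixels, about 580 for Kinect v1 as noted in an earlier answer), so each pixel covers approximately Z^2/(fx*fy) of frontal area. Summing that over the player's pixels gives a rough estimate; note that it ignores surface slant. depthImage and playerMask are assumed inputs here.

fx = 580; fy = 580;                            % Kinect v1 focal lengths in pixels
Z = double(depthImage(playerMask));            % depths (in meters) of the player's pixels
pixelArea = (Z ./ fx) .* (Z ./ fy);            % area covered by each pixel, in m^2
totalArea = sum(pixelArea);                    % rough frontal surface area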