I know a function
(Baseline*focal)
depth = ------------------
(disparity*SensorSize)
when focal/SensorSize = fx in intrinsics[0][0], right?
But this calculated depth is always less than the real depth.
the real depth like:
(Baseline*focal)
depth = ------------------
((disparity-bias)*SensorSize)
And I have 4 same cameras, when use camera0 and camera1 the bias is 4, but when use camera 2 and camera3 the bias is 5.
I don't know why!!!
Related
I am facing the same problem as mentioned in this post, however, I am not facing it with OpenGL, but simply with MATLAB. Depth as distance to camera plane in GLSL
I have a depth image rendered from the Z-Buffer from 3ds max. I was not able to get an orthographic representation of the z-buffer. For a better understanding, I will use the same sketch as made by the previous post:
* |--*
/ |
/ |
C-----* C-----*
\ |
\ |
* |--*
The 3 asterisks are pixels and the C is the camera. The lines from the
asterisks are the "depth". In the first case, I get the distance from the pixel to the camera. In the second, I wish to get the distance from each pixel to the plane.
The settins of my camera are the following:
WIDTH = 512;
HEIGHT = 424;
FOV = 89.971;
aspect_ratio = WIDTH/HEIGHT;
%clipping planes
near = 500;
far = 5000;
I calulate the frustum settings like the following:
%calculate frustums settings
top = tan((FOV/2)*5000)
bottom = -top
right = top*aspect_ratio
left = -top*aspect_ratio
And set the projection matrix like this:
%Generate matrix
O_p = [2/(right-left) 0 0 -((right+left)/(right-left)); ...
0 2/(top-bottom) 0 -((top+bottom)/(top-bottom));...
0 0 -2/(far-near) -(far+near)/(far-near);...
0 0 0 1];
After this I read in the depth image, which was saved as a 48 bit RGB- image, where each channel is the same, thus only one channel has to be used.
%Read in image
img = imread('KinectImage.png');
%Throw away, except one channel (all hold the same information)
c1 = img(:,:,1);
The pixel values have to be inverted, since the closer the values are to the camera, the brigher they were. If a pixel is 0 (no object to render available) it is set to 2^16, so , that after the bit complementation, the value is still 0.
%Inverse bits that are not zero, so that the z-image has the correct values
c1(c1 == 0) = 2^16
c1_cmp = bitcmp(c1);
To apply the matrix, to each z-Buffer value, I lay out the vector one dimensional and build up a vector like this [0 0 z 1] , over every element.
c1_cmp1d = squeeze(reshape(c1_cmp,[512*424,1]));
converted = double([zeros(WIDTH*HEIGHT,1) zeros(WIDTH*HEIGHT,1) c1_cmp1d zeros(WIDTH*HEIGHT,1)]) * double(O_p);
After that, I pick out the 4th element of the result vector and reshape it to a image
img_con = converted(:,4);
img_con = reshape(img_con,[424,512]);
However, the effect, that the Z-Buffer is not orthographic is still there, so did I get sth wrong? Is my calculation flawed ? Or did I make mistake here?
Depth Image coming from 3ds max
After the computation (the white is still "0" , but the color axis has changed)
It would be great to achieve this with 3ds max, which would resolve this issue, however I was not able to find this setting for the z-buffer. Thus, I want to solve this using Matlab.
I'm working with the Kinect v2 and I have to map the depth information onto the RGB images to process them: in particular, I need to know which pixels in the RGB images are in a certain range of distance (depth) along the Z axis; I'm acquiring all the data with a C# program and saving them as images (RGB) and txt files (depth).
I've followed the instruction from here and here (and I thank them for sharing), but I still have some problems I don't know how to solve.
I have calculated the rotation (R) and translation (T) matrix between the depth sensor and the RGB camera, as well as their intrinsic parameters.
I have created P3D_d (depth pixels in world coordinates related to depth sensor) and P3D_rgb (depth pixels in world coordinates related to rgb camera).
row_num = 424;
col_num = 512;
P3D_d = zeros(row_num,col_num,3);
P3D_rgb = zeros(row_num,col_num,3);
cont = 1;
for row=1:row_num
for col=1:col_num
P3D_d(row,col,1) = (row - cx_d) * depth(row,col) / fx_d;
P3D_d(row,col,2) = (col - cy_d) * depth(row,col) / fy_d;
P3D_d(row,col,3) = depth(row,col);
temp = [P3D_d(row,col,1);P3D_d(row,col,2);P3D_d(row,col,3)];
P3D_rgb(row,col,:) = R*temp+T;
end
end
I have created P2D_rgb_x and P2D_rgb_y.
P2D_rgb_x(:,:,1) = (P3D_rgb(:,:,1)./P3D_rgb(:,:,3))*fx_rgb+cx_rgb;
P2D_rgb_y(:,:,2) = (P3D_rgb(:,:,2)./P3D_rgb(:,:,3))*fy_rgb+cy_rgb;
but now I don't understand how to continue.
Assuming that the calibration parameters are correct, I've tried to click on a defined point in both the depth (coordinates: row_d, col_d) and rgb (coordinates: row_rgb, col_rgb) images, but P2D_rgb_x(row_d, col_d) is totally different from row_rgb, as well as P2D_rgb_y(row_d, col_d) is totally different from col_rgb.
So, what do exactly mean P2D_rgb_x and P2D_rgb_y? How can I use them to map depth value onto rgb images or just to get the depth of a certain RGB pixel?
I'll apreciate any suggest or help!
PS: I've also a related post on MathWorks at this link
I am working on Stereo vision task and I would like to get the distance between stereo vision cameras and the object. I am using Matlab with Computer Vision System Toolbox.
I have calibrated cameras with using "Camera Calibration Toolbox for Matlab" thus I have Intrinsic parameters of left and right camera and Extrinsic parameters (position of right camera wrt left camera). I have also a pair of rectified pictures and thier disparity map. For estimation of disparity I have used Matlab function disparity(). I know the baseline and the focal length of cameras but my results are still wrong.
baseline = 70 mm
focal length = 25 mm
disparity = 60 pixels
---------------------
depth = baseline * focal length / disparity = 70 * 25 / 60 = 29 mm
But I know that the distance is cca 600 mm. Is this formula right? What about the units? mm * mm / pixel != mm. And especially I would like to use Camera matrix (Intrinsic parameters) for calculation but I didn't figure out how. I would be thankful for any hint.
Like you said, you have to convert the unit into mm. And for that you need this formulas
z = (b*F) / (d*s)
mm = (mm * mm) / (pixel * (mm/pixel))
Where
z = depth in mm
b = baseline in mm
F = focal length in mm
d = depth in pixel
s = sensor size in mm/pixel. (Normally it provide in um, so do conversion before).
EDIT
Sometime your focal is in pixel so you don't need to use the sensor size. So just use your formula :
z = b*F / d
mm = mm * pixel / pixel
I am developing an application for the iPhone using opencv. I have to use the method solvePnPRansac:
http://opencv.willowgarage.com/documentation/cpp/camera_calibration_and_3d_reconstruction.html
For this method I need to provide a camera matrix:
__ __
| fx 0 cx |
| 0 fy cy |
|_0 0 1 _|
where cx and cy represent the center pixel positions of the image and fx and fy represent focal lengths, but that is all the documentation says. I am unsure what to provide for these focal lengths. The iPhone 5 has a focal length of 4.1 mm, but I do not think that this value is usable as is.
I checked another website:
http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html
which shows how opencv creates camera matrices. Here it states that focal lengths are measured in pixel units.
I checked another website:
http://www.velocityreviews.com/forums/t500283-focal-length-in-pixels.html
(about half way down)
it says that focal length can be converted from units of millimeters to pixels using the equation: fx = fy = focalMM * pixelDensity / 25.4;
Another Link I found states that fx = focalMM * width / (sensorSizeMM);
fy = focalMM * length / (sensorSizeMM);
I am unsure about these equations and how to properly create this matrix.
Any help, advice, or links on how to create an accurate camera matrix (especially for the iPhone 5) would be greatly appreciated,
Isaac
p.s. I think that (fx/fy) or (fy/fx) might be equal to the aspect ratio of the camera, but that might be completely wrong.
UPDATE:
Pixel coordinates to 3D line (opencv)
using this link, I can figure out how they want fx and fy to be formatted because they use it to scale angles relative to their distance from the center. therefore, fx and fy are likely in pixels/(unit length) but im still not sure what this unit length needs to be, can it be arbitrary as long as x and y are scaled to each other?
You can get an initial (rough) estimate of the focal length in pixel dividing the focal length in mm by the width of a pixel of the camera' sensor (CCD, CMOS, whatever).
You get the former from the camera manual, or read it from the EXIF header of an image taken at full resolution. Finding out the latter is a little more complicated: you may look up on the interwebs the sensor's spec sheet, if you know its manufacturer and model number, or you may just divide the overall width of its sensitive area by the number of pixels on the side.
Absent other information, it's usually safe to assume that the pixels are square (i.e. fx == fy), and that the sensor is orthogonal to the lens's focal axis (i.e. that the term in the first row and second column of the camera matrix is zero). Also, the pixel coordinates of the principal point (cx, cy) are usually hard to estimate accurately without a carefully designed calibration rig, and an as-carefully executed calibration procedure (that's because they are intrinsically confused with the camera translation parallel to the image plane). So it's best to just set them equal to the geometrical geometrical center of the image, unless you know that the image has been cropped asymmetrically.
Therefore, your simplest camera model has only one unknown parameter, the focal length f = fx = fy.
Word of advice: in your application is usually more convenient to carry around the horizontal (or vertical) field-of-view angle, rather than the focal length in pixels. This is because the FOV is invariant to image scaling.
The "focal length" you are dealing with here is simply a scaling factor from objects in the world to camera pixels, used in the pinhole camera model (Wikipedia link). That's why its units are pixels/unit length. For a given f, an object of size L at a distance (perpendicular to the camera) z, would be f*L/z pixels.
So, you could estimate the focal length by placing an object of known size at a known distance of your camera and measuring its size in the image. You could aso assume the central point is the center of the image. You should definitely not ignore the lens distortion (dist_coef parameter in solvePnPRansac).
In practice, the best way to obtain the camera matrix and distortion coefficients is to use a camera calibration tool. You can download and use the MRPT camera_calib software from this link, there's also a video tutorial here. If you use matlab, go for the Camera Calibration Toolbox.
Here you have a table with the spec of the cameras for iPhone 4 and 5.
The calculation is:
double f = 4.1;
double resX = (double)(sourceImage.cols);
double resY = (double)(sourceImage.rows);
double sensorSizeX = 4.89;
double sensorSizeY = 3.67;
double fx = f * resX / sensorSizeX;
double fy = f * resY / sensorSizeY;
double cx = resX/2.;
double cy = resY/2.;
Try this:
func getCamMatrix()->(Float, Float, Float, Float)
{
let format:AVCaptureDeviceFormat? = deviceInput?.device.activeFormat
let fDesc:CMFormatDescriptionRef = format!.formatDescription
let dim:CGSize = CMVideoFormatDescriptionGetPresentationDimensions(fDesc, true, true)
// dim = dimensioni immagine finale
let cx:Float = Float(dim.width) / 2.0;
let cy:Float = Float(dim.height) / 2.0;
let HFOV : Float = format!.videoFieldOfView
let VFOV : Float = ((HFOV)/cx)*cy
let fx:Float = abs(Float(dim.width) / (2 * tan(HFOV / 180 * Float(M_PI) / 2)));
let fy:Float = abs(Float(dim.height) / (2 * tan(VFOV / 180 * Float(M_PI) / 2)));
return (fx, fy, cx, cy)
}
Old thread, present problem.
As Milo and Isaac mentioned after Milo's answer, there seems to be no "common" params available for, say, the iPhone 5.
For what it is worth, here is the result of a run with the MRPT calibration tool, with a good old iPhone 5:
[CAMERA_PARAMS]
resolution=[3264 2448]
cx=1668.87585
cy=1226.19712
fx=3288.47697
fy=3078.59787
dist=[-7.416752e-02 1.562157e+00 1.236471e-03 1.237955e-03 -5.378571e+00]
Average err. of reprojection: 1.06726 pixels (OpenCV error=1.06726)
Note that dist means distortion here.
I am conducting experiments on a toy project, with these parameters---kind of ok. If you do use them on your own project, please keep in mind that they may be hardly good enough to get started. The best will be to follow Milo's recommendation with your own data. The MRPT tool is quite easy to use, with the checkerboard they provide. Hope this does help getting started !
When I ran this Matlab code to get the depth image, the result I got is a matrix of 480x640. The min element value is 0 and the max element value is 2711. What does 2711 mean? Is that the distance from the camera to the farthest part of the image. But what is the unit of 2711. Is that meter of feet or ??
I don't know what the Matlab code exactly does to the depth, but it probably does some processing on it because the depth sent by the Kinect is on 11 bits, so it shouldn't be higher than 2048. Try to find out what it does, or to get access to the raw data sent by the Kinect.
The data sent by the Kinect is not a proper distance (it's a "disparity"), so you have to do some math to convert it to useful units.
From the OpenKinect project wiki (which contains useful information about the Kinect) :
From their data, a basic first order
approximation for converting the raw
11-bit disparity value to a depth
value in centimeters is: 100/(-0.00307
* rawDisparity + 3.33). This approximation is approximately 10 cm
off at 4 m away, and less than 2 cm
off within 2.5 m.
A better approximation is given by
Stéphane Magnenat in this post:
distance = 0.1236 * tan(rawDisparity /
2842.5 + 1.1863) in meters. Adding a final offset term of -0.037 centers
the original ROS data. The tan
approximation has a sum squared
difference of .33 cm while the 1/x
approximation is about 1.7 cm.
Once you have the distance using the
measurement above, a good
approximation for converting (i, j, z)
to (x,y,z) is:
x = (i - w / 2) * (z + minDistance) * scaleFactor * (w/h)
y = (j - h / 2) * (z + minDistance) * scaleFactor
z = z
Where
minDistance = -10
scaleFactor = .0021.
These values were found by hand.
You can find more details about the Kinect's depth camera and its calibration on the ROS website (and many others !).
If you map the data to a meter scale it compresses the depth image slightly. I found this was an issue when I was trying to look for planes in the mapped data.