create opencv camera matrix for iPhone 5 solvepnp - iphone

I am developing an application for the iPhone using opencv. I have to use the method solvePnPRansac:
http://opencv.willowgarage.com/documentation/cpp/camera_calibration_and_3d_reconstruction.html
For this method I need to provide a camera matrix:
__ __
| fx 0 cx |
| 0 fy cy |
|_0 0 1 _|
where cx and cy represent the center pixel positions of the image and fx and fy represent focal lengths, but that is all the documentation says. I am unsure what to provide for these focal lengths. The iPhone 5 has a focal length of 4.1 mm, but I do not think that this value is usable as is.
I checked another website:
http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html
which shows how opencv creates camera matrices. Here it states that focal lengths are measured in pixel units.
I checked another website:
http://www.velocityreviews.com/forums/t500283-focal-length-in-pixels.html
(about half way down)
it says that focal length can be converted from units of millimeters to pixels using the equation: fx = fy = focalMM * pixelDensity / 25.4;
Another Link I found states that fx = focalMM * width / (sensorSizeMM);
fy = focalMM * length / (sensorSizeMM);
I am unsure about these equations and how to properly create this matrix.
Any help, advice, or links on how to create an accurate camera matrix (especially for the iPhone 5) would be greatly appreciated,
Isaac
p.s. I think that (fx/fy) or (fy/fx) might be equal to the aspect ratio of the camera, but that might be completely wrong.
UPDATE:
Pixel coordinates to 3D line (opencv)
using this link, I can figure out how they want fx and fy to be formatted because they use it to scale angles relative to their distance from the center. therefore, fx and fy are likely in pixels/(unit length) but im still not sure what this unit length needs to be, can it be arbitrary as long as x and y are scaled to each other?

You can get an initial (rough) estimate of the focal length in pixel dividing the focal length in mm by the width of a pixel of the camera' sensor (CCD, CMOS, whatever).
You get the former from the camera manual, or read it from the EXIF header of an image taken at full resolution. Finding out the latter is a little more complicated: you may look up on the interwebs the sensor's spec sheet, if you know its manufacturer and model number, or you may just divide the overall width of its sensitive area by the number of pixels on the side.
Absent other information, it's usually safe to assume that the pixels are square (i.e. fx == fy), and that the sensor is orthogonal to the lens's focal axis (i.e. that the term in the first row and second column of the camera matrix is zero). Also, the pixel coordinates of the principal point (cx, cy) are usually hard to estimate accurately without a carefully designed calibration rig, and an as-carefully executed calibration procedure (that's because they are intrinsically confused with the camera translation parallel to the image plane). So it's best to just set them equal to the geometrical geometrical center of the image, unless you know that the image has been cropped asymmetrically.
Therefore, your simplest camera model has only one unknown parameter, the focal length f = fx = fy.
Word of advice: in your application is usually more convenient to carry around the horizontal (or vertical) field-of-view angle, rather than the focal length in pixels. This is because the FOV is invariant to image scaling.

The "focal length" you are dealing with here is simply a scaling factor from objects in the world to camera pixels, used in the pinhole camera model (Wikipedia link). That's why its units are pixels/unit length. For a given f, an object of size L at a distance (perpendicular to the camera) z, would be f*L/z pixels.
So, you could estimate the focal length by placing an object of known size at a known distance of your camera and measuring its size in the image. You could aso assume the central point is the center of the image. You should definitely not ignore the lens distortion (dist_coef parameter in solvePnPRansac).
In practice, the best way to obtain the camera matrix and distortion coefficients is to use a camera calibration tool. You can download and use the MRPT camera_calib software from this link, there's also a video tutorial here. If you use matlab, go for the Camera Calibration Toolbox.

Here you have a table with the spec of the cameras for iPhone 4 and 5.
The calculation is:
double f = 4.1;
double resX = (double)(sourceImage.cols);
double resY = (double)(sourceImage.rows);
double sensorSizeX = 4.89;
double sensorSizeY = 3.67;
double fx = f * resX / sensorSizeX;
double fy = f * resY / sensorSizeY;
double cx = resX/2.;
double cy = resY/2.;

Try this:
func getCamMatrix()->(Float, Float, Float, Float)
{
let format:AVCaptureDeviceFormat? = deviceInput?.device.activeFormat
let fDesc:CMFormatDescriptionRef = format!.formatDescription
let dim:CGSize = CMVideoFormatDescriptionGetPresentationDimensions(fDesc, true, true)
// dim = dimensioni immagine finale
let cx:Float = Float(dim.width) / 2.0;
let cy:Float = Float(dim.height) / 2.0;
let HFOV : Float = format!.videoFieldOfView
let VFOV : Float = ((HFOV)/cx)*cy
let fx:Float = abs(Float(dim.width) / (2 * tan(HFOV / 180 * Float(M_PI) / 2)));
let fy:Float = abs(Float(dim.height) / (2 * tan(VFOV / 180 * Float(M_PI) / 2)));
return (fx, fy, cx, cy)
}

Old thread, present problem.
As Milo and Isaac mentioned after Milo's answer, there seems to be no "common" params available for, say, the iPhone 5.
For what it is worth, here is the result of a run with the MRPT calibration tool, with a good old iPhone 5:
[CAMERA_PARAMS]
resolution=[3264 2448]
cx=1668.87585
cy=1226.19712
fx=3288.47697
fy=3078.59787
dist=[-7.416752e-02 1.562157e+00 1.236471e-03 1.237955e-03 -5.378571e+00]
Average err. of reprojection: 1.06726 pixels (OpenCV error=1.06726)
Note that dist means distortion here.
I am conducting experiments on a toy project, with these parameters---kind of ok. If you do use them on your own project, please keep in mind that they may be hardly good enough to get started. The best will be to follow Milo's recommendation with your own data. The MRPT tool is quite easy to use, with the checkerboard they provide. Hope this does help getting started !

Related

How to get rectangular inclination from axis-aligned bounding box?

My binary image has rectangular rotated objects of known size on it. I'd like to get the object inclination using axis-aligned bounding box that MATLAB's regionprops returns. What are my suggestions:
Let bounding box width be W, side of rectangle be C and inclination alpha
Then
Using Weierstrass substitution
After some simplification:
Solving the equation for tan(alpha/2) with
For any nonzero inclination discriminant is positive.
Logic seems to be OK, so as math. Could you please point where I make a mistake, or what is a better way to get inclination?
Here is corresponding MATLAB code:
img = false(25,25);
img(5:16,5:16) = true;
rot_img = imrotate(img, 30, 'crop');
props = regionprops(bwlabel(rot_img),'BoundingBox');
bbox = cat(1,props.BoundingBox);
w = bbox(3);
h = 12;
a = -1*(1+w/h); b = 2; c = 1 - w/h;
D = b^2 - 4*a*c;
alpha = 2*atand((-b + sqrt(D))/(2*a));
%alpha = 25.5288
EDIT Thank you for trigonometry hints. They significantly simplify the calculations, but they give wrong answer. As I now understand, the question is asked in wrong way. The thing I really need is finding inclination of short lines (10-50 pixels) with high accuracy (+/- 0.5 deg), the lines' position is out of interest.
The approach used in the question and answers show better accuracy for long lines, for c = 100 error is less than 0.1 degree. That means we're into rasterization error here, and need subpixel accuracy. At the moment I have only one algorithm that solves the problem - Radon transform, but I hope you can recommend something else.
p = bwperim(rot_img);
theta=0:0.1:179.9;
[R,xp] = radon(p,theta); %Radon transform of contours
a=imregionalmax(R,true(3,3)); %Regional maxima of the transform
[r,c]=find(a); idx=sub2ind(size(a),r,c); maxvals=R(idx);
[val,midx]=sort(maxvals,'descend'); %Choose 4 highest maxima
mean(rem(theta(c(midx(1:4))),90)) %And average corresponding angles
%29.85
If rectangle is square:
w/c=sin(a)+cos(a)
(w/c)^2=1+sin(2a)
sin(2a)=(w/c)^2-1
a=0.5*arcsin((w/c)^2-1)
May be use regionprops function with 'Orientation' option...

How to perform an orthographic projection on a z-Buffer image in Matlab?

I am facing the same problem as mentioned in this post, however, I am not facing it with OpenGL, but simply with MATLAB. Depth as distance to camera plane in GLSL
I have a depth image rendered from the Z-Buffer from 3ds max. I was not able to get an orthographic representation of the z-buffer. For a better understanding, I will use the same sketch as made by the previous post:
* |--*
/ |
/ |
C-----* C-----*
\ |
\ |
* |--*
The 3 asterisks are pixels and the C is the camera. The lines from the
asterisks are the "depth". In the first case, I get the distance from the pixel to the camera. In the second, I wish to get the distance from each pixel to the plane.
The settins of my camera are the following:
WIDTH = 512;
HEIGHT = 424;
FOV = 89.971;
aspect_ratio = WIDTH/HEIGHT;
%clipping planes
near = 500;
far = 5000;
I calulate the frustum settings like the following:
%calculate frustums settings
top = tan((FOV/2)*5000)
bottom = -top
right = top*aspect_ratio
left = -top*aspect_ratio
And set the projection matrix like this:
%Generate matrix
O_p = [2/(right-left) 0 0 -((right+left)/(right-left)); ...
0 2/(top-bottom) 0 -((top+bottom)/(top-bottom));...
0 0 -2/(far-near) -(far+near)/(far-near);...
0 0 0 1];
After this I read in the depth image, which was saved as a 48 bit RGB- image, where each channel is the same, thus only one channel has to be used.
%Read in image
img = imread('KinectImage.png');
%Throw away, except one channel (all hold the same information)
c1 = img(:,:,1);
The pixel values have to be inverted, since the closer the values are to the camera, the brigher they were. If a pixel is 0 (no object to render available) it is set to 2^16, so , that after the bit complementation, the value is still 0.
%Inverse bits that are not zero, so that the z-image has the correct values
c1(c1 == 0) = 2^16
c1_cmp = bitcmp(c1);
To apply the matrix, to each z-Buffer value, I lay out the vector one dimensional and build up a vector like this [0 0 z 1] , over every element.
c1_cmp1d = squeeze(reshape(c1_cmp,[512*424,1]));
converted = double([zeros(WIDTH*HEIGHT,1) zeros(WIDTH*HEIGHT,1) c1_cmp1d zeros(WIDTH*HEIGHT,1)]) * double(O_p);
After that, I pick out the 4th element of the result vector and reshape it to a image
img_con = converted(:,4);
img_con = reshape(img_con,[424,512]);
However, the effect, that the Z-Buffer is not orthographic is still there, so did I get sth wrong? Is my calculation flawed ? Or did I make mistake here?
Depth Image coming from 3ds max
After the computation (the white is still "0" , but the color axis has changed)
It would be great to achieve this with 3ds max, which would resolve this issue, however I was not able to find this setting for the z-buffer. Thus, I want to solve this using Matlab.

How to get depth value of specific RGB pixels in Kinect v2 images using Matlab

I'm working with the Kinect v2 and I have to map the depth information onto the RGB images to process them: in particular, I need to know which pixels in the RGB images are in a certain range of distance (depth) along the Z axis; I'm acquiring all the data with a C# program and saving them as images (RGB) and txt files (depth).
I've followed the instruction from here and here (and I thank them for sharing), but I still have some problems I don't know how to solve.
I have calculated the rotation (R) and translation (T) matrix between the depth sensor and the RGB camera, as well as their intrinsic parameters.
I have created P3D_d (depth pixels in world coordinates related to depth sensor) and P3D_rgb (depth pixels in world coordinates related to rgb camera).
row_num = 424;
col_num = 512;
P3D_d = zeros(row_num,col_num,3);
P3D_rgb = zeros(row_num,col_num,3);
cont = 1;
for row=1:row_num
for col=1:col_num
P3D_d(row,col,1) = (row - cx_d) * depth(row,col) / fx_d;
P3D_d(row,col,2) = (col - cy_d) * depth(row,col) / fy_d;
P3D_d(row,col,3) = depth(row,col);
temp = [P3D_d(row,col,1);P3D_d(row,col,2);P3D_d(row,col,3)];
P3D_rgb(row,col,:) = R*temp+T;
end
end
I have created P2D_rgb_x and P2D_rgb_y.
P2D_rgb_x(:,:,1) = (P3D_rgb(:,:,1)./P3D_rgb(:,:,3))*fx_rgb+cx_rgb;
P2D_rgb_y(:,:,2) = (P3D_rgb(:,:,2)./P3D_rgb(:,:,3))*fy_rgb+cy_rgb;
but now I don't understand how to continue.
Assuming that the calibration parameters are correct, I've tried to click on a defined point in both the depth (coordinates: row_d, col_d) and rgb (coordinates: row_rgb, col_rgb) images, but P2D_rgb_x(row_d, col_d) is totally different from row_rgb, as well as P2D_rgb_y(row_d, col_d) is totally different from col_rgb.
So, what do exactly mean P2D_rgb_x and P2D_rgb_y? How can I use them to map depth value onto rgb images or just to get the depth of a certain RGB pixel?
I'll apreciate any suggest or help!
PS: I've also a related post on MathWorks at this link

Cartesian Coordinate System in Perspective Projection

I'm still implementing a perspective projection for my augmented reality application. I've already asked some questions about the viewport-calculation and other camera stuff, which is explained from Aldream in this thread
However, I don't get any useful value at the moment and I think this depends on my calculation of the cartesian coordinate space.
I had some different ways to transform latitude,longitude and altitude to a cartesian coordinate space, but nothing of them seems to work properly. Currently I'm using ECEF(earth centered), but I also tried different calculations like a combination of the haversine-formula and trigonometry (to calculate x and y from the distance and the bearing between two points).
So my question is:
How does the cartesian coordinate space affect my perspective projection? Where do I have to "compensate" my units?(When I'm using meter or centimeter for example)?
Lets say I'm using ECEF, than I get values in meter, so for example, my camera is at (0,0,2m height) and my point is at (10,10,0). Now I can easily use the function mentioned on wikipedia and afterwards using the conversion of dx,dy,dz explained in my other thread (mentioned above). What I still don't get: How does this projection "know" what my units in the coordinate system are? I think this is the mistake I'm currently doing. I don't handle the units of my coordinate system and therefore, cannot get any good value from my projection.
When I'm using a coordinate system with centimeter as unit, all of my values from my perspective projection are increasing. Where do I have to "resolve" this unit-problem? Do I have to "transform" my camera-width and camera-height from pixel to meter? Do I have to convert the coordinate system to pixel? Which coordinate-system should be used to handle this situation? I hope you can understand my problem.
Edit:I solved it myself.
I've changed my coordinate system from ecef to a own system (using haversine and bearing and then calculating x,y,z) and now I get good values! :)
I'll try another way to explain it here then. :)
The short answer is: the unit of your cartesian positions doesn't matter as long as you keep it homogeneous, ie as long as you apply this unit both to your scene and to your camera.
For the longer answer, let's go back to the formula you used...
With:
d the relative Cartesian coordinates
s the size of your printable surface
r the size of your "sensor" / recording surface (ie r_x and r_y the size of the sensor and r_z its focal length)
b the position on your printable surface
.. and do the pseudo dimensional analysis. We have:
[PIXEL] = (([LENGTH] x [PIXEL]) / ([LENGTH] * [LENGTH])) * [LENGTH]
Whatever you use as unit for LENGTH, it will be homogenized, ie only the proportion is kept.
Ex:
[PIXEL] = (([MilliM] x [PIXEL]) / ([MilliMeter] * [MilliMeter])) * [MilliMeter]
= (([Meter/1000] x [PIXEL]) / ([Meter/1000] * [Meter/1000])) * [Meter/1000]
= 1000 * 1000 / 1000 /1000 * (([Meter] x [PIXEL]) / ([Meter] * [Meter])) * [Meter]
= (([Meter] x [PIXEL]) / ([Meter] * [Meter])) * [Meter]
Back to my explanations on your other thread:
If we use those notations to express b_x:
b_x = (d_x * s_x) / (d_z * r_x) * r_z
= (d_x * w) / (d_z * 2 * f * tan(α)) * f
= (d_x * w) / (d_z * 2 * tan(α)) // with w in px
Wheter you use (d_x, d_y, d_z) = (X,Y,Z) or (d_x, d_y, d_z) = (1000*X,1000*Y,1000*Z), the ratio d_x / d_z won't change.
Now for the reasons behind your problem, you should maybe check if you apply the correct unit to the position of your camera / to its distance to the scene too. Check also your α or the unit of the focal length, depending on which one you use.
If think the later suggestion is the most likely. It can be easy to forget to also apply the right unit to the characteristics of your camera.

What is the depth image received from Kinect

When I ran this Matlab code to get the depth image, the result I got is a matrix of 480x640. The min element value is 0 and the max element value is 2711. What does 2711 mean? Is that the distance from the camera to the farthest part of the image. But what is the unit of 2711. Is that meter of feet or ??
I don't know what the Matlab code exactly does to the depth, but it probably does some processing on it because the depth sent by the Kinect is on 11 bits, so it shouldn't be higher than 2048. Try to find out what it does, or to get access to the raw data sent by the Kinect.
The data sent by the Kinect is not a proper distance (it's a "disparity"), so you have to do some math to convert it to useful units.
From the OpenKinect project wiki (which contains useful information about the Kinect) :
From their data, a basic first order
approximation for converting the raw
11-bit disparity value to a depth
value in centimeters is: 100/(-0.00307
* rawDisparity + 3.33). This approximation is approximately 10 cm
off at 4 m away, and less than 2 cm
off within 2.5 m.
A better approximation is given by
Stéphane Magnenat in this post:
distance = 0.1236 * tan(rawDisparity /
2842.5 + 1.1863) in meters. Adding a final offset term of -0.037 centers
the original ROS data. The tan
approximation has a sum squared
difference of .33 cm while the 1/x
approximation is about 1.7 cm.
Once you have the distance using the
measurement above, a good
approximation for converting (i, j, z)
to (x,y,z) is:
x = (i - w / 2) * (z + minDistance) * scaleFactor * (w/h)
y = (j - h / 2) * (z + minDistance) * scaleFactor
z = z
Where
minDistance = -10
scaleFactor = .0021.
These values were found by hand.
You can find more details about the Kinect's depth camera and its calibration on the ROS website (and many others !).
If you map the data to a meter scale it compresses the depth image slightly. I found this was an issue when I was trying to look for planes in the mapped data.