What is the depth image received from Kinect - matlab

When I ran this MATLAB code to get the depth image, the result I got is a 480x640 matrix. The minimum element value is 0 and the maximum is 2711. What does 2711 mean? Is it the distance from the camera to the farthest part of the image? And what is the unit of 2711: meters, feet, or something else?

I don't know exactly what the MATLAB code does to the depth, but it probably does some processing on it, because the depth sent by the Kinect is an 11-bit value, so it should be below 2048. Try to find out what it does, or get access to the raw data sent by the Kinect.
The data sent by the Kinect is not a proper distance (it's a "disparity"), so you have to do some math to convert it to useful units.
From the OpenKinect project wiki (which contains useful information about the Kinect):

From their data, a basic first order approximation for converting the raw 11-bit disparity value to a depth value in centimeters is: 100 / (-0.00307 * rawDisparity + 3.33). This approximation is approximately 10 cm off at 4 m away, and less than 2 cm off within 2.5 m.

A better approximation is given by Stéphane Magnenat in this post: distance = 0.1236 * tan(rawDisparity / 2842.5 + 1.1863) in meters. Adding a final offset term of -0.037 centers the original ROS data. The tan approximation has a sum squared difference of 0.33 cm while the 1/x approximation is about 1.7 cm.

Once you have the distance using the measurement above, a good approximation for converting (i, j, z) to (x, y, z) is:

x = (i - w / 2) * (z + minDistance) * scaleFactor * (w / h)
y = (j - h / 2) * (z + minDistance) * scaleFactor
z = z

where

minDistance = -10
scaleFactor = 0.0021

These values were found by hand.
You can find more details about the Kinect's depth camera and its calibration on the ROS website (and many others!).
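For completeness, here is a minimal MATLAB sketch of those conversions. rawDisparity is assumed to be a single unprocessed 11-bit value (not whatever the toolbox returns after its own scaling), and the hand-tuned constants are taken verbatim from the quote above:

% Depth from raw disparity, using Magnenat's tan approximation (result in meters).
rawDisparity = 750;                                  % example raw 11-bit value (0..2047)
z = 0.1236 * tan(rawDisparity / 2842.5 + 1.1863);    % depth in meters

% Back-project pixel (i, j) to camera-space (x, y, z) with the hand-tuned constants.
% The quote does not state which unit z was calibrated in, so treat this as a
% starting point and verify it against known distances.
w = 640; h = 480;            % depth image size
i = 320; j = 240;            % pixel of interest
minDistance = -10;
scaleFactor = 0.0021;
x = (i - w/2) * (z + minDistance) * scaleFactor * (w/h);
y = (j - h/2) * (z + minDistance) * scaleFactor;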

If you map the data to a meter scale it compresses the depth image slightly. I found this was an issue when I was trying to look for planes in the mapped data.

Related

Find Position based on signal strength (intersection area between circles)

I'm trying to estimate a position based on signal strength received from 4 Wi-Fi access points. I measure the signal strength from 4 access points located in the corners of a square room of 100 square meters (10x10 m). I recorded the signal strengths at a known position (x, y) = (9.5, 1.5) using an Android phone. Now I want to check how accurate a multilateration method can be under these circumstances.
Using MATLAB, I applied a formula to calculate distance using the signal strength. The following MATLAB function shows the application of the formula:
function [ d_vect ] = distance( RSS )
% Calculate distance from signal strength
result = (27.55 - (20 * log10(2400)) + abs(RSS)) / 20;
d_vect = power(10, result);
end
The input RSS is a vector with the four signal strengths measured in the test point (x,y) = (9.5, 1.5). The RSS vector looks like this:
RSS =
-57.6000
-60.4000
-44.7000
-54.4000
and the resultant vector with all the estimated distances to each access points looks like this:
d_vect =
7.5386
10.4061
1.7072
5.2154
Now I want to estimate my position based on these distances and the access points' positions, in order to find the error between the estimated position and the known position (9.5, 1.5). To estimate a position, I want to find the intersection area between four circles, where each access point is the center of one circle and the corresponding distance is its radius.
I want to find the grey area, as shown in this image:
http://www.biologycorner.com/resources/venn4.gif
If you want an alternative way of estimating the location without estimating the intersection of circles you can use trilateration. It is a common technique in navigation (e.g. GPS) to estimate a position given a set of distance measurements.
Also, if you wanted the area because you also need an estimate of the uncertainty of the position, I would recommend solving the trilateration problem using least squares, which will easily give you an estimate of the parameters involved and an error propagation to yield an uncertainty of the location.
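As a rough sketch of that least-squares suggestion (the AP positions below are an assumption based on the 10x10 m room in the question, and lsqnonlin from the Optimization Toolbox stands in for whatever solver you prefer):

AP = [0 0; 0 10; 10 10; 10 0];            % assumed access point positions (m), one per corner
d  = [7.5386; 10.4061; 1.7072; 5.2154];   % ranges estimated from the RSS model (m)

res  = @(p) sqrt(sum((AP - p).^2, 2)) - d;   % range residuals for a candidate position p
p0   = [5 5];                                % initial guess: middle of the room
pHat = lsqnonlin(res, p0);                   % least-squares position estimate

err = norm(pHat - [9.5 1.5]);                % error w.r.t. the known test position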
I found an answer that solves the question perfectly. It is explained in detail in this link:
https://gis.stackexchange.com/questions/40660/trilateration-algorithm-for-n-amount-of-points
I also developed some MATLAB code for the problem. Here it goes:
Estimate distances from the Access Points:
function [ d_vect ] = distance( RSS )
result = (27.55 - (20 * log10(2400)) + abs(RSS)) / 20;
d_vect = power(10, result);
end
The trilateration function:
function [] = trilat( X, d, real1, real2 )
    cla
    % Draw one circle per access point (requires circles() from the File Exchange)
    circles(X(1), X(5), d(1), 'edgecolor', [0 0 0], 'facecolor', 'none', 'linewidth', 4); % AP1 - black
    circles(X(2), X(6), d(2), 'edgecolor', [0 1 0], 'facecolor', 'none', 'linewidth', 4); % AP2 - green
    circles(X(3), X(7), d(3), 'edgecolor', [0 1 1], 'facecolor', 'none', 'linewidth', 4); % AP3 - cyan
    circles(X(4), X(8), d(4), 'edgecolor', [1 1 0], 'facecolor', 'none', 'linewidth', 4); % AP4 - yellow
    axis([0 10 0 10])
    hold on
    tbl = table(X, d);
    d = d.^2;
    weights = d.^(-1);          % weight each measurement by the inverse of its squared range
    weights = transpose(weights);
    beta0 = [5, 5];             % initial guess: middle of the room
    modelfun = @(b,X)(abs(b(1)-X(:,1)).^2 + abs(b(2)-X(:,2)).^2).^(1/2);
    mdl = fitnlm(tbl, modelfun, beta0, 'Weights', weights);
    b = mdl.Coefficients{1:2,{'Estimate'}}
    scatter(b(1), b(2), 70, [0 0 1], 'filled')   % estimated position - blue
    scatter(real1, real2, 70, [1 0 0], 'filled') % real position - red
    hold off
end
Where,
X: matrix with APs coordinates
d: distance estimation vector
real1: real position x
real2: real position y
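A hedged usage example (the AP coordinates are my assumption based on the 10x10 m room described in the question; circles() is the File Exchange plotting function the code relies on):

X = [0 0; 0 10; 10 10; 10 0];           % 4x2 AP coordinates, so X(1..4) are x and X(5..8) are y
RSS = [-57.6; -60.4; -44.7; -54.4];     % measured signal strengths (dBm)
d = distance(RSS);                      % ranges estimated from RSS
trilat(X, d, 9.5, 1.5)                  % plots the circles plus estimated (blue) and real (red) position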
If you have three sets of measurements with (x, y) coordinates of location and corresponding signal strength, such as:
m1 = (x1,y1,s1)
m2 = (x2,y2,s2)
m3 = (x3,y3,s3)
Then you can calculate distances between each of the point locations:
d12 = Sqrt((x1 - x2)^2 + (y1 - y2)^2)
d13 = Sqrt((x1 - x3)^2 + (y1 - y3)^2)
d23 = Sqrt((x2 - x3)^2 + (y2 - y3)^2)
Now consider that each signal strength measurement signifies an emitter for that signal, located somewhere at a distance. That distance would be a radius around the location where the signal strength was measured, because at this point one does not know the direction the signal came from. Also, the weaker the signal, the larger the radius; in other words, the signal strength is inversely proportional to the radius. So, calculate the proportional, although not yet accurate, radii of our three points:
r1 = 1/s1
r2 = 1/s2
r3 = 1/s3
So now, for each point pair, set apart by their distance, we can calculate a constant (C) at which the radii from each location will just touch one another. For example, for the point pair 1 & 2:
Ca * r1 + Ca * r2 = d12
... solving for the constant Ca:
Ca = d12 / (r1 + r2)
... and we can do this for the other two pairs, as well.
Cb = d13 / (r1 + r3)
Cc = d23 / (r2 + r3)
All right... select the largest C constant, either Ca, Cb, or Cc. Then, use the parametric equation for a circle to find where the coordinates meet. I will explain.
The parametric equation for a circle is:
x = radius * Cos(theta)
y = radius * Sin(theta)
If Ca was the largest constant found, then you would compare points 1 & 2, such as:
Ca * r1 * Cos(theta1) == Ca * r2 * Cos(theta2) &&
Ca * r1 * Sin(theta1) == Ca * r2 * Sin(theta2)
... iterating theta1 and theta2 from 0 to 360 degrees, for both circles. You might write code like:
for theta1 in 0 ..< 360 {
    for theta2 in 0 ..< 360 {
        // cos/sin take radians, so convert the degree counters first
        let t1 = Double(theta1) * .pi / 180
        let t2 = Double(theta2) * .pi / 180
        // points on each circle, measured from that circle's own center
        let px1 = x1 + Ca*r1*cos(t1), py1 = y1 + Ca*r1*sin(t1)
        let px2 = x2 + Ca*r2*cos(t2), py2 = y2 + Ca*r2*sin(t2)
        if abs(px1 - px2) < 0.01 && abs(py1 - py2) < 0.01 {
            print("point is: (", px1, py1, ")")
        }
    }
}
Depending on what your tolerance was for a match, you wouldn't have to do too many iterations around the circumferences of each signal radius to determine an estimate for the location of the signal source.
So basically you need to intersect 4 circles. There are many possible approaches, but here are two that will give the exact intersection area.
The first approach is to start with one circle, intersect it with the second circle, then intersect the resulting area with the third circle, and so on; that is, at each step you know the current intersection area and you intersect it with a new circle. The intersection area will always be a region bounded by circular arcs, so to intersect it with a new circle you walk along the boundary of the area and check whether each bounding arc intersects the new circle. If it does, you keep only the part of the arc that lies inside the new circle, remember that you should continue with an arc from the new circle, and keep traversing the boundary until you find the next intersection.
Another approach, which seems to have worse time complexity (though with only 4 circles this will not matter), is to find all the intersection points of each pair of circles and keep only those points that lie inside all the other circles. These points will be the corners of your area, and then it is fairly easy to reconstruct the area. After googling a bit, I have even found a live demo of this approach.
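A minimal MATLAB sketch of that second approach (the centers and radii below are placeholders; it only collects the corner points, not the bounding arcs):

c = [0 0; 4 0; 2 3; 2 -1];     % circle centers (placeholder values)
r = [3; 3; 2.5; 2];            % circle radii (placeholder values)
pts = zeros(0, 2);
n = size(c, 1);
for p = 1:n-1
    for q = p+1:n
        dpq = norm(c(q,:) - c(p,:));
        if dpq > r(p) + r(q) || dpq < abs(r(p) - r(q)), continue, end  % circles don't intersect
        % Standard two-circle intersection construction.
        a = (r(p)^2 - r(q)^2 + dpq^2) / (2 * dpq);
        h = sqrt(max(r(p)^2 - a^2, 0));
        m = c(p,:) + a * (c(q,:) - c(p,:)) / dpq;             % foot point on the center line
        perp = [-(c(q,2) - c(p,2)), c(q,1) - c(p,1)] / dpq;   % unit vector perpendicular to it
        pts = [pts; m + h*perp; m - h*perp];
    end
end
% Keep only the intersection points that lie inside (or on) every circle:
% these are the corners of the common intersection area.
D = sqrt((pts(:,1) - c(:,1).').^2 + (pts(:,2) - c(:,2).').^2);
corners = pts(all(D <= r.' + 1e-9, 2), :);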

Stereo vision: Depth estimation

I am working on a stereo vision task and I would like to get the distance between the stereo cameras and the object. I am using MATLAB with the Computer Vision System Toolbox.
I have calibrated the cameras using the "Camera Calibration Toolbox for Matlab", so I have the intrinsic parameters of the left and right cameras and the extrinsic parameters (position of the right camera w.r.t. the left camera). I also have a pair of rectified pictures and their disparity map. For the disparity estimation I used the MATLAB function disparity(). I know the baseline and the focal length of the cameras, but my results are still wrong.
baseline = 70 mm
focal length = 25 mm
disparity = 60 pixels
---------------------
depth = baseline * focal length / disparity = 70 * 25 / 60 = 29 mm
But I know that the distance is about 600 mm. Is this formula right? What about the units? mm * mm / pixel != mm. I would especially like to use the camera matrix (intrinsic parameters) for the calculation, but I haven't figured out how. I would be thankful for any hint.
Like you said, you have to convert the units into mm, and for that you need these formulas:
z = (b*F) / (d*s)
mm = (mm * mm) / (pixel * (mm/pixel))
Where
z = depth in mm
b = baseline in mm
F = focal length in mm
d = disparity in pixels
s = pixel size in mm/pixel (it is normally given in µm, so convert it first).
EDIT
Sometimes your focal length is already in pixels, so you don't need the sensor size. Then just use your formula:
z = b*F / d
mm = mm * pixel / pixel
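For example, with the numbers from the question (the 6 µm pixel pitch is purely an assumed value to make the units concrete, not the actual spec of these cameras):

b = 70;        % baseline (mm)
F = 25;        % focal length (mm)
d = 60;        % disparity (pixels)
s = 0.006;     % pixel pitch (mm/pixel), i.e. 6 um -- assumed value

z = (b * F) / (d * s)    % depth in mm

% Equivalent route: convert the focal length to pixels first.
F_px = F / s;            % mm / (mm/pixel) = pixels
z2 = b * F_px / d        % same result, still in mm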

Cartesian Coordinate System in Perspective Projection

I'm still implementing a perspective projection for my augmented reality application. I've already asked some questions about the viewport calculation and other camera matters, which Aldream explained in this thread.
However, I don't get any useful values at the moment, and I think the problem lies in my calculation of the Cartesian coordinate space.
I have tried several different ways to transform latitude, longitude and altitude to a Cartesian coordinate space, but none of them seems to work properly. Currently I'm using ECEF (earth-centered), but I have also tried other calculations, like a combination of the haversine formula and trigonometry (to calculate x and y from the distance and the bearing between two points).
So my question is:
How does the Cartesian coordinate space affect my perspective projection? Where do I have to "compensate" for my units (when I'm using meters or centimeters, for example)?
Let's say I'm using ECEF. Then I get values in meters, so for example my camera is at (0, 0, 2 m height) and my point is at (10, 10, 0). Now I can easily use the function mentioned on Wikipedia and afterwards use the conversion of dx, dy, dz explained in my other thread (mentioned above). What I still don't get: how does this projection "know" what the units of my coordinate system are? I think this is the mistake I'm currently making: I don't handle the units of my coordinate system and therefore cannot get any good values from my projection.
When I'm using a coordinate system with centimeters as the unit, all of the values from my perspective projection increase. Where do I have to "resolve" this unit problem? Do I have to "transform" my camera width and camera height from pixels to meters? Do I have to convert the coordinate system to pixels? Which coordinate system should be used to handle this situation? I hope you can understand my problem.
Edit: I solved it myself.
I changed my coordinate system from ECEF to my own system (using haversine and bearing and then calculating x, y, z) and now I get good values! :)
I'll try another way to explain it here then. :)
The short answer is: the unit of your cartesian positions doesn't matter as long as you keep it homogeneous, ie as long as you apply this unit both to your scene and to your camera.
For the longer answer, let's go back to the formula you used...
With:
d the relative Cartesian coordinates
s the size of your printable surface
r the size of your "sensor" / recording surface (ie r_x and r_y the size of the sensor and r_z its focal length)
b the position on your printable surface
... and do the pseudo dimensional analysis. We have:
[PIXEL] = (([LENGTH] x [PIXEL]) / ([LENGTH] * [LENGTH])) * [LENGTH]
Whatever unit you use for LENGTH, it will be homogenized, i.e. only the proportion is kept.
Ex:
[PIXEL] = (([MilliMeter] x [PIXEL]) / ([MilliMeter] * [MilliMeter])) * [MilliMeter]
        = (([Meter/1000] x [PIXEL]) / ([Meter/1000] * [Meter/1000])) * [Meter/1000]
        = (1000 * 1000 / (1000 * 1000)) * (([Meter] x [PIXEL]) / ([Meter] * [Meter])) * [Meter]
        = (([Meter] x [PIXEL]) / ([Meter] * [Meter])) * [Meter]
Back to my explanations on your other thread:
If we use those notations to express b_x:
b_x = (d_x * s_x) / (d_z * r_x) * r_z
= (d_x * w) / (d_z * 2 * f * tan(α)) * f
= (d_x * w) / (d_z * 2 * tan(α)) // with w in px
Whether you use (d_x, d_y, d_z) = (X, Y, Z) or (d_x, d_y, d_z) = (1000*X, 1000*Y, 1000*Z), the ratio d_x / d_z won't change.
Now for the reasons behind your problem, you should maybe check if you apply the correct unit to the position of your camera / to its distance to the scene too. Check also your α or the unit of the focal length, depending on which one you use.
I think the latter suggestion is the most likely: it is easy to forget to also apply the right unit to the characteristics of your camera.
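A tiny numerical check of that homogeneity argument, assuming the b_x formula quoted above (the field of view and positions are arbitrary):

w = 640;                       % image width in pixels
alphaH = deg2rad(30);          % half horizontal field of view (arbitrary)
bx = @(d) (d(1) * w) / (d(3) * 2 * tan(alphaH));

d_m  = [2, 1, 10];             % relative camera-space position in meters
d_mm = 1000 * d_m;             % the same position expressed in millimeters

bx(d_m)                        % some pixel value...
bx(d_mm)                       % ...identical, because only d_x/d_z matters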

Converting 3D point clouds to range image

I have many 3D point clouds gathered by a Velodyne sensor, e.g. (x, y, z) in meters.
I'd like to convert these 3D point clouds to a range image.
Firstly, I have the transformation from Cartesian to spherical coordinates:
r = sqrt(x*x + y*y + z*z)
azimuth angle = atan2(x, z)
elevation angle = asin(y/r)
Now, how can I convert the 3D points to a range image using these transformations in MATLAB?
There are about 180,000 points in total, and I want an 870x64 range image.
The azimuth angle range is (-180, 180) and the elevation angle range is (-15, 15).
Divide up your azimuth and elevation into M and N ranges respectively. Now you have M*N "bins" (M = 870, N = 64).
Then (per bin) accumulate a histogram of points that project into that bin.
Finally, pick a representative value from each bin for the final range image. You could pick the average value (noisy, fast) or fit some distribution and then use that to pick the value (more precise, slow).
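A minimal MATLAB sketch of that binning, which keeps the closest return per bin instead of fitting a distribution (xyz stands for your N-by-3 point matrix; the stand-in data is only there so the snippet runs):

xyz = randn(180000, 3) .* [20 2 20];            % stand-in point cloud; replace with your Velodyne data

r  = sqrt(sum(xyz.^2, 2));                      % range
az = atan2d(xyz(:,1), xyz(:,3));                % azimuth in degrees, -180..180
el = asind(xyz(:,2) ./ r);                      % elevation in degrees, roughly -15..15

M = 870; N = 64;                                % desired bins: azimuth x elevation
ia = min(max(ceil((az + 180) / 360 * M), 1), M);   % azimuth bin index, 1..M
ie = min(max(ceil((el + 15)  / 30  * N), 1), N);   % elevation bin index, 1..N

rangeImage = inf(N, M);                         % rows = elevation, cols = azimuth
for k = 1:numel(r)
    rangeImage(ie(k), ia(k)) = min(rangeImage(ie(k), ia(k)), r(k));  % closest return wins
end
rangeImage(isinf(rangeImage)) = 0;              % mark empty bins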
The pointcloud2image code available from the MATLAB File Exchange can help you directly convert a point cloud (in x, y, z format) to a 2D raster image.

create opencv camera matrix for iPhone 5 solvepnp

I am developing an application for the iPhone using opencv. I have to use the method solvePnPRansac:
http://opencv.willowgarage.com/documentation/cpp/camera_calibration_and_3d_reconstruction.html
For this method I need to provide a camera matrix:
| fx  0  cx |
|  0  fy cy |
|  0  0   1 |
where cx and cy represent the center pixel positions of the image and fx and fy represent focal lengths, but that is all the documentation says. I am unsure what to provide for these focal lengths. The iPhone 5 has a focal length of 4.1 mm, but I do not think that this value is usable as is.
I checked another website:
http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html
which shows how opencv creates camera matrices. Here it states that focal lengths are measured in pixel units.
I checked another website:
http://www.velocityreviews.com/forums/t500283-focal-length-in-pixels.html
(about half way down)
it says that focal length can be converted from units of millimeters to pixels using the equation: fx = fy = focalMM * pixelDensity / 25.4;
Another Link I found states that fx = focalMM * width / (sensorSizeMM);
fy = focalMM * length / (sensorSizeMM);
I am unsure about these equations and how to properly create this matrix.
Any help, advice, or links on how to create an accurate camera matrix (especially for the iPhone 5) would be greatly appreciated,
Isaac
p.s. I think that (fx/fy) or (fy/fx) might be equal to the aspect ratio of the camera, but that might be completely wrong.
UPDATE:
Pixel coordinates to 3D line (opencv)
Using this link, I can figure out how they want fx and fy to be formatted, because they use them to scale angles relative to their distance from the center. Therefore, fx and fy are likely in pixels/(unit length), but I'm still not sure what this unit length needs to be. Can it be arbitrary as long as x and y are scaled to each other?
You can get an initial (rough) estimate of the focal length in pixels by dividing the focal length in mm by the width of a pixel of the camera's sensor (CCD, CMOS, whatever).
You get the former from the camera manual, or read it from the EXIF header of an image taken at full resolution. Finding out the latter is a little more complicated: you may look up on the interwebs the sensor's spec sheet, if you know its manufacturer and model number, or you may just divide the overall width of its sensitive area by the number of pixels on the side.
Absent other information, it's usually safe to assume that the pixels are square (i.e. fx == fy), and that the sensor is orthogonal to the lens's focal axis (i.e. that the term in the first row and second column of the camera matrix is zero). Also, the pixel coordinates of the principal point (cx, cy) are usually hard to estimate accurately without a carefully designed calibration rig, and an as-carefully executed calibration procedure (that's because they are intrinsically confused with the camera translation parallel to the image plane). So it's best to just set them equal to the geometrical center of the image, unless you know that the image has been cropped asymmetrically.
Therefore, your simplest camera model has only one unknown parameter, the focal length f = fx = fy.
Word of advice: in your application it is usually more convenient to carry around the horizontal (or vertical) field-of-view angle, rather than the focal length in pixels. This is because the FOV is invariant to image scaling.
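A hedged numeric sketch of that conversion, reusing the sensor width and full resolution that appear elsewhere in this thread (treat them as assumptions, not verified iPhone 5 specs):

focalMM   = 4.1;       % lens focal length (mm)
sensorWmm = 4.89;      % sensor width (mm) -- value quoted elsewhere in this thread
widthPx   = 3264;      % full-resolution image width (pixels)

pixelWmm = sensorWmm / widthPx;          % width of one pixel (mm)
fx   = focalMM / pixelWmm                % focal length in pixels (fx == fy for square pixels)
hfov = 2 * atand(widthPx / (2 * fx))     % horizontal field of view in degrees, invariant to scaling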
The "focal length" you are dealing with here is simply a scaling factor from objects in the world to camera pixels, used in the pinhole camera model (Wikipedia link). That's why its units are pixels/unit length. For a given f, an object of size L at a distance (perpendicular to the camera) z, would be f*L/z pixels.
So, you could estimate the focal length by placing an object of known size at a known distance from your camera and measuring its size in the image. You could also assume the central point is the center of the image. You should definitely not ignore the lens distortion (the dist_coef parameter in solvePnPRansac).
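As a quick sketch of that known-object estimate (all three measurements below are made up for illustration):

L_world = 200;    % real object size (mm), measured with a ruler        -- made up
Z       = 1000;   % perpendicular distance from camera to object (mm)   -- made up
L_px    = 550;    % object size measured in the image (pixels)          -- made up

f = L_px * Z / L_world   % focal length in pixels, from pixels = f * L / z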
In practice, the best way to obtain the camera matrix and distortion coefficients is to use a camera calibration tool. You can download and use the MRPT camera_calib software from this link, there's also a video tutorial here. If you use matlab, go for the Camera Calibration Toolbox.
Here you have a table with the spec of the cameras for iPhone 4 and 5.
The calculation is:
double f = 4.1;                              // focal length in mm (iPhone 5)
double resX = (double)(sourceImage.cols);    // image width in pixels
double resY = (double)(sourceImage.rows);    // image height in pixels
double sensorSizeX = 4.89;                   // sensor width in mm
double sensorSizeY = 3.67;                   // sensor height in mm
double fx = f * resX / sensorSizeX;          // focal length in pixels (horizontal)
double fy = f * resY / sensorSizeY;          // focal length in pixels (vertical)
double cx = resX/2.;                         // principal point: image center
double cy = resY/2.;
Try this:
func getCamMatrix() -> (Float, Float, Float, Float)
{
    let format:AVCaptureDeviceFormat? = deviceInput?.device.activeFormat
    let fDesc:CMFormatDescriptionRef = format!.formatDescription
    let dim:CGSize = CMVideoFormatDescriptionGetPresentationDimensions(fDesc, true, true)
    // dim = final image dimensions
    let cx:Float = Float(dim.width) / 2.0;
    let cy:Float = Float(dim.height) / 2.0;
    let HFOV : Float = format!.videoFieldOfView
    let VFOV : Float = ((HFOV)/cx)*cy
    let fx:Float = abs(Float(dim.width) / (2 * tan(HFOV / 180 * Float(M_PI) / 2)));
    let fy:Float = abs(Float(dim.height) / (2 * tan(VFOV / 180 * Float(M_PI) / 2)));
    return (fx, fy, cx, cy)
}
Old thread, present problem.
As Milo and Isaac mentioned after Milo's answer, there seems to be no "common" params available for, say, the iPhone 5.
For what it is worth, here is the result of a run with the MRPT calibration tool, with a good old iPhone 5:
[CAMERA_PARAMS]
resolution=[3264 2448]
cx=1668.87585
cy=1226.19712
fx=3288.47697
fy=3078.59787
dist=[-7.416752e-02 1.562157e+00 1.236471e-03 1.237955e-03 -5.378571e+00]
Average err. of reprojection: 1.06726 pixels (OpenCV error=1.06726)
Note that dist means distortion here.
I am conducting experiments on a toy project with these parameters, and they are kind of OK. If you do use them in your own project, please keep in mind that they may only barely be good enough to get started. The best option is to follow Milo's recommendation with your own data. The MRPT tool is quite easy to use, with the checkerboard they provide. Hope this helps you get started!