Cartesian Coordinate System in Perspective Projection


I'm still implementing a perspective projection for my augmented reality application. I've already asked some questions about the viewport calculation and other camera details, which Aldream explained in this thread.
However, I'm not getting any useful values at the moment, and I think this comes down to how I calculate the Cartesian coordinate space.
I have tried several ways to transform latitude, longitude and altitude into a Cartesian coordinate space, but none of them seems to work properly. Currently I'm using ECEF (Earth-centered, Earth-fixed), but I also tried other calculations, such as a combination of the haversine formula and trigonometry (to calculate x and y from the distance and the bearing between two points).
So my question is:
How does the Cartesian coordinate space affect my perspective projection? Where do I have to "compensate" for my units (when I'm using meters or centimeters, for example)?
Let's say I'm using ECEF, so I get values in meters. For example, my camera is at (0,0,2m height) and my point is at (10,10,0). Now I can easily use the function mentioned on Wikipedia and afterwards use the conversion of dx, dy, dz explained in my other thread (mentioned above). What I still don't get: how does this projection "know" what the units of my coordinate system are? I think this is the mistake I'm currently making. I don't handle the units of my coordinate system and therefore cannot get any good values from my projection.
When I use a coordinate system with centimeters as the unit, all of the values from my perspective projection increase. Where do I have to "resolve" this unit problem? Do I have to "transform" my camera width and camera height from pixels to meters? Do I have to convert the coordinate system to pixels? Which coordinate system should be used to handle this situation? I hope you can understand my problem.
Edit: I solved it myself.
I've changed my coordinate system from ECEF to my own system (using haversine and bearing and then calculating x, y, z) and now I get good values! :)
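For readers wondering what such a "haversine and bearing" system might look like, here is a minimal sketch in Python. It is not the asker's actual code: the function names, the spherical Earth radius and the east/north/up axis convention are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumption: mean radius of a spherical Earth


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters; inputs in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    d_phi = p2 - p1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_rad(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in radians clockwise from north."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    d_lambda = math.radians(lon2 - lon1)
    y = math.sin(d_lambda) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(d_lambda)
    return math.atan2(y, x)


def to_local_xyz(camera, point):
    """camera, point: (lat_deg, lon_deg, alt_m). Returns (east, north, up) in meters."""
    dist = haversine_m(camera[0], camera[1], point[0], point[1])
    brg = bearing_rad(camera[0], camera[1], point[0], point[1])
    x = dist * math.sin(brg)   # east component
    y = dist * math.cos(brg)   # north component
    z = point[2] - camera[2]   # height difference
    return x, y, z
```

This flat, camera-centered frame is only reasonable for points that are close to the camera, which is usually the case in an AR view.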

I'll try another way to explain it here then. :)
The short answer is: the unit of your Cartesian positions doesn't matter as long as you keep it homogeneous, i.e. as long as you apply the same unit both to your scene and to your camera.
For the longer answer, let's go back to the formula you used...
With:
d the relative Cartesian coordinates
s the size of your printable surface
r the size of your "sensor" / recording surface (i.e. r_x and r_y the size of the sensor and r_z its focal length)
b the position on your printable surface
...and do the pseudo dimensional analysis. We have:
[PIXEL] = (([LENGTH] * [PIXEL]) / ([LENGTH] * [LENGTH])) * [LENGTH]
Whatever you use as the unit for LENGTH, it will be homogenized, i.e. only the proportion is kept.
Ex:
[PIXEL] = (([MilliMeter] * [PIXEL]) / ([MilliMeter] * [MilliMeter])) * [MilliMeter]
= (([Meter/1000] * [PIXEL]) / ([Meter/1000] * [Meter/1000])) * [Meter/1000]
= 1000 * 1000 / 1000 / 1000 * (([Meter] * [PIXEL]) / ([Meter] * [Meter])) * [Meter]
= (([Meter] * [PIXEL]) / ([Meter] * [Meter])) * [Meter]
Back to my explanations on your other thread:
If we use those notations to express b_x:
b_x = (d_x * s_x) / (d_z * r_x) * r_z
= (d_x * w) / (d_z * 2 * f * tan(α)) * f
= (d_x * w) / (d_z * 2 * tan(α)) // with w in px
Whether you use (d_x, d_y, d_z) = (X,Y,Z) or (d_x, d_y, d_z) = (1000*X,1000*Y,1000*Z), the ratio d_x / d_z won't change.
As for the reasons behind your problem, you should check whether you also apply the correct unit to the position of your camera and to its distance from the scene. Also check your α or the unit of the focal length, depending on which one you use.
I think the latter suggestion is the most likely. It is easy to forget to also apply the right unit to the characteristics of your camera.
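To make the unit argument concrete, here is a small Python sketch of the simplified formula b_x = (d_x * w) / (d_z * 2 * tan(α)). The function name and the numeric values (a 640 px wide image, α = 30°) are made up for illustration; α is the half field of view, since r_x = 2 * f * tan(α).

```python
import math


def project_x(d_x, d_z, image_width_px, half_fov_rad):
    """b_x = (d_x * w) / (d_z * 2 * tan(alpha)), with w in pixels."""
    return (d_x * image_width_px) / (d_z * 2.0 * math.tan(half_fov_rad))


alpha = math.radians(30.0)  # hypothetical half field of view
w = 640                     # hypothetical image width in pixels

# The same point relative to the camera, once in meters and once in millimeters.
b_from_meters = project_x(1.5, 10.0, w, alpha)
b_from_millimeters = project_x(1500.0, 10000.0, w, alpha)

# Both calls give the same pixel offset because only d_x / d_z matters.
print(b_from_meters, b_from_millimeters)
```

The result only changes if d_x and d_z are expressed in different units from each other, which is exactly the kind of mismatch to look for.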

Related

Finding the tangent on a given point of a polyline

I have a list of X,Y coordinates that represents a road. For every 5 meters, I need to calculate the angle of the tangent on this road, as I have tried to illustrate in the image.
My problem is that this road is not represented by a mathematical function that I can simply derive, it is represented by a list of coordinates (UTM33N).
In my other similar projects we use ArcGIS/ESRI libraries to perform geographical functions such as this, but in this project I need to be independent of any software that require the end user to have a license, so I need to do the calculations myself (or find a free/open source library that can do it).
I am using a cubic spline function to make the line rounded between the coordinates, since all tangents on a line segment would otherwise just be parallel to the segment.
But now I am stuck. I am considering simply calculating the angle between any three points on the line (given enough points), and using this to find the tangents, but that doesn't sound like a good method. Any suggestions?

In the end, I concluded that the points were plentiful enough to give an accurate angle using simple geometry:
//Calculate delta values
var dx = next.X - curr.X;
var dy = next.Y - curr.Y;
var dz = next.Z - curr.Z;
//Calculate horizontal and 3D length of this segment.
var hLength = Math.Sqrt(dx * dx + dy * dy);
var length = Math.Sqrt(hLength * hLength + dz * dz);
//Calculate horizontal and vertical angles.
//Atan2 keeps the quadrant and handles dx == 0 (a segment running due north or south).
hAngle = Math.Atan2(dy, dx);
vAngle = Math.Atan2(dz, hLength);
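The same geometry can be sketched in Python for a whole list of points. This is an illustrative version, not the code used in the project: the function names are hypothetical, points are assumed to be (x, y, z) tuples in a metric projection such as UTM, and atan2 is used so that the direction of travel is kept and a segment with dx = 0 does not cause a division by zero.

```python
import math


def segment_angles(curr, nxt):
    """Return (horizontal_angle, vertical_angle) in radians for one segment."""
    dx = nxt[0] - curr[0]
    dy = nxt[1] - curr[1]
    dz = nxt[2] - curr[2]
    h_length = math.hypot(dx, dy)        # horizontal length of the segment
    h_angle = math.atan2(dy, dx)         # counterclockwise from the +X (east) axis
    v_angle = math.atan2(dz, h_length)   # slope angle above the horizontal
    return h_angle, v_angle


def tangent_angles(points):
    """Approximate the tangent at each point by the segment to the next point."""
    return [segment_angles(a, b) for a, b in zip(points, points[1:])]
```

Note that the horizontal angle here is a mathematical angle measured from east; a compass heading would be 90° minus that value.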

create opencv camera matrix for iPhone 5 solvepnp

I am developing an application for the iPhone using opencv. I have to use the method solvePnPRansac:
http://opencv.willowgarage.com/documentation/cpp/camera_calibration_and_3d_reconstruction.html
For this method I need to provide a camera matrix:
| fx  0  cx |
| 0  fy  cy |
| 0   0   1 |
where cx and cy represent the center pixel positions of the image and fx and fy represent focal lengths, but that is all the documentation says. I am unsure what to provide for these focal lengths. The iPhone 5 has a focal length of 4.1 mm, but I do not think that this value is usable as is.
I checked another website:
http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html
which shows how opencv creates camera matrices. Here it states that focal lengths are measured in pixel units.
I also checked this thread:
http://www.velocityreviews.com/forums/t500283-focal-length-in-pixels.html
(about halfway down), which says that focal length can be converted from millimeters to pixels using the equation:
fx = fy = focalMM * pixelDensity / 25.4;
Another link I found states that:
fx = focalMM * width / (sensorSizeMM);
fy = focalMM * length / (sensorSizeMM);
I am unsure about these equations and how to properly create this matrix.
Any help, advice, or links on how to create an accurate camera matrix (especially for the iPhone 5) would be greatly appreciated,
Isaac
p.s. I think that (fx/fy) or (fy/fx) might be equal to the aspect ratio of the camera, but that might be completely wrong.
UPDATE:
Pixel coordinates to 3D line (opencv)
Using this link, I can figure out how they want fx and fy to be formatted, because they use them to scale angles relative to their distance from the center. Therefore, fx and fy are likely in pixels/(unit length), but I'm still not sure what this unit length needs to be. Can it be arbitrary as long as x and y are scaled to each other?

You can get an initial (rough) estimate of the focal length in pixels by dividing the focal length in mm by the width of a pixel of the camera's sensor (CCD, CMOS, whatever).
You get the former from the camera manual, or read it from the EXIF header of an image taken at full resolution. Finding out the latter is a little more complicated: you may look up on the interwebs the sensor's spec sheet, if you know its manufacturer and model number, or you may just divide the overall width of its sensitive area by the number of pixels on the side.
Absent other information, it's usually safe to assume that the pixels are square (i.e. fx == fy), and that the sensor is orthogonal to the lens's focal axis (i.e. that the term in the first row and second column of the camera matrix is zero). Also, the pixel coordinates of the principal point (cx, cy) are usually hard to estimate accurately without a carefully designed calibration rig and an equally carefully executed calibration procedure (that's because they are intrinsically confused with the camera translation parallel to the image plane). So it's best to just set them equal to the geometrical center of the image, unless you know that the image has been cropped asymmetrically.
Therefore, your simplest camera model has only one unknown parameter, the focal length f = fx = fy.
Word of advice: in your application it is usually more convenient to carry around the horizontal (or vertical) field-of-view angle, rather than the focal length in pixels. This is because the FOV is invariant to image scaling.
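As a rough sketch of that estimate in Python (the function names are hypothetical, and the numbers in the usage note are illustrative values, not calibrated ones):

```python
import math


def focal_length_px(focal_mm, sensor_width_mm, image_width_px):
    """Focal length in pixels = focal length in mm / width of one pixel in mm."""
    pixel_width_mm = sensor_width_mm / image_width_px
    return focal_mm / pixel_width_mm


def horizontal_fov_deg(focal_px, image_width_px):
    """Horizontal field of view of a pinhole camera, in degrees."""
    return math.degrees(2.0 * math.atan(image_width_px / (2.0 * focal_px)))


def camera_matrix(focal_px, image_width_px, image_height_px):
    """Simplest model: square pixels, zero skew, principal point at the image center."""
    return [
        [focal_px, 0.0, image_width_px / 2.0],
        [0.0, focal_px, image_height_px / 2.0],
        [0.0, 0.0, 1.0],
    ]
```

For example, focal_length_px(4.1, 4.89, 3264) gives roughly 2737 px. Halving the image width halves the focal length in pixels, but horizontal_fov_deg returns the same angle in both cases, which is why the FOV is the more convenient value to carry around.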

The "focal length" you are dealing with here is simply a scaling factor from objects in the world to camera pixels, used in the pinhole camera model (Wikipedia link). That's why its units are pixels/unit length. For a given f, an object of size L at a distance (perpendicular to the camera) z, would be f*L/z pixels.
So, you could estimate the focal length by placing an object of known size at a known distance from your camera and measuring its size in the image. You could also assume the principal point is the center of the image. You should definitely not ignore the lens distortion (the distCoeffs parameter in solvePnPRansac).
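The relation "size in pixels = f * L / z" can be inverted to get f from one measurement. A tiny Python sketch, with hypothetical function names and made-up measurements:

```python
def estimate_focal_px(size_in_image_px, distance, real_size):
    """Invert size_px = f * L / z. distance and real_size must share one unit."""
    return size_in_image_px * distance / real_size


def projected_size_px(focal_px, distance, real_size):
    """Pinhole model: an object of size L at perpendicular distance z spans f * L / z pixels."""
    return focal_px * real_size / distance


# Hypothetical measurement: a 0.2 m wide target, 1.0 m away, spans 600 px in the image.
f = estimate_focal_px(600.0, 1.0, 0.2)
print(f)
```

This ignores lens distortion, so it is only a starting value; measuring the target near the image center keeps the error smaller.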
In practice, the best way to obtain the camera matrix and distortion coefficients is to use a camera calibration tool. You can download and use the MRPT camera_calib software from this link, there's also a video tutorial here. If you use matlab, go for the Camera Calibration Toolbox.

Here you have a table with the spec of the cameras for iPhone 4 and 5.
The calculation is:
double f = 4.1;
double resX = (double)(sourceImage.cols);
double resY = (double)(sourceImage.rows);
double sensorSizeX = 4.89;
double sensorSizeY = 3.67;
double fx = f * resX / sensorSizeX;
double fy = f * resY / sensorSizeY;
double cx = resX/2.;
double cy = resY/2.;

Try this:
func getCamMatrix()->(Float, Float, Float, Float)
{
let format:AVCaptureDeviceFormat? = deviceInput?.device.activeFormat
let fDesc:CMFormatDescriptionRef = format!.formatDescription
let dim:CGSize = CMVideoFormatDescriptionGetPresentationDimensions(fDesc, true, true)
// dim = final image dimensions
let cx:Float = Float(dim.width) / 2.0;
let cy:Float = Float(dim.height) / 2.0;
let HFOV : Float = format!.videoFieldOfView
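// Assuming square pixels, the vertical FOV follows from the tangent of the half-angle, not from linear scaling.
let VFOV : Float = 2 * atan(tan(HFOV / 180 * Float(M_PI) / 2) * cy / cx) * 180 / Float(M_PI)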
let fx:Float = abs(Float(dim.width) / (2 * tan(HFOV / 180 * Float(M_PI) / 2)));
let fy:Float = abs(Float(dim.height) / (2 * tan(VFOV / 180 * Float(M_PI) / 2)));
return (fx, fy, cx, cy)
}

Old thread, present problem.
As Milo and Isaac mentioned after Milo's answer, there seems to be no "common" params available for, say, the iPhone 5.
For what it is worth, here is the result of a run with the MRPT calibration tool, with a good old iPhone 5:
[CAMERA_PARAMS]
resolution=[3264 2448]
cx=1668.87585
cy=1226.19712
fx=3288.47697
fy=3078.59787
dist=[-7.416752e-02 1.562157e+00 1.236471e-03 1.237955e-03 -5.378571e+00]
Average err. of reprojection: 1.06726 pixels (OpenCV error=1.06726)
Note that dist means distortion here.
I am conducting experiments on a toy project with these parameters, and they work reasonably well. If you do use them on your own project, please keep in mind that they may be barely good enough to get started. The best approach is to follow Milo's recommendation with your own data. The MRPT tool is quite easy to use, with the checkerboard they provide. Hope this helps you get started!

Increase the size of a region bounded by GPS coords

I have a webapp in which I query for results that are near the user. Because of the way the app works, the user is located in a square bounded by 4 values, 2 for the bottom left corner and 2 for the upper right corner: latsw,latne,longsw,longne.
I need to increase the size of the "square" while keeping the user in the center of the square. I've been trying with basic stuff like:
$latsw= $latsw - $increasing_factor;
$latne= $latne + $increasing_factor;
$longsw=$longsw - $increasing_factor;
$longne=$longne + $increasing_factor;
and
$latsw= $latsw / $increasing_factor;
$latne= $latne * $increasing_factor;
$longsw=$longsw / $increasing_factor;
$longne=$longne * $increasing_factor;
but the results are just giving me a shifted area or some other weird behavior. I guess this is because GPS coords don't really behave linearly in a 2D plane. Any ideas to do something like this while keeping it relatively simple?

You could try something like this:
You want to keep the same center longitude/latitude, so calculate that first (by averaging the two longitudes and two latitudes you already have):
center_long = (ne_long + sw_long)/2
center_lat = (ne_lat + sw_lat)/2
Then calculate the size of the bounding box by differencing the two longitudes and two latitudes to get delta_long, delta_lat.
delta_long = ne_long - sw_long
delta_lat = ne_lat - sw_lat
Adjust delta_long and delta_lat by multiplying by some factor (say, 1.5 for a 50% increase):
new_delta_long = delta_long * increasing_factor
new_delta_lat = delta_lat * increasing_factor
Finally calculate the new bounding points. Each corner sits half of the new extent away from the center:
new_corner_long = center_long +/- new_delta_long / 2
new_corner_lat = center_lat +/- new_delta_lat / 2
As long as you're not too close to the poles, equator, or prime meridian (to avoid awkward range/sign issues), or not using too large a bounding box (to avoid awkward deviations from 2-d plane behavior) this should get you in the ballpark of what you're looking for.
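A minimal Python sketch of these steps (the function name and argument order are assumptions, not part of the original app). Note that each corner ends up half of the scaled extent away from the center, so a factor of 1.5 really makes the box 1.5 times as wide and as tall:

```python
def expand_bbox(lat_sw, lon_sw, lat_ne, lon_ne, factor):
    """Scale a lat/lon bounding box around its center; returns (lat_sw, lon_sw, lat_ne, lon_ne)."""
    center_lat = (lat_sw + lat_ne) / 2.0
    center_lon = (lon_sw + lon_ne) / 2.0
    # Half of the scaled extent in each direction.
    half_lat = (lat_ne - lat_sw) / 2.0 * factor
    half_lon = (lon_ne - lon_sw) / 2.0 * factor
    return (
        center_lat - half_lat,
        center_lon - half_lon,
        center_lat + half_lat,
        center_lon + half_lon,
    )
```

For example, expand_bbox(10.0, 20.0, 12.0, 24.0, 1.5) keeps the center at (11, 22) and returns (9.5, 19.0, 12.5, 25.0). The sketch does not handle boxes that cross the ±180° meridian or reach the poles.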

Figuring out distance and course between two coordinates

I have 2 coordinates and would like to do something seemingly straightforward. I want to figure out, given:
1) Coordinate A
2) Course provided by Core Location
3) Coordinate B
the following:
1) Distance between A and B (can currently be done using distanceFromLocation) so ok on that one.
2) The course that should be taken to get from A to B (different from course currently traveling)
Is there a simple way to accomplish this, any third party or built in API?
Apple doesn't seem to provide this but I could be wrong.
Thanks,
~Arash
EDIT:
Thanks for the fast responses. I believe there may have been some confusion: I am looking to get the course (the bearing from point A to point B in degrees, so that 0 degrees = north and 90 degrees = east, similar to the course value returned by CLLocation). I am not trying to compute actual turn-by-turn directions.

I have some code on github that does that. Take a look at headingInRadians here. It is based on the Spherical Law of Cosines. I derived the code from the algorithm on this page.
/*-------------------------------------------------------------------------
* Given two lat/lon points on earth, calculates the heading
* from lat1/lon1 to lat2/lon2.
*
* lat/lon params in radians
* result in radians
*-------------------------------------------------------------------------*/
double headingInRadians(double lat1, double lon1, double lat2, double lon2)
{
//-------------------------------------------------------------------------
// Algorithm found at http://www.movable-type.co.uk/scripts/latlong.html
//
// Spherical Law of Cosines
//
// Formula: θ = atan2( sin(Δlong) * cos(lat2),
// cos(lat1) * sin(lat2) − sin(lat1) * cos(lat2) * cos(Δlong) )
// JavaScript:
//
// var y = Math.sin(dLon) * Math.cos(lat2);
// var x = Math.cos(lat1) * Math.sin(lat2) - Math.sin(lat1) * Math.cos(lat2) * Math.cos(dLon);
// var brng = Math.atan2(y, x).toDeg();
//-------------------------------------------------------------------------
double dLon = lon2 - lon1;
double y = sin(dLon) * cos(lat2);
double x = cos(lat1) * sin(lat2) - sin(lat1) * cos(lat2) * cos(dLon);
return atan2(y, x);
}
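Since the question asks for a course in degrees (0 = north, 90 = east), here is a sketch of the same formula in Python that takes degrees and normalizes the result to the range [0, 360). The function name is hypothetical; the math is the same as in headingInRadians above.

```python
import math


def bearing_degrees(lat1_deg, lon1_deg, lat2_deg, lon2_deg):
    """Initial bearing from point 1 to point 2, in degrees clockwise from north."""
    lat1 = math.radians(lat1_deg)
    lat2 = math.radians(lat2_deg)
    d_lon = math.radians(lon2_deg - lon1_deg)
    y = math.sin(d_lon) * math.cos(lat2)
    x = math.cos(lat1) * math.sin(lat2) - math.sin(lat1) * math.cos(lat2) * math.cos(d_lon)
    # atan2 returns a value in (-180, 180]; shift it into [0, 360).
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
```

This is the initial bearing along the great circle, so on long routes the heading changes along the way.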

See How to get angle between two POI?

Depending on how much work you want to put into this one, I would suggest looking at Tree Traversal Algorithms (check the column on the right), things like A* (A-star), that you can use to find your way from one point to another, even if obstacles are in between.

If I understand you correctly, you have the current location and you have some other location. You want to find the distance (as the crow flies) between the two points, and to find a walking path between the points.
To answer your first question, distanceFromLocation will find the distance across the earth's surface between 2 points. That is, it follows the curvature of the earth, so it gives you the distance as the crow flies. So I think you're right about that.
The second question is much harder. What you want to do is called path-finding. Path-finding requires not only a search algorithm that will decide on the path, but also data about the possible paths. That is to say, if you want to find a path through the streets, the computer has to know how the streets are connected to each other. Furthermore, if you're trying to make a pathfinder that accounts for traffic and the time differences between two possible paths, you will need a whole lot more data. It is for this reason that we usually leave these kinds of tasks to big companies with lots of resources, like Google and Yahoo.
However, If you're still interested in doing it, check this out
http://www.youtube.com/watch?v=DoamZwkEDK0

What is the depth image received from Kinect

When I ran this Matlab code to get the depth image, the result I got was a 480x640 matrix. The min element value is 0 and the max element value is 2711. What does 2711 mean? Is that the distance from the camera to the farthest part of the image? And what is the unit of 2711? Is it meters, feet, or something else?

I don't know exactly what the Matlab code does to the depth, but it probably does some processing on it, because the depth sent by the Kinect is an 11-bit value, so it shouldn't be higher than 2047. Try to find out what it does, or to get access to the raw data sent by the Kinect.
The data sent by the Kinect is not a proper distance (it's a "disparity"), so you have to do some math to convert it to useful units.
From the OpenKinect project wiki (which contains useful information about the Kinect) :
From their data, a basic first order approximation for converting the raw 11-bit disparity value to a depth value in centimeters is: 100/(-0.00307 * rawDisparity + 3.33). This approximation is approximately 10 cm off at 4 m away, and less than 2 cm off within 2.5 m.
A better approximation is given by Stéphane Magnenat in this post: distance = 0.1236 * tan(rawDisparity / 2842.5 + 1.1863) in meters. Adding a final offset term of -0.037 centers the original ROS data. The tan approximation has a sum squared difference of .33 cm while the 1/x approximation is about 1.7 cm.
Once you have the distance using the measurement above, a good approximation for converting (i, j, z) to (x,y,z) is:
x = (i - w / 2) * (z + minDistance) * scaleFactor * (w/h)
y = (j - h / 2) * (z + minDistance) * scaleFactor
z = z
Where
minDistance = -10
scaleFactor = .0021.
These values were found by hand.
You can find more details about the Kinect's depth camera and its calibration on the ROS website (and many others !).
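The two approximations quoted above can be written as a short Python sketch. The function names are hypothetical, the constants are the ones from the quote, and the input is assumed to be the raw 11-bit disparity value, not the already processed values returned by the Matlab code.

```python
import math


def depth_m_linear(raw_disparity):
    """First order approximation: 100 / (-0.00307 * raw + 3.33) cm, converted to meters."""
    depth_cm = 100.0 / (-0.00307 * raw_disparity + 3.33)
    return depth_cm / 100.0


def depth_m_tan(raw_disparity, offset_m=0.0):
    """Tangent approximation in meters; pass offset_m=-0.037 to apply the quoted offset."""
    return 0.1236 * math.tan(raw_disparity / 2842.5 + 1.1863) + offset_m
```

For a raw value of 600, the first function gives about 0.67 m and the second about 0.71 m (about 0.67 m with the -0.037 offset applied).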

If you map the data to a meter scale it compresses the depth image slightly. I found this was an issue when I was trying to look for planes in the mapped data.
