Camera Intrinsics Resolution vs Real Screen Resolution - swift

I am writing an ARKit app where I need to use camera poses and intrinsics for 3D reconstruction.
The camera intrinsics matrix returned by ARKit seems to be based on a different image resolution than the device's screen resolution. Here is one example.
The intrinsics matrix returned by ARKit is:
[[1569.249512, 0, 931.3638306],[0, 1569.249512, 723.3305664],[0, 0, 1]]
whereas the input image resolution is 750 (width) x 1182 (height). In this case the principal point lies outside the image, which should not be possible; ideally it should be close to the image center. So the intrinsics matrix above might refer to an image resolution of 1920 (width) x 1440 (height), which is completely different from the original image resolution.
The questions are:
Do the returned camera intrinsics correspond to the 1920x1440 image resolution?
If so, how can I get the intrinsics matrix for the original image resolution, i.e. 750x1182?

Intrinsics 3x3 matrix
The camera intrinsics matrix converts between the 2D camera plane and 3D world coordinate space. Here's the decomposition of an intrinsic matrix:
[fx, s, xO]
[ 0, fy, yO]
[ 0, 0, 1]
where:
fx and fy are the focal lengths in pixels
xO and yO are the principal point offsets in pixels
s is the axis skew
According to Apple Documentation:
The values fx and fy are the pixel focal length, and are identical for square pixels. The values ox and oy are the offsets of the principal point from the top-left corner of the image frame. All values are expressed in pixels.
So let's examine your data:
[1569, 0, 931]
[ 0, 1569, 723]
[ 0, 0, 1]
fx=1569, fy=1569
xO=931, yO=723
s=0
To convert a known focal length in pixels to mm use the following expression:
F(mm) = F(pixels) * SensorWidth(mm) / ImageWidth(pixels)
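As a worked example of the expression above, here is a minimal Python sketch. The sensor width of 4.8 mm is a placeholder chosen only for illustration, not a published figure for any device, and the image width of 1920 px assumes the intrinsics refer to a 1920x1440 captured image.

```python
def focal_px_to_mm(f_px, sensor_width_mm, image_width_px):
    """F(mm) = F(pixels) * SensorWidth(mm) / ImageWidth(pixels)."""
    return f_px * sensor_width_mm / image_width_px


def focal_mm_to_px(f_mm, sensor_width_mm, image_width_px):
    """Inverse conversion: focal length in mm back to pixels."""
    return f_mm * image_width_px / sensor_width_mm


# Hypothetical sensor width; look up the real value for your device.
SENSOR_WIDTH_MM = 4.8

f_mm = focal_px_to_mm(1569.249512, SENSOR_WIDTH_MM, 1920)
print(round(f_mm, 3))
```

Note that the result is only as good as the sensor width you plug in, and that the image width must be the width the intrinsics were computed for.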
Points Resolution vs Pixels Resolution
Look at this post to find out what point resolution and pixel resolution are.
Let's see what each value is, using iPhone X data.
@IBOutlet var arView: ARSCNView!
DispatchQueue.main.asyncAfter(deadline: .now() + 1.0) {
    // currentFrame is nil until the session has delivered its first frame
    guard let camera = self.arView.session.currentFrame?.camera else { return }
    let imageRez = camera.imageResolution        // size of the captured camera image
    let intrinsics = camera.intrinsics           // 3x3 intrinsics for that captured image
    let viewportSize = self.arView.frame.size    // size of the view
    let screenSize = self.arView.snapshot().size // size of the rendered snapshot
    print(imageRez)
    print(intrinsics)
    print(viewportSize)
    print(screenSize)
}
Apple Documentation:
imageResolution instance property describes the image in the capturedImage buffer, which contains image data in the camera device's native sensor orientation. To convert image coordinates to match a specific display orientation of that image, use the viewMatrix(for:) or projectPoint(_:orientation:viewportSize:) method.
iPhone X imageRez (aspect ratio is 4:3).
These values correspond to the camera sensor:
(1920.0, 1440.0)
iPhone X intrinsics:
simd_float3x3([[1665.0, 0.0, 0.0], // first column
[0.0, 1665.0, 0.0], // second column
[963.8, 718.3, 1.0]]) // third column
iPhone X viewportSize (one third of screenSize along each axis, i.e. one ninth of its pixel count):
(375.0, 812.0)
iPhone X screenSize (resolution declared in tech spec):
(1125.0, 2436.0)
Pay attention, there's no snapshot() method for RealityKit's ARView.
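To relate these numbers back to the question, the following Python sketch checks where the reported principal point falls for each candidate resolution. It is only a plausibility check under the assumption that the principal point should sit near the image center; the helper names are made up for this example.

```python
def principal_point_offset(ox, oy, width, height):
    """Offset of the principal point from the image center, as a fraction of the image size."""
    return ((ox - width / 2.0) / width, (oy - height / 2.0) / height)


def is_inside(ox, oy, width, height):
    """True if the principal point lies within the image bounds."""
    return 0.0 <= ox < width and 0.0 <= oy < height


# Principal point from the question's intrinsics matrix
ox, oy = 931.3638306, 723.3305664

candidates = {"captured image": (1920, 1440), "question's image": (750, 1182)}
for name, (w, h) in candidates.items():
    dx, dy = principal_point_offset(ox, oy, w, h)
    print(name, is_inside(ox, oy, w, h), round(dx, 3), round(dy, 3))
```

The point lies within about 1.5% of the center of a 1920x1440 image and outside a 750-pixel-wide one, which is consistent with the intrinsics describing the captured image rather than the screen.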

Related

LiDAR to camera image fusion

I want to fuse (project) LiDAR points {X,Y,Z,1} onto a camera image {u,v}. We have the LiDAR points, the camera matrix (K), the distortion coefficients (D), the positions of the camera and the LiDAR (x,y,z), and the rotations of the camera and the LiDAR as quaternions (w+xi+yj+zk). Three coordinate systems are involved: the vehicle axle coordinate system (X: forward, Y: left, Z: up), the LiDAR coordinate system (X: right, Y: forward, Z: up) and the camera coordinate system (X: right, Y: down, Z: forward). I tried the approach below, but the points do not line up with the image; all of them are plotted in the wrong place.
Coordinate system:
For the given rotation and position of the camera and the LiDAR, we compute the translations using the equations below.
t_lidar = R_lidar * Position_lidar^T
t_camera = R_camera *Position_camera^T
Then the relative rotation and translation are computed as follows:
R_relative = R_camera^T * R_lidar
t_relative = t_lidar - t_camera
Then the final Transformation Matrix and point transformation between LiDAR Points [X,Y,Z,1] and image frame [u,v,1] is given by:
T = [ R_relative | t_relative ]
[u,v,1]^T = K * T * [X,Y,Z,1]^T
Is there anything which I am missing?
Use OpenCV's projectPoints directly:
https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#projectpoints
C++: void projectPoints(InputArray objectPoints, InputArray rvec, InputArray tvec, InputArray cameraMatrix, InputArray distCoeffs, OutputArray imagePoints, OutputArray jacobian=noArray(), double aspectRatio=0 )
objectPoints – Array of object points, 3xN/Nx3 1-channel or 1xN/Nx1 3-channel (or vector<Point3f>), where N is the number of points in the view.
rvec – Rotation vector. See Rodrigues() for details.
tvec – Translation vector.
cameraMatrix – Camera matrix
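To make explicit what rotation and translation such a projection needs, here is a minimal pure-Python sketch of the LiDAR-to-camera chain. It assumes that each quaternion rotates sensor coordinates into the vehicle frame and that each position is the sensor origin expressed in the vehicle frame; that convention is an assumption, so check it against your dataset. Lens distortion is ignored, and the function names are made up for this example.

```python
def quat_to_rot(w, x, y, z):
    """Rotation matrix of a unit quaternion w + xi + yj + zk."""
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def transpose(m):
    """Transpose of a 3x3 matrix (the inverse of a rotation)."""
    return [[m[j][i] for j in range(3)] for i in range(3)]


def lidar_to_pixel(p_lidar, r_lidar, c_lidar, r_cam, c_cam, fx, fy, cx, cy):
    """Project one LiDAR point with an undistorted pinhole model.

    Assumes r_* rotate sensor coordinates into the vehicle frame and c_* are
    the sensor origins in the vehicle frame. Returns None for points that are
    not in front of the camera.
    """
    # LiDAR frame -> vehicle frame: R_lidar * p + C_lidar
    p_vehicle = [a + b for a, b in zip(mat_vec(r_lidar, p_lidar), c_lidar)]
    # Vehicle frame -> camera frame: R_cam^T * (p - C_cam)
    p_cam = mat_vec(transpose(r_cam), [a - b for a, b in zip(p_vehicle, c_cam)])
    x, y, z = p_cam
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


identity = quat_to_rot(1.0, 0.0, 0.0, 0.0)
print(lidar_to_pixel([1.0, 0.0, 5.0], identity, [0.0, 0.0, 0.0],
                     identity, [1.0, 0.0, 0.0], 1000.0, 1000.0, 640.0, 360.0))
```

Under this convention the relative translation is R_camera^T * (Position_lidar - Position_camera), not a difference of separately rotated positions. The rotations must also account for the differing axis conventions of the LiDAR and camera frames.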

Geometrical transformation of a polygon to a higher resolution image

I'm trying to resize and reposition a ROI (region of interest) correctly from a low-resolution image (256x256) to a higher-resolution image (512x512). The two images also cover different fields of view: the low- and high-resolution images have a FoV of 330mm x 330mm and 180mm x 180mm, respectively.
What I've got at my disposal are:
Physical reference point (in mm) in the 256x256 and 512x512 image, which are refpoint_lowres=(-164.424,-194.462) and refpoint_highres=(-94.3052,-110.923). The reference points are located in the top left pixel (1,1) in their respective images.
Pixel coordinates of the ROI in the 256x256 image (named pxX and pxY). These coordinates are positioned relative to the reference point of the lower resolution image, refpoint_lowres=(-164.424,-194.462).
Pixel spacing for the 256x256 and 512x512 image, which are 0.7757 pixel/mm and 2.8444 pixel/mm respectively.
How can I rescale and reposition the ROI (the binary mask) to correct pixel location in the 512x512 image? Many thanks in advance!!
Attempt
% This gives correctly placed and scaled binary array in the 256x256 image
mask_lowres = double(poly2mask(pxX, pxY, 256., 256.));
% Compute translational shift in pixel
mmShift = refpoint_lowres - refpoint_highres;
pxShift = abs(mmShift./pixspacing_highres)
% This produces a binary array that is only positioned correctly in the
% 512x512 image, but it is not upscaled correctly...(?)
mask_highres = double(poly2mask(pxX + pxShift(1), pxY + pxShift(2), 512., 512.));
So you have coordinates pxX, and pxY in pixels with respect to the low-resolution image. You can transform these coordinates to real-world coordinates:
pxX_rw = pxX / 0.7757 - 164.424;
pxY_rw = pxY / 0.7757 - 194.462;
Next you can transform these coordinates to high-res coordinates:
pxX_hr = (pxX_rw + 94.3052) * 2.8444; % subtract refpoint_highres(1) = -94.3052
pxY_hr = (pxY_rw + 110.923) * 2.8444; % subtract refpoint_highres(2) = -110.923
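The two steps can also be written with the reference points kept as signed values, which makes the direction of each offset easier to check. This Python sketch uses the numbers from the question and treats pixel coordinates as offsets from the reference pixel (so it ignores MATLAB's 1-based indexing); the function names are made up for this example.

```python
REF_LOW = (-164.424, -194.462)   # mm, reference point of the 256x256 image
REF_HIGH = (-94.3052, -110.923)  # mm, reference point of the 512x512 image
SPACING_LOW = 0.7757             # pixel/mm
SPACING_HIGH = 2.8444            # pixel/mm


def lowres_px_to_mm(px, ref_mm, px_per_mm):
    """Pixel offset from the reference pixel -> physical coordinate in mm."""
    return px / px_per_mm + ref_mm


def mm_to_highres_px(mm, ref_mm, px_per_mm):
    """Physical coordinate in mm -> pixel offset from the reference pixel."""
    return (mm - ref_mm) * px_per_mm


def roi_to_highres(px_x, px_y):
    """Map ROI vertices from low-res pixel coordinates to high-res pixel coordinates."""
    xs = [mm_to_highres_px(lowres_px_to_mm(x, REF_LOW[0], SPACING_LOW),
                           REF_HIGH[0], SPACING_HIGH) for x in px_x]
    ys = [mm_to_highres_px(lowres_px_to_mm(y, REF_LOW[1], SPACING_LOW),
                           REF_HIGH[1], SPACING_HIGH) for y in px_y]
    return xs, ys


xs, ys = roi_to_highres([100.0, 120.0], [110.0, 130.0])
print(xs, ys)
```

Vertices that map to negative values or to values beyond the high-res image size fall outside its field of view.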
Since the original coordinates fit in the low-res image, but the high-res image is smaller (in physical coordinates) than the low-res one, it is possible that these new coordinates do not fit in the high-res image. If this is the case, cropping the polygon is a non-trivial exercise: it cannot be done by simply moving the vertices inside the field of view. MATLAB R2017b introduces the polyshape object type, which you can intersect:
bbox = polyshape([0 0 180 180] - 94.3052, [180 0 0 180] - 110.923);
poly = polyshape(pxX_rw, pxY_rw);
poly = intersect([poly bbox]);
pxX_rw = poly.Vertices(:,1);
pxY_rw = poly.Vertices(:,2);
If you have an earlier version of MATLAB, maybe the easiest solution is to make the field of view larger to draw the polygon, then crop the resulting image to the right size. But this does require some proper calculation to get it right.

How to change a pixel distance to meters?

I have a .bmp image of a map. What I know:
Height and width of the bmp image
dpi
Map scale
Image center's coordinates in meters.
What I want:
How can I calculate the coordinates of some points of the image (for example the corners) in meters?
Or how can I convert a pixel distance to meters?
What I have done so far:
I know the image center coordinates in pixels for sure:
CenterXpix = Width/2;
CenterYpix = Height/2;
But what should I do to find the coordinates of the corners? I don't think that
metersDistance = pixelDistance*Scale;
is a correct equation.
Any advice?
If you know the height or width in both meters and pixels, you can calculate the scale in meters/pixel. Your equation:
metersDistance = pixelDistance*Scale;
is correct, but only if your points are on the same axis. If your two points are diagonal from each other, you have to use good old Pythagoras (in pseudocode):
X = XdistancePix*scale;
Y = YdistancePix*scale;
Distance_in_m = sqrt(X*X+Y*Y);
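If the image size in meters is not known directly, the scale can be derived from the dpi and the map scale, since one pixel covers 1/dpi inch on paper and one inch is 0.0254 m. The Python sketch below assumes a north-up map with x pointing east, y pointing north and image rows growing downward; those axis conventions and the function names are assumptions made for this example.

```python
import math

INCH_IN_METERS = 0.0254


def meters_per_pixel(dpi, scale_denominator):
    """Ground distance covered by one pixel of a 1:scale_denominator map at the given dpi."""
    return INCH_IN_METERS / dpi * scale_denominator


def pixel_to_meters(px, py, width, height, center_m, dpi, scale_denominator):
    """Ground coordinates of pixel (px, py), given the ground coordinates of the image center."""
    mpp = meters_per_pixel(dpi, scale_denominator)
    x_m = center_m[0] + (px - width / 2.0) * mpp
    # Image rows grow downward, map y (north) grows upward
    y_m = center_m[1] - (py - height / 2.0) * mpp
    return (x_m, y_m)


def distance_m(p1, p2, dpi, scale_denominator):
    """Ground distance between two pixels, using Pythagoras on the pixel offsets."""
    mpp = meters_per_pixel(dpi, scale_denominator)
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1]) * mpp


# Hypothetical example: 1000x800 image, 254 dpi, map scale 1:10000
print(pixel_to_meters(0, 0, 1000, 800, (5000.0, 7000.0), 254, 10000))
print(distance_m((0, 0), (3, 4), 254, 10000))
```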

create opencv camera matrix for iPhone 5 solvepnp

I am developing an application for the iPhone using opencv. I have to use the method solvePnPRansac:
http://opencv.willowgarage.com/documentation/cpp/camera_calibration_and_3d_reconstruction.html
For this method I need to provide a camera matrix:
__ __
| fx 0 cx |
| 0 fy cy |
|_0 0 1 _|
where cx and cy represent the center pixel positions of the image and fx and fy represent focal lengths, but that is all the documentation says. I am unsure what to provide for these focal lengths. The iPhone 5 has a focal length of 4.1 mm, but I do not think that this value is usable as is.
I checked another website:
http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html
which shows how opencv creates camera matrices. Here it states that focal lengths are measured in pixel units.
I checked another website:
http://www.velocityreviews.com/forums/t500283-focal-length-in-pixels.html
(about half way down)
It says that focal length can be converted from millimeters to pixels using the equation:
fx = fy = focalMM * pixelDensity / 25.4;
Another link I found states that:
fx = focalMM * width / (sensorSizeMM);
fy = focalMM * length / (sensorSizeMM);
I am unsure about these equations and how to properly create this matrix.
Any help, advice, or links on how to create an accurate camera matrix (especially for the iPhone 5) would be greatly appreciated,
Isaac
p.s. I think that (fx/fy) or (fy/fx) might be equal to the aspect ratio of the camera, but that might be completely wrong.
UPDATE:
Pixel coordinates to 3D line (opencv)
Using this link, I can figure out how they want fx and fy to be formatted, because they use them to scale angles relative to their distance from the center. Therefore, fx and fy are likely in pixels/(unit length), but I'm still not sure what this unit length needs to be. Can it be arbitrary, as long as x and y are scaled consistently with each other?
You can get an initial (rough) estimate of the focal length in pixels by dividing the focal length in mm by the width of a pixel of the camera's sensor (CCD, CMOS, whatever).
You get the former from the camera manual, or read it from the EXIF header of an image taken at full resolution. Finding the latter is a little more complicated: you can look up the sensor's spec sheet online if you know its manufacturer and model number, or you can simply divide the overall width of its sensitive area by the number of pixels along that side.
Absent other information, it's usually safe to assume that the pixels are square (i.e. fx == fy), and that the sensor is orthogonal to the lens's focal axis (i.e. that the term in the first row and second column of the camera matrix is zero). Also, the pixel coordinates of the principal point (cx, cy) are usually hard to estimate accurately without a carefully designed calibration rig and an equally carefully executed calibration procedure (that's because they are intrinsically confused with the camera translation parallel to the image plane). So it's best to just set them to the geometrical center of the image, unless you know that the image has been cropped asymmetrically.
Therefore, your simplest camera model has only one unknown parameter, the focal length f = fx = fy.
Word of advice: in your application it is usually more convenient to carry around the horizontal (or vertical) field-of-view angle, rather than the focal length in pixels. This is because the FOV is invariant to image scaling.
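The relation between the field of view and the focal length in pixels under the pinhole model is f = size / (2 * tan(FOV / 2)). A small Python sketch, with made-up numbers, shows the conversion and why the FOV survives image scaling while the focal length in pixels does not:

```python
import math


def focal_px_from_fov(fov_deg, image_size_px):
    """Focal length in pixels from a field of view along the same image axis."""
    return image_size_px / (2.0 * math.tan(math.radians(fov_deg) / 2.0))


def fov_from_focal_px(f_px, image_size_px):
    """Field of view in degrees from a focal length in pixels along the same image axis."""
    return math.degrees(2.0 * math.atan(image_size_px / (2.0 * f_px)))


# Hypothetical camera: 2000 px wide image with f = 1000 px
fov_full = fov_from_focal_px(1000.0, 2000.0)
# Downscaling the image by 2 halves both the width and the focal length in pixels
fov_half = fov_from_focal_px(500.0, 1000.0)
print(fov_full, fov_half)
```

Both calls give the same angle, so storing the FOV and recomputing f for whatever resolution you are working at avoids stale intrinsics after a resize.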
The "focal length" you are dealing with here is simply a scaling factor from objects in the world to camera pixels, used in the pinhole camera model. That's why its units are pixels/unit length. For a given f, an object of size L at a distance z (measured perpendicular to the camera) would be f*L/z pixels in the image.
So, you could estimate the focal length by placing an object of known size at a known distance from your camera and measuring its size in the image. You could also assume the central point is the center of the image. You should definitely not ignore the lens distortion (dist_coef parameter in solvePnPRansac).
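Rearranging size_px = f*L/z gives f = size_px * z / L. A minimal Python sketch of that estimate follows; the measurements are invented for illustration, and L and z only need to share the same unit.

```python
def focal_px_from_object(size_px, size_real, distance):
    """Estimate f in pixels from an object of known size at a known distance."""
    return size_px * distance / size_real


def projected_size_px(f_px, size_real, distance):
    """Pinhole model: image size in pixels of an object of size L at distance z."""
    return f_px * size_real / distance


# Hypothetical measurement: a 0.2 m wide target at 1.5 m spans 400 px
f_est = focal_px_from_object(400.0, 0.2, 1.5)
print(f_est)
```

The estimate assumes the object is roughly centered and parallel to the image plane, and that lens distortion is small over the measured span.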
In practice, the best way to obtain the camera matrix and distortion coefficients is to use a camera calibration tool. You can download and use the MRPT camera_calib software from this link; there's also a video tutorial here. If you use MATLAB, go for the Camera Calibration Toolbox.
Here is a table with the camera specs for the iPhone 4 and 5.
The calculation is:
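double f = 4.1; // focal length in mm
double resX = (double)(sourceImage.cols); // image width in pixels
double resY = (double)(sourceImage.rows); // image height in pixels
double sensorSizeX = 4.89; // sensor width in mm
double sensorSizeY = 3.67; // sensor height in mm
double fx = f * resX / sensorSizeX; // focal length in pixels along x
double fy = f * resY / sensorSizeY; // focal length in pixels along y
double cx = resX/2.; // principal point assumed at the image center
double cy = resY/2.;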
Try this:
func getCamMatrix() -> (Float, Float, Float, Float)
{
    let format:AVCaptureDeviceFormat? = deviceInput?.device.activeFormat
    let fDesc:CMFormatDescriptionRef = format!.formatDescription
    let dim:CGSize = CMVideoFormatDescriptionGetPresentationDimensions(fDesc, true, true)
    // dim = final image dimensions
    let cx:Float = Float(dim.width) / 2.0;
    let cy:Float = Float(dim.height) / 2.0;
    // videoFieldOfView is the horizontal field of view, in degrees
    let HFOV : Float = format!.videoFieldOfView
    // Scaling the angle by the aspect ratio only approximates the vertical FOV;
    // the exact pinhole relation is tan(VFOV/2) = tan(HFOV/2) * cy/cx, which gives fy == fx
    let VFOV : Float = ((HFOV)/cx)*cy
    let fx:Float = abs(Float(dim.width) / (2 * tan(HFOV / 180 * Float(M_PI) / 2)));
    let fy:Float = abs(Float(dim.height) / (2 * tan(VFOV / 180 * Float(M_PI) / 2)));
    return (fx, fy, cx, cy)
}
Old thread, present problem.
As Milo and Isaac mentioned after Milo's answer, there seems to be no "common" params available for, say, the iPhone 5.
For what it is worth, here is the result of a run with the MRPT calibration tool, with a good old iPhone 5:
[CAMERA_PARAMS]
resolution=[3264 2448]
cx=1668.87585
cy=1226.19712
fx=3288.47697
fy=3078.59787
dist=[-7.416752e-02 1.562157e+00 1.236471e-03 1.237955e-03 -5.378571e+00]
Average err. of reprojection: 1.06726 pixels (OpenCV error=1.06726)
Note that dist means distortion here.
I am running experiments on a toy project with these parameters, and they are kind of OK. If you use them in your own project, please keep in mind that they may be barely good enough to get started. The best option is to follow Milo's recommendation with your own data. The MRPT tool is quite easy to use with the checkerboard they provide. Hope this helps you get started!
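For readers who want to see what the dist values do, here is a hedged Python sketch of the radial and tangential distortion model that OpenCV documents. It assumes the five values above are ordered (k1, k2, p1, p2, k3) as in OpenCV; that ordering is an assumption about the tool's output and worth verifying before relying on it.

```python
def project_with_distortion(point_cam, fx, fy, cx, cy, dist):
    """Project a 3D point in camera coordinates to pixels, applying lens distortion.

    dist is assumed to be (k1, k2, p1, p2, k3).
    """
    k1, k2, p1, p2, k3 = dist
    # Normalized image coordinates
    x = point_cam[0] / point_cam[2]
    y = point_cam[1] / point_cam[2]
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    # Radial term plus tangential terms
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return (fx * x_d + cx, fy * y_d + cy)


# Parameters from the calibration run above
FX, FY = 3288.47697, 3078.59787
CX, CY = 1668.87585, 1226.19712
DIST = (-7.416752e-02, 1.562157e+00, 1.236471e-03, 1.237955e-03, -5.378571e+00)

# A point on the optical axis projects onto the principal point
print(project_with_distortion((0.0, 0.0, 1.0), FX, FY, CX, CY, DIST))
```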

matlab will two different cameras give me different results?

The following code takes an image of a grape that I photographed (called 'full_img') and calculates the area of the grape:
RGB = imread(full_img);
GRAY = rgb2gray(RGB);
threshold = graythresh(GRAY);
originalImage = im2bw(GRAY, threshold);
originalImage = bwareaopen(originalImage,250);
SE = strel('disk',10);
IM2 = imclose(originalImage,SE);
originalImage = IM2;
labeledImage = bwlabel(originalImage, 8); % Label each blob so we can make measurements of it
blobMeasurements = regionprops(labeledImage, originalImage, 'all');
numberOfBlobs = length(blobMeasurements);
pixperinch=get(0,'ScreenPixelsPerInch'); %# find resolution of your display
dpix=blobMeasurements(numberOfBlobs).Area; %# calculate distance in pixels
dinch=dpix/pixperinch; %# convert to inches from pixels
dcm=dinch*2.54; %# convert to cm from inches
blobArea = dcm; % Get area.
If I photograph the same grape under the same conditions with different cameras (from the same distance and with the same lighting), will I get the same results? (What if I have a 5-megapixel camera and a 12-megapixel camera?)
No, it won't. You go from image coordinates to world coordinates using dpix/pixperinch. In general this is wrong. It only works for one specific image, and only if you know the pixels per inch of that image. To get the geometric characteristics of an object in an image (e.g. length, area), you must back-project the image pixels into Cartesian space using the camera matrix and the inverse projective transformation to get Cartesian coordinates (let alone calibrating the camera for lens distortion, which is a nonlinear problem). Then you can perform the calculations. Your code won't work even for the same camera.
See this for more.
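As a simplified illustration of the back-projection idea, the Python sketch below converts a pixel area to a physical area for a flat object that is parallel to the image plane at a known distance, using an undistorted pinhole model. The focal lengths, distance and grape area are invented numbers, and the flat, fronto-parallel assumption is a strong one.

```python
def pixel_area_to_real_area(area_px, f_px, distance):
    """Physical area of a flat, fronto-parallel object: area_px * (distance / f)^2."""
    return area_px * (distance / f_px) ** 2


def real_area_to_pixel_area(area_real, f_px, distance):
    """Pixel area that the same object occupies in the image."""
    return area_real * (f_px / distance) ** 2


GRAPE_AREA_CM2 = 4.0  # hypothetical true area
DISTANCE_CM = 30.0    # hypothetical camera-to-grape distance

# Two hypothetical cameras with different focal lengths in pixels
for name, f_px in (("camera A", 2000.0), ("camera B", 3100.0)):
    area_px = real_area_to_pixel_area(GRAPE_AREA_CM2, f_px, DISTANCE_CM)
    recovered = pixel_area_to_real_area(area_px, f_px, DISTANCE_CM)
    print(name, round(area_px), round(recovered, 6))
```

The two cameras see very different pixel counts for the same grape, and only the conversion that uses each camera's own focal length and the distance recovers the same physical area.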