Does ARKit consider Lens Distortion in iPhone and iPad?

ARKit updates many intrinsic (and extrinsic) parameters of the ARCamera from frame to frame. I'd like to know whether it also takes radial lens distortion into account (as in the AVCameraCalibrationData class, which ARKit doesn't use) and corrects the video frames' distortion appropriately (distort/undistort operations) for the rear iPhone and iPad cameras.
var intrinsics: simd_float3x3 { get }
As we all know, radial lens distortion greatly affects 6-DOF pose estimation accuracy when we place undistorted 3D objects in a real-world scene that is distorted by the lens.
/* A map of floating-point values describing radial */
/* lens distortions in AVCameraCalibrationData class */
var lensDistortionLookupTable: Data? { get }
If the lens distortion math used by ARKit is available in the API, where can I find it?

Although it's not explicitly stated, I'm certain that ARKit is correcting for non-linear lens distortion. Lens distortion (and inverse distortion) lookup tables exist in iOS11 and are available via AVCameraCalibrationData, but they are not exposed by ARKit, presumably because there is no need for them since you're already working with rectified coordinates.
Whether or not the distortion model parameters are the same for each device model (i.e. the exact same values for every iPhone 7) is an interesting question. I don't have access to multiple phones of the same model, but this shouldn't be hard to figure out for someone who does.
As an example, from https://github.com/verebes1/ARKit-Multiplayer:
QR marker detection
With the help of Apple's Vision framework it's now possible to recognize a QR marker in the camera's video feed and track it while it is in the field of view. The framework provides us with the coordinates of the QR marker's square corners in the screen's coordinate system.
QR marker pose estimation
The next thing you probably want to do after detecting the QR markers is to obtain the camera pose from them.
To perform QR marker pose estimation you need to know the calibration parameters of your camera: the camera matrix and the distortion coefficients. Each camera lens has unique parameters, such as the focal length, principal point, and lens distortion model. The process of finding the intrinsic camera parameters is called camera calibration. Camera calibration is important for Augmented Reality applications because it describes the perspective transformation and lens distortion of the output image. To achieve the best user experience with Augmented Reality, the visualization of an augmented object should be done using the same perspective projection.
At the end, what you get after the calibration is the camera matrix (a 3x3 matrix containing the focal distances and the camera center coordinates, a.k.a. the intrinsic parameters) and the distortion coefficients (a vector of 5 or more elements that models the distortion produced by your camera). The calibration parameters are pretty much the same for most iDevices.
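For concreteness, here is a minimal Python/NumPy sketch of what that calibration output looks like (the numbers are made-up placeholders, not real device values):

import numpy as np

# Camera (intrinsic) matrix: focal lengths fx, fy and principal point cx, cy.
# The values here are placeholders, not a real device calibration.
fx, fy, cx, cy = 1500.0, 1500.0, 640.0, 360.0
K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

# Distortion coefficients in OpenCV order: k1, k2, p1, p2, k3
# (radial terms k1, k2, k3 and tangential terms p1, p2).
dist_coeffs = np.array([0.1, -0.25, 0.001, 0.0005, 0.1])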
With the precise location of the marker corners, we can estimate a transformation between our camera and a marker in 3D space. This operation is known as pose estimation from 2D-3D correspondences. The pose estimation process finds a Euclidean transformation (one consisting only of rotation and translation components) between the camera and the object.
C is used to denote the camera center. The P1-P4 points are 3D points in the world coordinate system and the p1-p4 points are their projections on the camera's image plane. Our goal is to find the relative transformation between a known marker position in the 3D world (P1-P4) and the camera C, using the intrinsic matrix and the known point projections on the image plane (p1-p4).
OpenCV functions are used to calculate the QR marker transformation in such a way that it minimizes the reprojection error, that is, the sum of squared distances between the observed image points and the projected object points. The estimated transformation is defined by a rotation (rvec) and a translation (tvec) component. This is also known as a Euclidean or rigid transformation. At the end we get a rotation quaternion and a translation vector for the QR marker.
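A minimal OpenCV sketch of this pose estimation step (the marker size, corner coordinates and calibration values are hypothetical placeholders, not the project's actual values):

import cv2
import numpy as np

# 3D corners of a 5 cm QR marker in its own coordinate system (meters); placeholder size.
marker_size = 0.05
object_points = np.array([[-marker_size / 2,  marker_size / 2, 0],
                          [ marker_size / 2,  marker_size / 2, 0],
                          [ marker_size / 2, -marker_size / 2, 0],
                          [-marker_size / 2, -marker_size / 2, 0]], dtype=np.float64)

# 2D corner positions detected by Vision, converted to pixel coordinates (placeholders).
image_points = np.array([[310, 220], [410, 225], [405, 330], [305, 325]], dtype=np.float64)

# Placeholder calibration; in practice use the device's camera matrix and distortion.
K = np.array([[1500.0, 0, 640], [0, 1500.0, 360], [0, 0, 1]])
dist_coeffs = np.zeros(5)

# solvePnP minimizes the reprojection error and returns the marker pose in the camera frame.
success, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist_coeffs)
R, _ = cv2.Rodrigues(rvec)   # 3x3 rotation matrix from the rotation vector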
Integration into Apple's ARKit
The final part is the integration of all the information about the QR marker's pose into the 3D scene created by ARKit. ARKit uses Visual Inertial Odometry (VIO) to accurately track the world around it. VIO fuses camera sensor data with CoreMotion data. These two inputs allow the device to sense how it moves within a room with a high degree of accuracy, and without any additional calibration. All the rendering is based on Apple's Metal, with Apple's SceneKit on top of it.
In order to render a SceneKit node on our QR marker in a proper way, we need to create a model matrix of the QR marker from the quaternion and translation vector we got from OpenCV. The next step is to multiply the QR marker's model matrix by the SceneKit scene's virtual camera transform matrix. As a result, we can see a custom node (the Axes node in our project) that repeats all the QR marker's movements in the real world while the marker is in the field of view of the iPhone's camera; when it is not, it stays at the last updated position so we can examine it from all around.
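In NumPy terms (the actual project does this with simd/SceneKit matrices), the composition described above looks roughly like this; worldFromCamera is a placeholder standing in for the virtual camera's transform:

import numpy as np

# R (3x3 rotation) and tvec (3x1 translation) of the marker in the camera frame,
# e.g. from cv2.solvePnP / cv2.Rodrigues; placeholders here.
R = np.eye(3)
tvec = np.array([[0.0], [0.0], [0.4]])

# 4x4 model matrix of the marker in camera space.
cameraFromMarker = np.eye(4)
cameraFromMarker[:3, :3] = R
cameraFromMarker[:3, 3] = tvec.ravel()

# The virtual camera's 4x4 transform in ARKit/SceneKit world space
# (identity placeholder; in the app it comes from the camera node).
worldFromCamera = np.eye(4)

# Model matrix of the marker in world space: compose the two transforms.
worldFromMarker = worldFromCamera @ cameraFromMarker

Keep in mind that OpenCV's camera frame (x right, y down, z forward) differs from SceneKit's (x right, y up, z toward the viewer), so in practice an axis flip has to be applied when moving between the two conventions.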

Related

Matching RGB image with point cloud

I have an RGB image and a point cloud acquired by LIDAR.
In the RGB image I detect a feature, let's say a circle.
I want to use this circle as a ROI in my 3d point cloud.
How can I do that? I was thinking of producing a 3D point cloud from the RGB image through the camera parameters and then matching the two with the ICP algorithm.
The problem is that the moment I produce the point cloud from the 2D image, my coordinate system changes, so I no longer know the position of my circle.
To perform the 3D reconstruction I use the triangulateMultiview function.
I was thinking of producing a 3D point cloud from the RGB image through the camera parameters and then matching the two with the ICP algorithm.
-> This would not work well and would not be efficient.
Actually, there is a much better way. Assuming that you know the extrinsics between the camera and the lidar, any circle (or ellipse) on the image can be extended into a 3D cone using the camera intrinsics, and by selecting the points within the cone you can do the ROI operation.
Let's say you can define an ellipse on your image plane by detecting it and finding the parameters of its ellipse equation. The ellipse equation can be extended into a quadric (cone) equation representing the 3D cone. Now the only thing left is to test whether a 3D point is within the cone by plugging it into the cone equation.
This is a mathematically somewhat involved problem if you are not comfortable with the camera model or quadric equations.
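A sketch of that test in Python/NumPy, assuming the ellipse parameters come from your 2D detection and the lidar points have already been transformed into the camera frame with the known extrinsics (all values below are illustrative placeholders):

import numpy as np

def conic_from_ellipse(cx, cy, a, b, theta):
    # 3x3 conic matrix C of an ellipse with center (cx, cy), semi-axes a, b and
    # rotation theta, such that [x, y, 1] C [x, y, 1]^T < 0 strictly inside the ellipse.
    ct, st = np.cos(theta), np.sin(theta)
    A = (ct / a) ** 2 + (st / b) ** 2
    B = 2 * ct * st * (1 / a ** 2 - 1 / b ** 2)
    Cc = (st / a) ** 2 + (ct / b) ** 2
    D = -2 * A * cx - B * cy
    E = -B * cx - 2 * Cc * cy
    F = A * cx ** 2 + B * cx * cy + Cc * cy ** 2 - 1
    return np.array([[A, B / 2, D / 2],
                     [B / 2, Cc, E / 2],
                     [D / 2, E / 2, F]])

def points_in_cone(points_cam, K, C):
    # points_cam: Nx3 lidar points already in the camera frame. A point X projects to
    # x ~ K @ X, and it lies inside the ellipse iff x^T C x < 0, which is equivalent
    # to X^T (K^T C K) X < 0; that quadric is exactly the 3D cone.
    Q = K.T @ C @ K
    vals = np.einsum('ij,jk,ik->i', points_cam, Q, points_cam)
    return (vals < 0) & (points_cam[:, 2] > 0)   # also require the point to be in front

# Hypothetical usage with placeholder values:
K = np.array([[1000.0, 0, 640], [0, 1000.0, 360], [0, 0, 1]])
C = conic_from_ellipse(cx=640, cy=360, a=80, b=50, theta=0.3)
points_cam = np.random.rand(1000, 3) * np.array([2, 2, 10]) - np.array([1, 1, 0])
roi_points = points_cam[points_in_cone(points_cam, K, C)]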

Measuring objects in a photo taken by calibrated cameras, knowing the size of a reference object in the photo

I am writing a program that captures real-time images of a scene with two calibrated cameras (so the internal parameters of the cameras are known to us). Using two-view geometry, I can find the essential matrix and use OpenCV or MATLAB to find the relative position and orientation of one camera with respect to the other. Having the essential matrix, it is shown in Hartley and Zisserman's Multiple View Geometry that one can reconstruct the scene using triangulation, up to scale. Now I want to use a reference length to determine the scale of the reconstruction and resolve the ambiguity.
I know the height of the front wall and I want to use it to determine the scale of the reconstruction, so I can measure other objects and their dimensions or their distance from the center of my first camera. How can this be done in practice?
Thanks in advance.
Edit: To add more information, I have already done linear triangulation (minimizing the algebraic error), but I am not sure if it is of any use because there is still a scale ambiguity that I don't know how to get rid of. My ultimate goal is to recognize an object (like a Pepsi can) and separate it into a rectangular area (which is going to be written as a separate module by someone else) and then find the distance of each pixel in this rectangular area, i.e. the region of interest, to the camera. Then the distance from the camera to the object will be the minimum of the distances from the camera to the 3D coordinates of the pixels in the region of interest.
Might be a bit late, but it may at least help someone struggling with the same stuff.
As far as I remember it is actually a linear problem. You have the essential matrix, which gives you the rotation matrix and a normalized translation vector specifying the relative position of the cameras. If you followed Hartley and Zisserman you probably chose one of the cameras as the origin of the world coordinate system, meaning all your triangulated points are at a normalized distance from this origin. What is important is that the direction of every triangulated point is correct.
If you have some reference in the scene (let's say the height of the wall), then you just have to find this reference (2 points are enough, e.g. the opposite ends of the wall) and calculate a "normalization coefficient" (sorry for the terminology) as
coeff = realWorldDistanceOf2Points / distanceOfTriangulatedPoints
Once you have this coeff, just multiply all your triangulated points by it and you get real-world points.
Example:
You know that the opposite corners of the wall are 5 m from each other. You find these corners in both images, triangulate them (let's call the triangulated points c1 and c2), calculate their distance in the "normalized" world as ||c1 - c2||, and get
coeff = 5 / ||c1 - c2||
and you get the real 3D world points as triangulatedPoint * coeff.
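A short OpenCV/NumPy sketch of the same idea (the intrinsics, matched corner pixels and the 5 m reference length are all placeholders):

import cv2
import numpy as np

K = np.array([[1000.0, 0, 640], [0, 1000.0, 360], [0, 0, 1]])   # placeholder intrinsics
R = np.eye(3)                          # rotation from the essential matrix (placeholder)
t = np.array([[1.0], [0.0], [0.0]])    # normalized translation: its length is arbitrary

P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # first camera at the world origin
P2 = K @ np.hstack([R, t])

# Matched pixel coordinates of the two wall corners in each image (2xN, placeholders).
pts1 = np.array([[400.0, 800.0], [300.0, 320.0]])
pts2 = np.array([[420.0, 820.0], [300.0, 320.0]])

X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)     # homogeneous 4xN result
c = (X_h[:3] / X_h[3]).T                            # c[0], c[1]: triangulated corners

coeff = 5.0 / np.linalg.norm(c[0] - c[1])           # realWorldDistance / triangulatedDistance
# Multiply every triangulated point (and t) by coeff to get real-world coordinates.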
Maybe an easier option is to have both cameras in a fixed relative position and calibrate them together with the stereoCalibrate OpenCV/MATLAB function (there is actually a pretty nice GUI in MATLAB for that) - it returns not just the intrinsic parameters but also the extrinsics. But I don't know if this is your case.

Does distance from camera to calibration pattern affect calibration parameters?

I'm trying to use a stereo camera to measure the distance from the cameras to a dynamic object (a moving car, for example). I used a checkerboard pattern of 7 by 8 squares with a square size of 89 millimeters (~3.5 inches). The distance from the camera to the pattern was 212 centimeters (~83.5 inches). I'm using Python and OpenCV.
My questions are:
Does the distance from the pattern to the camera affect the calibration parameters much? It is stated in one of the MATLAB examples that the distance from the camera to the pattern during calibration should be the same as the object distance that you want to measure.
Should I use a bigger board and increase the camera-to-pattern distance to get more accurate results for my application?
I think that the specific distance you use for the calibration shouldn't really matter. What does matter is that you take as many different images of your checkerboard as possible, at least 15. The checkerboard should be moved so that you cover the whole camera field of view, and it should also be imaged at different out-of-plane orientations. Having a checkerboard with more squares should also be beneficial, as this means more corner points per image. The size of the squares shouldn't make a difference.
On the other hand, camera calibration should be performed with a fixed focus, which also shouldn't change after the calibration. So, in practice, I guess this forces you to perform the calibration at a distance similar to the one that will be used later in the experiment.
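For reference, a typical OpenCV calibration loop looks roughly like this; the board size (6x7 inner corners for a 7x8-square board), square size and image folder are placeholders you would adapt to your setup:

import cv2
import numpy as np
import glob

board_size = (6, 7)          # inner corners per row/column for a 7x8-square board
square_size = 0.089          # square size in meters (89 mm)

# 3D coordinates of the board corners in the board's own frame (z = 0 plane).
objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2) * square_size

obj_points, img_points = [], []
for fname in glob.glob('calib_images/*.png'):      # hypothetical image folder
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, board_size)
    if found:
        corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1),
                                   (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_points, img_points,
                                                 gray.shape[::-1], None, None)
print('RMS reprojection error (pixels):', rms)

The RMS reprojection error is a useful sanity check: if it is much larger than a pixel, the image set or the board model is probably the problem rather than the distance.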

Association (identification) of balls after stereo calibration and rectification using MATLAB

First I will try to explain what I have to do, and then I will ask my question about the problem.
My task is to detect small balls (2mm) in gelatine using two webcams.
The steps for detection are these:
Image taking using two webcams (position: 90 degrees to each other)
Stereo calibration of each pair of images
Masking of the areas in the images which are not necessary to analyse
Rectification of each pair of images
Circle detection resulting in a structure with the positions (x, y) of the center of each circle (in reality, of each ball)
Association of the resulting positions to get something like a 3D coordinate, to know the position of the balls (this is my problem)
Now the problem (step 6.):
What possibilities are there to compute the 3D coordinates of each ball's center using the 2D coordinates from the two images?
I'm searching here
http://de.mathworks.com/help/vision/stereo-vision.html
for ideas, but I hope you know of an easier way and have some ideas.
I cannot upload any images (because I'm new at Stack Overflow).
Take a look at this example: Depth Estimation from Stereo Video. The example takes a pair of images with a calibrated stereo camera, rectifies the images, detects a person, and gets the 3D coordinates of the person's centroid. You can do the same thing to find the balls.
Calibrate the cameras using the Stereo Camera Calibrator app.
Take two images
Rectify the images using the rectifyStereoImages function
Compute stereo disparity using the disparity function
Get the 3D coordinates for every pixel using the reconstructScene function
Detect the balls in image 1
Look up the 3D coordinates of their centroids
Once you have the disparity and the dense reconstruction from reconstructScene, there is no need to find correspondences between the images; disparity has already done that for you.
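If you ever want to do the same thing in OpenCV instead of MATLAB, a rough analogue of that pipeline looks like this (the calibration values and file names are placeholders, not real data):

import cv2
import numpy as np

# Stereo calibration results; placeholders standing in for your real stereoCalibrate output.
K1 = K2 = np.array([[900.0, 0, 640], [0, 900.0, 360], [0, 0, 1]])
d1 = d2 = np.zeros(5)
R = np.eye(3)                            # rotation between the two cameras
T = np.array([[60.0], [0.0], [0.0]])     # baseline in millimeters

img1 = cv2.imread('left.png', cv2.IMREAD_GRAYSCALE)    # hypothetical file names
img2 = cv2.imread('right.png', cv2.IMREAD_GRAYSCALE)
image_size = img1.shape[::-1]

# Rectification (analogue of rectifyStereoImages).
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, d1, K2, d2, image_size, R, T)
map1x, map1y = cv2.initUndistortRectifyMap(K1, d1, R1, P1, image_size, cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(K2, d2, R2, P2, image_size, cv2.CV_32FC1)
rect1 = cv2.remap(img1, map1x, map1y, cv2.INTER_LINEAR)
rect2 = cv2.remap(img2, map2x, map2y, cv2.INTER_LINEAR)

# Dense disparity and per-pixel 3D coordinates (analogue of disparity / reconstructScene).
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disp = matcher.compute(rect1, rect2).astype(np.float32) / 16.0
points_3d = cv2.reprojectImageTo3D(disp, Q)

# After detecting a ball center at pixel (u, v) in rect1, its 3D position (same units as T) is:
# x, y, z = points_3d[v, u]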

How to convert 3D point cloud (extracted from 3D sparse reconstruction) to millimeters?

Using stereo vision, and based on the Multiple View Geometry book (http://www.robots.ox.ac.uk/~vgg/hzbook/), I have created a 3D point cloud in MATLAB. To do that, I first calibrated the cameras and rectified the stereo images, then did feature extraction and matching, then eliminated the noisy matches based on the camera locations, and finally created the 3D point cloud using triangulation.
Now my question is how to convert this 3D point cloud from the pixel domain to an actual millimeter/centimeter domain, knowing my focal length and camera calibration matrices.
The goal is to find DEPTH IN MILLIMETERS.
I know how to do it in the disparity/depth map case using the formula Z = (t*f)/d.
But here, in the sparse case, can I do something like this: http://matlab.wikia.com/wiki/FAQ#How_do_I_measure_a_distance_or_area_in_real_world_units_instead_of_in_pixels.3F
Or is there a more sophisticated method with a more in-depth explanation?
Thanks.
The formula you wrote is valid only in the special case when the image planes of the two cameras are on the same geometrical plane, and the motion from one to the other is a translation parallel to one of the image axes.
In the general case you'll need to triangulate actual rays in 3D space, using one of the techniques described in that book (it has a whole chapter on reconstruction). The reconstruction will be metric if your calibration is; in particular, if the coordinate transform between the cameras has a translation vector whose units are meters (or millimeters, or inches, ...).
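As a rough OpenCV sketch of that general case (all calibration values below are placeholders): if the translation between the cameras is expressed in millimeters, the triangulated points, and hence the depths, come out in millimeters as well.

import cv2
import numpy as np

K1 = K2 = np.array([[1200.0, 0, 640], [0, 1200.0, 360], [0, 0, 1]])   # placeholder intrinsics
R = np.eye(3)                            # relative rotation between the cameras (placeholder)
t = np.array([[120.0], [0.0], [0.0]])    # translation in millimeters -> metric reconstruction

P1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K2 @ np.hstack([R, t])

pts1 = np.array([[500.0], [400.0]])      # one matched feature per image (placeholder pixels)
pts2 = np.array([[530.0], [400.0]])

X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)   # homogeneous 4xN
X = (X_h[:3] / X_h[3]).T                          # Nx3 points, here already in millimeters
depth_mm = X[:, 2]                                # depth along the first camera's optical axis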