I am trying to calibrate two cameras that are placed at an angular offset (approx. 45 degrees). The cameras have different focal lengths. I searched Google extensively but couldn't find a good description of how to calibrate cameras at an angular offset.
Following the OpenCV stereo calibration tutorial, I am able to do the following steps:
1. Calibrate the left mono-camera and right mono-camera to get the camera parameters : cv2.calibrateCamera(...)
2. Find the new camera matrix for left and right cameras : cv2.getOptimalNewCameraMatrix(...)
3. Stereo calibrate two cameras: cv2.stereoCalibrate(...)
4. Stereo rectification of the two views : cv2.stereoRectify(...)
5. Mapping to undistort the rectified stereo image pairs : cv2.initUndistortRectifyMap(...)
There are two things I am not clear about:
Since the cameras are at an angular offset, the rectified stereo image pairs (step 4) are also at an angular offset, and rightly so, since the two cameras are at an angle. But how do I find the overlapping region, i.e. the view common to both cameras, in the non-rectified images?
How can the two views be aligned similar to Kinect whereby index (i,j) in both images (left and right) corresponds to the same pixel? Specifically, I_left[i, j] == I_right[i, j]. Do we need to do block matching for this?
ARKit updates many intrinsic (and extrinsic) parameters of the ARCamera from frame to frame. I'd like to know whether it also takes radial lens distortion into account (as the AVCameraCalibrationData class does, which ARKit doesn't use) and corrects the video frames' distortion appropriately (distort/undistort operations) for the rear iPhone and iPad cameras.
var intrinsics: simd_float3x3 { get }
As we all know, radial lens distortion greatly affects 6 DOF pose estimation accuracy when we place undistorted 3D objects in a real-world scene that has been distorted by a lens.
var lensDistortionLookupTable: Data? { get }
/* A map of floating-point values describing radial */
/* lens distortions in AVCameraCalibrationData class */
If the lens distortion math in ARKit is available in the API, where can I find it?
Although it's not explicitly stated, I'm certain that ARKit is correcting for non-linear lens distortion. Lens distortion (and inverse distortion) lookup tables exist in iOS11 and are available via AVCameraCalibrationData, but they are not exposed by ARKit, presumably because there is no need for them since you're already working with rectified coordinates.
Whether or not the distortion model parameters are the same for every device of a given model (i.e. exact same values for each iPhone 7) is an interesting question. I don't have access to multiple phones of the same model, but this shouldn't be hard to figure out for someone who does.
As an example, from https://github.com/verebes1/ARKit-Multiplayer :
QR marker detection
With the help of Apple's Vision framework it is now possible to recognize a QR marker in the camera's video feed and track it while it is in the field of view. The framework provides the coordinates of the QR marker's square corners in the screen's coordinate system.
QR marker pose estimation
The next thing you probably want to do after detecting the QR markers is to obtain the camera pose from them.
To perform QR marker pose estimation you need to know the calibration parameters of your camera. This is the camera matrix and distortion coefficients. Each camera lens has unique parameters, such as focal length, principal point, and lens distortion model. The process of finding intrinsic camera parameters is called camera calibration. The camera calibration process is important for Augmented Reality applications because it describes the perspective transformation and lens distortion on an output image. To achieve the best user experience with Augmented Reality, visualization of an augmented object should be done using the same perspective projection.
At the end, what you get after the calibration is the camera matrix, a 3x3 matrix containing the focal lengths and the camera center coordinates (a.k.a. the intrinsic parameters), and the distortion coefficients, a vector of 5 or more elements that models the distortion produced by your camera. The calibration parameters are pretty much the same for most iDevices.
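To make the role of the camera matrix and the 5 distortion coefficients concrete, here is a minimal sketch of how they map a 3D point in camera coordinates to a pixel. It assumes the common radial-plus-tangential model with coefficients ordered (k1, k2, p1, p2, k3), which is the ordering OpenCV uses. The function name and all numeric values are made up for illustration and are not taken from any real device.

```python
def project_point(point_cam, fx, fy, cx, cy, dist=(0.0, 0.0, 0.0, 0.0, 0.0)):
    """Project a 3D point (camera coordinates) to pixel coordinates.

    fx, fy, cx, cy are the entries of the 3x3 camera matrix.
    dist is assumed to be (k1, k2, p1, p2, k3).
    """
    X, Y, Z = point_cam
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    k1, k2, p1, p2, k3 = dist

    # Normalized image coordinates (perspective division).
    x, y = X / Z, Y / Z
    r2 = x * x + y * y

    # Radial and tangential distortion applied in normalized coordinates.
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y

    # Camera matrix: scale by the focal lengths and shift to the principal point.
    return fx * x_d + cx, fy * y_d + cy


# Hypothetical intrinsics; only k1 is non-zero here (mild barrel distortion).
u_ideal, v_ideal = project_point((0.1, 0.0, 1.0), 1000.0, 1000.0, 640.0, 360.0)
u_dist, v_dist = project_point((0.1, 0.0, 1.0), 1000.0, 1000.0, 640.0, 360.0,
                               dist=(-0.2, 0.0, 0.0, 0.0, 0.0))
```

With a negative k1 the distorted point lands slightly closer to the principal point than the ideal one, which is the kind of offset that calibration is meant to account for.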
With the precise location of the marker corners, we can estimate a transformation between our camera and a marker in 3D space. This operation is known as pose estimation from 2D-3D correspondences. The pose estimation process finds a Euclidean transformation (one that consists only of rotation and translation components) between the camera and the object.
C denotes the camera center. The points P1-P4 are 3D points in the world coordinate system, and the points p1-p4 are their projections onto the camera's image plane. Our goal is to find the relative transformation between a known marker position in the 3D world (P1-P4) and the camera C, using the intrinsic matrix and the known point projections on the image plane (p1-p4).
OpenCV functions are used to calculate the QR marker transformation in such a way that it minimizes the reprojection error, that is the sum of squared distances between the observed projection's imagePoints and the projected objectPoints. The estimated transformation is defined by rotation (rvec) and translation components (tvec). This is also known as Euclidean transformation or rigid transformation. At the end we get rotation quaternion and a translation matrix of the QR marker.
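The reprojection error mentioned above can be written out in a few lines. This is only a sketch of the quantity being minimized, not of the solver. It assumes an ideal pinhole camera without distortion, and the marker size, pose and intrinsics are hypothetical values chosen for illustration.

```python
def project(object_points, R, t, fx, fy, cx, cy):
    """Project 3D object points into the image with rotation R (3x3) and translation t."""
    image_points = []
    for X in object_points:
        # Transform the point into camera coordinates: Xc = R * X + t
        xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
        image_points.append((fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy))
    return image_points


def reprojection_error(observed, projected):
    """Sum of squared pixel distances between observed and projected points."""
    return sum((u - pu) ** 2 + (v - pv) ** 2
               for (u, v), (pu, pv) in zip(observed, projected))


# Hypothetical 0.1 m square marker, 1 m in front of the camera, no rotation.
corners = [(-0.05, -0.05, 0.0), (0.05, -0.05, 0.0),
           (0.05, 0.05, 0.0), (-0.05, 0.05, 0.0)]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
projected = project(corners, identity, (0.0, 0.0, 1.0), 1000.0, 1000.0, 640.0, 360.0)

# Pretend each detected corner is off by one pixel horizontally.
observed = [(u + 1.0, v) for (u, v) in projected]
error = reprojection_error(observed, projected)
```

A pose solver searches for the rotation and translation that make this number as small as possible for the detected corners.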
Integration into Apple's ARKit
The final part is the integration of all the information about QR marker's pose into the 3D scene created by ARKit. ARKit uses Visual Inertial Odometry (VIO) to accurately track the world around it. VIO fuses camera sensor data with CoreMotion data. These two inputs allow the device to sense how it moves within a room with a high degree of accuracy, and without any additional calibration. All the rendering stuff is based on Apple's Metal and Apple's SceneKit above it.
In order to render SceneKit's node on our QR marker in a proper way we need to create a model matrix of our QR marker from the quaternion and translation matrix we've got from OpenCV. The next step is to multiply QR marker's model matrix by SceneKit scene virtual camera's transform matrix. As a result, we can see a custom node (Axes node in our project) that repeats all the QR marker's movements in the real world while it's in the field of view of the iPhone's camera and if it is not - it stays on the last updated position so we can examine it around.
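As a rough sketch of the matrix step described above, the following builds a 4x4 model matrix from a quaternion and a translation and then composes it with a camera transform. It is written in plain Python rather than with SceneKit types, and it assumes (w, x, y, z) quaternion order, row-major matrices and column vectors. It also leaves out the axis-convention change between OpenCV's camera frame (y down, z forward) and SceneKit's (y up, camera looking along -z), which a real integration would need. All names and values are hypothetical.

```python
import math


def quat_to_rotation(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n  # normalize defensively
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def model_matrix(q, t):
    """Build a 4x4 rigid transform from a quaternion and a translation."""
    R = quat_to_rotation(q)
    return [R[0] + [t[0]], R[1] + [t[1]], R[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]


def matmul4(A, B):
    """Multiply two 4x4 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


# Marker pose relative to the camera: no rotation, 0.5 units along -z.
marker_in_camera = model_matrix((1.0, 0.0, 0.0, 0.0), (0.0, 0.0, -0.5))

# Camera pose in the world: rotated 90 degrees about y, positioned at x = 1.
half = math.pi / 4
camera_in_world = model_matrix((math.cos(half), 0.0, math.sin(half), 0.0),
                               (1.0, 0.0, 0.0))

# Marker pose in the world: camera transform applied to the marker's pose.
marker_in_world = matmul4(camera_in_world, marker_in_camera)
```

The translation column of marker_in_world is where the node would be placed in the scene. When the marker leaves the field of view, simply not updating this matrix gives the "stays at the last updated position" behaviour.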
After stereo calibration, when I run the Matlab example for stereo depth estimation (SDE), the distances are wrong: at about 2 meters, it always reports distance as less than 1m.
And my 3D scene reconstruction looks cone-shaped instead of like the real scene.
Disparity map is very noisy (non-smooth), but resembles the scene.
If I 'feed' the SDE script the example file instead of webcam input, it runs okay, all looks great; when I feed it from two webcams ( 'Logitech HD Pro Webcam C920' ) that's when I get the above bad results, beginning with rough disparity map.
I've tried many different calibration attempts with just a few images up to about 60, with Matlab's checkerboard pattern at different angles (never > 45) and distance to cameras about 8 to 20'. Camera lenses are 3.8175" apart always, and are mounted to top edge of laptop. Followed Matlab's recommended workflow.
What am I doing wrong in calibration?
Matlab R2015a.
Laptop Windows 7 64-bit
Checkerboard pattern is 37" x 27"
Was creating disparity map with this:
disparityMap = disparity(frameLeftGray, frameRightGray);
However, my camera #1 is on the right, and Matlab says default disparity range is [0 64] and for cam #1 right it should be [-128 0], but that changes the disparity map to all uniform blue.
I got it working. (1) The left/right assignment of the calibration, the images, and the detection data structures must match. (2) Use mm for the checkerboard square size. Inches cause a malfunction, because everything else is in mm.
First I will try to explain what I have to do, and then I will ask my question about the problem.
My task is to detect small balls (2mm) in gelatine using two webcams.
The steps for detection are these:
1. Image acquisition using two webcams (positioned at 90 degrees to each other)
2. Stereo calibration of each pair of images
3. Masking of the image areas that do not need to be analysed
4. Rectification of each pair of images
5. Circle detection, resulting in a structure with the positions (x, y) of the center of each circle (in reality, of each ball)
6. Association of the resulting positions to get something like a 3D coordinate giving the position of each ball (this is my problem)
Now the problem (step 6):
What possibilities are there to compute the 3D coordinates of each ball center from the 2D coordinates in the two images?
I'm searching here for ideas, but I hope you know some easy way and have some ideas.
I can not upload any images (because I'm new at stackoverflow)
Take a look at this example: Depth Estimation from Stereo Video. The example takes a pair of images with a calibrated stereo camera, rectifies the images, detects a person, and gets the 3D coordinates of the centroid of the person. You can do the same thing to find the balls.
Calibrate the cameras using the Stereo Camera Calibrator app.
Take two images
Rectify the images using the rectifyStereoImages function
Compute stereo disparity using the disparity function
Get the 3D coordinates for every pixel using the reconstructScene function
Detect the balls in image 1
Look up the 3D coordinates of their centroids
Once you have the disparity and the dense reconstruction from reconstructScene there is no need to find correspondences between the images. disparity already did that for you.
Using stereo vision and based on the Multiple View Geometry book (http://www.robots.ox.ac.uk/~vgg/hzbook/), I have created a 3D point cloud in MATLAB. To do that, I first calibrated the cameras and rectified the stereo images. Then I did feature extraction and matching, and eliminated the noisy matches based on the camera locations. Finally, I created the 3D point cloud using triangulation.
Now my question is how to convert this 3D point cloud from pixel domain to actual millimeter/centimeter domain knowing my focal length and camera calibration matrices?
The goal is to find depth in millimeters.
I know how to do it in disparity/depth map case using formula: Z=(t*f)/d.
But here in the sparse case, can I do something like this? http://matlab.wikia.com/wiki/FAQ#How_do_I_measure_a_distance_or_area_in_real_world_units_instead_of_in_pixels.3F
Or is there a more sophisticated method with a more in-depth explanation?
The formula you wrote is valid only in the special case when the image planes of the two cameras are on the same geometrical plane, and the motion from one to the other is a translation parallel to one of the image axes.
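For that special (rectified) case, the formula Z = (t*f)/d can be sketched as below. It assumes the focal length f and the disparity d are both in pixels, so the depth comes out in whatever unit the baseline t is given in. The function name and the numbers are hypothetical.

```python
def depth_from_disparity(baseline, focal_px, disparity_px):
    """Depth Z = (t * f) / d for a rectified stereo pair.

    baseline     : distance t between the camera centers (e.g. in mm)
    focal_px     : focal length f in pixels
    disparity_px : disparity d in pixels (must be positive)
    The result is in the same unit as the baseline.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return baseline * focal_px / disparity_px


# Hypothetical values: 100 mm baseline, 1000 px focal length, 50 px disparity.
z_mm = depth_from_disparity(100.0, 1000.0, 50.0)

# The same baseline expressed in inches gives the depth in inches instead.
z_in = depth_from_disparity(100.0 / 25.4, 1000.0, 50.0)
```

The pixel units of f and d cancel, which is why the only physical unit that matters is that of the baseline.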
In the general case you'll need to triangulate actual rays in 3D space, using one of the techniques described in that book (it has a whole chapter on reconstruction). The reconstruction will be metric if your calibration is, in particular if the coordinate transform between the cameras has a translation vector expressed in meters (or millimeters, or inches, ...).
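One simple way to triangulate two rays is the midpoint method: find the closest points on the two rays and take the point halfway between them. This is a sketch of that one technique only, not necessarily the method the book recommends. It assumes the camera centers and ray directions are already expressed in a common coordinate frame whose translation is in millimeters; the function names and the sample geometry are made up for illustration.

```python
def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2, eps=1e-12):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    w0 = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < eps:
        raise ValueError("rays are (nearly) parallel")
    # Ray parameters of the two closest points.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))


# Hypothetical setup in mm: camera 1 at the origin looking along +z,
# camera 2 at (100, 0, 100) looking along -x (90 degrees to camera 1).
point = triangulate_midpoint((0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
                             (100.0, 0.0, 100.0), (-1.0, 0.0, 0.0))
```

Because the camera centers are given in millimeters, the returned point is in millimeters too. If the extrinsic translation were only known up to scale, the result would carry the same unknown scale.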
I have been trying to do a stereo calibration with the Matlab camera calibration toolbox. I have two cameras being triggered at the same time, and I'm grabbing corners from 25 pairs of images. The individual calibrations are working, though one camera calibration uses only 24 of the 25 images (when I reproject on images, only 24 images pop up). When I try to use the L and R calibration .mat files for a stereo calibration, it throws "Disabling view XX - L and R views are found inconsistent" for every single pair (and it says there are only 24 pairs of images, not 25). I've read the help file but I don't think that it addresses my problem. Please advise!
Please check out the CameraCalibrator App that is a part of the Computer Vision System Toolbox for MATLAB. It gives you a very easy way to calibrate a single camera, including automatic checkerboard detection. It does not let you calibrate a stereo pair, but it gets you halfway there.