First I will try to explain what I have to do, and then I will ask my question about the problem.
My task is to detect small balls (2 mm) in gelatine using two webcams.
The steps for detection are these:
1. Image capture with the two webcams (positioned at 90 degrees to each other)
2. Stereo calibration of each pair of images
3. Masking of the areas in the images that do not need to be analysed
4. Rectification of each pair of images
5. Circle detection, resulting in a structure with the (x, y) position of the center of each circle (in reality, of each ball)
6. Association of the resulting positions to get something like a 3D coordinate, so I know the position of the balls (this is my problem)
Now the problem (step 6):
What possibilities are there to compute the 3D coordinates of each ball's center using the 2D coordinates from the two images?
I'm searching here
http://de.mathworks.com/help/vision/stereo-vision.html
for ideas, but I hope you know an easier way.
I cannot upload any images (because I'm new on Stack Overflow).
Take a look at this example: Depth Estimation from Stereo Video. The example takes a pair of images from a calibrated stereo camera, rectifies the images, detects a person, and gets the 3D coordinates of the centroid of the person. You can do the same thing to find the balls.
Calibrate the cameras using the Stereo Camera Calibrator app.
Take two images
Rectify the images using the rectifyStereoImages function
Compute stereo disparity using the disparity function
Get the 3D coordinates for every pixel using the reconstructScene function
Detect the balls in image 1
Look up the 3D coordinates of their centroids
Once you have the disparity and the dense reconstruction from reconstructScene, there is no need to find correspondences between the images; disparity already did that for you. A minimal sketch of these steps follows.
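Here is a sketch of those steps, assuming stereoParams was exported from the Stereo Camera Calibrator app; the file names and the circle radius range are placeholders you would tune for 2 mm balls at your working distance:

    % Load the two ball images (placeholder file names)
    I1 = imread('left.png');
    I2 = imread('right.png');

    % Rectify so that corresponding points lie on the same image rows
    [J1, J2] = rectifyStereoImages(I1, I2, stereoParams);

    % Dense disparity, then a 3D point for every pixel (in world units)
    disparityMap = disparity(rgb2gray(J1), rgb2gray(J2));
    xyzPoints    = reconstructScene(disparityMap, stereoParams);

    % Detect circles in the rectified left image; tune the radius range
    [centers, radii] = imfindcircles(J1, [3 15]);

    % Look up the 3D coordinates at each detected center
    centers = round(centers);
    for k = 1:size(centers, 1)
        ballXYZ = squeeze(xyzPoints(centers(k,2), centers(k,1), :));
        fprintf('Ball %d at (%.2f, %.2f, %.2f)\n', k, ballXYZ);
    end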
Related
I am trying to calibrate two cameras that are placed at an angular offset (approximately 45 degrees). The cameras have different focal lengths. I did a lot of Google searching but couldn't really find a good description of how to calibrate cameras at an angular offset.
Following the OpenCV stereo calibration tutorial, I am able to do the following steps.
1. Calibrate the left mono camera and the right mono camera to get the camera parameters: cv2.calibrateCamera(...)
2. Find the new camera matrix for the left and right cameras: cv2.getOptimalNewCameraMatrix(...)
3. Stereo-calibrate the two cameras: cv2.stereoCalibrate(...)
4. Stereo rectification of the two views: cv2.stereoRectify(...)
5. Mapping to undistort the rectified stereo image pairs: cv2.initUndistortRectifyMap(...)
There are two things I am not clear about:
Since the cameras are at an angular offset, the rectified stereo image pairs (step 4) are at an angular offset too, and rightly so, since the two cameras are at an angle. But how do I find the overlapping region, i.e., the view common to both cameras, in the non-rectified images?
How can the two views be aligned, similar to the Kinect, whereby index (i, j) in both images (left and right) corresponds to the same pixel? Specifically, I_left[i, j] == I_right[i, j]. Do we need to do block matching for this?
I have an RGB image and a point cloud acquired by LIDAR.
In the RGB image I detect a feature, let's say a circle.
I want to use this circle as an ROI in my 3D point cloud.
How can I do that? I was thinking of producing a 3D point cloud from the RGB image through the camera parameters and then matching the two with the ICP algorithm.
The problem is that the moment I produce the point cloud from the 2D image, my coordinate system changes, so I no longer know the position of my circle.
To perform the 3D reconstruction, I use the triangulateMultiview function.
"I was thinking of producing a 3D point cloud from the RGB image through the camera parameters and then matching the two with the ICP algorithm."
-> This would not work and would not be efficient.
Actually, there is a much better way. Assuming that you know the extrinsics between the camera and the lidar, any circle (or ellipse) in the image can be extended into a 3D cone using the camera intrinsics, and by selecting the points within the cone you can do the ROI operation.
Let's say you can define an ellipse on your image plane by detecting it and finding the parameters of its ellipse equation. The ellipse equation can be extended into a quadric (cone) equation representing the 3D cone. Now the only thing left is testing whether each 3D point lies within the cone by plugging it into the cone equation.
This is a mathematically somewhat complicated problem if you are not comfortable with the camera model or quadric equations. A sketch of the equivalent projection-based test is below.
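As an illustration, here is a minimal MATLAB sketch of that test done by projection, which is equivalent to forming the explicit cone. It assumes the lidar-to-camera extrinsics R (3x3) and t (3x1), the intrinsic matrix K, and the detected ellipse parameters cx, cy (center), a, b (semi-axes) and theta (orientation) are all known; every variable name here is hypothetical:

    % P is an N-by-3 matrix of lidar points in the lidar frame
    Pc = (R * P' + t)';                  % lidar frame -> camera frame
    inFront = Pc(:,3) > 0;               % drop points behind the camera

    p = (K * Pc')';                      % project with the intrinsics
    u = p(:,1) ./ p(:,3);
    v = p(:,2) ./ p(:,3);

    % Ellipse-interior test in the image plane (the 2D slice of the cone)
    ct = cos(theta); st = sin(theta);
    xr =  ct*(u - cx) + st*(v - cy);
    yr = -st*(u - cx) + ct*(v - cy);
    inside = (xr/a).^2 + (yr/b).^2 <= 1;

    roiPoints = P(inFront & inside, :);  % the 3D region of interest

A point lies inside the ellipse exactly when its ray lies inside the back-projected cone, so this selects the same 3D ROI without forming the quadric explicitly.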
Using stereo vision, and based on the Multiple View Geometry book (http://www.robots.ox.ac.uk/~vgg/hzbook/), I have created a 3D point cloud in MATLAB. To do that, I first calibrated the cameras and rectified the stereo images. Then I did feature extraction and matching, eliminated the noisy matches based on the camera locations, and finally created the 3D point cloud using triangulation.
Now my question is: how do I convert this 3D point cloud from the pixel domain to an actual millimeter/centimeter domain, knowing my focal length and the camera calibration matrices?
The goal is to find the depth in millimeters.
I know how to do it in the disparity/depth-map case using the formula Z = (t*f)/d.
But here, in the sparse case, can I do something like this:
http://matlab.wikia.com/wiki/FAQ#How_do_I_measure_a_distance_or_area_in_real_world_units_instead_of_in_pixels.3F
or is there a more sophisticated method with a more in-depth explanation?
Thanks.
The formula you wrote is valid only in the special case when the image planes of the two cameras are on the same geometrical plane, and the motion from one to the other is a translation parallel to one of the image axes.
In the general case you'll need to triangulate actual rays in 3D space, using one of the techniques described in that book (it has a whole chapter on reconstruction). The reconstruction will be metric if your calibration is; in particular, if the coordinate transform between the cameras has a translation vector whose units are meters (or millimeters, or inches, ...). A minimal MATLAB example follows.
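A minimal sketch with MATLAB's Computer Vision Toolbox, assuming stereoParams comes from a calibration whose checkerboard square size was specified in millimeters, and the matched points are from the original (undistorted) images:

    % worldPoints comes out in the units of the calibration target,
    % here millimeters; depth is simply the third column
    worldPoints = triangulate(matchedPoints1, matchedPoints2, stereoParams);
    depthMM = worldPoints(:, 3);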
I copied the MATLAB code from: http://www.mathworks.fr/fr/help/vision/ug/stereo-image-rectification.html
I can compute the 3D coordinates, but I am not sure they are correct.
Starting from the disparity map and calculating the 3D coordinates, how do we take into account the warping transforms tform1 and tform2?
The problem here is that you are using uncalibrated cameras. In this case you can only get an up-to-scale reconstruction; if you want the 3D points in world units, you would need to know the actual distances to some points in the world.
I think you would be better off calibrating your stereo system. Please see this example. If calibration is not an option, a sketch of fixing the scale from one known distance follows.
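A sketch of that fallback, where ptsA, ptsB, knownDistMM and points3D are hypothetical names for two reconstructed 3D points, their measured real-world separation in millimeters, and the N-by-3 up-to-scale cloud:

    % One measured distance fixes the global scale of the reconstruction
    scale = knownDistMM / norm(ptsA - ptsB);
    metricPoints = scale * points3D;    % now in millimeters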
I am using the Camera Calibration Toolbox for MATLAB. After calibration I have the intrinsic and extrinsic parameters of the stereo camera system. Next, I would like to determine the distance between the camera system and the object. To get this information, I used the stereo_triangulation function included in the Toolbox. Its inputs are two matrices containing the pixel coordinates of the correspondences in the left and right images.
I tried to get the coordinates of the correspondences using the basic block matching method described in MATLAB's help for stereo vision.
The resolution of my pictures is 1280x960 pixels. I know that the biggest disparity is around 520 pixels, so I set the maximum of the disparity range to 520. But then determining the coordinates takes ages; it is not usable in practice. Calculating a disparity map is much faster with MATLAB's disparity() function, but I want the step before that - the coordinates of the correspondences.
Can you please suggest how I can efficiently get the coordinates with MATLAB?
Disparity and 3D are related by simple formulas (see below), so the time for calculating the 3D data and the disparity map should be the same. The notation is:
f - focal length in pixels,
B - baseline (the separation between the cameras),
u, v - column and row in a coordinate system centered on the middle of the image,
d - disparity,
x, y, z - 3D coordinates.
z = f*B/d;
x = z*u/f;
y = z*v/f;
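A direct implementation of these formulas over a whole disparity map D, assuming the principal point sits at the image center:

    [rows, cols] = size(D);
    [c, r] = meshgrid(1:cols, 1:rows);
    u = c - cols/2;                % column offset from the image center
    v = r - rows/2;                % row offset from the image center

    valid = D > 0;                 % guard against invalid disparities
    Z = NaN(rows, cols);
    Z(valid) = f * B ./ D(valid);  % depth, in the units of B
    X = Z .* u / f;
    Y = Z .* v / f;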
1280x960 is too large a resolution for any correlation-based stereo to work in real time. Think about it: you have to loop over a 2D image, over a 2D correlation window, and over the range of disparities. That means 5 nested loops! I don't work with MATLAB anymore, but I know that it is quite slow.