Derive a rotation/transformation matrix given an image and a rotated image in Java? - affinetransform

I need some advice and a pointer in the right direction.
My object detection system reads in this image (see below) and returns coordinates of bounding boxes for its detection results (in this case, a hammer)
http://i1116.photobucket.com/albums/k572/Ruihong_Zhou/z3IJx-1.png
However, I wish to examine the accuracy of the detection results for the same image by feeding the system rotated versions of the original image and letting it detect and return coordinates for any results.
For example:
http://i1116.photobucket.com/albums/k572/Ruihong_Zhou/myJQA-1.jpg
Let's say the coordinates of the yellow point (in the image above) are found, but they are with respect to the rotated frame of reference. How do I transform/rotate these coordinates to find out where they actually lie in the original image, with respect to the original frame of reference?
Someone pointed out that I should use an affine transformation, but I'm not sure how to go about it; honestly, this is the first time I have heard of affine transformations and I'm still trying to learn them by brute force.
Further research indicates that I need both a set of coordinates in the original image and the same set of coordinates in the rotated image to come up with a transformation matrix, but I only have the detected set of coordinates in the rotated image.
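Since you generate the rotated images yourself, the rotation angle is already known, so you don't actually need point correspondences to estimate a matrix - you can invert the known rotation directly. A hedged sketch of the math in MATLAB-style code (variable names such as imOrig, imRot, xr, yr are assumptions, and the sign of theta depends on your rotation routine's convention):
% Map a detection (xr, yr) found in the rotated image back into the
% original frame. Assumes the rotation was about the image centre and
% the output canvas was enlarged to hold the whole rotated image.
theta = 30;                                      % the angle you rotated by, degrees
cOrig = (fliplr(size(imOrig(:,:,1))) + 1) / 2;   % original image centre, [x y]
cRot  = (fliplr(size(imRot(:,:,1)))  + 1) / 2;   % rotated image centre,  [x y]
R     = [cosd(theta) -sind(theta); sind(theta) cosd(theta)];  % forward rotation
pOrig = R' * ([xr; yr] - cRot') + cOrig';        % R' undoes R (R is orthonormal)
In Java, AffineTransform.getRotateInstance(Math.toRadians(theta), cx, cy) builds the same forward rotation about a chosen anchor point, and its createInverse() (or inverseTransform(ptSrc, ptDst)) maps rotated-frame coordinates back to the original frame.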

Related

Measuring objects in a photo taken by calibrated cameras, knowing the size of a reference object in the photo

I am writing a program that captures real-time images of a scene from two calibrated cameras (so the internal parameters of the cameras are known to us). Using two-view geometry, I can find the essential matrix and use OpenCV or MATLAB to find the relative position and orientation of one camera with respect to the other. Given the essential matrix, Hartley and Zisserman's Multiple View Geometry shows that one can reconstruct the scene using triangulation, up to scale. Now I want to use a reference length to determine the scale of the reconstruction and resolve this ambiguity.
I know the height of the front wall and I want to use it for determining the scale of reconstruction to measure other objects and their dimensions or their distance from the center of my first camera. How can it be done in practice?
Thanks in advance.
Edit: To add more information, I have already done linear triangulation (minimizing the algebraic error), but I am not sure how useful it is because there is still a scale ambiguity that I don't know how to get rid of. My ultimate goal is to recognize an object (like a Pepsi can), separate it into a rectangular area (which is going to be written as a separate module by someone else), and then find the distance from the camera to each pixel in this rectangular area, i.e. the region of interest. The distance from the camera to the object will then be the minimum of the distances from the camera to the 3D coordinates of the pixels in the region of interest.
Might be a bit late, but hopefully this helps someone struggling with the same stuff.
As far as I remember, it is actually a linear problem. You have the essential matrix, which gives you a rotation matrix and a normalized translation vector specifying the relative position of the cameras. If you followed Hartley and Zisserman, you probably chose one of the cameras as the origin of the world coordinate system, meaning all your triangulated points are at a normalized distance from this origin. What is important is that the direction of every triangulated point is correct.
If you have some reference in the scene (let's say the height of the wall), then you just have to find this reference (2 points are enough - say, opposite ends of the wall) and calculate a "normalization coefficient" (sorry for the terminology) as
coeff = realWorldDistanceOf2Points / distanceOfTriangulatedPoints
Once you have this coeff, just multiply all your triangulated points by it and you get real-world points.
Example:
You know that opposite corners of the wall are 5 m from each other. You find these corners in both images, triangulate them (let's call the triangulated points c1 and c2), calculate their distance in the "normalized" world as ||c1 - c2||, and get
coeff = 5 / ||c1 - c2||
and you get the real 3D world points as triangulatedPoint*coeff.
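As a minimal MATLAB sketch (assuming c1 and c2 are the triangulated 3-D corner points and pts is a 3xN matrix of all triangulated points - names are assumptions):
realDist = 5;                         % known corner-to-corner distance, metres
coeff    = realDist / norm(c1 - c2);  % normalization coefficient
ptsWorld = coeff * pts;               % triangulated points, now in metric units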
Maybe an easier option is to have both cameras in a fixed relative position and calibrate them together with the stereoCalibrate function in OpenCV/MATLAB (there is actually a pretty nice GUI in MATLAB for that) - it returns not just the intrinsic parameters but also the extrinsics. But I don't know if this is your case.

How can I rotate connected components so that they are upright in Matlab?

Currently I am working with a sudoku grid and I have the binary image. I am using regionprops to get the areas of the connected components and then turn the rest of the image black. After this I call the OCR method to try to read the sudoku numbers. The problem is that this only works if the sudoku grid in the image is straight and upright; if it is rotated even a little bit, I am not able to pull the numbers. This is the code I have so far:
% get grid connected parts
conn_part = bwconncomp(im_binary);
% blacken area outside
stats = regionprops(conn_part,'Area');
im_out = im_binary; % Make mask
im_out(vertcat(conn_part.PixelIdxList{[stats.Area] < 825 | [stats.Area] > 2500})) = 0;
imagesc(im_out);
title("Numbers pulled");
sudokuNum = ocr(im_out,'TextLayout','Block','CharacterSet','0123456789');
sudokuNum.Text;
where im_binary is the binary image, im_out is the output image, and stats is the object returned by regionprops containing the areas of the connected components.
I know I can rotate the image before getting the OCR results by doing:
im_out = imrotate(im_out, angle)
However, I don't know what angle the grid is at, since this is part of a function that loops over multiple images. I looked into regionprops because there is an 'Orientation' attribute I could pull from it, but I don't understand how I would actually use it. The documentation also states that regionprops returns a value between -90 and 90 degrees, but my image could be rotated by more than 90 degrees.
Don't rotate the connected components or the binary image. First use the binary image to determine the rotation, then rotate the original grey-scale or colour input image, and then binarize the rotated image. You'll be able to transform with interpolation, which will improve your results greatly. It does require doing the binarization step twice, but I don't think that step is usually too expensive.
The regionprops orientation feature is computed by "fitting" an ellipse to the shape. This is meaningful only for elongated objects. For a square sudoku grid this will not yield any valuable information.
Instead, look at the angle at which the smallest Feret diameter is obtained. The Feret diameters are the lengths of the projections of the shape at arbitrary angles; at one angle this projection is smallest, and by necessity that angle corresponds to one of the principal axes of the square. Here is more information about how to compute Feret diameters in MATLAB.
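If your MATLAB release is recent enough, regionprops can return this directly (an assumption about your version - the Feret properties were added to regionprops in R2019a; on older releases you'd compute the projections yourself):
props = regionprops(im_binary, 'MinFeretProperties');  % smallest projection per component
skew  = props(1).MinFeretAngle;   % angle of the smallest Feret diameter, in degrees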
A different alternative is e.g. to use the Hough transform to detect the lines of the grid.
Do note that the geometry of the puzzle will never tell you which side is up. The angle you get here should be taken modulo π/2 (i.e. constrained to the range -π/4 to π/4).
To know which direction is up, you might try to read the text; if that fails, rotate by 90 degrees and try again.
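A rough sketch of the Hough alternative (assuming im_gray is the original grey-scale image from which im_binary was made; the sign of the correction depends on your conventions):
[H, thetaDeg, ~] = hough(im_binary);        % vote over line normal angles
peaks  = houghpeaks(H, 10);                 % the ten strongest lines
angles = thetaDeg(peaks(:, 2));             % their normal angles, in degrees
skew   = mode(angles);                      % dominant grid angle
skew   = mod(skew + 45, 90) - 45;           % reduce modulo 90 to (-45, 45]
im_upright = imrotate(im_gray, -skew, 'bilinear');  % rotate, then re-binarize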

Verify that camera calibration is still valid

How do you determine that the intrinsic and extrinsic parameters you have calculated for a camera at time X are still valid at time Y?
My idea would be to:
1. Use a known calibration object (a chessboard) and place it in the camera's field of view at time Y.
2. Calculate the chessboard corner points in the camera's image (at time Y).
3. Define one of the chessboard corner points as the world origin and calculate the world coordinates of all remaining chessboard corners relative to that origin.
4. Relate the coordinates from step 3 to the camera coordinate system.
5. Use the parameters calculated at time X to compute the image points of the points from step 4.
6. Calculate the distances between the points from step 2 and the points from step 5.
Is that a clever way to go about it? I'd eventually like to implement it in MATLAB and later possibly OpenCV. I think I'd know how to do steps 1-2 and step 6. Maybe someone can give a rough implementation of steps 2-5. In particular, I'm unsure how to relate the "chessboard-world-coordinate-system" to the "camera-world-coordinate-system", which I believe I would have to do.
Thanks!
If you have a single camera you can easily follow the steps from this article:
Evaluating the Accuracy of Single Camera Calibration
For step 2, you can use the detectCheckerboardPoints function from MATLAB:
[imagePoints, boardSize, imagesUsed] = detectCheckerboardPoints(imageFileNames);
Assuming that you are talking about stereo cameras: for stereo pairs, imagePoints(:,:,:,1) are the points from the first set of images and imagePoints(:,:,:,2) are the points from the second set. The output contains M [x y] coordinates; each coordinate represents a point where square corners are detected on the checkerboard. The number of points the function returns depends on boardSize, which indicates the number of squares detected. The function detects the points with sub-pixel accuracy.
As you can see in the following image, the points are estimated relative to the first point, which covers your third step.
[The image is from this page at MATHWORKS.]
You can consider point 1 the origin of your coordinate system (0,0). The directions of the axes are shown in the image and you know the distance between the points (in world coordinates), so it is just a matter of depth estimation.
To find a transformation matrix between the points in the world CS and the points in the camera CS, you should collect a set of point correspondences and perform an SVD to estimate the transformation matrix.
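Putting the question's steps 2-6 together, a hedged sketch using Computer Vision Toolbox functions (squareSize, imageAtTimeY and cameraParamsAtTimeX are assumed names):
[imagePoints, boardSize] = detectCheckerboardPoints(imageAtTimeY);   % step 2
worldPoints = generateCheckerboardPoints(boardSize, squareSize);     % step 3: corner 1 as origin
[R, t] = extrinsics(imagePoints, worldPoints, cameraParamsAtTimeX);  % step 4: board pose
projected = worldToImage(cameraParamsAtTimeX, R, t, ...              % step 5: reproject
    [worldPoints, zeros(size(worldPoints, 1), 1)]);
rmsErr = sqrt(mean(sum((projected - imagePoints).^2, 2)))            % step 6: RMS pixel error
If the calibration is still valid, rmsErr should stay close to the reprojection error reported at time X; a clearly larger value suggests the parameters have drifted.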
But,
I would estimate the parameters of the camera and compare them with the initial parameters from time X. This is easier if you have saved the images that were used when calibrating the camera at time X. By repeating the calibration process using those images, you should get very similar results if the camera calibration is still valid.
Edit: Why do you need the set of images used in the calibration process at time X?
You have a set of images used to do the calibration the first time, right? To recalibrate the camera you need to use a new set of images, but for checking the previous calibration you can use the previous images. If the parameters of the camera have changed, there will be an error between the re-estimation and the first estimation. This can be used for evaluating the validity of the calibration, not for recalibrating the camera.

Transformed image should always be visible

I am trying to transform an image using bilinear interpolation. My input image is I and I have an affine matrix [A] that gives me the transformed image I'. For the bilinear interpolation I take the inverse of the affine matrix, inv([A]), and apply it to every point of the output image (which is all zeros initially). Since the size of the output image can't be guaranteed in advance, I first found the bounds of the transformed image so I could size the output accordingly.
Now I have the input image, the affine matrix, and an output image at least large enough to hold the transformed image. But if I apply the backward method of warping, I have to iterate through every pixel of the output image (which is all zeros right now). I want my transformed image at the centre, so that it is always fully visible. Any idea how I can do that?
Note: I don't want to use MATLAB's built-in function.
EDIT
If I transform image A I get B, but as you can see the corners of the image get cropped; I want those to be shown as well.
When rotating a rectangle from the upright position to a diagonal one, the vertical distance between the highest and lowest point will increase.
Now there are two approaches you can take:
Put the new picture in a bigger environment
OR
Rescale the rotated picture to make it fit in the original-sized environment.
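For the first option, a hedged sketch (assuming A is the 3x3 affine matrix in homogeneous form and the input image is h-by-w): compute where the four corners land, size the canvas from their bounding box, and fold a translation into A so nothing falls outside.
corners = [1 w w 1;    % x coordinates of the four input corners
           1 1 h h;    % y coordinates
           1 1 1 1];   % homogeneous ones
tc   = A * corners;                            % transformed corners
outW = ceil(max(tc(1,:)) - min(tc(1,:)) + 1);  % output canvas width
outH = ceil(max(tc(2,:)) - min(tc(2,:)) + 1);  % output canvas height
T = [1 0 1-min(tc(1,:));                       % shift so the minimum corner maps to (1,1)
     0 1 1-min(tc(2,:));
     0 0 1];
Ashifted = T * A;   % use inv(Ashifted) in the backward-warping loop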

How to deduce the angle an image was rotated through?

I have an image that was rotated by an unknown angle, and I don't have the original image. How do I determine the angle of rotation with MATLAB commands?
I need to rotate the image back by this angle to recover the original image.
As @High Performance Mark mentions in his comment, it is difficult to give an answer when it is unclear how you can recognize that the image is rotated, or what would make you decide that the rotation has been properly corrected.
In other words, you will first have to find a way to determine the rotation angle by analyzing the image with respect to specific features that inform you about a potential rotation. For example, if your image contains a face, you'd do face detection (for which there is plenty of code on the File Exchange) and then rotate so that the eyes are up and the mouth is down. If your image contains lines that should be vertical and/or horizontal in an un-rotated image, you can apply a Hough transform to your image and find the most likely angle of rotation using houghpeaks.
Finally, to rotate your image, you can use imrotate.
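A hedged sketch of the Hough route (img is an assumed variable name; the sign of the correction and the modulo-90 ambiguity are yours to resolve):
bw = edge(rgb2gray(img), 'canny');   % edge map; drop rgb2gray if already grey-scale
[H, thetaDeg, ~] = hough(bw);
peak = houghpeaks(H, 1);             % single strongest line
skew = thetaDeg(peak(2));            % its normal angle, in degrees
restored = imrotate(img, skew, 'bilinear', 'crop');  % may need -skew instead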
Without examples or a more detailed description, it's hard to give good advice. But generally, this can be done for some types of images.
For example, suppose the image shows buildings, poles, furniture, or something else that should have vertical edges. Run an edge detector, then take a Fourier transform. For an unrotated image there should be peaks, or some visible pattern in the power spectrum, along the Y axis. The power spectrum rotates the same way as the image, so if you can devise an algorithm to find the spectral feature that indicates vertical edges, you can measure its angle w.r.t. the origin (zero frequency); that is the angle of image rotation.
But you will have to distinguish that particular feature from all the other image features that show up in the power spectrum. Have fun with that - this is the kind of detail that will take most of your creativity and time.
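A very rough sketch of that idea (all names are assumptions, and the hard feature-discrimination step is crudely replaced by "take the orientation with the most spectral energy"):
bw = edge(rgb2gray(img), 'canny');   % edge map; drop rgb2gray if already grey-scale
P  = abs(fftshift(fft2(bw))).^2;     % centred power spectrum
[rows, cols] = size(P);
[X, Y] = meshgrid((1:cols) - floor(cols/2) - 1, (1:rows) - floor(rows/2) - 1);
binAngle = mod(round(atan2d(Y, X)), 180);               % orientation of each frequency bin, 0-179
energy   = accumarray(binAngle(:) + 1, P(:), [180 1]);  % spectral energy per orientation
[~, k]   = max(energy);                                 % strongest orientation
angleEst = k - 1    % compare against 90, the Y axis of an unrotated image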