What does MATLAB's estimateUncalibratedRectification do, expressed mathematically? What does it calculate exactly?

What does MATLAB's estimateUncalibratedRectification do in mathematical/geometrical terms?
What does it calculate exactly?
As far as I understand, if the camera parameters are unknown, then only the fundamental matrix can be computed from two images, not the essential matrix.
So, as far as I understand, the result of estimateUncalibratedRectification should be ambiguous up to some kind of transformation T, because the fundamental matrix - which is all that can be computed from two images when the camera's intrinsic parameters are unknown - only determines the scene up to an arbitrary projective transformation.
Is this correct?
My guess is that estimateUncalibratedRectification computes a projective transformation P1 for image 1 and another projective transformation P2 for image 2, such that applying P1 and P2 to the corresponding images produces images R1 and R2 that are rectified in the sense that corresponding epipolar lines end up in the same rows.
My question is: how ambiguous is this result?
My feeling is that the resulting transformations P1 and P2 are ambiguous up to some transformation T, but I don't really see what this T can be.
Could someone please explain how estimateUncalibratedRectification works conceptually/mathematically/geometrically, and also tell me what T can be?
In other words, what is the transformation T which, when applied to R1 and R2, results in an image pair TR1 and TR2 that has the same rectified property as R1 and R2, namely that corresponding epipolar lines appear in matching rows in TR1 and TR2, just like they do in R1 and R2?
What is this T? Is there such a T?
P.S.: I read the code of estimateUncalibratedRectification.m before posting this question, but it did not make me any wiser.

If the intrinsics are not known, the result is ambiguous up to a projective transformation. In other words, if you use estimateUncalibratedRectification to rectify a pair of images, and then compute disparity and do the 3D reconstruction, you will reconstruct the 3D scene up to a projective transformation. Straight lines will remain straight, but parallelism, angles and sizes will generally not be preserved.
To determine what that projective transformation is, you would need more information. If you know the camera intrinsics, then you get a reconstruction up to scale; in other words, you get the correct angles and relative sizes. To get the correct scale you would either need to know the baseline (the distance between the cameras) or the size of some reference object in the scene.
The more straightforward way to approach this is to calibrate your cameras using the Camera Calibrator app or the Stereo Camera Calibrator app.
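For concreteness, here is a rough sketch of that workflow in MATLAB. The function names are from the Computer Vision System Toolbox, but the file names, the detector choice, and the parameter values are my own assumptions, not something taken from the question:
% Sketch of the uncalibrated rectification workflow (assumed inputs: two color JPEGs).
I1 = rgb2gray(imread('left.jpg'));     % hypothetical file names
I2 = rgb2gray(imread('right.jpg'));
% Match features and estimate the fundamental matrix F
pts1 = detectSURFFeatures(I1);  pts2 = detectSURFFeatures(I2);
[f1, vpts1] = extractFeatures(I1, pts1);
[f2, vpts2] = extractFeatures(I2, pts2);
idx = matchFeatures(f1, f2);
m1 = vpts1(idx(:,1));  m2 = vpts2(idx(:,2));
[F, inliers] = estimateFundamentalMatrix(m1, m2, 'Method', 'RANSAC');
% P1, P2 in the question's notation: projective warps that send corresponding
% epipolar lines to the same rows of the rectified images R1, R2
[T1, T2] = estimateUncalibratedRectification(F, m1(inliers), m2(inliers), size(I2));
R1 = imwarp(I1, projective2d(T1));
R2 = imwarp(I2, projective2d(T2));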


Problems of clipping in 4D homogeneous space?

I am learning the programmable rendering pipeline by implementing a tiny software renderer. I am trying to implement it in a 'hardware' style. However, I am not familiar with the GPU pipeline and ran into some problems with homogeneous clipping.
According to this thread, suppose we have two points e0, e1 in 3D eye coordinates, which are projected to h0(-70, -70, 118, 120), h1(-32, -99, -13, -11) in the 4D homogeneous clip space. We then interpolate in 4D homogeneous space: the segment h0-h1 is clipped by the plane w = -x (z = -1 in NDC) at the 4D point h(t) = t*h1 + (1-t)*h0, with t = 0.99. Without loss of generality, suppose the h0-h(0.99) part (which is viewable) is fed to the rasterization stage. So we need to generate the corresponding vertex properties of h(0.99) (in the same format as the vertex shader output). My question is: how do I generate these new vertices' properties?
Update: I tried using t as the interpolation variable to get the vertex properties of h(t), and got a reasonable result. I am wondering why the t from 4D space gives a good interpolation result for the 3D vertex properties?
I am wondering why the t from 4D space gives a good interpolation result for the 3D vertex properties?
Because that's how math works. Or more to the point, that's how linear math works.
Without getting too deep into the mathematics, a linear transformation is a transformation between two spaces that preserves the linear nature of the original space. For example, two lines that are parallel to one another will remain parallel after a linear transformation. If you perform a 2x scale in the Y direction, the new lines will be longer and farther from the origin. But they will still be parallel.
Let's say we have a line AB, and you define the point C which is the midpoint between A and B. If you perform the same linear transformation on A, B, and C, the new point C1 will still be on the line A1B1. Not only that, C1 will still be the midpoint of the new line.
We can even generalize this. C could be any point which fits the following equation: C = (B-A)t + A, for any t. A linear transformation of A, B, and C will not change t in this equation.
In fact, that is what a linear transformation really means: it's a transformation that preserves t in that equation, for all points A, B, and C in the original space.
The fact that you have 4 dimensions in your space is ultimately irrelevant to the vector equation above. Linear transformations in any space will preserve t. A matrix transform represents a linear transformation from one space to another (usually).
Also, your original 3D positions were really 4D positions, with the W assumed to be 1.0.
Do be aware however that the transformation from clip-space (4D homogeneous) to normalized-device-coordinate space (3D non-homogeneous) is non-linear. The division-by-W is not a linear transformation. That's one reason why you do clipping in 4D homogeneous clip-space, where we still preserve a linear relationship between the original positions and clip-space.
This is also why perspective-correct interpolation of per-vertex outputs is important: because the space you're doing your rasterization in (window space) is not a linear transformation of the original space output by the vertex shader (clip space). This means that t is not properly preserved. When interpolating, you usually need to compensate for that in order to maintain the linear relationships of your per-vertex values.
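To make this concrete, here is a small MATLAB sketch (all matrices and values below are made up for illustration): the parameter t is preserved exactly by a 4x4 clip-space matrix, but not by the divide-by-w:
% The parameter t in C = (1-t)*A + t*B is preserved by any linear (matrix)
% transform, but not by the perspective divide that takes clip space to NDC.
A = [1; 2; 3; 1];                              % 4D homogeneous endpoints (made up)
B = [4; 0; -2; 1];
t = 0.25;
C = (1 - t)*A + t*B;                           % point at parameter t on segment AB
M = [2 0 0 0; 0 1 0 1; 0 0 1 0; 0 0 -1 2];     % some invertible 4x4 "clip" matrix
% Linear transform: M*C lands exactly at parameter t between M*A and M*B.
err_linear = norm(M*C - ((1 - t)*(M*A) + t*(M*B)))    % zero (up to round-off)
% Perspective divide (non-linear): dividing by w breaks this relationship.
ndc = @(h) h(1:3) / h(4);
err_divide = norm(ndc(M*C) - ((1 - t)*ndc(M*A) + t*ndc(M*B)))   % generally nonzero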

Implementation of Radon transform in Matlab, output size

Due to the nature of my problem, I want to evaluate the numerical implementations of the Radon transform in Matlab (i.e. different interpolation methods give different numerical values).
While trying to code my own Radon transform and compare it to MATLAB's output, I found out that my Radon projection sizes are different from MATLAB's.
So, a bit of intuition on how I compute the number of Radon samples needed. Let's do the 2D case.
The idea is that the maximum projection length occurs when the diagonal (at least for a rectangular image) is projected in the Radon transform, so diago = sqrt(size(I,1)^2 + size(I,2)^2). As we don't want anything left out, n_r = ceil(diago). n_r should be the number of discrete samples of the Radon transform needed to ensure no data is left out.
I noticed that MATLAB's radon output size is always odd, which makes sense, as you would always want a "ray" through the rotation center. I also noticed that there are 2 zeros at the endpoints of the array in all cases.
So in that case, n_r = ceil(diago) + mod(ceil(diago)+1, 2) + 2; (round up, force the count to be odd, and add the two zero endpoints).
However, it seems that I get small discrepancies with Matlab.
A MWE:
% Try: 255, 256
pixels=256;
I=phantom('Modified Shepp-Logan',pixels);
rd=radon(I,45);   % radon expects the angle in degrees (the output size does not depend on it)
size(rd,1)
s=size(I);
diagsize=sqrt(sum(s.^2));
n_r=ceil(diagsize)+mod(ceil(diagsize)+1,2)+2
ans =
  367
n_r =
  365
As MATLAB's Radon transform is a function I cannot look into, I wonder what could cause this discrepancy.
I took another look at the problem and I believe this is actually the right answer. From the "hidden documentation" of radon.m (type edit radon.m and scroll to the bottom):
Grandfathered syntax
R = RADON(I,THETA,N) returns a Radon transform with the
projection computed at N points. R has N rows. If you do not
specify N, the number of points the projection is computed at
is:
2*ceil(norm(size(I)-floor((size(I)-1)/2)-1))+3
This number is sufficient to compute the projection at unit
intervals, even along the diagonal.
I did not try to rederive this formula, but I think this is what you're looking for.
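As a quick sanity check (a small sketch reusing the phantom from the question), the documented formula reproduces the row count that radon actually returns for both an odd and an even image size:
for pixels = [255 256]
    I = phantom('Modified Shepp-Logan', pixels);
    n_matlab  = size(radon(I, 45), 1);        % default number of projection points
    n_formula = 2*ceil(norm(size(I) - floor((size(I)-1)/2) - 1)) + 3;
    fprintf('pixels = %d: radon rows = %d, formula = %d\n', pixels, n_matlab, n_formula)
end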
This is a fairly specialized question, so I'll offer an idea without being completely sure it answers your specific question (normally I would pass and let someone else answer, but I'm not sure how many Stack Overflow readers have studied the Radon transform). I think what you might be overlooking is the floor function in the documentation for the radon function call. From the doc:
The radial coordinates returned in xp are the values along the x'-axis, which is
oriented at theta degrees counterclockwise from the x-axis. The origin of both
axes is the center pixel of the image, which is defined as
floor((size(I)+1)/2)
For example, in a 20-by-30 image, the center pixel is (10,15).
This gives different behavior for odd- versus even-sized images. Hence, in your example ("Try: 255, 256"), you would need a different case for odd versus even sizes, and this might involve (in effect) padding with a row and column of zeros.

Image Enhancement using combination between SVD and Wavelet Transform

My objective is to handle illumination and expression variations in an image. So I tried to implement MATLAB code in order to work with only the important information within the image - in other words, to work with only the "USEFUL" information in the image. To do that, it is necessary to remove all unimportant information from the image.
Reference: this paper
Let's see my steps:
1) Apply histogram equalization to get histo_equalized_image = histeq(MyGrayImage), so that large intensity variations can be handled to some extent.
2) Apply an SVD approximation to histo_equalized_image. First compute the SVD ([L D R] = svd(histo_equalized_image)); the singular values are then used to build the derived image J = L*power(D, i)*R', where i varies between 1 and 2.
3) Finally, the derived image is combined with the original image: C = (MyGrayImage + a*J) / (1 + a), where a varies from 0 to 1.
4) But all the steps above are not able to perform well under varying conditions, so finally a wavelet transform should be used to handle those variations (we use only the LL image block). Low frequency component contains the useful information, also, unimportant information gets lost in this component. The (LL) component is insensitive to illumination changes and expression variations.
I wrote MATLAB code for this, and I would like to know whether it is correct (and if not, how to correct it). Furthermore, I am interested to know whether these steps can be optimized. Can this method be improved, and if so, how? I would appreciate any help.
Here is my MATLAB code:
%Read the RGB image (avoid calling the variable "image", which shadows a built-in function)
rgb_image=imread('img.jpg');
%Convert it to grayscale
image_gray=rgb2gray(rgb_image);
%Convert it to double
image_double=im2double(image_gray);
%Apply histogram equalization (step 1)
histo_equalized_image=histeq(image_double);
%Apply the SVD decomposition (step 2)
[U, S, V] = svd(histo_equalized_image);
%Calculate the derived image, here with exponent i = 5/4
P=U * power(S, 5/4) * V';
%Linearly combine both images (step 3, with a = 0.25); the image is already
%double, so the single() cast is unnecessary
J=(histo_equalized_image + (0.25 * P)) / (1 + 0.25);
%Apply the 2-D DWT (step 4)
[c,s]=wavedec2(J,2,'haar');
a1=appcoef2(c,s,'haar',1); % approximation (LL) coefficients at level 1
You need to define what you mean by "USEFUL" or "important" information, and only then design the steps.
Histogram equalization is a global transformation, which gives different results on different images. You can run an experiment: apply histeq to an image that benefits from it. Then make two copies of the original image, draw a black square (30% of the image area) on one and a white square on the other, apply histeq to each, and compare the results.
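A sketch of that experiment (the file name and the exact square placement are placeholders):
% histeq is a global operation, so altering part of the image changes the
% grey-level mapping applied to every pixel.
I  = im2double(rgb2gray(imread('img.jpg')));   % hypothetical input image
I1 = I;  I2 = I;
sq = floor(sqrt(0.3 * numel(I)));              % side of a square covering ~30% of the area
sq = min([sq, size(I,1), size(I,2)]);
I1(1:sq, 1:sq) = 0;                            % black square on copy 1
I2(1:sq, 1:sq) = 1;                            % white square on copy 2
figure;
subplot(1,3,1); imshow(histeq(I));  title('histeq(original)');
subplot(1,3,2); imshow(histeq(I1)); title('histeq(black square)');
subplot(1,3,3); imshow(histeq(I2)); title('histeq(white square)');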
Low frequency component contains the useful information, also,
unimportant information gets lost in this component.
Really? Edges and shapes - which are (at least for me) quite important - live in the high frequencies. Again, we need a definition of "useful" information.
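For example, a one-level Haar decomposition of the combined image J from the code above makes this visible (a sketch; requires the Wavelet Toolbox):
% Edges and other sharp structure show up in the detail (high-frequency)
% bands, not in the LL approximation.
[cA, cH, cV, cD] = dwt2(J, 'haar');            % J: combined image from the question's code
figure;
subplot(2,2,1); imshow(cA, []); title('LL (approximation)');
subplot(2,2,2); imshow(cH, []); title('LH (horizontal detail)');
subplot(2,2,3); imshow(cV, []); title('HL (vertical detail)');
subplot(2,2,4); imshow(cD, []); title('HH (diagonal detail)');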
I cannot see the theoretical background for why and how your approach would work. Could you explain a bit why you chose this method?
P.S. I'm not sure if these papers are relevant to you, but I recommend "Which Edges Matter?" by Bansal et al. and "Multi-Scale Image Contrast Enhancement" by V. Vonikakis and I. Andreadis.

How to compute the fundamental matrix using a stereo pair?

Referring to the question about the fundamental matrix: if I have a stereo pair (two JPEGs) and I want to apply Peter Kovesi's or Zisserman's function in order to obtain F, how can I retrieve P1 and P2? These two matrices are 3x4 matrices from the two images, but I don't know how they are related... Is it right if I take a random 3x4 matrix from the grayscale first image and the corresponding 3x4 matrix in the second image, obtained from the first one by using some matching technique such as correlation? And if it is, do you think a 3x4 matrix is not detailed enough?
The Computer Vision System Toolbox includes a function estimateFundamentalMatrix that does what you need. Check out this example of how to estimate the fundamental matrix, and then use it for stereo rectification.
Well, most of your doubts will be clarified once you go through Camera Calibration and 3D Reconstruction.
You should not try to construct F from the camera matrices C0 and C1; rather, use feature matching between the two images and compute F from the matched keypoints.
You can then set C0 to the identity camera [I | 0] and compute C1 from F, as sketched below.
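Here is a sketch of that last step, the canonical camera pair built from the fundamental matrix (see Hartley & Zisserman, "Multiple View Geometry", the section on retrieving the camera matrices from F). It assumes the convention x2' * F * x1 = 0, and the cameras it produces are only defined up to a 3D projective transformation:
% Canonical projective cameras from F (assumes x2' * F * x1 = 0 for
% corresponding homogeneous pixel coordinates x1, x2).
[U, S, V] = svd(F);
e2 = U(:, 3);                                  % epipole in image 2: e2' * F = 0
skew = @(v) [  0    -v(3)   v(2);
              v(3)    0    -v(1);
             -v(2)   v(1)    0  ];
C0 = [eye(3), zeros(3, 1)];                    % first camera:  [I | 0]
C1 = [skew(e2) * F, e2];                       % second camera: [[e2]_x * F | e2]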

Matlab image centroid simulation

I was given this task. I am a noob and need some pointers to get started with centroid calculation in MATLAB:
Instead of an image, I was first asked to simulate a (2-dimensional) Gaussian distribution, add random noise, and plot the intensities. The position of the centroid changes due to the noise, and I need to bring it back to its original position by:
- clipping to get rid of the noise (noise reduction by clipping or smoothing),
- a sliding average / low-pass filter (an averaging filter over 3-5 samples), i.e. calculating means, or
- using a convolution filter kernel, which does the matrix operations that represent the 2-D image.
Since you are a noob, even if we wrote down the answer verbatim you probably wouldn't understand how it works. So instead I'll do what you asked and give you pointers; you'll have to read the related documentation:
a) to produce a 2-D Gaussian, use meshgrid or ndgrid
b) to add noise to the image, look into rand, randn or randi, depending on what exactly you need
c) to plot the intensities, use imagesc
d) to find the centroid there are several ways; search SO further and you'll find many discussions. You can also check the MathWorks File Exchange for different implementations. A minimal sketch combining these pointers is given below.
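Here is such a sketch (all sizes, noise levels and thresholds below are made-up values):
% Simulate a 2-D Gaussian, add noise, display it, and estimate the centroid
% as the intensity-weighted mean after smoothing and clipping away the noise.
[X, Y] = meshgrid(1:128, 1:128);
x0 = 70; y0 = 55; sigma = 8;                            % "true" centroid and width
G = exp(-((X - x0).^2 + (Y - y0).^2) / (2*sigma^2));
I = G + 0.05 * randn(size(G));                          % additive random noise
figure; imagesc(I); axis image; colorbar; title('Noisy 2-D Gaussian');
I_f = conv2(I, ones(3)/9, 'same');                      % 3x3 sliding-average filter
I_c = max(I_f - 0.1, 0);                                % clip everything below a threshold
xc = sum(sum(I_c .* X)) / sum(I_c(:));                  % intensity-weighted centroid
yc = sum(sum(I_c .* Y)) / sum(I_c(:));
fprintf('estimated centroid: (%.2f, %.2f), true: (%d, %d)\n', xc, yc, x0, y0)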