effect of padding on cross correlation - matlab

To find the displacement of a particle, I calculated the cross-correlation between two instants (represented by two images of the same size). Then I padded the images with zeros to see whether a translation would affect the displacement.
I found a difference in the displacement vector (the difference can reach 1.5 pixels, and the image size is 56x56 pixels).
Is it normal to find a difference after padding?
N.B: To pad the image, I used
new_image(end+1:56,end+1:56)=0;
EDIT
The difference can be even larger in some cases (22 px).

Yes, this is weird. The cross-correlation is calculated by multiplying values in both matrices with each other and taking the sum of these products. Adding zeros should not change the result.
The problem in the code you've posted is that end+1:56 should likely be end+1:end+56, since that way you pad the image with 56 extra rows and columns of zeros below and to the right.
Since your goal appears to be the cross-correlation of two matrices, I recommend looking at the xcorr2() and xcorr() functions in Matlab. An explanation of xcorr2() and why zero padding should not have any influence (besides searching a larger image) can be found here.
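To illustrate the point, here is a small NumPy sketch (an illustration, not the poster's MATLAB code) that recovers a known displacement from the full cross-correlation peak, with and without zero padding. When the padding only appends zeros, the estimated displacement is unchanged:

```python
import numpy as np

def xcorr2(a, b):
    # Full linear 2-D cross-correlation via zero-padded FFTs.
    s = (a.shape[0] + b.shape[0] - 1, a.shape[1] + b.shape[1] - 1)
    return np.real(np.fft.ifft2(np.fft.fft2(a, s) * np.fft.fft2(b[::-1, ::-1], s)))

def displacement(a, b):
    # Correlation peak location, converted to a displacement of a relative to b.
    c = xcorr2(a, b)
    i, j = np.unravel_index(np.argmax(c), c.shape)
    return (i - (b.shape[0] - 1), j - (b.shape[1] - 1))

rng = np.random.default_rng(0)
img = rng.random((56, 56))
shifted = np.roll(img, (3, 5), axis=(0, 1))    # known displacement of (3, 5)

pad = lambda x: np.pad(x, ((0, 56), (0, 56)))  # 56 extra zero rows/cols, as intended
print(displacement(shifted, img))              # (3, 5)
print(displacement(pad(shifted), pad(img)))    # still (3, 5)
```

A one-sided indexing bug like new_image(end+1:56,...) instead pads to a fixed 56x56 size, which silently shifts the content relative to the other image and moves the peak.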

Related

Scale correction for IFFT of smaller frequency space created by FFT

This might be considered a repost of this question; however, I am seeking a much deeper explanation of this matter and how to properly solve this problem.
I want to study the PSF/SRF of a voxel in a 44x44 matrix. For that I create a matrix 100x bigger per side (4400x4400), so 1 voxel in the smaller matrix corresponds to 100x100 voxels in the bigger one, and I set those 100^2 voxels to 1.
Now I do a FFT of the big matrix and an IFFT of only the center portion (44x44) of the frequency space. This is the code:
A = zeros(4400,4400);
A(2201:2300,2201:2300) = 1;
B = fftshift(fft2(A));
C = ifft2(ifftshift(B(2179:2222,2179:2222)));
D = numel(C)/numel(B) * C;
figure, subplot(1,3,1), imshow(A), subplot(1,3,2), imshow(real(C)), subplot(1,3,3), imshow(real(D));
The problem is the following: I would expect the value in the voxel of the new 44x44 matrix to be 1. However, with this numel factor correction it drops to 0.35, and without the correction it goes up to huge values.
For starters, let me try to clarify the scaling issue: For the DFT/IDFT there are various scaling conventions regarding the input size. You either need a factor of 1/N in the DFT or a factor of 1/N in the IDFT or a factor of 1/sqrt(N) in both. All have pros and cons and all are equally valid.
Matlab uses the 1/N in the IDFT convention, as you can see in the documentation.
In your example, the forward DFT has a size 4400, the backward IDFT a size of 44. Therefore the IDFT scaling is a factor 100 less than it should be to match the forward transformation and your values are a factor of 100 too large. Since you're doing a 2-D DFT/IDFT, the factor 100 is missing twice, so your rescaling should be 100^2. Your numel(C)/numel(B) does exactly that, I've just tried to give you the explanation for it.
A reason why you might not see the 1 is that you're plotting only the real part of the inverse DFT. Since you did some fftshifting you might have introduced a phase so that part of your signal is in the imaginary part.
edit: Another reason is that you truncate B to the central 44 by 44 window before transforming back. Since A is not bandlimited, B has energy also outside this window. By truncating you are losing a part of it. Therefore, it is not surprising that the resulting amplitude is lower.
Here is a zoom on the image of B to show this phenomenon:
The red square is what you keep; everything else is truncated. By Parseval's theorem, the total energy in the image and Fourier domains is equal, so by truncating you must also reduce the energy of your signal in the image domain.
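Both effects, the missing 100^2 rescaling and the truncation loss, can be reproduced in a scaled-down NumPy sketch (an illustration of this answer, not the original code, using 440/44 instead of 4400/44 so one coarse voxel is a 10x10 block):

```python
import numpy as np

N_big, N_small = 440, 44          # 10x size ratio per dimension
A = np.zeros((N_big, N_big))
c = N_big // 2                    # fftshifted DC index for even sizes
A[c:c+10, c:c+10] = 1.0           # one coarse voxel = 10x10 fine voxels

B = np.fft.fftshift(np.fft.fft2(A))
w = N_small // 2
C = np.fft.ifft2(np.fft.ifftshift(B[c-w:c+w, c-w:c+w]))  # keep the central 44x44

# ifft2 divides by 44^2 instead of 440^2, so C is (440/44)^2 = 100x too big:
D = (N_small / N_big) ** 2 * C

# Even after rescaling, the peak stays well below 1 (around 0.35 here):
# the central window discards out-of-band energy, and the voxel
# straddles two coarse sample positions.
print(np.abs(D).max())
```

The value near 0.35 matches the number reported in the question; it is a consequence of truncation, not of a wrong scale factor.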

What does padding do in 2d convolution of images?

Having two images, A and B, of sizes n-by-m and k-by-l.
When doing convolution via the Fourier transform, it is said that we have to zero-pad the signals. What does this mean exactly?
When I use ifft2(A, B, n+k-1, m+l-1), is the padding already done? Thanks in advance
To zero-pad, you must increase the size of A and B until they are both n+k-1, m+l-1 (or greater) in size by adding rows and columns of zeros to these array/matrix variables. If you don't zero-pad, the convolution effect will wrap around (top-to-bottom and left-to-right) thus messing up your result (unless you actually want this circular convolution wrap-around effect).
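As an illustration (NumPy rather than MATLAB, but with the same FFT semantics), padding both arrays to n+k-1 by m+l-1 before transforming yields linear convolution, while transforming at the original size yields circular convolution with the wrap-around effect described above:

```python
import numpy as np

A = np.arange(9.0).reshape(3, 3)    # 3x3 "image" (n=3, m=3)
B = np.array([[1.0, -1.0]])         # 1x2 "kernel" (k=1, l=2)

n, m = A.shape
k, l = B.shape
shape = (n + k - 1, m + l - 1)      # pad target: 3x4

# fft2(x, shape) zero-pads x to `shape` before transforming.
linear = np.real(np.fft.ifft2(np.fft.fft2(A, shape) * np.fft.fft2(B, shape)))

# No padding: the product of same-size FFTs gives circular convolution,
# where the kernel wraps around the image edges.
circular = np.real(np.fft.ifft2(np.fft.fft2(A) * np.fft.fft2(B, A.shape)))

print(np.round(linear[0], 6))    # [ 0.  1.  1. -2.]  true linear convolution
print(np.round(circular[0], 6))  # [-2.  1.  1.]      last column wrapped onto the first
```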

Explaining corr2 function in Matlab

Can someone explain the correlation function corr2 in MATLAB to me? I know that it compares the similarity of 2D objects, but in the equation I have doubts about what A and B are (probably the matrices being compared), and also about Amn and Bmn.
I'm not sure how MATLAB executes this function, because I have found in several cases that the correlation is not computed over the entire image (matrix); instead the image is divided into blocks, and blocks of one picture are compared with blocks of the other picture.
In MATLAB's documentation, the corr2 equation is given without a reference explaining how it is derived, unlike other functions whose documentation points to the book the formula is taken from and where it is explained.
The correlation coefficient is a number representing the similarity between 2 images with respect to their pixel intensities.
As you pointed out this function is used to calculate this coefficient:
Here A and B are the images you are comparing, whereas the subscript indices m and n refer to the pixel location in the image. Basically what MATLAB does is compute, for every pixel location in both images, the difference between the intensity value at that pixel and the mean intensity of the whole image, denoted as a letter with a straight line (an overbar) over it.
As Kostya pointed out, typing edit corr2 in the command window will show you the code used by Matlab to compute the correlation coefficient. The formula is basically this:
a = a - mean2(a);
b = b - mean2(b);
r = sum(sum(a.*b))/sqrt(sum(sum(a.*a))*sum(sum(b.*b)));
where:
a is the input image and b is the image you wish to compare to a.
If we break down the formula, we see that a - mean2(a) and b-mean2(b) are the elements in the numerator of the above equation. mean2(a) is equivalent to mean(mean(a)) or mean(a(:)), that is the mean intensity of the whole image. This is only calculated once.
The 3rd line of code calculates the coefficient. Here sum(sum(a.*b)) calculates the double-sum present in the formula element-wise, that is considering each pixel location separately. Be aware that using sum(a) calculates the sum in every column individually, hence in order to get a single value you need to apply sum twice.
Pretty much the same happens in the denominator, except that the calculations are performed on (a - mean2(a)).^2 and (b - mean2(b)).^2. You can see this as a kind of normalization that accounts for the spread of pixel intensities within each image.
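For readers outside MATLAB, those three lines translate directly to NumPy (a sketch of the formula, not MATLAB's actual implementation). Two useful sanity checks: the coefficient is 1 for identical images, and it is unchanged by a linear brightness/contrast change:

```python
import numpy as np

def corr2(a, b):
    # Mirror of MATLAB's corr2: subtract each image's mean, then take the
    # normalized sum of element-wise products.
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum())

img = np.arange(16.0).reshape(4, 4)
print(corr2(img, img))          # ~1.0: identical images
print(corr2(img, 2 * img + 5))  # ~1.0 as well: invariant to linear intensity scaling
print(corr2(img, -img))         # ~-1.0: perfectly anti-correlated
```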
As for your last comment, you can break an image into small blocks and calculate the correlation coefficient on each; that might save some time for very large images, but since everything is vectorized the calculation is quite fast anyway. It might be useful in distributed processing, I guess. Of course, the correlation coefficient between 2 blocks of images is not necessarily identical to that of the whole images.
For the sake of curiosity you can look at this paper which highlights some caveats in using the correlation coefficient for image comparison.
Hope that makes things a bit clearer!

PSNR for intra predicted frame vs encoded frame

I have to perform Intra predicted coding on a video frame and calculate its PSNR. I am now asked to take the same original frame and encode it which consists of performing DCT, quantization, dequantization and inverse DCT. I have to calculate the PSNR of the encoded frame and compare it with the intra predicted frame.
I got values of 53.37 dB for the intra predicted frame and 32.64 dB for the encoded frame. I am supposed to analyze the probability distribution of the encoded image using the histogram. The histograms of both frames look extremely similar, so what am I actually supposed to look for?
EDIT
The way I am calculating the PSNR is taking the difference between the original frame and the reconstructed frame and then using the PSNR formula. Code snippet shown below:
errorFrame = orgFrame - reconstFrame;
y = 10*log10(255*255/mean(mean((errorFrame.^2))));
Should the PSNR of the intra predicted frame and the reconstructed frame be the same value? I have uploaded the histograms of the reconstructed frame with intra prediction and the reconstructed frame without intra prediction.
The histograms look extremely similar, so why are the PSNR values so different?
The PSNR does a point-by-point comparison between two images. The histograms capture the entire distribution of intensities as a whole. For example, if you had an image that was:
A = [0 255;
255 0];
... and another that was:
B = [255 0;
0 255];
... and let's say the original image was
C = [0 128;
128 0];
Even though the histograms of A and B are identical, their PSNRs against C are 9.0650 and 2.0344 dB respectively. As such, I wouldn't rely on the histograms alone, as they only capture global information; look at the images locally, where you can clearly see that one has higher quality than the other. Though most of the bins in your histograms look equal, histograms are not spatially aware: the spatial relationships between pixels are not captured, as the example above shows. You could have, say, 15 pixels of intensity 80 in both images, yet in completely different locations in each image. Two images can therefore look completely different and still have equal histograms, as long as the counts per intensity are equal.
You can see that A and C are similar in that one is simply the grayer version of the other. However, B is way off as it has white pixels where there are dark pixels in C, and dark pixels when there are gray pixels in C. Though the histograms between A and B are the same, the actual content between them are quite different compared to C.
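Those numbers can be reproduced with a short NumPy sketch (an illustration of the argument, not code from the question):

```python
import numpy as np

def psnr(x, ref, peak=255.0):
    # Point-by-point comparison: mean squared error against the reference.
    mse = np.mean((x.astype(float) - ref.astype(float)) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

A = np.array([[0, 255], [255, 0]])
B = np.array([[255, 0], [0, 255]])
C = np.array([[0, 128], [128, 0]])

# A and B have identical histograms...
assert np.array_equal(np.bincount(A.ravel(), minlength=256),
                      np.bincount(B.ravel(), minlength=256))

# ...but very different PSNRs against C:
print(round(psnr(A, C), 4))   # 9.065
print(round(psnr(B, C), 4))   # 2.0344
```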
I do realize that you need to compare the histograms / probability distributions between both of the images, but this question may have been asked on purpose. Though you can see the distribution of intensities is relatively the same, if you analyze local image patches between the two, you can definitely see that one is a lower quality than the other. To be honest, and recounting from personal experience, you should take PSNR with a grain of salt. Just because one image has a higher PSNR than the other doesn't necessarily mean that it is better quality. In fact, there have been images where they were lower PSNR, but I considered them to be better quality than one with higher PSNR.
As such, when you answer your question, make sure you reference everything that I've said here.
tl;dr: Though the histograms look equal, histograms are not spatially aware; the spatial relationships between pixels are not captured. You could have a completely different looking image in comparison to another, but as long as the counts per intensity are equal, the histograms will be equal. PSNR, on the other hand, takes a point-by-point difference, which does (in a sense) capture the spatial relationships of pixels, and that explains why the PSNRs are quite different.

How can I speed up the "histc" Matlab function

I need to speed up this part of my Matlab code:
double(sum(histc(windows, 0:1:255),2)')
It is applied at every pixel of a large image to calculate the local histogram (within 'windows'), so it is quite time-consuming.
Do you have any suggestions for speeding up the computation?
Thanks a lot.
You can exploit the overlap between adjacent pixels. Let's say you use a window of size 3x3 and have calculated the histogram for pixel I(x,y); the histogram for pixel I(x+1,y) will then share 6 of the 9 pixels. So you only need to subtract 3 values and add 3.
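A sketch of that incremental update in NumPy (a hypothetical helper for illustration, not the poster's code): the first window in each row is counted from scratch, and each step to the right removes one column of pixels from the histogram and adds another:

```python
import numpy as np

def sliding_histograms(img, w=3):
    # Local 256-bin histograms for every w-by-w window of an integer image,
    # reusing the overlap between horizontally adjacent windows.
    h, wd = img.shape
    out = {}
    for y in range(h - w + 1):
        hist = np.bincount(img[y:y+w, 0:w].ravel(), minlength=256)
        out[(y, 0)] = hist.copy()
        for x in range(1, wd - w + 1):
            for v in img[y:y+w, x-1]:        # column leaving the window
                hist[v] -= 1
            for v in img[y:y+w, x+w-1]:      # column entering the window
                hist[v] += 1
            out[(y, x)] = hist.copy()
    return out

rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(6, 6))
hists = sliding_histograms(img)
# Each entry matches a from-scratch histogram of the same window:
assert np.array_equal(hists[(2, 3)],
                      np.bincount(img[2:5, 3:6].ravel(), minlength=256))
```

In MATLAB the same idea applies: keep one running 256-bin count per row sweep instead of calling histc on every window.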
Your code looks wrong, though: histc returns bin counts, and you then sum the counts, which should always add up to the size of your window. Do you want to calculate the sum of pixel intensities within the window? Then you should just use the sum function directly.