I'm doing convolution of some tensors.
Here is small test in MATLAB:
ker= rand(3,4,2);
a= rand(5,7,2);
c=convn(a,ker,'valid');
c11=sum(sum(a(1:3,1:4,1).*ker(:,:,1)))+sum(sum(a(1:3,1:4,2).*ker(:,:,2)));
c(1,1)-c11 % not equal!
The third line performs an N-D convolution with convn, and I want to compare the result at the first row, first column of convn with the value computed manually. However, my manual computation does not match what convn returns.
So what is behind MATLAB's convn? Is my understanding of tensor convolution wrong?
You almost have it correct. There are two things slightly wrong with your understanding:
You chose valid as the convolution flag. This means that the size of the returned output is chosen so that the kernel, as it sweeps over the matrix, always fits entirely inside it. Therefore, the first "valid" output that is returned is actually for the computation at location (2,2,1) of your matrix. This means that you can fit your kernel comfortably at this location, and this corresponds to position (1,1) of the output. To demonstrate, this is what a and ker look like for me using your above code:
>> a
a(:,:,1) =
0.9930 0.2325 0.0059 0.2932 0.1270 0.8717 0.3560
0.2365 0.3006 0.3657 0.6321 0.7772 0.7102 0.9298
0.3743 0.6344 0.5339 0.0262 0.0459 0.9585 0.1488
0.2140 0.2812 0.1620 0.8876 0.7110 0.4298 0.9400
0.1054 0.3623 0.5974 0.0161 0.9710 0.8729 0.8327
a(:,:,2) =
0.8461 0.0077 0.5400 0.2982 0.9483 0.9275 0.8572
0.1239 0.0848 0.5681 0.4186 0.5560 0.1984 0.0266
0.5965 0.2255 0.2255 0.4531 0.5006 0.0521 0.9201
0.0164 0.8751 0.5721 0.9324 0.0035 0.4068 0.6809
0.7212 0.3636 0.6610 0.5875 0.4809 0.3724 0.9042
>> ker
ker(:,:,1) =
0.5395 0.4849 0.0970 0.3418
0.6263 0.9883 0.4619 0.7989
0.0055 0.3752 0.9630 0.7988
ker(:,:,2) =
0.2082 0.4105 0.6508 0.2669
0.4434 0.1910 0.8655 0.5021
0.7156 0.9675 0.0252 0.0674
As you can see, at position (2,2,1) in the matrix a, ker can fit comfortably inside the matrix and if you recall from convolution, it is simply a sum of element-by-element products between the kernel and the subset of the matrix at position (2,2,1) that is the same size as your kernel (actually, you need to do something else to the kernel which I will reserve for my next point - see below). Therefore, the coefficient that you are calculating is actually the output at (2,2,1), not at (1,1,1). From the gist of it though, you already know this, but I wanted to put that out there in case you didn't know.
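As a quick sanity check on the output size (just a sketch using the a and ker from above), each dimension of the 'valid' output shrinks by one less than the kernel size in that dimension:
% 'valid' output size: (5-3+1) x (7-4+1) x (2-2+1); the trailing singleton is dropped
size(convn(a, ker, 'valid'))   % returns [3 4]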
You are forgetting that for N-D convolution, you need to flip the mask in each dimension. If you remember from 1D convolution, the mask must be flipped horizontally. What I mean by flipped is that you simply place the elements in reverse order. An array of [1 2 3 4], for example, would become [4 3 2 1]. In 2D convolution, you must flip both horizontally and vertically. Therefore, you would take each row of your matrix and place its elements in reverse order, much like the 1D case. Here, you would treat each row as a 1D signal and do the flipping. Once you accomplish this, you would take this flipped result, treat each column as a 1D signal, and do the flipping again.
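To make the flipping concrete, here is a minimal sketch of the 1D and 2D cases:
% 1D: reverse the element order
fliplr([1 2 3 4])                         % returns [4 3 2 1]
% 2D: flip horizontally and vertically (a 180 degree rotation does both at once)
M = magic(3);
isequal(flipud(fliplr(M)), rot90(M, 2))   % returns logical 1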
Now, in your case for 3D, you must flip horizontally, vertically and temporally. This means that you would need to perform the 2D flipping for each slice of your matrix independently, you would then grab single columns in a 3D fashion and treat those as 1D signals. In MATLAB syntax, you would get ker(1,1,:), treat this as a 1D signal, then flip. You would repeat this for ker(1,2,:), ker(1,3,:) etc. until you are finished with the first slice. Bear in mind that we don't go to the second slice or any of the other slices and repeat what we just did. Because you are taking a 3D section of your matrix, you are inherently operating over all of the slices for each 3D column you extract. Therefore, only look at the first slice of your matrix, and so you need to do this to your kernel before computing the convolution:
ker_flipped = flipdim(flipdim(flipdim(ker, 1), 2), 3);
flipdim performs the flipping on a specified axis. In our case, we are doing it vertically, then taking the result and doing it horizontally, and then again doing it temporally. You would then use ker_flipped in your summation instead. Take note that it doesn't matter which order you do the flipping. flipdim operates on each dimension independently, and so as long as you remember to flip all dimensions, the output will be the same.
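As a side note, on newer MATLAB releases (R2013b and later, an assumption about your version) flipdim has been superseded by flip; the same flipping can then be written as:
% Equivalent flipping with the newer flip function (assumes R2013b+)
ker_flipped = flip(flip(flip(ker, 1), 2), 3);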
To demonstrate, here's what the output looks like with convn:
c =
4.1837 4.1843 5.1187 6.1535
4.5262 5.3253 5.5181 5.8375
5.1311 4.7648 5.3608 7.1241
Now, to determine what c(1,1) is by hand, you would need to do your calculation on the flipped kernel:
ker_flipped = flipdim(flipdim(flipdim(ker, 1), 2), 3);
c11 = sum(sum(a(1:3,1:4,1).*ker_flipped(:,:,1)))+sum(sum(a(1:3,1:4,2).*ker_flipped(:,:,2)));
The output of what we get is:
c11 =
4.1837
As you can see, this verifies what we get by hand with the calculation done in MATLAB using convn. If you want to compare more digits of precision, use format long and compare them both:
>> format long;
>> disp(c11)
4.183698205668000
>> disp(c(1,1))
4.183698205668001
As you can see, all of the digits are the same, except for the last one. That is attributed to numerical round-off. To be absolutely sure:
>> disp(abs(c11 - c(1,1)));
8.881784197001252e-16
... I think a difference on the order of 10^-16 is good enough for me to show that they're equal, right?
Yes, your understanding of convolution is wrong. Your formula for c11 is not convolution: you just multiplied matching indices and then summed. It's more of a dot-product operation (on tensors trimmed to the same size). I'll try to explain beginning with 1 dimension.
1-dimensional arrays
Entering conv([4 5 6], [2 3]) returns [8 22 27 18]. I find it easiest to think of this in terms of multiplication of polynomials:
(4+5x+6x^2)*(2+3x) = 8+22x+27x^2+18x^3
Use the entries of each array as coefficients of a polynomial, multiply the polynomials, collect like terms, and read off the result from coefficients. The powers of x are here to keep track of what gets multiplied and added. Note that the coefficient of x^n is found in the (n+1)th entry, because powers of x begin with 0 while the indices begin with 1.
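For instance, you can reproduce this with MATLAB's own polynomial convention, which stores the highest power first (a small sketch):
p = [6 5 4];      % 6x^2 + 5x + 4
q = [3 2];        % 3x + 2
conv(p, q)        % returns [18 27 22 8], i.e. 18x^3 + 27x^2 + 22x + 8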
2-dimensional arrays
Entering conv2([2 3; 3 1], [4 5 6; 0 -1 1]) returns the matrix
8 22 27 18
12 17 22 9
0 -3 2 1
Again, this can be interpreted as multiplication of polynomials, but now we need two variables: say x and y. The coefficient of x^n y^m is found in the (m+1, n+1) entry. The above output means that
(2+3x+3y+xy)*(4+5x+6x^2+0y-xy+x^2y) = 8+22x+27x^2+18x^3+12y+17xy+22x^2y+9x^3y-3xy^2+2x^2y^2+x^3y^2
3-dimensional arrays
Same story. You can think of the entries as coefficients of a polynomial in variables x,y,z. The polynomials get multiplied, and the coefficients of the product are the result of convolution.
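A tiny sketch with made-up arrays illustrates this:
A = cat(3, [1 0; 0 0], [0 0; 0 1]);   % represents 1 + xyz
B = [2 3];                            % represents 2 + 3x
convn(A, B)   % nonzero entries are the coefficients of (1 + xyz)*(2 + 3x) = 2 + 3x + 2xyz + 3x^2yz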
'valid' parameter
This keeps only the central part of the convolution: those coefficients in which all terms of the second factor have participated. For this to be nonempty, the second array should have dimensions no greater than the first. (This is unlike the default setting, for which the order of the convolved arrays does not matter.) Example:
conv([4 5 6], [2 3], 'valid') returns [22 27] (compare to the 1-dimensional example above). This corresponds to the fact that in
(4+5x+6x^2)*(2+3x) = 8+22x+27x^2+18x^3
the terms 22x and 27x^2 got contributions from both 2 and 3x.
Related
Let us have a 4D matrix (tensor), output:
[X,Y,Z] = ndgrid(-50:55,-55:60,-50:60);
a = 1:4;
output = zeros([size(X),length(a)]);
Next, we determine the area inside the ellipsoid:
position = [0,0,0];
radius = [10,20,10];
test_func = @(X,Y,Z) ((X-position(1))/radius(1)).^2 ...
+ ((Y-position(2))/radius(2)).^2 ...
+ ((Z-position(3))/radius(3)).^2 <= 1;
condition = test_func(X,Y,Z);
I need to fill the matrix output inside the ellipsoid for the first 3 dimensions. But for the fourth dimension I need to fill a. I need to do something like this:
output(condition,:) = a;
But it does not work. How to do it? Any ideas please!
If I understand your question correctly, you have a 4D matrix where the 3D blob of pixels in each 4D slice is filled with a number... from 1 up to 4, where each number tells you which slice you're in.
You can cleverly use bsxfun and permute to help you accomplish this task:
output = bsxfun(@times, double(condition), permute(a, [1 4 3 2]));
This takes a bit of imagination to see how this works, but it's quite simple. condition is a 3D array of logical values where each location in this 3D space is 1 if it belongs to a point inside the ellipsoid and 0 if it doesn't. a is a row vector from 1 through 4, or however many elements you want it to end with. Let's call this N to be more general.
permute(a, [1 4 3 2]) shuffles the dimensions such that we create a 4D vector where we have 1 row, 1 column and 1 slice but we have 4 elements going into the fourth dimension.
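You can verify the reshaping with a quick size check (just a sketch):
size(permute(1:4, [1 4 3 2]))   % returns [1 1 1 4]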
By using bsxfun in this regard, it performs automatic broadcasting on two inputs where each dimension in the output array will match whichever of the two inputs had the largest value. The condition is that for each dimension independently, they should both match or one of them is a singleton dimension (i.e. 1).
Therefore, for your particular example, condition will be 106 x 116 x 111 while the output of the permute operation will be 1 x 1 x 1 x N. condition is also technically 106 x 116 x 111 x 1, and using bsxfun, we would thus get an output array of size 106 x 116 x 111 x N.
Performing the element-wise @times operation, the permuted vector a will thus broadcast itself over all dimensions where each 3D slice i will contain the value of i. The condition matrix will then duplicate itself over the fourth dimension so we have N copies of the condition matrix, and performing the element-wise multiply thus completes what you need. This is doable as the logical mask you created contains only 0s and 1s. By multiplying element-wise with this mask, only the values that are 1 will register a change. Specifically, if you multiply the values at these locations by any non-zero value, they will change to these non-zero values. Using this logic, you'd want to make these values a. It is important to note that I had to cast condition to double as bsxfun only allows two inputs of the same class / data type to be used.
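As a side note, if you are on MATLAB R2016b or newer (an assumption about your version), implicit expansion lets you drop bsxfun and write the same broadcast directly:
% Same result via implicit expansion (assumes R2016b+)
output = double(condition) .* permute(a, [1 4 3 2]);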
To visually see that this is correct, I'll show scatter plots of each blob where the colour of each blob would denote what label in a it belongs to:
close all;
N = 4;
clrs = 'rgbm';
figure;
for ii = 1 : N
blob = output(:,:,:,ii);
subplot(2,2,ii);
plot3(X(blob == ii), Y(blob == ii), Z(blob == ii), [clrs(ii) '.']);
end
We get:
Notice that the spatial extent of each ellipsoid is the same but what is different are the colours assigned to each blob. I've made it such that the values for the first blob, or those assigned to a = 1, are red; those assigned to a = 2 are green, a = 3 blue, and finally a = 4 magenta.
If you absolutely want to be sure that the coordinates of each blob are equal across all blobs, you can use find in each 3D blob individually and ensure that the non-zero indices are all equal:
inds = arrayfun(@(x) find(output(:,:,:,x)), a, 'un', 0);
all_equal = isequal(inds{:});
The code finds in each blob the column major indices of all non-zero locations in 3D. To know whether or not we got it right, each 3D blob should all contain the same non-zero column major indices. The arrayfun call goes through each 3D blob and returns a cell array where each cell element is the column major indices for a particular blob. We then pipe this into isequal to ensure that all of these column major indices arrays are equal. We get:
>> all_equal
all_equal =
1
I have x,y pixel coordinates of multiple objects that have been tracked from an image (3744x5616). The coordinates are stored in a structure called objects, e.g.
objects(1).centre = [1868 1236]
The objects are each uniquely identified by a numerical code, e.g.
objects(i).code = 33
I want to be able to record each time any two objects come within a radius of 300 pixels of each other. What would be the best way to check whether any objects are touching and then record the identities of both objects involved in the interaction, e.g. object 33 interacts with object 34?
Thanks!
The best thing I can think of right now is a brute force approach: simply compute the distance from each object's centre to every other object's centre and check whether the distance is < 300 pixels.
If you want this fast, we should probably do this without any toolboxes. You can intelligently do this with vanilla MATLAB using bsxfun. First, create separate arrays for the X and Y coordinates of each object:
points = reshape([objects.centre], 2, []);
X = points(1,:);
Y = points(2,:);
[objects.centre] accesses the individual coordinates of each centre field in your structure and unpacks them into a comma-separated list. I reshape this array so that it is 2 rows where the first row is the X coordinate and the second row is the Y coordinate. I extract out the rows and place them into separate arrays.
Next, create two difference matrices for each X and Y where the rows denote one unique coordinate and the columns denote another unique coordinate. The values inside this matrix are the differences between the point i at row i and point j at column j:
Xdiff = bsxfun(@minus, X.', X);
Ydiff = bsxfun(@minus, Y.', Y);
bsxfun stands for Binary Singleton EXpansion FUNction. If you're familiar with the repmat function, it essentially replicates matrices and vectors under the hood so that both inputs you're operating on have the same size. In this case, what I'm doing is specifying X or Y as both of the inputs. One is the transposed version of the other. By doing this bsxfun automatically broadcasts each input so that the inputs match in dimension. Specifically, the first input is a column vector of X and so this gets repeated and stacked horizontally for as many times as there are values in X.
Similarly this is done for the Y value. After you do this, you perform an element-wise subtraction for both outputs and you get the component wise subtraction between one point and another point for X and Y where the row gives you the first point, and the column gives you the second point. As a toy example, imagine we had X = [1 2 3]. Doing a bsxfun call using the above code gives:
>> Xdiff = bsxfun(@minus, [1 2 3].', [1 2 3])
Xdiff =
## | 1 2 3
----------------------
1 | 0 -1 -2
2 | 1 0 -1
3 | 2 1 0
There are some additional characters I placed in the output, but these are used solely for illustration and to give you a point of reference. Taking a row value from the ## column and subtracting a column value from the ## row gives you the desired difference. For example, the first row, second column illustrates 1 - 2 = -1. The second row, third column illustrates 2 - 3 = -1. If you do this for both the X and Y points, you get the component-wise differences between each point and all of the other points.
You'll notice that this is an anti-symmetric matrix where the diagonal is all 0 ... which makes sense, since the difference of one coordinate of a point with respect to itself should be 0. The bottom left triangular portion of the matrix has the opposite sign of the top right... because of the order of subtraction. If you subtracted point 1 from point 2, doing the opposite subtraction gives you the opposite sign. The two halves therefore carry the same pairings, so you only need to keep one of them (the code below keeps the upper half).
Now, compute the distances. Because the distance ignores the sign of the differences, the distance matrix is symmetric, so each pair would otherwise appear twice: object 3 interacting with object 1 would register as a different interaction from object 1 interacting with object 3. You obviously don't care about the order, so set either the upper or lower triangular half to NaN for the next step. Assuming Euclidean distance:
dists = sqrt(Xdiff.^2 + Ydiff.^2);
dists(tril(ones(numel(objects))==1)) = NaN;
The first line computes the Euclidean distance of all pairs of points, and we use tril to build a logical mask of the lower triangular portion of the matrix (including the diagonal), consisting of all logical 1. Using this mask, we set the lower half of the matrix to NaN. This allows us to skip entries we're not interested in. Note that this also sets the diagonal to NaN, because we're not interested in the distance of an object to itself.
Now that you're finally here, search for those objects that are < 300 pixels:
[I,J] = find(dists < 300);
I and J are row/column pairs that determine which rows and columns in the matrix have values < 300, so in our case, each pair of I and J in the array gives you the object locations that are close to each other.
To finally figure out the right object codes, you can do:
codes = [[objects(I).code].' [objects(J).code].'];
This uses I and J to access the corresponding codes of the objects that were close to each other (via a comma-separated list) and places them side by side into an N x 2 matrix. As such, each row of codes gives you a unique pair of objects that satisfied the distance requirement.
For copying and pasting:
points = reshape([objects.centre], 2, []);
X = points(1,:);
Y = points(2,:);
Xdiff = bsxfun(@minus, X.', X);
Ydiff = bsxfun(@minus, Y.', Y);
dists = sqrt(Xdiff.^2 + Ydiff.^2);
dists(tril(ones(numel(objects))==1)) = NaN;
[I,J] = find(dists < 300);
codes = [[objects(I).code].' [objects(J).code].'];
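As an aside, if you do happen to have the Statistics and Machine Learning Toolbox available (an assumption; the solution above deliberately avoids toolboxes), pdist gives a more compact sketch of the same idea:
D = squareform(pdist(points.'));    % full symmetric matrix of pairwise Euclidean distances
[I, J] = find(triu(D < 300, 1));    % keep each pair once and skip the diagonal
codes = [[objects(I).code].' [objects(J).code].'];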
Toy Example
Here's an example that we can use to verify if what we have is correct:
objects(1).centre = [1868 1236];
objects(2).centre = [2000 1000];
objects(3).centre = [1900 1300];
objects(4).centre = [3000 2000];
objects(1).code = 33;
objects(2).code = 34;
objects(3).code = 35;
objects(4).code = 99;
I initialized 4 objects with different centroids and different codes. Let's see what the dists array gives us after we compute it:
>> format long g
>> dists
dists =
NaN 270.407100498489 71.5541752799933 1365.69396278961
NaN NaN 316.227766016838 1414.2135623731
NaN NaN NaN 1303.84048104053
NaN NaN NaN NaN
I intentionally made the last point farther than any of the other three points to ensure that we can show cases where there are points not near other ones.
As you can see, the pairs (1,2) and (1,3) are near each other, which is what we get when we complete the rest of the code. This corresponds to objects 33, 34 and 35, with pairings of (33,34) and (33,35). I made the distance between the points with codes 34 and 35 only slightly larger than the 300 pixel threshold (316.23), so that pair doesn't count either:
>> codes
codes =
33 34
33 35
Now, if you want to display this in a prettified format, perhaps use a for loop:
for vec = codes.'
fprintf('Object with code %d interacted with object with code %d\n', vec(1), vec(2));
end
This for loop is a bit tricky. It's a little-known fact that for loops can also accept matrices, and the loop variable takes on one column of the matrix at a time, from left to right. Therefore, I transposed the codes array so that each pair of codes becomes a column. I then access the first and second element of each column and print them out.
We get:
Object with code 33 interacted with object with code 34
Object with code 33 interacted with object with code 35
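As a side note, fprintf recycles its format string over the elements of its input in column order, so the same report can be produced without an explicit loop (a small sketch):
fprintf('Object with code %d interacted with object with code %d\n', codes.');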
I have a randomly defined H matrix of size 600 x 10. Each element in this matrix H can be represented as H(k,t). I obtained a speech spectrogram S which is 600 x 597. I obtained it using Mel features, so it should be 40 x 611 but then I used a frame stacking concept in which I stacked 15 frames together. Therefore it gave me (40x15) x (611-15+1) which is 600 x 597.
Now I want to obtain an output matrix Y which is given by the equation based on convolution Y(k,t) = ∑ H(k,τ)S(k,t-τ). The sum goes from τ=0 to τ=Lh-1. Lh in this case would be 597.
I don't know how to obtain Y. Also, I'm unsure about the indexing into both H and S when computing the convolution. Specifically, for Y(1,1), we have:
Y(1,1) = H(1,0)S(1,1) + H(1,1)S(1,0) + H(1,2)S(1,-1) + H(1,3)S(1,-2) + ...
Now, there is no such thing as negative indices in MATLAB - for example, S(1,-1), S(1,-2) and so on. So, what type of convolution should I use to obtain Y? I tried using conv2 or fftfilt, but I think that will not give me Y because Y must also be the same size as S.
That's very easy. What you have is a convolution on a 2D signal applied along only one dimension. If we assume that the variable k is used to access the rows and t is used to access the columns, you can consider each row of H and S as separate signals, where each row of S is a 1D signal and each row of H is a convolution kernel.
There are two ways you can approach this problem.
Time domain
If you want to stick with time domain, the easiest thing would be to loop over each row of the output, find the convolution of each pair of rows of S and H and store the output in the corresponding output row. From what I can tell, there is no utility that can convolve in one dimension only given an N-D signal.... unless you go into frequency domain stuff, but let's leave that for later.
Something like:
Y = zeros(size(S));
for idx = 1 : size(Y,1)
Y(idx,:) = conv(S(idx,:), H(idx,:), 'same');
end
For each row of the output, we perform a row-wise convolution between a row of S and the corresponding row of H. I use the 'same' flag because the output should be the same size as a row of S, which is the longer of the two signals.
Frequency domain
You can also perform the same computation in frequency domain. If you know anything about the properties of convolution and the Fourier Transform, you know that convolution in time domain is multiplication in the frequency domain. You take the Fourier Transform of both signals, multiply them element-wise, then take the Inverse Fourier Transform back.
However, you need to keep the following intricacies in mind:
Performing a full convolution means that the final length of the output signal is length(A)+length(B)-1, assuming A and B are 1D signals. Therefore, you need to zero-pad both A and B so that they both have this common length. The signals must be the same length for the element-wise multiplication to work.
Once you multiply the signals in the frequency domain and take the inverse, you will see that each row of Y is the full length of the convolution. To ensure that you get an output that is the same size as the input, you need to trim off some points at the beginning and at the end. Specifically, since each kernel (each row of H) has length 10, you have to remove the first 5 and the last 4 points of each row of the output to match what you get in the for loop code.
Usually after the inverse Fourier Transform, there are some residual complex coefficients due to the nature of the FFT algorithm. It's good practice to use real to remove the complex valued parts of the results.
Putting all of this theory together, this is what the code would look like:
%// Define zero-padded H and S matrices
%// Rows are the same, but columns must be padded to match point #1
H2 = zeros(size(H,1), size(H,2)+size(S,2)-1);
S2 = zeros(size(S,1), size(H,2)+size(S,2)-1);
%// Place H and S at the beginning and leave the rest of the columns zero
H2(:,1:size(H,2)) = H;
S2(:,1:size(S,2)) = S;
%// Perform Fourier Transform on each row separately of padded matrices
Hfft = fft(H2, [], 2);
Sfft = fft(S2, [], 2);
%// Perform convolution
Yfft = Hfft .* Sfft;
%// Take inverse Fourier Transform and convert to real
Y2 = real(ifft(Yfft, [], 2));
%// Trim off unnecessary values
Y2 = Y2(:,size(H,2)/2 + 1 : end - size(H,2)/2 + 1);
Y2 should be the convolved result and should match Y in the previous for loop code.
Comparison between them both
If you actually want to compare them, we can. What we'll need to do first is define H and S. To reconstruct what I did, I generated random values with a known seed:
rng(123);
H = rand(600,10);
S = rand(600,597);
Once we run the above code for both the time domain version and frequency domain version, let's see how they match up in the command prompt. Let's show the first 5 rows and 5 columns:
>> format long g;
>> Y(1:5,1:5)
ans =
1.63740867892464 1.94924208172753 2.38365646354643 2.05455605619097 2.21772526557861
2.04478411247085 2.15915645246324 2.13672842742653 2.07661341840867 2.61567534623066
0.987777477630861 1.3969752201781 2.46239452105228 3.07699790208937 3.04588738611503
1.36555260994797 1.48506871890027 1.69896157726456 1.82433906982894 1.62526864072424
1.52085236885395 2.53506897420001 2.36780282057747 2.22335617436888 3.04025523335182
>> Y2(1:5,1:5)
ans =
1.63740867892464 1.94924208172753 2.38365646354643 2.05455605619097 2.21772526557861
2.04478411247085 2.15915645246324 2.13672842742653 2.07661341840867 2.61567534623066
0.987777477630861 1.3969752201781 2.46239452105228 3.07699790208937 3.04588738611503
1.36555260994797 1.48506871890027 1.69896157726456 1.82433906982894 1.62526864072424
1.52085236885395 2.53506897420001 2.36780282057747 2.22335617436888 3.04025523335182
Looks good to me! As another measure, let's figure out what the largest difference is between one value in Y and a corresponding value in Y2:
>> max(abs(Y(:) - Y2(:)))
ans =
5.32907051820075e-15
That's saying that the max error seen between both outputs is on the order of 10^-15. I'd say that's pretty good.
I have declared 2 matrixes like this:
a = [ 1 2;
11 12];
[m, n] = size(a);
b = a(2,:);
dist( b , a ); % the first column is not interesting
This works, I get a vector
[ 10.0499 9.0000 ]
However if I want to add a column or a line to my matrix a:
a = [ 1 2 3 ;
11 12 13];
then apply the same algorithm, than above, ignoring or not the first column, I get this error:
Error using -
Matrix dimensions must agree
I have no idea why it does not work, can someone explain to me please?
Actually I don't even know how to retrieve the way this euclidian distance is computed, I failed at trying to retrieve those values [ 10.0499 9.0000 ] by hand.
The MATLAB documentation from MathWorks says the algorithm used is the following:
d = sum((x-y).^2).^0.5
Any help would be appreciated.
It does not work because the dist function when called with two arguments works like this:
Z = dist(W,P) takes an SxR weight matrix and RxQ input matrix and
returns the SxQ matrix of distances between W's rows and P's columns.
dist(P',P) returns the same result as dist(P).
That is, if you do this:
a = [ 1 2 3 ;
11 12 13]
b = a(2,:) % Then b = [11 12 13]
...and call:
dist(b, a)
It will try to compute the distance between b's rows (in this case, only a row with three numbers, that is, a 3D point) and a's columns (each column has two numbers, that is, a 2D point). Measuring a distance between them makes no sense.
The reason it worked in your first example is that the matrix was square (2x2). Therefore, you're computing distances between a row (a 2D point) and each of the columns (also 2D points).
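If you want to reproduce [10.0499 9.0000] by hand, here is a quick sketch that interprets b as a single point and the columns of a as points, matching the documented behaviour of dist:
a = [1 2; 11 12];
b = a(2,:);                                   % b = [11 12]
d = sqrt(sum(bsxfun(@minus, a, b.').^2, 1))   % returns [10.0499 9.0000]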
A distance by definition is a single number, not a vector. The fact that you're getting a vector for matrices of the same size already indicates that something is wrong. In particular, it gives you the distance between each pair of corresponding column vectors. So it doesn't work when your matrices are of different sizes. A distance between matrices is not defined in any one particular way; I don't know of a notion of Euclidean distance between two matrices.
Data: Say I have a 2000 rows by 500 column matrix (image)
What I need: Compute the FFT of 64 rows by 10 column chunks of above data. In other words, I want to compute the FFT of 64X10 window that is run across the entire data matrix. The FFT result is used to compute a scalar value (say peak amplitude frequency) which is used to create a new "FFT value" image.
Now, I need the final FFT image to be the same size as the original data (2000 X 500).
What is the fastest way to accomplish this in MATLAB? I am currently using for loops which is relatively slow. Also I use interpolation to size up the final image to the original data size.
As @EitanT pointed out, you can use blockproc for batch block processing of an image J. However, you should define your function handle as
fun = @(block_struct) fft2(block_struct.data);
B = blockproc(J, [64 10], fun);
For a [2000 x 500] matrix this will give you a [2000 x 500] output of complex Fourier values, evaluated at sub-sampled pixel locations with a local support (size of the input to FFT) of [64 x 10]. Now, to replace those values with a single, e.g. with the peak log-magnitude, you can further specify
fun = @(block_struct) max(max(log(abs(fft2(block_struct.data)))));
B = blockproc(J, [64 10], fun);
The output then is a roughly [2000/64 x 500/10] array of per-block values (blockproc also handles the partial blocks at the edges), which you can resize by nearest-neighbor interpolation (or something else for smoother versions) to the desired [2000 x 500] original size
C = imresize(B, [2000 500], 'nearest');
I can include a real image example if it will further help.
Update: To get overlapping blocks you can use the 'BorderSize' option of blockproc by setting the overlap [V H] such that the final windows of size [M + 2*V, N + 2*H] will still be [64, 10] in size. Example:
fun = @(block_struct) log(abs(fft2(block_struct.data)));
V = 16; H = 3; % overlap values
overlap = [V H];
M = 32; N = 4; % non-overlapping values
B1 = blockproc(J, [M N], fun, 'BorderSize', overlap); % final windows are 64 x 10
However, this works when keeping the full Fourier response, not with the single-value version using max(max()) above.
See also this post for filtering using blockproc: Dealing with “Really Big” Images: Block Processing.
If you want to apply the same function (in your case, the 2-D Fourier transform) on individual distinct blocks in a larger matrix, you can do that with the blkproc function, which is replaced in newer MATLAB releases by blockproc.
However, I infer that you wish to apply fft2 on overlapping blocks in a "sliding window" fashion. For this purpose you can use colfilt with the 'sliding' option. Note that the function we're applying on each block is fft2:
block_size = [64, 10];
temp_size = 5 * block_size;
col_func = @(x) cellfun(@(y) max(max(abs(fft2(reshape(y, block_size))))), num2cell(x, 1));
B = colfilt(A, block_size, temp_size, 'sliding', col_func);
How does this work? colfilt processes the matrix A by rearranging each "sliding" block into a separate column of a new temporary matrix, and then applying the col_func to this new matrix. col_func in turn restores each column into the original block and applies fft2 on it, returning the largest amplitude value for each column.
Important things to note:
Since this temporary matrix includes all possible "sliding" blocks, memory can be a limitation. Therefore, in order to use less memory in the calculations, colfilt breaks up the original matrix A into sub-matrices of size temp_size and performs the calculations on each separately. The resulting matrix B is still the same, of course.
Each element in the resulting matrix B is computed from the corresponding block neighborhood. The larger your image is, the more blocks there are to process, so the computation time grows accordingly. I believe you'll have to wait quite a bit until MATLAB finishes processing all the sliding windows on your 2000-by-500 matrix.