I have a 3D matrix/movie named ch of size (x,y,nframes) and a logical mask of size (x,y). I want to compute the mean of the masked pixels in each frame, so that at the end I get a vector of size 1 x nframes. I would like to do it with reshape instead of frame by frame, but the two results don't match and I don't understand why. Could you please let me know why?
for i=1:nframes
ch_aux=ch(:,:,i);
Mean_for(i)= mean(ch_aux(mask));
end
% C with reshape
[row,col] = find(mask);
A=ch(row,col,:);
B=reshape(A,1,length(row).^2,nframes );
Mean_res=mean(B);
plot( Mean_for,'r')
hold on
plot( Mean_res(:))
legend({'for','reshape'})
thanks!
Solution:
Using reshape
r = reshape( ch, [], size(ch,3) );
Mean_res = mean( r(mask(:),: ), 1 ); % average over the masked pixels (rows): 1-by-nframes
Benchmarking (comparing this solution to the two proposed by Divakar) shows:
Shai
Elapsed time is 0.0234721 seconds.
Divakar 1
Elapsed time is 0.743586 seconds.
Divakar 2
Elapsed time is 0.025841 seconds.
The bsxfun version (Divakar 1) is significantly slower.
What caused the error in the original code?
I suspect your problem lies in the expression A=ch(row,col,:);
Suppose ch is of size 2-by-2-by-n and mask = [ 1 0; 0 1];. In that case
[row, col] = find(mask);
results in
row = [1;2];
col = [1;2];
and A=ch(row,col,:); selects every combination of the listed rows and columns, so A equals ch exactly, which is not what you want.
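To see the difference concretely, here is a minimal sketch with a made-up 2-by-2-by-3 array (the values are only for illustration, they are not from the question). Subscript indexing with two vectors returns every row/column combination, whereas linear indices pick only the masked pixels:

```matlab
% Toy data (hypothetical values): 2-by-2-by-3 "movie" and a diagonal mask
ch   = reshape(1:12, 2, 2, 3);
mask = logical([1 0; 0 1]);

[row, col] = find(mask);          % row = [1;2], col = [1;2]

% Subscript indexing takes ALL combinations of row and col,
% i.e. (1,1), (2,1), (1,2), (2,2), which is the whole frame
A = ch(row, col, :);
same_as_ch = isequal(A, ch);      % true: the mask had no effect

% Linear indices address only the masked pixels (1,1) and (2,2)
frame1 = ch(:, :, 1);             % [1 3; 2 4]
idx    = sub2ind(size(mask), row, col);
picked = frame1(idx);             % [1; 4]
```

With a mask that has more true pixels, length(row).^2 in the original reshape only hides the problem: the array really does contain that many (mostly unwanted) elements per frame.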
For efficiency, you could use another vectorized solution with bsxfun along with your favourite reshape -
Mean_bsxfun = sum(reshape(bsxfun(@times,ch,mask),[],size(ch,3)),1)./sum(mask(:))
Or even better, abuse the fast matrix multiplication in MATLAB -
Mean_matmult = mask(:).'*reshape(ch,[],size(ch,3))./sum(mask(:))
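As a sanity check, the following sketch compares the frame-by-frame loop with the reshape and matrix-multiplication variants on small random data. The sizes and variable names are assumptions chosen for illustration; the mask is cast to double explicitly, and the mean is taken along dimension 1 so that the result is 1-by-nframes:

```matlab
% Hypothetical test data: small random movie and random logical mask
ch   = rand(5, 4, 7);
mask = rand(5, 4) > 0.5;
mask(1) = true;                         % make sure the mask is not empty
nframes = size(ch, 3);

% Reference: frame-by-frame loop
Mean_for = zeros(1, nframes);
for i = 1:nframes
    frame = ch(:, :, i);
    Mean_for(i) = mean(frame(mask));
end

% One column per frame, then average over the masked rows
r = reshape(ch, [], nframes);
Mean_res = mean(r(mask(:), :), 1);

% Matrix-multiplication variant: sum of masked pixels divided by their count
Mean_matmult = double(mask(:)).' * r ./ sum(mask(:));

% Largest deviation from the loop result (should be at round-off level)
max_diff = max(abs([Mean_res - Mean_for, Mean_matmult - Mean_for]));
```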
Related
I have this code:
abs(mean(exp(1i*( a(:,1) - a(:,2) ))))
where a is a 550-by-129 double matrix. How can I extend this code so that a(:,1) is replaced in turn by a(:,2), then a(:,3), and so on? I need to subtract each column from every other column.
Another method using matrix multiplication:
E = exp(1i*a);
result = abs(E.'*(1./E)/size(E,1));
Explanation:
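You can rewrite the expression
exp(1i*(a - b))
as
exp(1i*a)./exp(1i*b)
so
exp(1i*a).*(1./exp(1i*b))
and mean(x) is sum(x)/n.
Summing these element-wise products over the rows, for every pair of columns, is exactly a matrix product, so you can do your task using very fast matrix multiplication.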
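The identity can be checked numerically on a small input. This is only a sketch: the 6-by-4 size is an assumption (the question uses 550-by-129). Note that the non-conjugate transpose .' matters here, because ' would conjugate the complex entries:

```matlab
% Hypothetical small input
a = rand(6, 4);
n = size(a, 1);

% Matrix-multiplication form: entry (i,j) is
% sum(exp(1i*a(:,i)) ./ exp(1i*a(:,j))) / n
E = exp(1i*a);
result = abs(E.' * (1./E) / n);

% Reference: direct evaluation for every pair of columns
ref = zeros(size(a, 2));
for ii = 1:size(a, 2)
    for jj = 1:size(a, 2)
        ref(ii, jj) = abs(mean(exp(1i*(a(:, ii) - a(:, jj)))));
    end
end

err = max(abs(result(:) - ref(:)));   % should be at round-off level
```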
Result of a comparison between different methods in Octave:
Matrix Multiplication:
Elapsed time is 0.0133181 seconds.
BSXFUN:
Elapsed time is 1.33882 seconds.
REPMAT:
Elapsed time is 1.43535 seconds.
FOR LOOP:
Elapsed time is 3.10798 seconds.
Here is the code for comparing different methods.
Looped, this is an easy trick; let an outer loop run over all indices, and an inner loop as well.
a = rand(550,129);
out = zeros(size(a,2),size(a,2));
for ii = 1:size(a,2)
for jj = 1:size(a,2)
out(ii,jj) = abs(mean(exp(1i*(a(:,ii)-a(:,jj)))));
end
end
No loops, one line:
result = permute(abs(mean(exp(1i*bsxfun(@minus, a, permute(a, [1 3 2]))),1)), [2 3 1]);
This computes the differences between all pairs of columns as a 3D array, where the second and third dimensions refer to the two column indices in the original 2D array; then applies the required operations along the first dimension; and finally permutes the dimensions to yield a 2D array result.
A bit off-topic, but you can do that with indexing too:
a = rand(550,129);
c = repmat(1:size(a,2),1,size(a,2));
c(2,:) = imresize(1:size(a,2), [1 length(c)], 'nearest');
out = abs(mean(exp(1i*( a(:,c(1,:)) - a(:,c(2,:)) ))));
out = reshape(out,[size(a,2) size(a,2)]); % 129x129 format
During my MATLAB profiling, I noticed one line of code that consumes much more time than I imagined. Any idea how to make it faster?
X = Y(ids_A, ids_A) - (Y(ids_A,k) * Y(k,ids_A))/Y(k,k);
X and Y are symmetric matrices (Y is d x d, so X is (d-1) x (d-1)), k is an index of a single row/column in Y, and ids_A is a vector of indices of all the other rows/columns (therefore Y(ids_A,k) is a column vector and Y(k,ids_A) is a row vector).
ids_A = setxor(1:d,k);
Thanks!
You can perhaps replace the outer product multiplication with a call to bsxfun:
X = Y(ids_A, ids_A) - (bsxfun(@times, Y(ids_A,k), Y(k,ids_A))/Y(k,k));
So how does the above code work? Let's take a look at the definition of the outer product when one vector has 4 elements and the other has 3 elements (source: Wikipedia):
u*v.' = [u1*v1 u1*v2 u1*v3
         u2*v1 u2*v2 u2*v3
         u3*v1 u3*v2 u3*v3
         u4*v1 u4*v2 u4*v3]
As you can see, the outer product is created by element-wise products where the first vector u is replicated horizontally while the second vector v is replicated vertically. You then find the element-wise products of each element to produce your result. This is eloquently done with bsxfun:
bsxfun(@times, u, v.');
u would be a column vector and v.' would be a row vector. bsxfun naturally replicates the data to follow the above pattern, and then we use @times to perform the element-wise products.
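A tiny example makes the replication pattern visible. The vectors below are made up for illustration:

```matlab
u = [1; 2; 3; 4];        % column vector (4 elements)
v = [10; 20; 30];        % column vector (3 elements)

% bsxfun expands u across the columns and v.' down the rows,
% then multiplies element by element
P_bsxfun = bsxfun(@times, u, v.');

% Ordinary outer product for comparison
P_mtimes = u * v.';

disp(P_bsxfun)
%     10    20    30
%     20    40    60
%     30    60    90
%     40    80   120
```

Both expressions give the same 4-by-3 matrix; which one is faster depends on the MATLAB version and the sizes involved, so it is worth timing on your own data.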
I am assuming your code to look something like this -
for k = 1:d
ids_A = setxor(1:d,k);
X = Y(ids_A, ids_A) - (Y(ids_A,k) * Y(k,ids_A))/Y(k,k);
end
With the given code snippet, it's safe to assume that you are somehow using X within that loop. You can calculate all the X matrices as a pre-calculation step before the start of such a loop and these calculations could be performed as a vectorized approach.
Regarding the code snippet itself, it can be seen that you are "escaping" one index at each iteration with setxor. Now, if you are going with a vectorized approach, you can perform all those mathematical operations in one-go and later on remove the elements that got incorporated in the vectorized approach, but weren't intended. This really is the essence of a bsxfun based vectorized approach listed next -
%// Perform all matrix-multiplications in one go with bsxfun and permute
mults = bsxfun(@times,permute(Y,[1 3 2]),permute(Y,[3 2 1]));
%// Scale those with diagonal elements from Y and get X for every iteration
scaledvals = bsxfun(@rdivide,mults,permute(Y(1:d+1:end),[1 3 2]));
X_vectorized = bsxfun(@minus,Y,scaledvals);
%// Find row and column indices as linear indices to be removed from X_vectorized
row_idx = bsxfun(@plus,[0:d-1]*d+1,[0:d-1]'*(d*d+1));
col_idx = bsxfun(@plus,[1:d]',[0:d-1]*(d*(d+1)));
%// Remove those "setxored" indices and then reshape to expected size
X_vectorized([row_idx col_idx])=[];
X_vectorized = reshape(X_vectorized,d-1,d-1,d);
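To confirm that slice k of the vectorized result matches the X computed in iteration k of the loop, here is a small check. It is a sketch under assumptions: d = 5 and a random symmetric Y with a boosted diagonal (so that Y(k,k) is never close to zero) are chosen only for illustration:

```matlab
d = 5;
Y = rand(d) + d*eye(d);      % hypothetical input; diagonal kept away from zero
Y = (Y + Y.')/2;             % symmetric, as in the question

% Vectorized version (same steps as listed above)
mults      = bsxfun(@times, permute(Y,[1 3 2]), permute(Y,[3 2 1]));
scaledvals = bsxfun(@rdivide, mults, permute(Y(1:d+1:end),[1 3 2]));
X_vec      = bsxfun(@minus, Y, scaledvals);
row_idx    = bsxfun(@plus, (0:d-1)*d+1, (0:d-1)'*(d*d+1));  % row k of slice k
col_idx    = bsxfun(@plus, (1:d)', (0:d-1)*(d*(d+1)));      % column k of slice k
X_vec([row_idx col_idx]) = [];
X_vec      = reshape(X_vec, d-1, d-1, d);

% Loop version, compared slice by slice
max_err = 0;
for k = 1:d
    ids_A = setxor(1:d, k);
    X = Y(ids_A, ids_A) - (Y(ids_A,k) * Y(k,ids_A)) / Y(k,k);
    max_err = max(max_err, max(max(abs(X - X_vec(:,:,k)))));
end
```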
Benchmarking
Benchmarking Code
d = 50; %// Datasize
Y = rand(d,d); %// Create random input
num_iter = 100; %// Number of iterations to be run for each approach
%// Warm up tic/toc.
for k = 1:100000
tic(); elapsed = toc();
end
disp('------------------------------ With original loopy approach')
tic
for iter = 1:num_iter
for k = 1:d
ids_A = setxor(1:d,k);
X = Y(ids_A, ids_A) - (Y(ids_A,k) * Y(k,ids_A))/Y(k,k);
end
end
toc
clear X k ids_A
disp('------------------------------ With proposed vectorized approach')
tic
for iter = 1:num_iter
mults = bsxfun(@times,permute(Y,[1 3 2]),permute(Y,[3 2 1]));
scaledvals = bsxfun(@rdivide,mults,permute(Y(1:d+1:end),[1 3 2]));
X_vectorized = bsxfun(@minus,Y,scaledvals);
row_idx = bsxfun(@plus,[0:d-1]*d+1,[0:d-1]'*(d*d+1));
col_idx = bsxfun(@plus,[1:d]',[0:d-1]*(d*(d+1)));
X_vectorized([row_idx col_idx])=[];
X_vectorized = reshape(X_vectorized,d-1,d-1,d);
end
toc
Results
Case #1: d = 50
------------------------------ With original loopy approach
Elapsed time is 0.849518 seconds.
------------------------------ With proposed vectorized approach
Elapsed time is 0.154395 seconds.
Case #2: d = 100
------------------------------ With original loopy approach
Elapsed time is 2.079886 seconds.
------------------------------ With proposed vectorized approach
Elapsed time is 2.285884 seconds.
Case #3: d = 200
------------------------------ With original loopy approach
Elapsed time is 7.592865 seconds.
------------------------------ With proposed vectorized approach
Elapsed time is 19.012421 seconds.
Conclusions
One can easily notice that the proposed vectorized approach is the better choice for small matrices (about 5x faster at d = 50). It is roughly on par with the loop at 100 x 100, and beyond that size the memory-hungry bsxfun slows us down.
Suppose I have a matrix a which is 1 x n, and I want to find the average value between each element of a and its neighbors.
What's a smart way to do this?
EX:
If
a=[0 1 2 1 0 1];
Then the "average value matrix" is:
b=[0.5 1 1.33 1 0.67 0.5];
Where the first entry of b is:
b(1) = (0+1)/2 = 0.5
b(2) = (0+1+2)/3 = 1
etc.
I would suggest doing the middle as vector ops and handling the edge conditions as scalars.
b=zeros(size(a));
b(2:end-1)=(a(1:end-2)+a(2:end-1)+a(3:end))/3;
b(1)=(a(1)+a(2))/2;
b(end)=(a(end-1)+a(end))/2;
If you get into bigger averages...
% scale and sum elements with a sliding window 3 long.
b=conv(a,[1,1,1]/3)
%
% remove the tails
b=b(2:end-1)
%
% and rescale the edge cases.
b(1)=b(1)*3/2
b(end)=b(end)*3/2
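For larger windows, rescaling the edges by hand gets tedious. One alternative (a different technique from the snippet above, shown here only as a sketch) is to divide the windowed sum by the number of samples that actually fell under the window, both obtained with conv and the 'same' option. The window length w is an assumption and is taken to be odd:

```matlab
a = [0 1 2 1 0 1];
w = 3;                                  % window length (odd), chosen for illustration
kernel = ones(1, w);

% Sum of the samples under each window position (zero-padded at the edges)
window_sum = conv(a, kernel, 'same');

% Number of real samples under each window position: 2 at the ends, 3 elsewhere
window_count = conv(ones(size(a)), kernel, 'same');

b = window_sum ./ window_count;
% b is approximately [0.5 1 1.3333 1 0.6667 0.5]
```

Because the counts are computed the same way as the sums, the edge handling adapts automatically when w changes.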
I compared the first method above (vector), the convolution method, and the hankel method suggested by RDizzl3. (Sorry Luis, I don't have the Statistics package, though I expect the nanmean method to be slower due to the amount of condition checking.) The comparison used a random vector a of length 10000, to make the timing significant. b was initialized to a zeros matrix of the correct size before these timings were done. The hankel matrix (h) of the correct size was precomputed before these timings as well.
% hankel method
tic; b(1)=mean(a([1,2])); b(2:(n-1))=mean(a(h),2); b(n)=mean(a([n-1,n])); toc
Elapsed time is 0.001698 seconds.
% convolution method
tic; c=conv(a,[1,1,1]/3) ; b=c(2:(2+n-1)); b(1)=b(1)*3/2; b(n)=b(n)*3/2; toc;
Elapsed time is 0.000339 seconds.
% vector method
tic; b(1)=mean(a([1,2])) ; b(2:(n-1))=(a(1:(n-2))+a(2:(n-1))+a(3:n))/3;b(n)=mean(a([n-1,n])); toc
Elapsed time is 0.000914 seconds.
I repeated the above 3 more times and sorted the results (elapsed time in seconds):
hankel        convolution   vector
9.2500e-04    3.3900e-04    7.2600e-04
1.3820e-03    5.2600e-04    8.7100e-04
1.6980e-03    5.5200e-04    9.1400e-04
2.1570e-03    5.5300e-04    2.6390e-03
I am a little surprised: I didn't expect the efficiency of the convolution approach to show until larger window sizes. But it consistently did the best here.
Note that if you are using smaller data sets these timings probably aren't appropriate. I wouldn't at all be surprised if the hankel approach works better if the interest is in large numbers of shorter length vectors.
You can use this:
a=[0 1 2 1 0 1];
n = numel(a);
h = hankel(1:(n-2),(n-2):n);
b(1) = mean(a([1 2]))
b(2:(n-1)) = mean(a(h),2);
b(n) = mean(a([n-1 n]))
This will return the vector:
b = [0.5000 1.0000 1.3333 1.0000 0.6667 0.5000]
This takes the elements from the vector a and finds the average for its neighbors, so:
b(1) = (0+1)/2 = 0.5
b(2) = (0+1+2)/3 = 1
b(3) = (1+2+1)/3 = 1.3333
b(4) = (2+1+0)/3 = 1
b(5) = (1+0+1)/3 = 0.6667
b(6) = (0+1)/2 = 0.5 % last element
a = [0 1 2 1 0 1]; %// data
n = 1; %// how many neighbours to consider on each side
a2 = [NaN(1,n) a NaN(1,n)]; %// pad with NaN's (which will be ignored by nanmean)
b = arrayfun(@(k) nanmean(a2(k-n:k+n)), n+1:n+numel(a)); %// apply a
%// sliding-window mean ignoring NaN's
The easiest way is to use the smooth function:
output=smooth(A,3,'moving');
where 3 is the window size (it should be an odd value).
Check the documentation for the smooth function:
https://www.mathworks.com/help/curvefit/smooth.html
G'day,
I'm trying to find a way to use a vector of [x,y] points to index from a large matrix in MATLAB.
Usually, I would convert the subscript points to linear indices into the matrix (e.g. as in "Use a vector as an index to a matrix"). However, the matrix is 4-dimensional, and I want to take all of the elements of the 3rd and 4th dimensions that have the same 1st and 2nd dimension. Let me hopefully demonstrate with an example:
Matrix = nan(4,4,2,2); % where the dimensions are (x,y,depth,time)
Matrix(1,2,:,:) = 999; % note that this value could change in depth (3rd dim) and time (4th dim)
Matrix(3,4,:,:) = 888; % note that this value could change in depth (3rd dim) and time (4th dim)
Matrix(4,4,:,:) = 124;
Now, I want to be able to index with the subscripts (1,2) and (3,4), etc., and return not only the 999 and 888 which exist in Matrix(:,:,1,1) but also the contents which exist at Matrix(:,:,1,2), Matrix(:,:,2,1) and Matrix(:,:,2,2), and so on (IRL, the dimensions of Matrix might be more like size(Matrix) = [300 250 30 200]).
I don't want to use linear indices because I would like the results to be in a similar vector fashion. For example, I would like a result which is something like:
ans(time=1)
999 888 124
999 888 124
ans(time=2)
etc etc etc
etc etc etc
I'd also like to add that due to the size of the matrix I'm dealing with, speed is an issue here - thus why I'd like to use subscript indices to index to the data.
I should also mention that (unlike this question: Accessing values using subscripts without using sub2ind), since I want all the information stored in the extra dimensions, 3 and 4, at the i-th and j-th indices, I don't think that a slightly faster version of sub2ind would cut it.
I can think of three ways to go about this
Simple loop
Just loop over all the 2D indices you have, and use colons to access the remaining dimensions:
for jj = 1:size(twoDinds,1)
M(twoDinds(jj,1),twoDinds(jj,2),:,:) = rand;
end
Vectorized calculation of Linear indices
Skip sub2ind and vectorize the computation of linear indices:
% generalized for arbitrary dimensions of M
sz = size(M);
nd = ndims(M);
arg = arrayfun(@(x)1:x, sz(3:nd), 'UniformOutput', false);
[argout{1:nd-2}] = ndgrid(arg{:});
argout = cellfun(...
@(x) repmat(x(:), size(twoDinds,1),1), ...
argout, 'UniformOutput', false);
twoDinds = kron(twoDinds, ones(prod(sz(3:nd)),1));
% the linear indices (strides taken from all but the last dimension)
inds = twoDinds(:,1) + ([twoDinds(:,2) [argout{:}]]-1) * cumprod(sz(1:nd-1)).';
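The hand-written index formula can be checked against sub2ind on a small array. This is a sketch with assumed sizes and index pairs; the expanded pairs are stored in a separate variable rc so that twoDinds is not overwritten, and the strides are written as cumprod(sz(1:nd-1)) so that they follow the number of dimensions:

```matlab
M  = zeros(4, 4, 2, 3);           % hypothetical 4D array
sz = size(M);
nd = ndims(M);
twoDinds = [1 2; 3 4; 4 4];       % hypothetical (row, column) pairs

% Subscripts for the trailing dimensions, one full set per 2D index pair
arg = arrayfun(@(x) 1:x, sz(3:nd), 'UniformOutput', false);
argout = cell(1, nd-2);
[argout{1:nd-2}] = ndgrid(arg{:});
argout = cellfun(@(x) repmat(x(:), size(twoDinds,1), 1), argout, ...
    'UniformOutput', false);

% Repeat every (row, column) pair once per trailing-dimension combination
rc = kron(twoDinds, ones(prod(sz(3:nd)), 1));

% Column-major formula: row + sum((subscript - 1) .* stride)
inds_manual = rc(:,1) + ([rc(:,2) [argout{:}]] - 1) * cumprod(sz(1:nd-1)).';
inds_ref    = sub2ind(sz, rc(:,1), rc(:,2), argout{:});
```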
Sub2ind
Just use the ready-made tool that ships with Matlab:
inds = sub2ind(size(M), twoDinds(:,1), twoDinds(:,2), argout{:});
Speed
So which one's the fastest? Let's find out:
clc
M = nan(4,4,2,2);
sz = size(M);
nd = ndims(M);
twoDinds = [...
1 2
4 3
3 4
4 4
2 1];
tic
for ii = 1:1e3
for jj = 1:size(twoDinds,1)
M(twoDinds(jj,1),twoDinds(jj,2),:,:) = rand;
end
end
toc
tic
twoDinds_prev = twoDinds;
for ii = 1:1e3
twoDinds = twoDinds_prev;
arg = arrayfun(@(x)1:x, sz(3:nd), 'UniformOutput', false);
[argout{1:nd-2}] = ndgrid(arg{:});
argout = cellfun(...
@(x) repmat(x(:), size(twoDinds,1),1), ...
argout, 'Uniformoutput', false);
twoDinds = kron(twoDinds, ones(prod(sz(3:nd)),1));
inds = twoDinds(:,1) + ([twoDinds(:,2) [argout{:}]]-1) * cumprod(sz(1:3)).';
M(inds) = rand;
end
toc
tic
for ii = 1:1e3
twoDinds = twoDinds_prev;
arg = arrayfun(@(x)1:x, sz(3:nd), 'UniformOutput', false);
[argout{1:nd-2}] = ndgrid(arg{:});
argout = cellfun(...
@(x) repmat(x(:), size(twoDinds,1),1), ...
argout, 'Uniformoutput', false);
twoDinds = kron(twoDinds, ones(prod(sz(3:nd)),1));
inds = sub2ind(size(M), twoDinds(:,1), twoDinds(:,2), argout{:});
M(inds) = rand;
end
toc
Results:
Elapsed time is 0.004778 seconds. % loop
Elapsed time is 0.807236 seconds. % vectorized linear inds
Elapsed time is 0.839970 seconds. % linear inds with sub2ind
Conclusion: use the loop.
Granted, the tests above are largely influenced by JIT's failure to compile the last two loops, and by their non-specificity to 4D arrays (the last two methods also work on ND arrays). Making a specialized version for 4D will undoubtedly be much faster.
Nevertheless, the indexing with simple loop is, well, simplest to do, easiest on the eyes and very fast too, thanks to JIT.
So, here is a possible answer... but it is messy. I suspect it would be more computationally expensive than a more direct method, and this would definitely not be my preferred answer. It would be great if we could get the answer without any for loops!
Matrix = rand(100,200,30,400);
grabthese_x = [1 30 50 90];
grabthese_y = [61 9 180 189];
result = nan(length(grabthese_x), size(Matrix,3), size(Matrix,4));
for tt = 1:size(Matrix,4)
subset = squeeze(Matrix(grabthese_x,grabthese_y,:,tt));
for NN=1:size(Matrix,3)
result(:,NN,tt) = diag(subset(:,:,NN));
end
end
The resulting matrix, result, should have size(result) = [4 30 400], i.e. [length(grabthese_x) size(Matrix,3) size(Matrix,4)].
I think this should work, even if Matrix isn't square. However, it is not ideal, as I said above.
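One loop-free possibility, which is a different technique from the diag-based loop above and is offered only as a sketch, is to merge the first two dimensions with reshape so that each (x,y) pair becomes a single row index. The small array size and the point coordinates below are assumptions for illustration:

```matlab
Matrix = rand(10, 20, 3, 4);          % small hypothetical stand-in for the real data
grabthese_x = [1 3 5 9];
grabthese_y = [6 9 18 19];

sz = size(Matrix);

% Merge dimensions 1 and 2 so that each (x,y) pair becomes a single row index
flat = reshape(Matrix, sz(1)*sz(2), sz(3), sz(4));
rows = sub2ind(sz(1:2), grabthese_x, grabthese_y);

% One row per requested point, all depths and times kept
result = flat(rows, :, :);            % numel(grabthese_x)-by-depth-by-time

% Cross-check against direct subscript access for every point
ok = true;
for p = 1:numel(grabthese_x)
    direct = Matrix(grabthese_x(p), grabthese_y(p), :, :);
    ok = ok && isequal(squeeze(direct), squeeze(result(p, :, :)));
end
```

Only the (x,y) pairs go through sub2ind here, so the index vector stays short even when the depth and time dimensions are large.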
I have two very large matrices (60x25000) and I'd like to compute the correlation between corresponding columns of the two matrices only. For example:
corrVal(1) = corr(mat1(:,1), mat2(:,1));
corrVal(2) = corr(mat1(:,2), mat2(:,2));
...
corrVal(i) = corr(mat1(:,i), mat2(:,i));
For smaller matrices I can simply use:
colCorr = diag( corr( mat1, mat2 ) );
but this doesn't work for very large matrices as I run out of memory. I've considered slicing up the matrices to compute the correlations and then combining the results, but it seems like a waste to compute correlations between column combinations that I'm not actually interested in.
Is there a quick way to directly compute what I'm interested in?
Edit: I've used a loop in the past but it's just way too slow:
mat1 = rand(60,5000);
mat2 = rand(60,5000);
nCol = size(mat1,2);
corrVal = zeros(nCol,1);
tic;
for i = 1:nCol
corrVal(i) = corr(mat1(:,i), mat2(:,i));
end
toc;
This takes ~1 second
tic;
corrVal = diag(corr(mat1,mat2));
toc;
This takes ~0.2 seconds
I can obtain a x100 speed improvement by computing it by hand.
An=bsxfun(@minus,A,mean(A,1)); %%% zero-mean
Bn=bsxfun(@minus,B,mean(B,1)); %%% zero-mean
An=bsxfun(@times,An,1./sqrt(sum(An.^2,1))); %% L2-normalization
Bn=bsxfun(@times,Bn,1./sqrt(sum(Bn.^2,1))); %% L2-normalization
C=sum(An.*Bn,1); %% correlation
You can compare using that code:
A=rand(60,25000);
B=rand(60,25000);
tic;
C=zeros(1,size(A,2));
for i = 1:size(A,2)
C(i)=corr(A(:,i), B(:,i));
end
toc;
tic
An=bsxfun(@minus,A,mean(A,1));
Bn=bsxfun(@minus,B,mean(B,1));
An=bsxfun(@times,An,1./sqrt(sum(An.^2,1)));
Bn=bsxfun(@times,Bn,1./sqrt(sum(Bn.^2,1)));
C2=sum(An.*Bn,1);
toc
mean(abs(C-C2)) %% difference between methods
Here are the computing times:
Elapsed time is 10.822766 seconds.
Elapsed time is 0.119731 seconds.
The difference between the two results is very small:
mean(abs(C-C2))
ans =
3.0968e-17
EDIT: explanation
bsxfun does a column-by-column operation (or row-by-row depending on the input).
An=bsxfun(@minus,A,mean(A,1));
This line subtracts (@minus) the mean of each column (mean(A,1)) from each column of A. So basically it makes the columns of A zero-mean.
An=bsxfun(@times,An,1./sqrt(sum(An.^2,1)));
This line multiplies (@times) each column by the inverse of its norm. So it makes them L2-normalized.
Once the columns are zero-mean and L2-normalized, to compute the correlation, you just have to take the dot product of each column of An with the corresponding column of Bn. So you multiply them element-wise, An.*Bn, and then you sum each column: sum(An.*Bn,1).
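If the Statistics Toolbox (and therefore corr) is not available, the hand-computed values can still be checked with corrcoef, which is part of base MATLAB. The sketch below uses small made-up inputs rather than the 60-by-25000 matrices from the question:

```matlab
A = rand(60, 8);      % hypothetical small inputs
B = rand(60, 8);

% Column-wise correlation "by hand": center, normalize, dot product
An = bsxfun(@minus, A, mean(A, 1));
Bn = bsxfun(@minus, B, mean(B, 1));
An = bsxfun(@times, An, 1 ./ sqrt(sum(An.^2, 1)));
Bn = bsxfun(@times, Bn, 1 ./ sqrt(sum(Bn.^2, 1)));
C  = sum(An .* Bn, 1);

% Reference: corrcoef returns a 2-by-2 matrix for a pair of vectors;
% the off-diagonal entry is the correlation between them
C_ref = zeros(1, size(A, 2));
for k = 1:size(A, 2)
    R = corrcoef(A(:, k), B(:, k));
    C_ref(k) = R(1, 2);
end

max_diff = max(abs(C - C_ref));   % should be at round-off level
```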
I think the obvious loop might be good enough for your size of problem. On my laptop it takes less than 6 seconds to do the following:
A = rand(60,25000);
B = rand(60,25000);
n = size(A,1);
m = size(A,2);
corrVal = zeros(1,m);
for k=1:m
corrVal(k) = corr(A(:,k),B(:,k));
end