Using the MATLAB Profiler, I found that the line inside the loop below is a major bottleneck and is slowing down my program. w, x, y and z are all 3D matrices with the same dimensions (A x B x C), where A, B and C are all different. Is there any way to optimize this line of code to run faster?
dt = .5;
for t = 1: tstop
w(:,:,t+1)= sum( dt*(x(:,:,t:-1:1).*(y(:,:,1:t) - .002).*z(:,:,1:t)),3);
end
If you group some terms outside the for loop, you can get up to a 2x boost:
p = dt*(y - .002).*z;
for t = 1: tstop
w(:,:,t+1)= sum( x(:,:,t:-1:1).*p(:,:,1:t), 3);
end
It is now easier to notice that we are computing convolutions of x and p along the third dimension. If that dimension C (or tstop) is large, you can try to inline or optimize those convolutions.
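To make the convolution claim concrete, here is a minimal sketch that compares the loop with conv for a single pixel. The sizes and the pixel (2,3) are hypothetical values picked only for illustration, and tstop = C is assumed.

```matlab
% Hypothetical small sizes, chosen only for illustration (tstop = C assumed)
A = 3; B = 4; C = 6; dt = .5;
x = rand(A,B,C); y = rand(A,B,C); z = rand(A,B,C);
p = dt*(y - .002).*z;                 % grouped terms, as above

w = zeros(A,B,C+1);
for t = 1:C
    w(:,:,t+1) = sum(x(:,:,t:-1:1).*p(:,:,1:t), 3);
end

% Take one pixel and convolve its two time series directly
xa = squeeze(x(2,3,:));               % C-by-1
pa = squeeze(p(2,3,:));               % C-by-1
cfull = conv(xa, pa);                 % full linear convolution, length 2*C-1
wa = squeeze(w(2,3,2:end));           % loop result for the same pixel

err = max(abs(wa - cfull(1:C)));      % should be at round-off level
```

Only the first C samples of the full convolution are needed, which is why the FFT approach can discard the second half of its output.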
I would reshape the 3D matrices into 2D ones, grouping the first 2 dimensions and keeping the time dimension as the second one. Then you can try to perform row-wise convolution with conv2 (if possible, as claimed in this answer), or with fft. Find below a solution with fft (and zero-padding), assuming tstop = C:
X = reshape(x, [A*B, C]); % reshape to 2D
Y = reshape(y, [A*B, C]);
Z = reshape(z, [A*B, C]);
P = dt*(Y - .002).*Z; % grouped terms
z__ = zeros(A*B, C); % zero-padding
W = real(ifft(fft([z__, X]').*fft([z__, P]'))'); % column-wise fft
W = [zeros(A*B, 1), W(:, 1:C)]; % first half
w = reshape(W, [A, B, C+1]);
The results are the same, and depending on A, B and C, this can give you a big performance boost. Example with A=13, B=14, C=1155:
original: 1.026312 seconds
grouping terms: 0.509862 seconds
FFT: 0.033699 seconds
Related
I was asked to do circular convolution between two functions by sampling them, using the function cconv. A known result of this sort of convolution is: CCONV( sin(x), sin(x) ) == -pi*cos(x)
To test the above I did:
w = linspace(0,2*pi,1000);
l = linspace(0,2*pi,1999);
stem(l,cconv(sin(w),sin(w)))
but the result I got was:
which is absolutely not -pi*cos(x).
Can anybody please explain what is wrong with my code and how to fix it?
In the documentation of cconv it says that:
c = cconv(a,b,n) circularly convolves vectors a and b. n is the length of the resulting vector. If you omit n, it defaults to length(a)+length(b)-1. When n = length(a)+length(b)-1, the circular convolution is equivalent to the linear convolution computed with conv.
I believe that the reason for your problem is that you do not specify the 3rd input to cconv, which then selects the default value, which is not the right one for you. I have made an animation showing what happens when different values of n are chosen.
If you compare my result for n=200 to your plot you will see that the amplitude of your data is 10 times larger whereas the length of your linspace is 10 times bigger. This means that some normalization is needed, likely a multiplication by the linspace step.
Indeed, after proper scaling and choice of n we get the right result:
res = 100; % resolution
w = linspace(0,2*pi,res);
dx = diff(w(1:2)); % grid step
stem( linspace(0,2*pi,res), dx * cconv(sin(w),sin(w),res) );
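As a numerical check of the scaling argument, here is a minimal sketch. It makes two assumptions that are not in the code above: the grid excludes the endpoint 2*pi, so that the samples cover exactly one period, and the circular convolution is computed with fft/ifft rather than cconv, so that no toolbox is needed. For equal-length inputs this is the same operation as cconv(s,s,res).

```matlab
res = 100;                          % resolution (assumed value)
w   = (0:res-1) * 2*pi/res;         % one full period, endpoint 2*pi excluded
dx  = 2*pi/res;                     % grid step
s   = sin(w);

% Circular convolution of length res via the FFT
c = dx * real(ifft(fft(s).^2));

err = max(abs(c - (-pi*cos(w))));   % deviation from the analytic result
```

With this grid the scaled result matches -pi*cos(x) up to round-off. With linspace(0,2*pi,res) the endpoint is sampled twice per period, so a small discrepancy is to be expected.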
This is the code I used for the animation:
hF = figure();
subplot(1,2,1); hS(1) = stem(1,cconv(1,1,1)); title('Autoscaling');
subplot(1,2,2); hS(2) = stem(1,cconv(1,1,1)); xlim([0,7]); ylim(50*[-1,1]); title('Constant limits');
w = linspace(0,2*pi,100);
for ind1 = 1:200
set(hS,'XData',linspace(0,2*pi,ind1));
set(hS,'YData',cconv(sin(w),sin(w),ind1));
suptitle("n = " + ind1);
drawnow
% export_fig(char("D:\BLABLA\F" + ind1 + ".png"),'-nocrop');
end
Given a square matrix of say size 400x400, how would I go about splitting this into constituent sub-matrices of 20x20 using a for-loop? I can't even think where to begin!
I imagine I want something like :
[x,y] = size(matrix)
for i = 1:20:x
for j = 1:20:y
but I'm unsure how I would proceed. Thoughts?
Well, I know that the poster explicitly asked for a for loop, and Jeff Mather's answer provided exactly that.
But still I got curious whether it is possible to decompose a matrix into tiles (sub-matrices) of a given size without a loop. In case someone else is curious, too, here's what I have come up with:
T = permute(reshape(permute(reshape(A, size(A, 1), n, []), [2 1 3]), n, m, []), [2 1 3])
transforms a two-dimensional array A into a three-dimensional array T, where each 2d slice T(:, :, i) is one of the tiles of size m x n. The third index enumerates the tiles in standard Matlab linearized order, tile rows first.
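For anyone who wants to convince themselves, here is a small sketch on a 4x6 matrix with 2x3 tiles. The sizes are hypothetical and were chosen so that the tiles are easy to inspect by eye.

```matlab
A = reshape(1:24, 4, 6);            % small test matrix
m = 2; n = 3;                       % tile size: m rows, n columns

T = permute(reshape(permute(reshape(A, size(A, 1), n, []), [2 1 3]), n, m, []), [2 1 3]);

% T is m x n x 4; tiles are numbered down the first tile column, then the next
t1 = T(:, :, 1);                    % A(1:2, 1:3)
t2 = T(:, :, 2);                    % A(3:4, 1:3)
t3 = T(:, :, 3);                    % A(1:2, 4:6)
```

This assumes that the matrix dimensions are exact multiples of m and n; otherwise reshape will throw an error.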
The variant
T = permute(reshape(A, size(A, 1), n, []), [2 1 3]);
T = permute(reshape(T, n, m, [], size(T, 3)), [2 1 3 4]);
makes T a four-dimensional array where T(:, :, i, j) gives the 2d slice with tile indices i, j.
Coming up with these expressions feels a bit like solving a sliding puzzle. ;-)
I'm sorry that my answer does not use a for loop either, but this would also do the trick:
cellOf20x20matrices = mat2cell(matrix, ones(1,20)*20, ones(1,20)*20)
You can then access the individual cells like:
cellOf20x20matrices{i,j}(a,b)
where i,j is the submatrix to fetch (and a,b is the indexing into that matrix if needed)
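Here is the same idea on a smaller matrix, as a sketch with the block size kept in a variable. The names blk and nBlk are my own and not part of the question.

```matlab
M    = magic(6);                    % small stand-in for the 400x400 matrix
blk  = 3;                           % block side length (hypothetical)
nBlk = size(M, 1) / blk;            % number of blocks per side, must be an integer

C = mat2cell(M, repmat(blk, 1, nBlk), repmat(blk, 1, nBlk));

tile = C{2,1};                      % the block covering rows 4:6, columns 1:3
elem = C{2,1}(1,2);                 % a single element inside that block
```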
Regards
You seem really close. Just using the problem as you described it (400-by-400, divided into 20-by-20 chunks), wouldn't this do what you want?
[x,y] = size(M);
for i = 1:20:x
for j = 1:20:y
tmp = M(i:(i+19), j:(j+19));
% Do something interesting with "tmp" here.
end
end
Even though the question is basically for 2D matrices, inspired by A. Donda's answer I would like to expand his answer to 3D matrices so that this technique could be used in cropping True Color images (3D)
A = imread('peppers.png'); %// size(384x512x3)
nCol = 4; %// number of Col blocks
nRow = 2; %// number of Row blocks
m = size(A,1)/nRow; %// Sub-matrix row size (Should be an integer)
n = size(A,2)/nCol; %// Sub-matrix column size (Should be an integer)
imshow(A); %// show original image
out1 = reshape(permute(A,[2 1 4 3]),size(A,2),m,[],size(A,3));
out2 = permute(reshape(permute(out1,[2 1 3 4]),m,n,[],size(A,3)),[1 2 4 3]);
figure;
for i = 1:nCol*nRow
subplot(nRow,nCol,i); imshow(out2(:,:,:,i));
end
The basic idea is to make the 3rd Dimension unaffected while reshaping so that the image isn't distorted. To achieve this, additional permuting was done to swap 3rd and 4th dimensions. Once the process is done, the dimensions are restored as it was, by permuting back.
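The same two lines can be checked on a small synthetic array instead of an image. This is only a sketch: the 4x6x3 array stands in for a tiny RGB image, and the tile numbering was worked out by hand from the reshape order.

```matlab
A = reshape(1:(4*6*3), 4, 6, 3);    % stand-in for a tiny RGB image
nCol = 3; nRow = 2;                 % number of column and row blocks
m = size(A,1)/nRow;                 % tile height (2)
n = size(A,2)/nCol;                 % tile width  (2)

out1 = reshape(permute(A,[2 1 4 3]),size(A,2),m,[],size(A,3));
out2 = permute(reshape(permute(out1,[2 1 3 4]),m,n,[],size(A,3)),[1 2 4 3]);

% Tiles are numbered row by row, which matches the subplot numbering:
% tile k sits in block row ceil(k/nCol) and block column mod(k-1,nCol)+1
tile5 = out2(:,:,:,5);              % block row 2, block column 2
```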
Results:
Original Image
Subplots (Partitions / Sub Matrices)
An advantage of this method is that it works well on 2D images too.
Here is an example with a grayscale (2D) image. The example uses the MATLAB built-in image 'cameraman.tif'.
With so many upvotes for the answer that makes use of nested calls to permute, I thought of timing it and comparing it to the other answer that makes use of mat2cell.
It is true that they don't return the exact same thing but:
the cell can be easily converted into a matrix like the other (I timed this, see further down);
when this problem arises, it is preferable (in my experience) to have the data in a cell since later on one will often want to put the original back together;
Anyway, I have compared them both with the following script. The code was run in Octave (version 3.9.1) with JIT disabled.
function T = split_by_reshape_permute (A, m, n)
T = permute (reshape (permute (reshape (A, size (A, 1), n, []), [2 1 3]), n, m, []), [2 1 3]);
endfunction
function T = split_by_mat2cell (A, m, n)
l = size (A) ./ [m n];
T = mat2cell (A, repmat (m, l(1), 1), repmat (n, l (2), 1));
endfunction
function t = time_it (f, varargin)
t = cputime ();
for i = 1:100
f(varargin{:});
endfor
t = cputime () - t;
endfunction
Asizes = [30 50 80 100 300 500 800 1000 3000 5000 8000 10000];
Tsides = [2 5 10];
As = arrayfun (@rand, Asizes, "UniformOutput", false);
for d = Tsides
figure ();
t1 = t2 = [];
for A = As
A = A{1};
s = rows (A) /d;
t1(end+1) = time_it (@split_by_reshape_permute, A, s, s);
t2(end+1) = time_it (@split_by_mat2cell, A, s, s);
endfor
semilogy (Asizes, [t1(:) t2(:)]);
title (sprintf ("Splitting in %i", d));
legend ("reshape-permute", "mat2cell");
xlabel ("Length of matrix side (all squares)");
ylabel ("log (CPU time)");
endfor
Note that the Y axis is in log scale
Performance
Performance wise, using the nested permute will only be faster for smaller matrices where big changes in relative performance are actually very small changes in time. Note that the Y axis is in log scale, so the difference between the two functions for a 100x100 matrix is 0.02 seconds while for a 10000x10000 matrix is 100 seconds.
I have also tested the following which will convert the cell into a matrix so that the return values of the two functions are the same:
function T = split_by_mat2cell (A, m, n)
l = size (A) ./ [m n];
T = mat2cell (A, repmat (m, l(1), 1), repmat (n, l (2), 1), 1);
T = reshape (cell2mat (T(:)'), [m n numel(T)]);
endfunction
This does slow it down a bit but not enough to consider (the lines will cross at 600x600 instead of 400x400).
Readability
It is so much more difficult to get your head around the use of the nested permute and reshape. It's mad to use it. It will increase maintenance time by a lot (but hey, this is Matlab language, it's not supposed to be elegant and reusable).
Future
The nested calls to permute do not extend nicely at all to N dimensions. I guess it would require a for loop over the dimensions (which would not help the already quite cryptic code at all). On the other hand, making use of mat2cell:
function T = split_by_mat2cell (A, lengths)
dl = arrayfun (@(l, s) repmat (l, s, 1), lengths, size (A) ./ lengths, "UniformOutput", false);
T = mat2cell (A, dl{:});
endfunction
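As a usage sketch, the body of that function can be run inline on a small 3D array. The array and the block lengths are hypothetical, and single-quoted strings are used so that the snippet also runs in Matlab.

```matlab
A = reshape(1:48, 4, 6, 2);         % small 3D test array
lengths = [2 3 1];                  % block size along each dimension

% One vector of block sizes per dimension, e.g. [2;2], [3;3], [1;1]
dl = arrayfun(@(l, s) repmat(l, s, 1), lengths, size(A) ./ lengths, ...
              'UniformOutput', false);
T = mat2cell(A, dl{:});             % 2x2x2 cell array of 2x3 blocks

blk = T{2,1,2};                     % rows 3:4, columns 1:3, page 2
```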
Edit (and tested in Matlab too)
The number of upvotes on the answer suggesting permute and reshape got me so curious that I decided to test this in Matlab (R2010b) as well. The results there were pretty much the same, i.e., its performance is really poor. So unless this operation will be done many times, on matrices that will always be small (less than 300x300), and there will always be a Matlab guru around to explain what it does, don't use it.
If you want to use a for loop you can do this:
[x,y] = size(matrix);
k=1; % counter
for i = 1:20:x
for j = 1:20:y
subMatrix=matrix(i:i+19, j:j+19); % variable names are case-sensitive
subMatrixCell{k}=subMatrix; % if you want to save all the
% submatrices into a cell array
k=k+1;
end
end
I'm having a problem with finding a faster way to convolve multiple vectors. All the vectors have the same length M, so these vectors can be combined as a matrix (A) with the size (N, M). N is the number of vectors.
Now I am using the below code to convolve all these vectors:
B=1;
for i=1:N
B=conv(B, A(i,:));
end
I found this piece of code becomes a speed-limit step in my program since it is frequently called. My question is, is there a way to make this calculation faster? Consider M is a small number (say 2).
It should be quite a lot faster if you implement your convolution as multiplication in the frequency domain.
Look at the way fftfilt is implemented. You can't get optimal performance using fftfilt, because you want to only convert back to time domain after all convolutions are complete, but it nicely illustrates the method.
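A minimal sketch of that idea, assuming the rows of A are real: zero-pad every row to the final output length, take one FFT per row, multiply across rows, and convert back to the time domain only once. The sizes and the name Lout are my own choices.

```matlab
N = 5; M = 2;                       % number of vectors and their length (assumed)
A = rand(N, M);

Lout = N*(M-1) + 1;                 % length of the full N-fold convolution

% fft along dimension 2 with zero-padding to Lout, product over the N rows,
% then a single inverse transform
B_fft = real(ifft(prod(fft(A, Lout, 2), 1)));

% Reference: the original loop
B_loop = 1;
for i = 1:N
    B_loop = conv(B_loop, A(i,:));
end

err = max(abs(B_fft - B_loop));     % should be at round-off level
```

Whether this beats the loop for very small M is something to measure; for M = 2 the individual conv calls are cheap, and the call overhead probably dominates.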
Convolution is associative. Combine the small kernels, convolve once with the data.
Test data:
M = 2; N = 5; L = 100;
A = rand(N,M);
Bsrc = rand(1,L);
Reference (convolve each kernel with data):
B = Bsrc;
for i=1:N,
B=conv(B, A(i,:));
end
Combined kernels:
A0 = 1;
for ii=1:N,
A0 = conv(A0,A(ii,:));
end
B0 = conv(Bsrc,A0);
Compare:
>> max(abs(B-B0))
ans =
2.2204e-16
If you perform this convolution often, precompute A0 so you can just do one convolution (B0 = conv(Bsrc,A0);).
I have found several questions/answers for vectorizing and speeding up routines for multiplying a matrix and a vector in a single loop, but I am trying to do something a little more general, namely multiplying an arbitrary number of matrices together, and then performing that operation an arbitrary number of times.
I am writing a general routine for calculating thin-film reflection from an arbitrary number of layers vs optical frequency. For each optical frequency W each layer has an index of refraction N and an associated 2x2 transfer matrix L and 2x2 interface matrix I which depends on the index of refraction and the thickness of the layer. If n is the number of layers, and m is the number of frequencies, then I can vectorize the index into an n x m matrix, but then in order to calculate the reflection at each frequency, I have to do nested loops. Since I am ultimately using this as part of a fitting routine, anything I can do to speed it up would be greatly appreciated.
This should provide a minimum working example:
W = 1260:0.1:1400; %frequency in cm^-1
N = rand(4,numel(W))+1i*rand(4,numel(W)); %dummy complex index of refraction
D = [0 0.1 0.2 0]/1e4; %thicknesses in cm
[n,m] = size(N);
r = zeros(size(W));
for x = 1:m %loop over frequencies
C = eye(2); % first medium is air
for y = 2:n %loop over layers
na = N(y-1,x);
nb = N(y,x);
%I = InterfaceMatrix(na,nb); % calculate the 2x2 interface matrix
I = [1 na*nb;na*nb 1]; % dummy matrix
%L = TransferMatrix(nb) % calculate the 2x2 transfer matrix
L = [exp(-1i*nb*W(x)*D(y)) 0; 0 exp(+1i*nb*W(x)*D(y))]; % dummy matrix
C = C*I*L;
end
a = C(1,1);
c = C(2,1);
r(x) = c/a; % reflectivity, the answer I want.
end
Running this twice for two different polarizations for a three layer (air/stuff/substrate) problem with 2562 frequencies takes 0.952 seconds while solving the exact same problem with the explicit formula (vectorized) for a three layer system takes 0.0265 seconds. The problem is that beyond 3 layers, the explicit formula rapidly becomes intractable and I would have to have a different subroutine for each number of layers while the above is completely general.
Is there hope for vectorizing this code or otherwise speeding it up?
(edited to add that I've left several things out of the code to shorten it, so please don't try to use this to actually calculate reflectivity)
Edit: In order to clarify, I and L are different for each layer and for each frequency, so they change in each loop. Simply taking the exponent will not work. For a real world example, take the simplest case of a soap bubble in air. There are three layers (air/soap/air) and two interfaces. For a given frequency, the full transfer matrix C is:
C = L_air * I_air2soap * L_soap * I_soap2air * L_air;
and I_air2soap ~= I_soap2air. Thus, I start with L_air = eye(2) and then go down successive layers, computing I_(y-1,y) and L_y, multiplying them with the result from the previous loop, and going on until I get to the bottom of the stack. Then I grab the first and third values, take the ratio, and that is the reflectivity at that frequency. Then I move on to the next frequency and do it all again.
I suspect that the answer is going to somehow involve a block-diagonal matrix for each layer as mentioned below.
Not next to a matlab, so that's only a starter,
Instead of the double loop you can write na*nb as Nab=N(1:end-1,:).*N(2:end,:);
The term in the exponent nb*W(x)*D(y) can be written as e=N(2:end,:).*(D(2:end).'*W); so that row y-1 and column x hold the value for layer y and frequency x.
The result of I*L is a 2x2 block matrix that has this form:
M = [1, Nab; Nab, 1]*[e-, 0;0, e+] = [e- , Nab*e+ ; Nab*e- , e+]
with e- as exp(-1i*e), and e+ as exp(1i*e)
see kron on how to get the block matrix form, to vectorize the propagation C=C*I*L just take M^n
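As a sanity check of the two vectorized quantities, here is a sketch that compares them with the values computed inside the double loop. It uses a coarser frequency grid than the question to keep it short, and it writes the exponent term with an explicit outer product so that both results are (n-1)-by-m. That form is my reading of the intent, not something tested against the real transfer matrices.

```matlab
W = 1260:10:1400;                               % coarser grid, for brevity
N = rand(4,numel(W)) + 1i*rand(4,numel(W));     % dummy complex index of refraction
D = [0 0.1 0.2 0]/1e4;                          % thicknesses in cm
[n, m] = size(N);

Nab = N(1:end-1,:) .* N(2:end,:);               % row y-1 holds na*nb for layer y
e   = N(2:end,:) .* (D(2:end).' * W);           % row y-1 holds nb*W(x)*D(y)

errNab = 0; errE = 0;
for x = 1:m
    for y = 2:n
        errNab = max(errNab, abs(Nab(y-1,x) - N(y-1,x)*N(y,x)));
        errE   = max(errE,   abs(e(y-1,x)   - N(y,x)*W(x)*D(y)));
    end
end
```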
@Lama put me on the right path by suggesting block matrices, but the ultimate answer ended up being more complicated, and so I put it here for posterity. Since the transfer and interface matrices are different for each layer, I leave in the loop over the layers, but construct a large sparse block matrix where each block represents a frequency.
W = 1260:0.1:1400; %frequency in cm^-1
N = rand(4,numel(W))+1i*rand(4,numel(W)); %dummy complex index of refraction
D = [0 0.1 0.2 0]/1e4; %thicknesses in cm
[n,m] = size(N);
r = zeros(size(W));
C = speye(2*m); % first medium is air
even = 2:2:2*m;
odd = 1:2:2*m-1;
for y = 2:n %loop over layers
na = N(y-1,:);
nb = N(y,:);
% get the reflection and transmission coefficients from subroutines as a vector
% of length m, one value for each frequency
%t = Tab(na, nb);
%r = Rab(na, nb);
t = rand(size(W)); % dummy vector for MWE
r = rand(size(W)); % dummy vector for MWE
% create diagonal and off-diagonal elements. each block is [1 r;r 1]/t
Id(even) = 1./t;
Id(odd) = Id(even);
Io(even) = 0;
Io(odd) = r./t;
It = [Io;Id/2].';
I = spdiags(It,[-1 0],2*m,2*m);
I = I + I.';
b = 1i.*(2*pi*D(y).*nb).*W; % thickness of the current layer y
B(even) = -b;
B(odd) = b;
L = spdiags(exp(B).',0,2*m,2*m);
C = C*I*L;
end
a = spdiags(C,0);
a = a(odd).';
c = spdiags(C,-1);
c = c(odd).';
r = c./a; % reflectivity, the answer I want.
With the 3 layer system mentioned above, it isn't quite as fast as the explicit formula, but it's close and probably can get a little faster after some profiling. The full version of the original code clocks at 0.97 seconds, the formula at 0.012 seconds and the sparse diagonal version here at 0.065 seconds.
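The reason the block matrix works is that the product of two block-diagonal matrices is block-diagonal again, with each block being the product of the corresponding 2x2 blocks. Here is a small sketch of that property with random blocks. The names P, Q and the use of blkdiag are only for illustration; the code above builds its matrices with spdiags instead.

```matlab
m = 3;                                   % number of frequencies (hypothetical)
P = rand(2,2,m) + 1i*rand(2,2,m);        % one 2x2 matrix per frequency
Q = rand(2,2,m) + 1i*rand(2,2,m);

Pc = num2cell(P, [1 2]);                 % cell array holding the 2x2 blocks
Qc = num2cell(Q, [1 2]);
SP = sparse(blkdiag(Pc{:}));             % 2m-by-2m block-diagonal matrices
SQ = sparse(blkdiag(Qc{:}));

S = SP * SQ;                             % one sparse product for all frequencies

err = 0;
for k = 1:m
    rows = 2*k-1 : 2*k;                  % rows and columns of block k
    err = max(err, max(max(abs(full(S(rows,rows)) - P(:,:,k)*Q(:,:,k)))));
end
```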
By default, all built-in functions for computing correlation or covariance return a matrix. I am trying to write an efficient function that will compute the correlation between a seed region and various other regions, but I do not need the correlations between the other regions. I assume that computing the full correlation matrix would therefore be inefficient.
I could instead compute the correlation matrix between each region and the seed region, choose one of the off-diagonal points and store it, but I feel like looping in this situation is also inefficient.
To be more concrete, each point in my 3-dimensional space has a time dimension. I am attempting to compute the mean correlation between a given point and all points in space within a given radius. I want to repeat this procedure hundreds of thousands of times, for many different radius lengths, and so on, so I would like for this to be as efficient as possible.
So, what is the best way to compute the correlation between a single vector and several others, without computing correlations that I will just ignore?
Thank you,
Chris
EDIT: Here is my code now...
function [corrMap] = TIME_meanCorrMap(A,radius)
% Even though the variable is "radius", we work with cubes for simplicity...
% So, the radius is the distance (in voxels) from the center of the cube to an edge.
denom = ((radius*2+1)^3)-1; % voxels in the cube (side 2*radius+1), excluding the seed
dim = size(A);
corrMap = zeros(dim(1:3));
for x = radius+1:dim(1)-radius
rx = [x-radius : x+radius];
for y = radius+1:dim(2)-radius
ry = [y-radius : y+radius];
for z = radius+1:dim(3)-radius
rz = [z-radius : z+radius];
corrCoeffs = zeros(1,denom);
seed = A(x,y,z,:);
i=0;
for xx = rx
for yy = ry
for zz = rz
if ~all([x y z] == [xx yy zz])
i = i + 1;
temp = corrcoef(seed,A(xx,yy,zz,:));
corrCoeffs(i) = temp(1,2);
end
end
end
end
corrMap(x,y,z) = mean(corrCoeffs);
end
end
end
EDIT: Here are some more times to supplement the accepted answer.
Using bsxfun() to do normalization, and matrix multiplication to compute correlations:
tic; for i=1:10000
x=rand(100);
xz = bsxfun(@rdivide,bsxfun(@minus,x,mean(x)),std(x));
cc = xz(:,2:end)' * xz(:,1) ./ 99;
end; toc
Elapsed time is 6.928251 seconds.
Using zscore() to normalize, matrix multiplication to compute correlations:
tic; for i=1:10000
x=rand(100);
xz = zscore(x);
cc = xz(:,2:end)' * xz(:,1) ./ 99;
end; toc
Elapsed time is 7.040677 seconds.
Using bsxfun() to normalize, and corr() to compute correlations.
tic; for i=1:10000
x=rand(100);
xz = bsxfun(@rdivide,bsxfun(@minus,x,mean(x)),std(x));
cc = corr(x(:,1),x(:,2:end));
end; toc
Elapsed time is 11.385707 seconds.
It is certainly possible to improve upon the for loop that you are currently employing. The correlation computations can be parallelized using matrix multiplications if you have sufficient RAM. However, it will require you to unwrap your 4-dimensional data matrix A into a different shape. Most likely you are dealing with 3-dimensional voxelwise fMRI data, in which case you'll have to reshape from an [x y z time] matrix to an [index time] matrix. I will assume you can deal with that reshaping. Once you have your seed timecourse [Time by 1] and your target timecourses [Time by NumTargets] ready, you can perform some much more efficient computations.
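For completeness, here is a minimal sketch of the reshaping step on a small random array. The sizes and the voxel (2,3,4) are hypothetical; the point is that the row index of the 2D matrix is the linear index returned by sub2ind.

```matlab
A   = rand(3, 4, 5, 10);                  % [x y z time], small stand-in data
dim = size(A);

V = reshape(A, prod(dim(1:3)), dim(4));   % [index time]

% Row of V that belongs to voxel (2,3,4)
idx = sub2ind(dim(1:3), 2, 3, 4);
ts  = squeeze(A(2,3,4,:)).';              % same timecourse taken from the 4D array

same = isequal(V(idx,:), ts);

% The timecourses as columns, [time voxels], is what corr and the
% matrix-multiplication approach expect
targets = V.';
```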
A quick way to efficiently compute the desired correlation is using the corr function in MATLAB. This function will accept 2 matrix arguments and it will quite efficiently compute all pairwise correlations between the columns of argument 1 and the columns of argument 2, e.g.
T = 200; %time samples
N = 20; %number of other voxels
seed = randn(T,1); %data from seed voxel
targets = randn(T,N); %data from target voxels
%here is the for loop method
tic
for n = 1:N
tmp = corrcoef(seed, targets(:,n));
tmpcc = tmp(1,2);
end
looptime = toc;
%here is the parallel method
tic
cc = corr(seed, targets);
matrixtime = toc;
On my machine, the parallel operation in corr is faster than the loop method by a factor proportional to T*N.
It is possible to go a little faster than the corr function if you are willing to perform the underlying matrix operations yourself, and in any case it is worth knowing what they are. The correlation between two vectors is basically a normalized dot product, so using the conventions above you can compute the correlations in the following way
zseed = zscore(seed); %normalize the seed timecourse by z-scoring
ztargets= zscore(targets); %normalize the target timecourses by z-scoring
ztargets = ztargets'; %flip columns and rows for convenience
cc2 = ztargets*zseed./(T-1); %compute many dot products with one matrix multiplication
The code above is basically what the corr function will do which is why it is much faster than the loop. Note that most of the operation time is in the zscore operations, and you can improve on the performance of the corr function if you efficiently compute the zscore using the bsxfun command. For now, I hope this gives you some direction on how to compute a correlation between a seed timecourse and many target timecourses without having to loop through and compute each one separately.
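Here is a sketch of the bsxfun variant, checked against a corrcoef loop. It avoids zscore and corr, so it should not need any toolbox; the variable names follow the example above.

```matlab
T = 200;                            % time samples
N = 20;                             % number of other voxels
seed    = randn(T, 1);              % data from seed voxel
targets = randn(T, N);              % data from target voxels

% z-score the seed and every target column by hand
zseed    = (seed - mean(seed)) / std(seed);
ztargets = bsxfun(@rdivide, bsxfun(@minus, targets, mean(targets, 1)), ...
                  std(targets, 0, 1));

% All N correlations with one matrix-vector product
cc = (ztargets.' * zseed) / (T - 1);

% Reference: one corrcoef call per target
ref = zeros(N, 1);
for k = 1:N
    tmp = corrcoef(seed, targets(:,k));
    ref(k) = tmp(1,2);
end

err = max(abs(cc - ref));           % should be at round-off level
```

The division by T-1 goes with the default normalization of std, which also divides by T-1.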