Fast way to compute row by row matrix correlation - matlab

I have two very large matrices (228453x460) and I want to compute the correlation between their rows.
for i=1:228453
if(vec1_preprocess(i,1))
for j=1:228453
df = effdf(vec1_preprocess(i,:)',vec2_preprocess(j,:)');
corr_temp = corr(vec1_preprocess(i,:)', vec2_preprocess(j,:)');
p = calculate_p(corr_temp, df);
temp = (meanVec(i)+p)/2;
meanVec(i) = temp;
end
disp(i);
end
end
This takes ~1 day. Is there a more direct way to compute this?
Edit: Code for effdf
function df = effdf(ts1,ts2);
%function df = effdf(ts1,ts2);
ts1=ts1-mean(ts1);
ts2=ts2-mean(ts2);
N=length(ts1);
ac1=xcorr(ts1);
ac1=ac1/max(ac1); % normalized autocorrelation
ac1=ac1(((length(ac1)+3)/2):((length(ac1)+3)/2+floor(N/4)));
ac2=xcorr(ts2);
ac2=ac2/max(ac2); % normalized autocorrelation
ac2=ac2(((length(ac2)+3)/2):((length(ac2)+3)/2+floor(N/4)));
df = 1/((1/N)+(2/N)*sum(((N-(1:length(ac1)))/N)'.*ac1.*ac2));

Since you didn't post the code, I assume that your custom functions calculate_p and effdf are perfectly optimized and don't represent the bottleneck of your script. Let's focus on what we have.
The first problem I see is:
if (vec1_preprocess(i,1))
A check repeated over 228453 iterations can noticeably increase the running time. Hence, extract only the matrix rows that don't contain a 0 in the first column and perform your calculations on those:
idx = vec1_preprocess(:,1) ~= 0;
vec1_preprocess = vec1_preprocess(idx,:);
for i = 1:size(vec1_preprocess,1)
% ...
end
The second problem is corr. It seems like you are computing p-values as well, using calculate_p. Why don't you use the built-in p-values returned by corr as its second output argument?
[c,p] = corr(A,B);
Alternatively, if Pearson's correlation is what you are looking for, you could replace corr with corrcoef to see if it performs better.
Last but not least (in fact it's the most important thing): is there any reason why you are performing this computation row by row instead of running it on the whole matrices?

If you read the documentation, you'll see that corr computes the correlation between columns, not rows.
To convert rows into columns and columns into rows, simply transpose the matrix:
tmp1 = vec1_preprocess';
tmp2 = vec2_preprocess';
C = corr(tmp1,tmp2);

Related

How to efficiently implement Maxpooling in MATLAB?

I have implemented a CNN in Matlab, but my implementation takes too much time. I have identified which part is most time consuming. It is the max-pooling related code below:
%blockwise operation
fun = @(block_struct) max_matrix(block_struct.data);
%downsampling
maxpool = cell(number_feature_map,1);
for i=1:number_feature_map
maxpool{i}=blockproc(y{i},[2 2],fun);
end
function [maximum]=max_matrix(A)
maximum=max(A(:));
Without this (downsampling) it takes only 2 minutes to converge.
How can I make it efficient?
Instead of blockproc you can use kron to create the block indices and accumarray to apply max to each block. This assumes the number of rows and columns is even; here the data are random matrices of size [6,8]:
r = 6; c = 8;
idx = kron(reshape(1:(r*c/4),c/2,[]).',ones(2))
for ii=1:number_feature_map
data = rand(r,c);
maxpool{ii} = reshape(accumarray(idx(:),data(:),[],@max),c/2,[]).';
end

Normalization 3D Image according to Slices in MATLAB

I have a matrix which is 256x192x80. I want to normalize all slices (the third dimension, 80, indexes the slices) without using a for loop.
The way I'm doing it with a for loop is below (im_dom_raw is our matrix):
normalized_raw = zeros(size(im_dom_raw));
for a=1:80
slice_raw = im_dom_raw(:,:,a);
slice_raw = slice_raw-min(slice_raw(:));
slice_raw = slice_raw/(max(slice_raw(:)));
normalized_raw(:,:,a) = slice_raw;
end
The code below implements your normalization approach without using loops. It's based on bsxfun.
% Shift all values to the positive side
slices_raw = bsxfun(@minus,im_dom_raw,min(min(im_dom_raw)));
% Normalize all values with respect to the slice maximum (with input from @Daniel)
normalized_raw2 = bsxfun(@mrdivide,slices_raw,max(max(slices_raw)));
% A slightly faster approach would be
%normalized_raw2 = bsxfun(@times,slices_raw,max(max(slices_raw)).^-1);
% ... but it will differ with your approach due to numerical approximation
% Comparison to your previous loop based implementation
sum(abs(normalized_raw(:)-normalized_raw2(:)))
The last line of code outputs
ans =
0
which (thanks to @Daniel) means that both approaches yield exactly the same results.

Fastest way to add multiple sparse matrices in a loop in MATLAB

I have a code that repeatedly calculates a sparse matrix in a loop (it performs this calculation 13472 times to be precise). Each of these sparse matrices is unique.
After each execution, it adds the newly calculated sparse matrix to what was originally a sparse zero matrix.
When all 13472 matrices have been added, the code exits the loop and the program terminates.
The code bottleneck occurs in adding the sparse matrices. I have made a dummy version of the code that exhibits the same behavior as my real code. It consists of a MATLAB function and a script given below.
(1) Function that generates the sparse matrix:
function out = test_evaluate_stiffness(n)
ind = randi([1 n*n],300,1);
val = rand(300,1);
[I,J] = ind2sub([n,n],ind);
out = sparse(I,J,val,n,n);
end
(2) Main script (program)
% Calculate the stiffness matrix
n=1000;
K=sparse([],[],[],n,n,n^2);
tic
for i=1:13472
temp=rand(1)*test_evaluate_stiffness(n);
K=K+temp;
end
fprintf('Stiffness Calculation Complete\nTime taken = %f s\n',toc)
I'm not very familiar with sparse matrix operations so I may be missing a critical point here that may allow my code to be sped up considerably.
Am I handling the updating of my stiffness matrix in a reasonable way in my code? Is there another way that I should be using sparse that will result in a faster solution?
A profiler report (screenshot) was also included with the question.
If you only need the sum of those matrices, instead of building all of them individually and then summing them, simply concatenate the vectors I, J and val and call sparse only once. If there are duplicate rows [i,j] in [I,J], the corresponding values S(i,j) are summed automatically, so the code is exactly equivalent. Since calling sparse involves an internal call to a sorting algorithm, you save 13472-1 intermediate sorts and get away with only one.
This involves changing the signature of test_evaluate_stiffness to output [I,J,val]:
function [I,J,val] = test_evaluate_stiffness(n)
and removing the line out = sparse(I,J,val,n,n);.
You will then change your main script to:
n = 1000;
[I,J,V] = deal([]);
tic;
for i = 1:13472
[I_i, J_i, V_i] = test_evaluate_stiffness(n);
nE = numel(I_i);
I(end+(1:nE)) = I_i;
J(end+(1:nE)) = J_i;
V(end+(1:nE)) = rand(1)*V_i;
end
K = sparse(I,J,V,n,n);
fprintf('Stiffness Calculation Complete\nTime taken = %f s\n',toc);
If you know the lengths of the output of test_evaluate_stiffness ahead of time, you can possibly save some time by preallocating the arrays I,J and V with appropriately-sized zeros matrices and set them using something like:
I((i-1)*nE + (1:nE)) = ...
J((i-1)*nE + (1:nE)) = ...
V((i-1)*nE + (1:nE)) = ...
The biggest remaining computation, taking 11 s, is the sparse operation on the final I, J, V vectors, so I think we've taken it down to the bare bones.
Nearly... but one final trick: if you can create the vectors so that J is sorted ascending then you will greatly improve the speed of the sparse call, about a factor 4 in my experience.
(If it's easier to have I sorted, then create the transpose matrix sparse(J,I,V) and un-transpose it afterwards.)

Apply function to rolling window

Say I have a long list A of values (say of length 1000) for which I want to compute the std over a sliding window of 100, i.e. I want to compute std(A(1:100)), std(A(2:101)), std(A(3:102)), ..., std(A(901:1000)).
In Excel/VBA one can easily accomplish this by writing e.g. =STDEV(A1:A100) in one cell and then filling down in one go. Now my question is, how could one accomplish this efficiently in Matlab without having to use any expensive for-loops.
edit: Is it also possible to do this for a list of time series, e.g. when A has dimensions 1000 x 4 (i.e. 4 time series of length 1000)? The output matrix should then have dimensions 901 x 4.
Note: For the fastest solution see Luis Mendo's answer
So firstly using a for loop for this (especially if those are your actual dimensions) really isn't going to be expensive. Unless you're using a very old version of Matlab, the JIT compiler (together with pre-allocation of course) makes for loops inexpensive.
Secondly - have you tried for loops yet? Because you should really try out the naive implementation first before you start optimizing prematurely.
Thirdly - arrayfun can make this a one-liner, but it is basically just a for loop with extra overhead and very likely to be slower than a for loop if speed really is your concern.
Finally some code:
n = 1000;
A = rand(n,1);
l = 100;
for loop (hardly bulky, likely to be efficient):
S = zeros(n-l+1,1); %//Pre-allocation of memory like this is essential for efficiency!
for t = 1:(n-l+1)
S(t) = std(A(t:(t+l-1)));
end
A vectorized (memory-inefficient!) solution:
[X,Y] = meshgrid(1:(n-l+1), 1:l);
S = std(A(X+Y-1));
A probably better vectorized solution (and a one-liner), but still memory-inefficient:
S = std(A(bsxfun(@plus, 0:n-l, (1:l)')))
Note that with all these methods you can replace std with any function, so long as it operates on the columns of the matrix (which is the standard in Matlab).
Going 2D:
To go 2D we need to go 3D
n = 1000;
k = 4;
A = rand(n,k);
l = 100;
ind = bsxfun(@plus, permute(0:n:(k-1)*n, [3,1,2]), bsxfun(@plus, 0:n-l, (1:l)')); %'
S = squeeze(std(A(ind)));
M = squeeze(mean(A(ind)));
%// etc...
OR
[X,Y,Z] = meshgrid(1:(n-l+1), 1:l, 0:n:(k-1)*n);
ind = X+Y+Z-1;
S = squeeze(std(A(ind)))
M = squeeze(mean(A(ind)))
%// etc...
OR
ind = bsxfun(@plus, 0:n-l, (1:l)'); %'
S = zeros(n-l+1,k);
M = zeros(n-l+1,k);
for t = 1:k
S(:,t) = std(A(ind + (t-1)*n)).';
M(:,t) = mean(A(ind + (t-1)*n)).';
%// etc...
end
OR (taken from Luis Mendo's answer - note in his answer he shows a faster alternative to this simple loop)
S = zeros(n-l+1,k);
M = zeros(n-l+1,k);
for t = 1:(n-l+1)
S(t,:) = std(A(t:(t+l-1),:));
M(t,:) = mean(A(t:(t+l-1),:));
%// etc...
end
What you're doing is basically a filter operation.
If you have access to the image processing toolbox,
stdfilt(A,ones(101,1)) %# assumes that data series are in columns
will do the trick (no matter the dimensionality of A). Note that if you also have access to the parallel computing toolbox, you can let filter operations like these run on a GPU, although your problem might be too small to generate noticeable speedups.
To minimize number of operations, you can exploit the fact that the standard deviation can be computed as a difference involving second and first moments,
and moments over a rolling window are obtained efficiently with a cumulative sum (using cumsum):
A = randn(1000,4); %// random data
N = 100; %// window size
c = size(A,2);
A1 = [zeros(1,c); cumsum(A)];
A2 = [zeros(1,c); cumsum(A.^2)];
S = sqrt( (A2(1+N:end,:)-A2(1:end-N,:) ...
- (A1(1+N:end,:)-A1(1:end-N,:)).^2/N) / (N-1) ); %// result
Benchmarking
Here's a comparison against a loop based solution, using timeit. The loop approach is as in Dan's solution but adapted to the 2D case, exploiting the fact that std works along each column in a vectorized manner.
%// File loop_approach.m
function S = loop_approach(A,N);
[n, p] = size(A);
S = zeros(n-N+1,p);
for k = 1:(n-N+1)
S(k,:) = std(A(k:(k+N-1),:));
end
%// File bsxfun_approach.m
function S = bsxfun_approach(A,N);
[n, p] = size(A);
ind = bsxfun(@plus, permute(0:n:(p-1)*n, [3,1,2]), bsxfun(@plus, 0:n-N, (1:N).')); %'
S = squeeze(std(A(ind)));
%// File cumsum_approach.m
function S = cumsum_approach(A,N);
c = size(A,2);
A1 = [zeros(1,c); cumsum(A)];
A2 = [zeros(1,c); cumsum(A.^2)];
S = sqrt( (A2(1+N:end,:)-A2(1:end-N,:) ...
- (A1(1+N:end,:)-A1(1:end-N,:)).^2/N) / (N-1) );
%// Benchmarking code
clear all
A = randn(1000,4); %// Or A = randn(1000,1);
N = 100;
t_loop = timeit(@() loop_approach(A,N));
t_bsxfun = timeit(@() bsxfun_approach(A,N));
t_cumsum = timeit(@() cumsum_approach(A,N));
disp(' ')
disp(['loop approach: ' num2str(t_loop)])
disp(['bsxfun approach: ' num2str(t_bsxfun)])
disp(['cumsum approach: ' num2str(t_cumsum)])
disp(' ')
disp(['bsxfun/loop gain factor: ' num2str(t_loop/t_bsxfun)])
disp(['cumsum/loop gain factor: ' num2str(t_loop/t_cumsum)])
Results
I'm using Matlab R2014b, Windows 7 64 bits, dual core processor, 4 GB RAM:
4-column case:
loop approach: 0.092035
bsxfun approach: 0.023535
cumsum approach: 0.0002338
bsxfun/loop gain factor: 3.9106
cumsum/loop gain factor: 393.6526
Single-column case:
loop approach: 0.085618
bsxfun approach: 0.0040495
cumsum approach: 8.3642e-05
bsxfun/loop gain factor: 21.1431
cumsum/loop gain factor: 1023.6236
So the cumsum-based approach seems to be the fastest: about 400 times faster than the loop in the 4-column case, and 1000 times faster in the single-column case.
Several functions can do the job efficiently in Matlab.
On one side, you can use functions such as colfilt or nlfilter, which perform computations on sliding blocks. colfilt is far more efficient than nlfilter, but can be used only if the order of the elements inside a block does not matter. Here is how to use it on your data:
S = colfilt(A, [100,1], 'sliding', @std);
or
S = nlfilter(A, [100,1], @std);
On your example you can clearly see the difference in performance. But there is a catch: both functions pad the input array so that the output vector has the same size as the input. To get only the relevant part of the output, skip the first floor((100-1)/2) = 49 elements and take the next 1000-100+1 = 901 values:
S(50:end-50)
There is also another solution, close to colfilt but more efficient. colfilt calls im2col to reshape the input into a matrix and then applies the given function to each distinct column. This transforms your input vector of size [1000,1] into a matrix of size [100,901]. But colfilt also pads the input array, which you don't need here. So you can skip the padding step and apply std directly to each column; that is easy because std applied to a matrix returns a row vector with the std of each column. Finally, transpose it to get a column vector if you want. In brief, in one line:
S = std(im2col(A,[100 1],'sliding')).';
Remark: if you want to apply a more complex function, see the code of colfilt, lines 144 and 147 (for R2013b).
If your concern is the speed of the for loop, you can greatly reduce the number of loop iterations by folding your vector into an array (using reshape) whose columns contain the number of elements you want to apply your function to.
This lets Matlab and the JIT perform the optimization (and in most cases they do that far better than we can) by calculating your function on each column of the array.
You then reshape an offset version of your array and do the same. You still need a loop, but the number of iterations is only l (so 100 in your example case), instead of n-l+1 = 901 in a classic one-window-at-a-time for loop.
When you're done, you reshape the array of results into a vector; you still have to calculate the last window manually, but overall it is still much faster.
Using the same input notation as Dan:
n = 1000;
A = rand(n,1);
l = 100;
It will take this shape:
width = (n/l)-1 ; %// width of each line in the temporary result array
tmp = zeros( l , width ) ; %// preallocation never hurts
for k = 1:l
tmp(k,:) = std( reshape( A(k:end-l+k-1) , l , [] ) ) ; %// calculate your stat on the array (reshaped vector)
end
S2 = [tmp(:) ; std( A(end-l+1:end) ) ] ; %// "unfold" your results then add the last window calculation
If I tic ... toc the complete loop version and the folded one, I obtain these averaged results:
Elapsed time is 0.057190 seconds. %// windows by window FOR loop
Elapsed time is 0.016345 seconds. %// "Folded" FOR loop
I know tic/toc is not the way to go for perfect timing but I don't have the timeit function on my matlab version. Besides, the difference is significant enough to show that there is an improvement (albeit not precisely quantifiable by this method). I removed the first run of course and I checked that the results are consistent with different matrix sizes.
Now regarding your "one liner" request, I suggest you wrap this code into a function like so:
function out = foldfunction( func , vec , nPts )
n = length( vec ) ;
width = (n/nPts)-1 ;
tmp = zeros( nPts , width ) ;
for k = 1:nPts
tmp(k,:) = func( reshape( vec(k:end-nPts+k-1) , nPts , [] ) ) ;
end
out = [tmp(:) ; func( vec(end-nPts+1:end) ) ] ;
Which in your main code allows you to call it in one line:
S = foldfunction( @std , A , l ) ;
The other great benefit of this format is that you can use the very same sub-function for other statistical functions. For example, if you want the mean of your windows, you make the same call, just changing the func argument:
S = foldfunction( @mean , A , l ) ;
The only restriction: as it is, it only works with a vector as input, but with a bit of rework it could be made to take arrays too.

correlation coefficient data driven approach possible?

I have a matrix 64x64x32x90 which stands for pixels at x,y,z, at time t.
I have a reference signal 1x90 which stands for the behavior I expect for a pixel at some point (x,y,z).
I am constructing a new image of the correlation between each pixel versus my reference.
load('DATA.mat');
ON = ones(1,10);
OFF = zeros(1,10);
taskRef = [OFF ON OFF ON OFF ON OFF ON OFF];
corrImage = zeros(64,64,32);
for i=1:64
for j=1:64
for k=1:32
signal = squeeze(DATA(i,j,k,:));
coef = corrcoef(signal',taskRef);
corrImage(i,j,k) = coef(2);
end
end
end
My process is too slow. Is there a way to get rid of my loops or adjust the code to have a better runtime?
Reshape your data so that its first three dimensions are collapsed into one (so now there are 64*64*32 rows and 90 columns).
Then use pdist2 (with 'correlation' option) to compute the correlation of each row with the expected pattern.
Finally, reshape result into the desired shape.
DATA2 = reshape(DATA, [],90);
corrImage = 1 - pdist2(DATA2, taskRef, 'correlation');
corrImage = reshape(corrImage, 64,64,32);